
DarklordIITG/Predictive-Maintenance


Predictive Maintenance - NASA CMAPSS FD001

This is a data analytics project in which I built a machine learning pipeline that predicts when a turbofan engine will fail, using the FD001 subset of the NASA CMAPSS dataset. The goal is to predict the Remaining Useful Life (RUL) of each engine and also to classify whether it will fail within the next 30 cycles.

I used two models - Random Forest and XGBoost - and compared their performance on both tasks.

Project Files

predictive_maintenance/
├── schema.sql          - creates the SQLite database tables
├── features.sql        - SQL views for feature engineering
├── ingest.py           - loads raw text files into the database
├── train.py            - trains and evaluates the models
├── requirements.txt
└── data/
    └── raw/            - put the CMAPSS dataset files here
        ├── train_FD001.txt
        ├── test_FD001.txt
        └── RUL_FD001.txt

Dataset

Download the dataset from the NASA Prognostics Data Repository, where it is listed as Turbofan Engine Degradation Simulation (CMAPSS). Put the three FD001 text files (train_FD001.txt, test_FD001.txt, RUL_FD001.txt) inside data/raw/.
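The FD001 files are plain text with one engine cycle per row. Below is a minimal parsing sketch, assuming the standard CMAPSS layout of 26 whitespace-separated columns (unit number, cycle, 3 operational settings, 21 sensor readings). The names `Reading`, `parse_line`, `parse_readings` and `parse_rul` are hypothetical, and this is not the repo's ingest.py.

```python
# Minimal sketch of parsing the FD001 text files. Assumes the standard CMAPSS
# layout: unit, cycle, 3 operational settings, 21 sensors per row, separated
# by whitespace. Illustration only, not the repo's ingest.py.
from typing import Iterable, List, NamedTuple, Tuple

N_SETTINGS = 3
N_SENSORS = 21
N_FIELDS = 2 + N_SETTINGS + N_SENSORS  # 26 columns per row


class Reading(NamedTuple):
    unit: int
    cycle: int
    settings: Tuple[float, ...]
    sensors: Tuple[float, ...]


def parse_line(line: str) -> Reading:
    # str.split() with no argument collapses runs of spaces and ignores
    # any trailing whitespace at the end of the row
    fields = line.split()
    if len(fields) != N_FIELDS:
        raise ValueError(f"expected {N_FIELDS} fields, got {len(fields)}")
    return Reading(
        unit=int(fields[0]),
        cycle=int(fields[1]),
        settings=tuple(float(f) for f in fields[2:2 + N_SETTINGS]),
        sensors=tuple(float(f) for f in fields[2 + N_SETTINGS:]),
    )


def parse_readings(lines: Iterable[str]) -> List[Reading]:
    # For train_FD001.txt and test_FD001.txt; blank lines are skipped
    return [parse_line(line) for line in lines if line.strip()]


def parse_rul(lines: Iterable[str]) -> List[int]:
    # RUL_FD001.txt holds one integer per line: line i is the true RUL
    # of test engine i (1-based), in unit order
    return [int(line) for line in lines if line.strip()]
```

For example, `with open("data/raw/train_FD001.txt") as f: readings = parse_readings(f)`. Splitting on arbitrary whitespace rather than a single-space delimiter avoids the empty trailing columns that a fixed delimiter can produce.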

How to Run

pip install -r requirements.txt

# Step 1 - load the data into SQLite
python ingest.py --db data/cmapss.db --raw-dir data/raw

# Step 2 - train the models (use --apply-views on first run to set up SQL views)
python train.py --db data/cmapss.db --apply-views

Database Tables

Table             Rows      What it stores
engines           ~260      One row per engine
sensor_readings   ~20,000   Sensor readings per cycle (21 sensors + 3 operational settings)
truth_data        100       True RUL values for the test engines
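Here is a sketch of what the three tables could look like in SQLite. Every column name (`engine_id`, `split`, `setting_1`..`setting_3`, `s1`..`s21`, `true_rul`) and the helper `insert_readings` are assumptions for illustration; the repo's schema.sql may differ. One detail any version has to handle: the train and test files both number their engines from 1, so an engine is identified by its unit number together with the split it came from.

```python
# Hypothetical SQLite schema for the engines / sensor_readings / truth_data
# tables. Column names are assumptions; the repo's schema.sql may differ.
import sqlite3
from typing import Iterable, Sequence, Tuple

N_SETTINGS = 3
N_SENSORS = 21


def create_schema(conn: sqlite3.Connection) -> None:
    setting_cols = ", ".join(f"setting_{i} REAL" for i in range(1, N_SETTINGS + 1))
    sensor_cols = ", ".join(f"s{i} REAL" for i in range(1, N_SENSORS + 1))
    conn.executescript(f"""
        CREATE TABLE engines (
            engine_id INTEGER PRIMARY KEY,
            unit      INTEGER NOT NULL,
            split     TEXT    NOT NULL CHECK (split IN ('train', 'test')),
            UNIQUE (unit, split)  -- unit numbers restart at 1 in each file
        );
        CREATE TABLE sensor_readings (
            engine_id INTEGER NOT NULL REFERENCES engines(engine_id),
            cycle     INTEGER NOT NULL,
            {setting_cols},
            {sensor_cols},
            PRIMARY KEY (engine_id, cycle)
        );
        CREATE TABLE truth_data (
            engine_id INTEGER PRIMARY KEY REFERENCES engines(engine_id),
            true_rul  INTEGER NOT NULL
        );
    """)


def insert_readings(
    conn: sqlite3.Connection,
    split: str,
    rows: Iterable[Tuple[int, int, Sequence[float], Sequence[float]]],
) -> None:
    # rows: (unit, cycle, settings, sensors) tuples, e.g. from a text parser
    if split not in ("train", "test"):
        raise ValueError(f"unknown split: {split!r}")
    placeholders = ", ".join("?" * (2 + N_SETTINGS + N_SENSORS))
    for unit, cycle, settings, sensors in rows:
        # Create the engine row the first time the unit is seen
        conn.execute(
            "INSERT OR IGNORE INTO engines (unit, split) VALUES (?, ?)", (unit, split)
        )
        (engine_id,) = conn.execute(
            "SELECT engine_id FROM engines WHERE unit = ? AND split = ?", (unit, split)
        ).fetchone()
        conn.execute(
            f"INSERT INTO sensor_readings VALUES ({placeholders})",
            (engine_id, cycle, *settings, *sensors),
        )
```

A surrogate `engine_id` keeps the foreign keys in sensor_readings and truth_data to a single column, while the `UNIQUE (unit, split)` constraint still prevents duplicate engines.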

Feature Engineering (features.sql)

I created 4 SQL views to build the feature matrix:

  • v_max_cycles - finds the last cycle for each engine (used to calculate RUL)
  • v_lag_features - previous cycle sensor values (lag-1)
  • v_rolling_stats - 5-cycle and 10-cycle rolling averages, plus delta (change) features
  • v_features - combines everything into the final 43-column feature table
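The lag, delta, rolling-mean and RUL columns map naturally onto SQLite window functions, which is why SQLite 3.25+ is needed. The sketch below shows them for a single hypothetical sensor column `s2` on toy data. The view `v_features_sketch` and all column names are made up for illustration; the real features.sql covers more columns and may be written differently.

```python
# Sketch of feature views built with SQLite window functions (3.25+).
# One hypothetical sensor column (s2) only; names are illustrative.
import sqlite3

VIEWS_SQL = """
CREATE VIEW v_max_cycles AS
SELECT engine_id, MAX(cycle) AS max_cycle
FROM sensor_readings
GROUP BY engine_id;

CREATE VIEW v_features_sketch AS
SELECT
    r.engine_id,
    r.cycle,
    r.s2,
    -- lag-1: previous cycle of the same engine (NULL on an engine's first cycle)
    LAG(r.s2, 1) OVER (PARTITION BY r.engine_id ORDER BY r.cycle) AS s2_lag1,
    -- delta: change since the previous cycle
    r.s2 - LAG(r.s2, 1) OVER (PARTITION BY r.engine_id ORDER BY r.cycle) AS s2_delta,
    -- trailing means over up to 5 / 10 cycles (shorter at the start of a history)
    AVG(r.s2) OVER (PARTITION BY r.engine_id ORDER BY r.cycle
                    ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS s2_roll5,
    AVG(r.s2) OVER (PARTITION BY r.engine_id ORDER BY r.cycle
                    ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS s2_roll10,
    -- training-set RUL: cycles left until this engine's last recorded cycle
    m.max_cycle - r.cycle AS rul
FROM sensor_readings AS r
JOIN v_max_cycles AS m ON m.engine_id = r.engine_id;
"""


def demo_connection() -> sqlite3.Connection:
    # Toy data: engine 1 runs 3 cycles, engine 2 runs 2 cycles
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensor_readings (engine_id INTEGER, cycle INTEGER, s2 REAL)")
    conn.executemany(
        "INSERT INTO sensor_readings VALUES (?, ?, ?)",
        [(1, 1, 10.0), (1, 2, 12.0), (1, 3, 15.0), (2, 1, 5.0), (2, 2, 7.0)],
    )
    conn.executescript(VIEWS_SQL)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for row in conn.execute(
        "SELECT engine_id, cycle, s2_lag1, s2_delta, s2_roll5, rul "
        "FROM v_features_sketch ORDER BY engine_id, cycle"
    ):
        print(row)
```

Partitioning by engine keeps a lag or rolling window from reaching back into the previous engine's last cycles. On the toy data, engine 2's first row gets a NULL lag rather than engine 1's final value.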

Results

-----------------------------------------------------------------
  REGRESSION  |  Target: rul_capped (remaining useful life)
-----------------------------------------------------------------
  Random Forest Regressor
    RMSE = 24.71  |  MAE = 17.82  |  R2 = 0.8641
  XGBoost Regressor
    RMSE = 21.3x  |  MAE = 15.xx  |  R2 = 0.89xx   <- best

-----------------------------------------------------------------
  CLASSIFICATION  |  Target: fail_30 (failure within 30 cycles)
-----------------------------------------------------------------
  Random Forest Classifier    F1 = 0.83xx
  XGBoost Classifier          F1 = 0.87xx            <- best

  XGBoost vs RF  |  RMSE improvement: -3.4 cycles  |  F1 improvement: +0.04
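For reference, here are the metrics in the results table written out in plain Python so a value can be checked by hand. These are the standard definitions, the same ones behind `sklearn.metrics.mean_squared_error`, `mean_absolute_error`, `r2_score` and `f1_score`. train.py may well use scikit-learn rather than code like this.

```python
# Plain-Python versions of the reported metrics (standard definitions).
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Square root of the mean squared error, in cycles
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Mean absolute error, in cycles
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # 1 - SS_res / SS_tot: share of the target's variance explained
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Harmonic mean of precision and recall for the positive class (label 1)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because RMSE squares the errors, it penalises a few badly mispredicted engines more heavily than MAE does. The RMSE improvement line appears to be XGBoost's RMSE minus Random Forest's, so the negative value means XGBoost's RMSE is about 3.4 cycles lower.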

Notes

  • RUL is capped at 125 cycles for the training set, so the regression target is min(RUL, 125). Early in an engine's life the degradation signal is very weak, so long RUL values are hard to tell apart from the sensors; capping removes that noise from the target.
  • Class imbalance: there are about 3.2x more non-failure cycles than failure cycles, so I set scale_pos_weight=3.2 in XGBoost (the ratio of negative to positive examples) to up-weight the failure class.
  • The train/test split follows the original CMAPSS partition rather than a random split. Randomly splitting rows would put cycles from the same engine in both train and test, leaking information and making the results look better than they really are.
  • SQLite 3.25 or higher is needed for the window functions in features.sql.
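The capped regression target, the fail_30 label and the class weight described in the notes can be sketched as follows. The cap (125) and horizon (30) come from this README. Treating "within 30 cycles" as RUL <= 30 is an assumption, as are the helper names. XGBoost's parameter documentation suggests sum(negative instances) / sum(positive instances) as a typical scale_pos_weight, which is the ratio computed here.

```python
# Sketch of the training targets and class weight from the notes.
# RUL_CAP and HORIZON come from the README; the RUL <= 30 boundary and the
# function names are assumptions for illustration.
from collections import Counter
from typing import Iterable, Tuple

RUL_CAP = 125  # cap applied to the training RUL
HORIZON = 30   # fail_30 horizon, in cycles


def training_targets(max_cycle: int, cycle: int) -> Tuple[int, int]:
    # Training engines run to failure, so RUL counts down to 0 at the last cycle
    rul = max_cycle - cycle
    rul_capped = min(rul, RUL_CAP)        # flat at 125 early in life
    fail_30 = 1 if rul <= HORIZON else 0  # assumed boundary: RUL <= 30 is positive
    return rul_capped, fail_30


def scale_pos_weight(labels: Iterable[int]) -> float:
    # Ratio of negative (0) to positive (1) labels
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive labels")
    return counts[0] / counts[1]
```

The resulting value would then be passed to the classifier, e.g. `xgboost.XGBClassifier(scale_pos_weight=...)`. fail_30 is derived from the uncapped RUL. With a cap of 125 and a horizon of 30 this gives the same label either way, but the order would matter if the cap were ever set below the horizon.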
