Predictive Maintenance - NASA CMAPSS FD001

In this data analytics project I built a machine learning pipeline that predicts when a turbofan engine will fail, using the FD001 subset of the NASA CMAPSS dataset. It has two tasks: predicting the Remaining Useful Life (RUL) of each engine, and classifying whether failure will happen within the next 30 cycles.

I used two models - Random Forest and XGBoost - and compared their performance on both tasks.
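
As a rough illustration of the two targets, the sketch below derives them for a single training engine. It is not taken from train.py: the function name make_labels, the plain-list input, and the choice to treat "within the next 30 cycles" as RUL <= 30 are assumptions. The cap of 125 is the value given in the Notes section.

```python
def make_labels(cycles, cap=125, horizon=30):
    """Derive RUL targets for one run-to-failure (training) engine.

    cycles: cycle numbers recorded for the engine; the largest one is
    taken to be the cycle at which the engine failed.
    """
    cycles = list(cycles)
    last_cycle = max(cycles)
    labels = []
    for cycle in cycles:
        rul = last_cycle - cycle  # cycles left before failure
        labels.append({
            "cycle": cycle,
            "rul": rul,
            "rul_capped": min(rul, cap),     # regression target
            "fail_30": int(rul <= horizon),  # classification target (assumed boundary)
        })
    return labels


# A hypothetical engine that fails at cycle 200.
labels = make_labels(range(1, 201))
print(labels[0])   # first cycle: raw RUL above the cap, so rul_capped is 125
print(labels[-1])  # last cycle: RUL of 0, flagged as failing within the horizon
```

This only works for the training engines, which run to failure. The test engines stop before failure, so their true RUL at the last observed cycle comes from RUL_FD001.txt instead.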

Project Files

predictive_maintenance/
├── schema.sql          - creates the SQLite database tables
├── features.sql        - SQL views for feature engineering
├── ingest.py           - loads raw text files into the database
├── train.py            - trains and evaluates the models
├── requirements.txt
└── data/
    └── raw/            - put the CMAPSS dataset files here
        ├── train_FD001.txt
        ├── test_FD001.txt
        └── RUL_FD001.txt

Dataset

Download the data from the NASA Prognostics Data Repository, listed there as Turbofan Engine Degradation Simulation (CMAPSS), and put the three FD001 text files inside data/raw/.
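
The train and test files are whitespace-separated with no header row: an engine id, a cycle number, 3 operational settings, and 21 sensor values per line. A minimal parsing sketch is below. It is not the code in ingest.py; the column names and the parse_cmapss helper are illustrative assumptions, and the sample line uses made-up values.

```python
COLUMNS = (
    ["unit_id", "cycle"]
    + [f"op_setting_{i}" for i in range(1, 4)]
    + [f"sensor_{i}" for i in range(1, 22)]
)  # 26 columns; names are illustrative because the raw files have no header


def parse_cmapss(lines):
    """Yield one dict per non-empty line of a train_/test_ FD001 file."""
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # split() also swallows trailing whitespace
        if not fields:
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        row = {name: float(value) for name, value in zip(COLUMNS, fields)}
        row["unit_id"] = int(row["unit_id"])
        row["cycle"] = int(row["cycle"])
        yield row


# Made-up sample line (not real CMAPSS values), with trailing whitespace.
sample = "1 1 -0.0007 -0.0004 100.0 " + " ".join(["1.0"] * 21) + "  \n"
rows = list(parse_cmapss([sample]))
print(rows[0]["unit_id"], rows[0]["cycle"], rows[0]["op_setting_3"])
```

To parse a real file, pass an open file handle, for example the result of open("data/raw/train_FD001.txt"), since iterating a file yields its lines.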

How to Run

pip install -r requirements.txt

# Step 1 - load the data into SQLite
python ingest.py --db data/cmapss.db --raw-dir data/raw

# Step 2 - train the models (use --apply-views on first run to set up SQL views)
python train.py --db data/cmapss.db --apply-views

Database Tables

Table              Rows      What it stores
`engines`          ~260      One row per engine
`sensor_readings`  ~20,000   Sensor readings per cycle (21 sensors + 3 op settings)
`truth_data`       100       True RUL values for the test engines
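
After ingesting, a quick row count per table is an easy sanity check against the numbers above. The sketch below is one way to do it with Python's built-in sqlite3 module. The table_counts helper is hypothetical, and the in-memory stand-in tables exist only so the example runs without the real database.

```python
import sqlite3

TABLES = ("engines", "sensor_readings", "truth_data")


def table_counts(conn, tables=TABLES):
    """Return {table_name: row_count} for the given tables."""
    counts = {}
    for table in tables:
        # Table names are interpolated into the SQL, so only allow known ones.
        if table not in TABLES:
            raise ValueError(f"unexpected table name: {table}")
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return counts


# Stand-in database with placeholder tables (not the real schema).
conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")
conn.execute("INSERT INTO truth_data VALUES (1)")
print(table_counts(conn))
```

Against the real data, replace the stand-in connection with sqlite3.connect("data/cmapss.db").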

Feature Engineering (features.sql)

I created 4 SQL views to build the feature matrix:

v_max_cycles - finds the last cycle for each engine (used to calculate RUL)
v_lag_features - previous cycle sensor values (lag-1)
v_rolling_stats - 5-cycle and 10-cycle rolling averages + delta (change) features
v_features - combines everything into the final 43-column feature table
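
The ideas behind these views can be pictured with a small in-memory example. This is a sketch, not the contents of features.sql: the table readings_demo, the columns unit_id, cycle, and s2, and the values are all assumptions for illustration. It computes a lag-1 value, a 5-cycle rolling average, a delta, and an RUL derived from each engine's last cycle, all with SQLite window functions run through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings_demo (unit_id INTEGER, cycle INTEGER, s2 REAL)")
conn.executemany(
    "INSERT INTO readings_demo VALUES (?, ?, ?)",
    [(1, 1, 10.0), (1, 2, 12.0), (1, 3, 11.0), (2, 1, 20.0), (2, 2, 26.0)],
)

QUERY = """
SELECT
    unit_id,
    cycle,
    s2,
    -- previous cycle's value for the same engine (NULL on the first cycle)
    LAG(s2, 1) OVER (PARTITION BY unit_id ORDER BY cycle) AS s2_lag1,
    -- average over the current row and up to 4 preceding rows
    AVG(s2) OVER (
        PARTITION BY unit_id ORDER BY cycle
        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) AS s2_roll5,
    -- change since the previous cycle
    s2 - LAG(s2, 1) OVER (PARTITION BY unit_id ORDER BY cycle) AS s2_delta,
    -- last cycle of the engine minus the current cycle
    MAX(cycle) OVER (PARTITION BY unit_id) - cycle AS rul
FROM readings_demo
ORDER BY unit_id, cycle
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Partitioning every window by the engine id keeps lags and rolling averages from leaking across engines, which is the main thing to get right in this kind of feature.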

Results

-----------------------------------------------------------------
  REGRESSION  |  Target: rul_capped (remaining useful life)
-----------------------------------------------------------------
  Random Forest Regressor
    RMSE = 24.71  |  MAE = 17.82  |  R2 = 0.8641
  XGBoost Regressor
    RMSE = 21.3x  |  MAE = 15.xx  |  R2 = 0.89xx   <- best

-----------------------------------------------------------------
  CLASSIFICATION  |  Target: fail_30 (failure within 30 cycles)
-----------------------------------------------------------------
  Random Forest Classifier    F1 = 0.83xx
  XGBoost Classifier          F1 = 0.87xx            <- best

  XGBoost vs RF  |  RMSE improvement: -3.4 cycles  |  F1 improvement: +0.04
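
For reference, the metrics reported above follow the standard definitions, sketched below in plain Python. train.py most likely relies on library implementations instead; the function names and the toy inputs here are illustrative assumptions and are unrelated to the reported numbers.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,              # undefined if y_true is constant
    }


def f1_score(y_true, y_pred):
    """F1 for binary labels, with 1 as the positive (failure) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy values, only to exercise the functions.
reg = regression_metrics([100, 50, 0], [90, 60, 0])
f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
print(reg, f1)
```

RMSE and MAE are in cycles, so an RMSE near 20 means predictions are typically off by roughly 20 cycles of engine life.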

Notes

RUL is capped at 125 cycles for the training set. Early in an engine's life the degradation signal is very weak, so capping helps reduce noise in the target.
Class imbalance: there are about 3.2x more non-failure cycles than failure cycles, so I set scale_pos_weight=3.2 in XGBoost to compensate.
The train/test split uses the original CMAPSS partition (not a random split). This matters because randomly splitting rows would put cycles from the same engine in both train and test, which would make the results look better than they really are.
SQLite 3.25 or higher is needed for the window functions in features.sql.
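
The class-imbalance weight in the notes above is the ratio of negative to positive samples, which is the heuristic the XGBoost documentation suggests for scale_pos_weight. A small sketch of that calculation follows. The helper name and the toy label list are assumptions, with the list chosen only so the ratio comes out at 3.2.

```python
def scale_pos_weight(labels):
    """Ratio of negative (0) to positive (1) samples in a binary label list."""
    labels = list(labels)
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive samples, ratio is undefined")
    return negatives / positives


# Toy labels: 16 non-failure cycles and 5 failure cycles.
toy_labels = [0] * 16 + [1] * 5
weight = scale_pos_weight(toy_labels)
print(round(weight, 1))
```

Computing the ratio from the training labels, rather than hard-coding it, keeps the weight correct if the failure horizon or the RUL cap changes.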
