End-to-end MLOps project for predictive maintenance using engine sensor data. Includes data versioning on Hugging Face, MLflow experiment tracking, CI/CD with GitHub Actions, and Dockerized Streamlit deployment for real-time engine failure classification.

ananttripathi/engine-predictive-maintenance

Predictive Maintenance – Engine Failure Classification (MLOps Project)

🎉 Status: Complete - All pipeline steps successfully deployed!

This project builds an end‑to‑end predictive maintenance system for small and large engines using sensor data (RPM, pressures, temperatures) to classify whether an engine is healthy or requires maintenance.

The work is organized to satisfy the provided interim and final report rubrics, including:

  • ✅ Data registration on Hugging Face
  • ✅ Exploratory Data Analysis (EDA)
  • ✅ Data preparation and dataset versioning
  • ✅ Model building with experimentation tracking
  • ✅ Model deployment with Docker + Streamlit on Hugging Face Spaces
  • ✅ Automated GitHub Actions workflow

🔗 Live Resources

🌐 Live Application

🚀 Try the Live App

Interactive Streamlit application for real-time engine condition predictions with sensor visualizations.

🤖 Trained Model

📦 View Model on Hugging Face

Trained Random Forest model with hyperparameter tuning, versioned on Hugging Face Model Hub.

📊 Dataset Repository

📁 Access Dataset Repository

Version-controlled datasets including raw data and train/test splits.

💻 GitHub Repository

🔧 View Source Code

Complete source code, documentation, and CI/CD pipeline.

⚙️ GitHub Actions

🔄 View Workflow Runs

Automated CI/CD pipeline with 4 sequential jobs for data registration, preparation, training, and deployment.


📁 Repository Structure

mlops/
├── data/                         # Raw and processed data
│   ├── engine_data.csv           # Original engine sensor dataset
│   └── processed/                # Train/test splits
│       ├── train.csv
│       └── test.csv
├── notebooks/                    # EDA and experimentation notebooks
├── src/                          # Main source code
│   ├── config.py                 # Central configuration
│   ├── data_register.py          # Register raw data to HF Dataset
│   ├── data_prep.py              # Data cleaning and splitting
│   ├── hf_data_utils.py          # HF Dataset Hub utilities
│   ├── train.py                  # Model training with MLflow
│   ├── hf_model_utils.py         # HF Model Hub utilities
│   ├── inference.py              # Prediction utilities
│   ├── app.py                    # Streamlit web application
│   └── deploy_to_hf.py           # Deploy to HF Space
├── .github/
│   └── workflows/
│       └── pipeline.yml          # CI/CD pipeline
├── Dockerfile                    # Container definition for deployment
├── requirements.txt              # Python dependencies
└── README.md                     # This file

Key Files

  • src/config.py – Central configuration (paths, Hugging Face repo names, MLflow config)
  • src/data_register.py – Registers raw dataset to Hugging Face Dataset Hub
  • src/data_prep.py – Loads data, cleans it, and creates train/test splits
  • src/train.py – Model training, hyperparameter tuning, MLflow logging
  • src/app.py – Streamlit web application for interactive predictions
  • src/deploy_to_hf.py – Deploys app to Hugging Face Space
  • .github/workflows/pipeline.yml – Automated CI/CD pipeline

🔄 Pipeline Overview

The MLOps pipeline has 6 stages; data registration, data preparation, model training, and deployment run as automated GitHub Actions jobs:

1. Data Registration ✅

2. Exploratory Data Analysis (EDA)

  • Script: src/eda.py (or use notebooks)
  • Action: Performs data overview, univariate/bivariate/multivariate analysis
  • Output: Visualizations and insights about engine health patterns
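
As a rough illustration of the univariate/bivariate step, the sketch below compares the mean of each sensor between the two classes using only the standard library. The column names (`Engine rpm`, `Coolant temp`, `Engine Condition`) and the inline sample rows are assumptions made for this example; they are not taken from the project's dataset or from `src/eda.py`.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real project would read data/engine_data.csv.
SAMPLE_CSV = """Engine rpm,Coolant temp,Engine Condition
700,78.0,0
720,80.0,0
1100,91.0,1
1300,95.0,1
"""


def class_means(rows, label_col):
    """Return {label: {feature: mean}} for every non-label column."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        label = row[label_col]
        for name, value in row.items():
            if name != label_col:
                grouped[label][name].append(float(value))
    return {
        label: {name: mean(values) for name, values in features.items()}
        for label, features in grouped.items()
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = class_means(rows, label_col="Engine Condition")
for label, features in sorted(summary.items()):
    print(label, features)
```

A large gap between the per-class means of a sensor is a quick hint that the sensor may be informative; plots in the notebooks would normally back this up.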

3. Data Preparation ✅

  • Script: src/data_prep.py
  • Action: Cleans data, creates train/test splits, uploads to dataset repo
  • Output: data/train.csv and data/test.csv in dataset repo
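
The splitting step can be sketched without any dependencies as below. Stratifying by label is an assumption (the README only says "train/test splits"), and the row fields `rpm` and `condition`, the 80/20 ratio, and the seed are hypothetical. With scikit-learn installed, `train_test_split` and its `stratify` argument do the same job.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.2, seed=42):
    """Split rows into (train, test), keeping the label ratio in both parts."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical rows: 80 healthy (0) and 20 needing maintenance (1).
rows = [{"rpm": 700 + i, "condition": 0} for i in range(80)]
rows += [{"rpm": 1200 + i, "condition": 1} for i in range(20)]

train, test = stratified_split(rows, label_key="condition")
print(len(train), len(test))
```

Splitting per class keeps the minority class represented in the test set, which matters when failures are much rarer than healthy readings.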

4. Model Building + Experiment Tracking ✅

5. Deployment & Hosting ✅

  • App: src/app.py - Streamlit web application
  • Container: Dockerfile - Container definition
  • Script: src/deploy_to_hf.py - Deploys to Hugging Face Space
  • Output: Live app at ananttripathiak/engine-maintenance-space
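
A deployment script along these lines might look like the sketch below. It is not the contents of `src/deploy_to_hf.py`: the function name `deploy_space` and the `DryRunApi` stand-in are hypothetical, and the block only records calls so it runs without network access. The two methods it calls, `create_repo` and `upload_folder`, exist on `huggingface_hub.HfApi` with these keyword arguments.

```python
class DryRunApi:
    """Stand-in exposing the two HfApi methods used below; it only records calls."""

    def __init__(self):
        self.calls = []

    def create_repo(self, **kwargs):
        self.calls.append(("create_repo", kwargs))

    def upload_folder(self, **kwargs):
        self.calls.append(("upload_folder", kwargs))


def deploy_space(api, repo_id, folder_path):
    """Create the Space if it does not exist, then upload the app folder to it."""
    api.create_repo(
        repo_id=repo_id,
        repo_type="space",
        space_sdk="docker",  # the README describes a Dockerized Streamlit app
        exist_ok=True,
    )
    api.upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        repo_type="space",
    )
    return f"https://huggingface.co/spaces/{repo_id}"


# Dry run; for a real deployment pass huggingface_hub.HfApi(token=...) instead:
#   from huggingface_hub import HfApi
#   deploy_space(HfApi(token=os.environ["HF_TOKEN"]), os.environ["HF_SPACE_REPO"], ".")
api = DryRunApi()
print(deploy_space(api, "your-username/engine-maintenance-space", "."))
```

Passing the API object in as a parameter keeps the upload logic testable without credentials.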

6. GitHub Actions Workflow ✅

  • File: .github/workflows/pipeline.yml
  • Jobs:
    1. register-dataset → runs src/data_register.py
    2. data-prep → runs src/data_prep.py
    3. model-training → runs src/train.py
    4. deploy-hosting → runs src/deploy_to_hf.py
  • View Runs: GitHub Actions
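
The job order above can be mimicked locally with a small runner that stops at the first failing step. This is a sketch of the sequencing idea, not a reproduction of `pipeline.yml`; the names `run_script`, `run_pipeline`, and `fake_runner` are hypothetical, and stopping on failure assumes each job depends on the previous one.

```python
import subprocess
import sys

# Job names and scripts as listed in the workflow description above.
PIPELINE_STEPS = [
    ("register-dataset", "src/data_register.py"),
    ("data-prep", "src/data_prep.py"),
    ("model-training", "src/train.py"),
    ("deploy-hosting", "src/deploy_to_hf.py"),
]


def run_script(script):
    """Run one script with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode


def run_pipeline(steps, runner=run_script):
    """Run steps in order; stop at the first non-zero exit code."""
    completed = []
    for job, script in steps:
        if runner(script) != 0:
            print(f"{job} failed; skipping remaining jobs")
            break
        completed.append(job)
    return completed


def fake_runner(script):
    """Stand-in runner for a dry run: pretend only the training script fails."""
    return 1 if script.endswith("train.py") else 0


print(run_pipeline(PIPELINE_STEPS, runner=fake_runner))
```

Calling `run_pipeline(PIPELINE_STEPS)` without the stand-in runner executes the real scripts, which requires the Hugging Face settings described in the Configuration section.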

🚀 Quick Start

Local Development

  1. Clone the repository:

    git clone https://github.com/ananttripathi/engine-predictive-maintenance.git
    cd engine-predictive-maintenance
  2. Set up virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
  3. Run pipeline steps:

    # Register data
    python src/data_register.py
    
    # Prepare data
    python src/data_prep.py
    
    # Train model
    python src/train.py
    
    # Run app locally
    streamlit run src/app.py

Automated Pipeline

The pipeline runs automatically on every push to the main branch via GitHub Actions. Workflow runs are listed under 🔄 GitHub Actions.


📊 Technologies Used

  • Python 3.10 - Programming language
  • scikit-learn - Machine learning (Random Forest)
  • MLflow - Experiment tracking and model registry
  • Hugging Face Hub - Dataset, model, and space hosting
  • Streamlit - Web application framework
  • Docker - Containerization
  • GitHub Actions - CI/CD automation
  • Plotly - Interactive visualizations

⚙️ Configuration

Current Configuration

This project is configured with:

For New Users: Setup Instructions

1. Hugging Face Configuration

Update src/config.py with your Hugging Face username:

HF_DATASET_REPO = os.getenv("HF_DATASET_REPO", "ananttripathiak/engine-maintenance-dataset")
HF_MODEL_REPO = os.getenv("HF_MODEL_REPO", "ananttripathiak/engine-maintenance-model")
HF_SPACE_REPO = os.getenv("HF_SPACE_REPO", "ananttripathiak/engine-maintenance-space")

Or set environment variables:

export HF_TOKEN="hf_your_token_here"
export HF_DATASET_REPO="ananttripathiak/engine-maintenance-dataset"
export HF_MODEL_REPO="ananttripathiak/engine-maintenance-model"
export HF_SPACE_REPO="ananttripathiak/engine-maintenance-space"
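
Before running the pipeline with environment variables, a quick pre-flight check can catch a missing token or a malformed repo ID. The sketch below is an illustration only; `check_hf_settings` and the repo ID pattern are hypothetical helpers that are not part of `src/config.py`, and the pattern is a loose approximation of the `username/repo-name` form.

```python
import os
import re

REQUIRED_VARS = ["HF_TOKEN", "HF_DATASET_REPO", "HF_MODEL_REPO", "HF_SPACE_REPO"]
REPO_ID_PATTERN = re.compile(r"^[\w.-]+/[\w.-]+$")  # "<namespace>/<name>"


def check_hf_settings(env):
    """Return a list of problems found in env; an empty list means it looks usable."""
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif name.endswith("_REPO") and not REPO_ID_PATTERN.match(value):
            problems.append(f"{name} should look like 'username/repo-name'")
    return problems


# Check the current shell environment and report anything missing.
for problem in check_hf_settings(os.environ):
    print("config problem:", problem)
```

Because the function takes a plain mapping, it can also be pointed at a dictionary of values loaded from elsewhere.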

2. GitHub Repository Configuration

A. Create GitHub Repository:

  1. Create a new repository on GitHub (e.g., engine-predictive-maintenance)
  2. Push this mlops folder to it:
    git init
    git add .
    git commit -m "Initial commit: Predictive maintenance MLOps pipeline"
    git branch -M main  # ensure the local branch is named main before pushing
    git remote add origin https://github.com/your-username/engine-predictive-maintenance.git
    git push -u origin main

B. Add GitHub Secrets:

Go to your GitHub repo → Settings → Secrets and variables → Actions → New repository secret

Add these 4 secrets:

  • HF_TOKEN – Your Hugging Face access token (from https://huggingface.co/settings/tokens)
  • HF_DATASET_REPO – e.g., ananttripathiak/engine-maintenance-dataset
  • HF_MODEL_REPO – e.g., ananttripathiak/engine-maintenance-model
  • HF_SPACE_REPO – e.g., ananttripathiak/engine-maintenance-space

📖 For detailed setup instructions, see CONFIGURATION_GUIDE.md
