A professional showcase project demonstrating ML/MLOps skills with a unified CI/CD pipeline for multiple model types.
```bash
# Start monitoring stack
make monitoring-up

# Access services
open http://localhost:8000/docs   # API documentation
open http://localhost:9090        # Prometheus
open http://localhost:3000        # Grafana (admin/admin)

# Generate test traffic
python scripts/generate_traffic.py
```

```
ML API (FastAPI)
    ↓ /metrics
Prometheus (scraping)
    ↓ queries
Grafana (dashboards)
```
See Monitoring Guide for details.
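Under the hood, Prometheus simply scrapes a plain-text exposition format from the API's `/metrics` endpoint. A stdlib-only sketch of what such a scrape returns — the metric name here is illustrative, not the service's actual metric:

```python
# Render counters in the Prometheus text exposition format.
# The metric name below is made up for illustration; the real API
# exposes its own metrics via a client library.
def render_metrics(counters: dict[str, float]) -> str:
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Example: what one scrape of /metrics might look like
print(render_metrics({"ml_api_requests_total": 42}))
```

Prometheus polls this endpoint on a schedule, stores the samples, and Grafana queries them back for the dashboards.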
This project demonstrates:
- ✅ Unified ML Framework: Abstract base class for consistent model handling
- ✅ Two Model Types:
- CNN Classifier (MNIST, ~50K parameters)
- RAG System (ChromaDB + Claude)
- ✅ Complete CI/CD Pipeline: Automated testing, validation, and deployment
- ✅ Code Quality: Black, Flake8, MyPy, pytest with >80% coverage
- ✅ Containerization: Multi-stage Docker builds
- ✅ Performance Monitoring: Automated benchmarks and thresholds
Perfect for: Showcasing MLOps skills in job applications for ML Engineer / MLOps roles.
```
┌─────────────────────────────────────────────────┐
│              BaseMLModel (Abstract)             │
│   - train()    - predict()    - evaluate()      │
│   - save()     - load()       - get_metrics()   │
└──────────────┬──────────────────────┬───────────┘
               │                      │
       ┌───────▼────────┐     ┌───────▼──────────┐
       │  CNNClassifier │     │    RAGSystem     │
       │                │     │                  │
       │  • TinyConvNet │     │  • SentenceT5    │
       │  • MNIST       │     │  • ChromaDB      │
       │  • PyTorch     │     │  • Claude API    │
       └────────────────┘     └──────────────────┘
```
Unified Interface → Both models implement the same abstract base class, allowing the CI/CD pipeline to handle them identically.
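The base class might look like the following; a minimal sketch with method names taken from the diagram above (the actual signatures in `src/models/base_model.py` may differ):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseMLModel(ABC):
    """Unified interface that every model type must implement."""

    @abstractmethod
    def train(self) -> Dict[str, float]:
        """Train the model and return training metrics."""

    @abstractmethod
    def predict(self, inputs: Any) -> Any:
        """Run inference on a single input or a batch."""

    @abstractmethod
    def evaluate(self) -> Dict[str, float]:
        """Return evaluation metrics on a held-out set."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist model weights/state to disk."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore model weights/state from disk."""

    def get_metrics(self) -> Dict[str, float]:
        """Last-computed metrics; concrete models may override."""
        return {}
```

Because the pipeline only ever calls these methods, it never needs to know whether it is driving the CNN or the RAG system.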
- Python 3.10+
- Anthropic API key (for RAG system)
# Clone repository
git clone https://github.com/ChengYuChuan/ml-cicd-showcase.git
cd ml-cicd-showcase
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# For development
pip install -r requirements-dev.txt
# Set up API key
echo "ANTHROPIC_API_KEY=your_key_here" > .env# Run all tests
pytest tests/ -v
# Run specific model tests
pytest tests/test_cnn.py -v
pytest tests/test_rag.py -v
# Run with coverage
pytest tests/ --cov=src --cov-report=htmlfrom src.models.cnn_classifier import CNNClassifier
from src.config import CNNConfig
# Configure and train
config = CNNConfig(num_epochs=3, batch_size=64)
model = CNNClassifier(config)
# Train on MNIST
metrics = model.train()
print(f"Accuracy: {metrics['accuracy']:.4f}")
# Save model
model.save_model("models/cnn_mnist.pth")
# Make predictions
import torch
sample = torch.randn(1, 1, 28, 28)
prediction = model.predict(sample)from src.models.rag_system import RAGSystem
from src.config import RAGConfig
# Initialize RAG
config = RAGConfig()
rag = RAGSystem(config)
# Ingest documents
documents = [
"Python is a programming language.",
"Machine learning uses data to improve.",
]
rag.ingest_documents(documents)
# Query the system
result = rag.predict("What is Python?")
print(f"Answer: {result['answer']}")
print(f"Context: {result['context']}")tests/
├── conftest.py # Shared fixtures
├── test_cnn.py # CNN-specific tests
├── test_rag.py # RAG-specific tests
└── test_integration.py # Cross-model tests
# All tests
pytest tests/ -v
# Specific test file
pytest tests/test_cnn.py -v
# With coverage
pytest tests/ --cov=src --cov-report=term-missing
# Skip slow tests
pytest tests/ -m "not slow"
# Run only fast tests in CI
pytest tests/ -m "not slow" --maxfail=1-
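The `slow` marker used by `-m "not slow"` has to be registered so pytest does not warn about it. A sketch of what that registration might look like in `pyproject.toml` (the marker description is an assumption — check the repo's actual config):

```toml
[tool.pytest.ini_options]
markers = [
    "slow: tests that train models or call external APIs",
]
```

Individual tests then opt in with `@pytest.mark.slow`, and CI deselects them to stay within its time budget.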
- Code Quality (< 2 min)
  - Black formatting
  - Flake8 linting
  - MyPy type checking
  - isort import sorting
- CNN Tests (< 5 min)
  - Unit tests
  - Training validation
  - Performance thresholds (>85% accuracy)
- RAG Tests (< 3 min)
  - Unit tests
  - Retrieval quality validation
  - API integration tests
- Integration Tests (< 5 min)
  - Cross-model compatibility
  - End-to-end workflows
  - Coverage report generation
- Docker Build (< 3 min)
  - Multi-stage build
  - Image testing
  - Artifact upload
- Benchmarks (< 5 min)
  - Performance metrics
  - Latency measurements
  - Model size validation
| Model | Metric | Threshold | Current |
|---|---|---|---|
| CNN | Accuracy | >85% | ~95% |
| CNN | Latency | <100ms | ~15ms |
| CNN | Size | <1MB | ~0.2MB |
| RAG | Precision | >30% | ~60% |
| RAG | Latency | <5s | ~2s |
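A benchmark gate over the table above can be a small comparison helper; a sketch (the threshold names and the shape of the measured-metrics dict are assumptions, not the repo's actual benchmark code):

```python
# Fail the build if any measured metric crosses its threshold.
# "min" entries are floors (accuracy, precision); "max" entries are
# ceilings (latency, model size). Names are illustrative.
THRESHOLDS = {
    "cnn_accuracy":   {"min": 0.85},
    "cnn_latency_ms": {"max": 100},
    "cnn_size_mb":    {"max": 1.0},
    "rag_precision":  {"min": 0.30},
    "rag_latency_s":  {"max": 5.0},
}


def check_thresholds(measured: dict) -> list[str]:
    """Return a list of human-readable failures; empty means all gates pass."""
    failures = []
    for name, bounds in THRESHOLDS.items():
        value = measured.get(name)
        if value is None:
            failures.append(f"{name}: no measurement")
            continue
        if "min" in bounds and value < bounds["min"]:
            failures.append(f"{name}: {value} < min {bounds['min']}")
        if "max" in bounds and value > bounds["max"]:
            failures.append(f"{name}: {value} > max {bounds['max']}")
    return failures
```

The CI job can then exit non-zero whenever the returned list is non-empty, which is what turns the table into an enforced gate rather than documentation.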
```bash
# Build all stages
docker-compose build

# Development environment
docker-compose run ml-dev

# Run tests in Docker
docker-compose run ml-test

# Production deployment
docker-compose up ml-prod
```

```bash
# Build image
docker build -t ml-cicd-showcase .

# Run tests
docker run --rm ml-cicd-showcase pytest tests/ -v

# Interactive shell
docker run -it --rm ml-cicd-showcase /bin/bash
```

- Architecture: TinyConvNet (~50K parameters)
- Dataset: MNIST (60K train, 10K test)
- Training Time: ~3 min (3 epochs, CPU)
- Test Accuracy: ~95%
- Inference: ~15ms per image
- Model Size: 0.2MB
- Embedding: all-MiniLM-L6-v2 (80MB)
- Vector DB: ChromaDB (local)
- LLM: Claude Sonnet 4
- Retrieval: ~30ms per query
- Generation: ~2s per answer
- Precision: ~60% on test queries
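The ~60% retrieval precision figure can be computed as precision@k over labeled test queries; a sketch (the labeling scheme and `k` are assumptions — the repo's evaluation may define precision differently):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 3) -> float:
    """Fraction of the top-k retrieved documents labeled relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)


# Example: 2 of the top 3 retrieved chunks were labeled relevant
score = precision_at_k(["d1", "d7", "d3"], relevant={"d1", "d3", "d9"})
print(round(score, 2))  # 0.67
```

Averaging this over a set of labeled queries gives a single number that the benchmark stage can compare against the >30% threshold.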
```
ml-cicd-showcase/
├── .github/
│   └── workflows/
│       └── ci.yml               # CI/CD pipeline
├── src/
│   ├── models/
│   │   ├── base_model.py        # Abstract base class
│   │   ├── cnn_classifier.py    # CNN implementation
│   │   └── rag_system.py        # RAG implementation
│   ├── utils/
│   │   └── metrics.py           # Utility functions
│   └── config.py                # Configuration management
├── tests/
│   ├── conftest.py              # pytest fixtures
│   ├── test_cnn.py              # CNN tests
│   ├── test_rag.py              # RAG tests
│   └── test_integration.py      # Integration tests
├── data/
│   ├── sample_images/           # Sample data
│   └── knowledge_base/          # RAG documents
├── models/                      # Saved models
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── pyproject.toml               # Modern Python config
├── requirements.txt             # Core dependencies
├── requirements-dev.txt         # Dev dependencies
└── README.md
```
Why: Different model types (CNN vs RAG) can use the same CI/CD pipeline
How: Abstract BaseMLModel class with standardized methods
Benefit: Easy to add new models without changing infrastructure
Why: GitHub Actions has limited compute and time
How: TinyConvNet (<50K params) instead of ResNet18 (11M params)
Benefit: Tests complete in <5 minutes instead of >30 minutes

Why: Different environments need different settings
How: Dataclass-based configs (CNNConfig, RAGConfig)
Benefit: Type-safe, easy to test, environment-specific

Why: Catch bugs before deployment
How: Unit, integration, and performance tests with >80% coverage
Benefit: Confidence in code quality and model performance
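The dataclass-based configs might look like this; a sketch using the fields that appear in the quick-start examples (other field names and all defaults here are illustrative, not the repo's actual values):

```python
from dataclasses import dataclass


@dataclass
class CNNConfig:
    """Training settings for the CNN classifier."""
    num_epochs: int = 3
    batch_size: int = 64
    learning_rate: float = 1e-3  # assumed field, for illustration


@dataclass
class RAGConfig:
    """RAG system settings; field names here are assumptions."""
    embedding_model: str = "all-MiniLM-L6-v2"
    top_k: int = 3
```

Type annotations let MyPy catch mis-typed settings, and each environment simply constructs its own instance — e.g. `CNNConfig(num_epochs=1)` for a fast CI run, larger values locally.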
```bash
pre-commit install
pre-commit run --all-files
```

```bash
# Format code
black src/ tests/
isort src/ tests/

# Check formatting
black --check src/ tests/
flake8 src/ tests/
mypy src/
```

- Create a new model class inheriting from `BaseMLModel`
- Implement the required methods: `train()`, `predict()`, `evaluate()`
- Add a configuration dataclass in `config.py`
- Create a test file in `tests/`
- The pipeline automatically handles it! 🎉
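Following that checklist, a new model can be very small. A self-contained sketch, with a stub base class standing in for the real one in `src/models/base_model.py` and a deliberately trivial (hypothetical) model:

```python
from abc import ABC, abstractmethod


class BaseMLModel(ABC):  # stub standing in for src.models.base_model.BaseMLModel
    @abstractmethod
    def train(self): ...
    @abstractmethod
    def predict(self, inputs): ...
    @abstractmethod
    def evaluate(self): ...


class MajorityClassModel(BaseMLModel):
    """Hypothetical new model: always predicts the most common training label."""

    def __init__(self):
        self.majority = None

    def train(self, labels=(0, 1, 1)):
        self.majority = max(set(labels), key=list(labels).count)
        return {"accuracy": list(labels).count(self.majority) / len(labels)}

    def predict(self, inputs):
        return [self.majority for _ in inputs]

    def evaluate(self):
        # Toy evaluation: a real model would score a held-out set here.
        return {"accuracy": 1.0 if self.majority is not None else 0.0}
```

Because the class satisfies the shared interface, the existing pipeline stages (tests, benchmarks, Docker build) need no changes to pick it up.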
- Add DVC for data/model versioning
- Implement model registry (MLflow)
- Add more model types (transformer, GNN)
- Set up automated deployment
- Add performance monitoring dashboard
- Implement A/B testing framework
This is a showcase project for job applications. If you'd like to suggest improvements:
- Fork the repository
- Create a feature branch
- Make your changes
- Ensure tests pass
- Submit a pull request
MIT License - see LICENSE file for details.
ChengYuChuan
- GitHub: @ChengYuChuan
- LinkedIn: ChengYuChuan
- Email: ChengYuChuan82@gmail.com
This project demonstrates:
- ✅ MLOps Best Practices: CI/CD, testing, containerization, monitoring
- ✅ Software Engineering: Clean code, design patterns, type hints, documentation
- ✅ ML Knowledge: Model selection, training, evaluation, optimization
- ✅ Production Ready: Error handling, logging, performance monitoring
- ✅ Modern Tools: GitHub Actions, Docker, pytest, type hints, pre-commit hooks
- Time Investment: ~1 week (as planned) ✓
- Lines of Code: ~2000+ (production quality)
- Test Coverage: >80%
- Documentation: Comprehensive
Tech Stack:
- Languages: Python 3.10
- ML Frameworks: PyTorch, Sentence Transformers
- Vector DB: ChromaDB
- LLM: Claude (Anthropic)
- Testing: pytest, pytest-cov
- CI/CD: GitHub Actions
- Containerization: Docker, docker-compose
- Code Quality: Black, Flake8, MyPy, isort, pre-commit
Questions? Feel free to reach out or open an issue!