A professional showcase project demonstrating ML/MLOps skills with a unified CI/CD pipeline for multiple model types.
```bash
# Start monitoring stack
make monitoring-up

# Access services
open http://localhost:8000/docs   # API documentation
open http://localhost:9090        # Prometheus
open http://localhost:3000        # Grafana (admin/admin)

# Generate test traffic
python scripts/generate_traffic.py
```

```
ML API (FastAPI)
    ↓ /metrics
Prometheus (scraping)
    ↓ queries
Grafana (dashboards)
```
See Monitoring Guide for details.
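Under the hood, Prometheus simply scrapes a plain-text exposition format from the API's `/metrics` endpoint. A stdlib-only sketch of what such a scrape returns — the metric name here is illustrative, not the service's actual metric:

```python
# Render counters in the Prometheus text exposition format.
# The metric name below is made up for illustration; the real API
# exposes its own metrics via a client library.
def render_metrics(counters: dict[str, float]) -> str:
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Example: what one scrape of /metrics might look like
print(render_metrics({"ml_api_requests_total": 42}))
```

Prometheus polls this endpoint on a schedule, stores the samples, and Grafana queries them back for the dashboards.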
This project demonstrates:
- ✅ Unified ML Framework: Abstract base class for consistent model handling
- ✅ Two Model Types:
- CNN Classifier (MNIST, ~50K parameters)
- RAG System (ChromaDB + Claude)
- ✅ Complete CI/CD Pipeline: Automated testing, validation, and deployment
- ✅ Code Quality: Black, Flake8, MyPy, pytest with >80% coverage
- ✅ Containerization: Multi-stage Docker builds
- ✅ Performance Monitoring: Automated benchmarks and thresholds
Perfect for: Showcasing MLOps skills in job applications for ML Engineer / MLOps roles.
```
┌─────────────────────────────────────────────────┐
│              BaseMLModel (Abstract)             │
│   - train()    - predict()    - evaluate()      │
│   - save()     - load()       - get_metrics()   │
└──────────────┬──────────────────────┬───────────┘
               │                      │
       ┌───────▼────────┐     ┌───────▼──────────┐
       │  CNNClassifier │     │    RAGSystem     │
       │                │     │                  │
       │  • TinyConvNet │     │  • SentenceT5    │
       │  • MNIST       │     │  • ChromaDB      │
       │  • PyTorch     │     │  • Claude API    │
       └────────────────┘     └──────────────────┘
```
Unified Interface → Both models implement the same abstract base class, allowing the CI/CD pipeline to handle them identically.
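The base class might look like the following; a minimal sketch with method names taken from the diagram above (the actual signatures in `src/models/base_model.py` may differ):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseMLModel(ABC):
    """Unified interface that every model type must implement."""

    @abstractmethod
    def train(self) -> Dict[str, float]:
        """Train the model and return training metrics."""

    @abstractmethod
    def predict(self, inputs: Any) -> Any:
        """Run inference on a single input or a batch."""

    @abstractmethod
    def evaluate(self) -> Dict[str, float]:
        """Return evaluation metrics on a held-out set."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist model weights/state to disk."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore model weights/state from disk."""

    def get_metrics(self) -> Dict[str, float]:
        """Last-computed metrics; concrete models may override."""
        return {}
```

Because the pipeline only ever calls these methods, it never needs to know whether it is driving the CNN or the RAG system.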
- Python 3.10+
- Anthropic API key (for RAG system)
# Clone repository
git clone https://github.com/ChengYuChuan/ml-cicd-showcase.git
cd ml-cicd-showcase
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# For development
pip install -r requirements-dev.txt
# Set up API key
echo "ANTHROPIC_API_KEY=your_key_here" > .env# Run all tests
pytest tests/ -v
# Run specific model tests
pytest tests/test_cnn.py -v
pytest tests/test_rag.py -v
# Run with coverage
pytest tests/ --cov=src --cov-report=htmlfrom src.models.cnn_classifier import CNNClassifier
from src.config import CNNConfig
# Configure and train
config = CNNConfig(num_epochs=3, batch_size=64)
model = CNNClassifier(config)
# Train on MNIST
metrics = model.train()
print(f"Accuracy: {metrics['accuracy']:.4f}")
# Save model
model.save_model("models/cnn_mnist.pth")
# Make predictions
import torch
sample = torch.randn(1, 1, 28, 28)
prediction = model.predict(sample)from src.models.rag_system import RAGSystem
from src.config import RAGConfig
# Initialize RAG
config = RAGConfig()
rag = RAGSystem(config)
# Ingest documents
documents = [
"Python is a programming language.",
"Machine learning uses data to improve.",
]
rag.ingest_documents(documents)
# Query the system
result = rag.predict("What is Python?")
print(f"Answer: {result['answer']}")
print(f"Context: {result['context']}")tests/
├── conftest.py # Shared fixtures
├── test_cnn.py # CNN-specific tests
├── test_rag.py # RAG-specific tests
└── test_integration.py # Cross-model tests
# All tests
pytest tests/ -v
# Specific test file
pytest tests/test_cnn.py -v
# With coverage
pytest tests/ --cov=src --cov-report=term-missing
# Skip slow tests
pytest tests/ -m "not slow"
# Run only fast tests in CI
pytest tests/ -m "not slow" --maxfail=1-
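The `slow` marker used by `-m "not slow"` has to be registered so pytest does not warn about it. A sketch of what that registration might look like in `pyproject.toml` (the marker description is an assumption — check the repo's actual config):

```toml
[tool.pytest.ini_options]
markers = [
    "slow: tests that train models or call external APIs",
]
```

Individual tests then opt in with `@pytest.mark.slow`, and CI deselects them to stay within its time budget.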
- Code Quality (< 2 min)
  - Black formatting
  - Flake8 linting
  - MyPy type checking
  - isort import sorting
- CNN Tests (< 5 min)
  - Unit tests
  - Training validation
  - Performance thresholds (>85% accuracy)
- RAG Tests (< 3 min)
  - Unit tests
  - Retrieval quality validation
  - API integration tests
- Integration Tests (< 5 min)
  - Cross-model compatibility
  - End-to-end workflows
  - Coverage report generation
- Docker Build (< 3 min)
  - Multi-stage build
  - Image testing
  - Artifact upload
- Benchmarks (< 5 min)
  - Performance metrics
  - Latency measurements
  - Model size validation
| Model | Metric | Threshold | Current |
|---|---|---|---|
| CNN | Accuracy | >85% | ~95% |
| CNN | Latency | <100ms | ~15ms |
| CNN | Size | <1MB | ~0.2MB |
| RAG | Precision | >30% | ~60% |
| RAG | Latency | <5s | ~2s |
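A benchmark gate over the table above can be a small comparison helper; a sketch (the threshold names and the shape of the measured-metrics dict are assumptions, not the repo's actual benchmark code):

```python
# Fail the build if any measured metric crosses its threshold.
# "min" entries are floors (accuracy, precision); "max" entries are
# ceilings (latency, model size). Names are illustrative.
THRESHOLDS = {
    "cnn_accuracy":   {"min": 0.85},
    "cnn_latency_ms": {"max": 100},
    "cnn_size_mb":    {"max": 1.0},
    "rag_precision":  {"min": 0.30},
    "rag_latency_s":  {"max": 5.0},
}


def check_thresholds(measured: dict) -> list[str]:
    """Return a list of human-readable failures; empty means all gates pass."""
    failures = []
    for name, bounds in THRESHOLDS.items():
        value = measured.get(name)
        if value is None:
            failures.append(f"{name}: no measurement")
            continue
        if "min" in bounds and value < bounds["min"]:
            failures.append(f"{name}: {value} < min {bounds['min']}")
        if "max" in bounds and value > bounds["max"]:
            failures.append(f"{name}: {value} > max {bounds['max']}")
    return failures
```

The CI job can then exit non-zero whenever the returned list is non-empty, which is what turns the table into an enforced gate rather than documentation.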
```bash
# Build all stages
docker-compose build

# Development environment
docker-compose run ml-dev

# Run tests in Docker
docker-compose run ml-test

# Production deployment
docker-compose up ml-prod
```

```bash
# Build image
docker build -t ml-cicd-showcase .

# Run tests
docker run --rm ml-cicd-showcase pytest tests/ -v

# Interactive shell
docker run -it --rm ml-cicd-showcase /bin/bash
```

- Architecture: TinyConvNet (~50K parameters)
- Dataset: MNIST (60K train, 10K test)
- Training Time: ~3 min (3 epochs, CPU)
- Test Accuracy: ~95%
- Inference: ~15ms per image
- Model Size: 0.2MB
- Embedding: all-MiniLM-L6-v2 (80MB)
- Vector DB: ChromaDB (local)
- LLM: Claude Sonnet 4
- Retrieval: ~30ms per query
- Generation: ~2s per answer
- Precision: ~60% on test queries
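The ~60% retrieval precision figure can be computed as precision@k over labeled test queries; a sketch (the labeling scheme and `k` are assumptions — the repo's evaluation may define precision differently):

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 3) -> float:
    """Fraction of the top-k retrieved documents labeled relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)


# Example: 2 of the top 3 retrieved chunks were labeled relevant
score = precision_at_k(["d1", "d7", "d3"], relevant={"d1", "d3", "d9"})
print(round(score, 2))  # 0.67
```

Averaging this over a set of labeled queries gives a single number that the benchmark stage can compare against the >30% threshold.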
```
ml-cicd-showcase/
├── .github/
│   └── workflows/
│       └── ci.yml               # CI/CD pipeline
├── src/
│   ├── models/
│   │   ├── base_model.py        # Abstract base class
│   │   ├── cnn_classifier.py    # CNN implementation
│   │   └── rag_system.py        # RAG implementation
│   ├── utils/
│   │   └── metrics.py           # Utility functions
│   └── config.py                # Configuration management
├── tests/
│   ├── conftest.py              # pytest fixtures
│   ├── test_cnn.py              # CNN tests
│   ├── test_rag.py              # RAG tests
│   └── test_integration.py      # Integration tests
├── data/
│   ├── sample_images/           # Sample data
│   └── knowledge_base/          # RAG documents
├── models/                      # Saved models
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml
├── pyproject.toml               # Modern Python config
├── requirements.txt             # Core dependencies
├── requirements-dev.txt         # Dev dependencies
└── README.md
```
Why: Different model types (CNN vs RAG) can use the same CI/CD pipeline
How: Abstract BaseMLModel class with standardized methods
Benefit: Easy to add new models without changing infrastructure
Why: GitHub Actions has limited compute and time
How: TinyConvNet (<50K params) instead of ResNet18 (11M params)
Benefit: Tests complete in <5 minutes instead of >30 minutes

Why: Different environments need different settings
How: Dataclass-based configs (CNNConfig, RAGConfig)
Benefit: Type-safe, easy to test, environment-specific

Why: Catch bugs before deployment
How: Unit, integration, and performance tests with >80% coverage
Benefit: Confidence in code quality and model performance
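The dataclass-based configs might look like this; a sketch using the fields that appear in the quick-start examples (other field names and all defaults here are illustrative, not the repo's actual values):

```python
from dataclasses import dataclass


@dataclass
class CNNConfig:
    """Training settings for the CNN classifier."""
    num_epochs: int = 3
    batch_size: int = 64
    learning_rate: float = 1e-3  # assumed field, for illustration


@dataclass
class RAGConfig:
    """RAG system settings; field names here are assumptions."""
    embedding_model: str = "all-MiniLM-L6-v2"
    top_k: int = 3
```

Type annotations let MyPy catch mis-typed settings, and each environment simply constructs its own instance — e.g. `CNNConfig(num_epochs=1)` for a fast CI run, larger values locally.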
```bash
pre-commit install
pre-commit run --all-files
```

```bash
# Format code
black src/ tests/
isort src/ tests/

# Check formatting
black --check src/ tests/
flake8 src/ tests/
mypy src/
```

- Create a new model class inheriting from `BaseMLModel`
- Implement the required methods: `train()`, `predict()`, `evaluate()`
- Add a configuration dataclass in `config.py`
- Create a test file in `tests/`
- The pipeline automatically handles it! 🎉
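Following that checklist, a new model can be very small. A self-contained sketch, with a stub base class standing in for the real one in `src/models/base_model.py` and a deliberately trivial (hypothetical) model:

```python
from abc import ABC, abstractmethod


class BaseMLModel(ABC):  # stub standing in for src.models.base_model.BaseMLModel
    @abstractmethod
    def train(self): ...
    @abstractmethod
    def predict(self, inputs): ...
    @abstractmethod
    def evaluate(self): ...


class MajorityClassModel(BaseMLModel):
    """Hypothetical new model: always predicts the most common training label."""

    def __init__(self):
        self.majority = None

    def train(self, labels=(0, 1, 1)):
        self.majority = max(set(labels), key=list(labels).count)
        return {"accuracy": list(labels).count(self.majority) / len(labels)}

    def predict(self, inputs):
        return [self.majority for _ in inputs]

    def evaluate(self):
        # Toy evaluation: a real model would score a held-out set here.
        return {"accuracy": 1.0 if self.majority is not None else 0.0}
```

Because the class satisfies the shared interface, the existing pipeline stages (tests, benchmarks, Docker build) need no changes to pick it up.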
- Add DVC for data/model versioning
- Implement model registry (MLflow)
- Add more model types (transformer, GNN)
- Set up automated deployment
- Add performance monitoring dashboard
- Implement A/B testing framework
This is a showcase project for job applications. If you'd like to suggest improvements:
- Fork the repository
- Create a feature branch
- Make your changes
- Ensure tests pass
- Submit a pull request
MIT License - see LICENSE file for details.
ChengYuChuan
- GitHub: @ChengYuChuan
- LinkedIn: ChengYuChuan
- Email: ChengYuChuan82@gmail.com
This project demonstrates:
- ✅ MLOps Best Practices: CI/CD, testing, containerization, monitoring
- ✅ Software Engineering: Clean code, design patterns, type hints, documentation
- ✅ ML Knowledge: Model selection, training, evaluation, optimization
- ✅ Production Ready: Error handling, logging, performance monitoring
- ✅ Modern Tools: GitHub Actions, Docker, pytest, type hints, pre-commit hooks
- Time Investment: ~1 week (as planned) ✓
- Lines of Code: ~2000+ (production quality)
- Test Coverage: >80%
- Documentation: Comprehensive
Tech Stack:
- Languages: Python 3.10
- ML Frameworks: PyTorch, Sentence Transformers
- Vector DB: ChromaDB
- LLM: Claude (Anthropic)
- Testing: pytest, pytest-cov
- CI/CD: GitHub Actions
- Containerization: Docker, docker-compose
- Code Quality: Black, Flake8, MyPy, isort, pre-commit
Questions? Feel free to reach out or open an issue!