A machine-learning web application that predicts the tumor stage of breast cancer from clinical and genetic features. The model is built with Scikit-Learn, served through Flask, and deployed on Render, and the app provides an interactive web interface for making predictions.
Live demo: https://breast-cancer-prediction-system-zc36.onrender.com/
Repository: https://github.com/Kirisaki00/Breast_Cancer_Prediction_System
This project demonstrates a complete end-to-end machine learning workflow:
- Data preprocessing and feature selection
- Training a machine learning classification model
- Exporting the trained model
- Building a Flask API for predictions
- Creating a web interface for user input
- Deploying the application on the cloud
Users enter clinical and biological features, and the system predicts the breast cancer tumor stage along with a probability score for each stage.
- Predicts Breast Cancer Tumor Stage
- Uses clinical and gene expression data
- Displays prediction confidence
- Shows probability breakdown for each tumor stage
- Interactive web UI for easy input
- Fully deployed ML web application
- Python
- Scikit-Learn
- Pandas
- NumPy
- Flask
- Gunicorn
- HTML
- CSS
- JavaScript
- Render
The model used in this project is:
Random Forest Classifier
Random Forest tends to work well on tabular medical datasets. It combines the votes of many decision trees, which makes its predictions less sensitive to the quirks of any single tree.
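As a rough illustration of the train-and-export step, the sketch below fits a scaler and a Random Forest on synthetic data and round-trips both through pickle, mirroring the `model.pkl` and `scaler.pkl` files in the repository. The random data, the use of `StandardScaler`, and the hyperparameters are all assumptions made for the example; the actual notebook trains on METABRIC and may be configured differently.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 12 features, stage labels 1..3.
# These random values are placeholders, not METABRIC records.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = rng.integers(1, 4, size=200)

# Fit the scaler, then the forest (hyperparameters are assumptions).
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(scaler.transform(X), y)

# Round-trip through pickle, as would happen when writing and later
# loading model.pkl and scaler.pkl.
restored_model = pickle.loads(pickle.dumps(model))
restored_scaler = pickle.loads(pickle.dumps(scaler))

# predict_proba returns one probability per class, ordered as in classes_.
proba = restored_model.predict_proba(restored_scaler.transform(X[:5]))
```

Each row of `proba` sums to 1, and its columns follow the order of `restored_model.classes_`.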
The model uses the following features:
- Nottingham Prognostic Index
- Tumor Size
- Lymph Nodes Examined Positive
- Chemotherapy
- Hormone Therapy
- Neoplasm Histologic Grade
- Radio Therapy
- Age at Diagnosis
- ER Status
- HER2 Status
- Menopausal State
- AURKA Gene Expression
These features are converted into a structured input vector and passed into the trained model for prediction.
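One way the conversion into an input vector could look is sketched below. The field names come from the example `/predict` request in this README; the column order and the numeric codes for the categorical fields are assumptions, and `build_feature_vector` is a hypothetical helper rather than a function from `app.py`. Whatever encoding is used at prediction time must match the one used during training.

```python
# Assumed column order, following the feature list in this README.
FEATURE_ORDER = [
    "nottingham_prognostic_index",
    "tumor_size",
    "lymph_nodes_examined_positive",
    "chemotherapy",
    "hormone_therapy",
    "neoplasm_histologic_grade",
    "radio_therapy",
    "age_at_diagnosis",
    "er_status",
    "her2_status",
    "inferred_menopausal_state",
    "aurka",
]

# Hypothetical numeric codes for the categorical fields.
CATEGORY_CODES = {
    "er_status": {"Negative": 0, "Positive": 1},
    "her2_status": {"Negative": 0, "Positive": 1},
    "inferred_menopausal_state": {"Pre": 0, "Post": 1},
}


def build_feature_vector(payload):
    """Turn a request payload (dict) into an ordered list of floats."""
    vector = []
    for name in FEATURE_ORDER:
        if name not in payload:
            raise ValueError(f"missing feature: {name}")
        value = payload[name]
        if name in CATEGORY_CODES:
            codes = CATEGORY_CODES[name]
            if value not in codes:
                raise ValueError(f"unexpected value for {name}: {value!r}")
            value = codes[value]
        vector.append(float(value))
    return vector


example = {
    "nottingham_prognostic_index": 3.2, "tumor_size": 25,
    "lymph_nodes_examined_positive": 2, "chemotherapy": 1,
    "hormone_therapy": 1, "neoplasm_histologic_grade": 2,
    "radio_therapy": 1, "age_at_diagnosis": 55,
    "er_status": "Positive", "her2_status": "Negative",
    "inferred_menopausal_state": "Post", "aurka": 3.4,
}
vector = build_feature_vector(example)
```

Rejecting missing or unknown values up front keeps a malformed request from silently producing a prediction.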
The web interface allows users to:
- Enter patient clinical information
- Select receptor status
- Provide treatment history
- Run prediction
- View tumor stage and probability breakdown
Example prediction output:
- Predicted Stage: Stage II
- Model Confidence: 62.6%
- Probability Breakdown:
  - Stage I
  - Stage II
  - Stage III
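The predicted stage, confidence, and breakdown can all be derived from a single vector of class probabilities. The sketch below shows one possible way to do that; `summarize_prediction` and its output keys are hypothetical names, and the probabilities are illustrative values, not output from the deployed model.

```python
def summarize_prediction(labels, probabilities):
    """Pick the most likely label and report all probabilities in percent."""
    if len(labels) != len(probabilities):
        raise ValueError("labels and probabilities must have equal length")
    # Index of the class with the highest probability.
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return {
        "predicted_stage": labels[best],
        "confidence": round(probabilities[best] * 100, 1),
        "breakdown": {
            label: round(p * 100, 1)
            for label, p in zip(labels, probabilities)
        },
    }


# Illustrative probabilities only.
summary = summarize_prediction(
    ["Stage I", "Stage II", "Stage III"],
    [0.25, 0.625, 0.125],
)
```

With a Scikit-Learn classifier, the probability vector would typically come from one row of `predict_proba`, with labels taken from `classes_`.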
Breast_Cancer_Prediction_System
│
├── app.py
├── model.pkl
├── scaler.pkl
├── requirements.txt
│
├── templates
│   └── index.html
│
├── Dataset
│   └── METABRIC_RNA_Mutation.csv
│
└── BreastCancerPrediction.ipynb
Clone the repository
git clone https://github.com/Kirisaki00/Breast_Cancer_Prediction_System.git
cd Breast_Cancer_Prediction_System
Create virtual environment
python -m venv venv
source venv/bin/activate
Install dependencies
pip install -r requirements.txt
Start the Flask server
python app.py
Open in browser
http://localhost:5000
GET /healthcheck
Response example
{
"status": "ok",
"model": "RandomForestClassifier"
}
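A health-check route like this one takes only a few lines in Flask. The sketch below is a minimal guess at how it might be defined; the function name is an assumption, and the real `app.py` may build the response differently, for example by reading the model name from the loaded model object.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/healthcheck", methods=["GET"])
def healthcheck():
    # Static response matching the example above; the model name is
    # hard-coded here for illustration.
    return jsonify(status="ok", model="RandomForestClassifier")
```

Flask's built-in test client can call the route without starting a server: `app.test_client().get("/healthcheck")`.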
POST /predict
Example request
{
"nottingham_prognostic_index":3.2,
"tumor_size":25,
"lymph_nodes_examined_positive":2,
"chemotherapy":1,
"hormone_therapy":1,
"neoplasm_histologic_grade":2,
"radio_therapy":1,
"age_at_diagnosis":55,
"er_status":"Positive",
"her2_status":"Negative",
"inferred_menopausal_state":"Post",
"aurka":3.4
}
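To call the endpoint from Python, the request can be built with the standard library alone. This sketch assumes the server accepts a JSON body with a `Content-Type: application/json` header and that it is running locally on port 5000; `build_predict_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_predict_request(base_url, payload):
    """Build (but do not send) a JSON POST request for /predict."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "nottingham_prognostic_index": 3.2, "tumor_size": 25,
    "lymph_nodes_examined_positive": 2, "chemotherapy": 1,
    "hormone_therapy": 1, "neoplasm_histologic_grade": 2,
    "radio_therapy": 1, "age_at_diagnosis": 55,
    "er_status": "Positive", "her2_status": "Negative",
    "inferred_menopausal_state": "Post", "aurka": 3.4,
}
request = build_predict_request("http://localhost:5000", payload)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the payload easy to inspect before it reaches the server.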
This application is deployed using Render.
Deployment steps:
- Push the project to GitHub
- Connect the repository to Render
- Configure build command
pip install -r requirements.txt
- Configure start command
gunicorn app:app
Render automatically redeploys the application whenever new changes are pushed to GitHub.
The model is trained on the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) dataset, which contains clinical and genomic information from breast cancer patients.
- Add model explainability (SHAP values)
- Improve UI with modern frameworks
- Add multiple ML models for comparison
- Add Docker containerization
- Implement CI/CD pipeline
- Add prediction visualizations
Kirisaki