📌 Short Description
CardioCare develops a predictive machine learning model to identify individuals at risk of heart disease using clinical and behavioral health indicators. It demonstrates a complete end-to-end workflow—from exploratory data analysis and feature engineering to model training, evaluation, and API deployment with FastAPI.
This project highlights data science, machine learning, and deployment skills through a healthcare-focused classification problem. Beyond notebooks, the trained model is wrapped in a FastAPI application with Pydantic validation and deployed as a REST API, making it usable in real-world scenarios.
👉 You can go through the project files for detailed instructions on usage and testing.
- Loaded structured health data with pandas.
- Performed descriptive analysis (`.info()`, `.describe()`) to understand data distribution.
- Summarized missing values and identified patterns.
- Removed records with missing values.
- Standardized features for training.
- Visualized relationships between risk factors and heart disease (gender, age groups, BMI, glucose, smoking, diabetes).
- Tools: matplotlib, seaborn.
- Applied correlation analysis to identify top predictors.
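
The loading, cleaning, standardization, and correlation steps above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the small DataFrame stands in for `dataset.csv`, and the column names (`age`, `BMI`, `glucose`, `heart_disease`) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny synthetic stand-in for dataset.csv; values and column names are hypothetical.
df = pd.DataFrame({
    "age": [39, 46, 48, 61, 46, 43, 63, 45],
    "BMI": [26.9, 28.7, np.nan, 28.6, 23.1, 30.3, 33.1, 21.7],
    "glucose": [77, 76, 70, 103, 85, np.nan, 85, 78],
    "heart_disease": [0, 0, 0, 1, 0, 0, 1, 0],
})

# Descriptive analysis and a per-column count of missing values.
df.info()
print(df.describe())
print(df.isnull().sum())

# Remove records with missing values.
clean = df.dropna().reset_index(drop=True)

# Rank features by absolute correlation with the target.
features = clean.drop(columns="heart_disease")
correlations = (
    features.corrwith(clean["heart_disease"]).abs().sort_values(ascending=False)
)
print(correlations)

# Standardize features to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(features)
```

In a full pipeline the scaler would normally be fitted on the training split only and then reused for the test split and for API inputs.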
- Applied SMOTE to balance the dataset classes, so the models are less biased toward the majority class.
- Trained and compared two models:
  - 🌲 Random Forest Classifier
  - 📍 K-Nearest Neighbors (KNN)
- Evaluated models using:
  - Accuracy Score
  - Confusion Matrix
  - Precision, Recall, F1-Score
- Saved trained model using pickle.
- Built a FastAPI app with:
  - `/predict/` endpoint for predictions
  - Pydantic validation for clean input handling
- Tested the API with Python requests.
- Data Wrangling & Analysis: pandas, numpy
- Statistical Visualization: matplotlib, seaborn
- Exploratory Data Analysis (EDA) & Feature Engineering
- Class Imbalance Handling: SMOTE
- Machine Learning Classification: Random Forest, KNN (scikit-learn)
- Model Evaluation: Accuracy, Precision, Recall, F1-Score
- Deployment & API Development: FastAPI, Pydantic, Pickle, Docker
- API Testing: requests
- Python 3.11
- Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, imbalanced-learn (`imblearn`), FastAPI, Pydantic, requests
- `EDA_and_HeartDisease_prediction.ipynb` → Jupyter Notebook containing EDA, feature engineering, model training, and evaluation
- `model_api.py` → FastAPI application for serving the trained ML model as a REST API
- `testing.py` → Client script to test API endpoints with sample input data
- `Dockerfile` → Docker configuration file to containerize the FastAPI application
- `model.pkl` → Serialized trained machine learning model
- `dataset.csv` → Raw dataset used for training and analysis
- Other supporting files → Pickled trained model, dataset.
🔗 Note: Please check the repository files for detailed instructions on how to run the project and test the API.