# Heart Failure Classification

Heart failure is a leading cause of mortality worldwide, with an estimated 64.3 million people affected globally. Early prediction of adverse outcomes using routinely collected clinical data can substantially improve patient management and resource allocation. This project applies both supervised and unsupervised machine learning techniques to the Heart Failure Clinical Records dataset (Chicco & Jurman, 2020) to predict patient mortality. We implement and compare two deep learning classifiers — a Convolutional Neural Network (CNN) and a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) — alongside two unsupervised approaches — K-Means clustering and a custom Self-Organizing Map (SOM). A lightweight Random Forest model is also trained and exported for real-time inference through an interactive Streamlit web application.
## Table of Contents

- Background
- Dataset
- Methodology
- Results
- Interactive Demo
- Repository Structure
- Installation
- Usage
- References
## Background

Cardiovascular diseases account for approximately 31% of all global deaths (WHO, 2021). Heart failure specifically occurs when the heart cannot pump sufficient blood to meet the body's needs. Timely prediction of patient outcomes following a heart failure episode is critical for clinical decision-making.
Machine learning has shown significant promise in this domain. Chicco & Jurman (2020) demonstrated that a simple set of 12 clinical features, routinely measured during follow-up visits, could predict patient survival with high accuracy. This project builds upon that foundation by:
- Conducting rigorous exploratory data analysis with statistical feature selection
- Implementing and comparing multiple deep learning architectures
- Applying unsupervised clustering methods to uncover latent patient subgroups
- Deploying the best-performing model in an accessible web interface
## Dataset

**Source:** UCI Machine Learning Repository

**Citation:** Chicco, D., & Jurman, G. (2020). Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. *BMC Medical Informatics and Decision Making*, 20(1), 1–16. https://doi.org/10.1186/s12911-020-1023-5
| Property | Value |
|---|---|
| Instances | 299 |
| Features | 12 clinical predictors + 1 target |
| Missing Values | None |
| Class Balance | 67.9% survived / 32.1% deceased |
| Follow-up Period | 4–285 days |
| Feature | Type | Unit | Description |
|---|---|---|---|
| `age` | Numeric | years | Patient age |
| `anaemia` | Binary | — | Decrease of red blood cells (haemoglobin) |
| `creatinine_phosphokinase` | Numeric | mcg/L | Level of CPK enzyme in blood |
| `diabetes` | Binary | — | Presence of diabetes |
| `ejection_fraction` | Numeric | % | Percentage of blood leaving the heart per contraction |
| `high_blood_pressure` | Binary | — | Presence of hypertension |
| `platelets` | Numeric | kiloplatelets/mL | Platelet count in blood |
| `serum_creatinine` | Numeric | mg/dL | Creatinine level in blood serum |
| `serum_sodium` | Numeric | mEq/L | Sodium level in blood serum |
| `sex` | Binary | — | Biological sex (0 = female, 1 = male) |
| `smoking` | Binary | — | Smoking status |
| `time` | Numeric | days | Follow-up period duration |
| `DEATH_EVENT` | Binary | — | Target: 1 = deceased, 0 = survived |
## Methodology

### Preprocessing

- Unit Normalization: Platelet counts converted to kiloplatelets/mL for consistency
- Feature Renaming: `creatinine_phosphokinase` → `CPK` for readability
- Scaling: StandardScaler (zero mean, unit variance) for neural network inputs; MinMaxScaler for SOM
- Class Imbalance: SMOTE (Synthetic Minority Over-sampling Technique) applied to the training set only
### Statistical Feature Selection

Chi-square independence tests assessed the statistical association between binary features and the target variable (`DEATH_EVENT`). Features with p > 0.05 (sex, high blood pressure, diabetes) were identified as less discriminative, though all 12 features were retained for the full models to avoid information loss.
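A minimal sketch of one such test, assuming `scipy.stats.chi2_contingency` on a 2×2 contingency table built from a binary feature and the outcome (synthetic data here, not the real dataset):

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
death = rng.integers(0, 2, size=299)    # stand-in for DEATH_EVENT
feature = rng.integers(0, 2, size=299)  # an uninformative binary feature

# 2x2 contingency table: feature value (rows) vs. outcome (columns)
table = np.array([[np.sum((feature == i) & (death == j)) for j in (0, 1)]
                  for i in (0, 1)])
chi2, p, dof, expected = chi2_contingency(table)
discriminative = p <= 0.05  # p > 0.05 => feature flagged as less discriminative
```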
### Convolutional Neural Network (CNN)

The input features are reshaped into a 1D sequence (12 × 1) and processed through three Conv1D layers with increasing filter counts (64, 128, 256), interleaved with MaxPooling1D and Dropout (rate = 0.25) layers. A final dense layer with softmax activation produces class probabilities.
| Hyperparameter | Value |
|---|---|
| Activation (Conv) | Sigmoid |
| Optimizer | Adam |
| Loss | Binary cross-entropy |
| Epochs | 1000 (EarlyStopping, patience=30) |
| Batch Size | 32 |
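A minimal Keras sketch of this architecture. Kernel sizes, padding, and the flatten/dense head are assumptions beyond the hyperparameters stated above:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(12, 1)),  # 12 features reshaped into a 1D sequence
    layers.Conv1D(64, 3, padding="same", activation="sigmoid"),
    layers.MaxPooling1D(2),
    layers.Dropout(0.25),
    layers.Conv1D(128, 3, padding="same", activation="sigmoid"),
    layers.MaxPooling1D(2),
    layers.Dropout(0.25),
    layers.Conv1D(256, 3, padding="same", activation="sigmoid"),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# Training would use EarlyStopping(patience=30), up to 1000 epochs, batch_size=32
probs = model.predict(np.random.rand(4, 12, 1), verbose=0)
```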
### LSTM Recurrent Neural Network

The input is reshaped to a 3D tensor for recurrent processing. A single LSTM layer (64 units) captures temporal dependencies across the feature sequence, followed by dropout and dense layers.
| Hyperparameter | Value |
|---|---|
| LSTM Units | 64 |
| Activation (LSTM) | Tanh |
| Optimizer | Adam |
| Validation Split | 20% |
| Epochs | 1000 (EarlyStopping, patience=30) |
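A comparable Keras sketch of the LSTM model; the dropout rate and dense head are assumptions beyond the stated hyperparameters:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(12, 1)),         # features treated as a length-12 sequence
    layers.LSTM(64, activation="tanh"),  # single recurrent layer
    layers.Dropout(0.25),
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
early = keras.callbacks.EarlyStopping(patience=30, restore_best_weights=True)
# model.fit(..., validation_split=0.2, epochs=1000, callbacks=[early])
probs = model.predict(np.random.rand(4, 12, 1), verbose=0)
```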
### K-Means Clustering

The data are partitioned into k = 2 clusters (corresponding to the two survival outcomes) using Euclidean distance. Cluster quality is evaluated via the Silhouette Score and the Adjusted Rand Index.
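The clustering and both evaluation metrics can be sketched with scikit-learn; synthetic two-group data stands in for the real features here:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

rng = np.random.default_rng(2)
# Two well-separated synthetic "patient subgroups" in 12-dimensional space
X = np.vstack([rng.normal(0, 1, (150, 12)), rng.normal(6, 1, (149, 12))])
y_true = np.array([0] * 150 + [1] * 149)

km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
sil = silhouette_score(X, km.labels_)          # cohesion vs. separation of clusters
ari = adjusted_rand_score(y_true, km.labels_)  # agreement with the true grouping
```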
### Self-Organizing Map (SOM)

A custom implementation of a 25×25 competitive learning network. Weights are updated iteratively using decaying learning rates and neighborhood radii. After training, each neuron is labeled by majority vote from mapped training samples, enabling classification of unseen patients.
| Hyperparameter | Value |
|---|---|
| Grid Size | 25 × 25 |
| Max Learning Rate | 0.4 |
| Max Neighborhood Distance | 4 |
| Training Steps | 150,001 |
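A toy version of this training loop, shrunk to a 5×5 grid and 2,000 steps for brevity (instead of 25×25 and 150,001). The Gaussian neighborhood function is an assumption about the custom implementation, and the majority-vote labeling step is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((100, 12))          # MinMax-scaled features lie in [0, 1]
grid, dim, steps = 5, 12, 2000
W = rng.random((grid, grid, dim))  # one weight vector per neuron
rows, cols = np.indices((grid, grid))

def quantization_error(W):
    # Mean distance from each sample to its best-matching unit (BMU)
    return float(np.mean([np.linalg.norm(W - x, axis=2).min() for x in X]))

qe_before = quantization_error(W)
for t in range(steps):
    x = X[rng.integers(len(X))]
    lr = 0.4 * (1 - t / steps)              # learning rate decays from 0.4
    radius = max(1.0, 4 * (1 - t / steps))  # neighborhood radius decays from 4
    d = np.linalg.norm(W - x, axis=2)       # distance of every neuron to x
    bi, bj = np.unravel_index(d.argmin(), d.shape)  # BMU coordinates
    # Gaussian neighborhood on the grid pulls the BMU and its neighbors toward x
    g = np.exp(-((rows - bi) ** 2 + (cols - bj) ** 2) / (2 * radius ** 2))
    W += lr * g[:, :, None] * (x - W)
qe_after = quantization_error(W)
```

After training, the map's quantization error (mean sample-to-BMU distance) should drop below its random-initialization value.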
### Random Forest (Deployment Model)

A 300-tree Random Forest classifier trained on the full feature set with SMOTE-balanced data. Serialized via joblib for use in the Streamlit application.
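A sketch of the train-and-serialize step, using scikit-learn's `RandomForestClassifier` and `joblib` on synthetic data (the file name mirrors the project's, but is written to a temporary directory here):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(299, 12))
y = (X[:, 0] + rng.normal(0, 0.5, 299) > 0).astype(int)  # learnable signal

rf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X, y)

# Serialize with joblib, as consumed by the Streamlit app
path = os.path.join(tempfile.gettempdir(), "rf_heart_failure.pkl")
joblib.dump(rf, path)
clone = joblib.load(path)
probs = clone.predict_proba(X[:5])  # class probabilities for 5 patients
```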
## Results

| Model | Train Accuracy | Test Accuracy | Notes |
|---|---|---|---|
| CNN | 88.76% | 73.3% | Overfitting observed |
| LSTM-RNN | 90.00% | 72.2% | Overfitting observed |
| Random Forest | ~98% | ~83% | Best supervised model |
| SOM | — | 96.32% | Unsupervised; competitive labeling |
### K-Means Cluster Quality

| Metric | Value |
|---|---|
| Silhouette Score | 0.801 |
| Adjusted Rand Index | Computed |
| Within-Cluster Sum of Squares | Computed |
### Key Findings

- SOM achieved the highest classification accuracy (96.32%), demonstrating that unsupervised topological learning can outperform supervised deep learning on this dataset.
- Both CNN and RNN overfit significantly — training accuracy exceeded test accuracy by ~15–18 percentage points — suggesting the 299-sample dataset is insufficient for complex deep learning architectures without stronger regularization or augmentation.
- K-Means produced well-separated clusters (Silhouette = 0.801), indicating that the 12 clinical features encode genuinely distinct patient subpopulations.
- Feature selection identified `time` (follow-up duration) and `serum_creatinine` as the most predictive features, consistent with Chicco & Jurman (2020).
## Interactive Demo

A Streamlit web application is included for real-time prediction:
- Input patient clinical values manually or generate a random synthetic patient
- View survival probability with a confidence gauge
- Inspect feature importance rankings
- Compare against population statistics
## Repository Structure

```
Heart_Failure_Classification/
├── Heart_Failure_Classification.ipynb   # Main analysis notebook
├── app.py                               # Streamlit prediction UI
├── train_model.py                       # Script to train & save Random Forest model
├── heart_failure_clinical_records_dataset.csv
├── requirements.txt
├── README.md
└── models/
    └── rf_heart_failure.pkl             # Saved Random Forest model (generated by train_model.py)
```
## Installation

```bash
# Clone the repository
git clone https://github.com/Abhi183/Heart_Failure_Classification.git
cd Heart_Failure_Classification

# Create virtual environment
python -m venv venv
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt
```

## Usage

Run the main analysis notebook:

```bash
jupyter notebook Heart_Failure_Classification.ipynb
```

Train and export the Random Forest model (this generates `models/rf_heart_failure.pkl`):

```bash
python train_model.py
```

Launch the prediction app, then navigate to http://localhost:8501 in your browser:

```bash
streamlit run app.py
```
## References

- Chicco, D., & Jurman, G. (2020). Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. *BMC Medical Informatics and Decision Making*, 20(1), 1–16. https://doi.org/10.1186/s12911-020-1023-5
- Dua, D., & Graff, C. (2019). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
- Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. *Journal of Artificial Intelligence Research*, 16, 321–357.
- Kohonen, T. (1990). The Self-Organizing Map. *Proceedings of the IEEE*, 78(9), 1464–1480.
- LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. *Nature*, 521(7553), 436–444.
## License

This project is licensed under the MIT License. The original dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).
DSDA 385 — Abhishek Shekhar
