Skip to content

Deezpa/ML-Credit-Scoring-ThinFile-Consumers

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ML-Credit-Scoring-ThinFile-Consumers

🎓 Live Research Repository – PhD Thesis Implementation | Credit Scoring with ML & Fairness Evaluation By Dr. Deepa Shukla, 2025 This repository presents the datasets, models, and fairness-aware algorithms developed as part of my PhD thesis.

📢 Explore the live project site:
👉 ML-Credit-Scoring-ThinFile-Consumers GitHub Page

View Documentation

📘 Machine Learning for Credit Scoring of Thin-File Consumers

Final PhD Thesis Implementation – Chapter 5

DOI
License: MIT
Open In Colab


🎓 Project Overview

This repository contains the complete implementation of Chapter 5 of my PhD thesis, titled:
"Machine Learning Algorithms for Credit Scoring of Thin-File Consumers: A Fairness-Aware Evaluation"

It addresses the challenge of evaluating creditworthiness for thin-file consumers—individuals with limited or no traditional credit history—using machine learning models, alternative data, and fairness-aware methodologies.


ML-Credit-Scoring-ThinFile-Consumers/ │ ├── notebooks/ # Jupyter notebooks: training, SHAP/LIME, fairness ├── results/ # Visualizations, model outputs, fairness plots ├── tables/ # Tables 5.1–5.6 in CSV format ├── thesis_demo_colab.ipynb# Google Colab-compatible demo ├── LICENSE # MIT License ├── README.md # You’re here ├── CITATION.cff # Citation metadata


📊 Dataset

A synthetic dataset was created and published as part of this thesis to support reproducibility and fairness research:

📘 Dataset Title: Synthetic Credit Score of Thin-File Consumers
📅 Published: May 2024
📁 Hosted on: Harvard Dataverse
🔗 Permanent DOI: https://doi.org/10.7910/DVN/6MLVVI 📂 Format: CSV with metadata following FAIR principles
🔍 Contents: Credit scoring features (traditional + alternative), synthetic labels, demographic proxies

📚 Citation

If you use this dataset in your research, please cite as:

@dataset{shukla_2024_synthetic,
  author       = {Deepa Shukla},
  title        = {Synthetic Credit Score of Thin-File Consumers},
  year         = 2024,
  doi          = {10.7910/DVN/6MLVVI},
  url          = {https://doi.org/10.7910/DVN/6MLVVI}
}

---

## 🧠 Models Implemented

| Model                     | Description                              |
|--------------------------|------------------------------------------|
| Logistic Regression       | Linear baseline                          |
| Decision Tree             | Interpretable tree-based learner         |
| Random Forest             | Ensemble method for robust performance   |
| Gradient Boosting         | Fine-grained boosting classifier         |
| Support Vector Machine    | Margin-based nonlinear classifier        |
| Deep Neural Network (DNN) | Multi-layer perceptron-based classifier  |

---

## ⚖️ Fairness-Aware Evaluation

This work integrates fairness into machine learning model evaluation through:

- Bias Mitigation Pipelines  
- Pre/Post Fairness Score Comparison  
- Fairness–Accuracy Tradeoff Visualizations  
- SHAP & LIME Interpretability Tools

---

## 🗂️ Repository Structure


---

## 📈 Results Summary (Thesis Chapter 5)

| Table No. | Description                                                |
|-----------|------------------------------------------------------------|
| 5.1       | AUC-ROC Scores for Traditional Models                      |
| 5.2       | Performance on Alternative Data Features                   |
| 5.3       | Model Comparison (AUC, F1)                                 |
| 5.4       | Fairness Metrics across Models                             |
| 5.5       | Bias Scores (Before vs After Mitigation)                  |
| 5.6       | Fairness-aware Model Performance (AUC, Bias %)            |

All tables are reproducible via linked notebooks and available in `/tables/`.

---

## 📚 How to Cite

Please cite this repository and dataset using the following format:

```bibtex
@dataset{shukla_2024_synthetic,
  author       = {Deepa Shukla},
  title        = {Synthetic Credit Score of Thin-File Consumers},
  year         = 2024,
  doi          = {10.7910/DVN/6MLVVI},
  url          = {https://doi.org/10.7910/DVN/6MLVVI}
}

@misc{shukla_2025_repo,
  author       = {Deepa Shukla},
  title        = {ML-Credit-Scoring-ThinFile-Consumers},
  year         = 2025,
  howpublished = {\url{https://github.com/Deezpa/ML-Credit-Scoring-ThinFile-Consumers}},
  note         = {Version 1.0 – Final Thesis Implementation}
}

About

This repository presents the datasets, models, and fairness-aware algorithms developed as part of my PhD thesis.

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors