I'm a self-taught Data Scientist based in Navi Mumbai, focused on building ML systems where predictions are reliable, explainable, and directly useful for decisions, not just accurate on a leaderboard.
My work centres on a consistent theme: probabilistic modeling, uncertainty quantification, and calibrated predictions that translate into real business logic.
- Probabilistic modeling: calibrated probabilities over hard classifications
- Uncertainty quantification: prediction intervals, confidence estimation
- Explainability: SHAP-based model transparency for regulated domains
- Time-series forecasting: demand forecasting with feature engineering
- Simulation: Monte Carlo methods for season-level uncertainty
End-to-end credit risk pipeline predicting loan default probability on the Home Credit dataset.
- Platt Scaling calibration, reducing expected calibration error (ECE) from 0.041 to 0.004
- Risk bucketing (Low / Medium / High / Very High) aligned with lending policy
- SHAP explainability for individual applicant decisions and regulatory transparency
Python · Scikit-learn · XGBoost · SHAP
Hourly electricity demand forecasting on real EIA grid data (Texas, 2018–2023).
- Time-series feature engineering: lag features, rolling stats, cyclical encoding
- XGBoost model achieving 2.40% MAPE, a 48% improvement over a seasonal naive baseline
- Weather integration via Open-Meteo API
Python · XGBoost · Scikit-learn · Pandas
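A small standard-library sketch of the feature and evaluation ideas listed above: lag features, trailing rolling statistics, sine/cosine cyclical encoding, a seasonal naive baseline, and MAPE. The lag set `(1, 24, 168)`, the 24-hour window, the weekly season length, and the assumption that index 0 falls at hour 0 of a week are illustrative choices, not the project's documented configuration. `relative_improvement` shows one common reading of "48% improvement" (fractional MAPE reduction), which is also an assumption.

```python
import math
from statistics import mean, stdev


def cyclical_encode(value, period):
    """Project a periodic value onto the unit circle so e.g. hour 23 sits next to hour 0."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def make_features(series, t, lags=(1, 24, 168), window=24):
    """Build features for series[t] from values strictly before t (avoids target leakage).

    Assumes an hourly series whose index 0 falls at hour 0 of a week.
    """
    if t < max(max(lags), window):
        raise ValueError("not enough history for the requested lags/window")
    feats = {f"lag_{k}": series[t - k] for k in lags}
    past = series[t - window:t]  # trailing window ending at t - 1
    feats["roll_mean"] = mean(past)
    feats["roll_std"] = stdev(past)
    feats["hour_sin"], feats["hour_cos"] = cyclical_encode(t % 24, 24)
    feats["dow_sin"], feats["dow_cos"] = cyclical_encode((t // 24) % 7, 7)
    return feats


def seasonal_naive(series, start, season=168):
    """Baseline: forecast each point from `start` onward with the value one season earlier."""
    return [series[t - season] for t in range(start, len(series))]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (undefined if any actual value is 0)."""
    return 100.0 * mean(abs((a - f) / a) for a, f in zip(actual, forecast))


def relative_improvement(model_error, baseline_error):
    """Fractional error reduction vs. a baseline, e.g. 0.48 for a 48% improvement."""
    return 1.0 - model_error / baseline_error
```

Computing rolling statistics only from values before `t` matters: a window that includes the target hour leaks the answer into the features and inflates offline MAPE.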
Probabilistic match outcome modeling with explicit focus on draw modeling.
- Calibrated Home / Draw / Away probabilities using Platt Scaling
- Expected Points (xPts) league table from match-level probabilities
- 10,000 Monte Carlo season simulations for title, top-4, and relegation probabilities
Python · XGBoost · Monte Carlo Simulation
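A minimal sketch of how match-level Home / Draw / Away probabilities can feed both an xPts table and a Monte Carlo season simulation. It assumes 3 points for a win and 1 for a draw, independent matches, and random tie-breaking on equal points (real leagues use goal difference and other rules). The fixture tuple layout and the function names are hypothetical, not taken from the project.

```python
import random
from collections import Counter


def expected_points(p_home, p_draw, p_away):
    """xPts for (home, away) under 3 points for a win, 1 for a draw."""
    return 3.0 * p_home + p_draw, 3.0 * p_away + p_draw


def xpts_table(fixtures):
    """Sum expected points per team over all fixtures, sorted high to low."""
    table = Counter()
    for home, away, ph, pd, pa in fixtures:
        h, a = expected_points(ph, pd, pa)
        table[home] += h
        table[away] += a
    return sorted(table.items(), key=lambda kv: kv[1], reverse=True)


def simulate_season(fixtures, n_sims=10_000, top_n=4, relegated=3, seed=0):
    """Monte Carlo league finish probabilities.

    fixtures: list of (home, away, p_home, p_draw, p_away) with probabilities summing to 1.
    Matches are sampled independently; ties on points are broken at random.
    """
    rng = random.Random(seed)
    teams = sorted({team for f in fixtures for team in f[:2]})
    title, top, releg = Counter(), Counter(), Counter()
    for _ in range(n_sims):
        pts = dict.fromkeys(teams, 0)
        for home, away, ph, pd, _pa in fixtures:
            r = rng.random()  # one uniform draw decides H / D / A
            if r < ph:
                pts[home] += 3
            elif r < ph + pd:
                pts[home] += 1
                pts[away] += 1
            else:
                pts[away] += 3
        order = teams[:]
        rng.shuffle(order)  # random tie-break, preserved by the stable sort below
        order.sort(key=lambda team: pts[team], reverse=True)
        title[order[0]] += 1
        top.update(order[:top_n])
        releg.update(order[-relegated:])
    return {
        t: {"title": title[t] / n_sims, "top": top[t] / n_sims, "relegation": releg[t] / n_sims}
        for t in teams
    }
```

The xPts table is the expectation of the points distribution, while the simulation keeps the spread, which is what title, top-4, and relegation probabilities need. Calibrated draw probabilities matter in both, since each draw moves points for two teams at once.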