NYU Courant · CDS '27 — B.A. in Data Science and Mathematics, Minor in Media, Culture, and Communication
My work spans prediction, interpretability, and safety, from building survival models in production to designing agentic guardrails for LLM outputs. I'm currently focused on vision-language models and behavior cloning for autonomous driving.
Incoming Data Scientist Intern @ ByteDance (Summer 2026).
Previously Business Intelligence Intern @ Shopee (Summer 2025): built survival analysis models and ETL pipelines on Hive data.
Languages: Python, SQL, Java
Frameworks: PyTorch, scikit-learn, Hugging Face, LangGraph
Models: BERT, LLaMA, CNN, Transformer, DeepFM, CoxPH, DeepHit, XGBoost, LightGBM
Techniques: Behavior Cloning, LoRA Fine-tuning, SHAP Explainability, Survival Analysis, Causal Inference, A/B Testing
Tools: Git, Hive, Streamlit, Power BI, Tableau
| Project | Description |
|---|---|
| DetoxiGuard | Agentic LLM guardrail that classifies toxic content and iteratively rewrites flagged outputs. Fine-tuned BERT and LLaMA (LoRA); recall-optimized ensemble with per-label threshold tuning. Planning to submit to the EMNLP 2026 SRW. |
| Insurance Cost Predictor | Streamlit app that predicts medical insurance costs with a two-stage routing pipeline. MLP, a Mixture Density Network built from scratch, and quantile regression. |
| Credit Card Fraud Detection | End-to-end fraud detection on 284K imbalanced transactions (~10% AUPRC improvement). Compared LR, RF, XGBoost, and LightGBM across resampling strategies; benchmarked Isolation Forest and an Autoencoder. |
| Click-Through Rate Prediction | CTR prediction on 4M rows of the Avazu dataset with an 8.7% LogLoss reduction. LightGBM, XGBoost, DeepFM (PyTorch), weighted ensemble, SHAP analysis, and calibration (ECE = 0.003). |