yinruide/README.md

Hi, I'm Ruide 👋

NYU Courant · CDS '27 — B.A. in Data Science and Mathematics, Minor in Media, Culture, and Communication

My work spans prediction, interpretability, and safety, from putting survival models into production to building agentic guardrails for LLM outputs. I am currently focused on vision-language models and behavioral cloning for autonomous driving.

Incoming Data Scientist Intern @ ByteDance (Summer 2026).
Previously Business Intelligence Intern @ Shopee — built survival analysis models & ETL pipelines on Hive data (Summer 2025).

Skills & Tools

Languages: Python, SQL, Java
Frameworks: PyTorch, scikit-learn, Hugging Face, LangGraph
Models: BERT, LLaMA, CNN, Transformer, DeepFM, CoxPH, DeepHit, XGBoost, LightGBM
Techniques: Behavior Cloning, LoRA Fine-tuning, SHAP Explainability, Survival Analysis, Causal Inference, A/B Testing
Tools: Git, Hive, Streamlit, Power BI, Tableau

Featured Projects

DetoxiGuard: Agentic LLM guardrail that classifies toxic content and iteratively rewrites flagged outputs.
  Fine-tuned BERT & LLaMA (LoRA); recall-optimized ensemble with per-label threshold tuning. Planning to submit to EMNLP 2026 SRW.
Insurance Cost Predictor: Streamlit app predicting medical insurance costs with a two-stage routing pipeline.
  MLP, Mixture Density Network from scratch, quantile regression.
Credit Card Fraud Detection: End-to-end fraud detection on 284K imbalanced transactions (~10% AUPRC improvement).
  Compared LR, RF, XGBoost, and LightGBM across resampling strategies; benchmarked Isolation Forest & Autoencoder.
Click-Through Rate Prediction: CTR prediction on the 4M-row Avazu dataset with an 8.7% LogLoss reduction.
  LightGBM, XGBoost, DeepFM (PyTorch), weighted ensemble, SHAP analysis & calibration (ECE = 0.003).
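
For readers unfamiliar with the calibration metric quoted for the CTR project, here is a minimal sketch of how expected calibration error (ECE) is commonly computed for binary predictions. It is not taken from the project's code: the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are all assumptions made for illustration, and the project may use a different binning scheme.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Illustrative ECE for binary predictions (hypothetical helper).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : true labels, each 0 or 1
    n_bins : number of equal-width probability bins (assumed default)
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")

    # Per bin: [sample count, sum of predicted probabilities, sum of labels]
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += p
        bins[idx][2] += y

    total = len(probs)
    ece = 0.0
    for count, sum_p, sum_y in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        mean_pred = sum_p / count   # average predicted click probability
        observed = sum_y / count    # observed click rate in the bin
        ece += (count / total) * abs(mean_pred - observed)
    return ece
```

A lower value means predicted probabilities track observed click rates more closely; a model whose predictions match the observed rate in every bin scores 0.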

Connect

📫 ry2406@nyu.edu · ruideyin147@gmail.com

🔗 linkedin.com/in/ruideyin

Popular repositories

  1. DetoxiGuard

    An Agentic LLM System for Toxic Comment Detection and Correction

    Jupyter Notebook 3

  2. Insurance-Cost-Predictor

    A Streamlit web app that predicts annual medical insurance costs using ensemble methods, neural networks, and a from-scratch Mixture Density Network.

    Jupyter Notebook 2

  3. Click-Through-Rate-Prediction

    Predicting ad click-through rates using machine learning on the Avazu dataset

    Jupyter Notebook

  4. Credit-Card-Fraud-Detection

    Credit card fraud detection using supervised ML, unsupervised anomaly detection, and SHAP explainability on highly imbalanced data.

    Jupyter Notebook

  5. yinruide

    My GitHub profile

  6. gaze-bc-warmup

    Systematic comparison of gaze integration strategies. Built as groundwork toward computational models of visual attention.

    Python