Gen AI Review & Practice
- /streamlit - example dashboards, quick ML classifier for flowers with sklearn
- /tokenization - intro to tokenization, breaking paragraphs down with nltk
- /stemming : stemming words with nltk PorterStemmer and RegexpStemmer
- /lemmatization : generally preferable to stemming; always returns a valid word rather than a partial one, and preserves more of the original word's meaning
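
The stemming versus lemmatization contrast above can be sketched without any NLP library. This is a toy illustration only, not NLTK's stemmers or lemmatizer: `toy_stem`, `toy_lemmatize`, the suffix pattern, and the three-word lemma table are all hypothetical.

```python
import re

# Toy suffix-stripping stemmer in the spirit of a regex stemmer.
# The suffix list is an assumption chosen for illustration.
SUFFIXES = re.compile(r"(ing|ies|ed|s)$")

def toy_stem(word: str, min_len: int = 4) -> str:
    """Strip one matching suffix from words of at least min_len characters."""
    if len(word) < min_len:
        return word
    return SUFFIXES.sub("", word)

# Hypothetical mini lemma table; a real lemmatizer consults a full lexicon.
LEMMAS = {"studies": "study", "running": "run", "better": "good"}

def toy_lemmatize(word: str) -> str:
    """Look the word up in the lemma table, falling back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["studies", "running", "better"]:
    # The stem may be a partial word; the lemma is a dictionary form.
    print(w, "->", toy_stem(w), "|", toy_lemmatize(w))
```

The stemmer only chops characters, so it can leave fragments, while the lookup returns a dictionary form at the cost of needing a lexicon.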
- /neural_networks - comparative studies of neural network design with TensorFlow/Keras
- /basic_nn : architecture comparison study - compares 5 different network architectures (shallow, deep, wide, narrow) to understand complexity vs performance trade-offs
- /regression_nn : loss function comparison study - compares MSE, MAE, and Huber loss functions to understand robustness to outliers and error handling
- Uses synthetic data for controlled experiments, focuses on systematic comparisons and decision-making insights rather than single-model tutorials
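
To make the outlier-robustness comparison concrete, here is a minimal pure-Python sketch of the three losses applied to a residual vector containing one outlier. The function names, the `delta=1.0` default, and the residual values are assumptions for illustration; the study itself is built with TensorFlow/Keras.

```python
def mse(errors):
    """Mean squared error: penalizes large residuals quadratically."""
    return sum(e * e for e in errors) / len(errors)

def mae(errors):
    """Mean absolute error: penalizes all residuals linearly."""
    return sum(abs(e) for e in errors) / len(errors)

def huber(errors, delta=1.0):
    """Huber loss: quadratic for |e| <= delta, linear beyond it."""
    total = 0.0
    for e in errors:
        a = abs(e)
        if a <= delta:
            total += 0.5 * a * a
        else:
            total += delta * (a - 0.5 * delta)
    return total / len(errors)

# Illustrative residuals: two small errors and one outlier.
residuals = [0.5, -0.5, 10.0]
print("MSE:", mse(residuals))
print("MAE:", mae(residuals))
print("Huber:", huber(residuals))
```

On these residuals the single outlier dominates MSE, while MAE and Huber grow only linearly with it.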
- /clustering - comparative study of unsupervised clustering algorithms with scikit-learn
- /clustering_comparison : algorithm comparison study - compares K-means, DBSCAN, and Agglomerative clustering on diverse synthetic datasets (spherical, non-spherical, concentric, varying density)
- Evaluates performance using Adjusted Rand Index and Silhouette Score to understand when each algorithm excels
- Demonstrates how cluster shape and data characteristics determine algorithm effectiveness, providing decision-making guidance for real-world applications
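
As a reference for how the Adjusted Rand Index behaves, the sketch below computes it from pair counts using only the standard library. It is an illustrative re-implementation under the usual pair-counting definition; the study itself relies on scikit-learn, which provides `sklearn.metrics.adjusted_rand_score`. The function name here is an assumption.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI: 1.0 for identical partitions, about 0 for random ones."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # degenerate case: no pairs to compare
    # Pairs that fall in the same cell of the contingency table.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # both partitions are trivial (e.g. a single cluster each)
    return (sum_cells - expected) / (max_index - expected)

# Same partition with permuted cluster ids versus a partition that cuts across it.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because ARI is invariant to relabeling of cluster ids, it suits comparisons across K-means, DBSCAN, and Agglomerative outputs, whose label numbering is arbitrary.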
- /dimensionality_reduction - advanced comparative study of dimensionality reduction techniques with unique analyses
- /dimensionality_reduction_comparison : comprehensive comparison study - compares PCA, t-SNE, and UMAP on diverse synthetic datasets (high-dimensional, non-linear, manifolds)
- Advanced analyses that go beyond what standard online tutorials typically cover:
- Downstream task performance: Tests how well each reduced space works for KNN classification, measuring performance retention - critical for production use cases where reduced dimensions are used for downstream ML tasks
- Noise sensitivity degradation: Systematically tests how each technique degrades with increasing noise levels (0.0 to 0.5), showing robustness characteristics and failure modes
- Cluster separability preservation: Quantitative metric measuring inter-cluster to intra-cluster distance ratios in reduced space - shows how well clusters remain separated after reduction
- Parameter sensitivity exploration: Creates systematic parameter sweeps for t-SNE (perplexity) and UMAP (n_neighbors) to understand tuning requirements and sensitivity
- Information-theoretic metrics: Entropy-based analysis of data distribution uniformity in reduced space
- Evaluates performance using comprehensive metrics: trustworthiness (local structure), distance correlation (global structure), cluster separability, downstream performance retention, explained variance ratio (PCA), entropy score, and runtime
- Tests on diverse datasets: high-dimensional spherical clusters, non-linear moons, concentric circles, Swiss roll manifolds, and linear structures with noise
- Demonstrates how data structure (linear vs non-linear, high-dimensional vs low-dimensional) determines technique effectiveness
- Generates multiple visualizations: main comparison plots, metrics charts, noise sensitivity curves, and parameter sensitivity plots
- Provides actionable insights for real-world applications: when to use PCA (linear, fast, interpretable), t-SNE (visualization, local structure), or UMAP (non-linear with global structure balance)
- Focuses on systematic comparisons and decision-making guidance rather than single-technique tutorials
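
The cluster separability metric listed above is described as a ratio of inter-cluster to intra-cluster distances. One plausible way to compute such a ratio is sketched below; the exact definition used in the study may differ, and `separability_ratio`, `centroid`, and the sample points are assumptions for illustration.

```python
from itertools import combinations
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def separability_ratio(points, labels):
    """Mean centroid-to-centroid distance divided by mean point-to-own-centroid distance.

    Assumes at least two clusters and a non-zero intra-cluster spread.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {lab: centroid(pts) for lab, pts in clusters.items()}
    intra = sum(dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)
    pairs = list(combinations(centroids.values(), 2))
    inter = sum(dist(a, b) for a, b in pairs) / len(pairs)
    return inter / intra

# Two tight, well-separated clusters in a hypothetical 2-D reduced space.
embedded = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
print(separability_ratio(embedded, labels))
```

Higher values indicate that clusters stay compact and far apart after reduction; comparing the ratio before and after reduction gives a preservation measure.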
- /active_learning - label-efficiency–focused active learning study
- /active_learning_simulation : compares multiple query strategies (random, least-confidence, margin, and diversity-aware uncertainty) on a synthetic dataset with informative, redundant, and spurious features
- Emphasizes label efficiency rather than just final accuracy, computing area under the learning curve (AULC) and the number of labels needed to reach a target fraction of fully supervised performance
- Uses a simple k-center–style step to make uncertainty sampling diversity-aware, preventing the model from repeatedly querying near-duplicate points
- Focuses on reasoning about when active learning actually saves labels vs when random sampling is already competitive
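
The two label-efficiency measures mentioned above can be sketched as follows. This is a minimal illustration assuming trapezoidal integration normalized by the label-budget range; the function names, the `fraction=0.9` default, and the curve values are hypothetical and not taken from the study.

```python
def aulc(label_counts, accuracies):
    """Area under the learning curve, normalized to the accuracy scale."""
    area = 0.0
    for i in range(1, len(label_counts)):
        width = label_counts[i] - label_counts[i - 1]
        area += width * (accuracies[i] + accuracies[i - 1]) / 2
    return area / (label_counts[-1] - label_counts[0])

def labels_to_target(label_counts, accuracies, full_accuracy, fraction=0.9):
    """Smallest label budget reaching `fraction` of fully supervised accuracy."""
    target = fraction * full_accuracy
    for n, acc in zip(label_counts, accuracies):
        if acc >= target:
            return n
    return None  # target never reached within the budget

# Made-up learning curve for one query strategy.
budgets = [10, 20, 30, 40]
curve = [0.5, 0.7, 0.8, 0.8]
print("AULC:", aulc(budgets, curve))
print("Labels to 90% of full:", labels_to_target(budgets, curve, full_accuracy=0.85))
```

Two strategies with the same final accuracy can differ sharply on both numbers, which is what makes them useful for judging whether active learning beats random sampling.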
- /model_evaluation - decision-focused study of model quality under drift and asymmetric costs
- /model_evaluation : evaluation workflow that trains on a base era and evaluates on multiple drifted eras (prevalence shift, feature + noise shift)
- Compares Logistic Regression vs a Calibrated Random Forest using a rich metric set (accuracy, precision, recall, F1, ROC AUC, PR AUC, Brier score)
- Connects metrics to deployment decisions via lightweight decision curve analysis with explicit misclassification cost ratios (false negatives more expensive than false positives)
- Makes drift visible by plotting how discrimination and calibration metrics move across eras, highlighting when a model is robust vs brittle to distribution shift
- Emphasizes a production mindset: choosing both model and threshold based on business costs and expected future data, not just a single held-out test split
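
Choosing a threshold from explicit misclassification costs can be sketched in a few lines. The cost values, candidate thresholds, and toy predictions below are assumptions for illustration, not the study's configuration.

```python
def total_cost(y_true, probs, threshold, cost_fn=5.0, cost_fp=1.0):
    """Sum of misclassification costs when predicting positive for p >= threshold."""
    cost = 0.0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 0 and y == 1:
            cost += cost_fn  # missed positive
        elif pred == 1 and y == 0:
            cost += cost_fp  # false alarm
    return cost

def best_threshold(y_true, probs, thresholds, cost_fn=5.0, cost_fp=1.0):
    """Candidate threshold with the lowest total cost (first one wins ties)."""
    return min(thresholds, key=lambda t: total_cost(y_true, probs, t, cost_fn, cost_fp))

# Toy labels and predicted probabilities.
y_true = [0, 0, 1, 1, 1, 0]
probs = [0.1, 0.4, 0.35, 0.8, 0.6, 0.7]
candidates = [0.3, 0.5, 0.7]
chosen = best_threshold(y_true, probs, candidates)
print("chosen threshold:", chosen, "cost:", total_cost(y_true, probs, chosen))
```

With false negatives five times as costly as false positives, the lowest candidate threshold wins here even though it produces more false alarms.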
- /optimization_dynamics - training-dynamics–focused study of optimizers and learning efficiency
- /optimizer_dynamics : compares SGD+momentum, RMSprop, Adam, and AdamW-style optimization on a shared network and synthetic dataset
- Introduces learning-efficiency metrics such as area under the validation accuracy curve (AULC) and time-to-competence (epochs to reach a target validation accuracy)
- Visualizes generalization gap trajectories (train–validation accuracy difference) and a time-to-competence heatmap across optimizers and learning rates
- Emphasizes reasoning about speed vs stability vs final performance when choosing optimization settings, rather than picking a single “best” optimizer
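
Time-to-competence and the generalization gap are both simple functions of per-epoch accuracy histories. The sketch below assumes 1-indexed epochs and uses invented histories for two hypothetical runs; none of the names or numbers come from the study.

```python
def time_to_competence(val_acc, target):
    """First epoch (1-indexed) whose validation accuracy reaches the target."""
    for epoch, acc in enumerate(val_acc, start=1):
        if acc >= target:
            return epoch
    return None  # the run never became competent

def generalization_gap(train_acc, val_acc):
    """Per-epoch train minus validation accuracy."""
    return [t - v for t, v in zip(train_acc, val_acc)]

# Invented histories for two hypothetical optimizer runs.
runs = {
    "fast_run": {"train": [0.6, 0.8, 0.95, 0.99], "val": [0.5, 0.7, 0.85, 0.9]},
    "slow_run": {"train": [0.5, 0.6, 0.7, 0.78], "val": [0.5, 0.58, 0.68, 0.75]},
}
for name, hist in runs.items():
    ttc = time_to_competence(hist["val"], target=0.8)
    final_gap = generalization_gap(hist["train"], hist["val"])[-1]
    print(name, "time-to-competence:", ttc, "final gap:", round(final_gap, 3))
```

A run that never reaches the target returns `None`, which a heatmap over optimizers and learning rates can render as a distinct "did not converge" cell.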
- /selective_prediction - abstention-focused study of selective prediction under distribution shift
- /selective_prediction : compares multiple confidence policies (max-probability, margin, and negative-entropy) using risk-coverage frontiers
- Chooses thresholds by deployment utility (accuracy-coverage trade-off with abstention cost) rather than one-shot accuracy
- Tests threshold transfer robustness across drifted eras (clean, feature-shift, prevalence+noise shift)
- Introduces Coverage Stability Index, Threshold Transfer Regret, and Coverage Shock Index to quantify fixed-threshold reliability and failure modes under shift
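
The three confidence policies and the risk-coverage frontier can be sketched with the standard library alone. This is a minimal illustration with invented predictions; the function names are assumptions, and the study's custom indices are not reproduced here.

```python
import math

def max_probability(probs):
    """Confidence = probability of the predicted class."""
    return max(probs)

def margin(probs):
    """Confidence = gap between the two most likely classes."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def negative_entropy(probs):
    """Confidence = minus the Shannon entropy (closer to 0 is more confident)."""
    return sum(p * math.log(p) for p in probs if p > 0)

def risk_coverage(confidences, correct):
    """(coverage, risk) points when accepting examples from most to least confident."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    curve, errors = [], 0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        curve.append((k / len(order), errors / k))
    return curve

# Invented class-probability vectors and correctness flags.
probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.55, 0.45]]
correct = [True, True, False, True]
for coverage, risk in risk_coverage([max_probability(p) for p in probs], correct):
    print(f"coverage={coverage:.2f} risk={risk:.2f}")
```

Swapping `max_probability` for `margin` or `negative_entropy` changes only the ordering of examples, so the same `risk_coverage` function serves all three policies.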
- /shortcut_robustness - tabular study of shortcut learning and spurious correlations when a cheap cue breaks at deployment time
- /shortcut_robustness : synthetic binary classification with an appended shortcut feature that is strongly aligned with the label on train / ID test but ruptured OOD (e.g. $P(s = y)$ drops to chance)
- Contrasts ERM vs inverse-frequency–weighted logistic regression (over `(y, shortcut)` cells) vs an oracle trained without the shortcut; reports shortcut L2 share, shortcut-to-core tilt, and comparison to an LDA reference on the same scaled features
- Surfaces worst `(y, shortcut)` subgroup accuracy on OOD, a spurious rupture curve (accuracy vs test-time shortcut fidelity), mitigation lift toward oracle OOD performance, and a regularization path for how $C$ moves weight off the shortcut
- Focuses on interpretable linear attribution and operational failure modes (high ID accuracy masking OOD collapse), not CNN vision benchmarks
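
Inverse-frequency weighting over label and shortcut cells can be sketched as below. The normalization (weights averaging to 1 across the dataset) is an assumption, as are the function name and toy data; the study may scale its weights differently.

```python
from collections import Counter

def cell_weights(y, shortcut):
    """Per-example weights inversely proportional to the size of each (y, shortcut) cell.

    Normalized so that every observed cell carries equal total weight
    and the weights sum to the number of examples.
    """
    cells = list(zip(y, shortcut))
    counts = Counter(cells)
    n, n_cells = len(cells), len(counts)
    return [n / (n_cells * counts[c]) for c in cells]

# Toy data: the shortcut agrees with the label for three of four examples.
y = [1, 1, 1, 0]
shortcut = [1, 1, 0, 0]
for label, s, w in zip(y, shortcut, cell_weights(y, shortcut)):
    print(f"y={label} shortcut={s} weight={w:.3f}")
```

Weights like these could be passed as `sample_weight` when fitting scikit-learn's `LogisticRegression`, which up-weights the rare cells where the shortcut disagrees with the label.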
- /leakage_audit - tabular audit of train/test data leakage and whether detectors recover the metric gap
- /leakage_audit : injects known leakage archetypes (label proxy, label–feature cross term, duplicate/echo channel) into synthetic tabular data with a temporal holdout
- Compares four detectors (`corr`, `mutual_info`, per-feature `adv_auc`, `perm_shock`) and reports Validation Inflation Index (VII), Leakage Recovery Efficiency (LRE), leaky-column rank, and detector concordance (Kendall tau)
- Highlights that high VII exposes inflated CV, while silent leakage can show near-zero VII yet still produce detector disagreement, motivating audit workflows beyond a single adversarial AUC check
- Saves inflation and recovery heatmap figures; focuses on detection + remediation quality, not qualitative leakage checklists alone
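
The simplest of the four detectors, a correlation screen, can be sketched as follows: rank features by absolute Pearson correlation with the label and check where a known leaky column lands. This is an illustrative stand-in rather than the repo's `corr` detector; the function names, column names, and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def rank_by_abs_corr(features, target):
    """(name, |corr|) pairs sorted from most to least label-correlated."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical columns: one near-copy of the label and one unrelated feature.
target = [0, 1, 0, 1, 1, 0]
features = {
    "label_proxy": [0.1, 0.9, 0.2, 1.0, 0.8, 0.0],
    "noise": [0.5, 0.4, 0.6, 0.3, 0.7, 0.5],
}
for rank, (name, score) in enumerate(rank_by_abs_corr(features, target), start=1):
    print(rank, name, round(score, 3))
```

A linear screen like this catches a direct label proxy but can miss cross-term leakage, which is one reason to compare several detectors rather than trust a single ranking.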