alscotty/gen_ai

gen_ai

Gen AI Review & Practice

Tooling

  • /streamlit - example dashboards, quick ML classifier for flowers with sklearn
  • /tokenization - intro to tokenization, breaking paragraphs into tokens with nltk
    • /stemming : stemming words with nltk's PorterStemmer and RegexpStemmer
    • /lemmatization : superior to stemming: always returns a valid word (no partial words) and preserves more of the original word's meaning
  • /neural_networks - comparative studies of neural network design with TensorFlow/Keras
    • /basic_nn : architecture comparison study - compares 5 different network architectures (shallow, deep, wide, narrow) to understand complexity vs performance trade-offs
    • /regression_nn : loss function comparison study - compares MSE, MAE, and Huber loss functions to understand robustness to outliers and error handling
    • Uses synthetic data for controlled experiments, focuses on systematic comparisons and decision-making insights rather than single-model tutorials
  • /clustering - comparative study of unsupervised clustering algorithms with scikit-learn
    • /clustering_comparison : algorithm comparison study - compares K-means, DBSCAN, and Agglomerative clustering on diverse synthetic datasets (spherical, non-spherical, concentric, varying density)
    • Evaluates performance using Adjusted Rand Index and Silhouette Score to understand when each algorithm excels
    • Demonstrates how cluster shape and data characteristics determine algorithm effectiveness, providing decision-making guidance for real-world applications
  • /dimensionality_reduction - advanced comparative study of dimensionality reduction techniques with unique analyses
    • /dimensionality_reduction_comparison : comprehensive comparison study - compares PCA, t-SNE, and UMAP on diverse synthetic datasets (high-dimensional, non-linear, manifolds)
    • Analyses that go beyond typical online tutorials:
      • Downstream task performance: Tests how well each reduced space works for KNN classification, measuring performance retention - critical for production use cases where reduced dimensions are used for downstream ML tasks
      • Noise sensitivity degradation: Systematically tests how each technique degrades with increasing noise levels (0.0 to 0.5), showing robustness characteristics and failure modes
      • Cluster separability preservation: Quantitative metric measuring inter-cluster to intra-cluster distance ratios in reduced space - shows how well clusters remain separated after reduction
      • Parameter sensitivity exploration: Creates systematic parameter sweeps for t-SNE (perplexity) and UMAP (n_neighbors) to understand tuning requirements and sensitivity
      • Information-theoretic metrics: Entropy-based analysis of data distribution uniformity in reduced space
    • Evaluates performance using comprehensive metrics: trustworthiness (local structure), distance correlation (global structure), cluster separability, downstream performance retention, explained variance ratio (PCA), entropy score, and runtime
    • Tests on diverse datasets: high-dimensional spherical clusters, non-linear moons, concentric circles, Swiss roll manifolds, and linear structures with noise
    • Demonstrates how data structure (linear vs non-linear, high-dimensional vs low-dimensional) determines technique effectiveness
    • Generates multiple visualizations: main comparison plots, metrics charts, noise sensitivity curves, and parameter sensitivity plots
    • Provides actionable insights for real-world applications: when to use PCA (linear, fast, interpretable), t-SNE (visualization, local structure), or UMAP (non-linear with global structure balance)
    • Focuses on systematic comparisons and decision-making guidance rather than single-technique tutorials
  • /active_learning - label-efficiency–focused active learning study
    • /active_learning_simulation : compares multiple query strategies (random, least-confidence, margin, and diversity-aware uncertainty) on a synthetic dataset with informative, redundant, and spurious features
    • Emphasizes label efficiency rather than just final accuracy, computing area under the learning curve (AULC) and the number of labels needed to reach a target fraction of fully supervised performance
    • Uses a simple k-center–style step to make uncertainty sampling diversity-aware, preventing the model from repeatedly querying near-duplicate points
    • Focuses on reasoning about when active learning actually saves labels vs when random sampling is already competitive
  • /model_evaluation - decision-focused study of model quality under drift and asymmetric costs
    • /model_evaluation : evaluation workflow that trains on a base era and evaluates on multiple drifted eras (prevalence shift, feature + noise shift)
    • Compares Logistic Regression vs a Calibrated Random Forest using a rich metric set (accuracy, precision, recall, F1, ROC AUC, PR AUC, Brier score)
    • Connects metrics to deployment decisions via lightweight decision curve analysis with explicit misclassification cost ratios (false negatives more expensive than false positives)
    • Makes drift visible by plotting how discrimination and calibration metrics move across eras, highlighting when a model is robust vs brittle to distribution shift
    • Emphasizes a production mindset: choosing both model and threshold based on business costs and expected future data, not just a single held-out test split
  • /optimization_dynamics - training-dynamics–focused study of optimizers and learning efficiency
    • /optimizer_dynamics : compares SGD+momentum, RMSprop, Adam, and AdamW-style optimization on a shared network and synthetic dataset
    • Introduces learning-efficiency metrics such as area under the validation accuracy curve (AULC) and time-to-competence (epochs to reach a target validation accuracy)
    • Visualizes generalization gap trajectories (train–validation accuracy difference) and a time-to-competence heatmap across optimizers and learning rates
    • Emphasizes reasoning about speed vs stability vs final performance when choosing optimization settings, rather than picking a single “best” optimizer
  • /selective_prediction - abstention-focused study of selective prediction under distribution shift
    • /selective_prediction : compares multiple confidence policies (max-probability, margin, and negative-entropy) using risk-coverage frontiers
    • Chooses thresholds by deployment utility (accuracy-coverage trade-off with abstention cost) rather than one-shot accuracy
    • Tests threshold transfer robustness across drifted eras (clean, feature-shift, prevalence+noise shift)
    • Introduces Coverage Stability Index, Threshold Transfer Regret, and Coverage Shock Index to quantify fixed-threshold reliability and failure modes under shift
  • /shortcut_robustness - tabular study of shortcut learning and spurious correlations when a cheap cue breaks at deployment time
    • /shortcut_robustness : synthetic binary classification with an appended shortcut feature that is strongly aligned with the label on train / ID test but ruptured OOD (e.g. P(s = y) drops to chance)
    • Contrasts ERM vs inverse-frequency–weighted logistic regression (over (y, shortcut) cells) vs an oracle trained without the shortcut; reports shortcut L2 share, shortcut-to-core tilt, and comparison to an LDA reference on the same scaled features
    • Surfaces worst (y, shortcut) subgroup accuracy on OOD, a spurious rupture curve (accuracy vs test-time shortcut fidelity), mitigation lift toward oracle OOD performance, and a regularization path showing how the regularization parameter C moves weight off the shortcut
    • Focuses on interpretable linear attribution and operational failure modes (high ID accuracy masking OOD collapse), not CNN vision benchmarks
  • /leakage_audit - tabular audit of train/test data leakage and whether detectors recover the metric gap
    • /leakage_audit : injects known leakage archetypes (label proxy, label–feature cross term, duplicate/echo channel) into synthetic tabular data with a temporal holdout
    • Compares four detectors (corr, mutual_info, per-feature adv_auc, perm_shock) and reports Validation Inflation Index (VII), Leakage Recovery Efficiency (LRE), leaky-column rank, and detector concordance (Kendall tau)
    • Highlights that high VII exposes inflated CV, while silent leakage can show near-zero VII yet still produce detector disagreement, which motivates audit workflows beyond a single adversarial AUC check
    • Saves inflation and recovery heatmap figures; focuses on detection + remediation quality, not qualitative leakage checklists alone
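
As a rough illustration of the label-efficiency metrics described under /active_learning, the sketch below computes a normalized area under the learning curve (AULC) and the number of labels needed to reach a target fraction of fully supervised performance. The function names, the trapezoidal normalization, the 0.95 target fraction, and the learning-curve numbers are all assumptions made for this example; they are not code or results taken from the repository.

```python
import numpy as np


def aulc(n_labels, scores):
    """Normalized area under a learning curve (trapezoidal rule).

    Dividing by the label-budget range keeps the result on the same
    scale as the score itself (e.g. accuracy in [0, 1]).
    """
    x = np.asarray(n_labels, dtype=float)
    y = np.asarray(scores, dtype=float)
    area = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    return float(area / (x[-1] - x[0]))


def labels_to_target(n_labels, scores, full_score, fraction=0.95):
    """Smallest label budget whose score reaches fraction * full_score.

    Returns None if the curve never reaches the target.
    """
    target = fraction * full_score
    for n, s in zip(n_labels, scores):
        if s >= target:
            return n
    return None


# Hypothetical learning curves, made up purely for illustration.
budgets = [20, 40, 60, 80, 100]           # labels queried so far
random_acc = [0.60, 0.70, 0.76, 0.80, 0.82]
margin_acc = [0.60, 0.76, 0.82, 0.84, 0.85]
full_supervised_acc = 0.86                # assumed score with all labels

aulc_random = aulc(budgets, random_acc)
aulc_margin = aulc(budgets, margin_acc)
need_random = labels_to_target(budgets, random_acc, full_supervised_acc)
need_margin = labels_to_target(budgets, margin_acc, full_supervised_acc)

print("AULC   random:", aulc_random, " margin:", aulc_margin)
print("Labels random:", need_random, " margin:", need_margin)
```

With these made-up curves both strategies end within a few points of each other, yet margin sampling reaches the target at 60 labels while random sampling needs 100. That gap, not final accuracy, is what a label-efficiency comparison is meant to surface.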
