London
- Sokoban Speedrun - The fastest recipe to teach Qwen3 Sokoban wins.
- Target Policy Optimization - Turn GRPO into distribution matching
- RamenGPT - Training GPT with a single GPU
- Agentic Uncertainty - Measuring SWE agent uncertainty
- ReasoningGym - 100+ RL environments for LLM RLVR
- PySpur - A visual playground for agentic workflows
- No Train No Gain - Training BERT and T5 models
- SIN - Causal inference with embedded treatments
- LAWA - LAtest Weight Averaging
- WASAM - Weight-Averaged Sharpness-Aware Minimization
- PAML - Probabilistic Active Meta Learning



