
Add PR #672 baseline (Cosine TTT30, 1.0781 BPB)#1

Closed
dhruvjatkar wants to merge 1 commit into main from
worktree-agent-a36af41a

Conversation

@dhruvjatkar
Owner

Summary

Test plan

  • python3 -m py_compile passes on train_gpt.py
  • README.md has correct metadata
  • No pycache or build artifacts committed
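The checks in the test plan above can be scripted. A minimal sketch (the `check_tree` helper and its defaults are illustrative; only `train_gpt.py`, the `py_compile` check, and the no-`__pycache__` rule come from the plan):

```python
import pathlib
import py_compile
import tempfile

def check_tree(repo_root: str = ".") -> list[str]:
    """Return a list of problems found; an empty list means the test plan passes."""
    root = pathlib.Path(repo_root)
    # Stray build artifacts: any committed __pycache__ directory is a failure.
    problems = [f"stray artifact: {p}" for p in root.rglob("__pycache__")]
    # Byte-compile train_gpt.py (equivalent to `python3 -m py_compile train_gpt.py`),
    # writing the .pyc to a throwaway dir so the check itself leaves no artifacts.
    try:
        with tempfile.TemporaryDirectory() as tmp:
            py_compile.compile(
                str(root / "train_gpt.py"),
                cfile=str(pathlib.Path(tmp) / "train_gpt.pyc"),
                doraise=True,
            )
    except (py_compile.PyCompileError, FileNotFoundError) as e:
        problems.append(f"py_compile failed: {e}")
    return problems
```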

🤖 Generated with Claude Code

…line reference

Fetched train_gpt.py verbatim from upstream openai/parameter-golf PR openai#672
which achieves 1.0781 BPB (3-seed mean, std=0.0041) using TTT_EPOCHS=30
with cosine TTT schedule. This replaces 1.1194 as the baseline to beat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
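For context, a cosine TTT schedule over TTT_EPOCHS=30 typically means decaying the test-time-training learning rate along a half cosine. A minimal sketch, assuming a standard cosine decay (the base and minimum learning rates are placeholders, not values from PR #672):

```python
import math

TTT_EPOCHS = 30  # from the commit message; all other values are illustrative

def cosine_ttt_lr(epoch: int, base_lr: float = 1e-3, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at epoch 0 to min_lr at the final TTT epoch."""
    progress = epoch / max(TTT_EPOCHS - 1, 1)  # 0.0 -> 1.0 across the schedule
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```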
dhruvjatkar pushed a commit that referenced this pull request Mar 25, 2026
PR openai#672 maxes TTT at 30 epochs (590s/600s eval budget), so all future
improvements must be orthogonal to TTT. This update:
- Sets 1.0781 BPB (PR openai#672) as the new target to beat
- Reorders Top 8 directions: XSA-all confirmed at #1, Full GPTQ #2,
  SwiGLU #3, Muon-VS #4, aggressive quant #5, MASA #6,
  depth recurrence #7 with int6 risk warning, AdEMAMix #8
- Deprioritizes TTT-related directions already exploited by PR openai#672
- Collapses ~1000 lines of stale Round 0-3.9 session logs into a
  concise historical summary
- Removes resolved blockers (flash_attn, SSH hangs, local runtime)
- Adds fresh Round 1 section with 5 submitted experiments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@dhruvjatkar
Owner Author

Merged directly to main via cherry-pick

