@@ -0,0 +1,50 @@
# MergedTop3_v3 clean H100 rerun

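This folder contains a clean single-seed `track_10min_16mb` submission built on a merged top-stack recipe: the March 20 backbone, plus Partial RoPE and LN scaling from March 21 and GPTQ-lite clip search with warmdown 3500 from March 22. The full stack is: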

- 11 layers
- XSA on the last 4 layers
- EMA
- 3x MLP expansion
- SmearGate
- BigramHash with 2048 buckets
- mixed int6 quantization with zstd compression
- sequence length 2048
- Muon/AdamW weight decay 0.04
- sliding-window eval with stride 64
- Partial RoPE with `ROPE_DIMS=16`
- layerwise LN scaling
- GPTQ-lite clip search
- `WARMDOWN_ITERS=3500`
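
As a rough illustration of the sliding-window evaluation item, the sketch below computes window boundaries for a token stream with `seq_len=2048` and `stride=64`. It assumes the common scheme in which the first window scores all of its positions and every later window advances by `stride` and scores only its newest `stride` positions, so each position is scored exactly once with at least `seq_len - stride` tokens of preceding context. The function name `sliding_windows` is hypothetical and is not taken from `train_gpt.py`.

```python
def sliding_windows(num_tokens, seq_len=2048, stride=64):
    """Return (window_start, window_end, score_start) triples.

    The model sees tokens [window_start, window_end); only positions in
    [score_start, window_end) contribute to the loss, so every position
    is scored exactly once.
    """
    # First window: score every position it covers (there is no earlier context).
    first_end = min(seq_len, num_tokens)
    windows = [(0, first_end, 0)]
    scored_up_to = first_end
    while scored_up_to < num_tokens:
        # Advance by `stride` new positions, keeping the window at most seq_len long.
        end = min(scored_up_to + stride, num_tokens)
        start = max(0, end - seq_len)
        windows.append((start, end, scored_up_to))
        scored_up_to = end
    return windows
```

For example, `sliding_windows(5000)` produces 48 windows, the second being `(64, 2112, 2048)`. Because every window after the first is a full-length forward pass, stride 64 costs roughly `2048 / 64 = 32` times as many forward passes as non-overlapping evaluation, in exchange for near-full context on every scored token.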

## Clean run result

A fresh, uninterrupted `8x H100` run completed on 2026-03-25 with the following results:

- `step_stop=5347`
- `train_time=580.213s`
- `final_int6_roundtrip_exact val_loss=1.96565872`
- `final_int6_roundtrip_exact val_bpb=1.16417381`
- `eval_time=44.398s`
- `bytes_model_int6_zstd=15,562,277`
- `bytes_code=72,924`
- `bytes_total=15,635,201`
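
The `final_int6_roundtrip_exact` metrics appear to be measured after the weights go through int6 quantization and back. The sketch below shows one way to do a per-row symmetric int6 roundtrip with a simple clip search: try several clipping thresholds as fractions of `max|w|` and keep the one with the lowest squared reconstruction error. It is an assumption-laden stand-in, not the GPTQ-lite clip search or the mixed-precision layout used in `train_gpt.py`; the function names, the `-31..31` code range and the candidate ratios are all hypothetical.

```python
def quantize_row_int6(row, clip):
    """Symmetric int6 quantization of one weight row, clipped to [-clip, clip]."""
    qmax = 31  # int6 can hold -32..31; this sketch uses the symmetric range -31..31
    scale = clip / qmax if clip > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return codes, scale


def dequantize_row(codes, scale):
    """Map int6 codes back to floats."""
    return [c * scale for c in codes]


def clip_search_int6(row, ratios=(1.0, 0.995, 0.99, 0.98, 0.95, 0.9)):
    """Pick the clip ratio (fraction of max|w|) with the lowest roundtrip error.

    Returns (squared_error, ratio, codes, scale).
    """
    amax = max((abs(x) for x in row), default=0.0)
    best = None
    for r in ratios:
        codes, scale = quantize_row_int6(row, amax * r)
        recon = dequantize_row(codes, scale)
        err = sum((w - w_hat) ** 2 for w, w_hat in zip(row, recon))
        if best is None or err < best[0]:
            best = (err, r, codes, scale)
    return best
```

Clipping trades a larger error on a few outliers for a finer step size on every other weight, which is why a search over ratios can beat plain `max|w|` scaling. In a full pipeline the int6 codes and per-row scales would then be packed and compressed with zstd (the `zstandard` package in `requirements.txt`) to give the `bytes_model_int6_zstd` figure; that step is omitted here.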

This run stayed under all three required caps:

- training time `< 600s`
- evaluation time `< 600s`
- artifact size `< 16,000,000` bytes
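
The cap check can be written out explicitly. This sketch assumes, consistent with the reported numbers, that `bytes_total` is `bytes_model_int6_zstd + bytes_code` (15,562,277 + 72,924 = 15,635,201) and that each cap is a strict upper bound as listed; `check_caps` and the dictionary keys are illustrative names only.

```python
# Reported values from the "Clean run result" section.
TRAIN_TIME_S = 580.213
EVAL_TIME_S = 44.398
BYTES_MODEL_INT6_ZSTD = 15_562_277
BYTES_CODE = 72_924

# Caps as listed for track_10min_16mb (strict upper bounds).
CAPS = {"train_time_s": 600.0, "eval_time_s": 600.0, "bytes_total": 16_000_000}


def check_caps(train_s, eval_s, bytes_model, bytes_code):
    """Return {cap_name: (measured, within_cap, headroom)}."""
    measured = {
        "train_time_s": train_s,
        "eval_time_s": eval_s,
        # Assumption: total artifact = compressed model bytes + code bytes.
        "bytes_total": bytes_model + bytes_code,
    }
    return {
        name: (measured[name], measured[name] < cap, cap - measured[name])
        for name, cap in CAPS.items()
    }


report = check_caps(TRAIN_TIME_S, EVAL_TIME_S, BYTES_MODEL_INT6_ZSTD, BYTES_CODE)
for name, (value, ok, headroom) in report.items():
    print(f"{name}: {value} ({'OK' if ok else 'OVER'}, headroom {headroom:g})")
```

With these values the run has about 19.8 s of training headroom, 555.6 s of evaluation headroom, and 364,799 bytes of size headroom.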

## Files

- `train_gpt.py`
- `README.md`
- `submission.json`
- `train_seed1337.log`
- `requirements.txt`

## Notes

- This is a clean single-seed run, not a multi-seed statistical record claim.
- `train_seed1337.log` is the original remote run log recovered after the run.
@@ -0,0 +1 @@
zstandard>=0.23.0
@@ -0,0 +1,16 @@
{
"author": "Hael",
"github_id": "hesong0222-dev",
"name": "MergedTop3_v3 clean H100 rerun",
"blurb": "Fresh uninterrupted 8xH100 rerun of the merged top-stack recipe: March 20 backbone, March 21 Partial RoPE and LN scale, and March 22 GPTQ-lite clip search plus warmdown 3500. This records the clean single-seed run completed on March 25, 2026.",
"date": "2026-03-25T00:00:00Z",
"track": "track_10min_16mb",
"val_loss": 1.96565872,
"val_bpb": 1.16417381,
"step_stop": 5347,
"wallclock_seconds": 580.213,
"eval_time_seconds": 44.398,
"bytes_total": 15635201,
"bytes_model_int6_zstd": 15562277,
"bytes_code": 72924
}