138M ChatML training stack for Apple Silicon using MLX.
Canonical remote branch: main. Historical legacy-default state is preserved at tag archive/origin-master-2026-03-20.
The repo now treats one path as first-class:
clean Quality2K continuation -> explicitly approved pinned checkpoint -> v19 align/full/repair SFT curriculum
Active entrypoints:

- `scripts/build_pretrain_quality2k.py`
- `scripts/run_pretrain_quality2k_terminal.sh`
- `scripts/audit_dense_mainline.py`
- `scripts/review_plain_generation.py`
- `scripts/select_quality2k_checkpoint.py`
- `scripts/pin_quality2k_checkpoint.py`
- `scripts/build_sft_v19_release.py`
- `scripts/run_sft_release.py`
- `scripts/run_sft_release_v19.py`
- `scripts/run_multiturn_coherence_eval.py` (fixed multi-turn transcript suite; see the SFT Runbook)
Research branch entrypoints:

- `scripts/extend_tokenizer_with_vm_tokens.py`
- `scripts/build_vm_pilot_dataset.py`
- `scripts/init_vm_from_dense.py`
- `scripts/extend_tokenizer_with_wasm_tokens.py`
- `scripts/normalize_local_docs.py`
- `scripts/build_wasm_subset_corpus.py`
- `scripts/build_wasm80m_pretrain_corpus.py`
- `scripts/build_wasm80m_sft_corpora.py`
- `scripts/run_wasm80m_pretrain.py`
- `scripts/run_wasm80m_sft.py`
- `scripts/eval_wasm80m.py`
Historical probe-era and experimental material is retained only as archived reference. See Archive Notes.
Historical dense shims:

- `scripts/build_sft_v18_release.py`
- `scripts/run_sft_release_v18.py`
- `scripts/run_sft_release_v18_terminal.sh`

These remain compatibility shims only and are non-authoritative for release decisions.
The WASM80m scripts listed under “Research branch entrypoints” are a parallel tokenizer/model line (see docs/wasm80m_runbook.md); they are not part of finishing the dense 138M v19 chat line.
The only architecture on the release path is the dense 138M line. Experimental dense_vm and dense_wasm80m work are isolated to separate branch/config families and do not share checkpoint compatibility with the dense mainline.
- Preserved raw pretrain base: `checkpoints/pretrain_mlx_138m_chatml/mlx_step_130000.pkl`
- Active continuation config: `configs/pretrain_mlx_138m_quality2k.yaml`
- Active continuation outputs: `checkpoints/pretrain_mlx_138m_quality2k`
- Canonical SFT handoff: `checkpoints/pretrain_mlx_138m_quality2k/selected_for_sft.pkl`
- Active v19 SFT configs: `configs/sft_release_v19_align.yaml`, `configs/sft_release_v19_full.yaml`, `configs/sft_release_v19_repair.yaml`
- Canonical chat/eval starting checkpoint (repair stage, step 50): `checkpoints/sft_release_v19_repair/sft_step_50.pkl`
- Symlink pin for that artifact (used by `scripts/eval_release_candidate.py` by default): `checkpoints/sft_release_v19_repair/selected_for_future_work.pkl`. It must resolve to the same file as `sft_step_50.pkl` when the pin is current; metadata lives in `selected_for_future_work.json`.
- Eval commands, gate CLI, release bundle, and optional MLX smoke tests: docs/eval.md. Pin promotion, `raw_reply` vs `reply`, and `gate_report.json` retention: docs/sft_runbook.md (sections after Candidate Eval).
- Mainline pin metadata for approved selections includes lineage fields: `run_id`, `source_checkpoint`, `selected_step`, `gate_report_path`, `manifest_hash`, and `mainline_valid`.
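The lineage fields above can be sketched as a small metadata record. This is an illustrative sketch only: the dataclass, the loader function, and its validation behavior are assumptions, not the repo's actual pin implementation.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PinMetadata:
    # Lineage fields the README lists for an approved mainline selection.
    run_id: str
    source_checkpoint: str
    selected_step: int
    gate_report_path: str
    manifest_hash: str
    mainline_valid: bool

def load_pin(path: str) -> PinMetadata:
    """Parse pin metadata and refuse records not marked mainline-valid."""
    with open(path) as f:
        meta = PinMetadata(**json.load(f))
    if not meta.mainline_valid:
        raise ValueError(f"{meta.run_id}: pin is not mainline_valid")
    return meta
```

A consumer that reads the pin through a loader like this fails fast on a stale or unapproved selection instead of silently training from it.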
```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
PYTHONPATH=src python scripts/setup_verification.py
```

Build the curated continuation corpus:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/build_pretrain_quality2k.py
```

The active 138M continuation runtime contract is:

- context: 2048 tokens
- dropout: 0.0
- compile: true
- compile_granularity: microbatch
- precision: bfloat16
- micro_batch_size: 1
- grad_accum_steps: 16
- gradient_checkpointing: false
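A run can be checked against the contract above before launch. This is a minimal sketch assuming the YAML config loads to a flat dict with these exact key names; the function name is illustrative.

```python
# Expected 138M continuation runtime contract (values from this README).
RUNTIME_CONTRACT = {
    "context": 2048,
    "dropout": 0.0,
    "compile": True,
    "compile_granularity": "microbatch",
    "precision": "bfloat16",
    "micro_batch_size": 1,
    "grad_accum_steps": 16,
    "gradient_checkpointing": False,
}

def contract_violations(config: dict) -> dict:
    """Return {key: (expected, actual)} for every contract mismatch."""
    return {
        key: (want, config.get(key))
        for key, want in RUNTIME_CONTRACT.items()
        if config.get(key) != want
    }
```

An empty result means the loaded config matches the contract; anything else names the drifting keys.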
Run the continuation from Terminal:
```bash
cd /Users/admin/Downloads/VSCode/AnarchoBot
./scripts/run_pretrain_quality2k_terminal.sh
```

Start a fresh continuation explicitly:
```bash
cd /Users/admin/Downloads/VSCode/AnarchoBot
./scripts/run_pretrain_quality2k_terminal.sh --clean-run
```

Monitor the run:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/metrics_window.py \
  --log-dir checkpoints/pretrain_mlx_138m_quality2k/logs \
  --config configs/pretrain_mlx_138m_quality2k.yaml
```

Validate the staged continuation checkpoints before extending the run:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/validate_mainline_training.py grad-coverage \
  --config configs/pretrain_mlx_138m_quality2k.yaml \
  --checkpoint checkpoints/pretrain_mlx_138m_chatml/mlx_step_130000.pkl
PYTHONPATH=src python scripts/validate_mainline_training.py checkpoint-diff \
  --config configs/pretrain_mlx_138m_quality2k.yaml \
  --start-checkpoint checkpoints/pretrain_mlx_138m_chatml/mlx_step_130000.pkl \
  --end-checkpoint checkpoints/pretrain_mlx_138m_quality2k/mlx_step_11000.pkl
```

For the completed 12000-step continuation run, the preserved candidate pool is steps 8000, 9000, 10000, 11000, and 12000; earlier checkpoints rotated out under `ckpt_keep: 5`.
Select the checkpoint with the deterministic continuation handoff rule:

```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/select_quality2k_checkpoint.py \
  --manifest examples/quality2k_selection_manifest.json \
  --print-pin-command
```

The selector uses held-out perplexity with an earliest-step tie-break, and blocks candidates only for checkpoint-diff failure, non-finite or missing perplexity, or catastrophic plain-generation regression versus the base review.
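The selection rule can be sketched as a filter plus a lexicographic minimum. The candidate field names here (`diff_ok`, `catastrophic_regression`, and so on) are illustrative assumptions, not the selector's real schema:

```python
import math

def select_checkpoint(candidates: list) -> dict:
    """Pick the lowest held-out perplexity; the earliest step breaks ties.

    Candidates are blocked for checkpoint-diff failure, non-finite or
    missing perplexity, or catastrophic plain-generation regression.
    """
    eligible = [
        c for c in candidates
        if c["diff_ok"]
        and not c["catastrophic_regression"]
        and c["perplexity"] is not None
        and math.isfinite(c["perplexity"])
    ]
    if not eligible:
        raise RuntimeError("no eligible checkpoint in the candidate pool")
    return min(eligible, key=lambda c: (c["perplexity"], c["step"]))
```

The `(perplexity, step)` key makes the rule deterministic: equal-perplexity candidates resolve to the earliest step.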
Pin the chosen continuation checkpoint only after the clean rerun validations pass:

```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/pin_quality2k_checkpoint.py \
  --checkpoint checkpoints/pretrain_mlx_138m_quality2k/mlx_step_11000.pkl \
  --mainline-valid \
  --artifact-role mainline_candidate \
  --validation-basis "base grad coverage + compile parity passed; checkpoint diff passed; held-out perplexity won preserved 8000-12000 pool; no catastrophic plain-generation regression vs base"
```

Export a Hugging Face token at runtime before rebuilding the canonical natural-chat slice:

```bash
export HF_TOKEN=...
```

Build the v19 SFT corpora:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/build_sft_v19_release.py --clean-output
```

The standalone builder writes `reports/sft_v19_release_build/build_summary.json`. The shared runner writes per-run build reports under `reports/sft_v19_release_builds/<run_id>/build_summary.json`.
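For inspecting the per-run reports, a small helper can locate the newest run's summary. This is a sketch under the assumption that each run gets its own directory under the reports root and that directory mtime is a reasonable recency proxy; the function name is illustrative.

```python
import json
from pathlib import Path

def latest_build_summary(root: str = "reports/sft_v19_release_builds") -> dict:
    """Return the newest run's build_summary.json, ordered by directory mtime."""
    runs = sorted(Path(root).iterdir(), key=lambda p: p.stat().st_mtime)
    if not runs:
        raise FileNotFoundError(f"no runs under {root}")
    return json.loads((runs[-1] / "build_summary.json").read_text())
```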
The latest validated v19 run reported these manifest counts:

- align: 3600 examples
- release: 22571 examples
- eval: 1600 examples
- repair: 2912 examples, from 3000 selected repair rows after shard filtering

The shared runner now validates `manifest_examples` against these bands:

- align: 3000-5000
- release: 20000-28000
- eval: >=1280
- repair: 2500-3500
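The band check above can be sketched as a table lookup; the table values come from this README, but the function name and return shape are illustrative, not the shared runner's actual code.

```python
# Acceptance bands for manifest_examples (from this README).
# eval has only a lower bound, encoded as an open upper bound.
BANDS = {
    "align": (3000, 5000),
    "release": (20000, 28000),
    "eval": (1280, None),
    "repair": (2500, 3500),
}

def check_manifest_counts(counts: dict) -> list:
    """Return the splits whose example count falls outside its band."""
    failures = []
    for split, (lo, hi) in BANDS.items():
        n = counts.get(split)
        if n is None or n < lo or (hi is not None and n > hi):
            failures.append(split)
    return failures
```

The reported counts (3600 / 22571 / 1600 / 2912) all sit inside their bands, so a run with those numbers passes this check.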
Run the v19 curriculum:
```bash
cd /Users/admin/Downloads/VSCode/AnarchoBot
PYTHONPATH=src .venv/bin/python scripts/run_sft_release_v19.py
```

Default v19 release controls include:

- dual-track raw/guarded gating
- rewrite-rate cap (`<=0.15` by default)
- one bounded repair extension window (`+25` once) before final failure
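The cap-plus-single-extension control can be sketched as one gating decision. This is an assumed simplification of the runner's behavior; the function and return labels are illustrative.

```python
def gate_decision(rewrite_rate: float, extensions_used: int,
                  cap: float = 0.15, max_extensions: int = 1) -> str:
    """Pass under the rewrite-rate cap; otherwise grant at most one
    bounded repair extension window, then fail permanently."""
    if rewrite_rate <= cap:
        return "pass"
    if extensions_used < max_extensions:
        return "extend"  # one +25-step repair window, per the README default
    return "fail"
```

The point of the design is boundedness: a failing run gets exactly one extra repair window rather than retrying indefinitely.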
`selected_for_sft.pkl` is now blocked from the canonical SFT path unless its sibling metadata file marks it `mainline_valid: true`.
Run the static dense-mainline audit at any time without touching training:
```bash
source .venv/bin/activate
PYTHONPATH=src python scripts/audit_dense_mainline.py \
  --json-output reports/pretrain_quality2k_review/static_dense_audit.json
```

Run the test suite:

```bash
source .venv/bin/activate
pip install pytest
PYTHONPATH=src pytest
```

Optional MLX checkpoint smoke tests (loads weights on GPU; uses `checkpoints/sft_release_v19_repair/sft_step_50.pkl` unless `ANARCHOBOT_CANONICAL_CKPT` is set):

```bash
ANARCHOBOT_RUN_MLX_TESTS=1 PYTHONPATH=src pytest -m mlx_checkpoint tests/test_canonical_checkpoint.py
```

Repo-tracked content is source, prompts, configs, tests, docs, and curated evidence.
Runtime artifacts are intentionally untracked:
- continuation checkpoints
- generated shard directories
- runtime reports
- transient build JSONL/message dumps
Preserved historical evidence lives under `legacy_evidence/`.