Commits
3383 commits
1bfafd1
test(native): grid-sync feasibility probe — Metal co-schedules up to …
Snider Jun 23, 2026
d2013d6
perf(native): megakernel pattern proven — two gemvs + in-kernel grid …
Snider Jun 23, 2026
34d7aab
perf(native): 4-bit gemv block — inlined affine dequant, token-identi…
Snider Jun 23, 2026
43d077b
perf(native): 2-stage 4-bit FFN megakernel — whole SwiGLU MLP in one …
Snider Jun 23, 2026
9b5582f
perf(native): simd-cooperative qgemv block — and it disproved my FFN …
Snider Jun 23, 2026
40bf59c
perf(native): within-layer op-cost instrument — decode is dispatch-co…
Snider Jun 24, 2026
0804c6e
perf(native): MoE decode instrument — native 26B-A4B runs 7.8 tok/s, …
Snider Jun 24, 2026
7798414
perf(native): model-agnostic decode probe — finds the 12B/31B dense m…
Snider Jun 24, 2026
81ac57a
test(native): replay-only decode bench — proves the per-token replay …
Snider Jun 24, 2026
af4801e
perf(native): per-layer kvHeads in the ICB attention recorder (founda…
Snider Jun 24, 2026
dda2feb
perf(native): per-layer kvHeads quant-recorder + cache plumbing — byt…
Snider Jun 24, 2026
a9580f2
perf(native): PROVEN — cross-TG megakernel coherency works on macOS 2…
Snider Jun 24, 2026
f1c5bb2
perf(native): FFN megakernel REALISED — atomic handoff + device-scope…
Snider Jun 24, 2026
472384f
perf(native): attention megakernel — whole attention half in one disp…
Snider Jun 24, 2026
0072f25
perf(native): FULL-LAYER megakernel — whole decode layer in ONE dispa…
Snider Jun 24, 2026
03ce917
test(native): context-scaling decode instrument — sizes the long-cont…
Snider Jun 24, 2026
6868d28
perf(native): wire 2-pass sdpa_vector for long-context KV (token-iden…
Snider Jun 24, 2026
cf8bef3
perf(native): route live decode SDPA to 2-pass past the long-context …
Snider Jun 24, 2026
3a3f94a
fix(native): open the ICB fast path to non-uniform kvHeads (12B/31B M…
Snider Jun 24, 2026
b2dc1a9
refactor(native): unify decode-op emit behind a dispatchSink (1/N: RM…
Snider Jun 24, 2026
55de890
refactor(native): unify RMSNorm family behind dispatchSink (2/N)
Snider Jun 24, 2026
aafb9df
refactor(native): unify binary op behind dispatchSink + make it zero-…
Snider Jun 24, 2026
0b20893
refactor(native): fold the fused gelu into emitBinary (4/N)
Snider Jun 24, 2026
70ea174
refactor(native): unify RoPE behind dispatchSink (5/N — first varying…
Snider Jun 24, 2026
fd32a10
refactor(native): unify fused QK-norm-rope behind dispatchSink (6/N)
Snider Jun 24, 2026
5098f45
refactor(native): unify SDPA behind dispatchSink (7/N — closes the or…
Snider Jun 24, 2026
75decb3
refactor(native): unify 4-bit qmv behind dispatchSink (8/N — the comm…
Snider Jun 24, 2026
7e88416
refactor(native): unify bf16 tiled gemv behind dispatchSink (9/N)
Snider Jun 24, 2026
90996e2
refactor(native): unify the non-arch ICB recorders' gemv/qmv (10/N — …
Snider Jun 24, 2026
2507bb5
fix(native): stream the sampled decode path (generate timer read 0.000s)
Snider Jun 25, 2026
f7c5bc2
docs(native): handoff map — streaming fixed, sampled-ICB next, known …
Snider Jun 25, 2026
da81995
advance native engine parity
Snider Jun 28, 2026
11af58f
native: restore KV block metadata and sliding windows
Snider Jun 28, 2026
ec1662c
native: restore KV head snapshots
Snider Jun 28, 2026
228661b
native: restore KV exact trusted prefixes
Snider Jun 28, 2026
3c0b818
native: restore raw KV layer slab order
Snider Jun 28, 2026
2cab381
native: route full-vocab TopP sampling
Snider Jun 28, 2026
833a83f
native: restore float32 KV slabs
Snider Jun 28, 2026
932c234
native: preserve sliding kv token offsets
Snider Jun 28, 2026
ce1790a
native: restore sliding kv block tails
Snider Jun 28, 2026
c13dc97
native: validate fixed kv block metadata
Snider Jun 28, 2026
9bad95f
native: avoid prompt and kv state copies
Snider Jun 28, 2026
807f165
native: retain sampled prompt cache suffix
Snider Jun 28, 2026
cbd93d6
native: warm prompt cache with retained logits
Snider Jun 28, 2026
f90ef28
native: replay sampled prompt cache logits
Snider Jun 28, 2026
a367843
native: bridge root kv snapshots
Snider Jun 28, 2026
5a9a07f
native: stream root kv blocks
Snider Jun 28, 2026
c1c34ca
docs: keep native goal tracker compact
Snider Jun 28, 2026
73813f2
native: trust root kv block prefixes
Snider Jun 28, 2026
8f68c4c
native: restore all-trusted kv block prefixes
Snider Jun 28, 2026
927d090
native: honour kv block prefix tokens
Snider Jun 28, 2026
847dec1
native: slice kv block prefix restores
Snider Jun 28, 2026
9e921d7
native: chain sampled gpu tail inputs
Snider Jun 28, 2026
35688ff
native: pipeline sampled gpu tail
Snider Jun 28, 2026
846393f
native: prewarm pipelined peer icb
Snider Jun 28, 2026
6b931a4
native: start sampled cache logits on gpu tail
Snider Jun 28, 2026
c9fa426
native: start cache logits on gpu tail
Snider Jun 28, 2026
0db8dc3
native: prefill prompt cache prefixes on gpu
Snider Jun 28, 2026
dff0445
native: warm prompt cache final token on gpu
Snider Jun 28, 2026
ddbb658
native: prefill retained prompts on gpu
Snider Jun 28, 2026
cd1f87e
docs: add tracker hygiene top rule
Snider Jun 28, 2026
3682e5c
native: prefill retained tokens through gpu inputs
Snider Jun 28, 2026
c5947db
native: track resident ids after session generate
Snider Jun 28, 2026
306925f
native: add no-copy vproj head rms into path
Snider Jun 28, 2026
745db33
native: add no-copy rms qmv into path
Snider Jun 28, 2026
52dc768
native: add no-copy qknorm rope into path
Snider Jun 28, 2026
bb4a97e
native: add no-copy rms residual into path
Snider Jun 28, 2026
4390457
native: add no-copy sdpa into paths
Snider Jun 28, 2026
ea5d6ef
native: add no-copy bf16 steel matmul into path
Snider Jun 28, 2026
60239d9
native: add no-copy quant embed gather into path
Snider Jun 28, 2026
3a04e27
docs: tighten tracker hygiene rule
Snider Jun 28, 2026
9dfb283
native: add no-copy rope into paths
Snider Jun 28, 2026
dba1668
native: add no-copy bf16 unary into path
Snider Jun 28, 2026
b50ba49
native: add no-copy bf16 scalar into path
Snider Jun 28, 2026
10a626e
native: add no-copy bf16 binary into paths
Snider Jun 28, 2026
109cfb2
native: add no-copy bf16 tanh into path
Snider Jun 28, 2026
71f6140
native: add no-copy bf16 gelu into path
Snider Jun 28, 2026
93b33c9
native: add no-copy bf16 const into path
Snider Jun 28, 2026
aac0e53
native: add no-copy bf16 gelu gate into path
Snider Jun 28, 2026
5cb7442
native: add no-copy bf16 mlp block into path
Snider Jun 28, 2026
1ee424d
native: add no-copy bf16 attention block into path
Snider Jun 28, 2026
89d76bc
native: add no-copy bf16 decode layer into path
Snider Jun 28, 2026
0864594
native: add reusable decode forward outputs
Snider Jun 28, 2026
9ea7cbc
native: add reusable bf16 arch outputs
Snider Jun 28, 2026
2ef6186
native: expose reusable backend decode outputs
Snider Jun 28, 2026
c61e0e3
native: add reusable quant arch outputs
Snider Jun 28, 2026
b51f47d
native: add reusable icb decode outputs
Snider Jun 28, 2026
3c0011e
native: add reusable bf16 matvec outputs
Snider Jun 28, 2026
f549647
native: add reusable float qmv outputs
Snider Jun 28, 2026
161a10e
native: add reusable float matvec outputs
Snider Jun 28, 2026
b6909ff
native: add reusable float steel gemm outputs
Snider Jun 28, 2026
4f5438b
native: add reusable float softmax outputs
Snider Jun 28, 2026
2e848d9
native: reuse mtp attention intermediates
Snider Jun 28, 2026
9226251
native: add reusable mtp attention scratch
Snider Jun 28, 2026
9b34bb1
native: reuse audio attention intermediates
Snider Jun 28, 2026
b3249c0
native: add no-copy bf16 layernorm into
Snider Jun 28, 2026
c716500
native: add no-copy f32 layernorm into
Snider Jun 28, 2026
61f74c6
native: write unary into caller output
Snider Jun 28, 2026
b5e714c
native: write softmax into caller output
Snider Jun 28, 2026
6057230
native: write binary into caller output
Snider Jun 28, 2026
96a3bf4
native: add no-copy f32 rmsnorm into
Snider Jun 28, 2026
8209946
docs(goal): keep tracker compact
Snider Jun 28, 2026
36df015
native: write matvec into caller output
Snider Jun 28, 2026
16e66d5
native: write f32 matmul into caller output
Snider Jun 28, 2026
fac154a
native: write f32 nt matmul into caller output
Snider Jun 29, 2026
41c3be7
native: write split-k matmul into caller output
Snider Jun 29, 2026
4af742f
native: add no-copy f32 rope into
Snider Jun 29, 2026
b2f392e
native: write lm head into caller logits
Snider Jun 29, 2026
1f267e6
native: reuse bf16 generate embedding buffer
Snider Jun 29, 2026
962c0ea
native: reuse dense prefill embeddings
Snider Jun 29, 2026
8abe935
native: reuse prompt cache embeddings
Snider Jun 29, 2026
2c81a16
docs: tighten tracker rule
Snider Jun 29, 2026
c41f06a
native: reuse mtp batch embeddings
Snider Jun 29, 2026
f4a2530
native: reuse training capture embeddings
Snider Jun 29, 2026
156975d
native: write mlp transform into no-copy output
Snider Jun 29, 2026
e12eaa2
native: add no-copy quant mlp transform output
Snider Jun 29, 2026
b5a4bd0
native: add no-copy quant mega output
Snider Jun 29, 2026
17794a0
docs: forbid tracker progress notes
Snider Jun 29, 2026
7b61b71
native: add no-copy quant moe output
Snider Jun 29, 2026
0b608d8
native: add no-copy bf16 moe output
Snider Jun 29, 2026
9439d26
native: add no-copy fused moe experts output
Snider Jun 29, 2026
d8b4d52
native: add no-copy quant moe experts output
Snider Jun 29, 2026
c22f0be
native: add no-copy bf16 moe experts output
Snider Jun 29, 2026
8a5fccb
native: write moe decode output directly
Snider Jun 29, 2026
23a1612
native: add no-copy rms qmv output
Snider Jun 29, 2026
34553fb
native: add no-copy vproj output
Snider Jun 29, 2026
f0268dd
native: add no-copy qknorm rope output
Snider Jun 29, 2026
daea858
docs: make tracker edits opt-in
Snider Jun 29, 2026
b3bd99d
native: write head logits into caller output
Snider Jun 29, 2026
a7978a2
docs: keep tracker files compact
Snider Jun 29, 2026
71c881b
native: add no-copy decode layer icb output
Snider Jun 29, 2026
8d55983
native: add no-copy decode token icb output
Snider Jun 29, 2026
73f50eb
native: add no-copy decode step output
Snider Jun 29, 2026
0ab1a61
native: add no-copy attention step output
Snider Jun 29, 2026
02a5dcf
native: add no-copy step kv cache views
Snider Jun 29, 2026
7a15252
native: pool decode forward scratch buffers
Snider Jun 29, 2026
f163fd2
native: pool quant decode forward buffers
Snider Jun 29, 2026
0cb68a8
docs: harden tracker progress rule
Snider Jun 29, 2026
77475ab
native: add icb output reuse APIs
Snider Jun 29, 2026
d5cd5ab
native: bind icb outputs no-copy
Snider Jun 29, 2026
12a1159
native: bind arch icb outputs no-copy
Snider Jun 29, 2026
dc5c589
native: reuse quant arch icb replay scratch
Snider Jun 29, 2026
7306b15
native: write icb retained hidden directly
Snider Jun 29, 2026
6997631
native: write sampled icb hidden directly
Snider Jun 29, 2026
eb31e2b
native: write gpu input icb hidden directly
Snider Jun 29, 2026
8ecf506
native: write chained final hidden directly
Snider Jun 29, 2026
8af8649
native: write sampled chained hidden directly
Snider Jun 29, 2026
4f0b210
native: write pipelined hidden directly
Snider Jun 29, 2026
393b778
native: write gpu prefill hidden directly
Snider Jun 29, 2026
209122b
native: restore raw float kv slabs
Snider Jun 29, 2026
cc185e2
inference improvments
Snider Jul 1, 2026
aaa4286
perf(native): bound sliding-window ICB KV cache to the window (AX-11)
Snider Jul 1, 2026
8f483f8
chore(go-mlx): re-vendor external/ to current go-inference + core.Res…
Snider Jul 1, 2026
166a7cc
chore(go-mlx): remove dead go/tests package (420 LOC)
Snider Jul 1, 2026
fe5f983
native: add diffusion and assistant gguf parity
Snider Jul 2, 2026
85a83d1
mlx: wire native multimodal cli surface
Snider Jul 2, 2026
40dd8ec
chore(deps): bump external/go-inference 39ee78e -> dfd6a44 — the ten-…
Snider Jul 2, 2026
46f92f6
refactor(hf): consume dappco.re/go/inference/hf — shared client/cache…
Snider Jul 2, 2026
733a60e
refactor(gguf): delegate pure data shapes + path/normalise helpers on…
Snider Jul 2, 2026
fa13121
refactor(merge): alias types/consts onto inference/merge, test fixtur…
Snider Jul 2, 2026
fb32609
chore(deps): bump external/go-inference dfd6a44 -> e4a7175 — the two …
Snider Jul 2, 2026
bee2752
chore(deps): add external/go-store submodule + go.work entry — align …
Snider Jul 2, 2026
ecf8043
refactor(ebook): consume inference/modelmgmt — 571 -> 127 production …
Snider Jul 2, 2026
3b3c3f8
refactor(train,distill,grpo): delegate loss/reward/checkpoint/benchma…
Snider Jul 2, 2026
411dafc
refactor(merge): delete merge_copy.go — pack-copy helpers route throu…
Snider Jul 2, 2026
e0107d7
refactor(gguf): delete the nine local quantise kernels — route throug…
Snider Jul 2, 2026
c1abee1
docs(safetensors): record the divergence rationale vs inference/safet…
Snider Jul 2, 2026
8c29662
refactor(pkg/scheme): thin compatibility layer over inference/scheme …
Snider Jul 2, 2026
6c2100e
refactor(lora,artifact): delegate inspection + export onto inference;…
Snider Jul 2, 2026
82243bc
goal: refresh the native worklist — rebase line done, session finding…
Snider Jul 2, 2026
7bbb7e6
style: gofmt the five pre-existing strays (train, safetensors, gguf t…
Snider Jul 2, 2026
b679dde
feat(generate): chat-frame the -state turn loop — option A, with -raw…
Snider Jul 2, 2026
7f0021a
chore(deps): bump external/go-inference e4a7175 -> 8e5456b — Generato…
Snider Jul 2, 2026
22ecfd4
refactor(ebook,safetensors,kv): retire the colophon patch + float16 b…
Snider Jul 2, 2026
89b8f5c
fix(gemma4): greedy MTP boundary reforged through the plain decode gr…
Snider Jul 2, 2026
47486f9
style: gofmt two pre-existing strays in gemma4 (chat bench + vision l…
Snider Jul 2, 2026
a145084
docs: post-DRY reality sweep — phantom root APIs corrected, engine/sh…
Snider Jul 2, 2026
7ff2302
feat(generate): wire -think through the -state turn loop — both lanes
Snider Jul 2, 2026
db74d63
chore(deps): bump external/go-inference 8e5456b -> 0a16f1d — the cove…
Snider Jul 2, 2026
571e676
native: advance engine parity and resource paths
Snider Jul 3, 2026
c8c8207
native: support split vision position embeddings
Snider Jul 3, 2026
c05c676
native: add raw vision patch conv path
Snider Jul 3, 2026
73decc7
native: route image chat through raw pixels
Snider Jul 3, 2026
7b7f3a3
native: adapt assistant mtp low-accept path
Snider Jul 3, 2026
95623f7
native: stream assistant low-accept fallback
Snider Jul 3, 2026
05c2e90
docs(native): keep native tracker compact
Snider Jul 3, 2026
1837d31
native: fallback sampled assistant on low accept
Snider Jul 3, 2026
be2d269
refactor(native): EmbedScale joins the declared arch — GenerateBF16 a…
Snider Jul 3, 2026
920d18c
refactor(native): gated_delta_backend.go — the ability names the file…
Snider Jul 3, 2026
871f45d
refactor(native,model): the assistant is an ability, not a gemma4 — e…
Snider Jul 3, 2026
2e36ba5
fix(mtp): draft-query rope at the trained position + quantised assist…
Snider Jul 3, 2026
145d683
fix(native): MTP acceptance 0% → 50% — double-normed draft seed + sam…
Snider Jul 3, 2026
e2c0a09
test(native): quant-lane MTP reproducer — probes convict the quant se…
Snider Jul 3, 2026
e394b13
fix(native): quant-target MTP 0% acceptance — ICB sessions zeroed the…
Snider Jul 3, 2026
8ebf128
style(native): gofmt two straggler test files
Snider Jul 3, 2026
7f591cf
test(metal/gemma4): audit sweep — 15 test-layer defects fixed against…
Snider Jul 3, 2026
2533939
test(model): audit sweep — 40+ tests close the untested-subsystem and…
Snider Jul 3, 2026
442ac99
fix(native): ICB session restore wrote the dormant paged cache — save…
Snider Jul 3, 2026
3c6f672
test(native): audit sweep — hermetic suite de-cgo'd, mixed files spli…
Snider Jul 3, 2026
1b8bd74
test(kv,probe,profile): audit sweep — non-asserting test fixed + two …
Snider Jul 3, 2026
6aec27b
test(adapter): audit sweep — stream short-circuit contract actually p…
Snider Jul 3, 2026
0d9b970
style(probe): gofmt straggler influx_test.go
Snider Jul 3, 2026
aca31ba
fix(gguf): MXFP4/NVFP4 tensor-type IDs off by one — conflated ggml_ty…
Snider Jul 3, 2026
1858633
chore(deps): bump external/go-inference — shared gguf MXFP4/NVFP4 twi…
Snider Jul 3, 2026
721f4ce
style(metal): gofmt straggler cache_compaction.go
Snider Jul 3, 2026
12f84fc
style: gofmt sweep — 19 pre-existing stragglers across pkg/metal and …
Snider Jul 3, 2026
4b91669
test(session): real-model reserialize instrument — capture→restore→ca…
Snider Jul 3, 2026
190e310
style(session): gofmt stragglers coverage_test.go + session_test.go
Snider Jul 3, 2026
504e395
test(mlx,cmd): final audit lane — tautology, two content-blind CLI te…
Snider Jul 3, 2026
f588b3a
style: gofmt stragglers — 9 in the final audit lane + 3 out-of-lane
Snider Jul 3, 2026
be79952
docs: go-mlx → go-inference migration map (audit campaign deliverable)
Snider Jul 3, 2026
2c4474d
perf(native): batched prefill for the E-family — PLE + shared-KV join…
Snider Jul 3, 2026
f14539a
diag(native): step GPU-span, dispatch-count and per-layer span instru…
Snider Jul 3, 2026
d527e2e
perf(native): parallel two-pass paged SDPA — the decode-vs-context co…
Snider Jul 3, 2026
ce603bf
perf(native): 8-simdgroup paged SDPA pass 1 — native decode reaches m…
Snider Jul 3, 2026
55c984a
perf(native): multi-turn appends ride the batched prefill — 6m28s -> …
Snider Jul 3, 2026
3bbd42b
feat(native): MTP batched verify engages PLE archs (E2B/E4B)
Snider Jul 3, 2026
ffad396
perf(native): drop the MTP greedy reforge tax — adopt the verify boun…
Snider Jul 3, 2026
959c002
perf(native): fold the batched pass's MLP across rows — one weight sw…
Snider Jul 3, 2026
8348841
perf(native): fold the batched pass's attention projections across rows
Snider Jul 3, 2026
cf5aa86
perf(native): multi-query causal SDPA — K rows' attention in one disp…
Snider Jul 3, 2026
fc2f629
perf(native): batched-rows QK-norm+rope — K rope dispatches fold into…
Snider Jul 3, 2026
62a4a96
perf(native): rows-batched layer epilogues — the PLE gate chain folds…
Snider Jul 3, 2026
2df8078
perf(native): true GEMM fold — steel_gemm_fused for the large-row pro…
Snider Jul 3, 2026
faf464f
perf(native): deferred-landing ring lane — the staged sliding tail ba…
Snider Jul 3, 2026
3fd659e
perf(native): GPU trace instrument for the batched pass + steel thres…
Snider Jul 3, 2026
3e1768e
perf(native): batched PLE slab builder — 512 per-token CB round-trips…
Snider Jul 3, 2026
3537a6b
perf(native): PLE slab fully on-GPU — gather and relayout kernels
Snider Jul 3, 2026
480ad67
perf(native): steel GEMM threadblock swizzle — mlx's own tile-walk tu…
Snider Jul 3, 2026
8f49316
perf(native): wrap-crossing deferred chunks — the skinny tail chunk dies
Snider Jul 3, 2026
3db41d3
fix(native): per-layer K==V projection — 12B-4bit garbage decode (#254)
Snider Jul 4, 2026
6090dac
test(native): 12B cross-engine instruments — per-step, weight audit, …
Snider Jul 4, 2026
ce3e922
fix(native): global proportional rope — unfold the arch base before t…
Snider Jul 4, 2026
de23fe2
chore(deps): bump external/go-inference to 3d4eb6a — engine-merge Tie…
Snider Jul 4, 2026
7ce076e
refactor(elimination): ten ported folders deleted — go-mlx consumes t…
Snider Jul 4, 2026
491e838
feat(native): pkg/native satisfies the inference KV-state contracts (…
Snider Jul 4, 2026
0c06f46
refactor(elimination): agent consumed from go-inference/state/agent —…
Snider Jul 4, 2026
f2497d9
refactor(session): re-home onto the inference session contract; delet…
Snider Jul 4, 2026
41ca3ee
chore(deps): bump external/go-inference — engine-neutral session cont…
Snider Jul 4, 2026
5970c08
fix(metal): driver cache registrations forward the planner width — pi…
Snider Jul 4, 2026
69726a8
feat(hip): land the rescued go-rocm engine as the pkg/hip quarantine …
Snider Jul 4, 2026
93eca7b
Merge quarantine/hip — go-rocm lands as pkg/hip (Tier 4 quarantine en…
Snider Jul 4, 2026
7bc57af
test(metal): GC guard allow-lists the pkg/hip quarantine benchmark
Snider Jul 4, 2026
83d9898
test(hip): go-inference boundary walk enforces engine imports only
Snider Jul 4, 2026
e73cbf9
reseed(pkg/hip): bring go-rocm@a2f0380 engine + migrate to modern cor…
Snider Jul 4, 2026
a08119e
Merge quarantine/hip-reseed — recovered go-rocm engine reseeds pkg/hi…
Snider Jul 4, 2026
7a526b7
reseed(pkg/hip): wire compat handlers to go-inference provider/{anthr…
Snider Jul 4, 2026
7a0a208
test(hip): bring in 5 excluded linux test-support files + Result-idio…
Snider Jul 4, 2026
b37a304
test(hip): migrate stale tuple-signature call sites to core.Result
Snider Jul 4, 2026
9b09e76
Merge hip/test-core-result — linux test binary compiles on core.Result
Snider Jul 4, 2026
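Many of the subjects above follow the conventional-commit shape `type(scope): description`, while the later `native: …` subjects carry no parenthesised scope. A minimal sketch of splitting the scoped form in POSIX shell follows; the function name `parse_subject` is hypothetical and not part of the repository, and a subject truncated with `…` simply keeps its truncated description.

```bash
# parse_subject SUBJECT: print "type<TAB>scope<TAB>description" for subjects
# shaped like `type(scope): description`; return 1 for anything else.
# Hypothetical helper for illustration only.
parse_subject() {
  subject=$1
  case $subject in
    *"("*"): "*) ;;            # needs "(", then "): " somewhere after it
    *) return 1 ;;
  esac
  prefix=${subject%%"): "*}    # text before the first "): ", e.g. "fix(gguf"
  ctype=${prefix%%"("*}        # text before the first "(", e.g. "fix"
  scope=${prefix#*"("}         # text after the first "(", e.g. "gguf"
  desc=${subject#*"): "}       # text after the first "): "
  case $ctype in
    ''|*[!a-z]*) return 1 ;;   # type must be lowercase letters only
  esac
  printf '%s\t%s\t%s\n' "$ctype" "$scope" "$desc"
}
```

For instance, `parse_subject 'refactor(native): unify RMSNorm family behind dispatchSink (2/N)'` prints `refactor`, `native`, and the description as three tab-separated fields, while an unscoped subject such as `native: restore KV head snapshots` returns status 1.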
15 changes: 2 additions & 13 deletions .codecov.yml
@@ -2,22 +2,11 @@ coverage:
  status:
    project:
      default:
        target: 80%
        threshold: 1%
        target: 98%
        threshold: 8%
    patch:
      default:
        target: 70%

ignore:
  # Hardware/native runtime paths need a separate Metal-backed integration gate.
  - "go/*_darwin.go"
  - "go/register_metal.go"
  - "go/internal/metal/**"

  # Adapter shells and sidecars are tested, but not part of the core library gate.
  - "go/training.go"
  - "go/mlxlm/**"
  - "go/pkg/daemon/**"
  - "go/pkg/memvid/cli/**"
  - "go/cmd/**"
  - "go/tests/**"
12 changes: 0 additions & 12 deletions .forgejo/workflows/security-scan.yml

This file was deleted.

27 changes: 0 additions & 27 deletions .forgejo/workflows/test.yml

This file was deleted.

20 changes: 17 additions & 3 deletions .gitignore
@@ -1,18 +1,27 @@
# Build artifacts
build/
bin/
*.dylib
*.so
*.a

# `go build ./go/cmd/mlx/` without -o lands the binary at repo root.
# Convention is `go build -o bin/mlx` (bin/ already ignored above);
# this catches the shortcut form too.
/mlx

# CMake
CMakeCache.txt
CMakeFiles/
cmake_install.cmake
Makefile

# CMake install output (keep headers for Go module consumers)
dist/*
!dist/include/
# CMake install output
dist/

# Local Go build/test shortcuts
/go/mlx
/*.test

# IDE
.idea/
@@ -22,6 +31,11 @@ dist/*
# macOS
.DS_Store

# lthn/desktop frontend dist — copied at build time by
# scripts/make-app-bundle.sh, embedded in cmd/mlx via go:embed.
# Single source of truth lives in lthn/desktop/frontend/.
go/cmd/mlx/frontend/dist/

# Knowledge base
KB/
.core/
23 changes: 23 additions & 0 deletions .gitmodules
@@ -22,3 +22,26 @@
path = external/go-io
url = https://github.com/dappcore/go-io.git
branch = dev
[submodule "external/go-ai"]
path = external/go-ai
url = https://github.com/dappcore/go-ai.git
branch = dev
[submodule "external/go-ml"]
path = external/go-ml
url = https://github.com/dappcore/go-ml.git
branch = dev
[submodule "external/go-cgo"]
path = external/go-cgo
url = https://github.com/dappcore/go-cgo.git
branch = dev
[submodule "external/go-i18n"]
path = external/go-i18n
url = https://github.com/dappcore/go-i18n.git
branch = dev
[submodule "external/go-log"]
path = external/go-log
url = https://github.com/dappcore/go-log.git
[submodule "external/go-store"]
path = external/go-store
url = https://github.com/dappcore/go-store.git
branch = dev
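
To check the resulting submodule set, the paths can be read straight from `.gitmodules`. Among the entries added in this hunk, `external/go-log` is the only one without a `branch = dev` line. The sketch below assumes `git` and `awk` are on the PATH; the function name `list_submodule_paths` is hypothetical and not part of the repository.

```bash
# list_submodule_paths [FILE]: print the `path` of every submodule declared
# in FILE (defaults to ./.gitmodules). Hypothetical helper for illustration;
# it relies only on `git config -f` and awk.
list_submodule_paths() {
  file=${1:-.gitmodules}
  # --get-regexp prints "<key> <value>" per match; keep the value column.
  git config -f "$file" --get-regexp '^submodule\..*\.path$' | awk '{ print $2 }'
}
```

Run from the repository root, `list_submodule_paths` should list one path per line in file order, which makes it easy to diff against the directories actually present under `external/`.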
69 changes: 7 additions & 62 deletions AGENTS.md
@@ -1,66 +1,11 @@
# go-mlx Agent Guide

This repository provides Go bindings, model loaders, and a research-grade
training pipeline for MLX on Apple Silicon. Module: `dappco.re/go/mlx`.

## Layout (post Mantis #1241)

All Go code lives under `go/`:

- `go/` — root package: public model, tokenizer, compute, training, eval,
distill, GRPO, hf-fit, merge, gguf-quantize, kv-snapshot, lora-fuse APIs
- `go/internal/metal/` — CGO boundary to `mlx-c`; do not move CGO code out
- `go/mlxlm/` — subprocess backend for Python `mlx-lm` (CGO-free; build tag
`nomlxlm` removes it)
- `go/cmd/violet/` and `go/pkg/daemon/` — local Violet Unix-socket sidecar
- `cpp/` — C++ side companion (CLion-side worktree)
- `lib/mlx/` — upstream MLX submodule pinned at `v0.30.1`
- `patches/` — local patches against `lib/mlx` (manual apply only)
- `docs/`, `examples/` — markdown documentation and per-feature usage examples

## Platform Boundaries
Rule -2: Do not use `GOAL.md` or tracker files to record progress, savings, benchmark notes, proof, status, or "what got faster". Only remove completed task lines unless the user explicitly asks for tracker-file edits in the current turn.

Files that need the native MLX runtime use `//go:build darwin && arm64`.
Unsupported builds compile against the `*_stub.go` files and a stub
`MetalAvailable() bool` that returns false. Do not move CGO code out of
`go/internal/metal/`.
Rule -1: `GOAL.md` and tracker files are compact task queues, not progress logs. Never read, add, preserve, or update tracking, proof, benchmark, savings, performance, status, changelog, or "what got faster" notes there unless the user explicitly asks for that tracker-file edit in the current turn; report evidence in chat or commits instead.

## Conventions

- UK English in code, comments, and docs (colour, organisation, behaviour)
- SPDX header on every new file: `// SPDX-Licence-Identifier: EUPL-1.2`
- Conventional commits: `type(scope): description` — scopes include `metal`,
`api`, `mlxlm`, `cpp`, `docs`, `repo`, `deps`
- Co-Author trailer: `Co-Authored-By: Virgil <virgil@lethean.io>`
- Use `dappco.re/go` core helpers for fmt, errors, JSON, filesystem, path,
env, byte buffers, and string ops — do not import the wrapped stdlib
packages directly

## Test Patterns

Tests are file-aware. Public functions and methods in `foo.go` have their
Good, Bad, and Ugly triplets in `foo_test.go`, and runnable examples in
`foo_example_test.go`. Native tests skip only when the local machine lacks
the required Metal runtime or test model assets. Keep examples small and
checkable so they document the public API without requiring heavyweight
model downloads.

## Sandboxing Notes

Before handing off, run the repository gates from the brief with `GOWORK=off`.
On sandboxed systems, set `GOCACHE` to a writable directory such as
`/tmp/codex-go-mlx-cache` so Go can compile without touching the user
cache. If the sandbox cannot resolve the bundled `mlx.metallib`, apply
`patches/mlx-metallib-path.patch` inside `lib/mlx` to enable the
`MLX_METALLIB_PATH` env-var override (not auto-applied).
# go-mlx Agent Guide

## What's Inside the Public Surface
Module `dappco.re/go/mlx`; Go lives in `go/`.

Beyond the inference path, the root package owns the research-grade
pipeline: knowledge distillation (`RunKnowledgeDistillation`), GRPO
(`RunGRPOReasoningTraining`), dataset eval (`RunModelEval`), LoRA fusion
(`FuseLoRAIntoModelPack`), model merging (`MergeModelPacks`), native
GGUF quantisation (`QuantizeModelPackToGGUF`), KV snapshots
(`KVSnapshot.Save` / `LoadKVSnapshot`), and HuggingFace Hub metadata
(`HuggingFaceModelSource`). See `docs/` and `examples/` for full
walkthroughs.
- Native: `go/pkg/native` stays CGO-free; CGO stays in `go/internal/metal`; `darwin && arm64`, macOS 26+.
- Test env: repo `go.work`; `GOCACHE=/private/tmp/codex-go-mlx-cache`; `MLX_METALLIB_PATH=/Users/snider/Code/core/go-mlx/dist/lib/mlx.metallib`; `-ldflags "-extldflags=-mmacosx-version-min=26.0"`.
- Style/tests: UK English; EUPL SPDX on new files; use `dappco.re/go` helpers; file-aware tests; native skips only for missing runtime/assets.
68 changes: 50 additions & 18 deletions CLAUDE.md
@@ -12,30 +12,61 @@ Implements the `inference.Backend` and `inference.TextModel` interfaces from `da

**darwin/arm64 only.** All CGO files carry `//go:build darwin && arm64`. A stub provides `MetalAvailable() bool` returning false on every other platform. Files that need the native MLX runtime use the build tag; non-Darwin builds compile against `*_stub.go` files.

## Build & Test
## How to use this repo — drive `task`, never hand-roll the build

**This repo is driven by [Taskfile.yml](Taskfile.yml) (`go-task`). Run `task <target>` — do NOT reconstruct the build with bare `go build` / `go test` and manual env exports.** The Taskfile already bakes in everything the build needs:

- `GOCACHE` (default `/private/tmp/codex-go-mlx-cache` — the repo's shared build cache; not a "codex env", just the default)
- `GO_DARWIN_LDFLAGS = -extldflags=-mmacosx-version-min=26.0` (Metal 4 floor — the build fails without it)
- `MLX_METALLIB_PATH = {{.ROOT_DIR}}/dist/lib/mlx.metallib`
- `-tags metal_runtime` on the test path

Hand-exporting those yourself is how you get a subtly-wrong build and waste a session. `task fmt; go-task --list` shows every target. The Go module lives in `go/`; the Taskfile `dir: go` handles the `cd` for you.

```bash
# Build mlx-c C library (required on fresh checkout, ~2min on M3 Ultra)
git submodule update --init --recursive
go generate ./...
# Build the binary (self-contained — embeds the gzipped metallib). Output: bin/lthn-mlx
task build:lthn

# Run all tests
go test ./...
# Compile the native engine's OWN fused Metal kernels (router-topk, q4 lm-head argmax, bf16)
# -> dist/lib/lthn_kernels.metallib, loaded beside mlx.metallib. Build after editing pkg/native/kernels/*.metal.
task build:kernels

# Run a single test
go test -run TestRMSNorm_Good ./go/internal/metal/
# Run the whole Go suite (metal_runtime tag + ldflags, on the GPU)
task test

# Run benchmarks
go test -bench=. -benchtime=2s ./go/internal/metal/
# fmt + vet + test
task qa

# Lint
golangci-lint run ./...
# Real coverage figure (metal_runtime + model_eval tags) -> /tmp/go-mlx-coverage.out
task cov

# C++ side (standalone lib/mlx build + the kernel-bridge tests). First run cold-builds MLX ~15 min.
task test:cpp # vendored MLX suite
task test:cpp:kernels # go-mlx's own Metal-kernel bridges

# Clean
task clean
```

**Fresh checkout:** `git submodule update --init --recursive`, then build the MLX C library + metallib (`go generate ./...` from `go/`, or the CMake path in `docs/build.md`) so `dist/lib/mlx.metallib` exists before `task build:lthn` can embed it. `dist/lib/` is gitignored (rebuilt per checkout); `dist/include/` headers are committed for module consumers.
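
The ordering constraint in the paragraph above (the metallib must exist before `task build:lthn` embeds it) can be checked up front. This is a minimal preflight sketch, not something the Taskfile is known to do; the function name `have_metallib` is hypothetical, and only the path `dist/lib/mlx.metallib` comes from the text.

```bash
# have_metallib [ROOT]: succeed only if ROOT/dist/lib/mlx.metallib exists
# as a regular file. ROOT defaults to the current directory.
# Hypothetical preflight helper for illustration only.
have_metallib() {
  root=${1:-.}
  [ -f "$root/dist/lib/mlx.metallib" ]
}

# Usage sketch (commented out so sourcing this file has no side effects):
# if have_metallib .; then
#   task build:lthn
# else
#   echo "dist/lib/mlx.metallib missing: build the MLX C library first" >&2
# fi
```

Because `dist/lib/` is gitignored, a check like this fails on every fresh checkout until the library build has run, which is the situation the paragraph warns about.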

### The two engines, and trying them

| Engine | Path | How to run |
|--------|------|-----------|
| **cgo metal** (mature: MTP, paged KV, cache modes) | `go/internal/metal/` | `./bin/lthn-mlx generate <model-dir>` / `serve` |
| **no-cgo native** (`pkg/native` + `pkg/model` — the contract engine, current focus) | `go/pkg/native/` | add `-native`: `./bin/lthn-mlx generate -native …` |

```bash
# Try the native engine end-to-end (greedy, deterministic). Model = a gemma4 HF/MLX snapshot dir.
./bin/lthn-mlx generate -native -temp 0 -max-tokens 64 -prompt "Explain RoPE in one sentence." \
~/.cache/huggingface/hub/models--mlx-community--gemma-4-e2b-it-4bit/snapshots/*/

# Clean rebuild (if dist/ is stale)
rm -rf build dist && go generate ./...
# Serve it over the OpenAI/Anthropic/Ollama API, then curl /v1/chat/completions
./bin/lthn-mlx serve -native <model-dir> # default :11434
```

The compiled libraries (`dist/lib/`) are gitignored and must be rebuilt on each fresh checkout. Headers in `dist/include/` are committed for Go module consumers. On sandboxed systems, set `GOCACHE` to a writable directory such as `/tmp/codex-go-mlx-cache`.
`generate -native` does NOT yet support `-trace` or `-state` (cgo-engine only); those exit 2. A run-quick smoke of the served engine also exists as the `lethean-lem` skill (`lem.sh smoke e2b`), but for *driving the repo itself* prefer `task` + the binary directly.

## Repository Layout

@@ -44,17 +75,18 @@ After Mantis #1241, all Go code lives under `go/`:
```
go/                 Go module root (dappco.re/go/mlx)
  *.go              Public root API: model, tokenizer, compute, training, eval, distill, GRPO, hf-fit, merge, gguf-quantize, kv-snapshot, lora-fuse
  cmd/mlx/          CLI tool (built with `-o core-mlx`; consumers rename: lthn-mlx)
  cmd/violet/       Unix-socket sidecar daemon
  internal/metal/   All CGO code (mlx-c bindings)
  mlxlm/            CGO-free Python subprocess backend
  pkg/daemon/       Daemon implementation
  pkg/memvid/       Memvid storage CLI
  pkg/memvid/       Deprecated State codec compatibility shim
  tests/            Integration tests
cpp/                C++ side (CLion-side companion)
docs/               Markdown documentation
examples/           Per-feature usage examples (markdown)
external/           Vendored core libraries
lib/mlx/            Upstream mlx submodule (pinned at v0.30.1)
lib/mlx/            Upstream mlx submodule (pinned at v0.31.1)
patches/            Local patches to lib/mlx (not auto-applied)
```

@@ -127,7 +159,7 @@ Architecture is detected from `config.json` (`model_type`) for safetensors and f

## Submodule Patches

`lib/mlx` is pinned at upstream tag `v0.30.1`. Local patches that we do not upstream live in `patches/` as standalone diff files (e.g. `patches/mlx-metallib-path.patch` for the `MLX_METALLIB_PATH` env-var override). Patches are not auto-applied — run them inside the submodule manually when their function is needed:
`lib/mlx` is pinned at upstream tag `v0.31.1`. Local patches that we do not upstream live in `patches/` as standalone diff files (e.g. `patches/mlx-metallib-path.patch` for the `MLX_METALLIB_PATH` env-var override). Patches are not auto-applied — run them inside the submodule manually when their function is needed:

```bash
git -C lib/mlx apply ../../patches/mlx-metallib-path.patch