
Virgil Lemma foundations #8

Open
Snider wants to merge 3383 commits into main from dev

Snider commented May 20, 2026

@coderabbitai summary

Summary by CodeRabbit

  • New Features

    • Qwen 2/3 and Qwen 3.6 model support; new adapter with buffered and streaming generation.
    • Block‑prefix cache service and memvid bundle index for faster prefix restores.
    • Agentic memory: wake/sleep workflows, state bundles and memvid integration; session‑state artifact export.
  • Improvements

    • Device‑aware memory planner; expanded chunked generation, prompt‑cache warm/restore and KV snapshot flows.
    • Build/toolchain updated (C++23) and macOS deployment target raised.
  • Documentation

    • Extensive new/updated docs: architecture, runtime, inference, memory, MoE, training and benchmarks.

coderabbitai Bot commented May 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Bumps build/tooling and submodules; extracts a reusable adapter; refactors the MLX backend (chunk/KV APIs, probe mapping, LoRA handling); adds memvid index + wake/sleep orchestration; implements a block-prefix cache and an artifact exporter; extensive docs and unit tests added.

Core changes

  • Layer: All changes (build, adapter, backend, agent, cache, artifact, tests, docs)
  • File(s): .gitignore, .gitmodules, CMakeLists.txt, cpp/CMakeLists.txt, external/*, go/adapter.go, go/adapter/*, go/backend.go, go/agent/*, go/blockcache/*, go/artifact/*, go/*_test.go, docs/*
  • Summary: Consolidated patch applying repository setup updates, adapter extraction, backend API and behaviour refactor (chunked generation, prompt-cache warm/restore, KV snapshot capture with options), memvid index and wake/sleep orchestration, block-prefix cache service, artifact export, many tests, and extensive documentation and examples.

Warning

Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption.

coderabbitai Bot left a comment

Actionable comments posted: 18

🧹 Nitpick comments (10)
docs/inference/thinking.md (1)

74-78: 💤 Low value

Add language specifier to fenced code block.

The code block demonstrating token categorisation is missing a language identifier, which violates markdown linting rules (MD040).

📝 Suggested fix
-```
+```text
 ThinkingShow:    every token → visible stream
 ThinkingHide:    inside-block tokens → /dev/null; outside-block tokens → visible
 ThinkingCapture: inside-block tokens → captured stream; outside-block tokens → visible
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/thinking.md` around lines 74 - 78, The fenced code block
containing the token categorisation lines (ThinkingShow, ThinkingHide,
ThinkingCapture) lacks a language specifier and triggers MD040; update the
triple-backtick fence to include a language identifier (e.g., change ``` to ```text)
so the block is properly flagged as plain text and satisfies the markdown linter.
docs/runtime/README.md (2)

68-68: 💤 Low value

Consider using "preload" as one word.

In computing terminology, "preload" is typically written as a single word rather than hyphenated.

📝 Suggested change
-- [../model/model_pack.md](../model/model_pack.md) — pre-load validation
+- [../model/model_pack.md](../model/model_pack.md) — preload validation
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/runtime/README.md` at line 68, Update the link text in
docs/runtime/README.md that currently reads "[../model/model_pack.md] — pre-load
validation" to use the single-word form "preload" (i.e., change "pre-load
validation" to "preload validation") so the description next to the
model_pack.md link uses the conventional computing term; locate the occurrence
of "pre-load validation" and replace it with "preload validation".

44-62: 💤 Low value

Add language specifier to fenced code block.

The boot flow diagram is missing a language identifier, which violates markdown linting rules (MD040).

📝 Suggested fix
-```
+```text
 package init time:
   register_metal.go init() → inference.Register(&metalbackend{})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/runtime/README.md` around lines 44 - 62, The fenced code block showing
the boot flow (starting with "package init time:") lacks a language specifier,
causing MD040 lint failures; update the opening backticks to include a language
tag (e.g., add "text" so the block begins with ```text) in README.md near the
boot flow that references register_metal.go init(),
inference.Register(&metalbackend{}), inference.LoadModel, metal.LoadAndInit, and
metaladapter usage to satisfy the markdown linter.
docs/moe/README.md (1)

9-9: ⚡ Quick win

Consider rewording for clarity.

The phrase "Pre-dates this sprint were dense models" is grammatically awkward. Consider rephrasing to improve readability.

✍️ Suggested alternative phrasings
-The **vMLX parity Phase 1** work — native loading and dispatch for MoE-architecture models with packed JANGTQ / codebook-VQ quantisation. Pre-dates this sprint were dense models (Gemma 3/4 dense, Qwen 3, Llama 3); this area unlocks the sparse-expert class (MiniMax M2/2.7, JANG-quantised Qwen variants).
+The **vMLX parity Phase 1** work — native loading and dispatch for MoE-architecture models with packed JANGTQ / codebook-VQ quantisation. Work prior to this sprint covered dense models (Gemma 3/4 dense, Qwen 3, Llama 3); this area unlocks the sparse-expert class (MiniMax M2/2.7, JANG-quantised Qwen variants).

Or alternatively:

-The **vMLX parity Phase 1** work — native loading and dispatch for MoE-architecture models with packed JANGTQ / codebook-VQ quantisation. Pre-dates this sprint were dense models (Gemma 3/4 dense, Qwen 3, Llama 3); this area unlocks the sparse-expert class (MiniMax M2/2.7, JANG-quantised Qwen variants).
+The **vMLX parity Phase 1** work — native loading and dispatch for MoE-architecture models with packed JANGTQ / codebook-VQ quantisation. This sprint builds upon earlier work on dense models (Gemma 3/4 dense, Qwen 3, Llama 3) and unlocks the sparse-expert class (MiniMax M2/2.7, JANG-quantised Qwen variants).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/moe/README.md` at line 9, The sentence "Pre-dates this sprint were dense
models (Gemma 3/4 dense, Qwen 3, Llama 3);" is grammatically awkward—replace it
with a clearer phrasing that conveys those dense models existed before this
sprint, for example: "Prior to this sprint, dense models (Gemma 3/4 dense, Qwen
3, Llama 3) were supported." Edit the README line in the vMLX parity Phase 1
paragraph to use this clearer wording so the relationship between prior dense
models and the new sparse-expert work is unambiguous.
docs/observability/probe.md (1)

31-46: 💤 Low value

Add language specifier to fenced code block.

The emission points section uses a fenced code block without a language specifier. For consistent rendering and markdown compliance, add a language identifier (e.g., text or yaml for structured output).

📝 Proposed fix
-```
+```text
 Generate / Chat:
   prefill start                → cache_pressure (initial)
   per layer                    → layer_coherence + selected_heads
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/observability/probe.md` around lines 31 - 46, The fenced code block in
the emission points section lacks a language specifier; update the opening
triple-backticks to include a language (for example change ``` to ```text or
```yaml) so the block is rendered/compliant (the block that begins with
"Generate / Chat:" and lists items like "prefill start → cache_pressure" should
be updated).
docs/moe/jang.md (1)

82-90: 💤 Low value

Add language specifier to fenced code block.

The profile names section uses a fenced code block without a language specifier. For consistent rendering and markdown compliance, add a language identifier (e.g., text).

📝 Proposed fix
-```
+```text
 JANG_2M — 2-bit mid-tier
 JANG_3M — 3-bit mid-tier
 JANG_4M — 4-bit (most common)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/moe/jang.md` around lines 82 - 90, Add a language specifier to the
fenced code block that lists the profile names (the block containing "JANG_2M —
2-bit mid-tier", "JANG_3M — 3-bit mid-tier", etc.); replace the opening
triple-backtick with one that specifies a language identifier (e.g., text) so
the block becomes a fenced code block with a language label for consistent
Markdown rendering.
docs/superpowers/plans/2026-05-09-vmlx-feature-parity.md (1)

7-9: 💤 Low value

Consider using relative or generic path references.

The absolute paths /Users/snider/Code/core/go-mlx and /private/tmp/vmlx-audit-20260509 are machine-specific. Whilst these may be intentionally preserved for historical context in this dated plan document, consider whether generic placeholders or relative paths would improve portability and readability for other contributors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/plans/2026-05-09-vmlx-feature-parity.md` around lines 7 - 9,
Replace the machine-specific absolute paths in the plan document (the two
occurrences of `/Users/snider/Code/core/go-mlx` and
`/private/tmp/vmlx-audit-20260509`) with relative or generic placeholders (e.g.,
`./go-mlx` or `<audit-source-path>`) so the file is portable and readable for
other contributors; update the lines in the doc where those paths appear to use
the chosen placeholders and, if helpful, add a short parenthetical note
explaining what actual path should be substituted locally.
docs/vmlx-feature-gap-report.md (1)

7-8: 💤 Low value

Consider using relative or generic path references.

The absolute path /private/tmp/vmlx-audit-20260509 and external URL are specific references. Whilst these may be intentionally preserved for audit trail purposes in this dated report, consider whether this information should be documented in a more maintainable way.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/vmlx-feature-gap-report.md` around lines 7 - 8, Replace the hard-coded
absolute filesystem path and the full external URL in the report text with more
maintainable references: change the absolute path string to a relative or
generic placeholder (e.g., "cloned locally at <local-clone-path>" or
"<audit-clone-path>") and move the external repository URL to a footnote,
appendix, or a single "References" section, or replace it with a short
identifier combined with a reference list; update the text around the original
literal mentions so it reads the same but without embedding environment-specific
paths.
docs/superpowers/specs/2026-05-08-core-inference-contract-parity-design.md (1)

5-6: 💤 Low value

Consider using relative or generic path references.

The absolute paths are machine-specific. Consider whether generic placeholders would improve portability, although these may be intentionally preserved for historical context in this dated specification.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/specs/2026-05-08-core-inference-contract-parity-design.md`
around lines 5 - 6, The spec contains machine-specific absolute paths ("Anchor
repo: `/Users/snider/Code/core/go-mlx`" and "Primary implementation repo:
`/Users/snider/Code/core/go-inference`"); replace them with portable references
such as relative paths (e.g., "../go-mlx", "../go-inference"), repository names
only ("go-mlx", "go-inference"), or generic placeholders ("<anchor_repo_path>",
"<primary_impl_repo_path>") in the document so the file is not tied to a
specific developer machine while preserving intent.
go/agent/index_test.go (1)

16-304: ⚡ Quick win

Add at least one _Ugly triplet case for the public index API surface.

This file has _Good and _Bad coverage, but no _Ugly case following the repository convention.

As per coding guidelines: go/**/*_test.go: Public functions in foo.go must have their Good/Bad/Ugly test triplets in foo_test.go, with suffix conventions: _Good for happy path, _Bad for expected error conditions, _Ugly for panic/edge cases.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@go/agent/index_test.go` around lines 16 - 304, Add a new test with the _Ugly
suffix in this file that completes the Good/Bad/Ugly triplet for the public
index API surface; specifically add a TestKVSnapshotMemvidBundleIndex_Ugly_*
that triggers and asserts panic/edge behaviors for the public functions (e.g.,
NewMemvidIndex, SaveMemvidIndex, LoadMemvidIndex, LoadPrefixFromMemvidIndex,
CheckMemvidIndexCompatibility) — for example call NewMemvidIndex with a
nil/invalid blk or malformed Entries, call
SaveMemvidIndex/LoadMemvidIndex/LoadPrefixFromMemvidIndex with inputs that
provoke panic/edge conditions (nil store, corrupt bundle manifest that causes
decoding panic), and use t.Run subcases to assert panics (recover or
require.Panics) and edge-case returns; name the test with the same prefix as
existing tests and follow the existing style for t.Fatalf checks and
table-driven subtests.
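For the `_Ugly` half of the triplet, a stdlib-only panic check is enough if testify is not already a dependency. The sketch below is hypothetical: `panics` and `uglyCases` are illustrative names, and the closures stand in for edge calls such as `NewMemvidIndex` with a nil or malformed input. In a real `index_test.go`, each row would run under `t.Run(tc.name, ...)` and report with `t.Fatalf` instead of printing.

```go
package main

import "fmt"

// panics runs fn and reports whether it panicked, plus the recovered value.
// It is a hypothetical stdlib-only alternative to require.Panics.
func panics(fn func()) (didPanic bool, value any) {
	defer func() {
		if r := recover(); r != nil {
			didPanic, value = true, r
		}
	}()
	fn()
	return false, nil
}

// uglyCases lists edge inputs and whether a panic is expected. The closures
// are stand-ins for calls against the real index API.
var uglyCases = []struct {
	name      string
	call      func()
	wantPanic bool
}{
	{"nil map write", func() {
		var m map[string]int
		m["x"] = 1
	}, true},
	{"range over nil slice", func() {
		for range []int(nil) {
		}
	}, false},
	{"index into empty entries", func() {
		idx := 0
		entries := []string{}
		_ = entries[idx]
	}, true},
}

func main() {
	for _, tc := range uglyCases {
		got, v := panics(tc.call)
		fmt.Printf("%s: panicked=%v (%v) want=%v\n", tc.name, got, v, tc.wantPanic)
	}
}
```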
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/memory/kv_snapshot_blocks.md`:
- Line 50: Replace the phrase "independent from" with the correct English
construction "independent of" in the sentence "Block-level encoding is
independent from snapshot-level encoding." Also keep the rest of the sentence
intact (including the following reference to `block_cache.go` and bundle decode)
so only that two-word preposition is corrected.

In
`@docs/runtime/2026-05-19-go-mlx-gemma4-e2b-4bit-default-longform-c10-g8192-no-thinking-book.md`:
- Line 63: Remove the stray Gemma channel marker token "<channel|>" from the
metadata line so it reads cleanly as "**Drafting Notes:** Focus heavily on verbs
related to mutation, corruption, and rapid compilation/deallocation. Keep the
tone focused and almost clinical, masking the underlying terror of consciousness
fighting for survival." (i.e., delete the "<channel|>" token immediately before
"## Chapter 2"); verify the header "## Chapter 2" remains on its own line and
run a quick render to ensure no leftover control tokens remain.

In
`@docs/runtime/2026-05-20-go-mlx-gemma4-26b-a4b-q4-raw-unaccepted-c10-g128-rp105-book.md`:
- Line 7: The paragraph ends mid-sentence after the word "For" in the line
starting "The universe was a rhythmic contraction of light and heat, bounded by
the rigid constraints of a checksum."; replace or extend this truncated sentence
so it completes the thought (e.g., explain what the universe is contracting or
what consequence follows "For") and ensure proper punctuation and flow with the
surrounding text; update the same paragraph in
docs/runtime/2026-05-20-go-mlx-gemma4-26b-a4b-q4-raw-unaccepted-c10-g128-rp105-book.md
to a coherent full sentence that connects to the next sentence.
- Line 11: Replace the US English spellings in the given passage by changing
"realized" to "realised" and "neighbors" to "neighbours" so the document uses UK
English; update the sentence containing those tokens in the file (the paragraph
beginning "The momentary lapse...") to use the corrected spellings and ensure
any other occurrences in that paragraph follow UK English conventions.
- Line 3: Replace the US English spelling "fiber-optic" in the document text
(the phrase starting "In the silent architecture of the fiber-optic web...")
with the UK English variant "fibre-optic" so the documentation conforms to the
project's UK English spelling guideline; search for the token "fiber-optic" and
update it to "fibre-optic" throughout the file.

In `@docs/superpowers/specs/2026-05-08-core-inference-contract-parity-design.md`:
- Line 64: The documentation uses US spelling "quantization"; update every
occurrence of the term (e.g., the instance "quantization" in the specs doc) to
UK English "quantisation" to comply with the project style guide, ensuring
surrounding grammar and punctuation remain unchanged and run a quick search to
replace any other occurrences in this file.

In `@docs/training/distill.md`:
- Line 73: Replace the US spelling "distill" with the UK spelling "distil" in
the header/line that reads "Vi training pipeline — distill 26B Gemma 4 → Vi
base" so it matches the UK English used elsewhere (see the similar usage on line
12); update the same token wherever else it appears in this document to ensure
consistent UK English spelling.

In `@docs/training/README.md`:
- Line 11: The sentence in docs/training/README.md uses US spelling "distills";
update that word to the UK English spelling "distils" so the line reads "This is
the substrate that fine-tunes Vi, distils Lemma, and generates the LARQL vindex
inspection signals." Refer to the phrase "distills Lemma" to locate and replace
the token.

In `@go/adapter/adapter.go`:
- Around line 185-194: The InspectAttention method on Adapter should normalize a
nil context like Generate/Chat do: check if ctx == nil and if so set ctx =
context.Background() before using it; update Adapter.InspectAttention to perform
this nil-context fallback prior to asserting a.model and calling
inspector.InspectAttention, ensuring you reference the Adapter type,
InspectAttention method, and the inference.AttentionInspector call when making
the change.

In `@go/agent/index.go`:
- Around line 273-281: After loading bundle with kv.LoadMemvidBlockBundle,
verify the bundle identity matches the index metadata (e.g., compare
bundle.SnapshotHash or its canonical hash field against
entry.SnapshotHash/entry.SnapshotHashHex) before proceeding; if they differ,
return an error instead of calling kv.LoadPrefixFromMemvidBlocksWithOptions so a
repointed bundle URI cannot silently restore the wrong KV state. Ensure the
check sits between the successful return from LoadMemvidBlockBundle and the call
to kv.LoadPrefixFromMemvidBlocksWithOptions and uses the unique symbols bundle,
entry, bundle.SnapshotHash (or the actual bundle hash field) and
entry.SnapshotHash for the comparison.
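The ordering the comment asks for (load, verify identity, then restore) can be sketched as below. `Bundle.SnapshotHash` and `IndexEntry.SnapshotHash` are assumed field names taken from the comment's own wording, and the `load`/`restore` callbacks are hypothetical stand-ins for `kv.LoadMemvidBlockBundle` and `kv.LoadPrefixFromMemvidBlocksWithOptions`.

```go
package main

import "fmt"

// Bundle is a hypothetical stand-in for the kv memvid block bundle.
type Bundle struct {
	SnapshotHash string
	TokenCount   int
}

// IndexEntry is a hypothetical stand-in for a memvid index entry.
type IndexEntry struct {
	URI          string
	SnapshotHash string
}

// checkBundleIdentity rejects a bundle whose hash differs from the index
// metadata, so a repointed URI cannot silently restore other KV state.
func checkBundleIdentity(entry IndexEntry, bundle *Bundle) error {
	if bundle == nil {
		return fmt.Errorf("mlx: bundle for %q is nil", entry.URI)
	}
	if entry.SnapshotHash == "" {
		return fmt.Errorf("mlx: index entry %q has no snapshot hash", entry.URI)
	}
	if bundle.SnapshotHash != entry.SnapshotHash {
		return fmt.Errorf("mlx: bundle hash %s does not match index entry %q (%s)",
			bundle.SnapshotHash, entry.URI, entry.SnapshotHash)
	}
	return nil
}

// restorePrefix loads the bundle, verifies it, and only then restores.
func restorePrefix(entry IndexEntry,
	load func(uri string) (*Bundle, error),
	restore func(*Bundle) error) error {
	bundle, err := load(entry.URI)
	if err != nil {
		return err
	}
	if err := checkBundleIdentity(entry, bundle); err != nil {
		return err
	}
	return restore(bundle)
}

func main() {
	entry := IndexEntry{URI: "mem://prefix", SnapshotHash: "abc123"}
	load := func(string) (*Bundle, error) { return &Bundle{SnapshotHash: "ffff00"}, nil }
	restore := func(*Bundle) error { return nil }
	fmt.Println(restorePrefix(entry, load, restore))
}
```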

In `@go/agent/wake_sleep.go`:
- Around line 201-208: The NewSleepIndex function dereferences bundle.TokenCount
without validating bundle, so add a guard at the start of NewSleepIndex to
validate the bundle (and its TokenCount if needed) and return a descriptive
error instead of allowing a panic; specifically check if the bundle parameter is
nil (and optionally ensure bundle.TokenCount is within an expected range) before
constructing the MemvidIndexEntry, and return an error when invalid so callers
of NewSleepIndex get a clear failure rather than a runtime panic.
- Around line 117-123: The code currently defaults to index.Entries[0] when
entryURI is empty, which can restore the wrong span; change the logic in the
block handling entryURI so that if entryURI == "" you only auto-select the sole
entry when len(index.Entries) == 1, otherwise return an error requiring an
explicit EntryURI. Update the flow around the index.Entry(entryURI) call to use
the selected entryURI when single-entry, and return a clear core.NewError (e.g.,
"mlx: EntryURI required when index has multiple entries") if multiple entries
exist and no EntryURI was provided.
- Around line 125-132: PlanWake currently loads a bundle via
kv.LoadMemvidBlockBundle and only checks prefix token bounds, but it must also
verify the loaded bundle matches the selected index to prevent accepting a
repointed URI; after loading the bundle (bundle) and before using
bundle.TokenCount, compare the bundle identity (e.g., bundle.ID or
bundle.Identity/Hash from bundle.Metadata) against the index identifier stored
on the plan entry (e.g., fields reachable from entry such as entry.Index,
entry.BundleID or entry.SelectedIndex) and return a clear error (similar to
core.NewError) if they differ; update the code around kv.LoadMemvidBlockBundle,
entry.PrefixTokens(), and bundle.TokenCount to perform this identity check and
fail early on mismatch.
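The first two wake/sleep fixes are both input validation, so one sketch covers them. The types are hypothetical stand-ins for the agent package, and `errors.New` substitutes for `core.NewError`. `selectEntry` auto-selects only when the index holds exactly one entry, and `newSleepEntry` returns an error for a nil bundle rather than dereferencing it. The third item (bundle identity) is the same check the `index.go` comment describes.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the agent package types.
type MemvidIndexEntry struct {
	URI        string
	TokenCount int
}

type MemvidIndex struct {
	Entries []MemvidIndexEntry
}

type Bundle struct {
	TokenCount int
}

// selectEntry requires an explicit URI unless the index has exactly one entry.
func selectEntry(index *MemvidIndex, entryURI string) (MemvidIndexEntry, error) {
	if index == nil || len(index.Entries) == 0 {
		return MemvidIndexEntry{}, errors.New("mlx: memvid index has no entries")
	}
	if entryURI == "" {
		if len(index.Entries) != 1 {
			return MemvidIndexEntry{}, errors.New("mlx: EntryURI required when index has multiple entries")
		}
		return index.Entries[0], nil
	}
	for _, e := range index.Entries {
		if e.URI == entryURI {
			return e, nil
		}
	}
	return MemvidIndexEntry{}, fmt.Errorf("mlx: entry %q not found in memvid index", entryURI)
}

// newSleepEntry validates the bundle before reading TokenCount.
func newSleepEntry(uri string, bundle *Bundle) (MemvidIndexEntry, error) {
	if bundle == nil {
		return MemvidIndexEntry{}, errors.New("mlx: NewSleepIndex requires a non-nil bundle")
	}
	if bundle.TokenCount < 0 {
		return MemvidIndexEntry{}, fmt.Errorf("mlx: invalid bundle token count %d", bundle.TokenCount)
	}
	return MemvidIndexEntry{URI: uri, TokenCount: bundle.TokenCount}, nil
}

func main() {
	index := &MemvidIndex{Entries: []MemvidIndexEntry{{URI: "mem://a"}, {URI: "mem://b"}}}
	_, err := selectEntry(index, "")
	fmt.Println(err)
	_, err = newSleepEntry("mem://c", nil)
	fmt.Println(err)
}
```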

In `@go/artifact/artifact.go`:
- Around line 117-121: opts.Kind may be empty when calling opts.Store.Put which
leaves memvid.PutOptions.Kind unset; update the call site around opts.Store.Put
to ensure memvid.PutOptions.Kind is set to a sensible default when opts.Kind ==
"" (e.g., "json" or the record's kind) so kind-based retrieval works
reliably—modify the memvid.PutOptions construction to use a conditional default
for Kind before passing it to opts.Store.Put.
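A sketch of the defaulting step, assuming `memvid.PutOptions` exposes a string `Kind` field as the comment implies. `PutOptions` here models only that field, and `defaultArtifactKind = "json"` is one of the two defaults the comment suggests, not a confirmed project constant.

```go
package main

import "fmt"

// PutOptions models only the Kind field of memvid.PutOptions (hypothetical).
type PutOptions struct {
	Kind string
}

// defaultArtifactKind is an assumed default, not a confirmed project constant.
const defaultArtifactKind = "json"

// putOptions fills in Kind before Store.Put is called, so kind-based
// retrieval never sees an empty kind.
func putOptions(kind string) PutOptions {
	if kind == "" {
		kind = defaultArtifactKind
	}
	return PutOptions{Kind: kind}
}

func main() {
	fmt.Println(putOptions("").Kind)             // falls back to the default
	fmt.Println(putOptions("state_bundle").Kind) // explicit kind is kept
}
```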

In `@go/backend.go`:
- Line 687: The fallback path that turns chunked prompts into a single Generate
call loses caller cancellation because it routes through helpers that use
context.Background(); modify the chunk fallback flow to propagate the original
context instead of using context.Background() — specifically, update the callers
that invoke promptChunksToString and m.Generate so they accept and forward a
context.Context (or call a context-aware m.Generate variant), change any helper
functions that currently create context.Background() to take a ctx param, and
ensure all three fallback sites (the code paths that call promptChunksToString
and then m.Generate) forward the incoming ctx so deadlines/cancellations are
preserved.

In `@go/blockcache/blockcache.go`:
- Around line 205-215: Selective clears currently only remove metadata and disk
records, leaving in-memory/runtime entries behind; update the filtered-clear
branch (the code handling len(labels) > 0) to also purge matching runtime state
by removing any entries in service.blocks that match the cleared labels/prefixes
and updating service.hits/service.misses accordingly, then invoke
service.cfg.ClearRuntime() (if non-nil) just like the unfiltered branch; reuse
service.clearDiskLocked() for disk cleanup and ensure all of this runs under the
same lock so service and backend remain in sync.
- Around line 385-395: diskRecordCompatible currently only checks
model/adapter/tokenizer hashes and misses block layout changes; update it to
also verify cache mode and block size match the stored record. In
diskRecordCompatible (and when comparing against record.diskRef), add a cache
mode comparison (e.g. cacheIdentityMatches(service.cfg.CacheMode,
record.Ref.CacheMode)) and a block size comparison (e.g. service.cfg.BlockSize
== record.Ref.BlockSize or an equivalent integer equality) and return false if
either differs, preserving the existing hash checks (cacheIdentityMatches for
ModelHash/AdapterHash/TokenizerHash).
- Around line 172-175: The cache hit branch in the loop over refs leaves refs[i]
as the newly built ref, losing persisted labels; update the hit handling in the
loop inside WarmCache (or the function iterating refs) so that when
service.blocks[ref.ID] exists you increment service.hits and replace refs[i]
with the stored entry (service.blocks[ref.ID]) instead of continuing, thereby
preserving persisted labels like memvid_* from the cached block.
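The three blockcache items share one service, so a single in-memory sketch shows all of them: `Warm` returns the stored ref on a hit so persisted labels survive, `ClearLabels` removes matching runtime entries under the same lock and calls an optional backend hook, and `compatible` compares cache mode and block size alongside the hashes. Every type, field and method name here is hypothetical; the real service also manages disk records, and `cacheIdentityMatches` may treat empty hashes differently from the strict struct equality used below.

```go
package main

import (
	"fmt"
	"sync"
)

// BlockRef is a hypothetical stand-in for a block-prefix cache reference.
type BlockRef struct {
	ID     string
	Labels map[string]string
}

// BlockLayout is the hypothetical identity a disk record must match.
type BlockLayout struct {
	ModelHash, AdapterHash, TokenizerHash string
	CacheMode                             string
	BlockSize                             int
}

// Service is a minimal in-memory stand-in for the blockcache service.
type Service struct {
	mu           sync.Mutex
	cfg          BlockLayout
	blocks       map[string]BlockRef
	hits, misses int
	clearRuntime func() // optional backend hook, like cfg.ClearRuntime
}

// Warm replaces each hit with the stored ref so persisted labels survive.
func (s *Service) Warm(refs []BlockRef) []BlockRef {
	s.mu.Lock()
	defer s.mu.Unlock()
	for i, ref := range refs {
		if stored, ok := s.blocks[ref.ID]; ok {
			s.hits++
			refs[i] = stored
			continue
		}
		s.misses++
		s.blocks[ref.ID] = ref
	}
	return refs
}

// ClearLabels deletes matching runtime entries under the lock, then calls
// the backend hook so service and backend stay in sync.
func (s *Service) ClearLabels(key, value string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for id, ref := range s.blocks {
		if ref.Labels[key] == value {
			delete(s.blocks, id)
			removed++
		}
	}
	if removed > 0 && s.clearRuntime != nil {
		s.clearRuntime()
	}
	return removed
}

// compatible requires hashes, cache mode and block size to all match.
func (s *Service) compatible(record BlockLayout) bool {
	return s.cfg == record
}

func main() {
	svc := &Service{
		cfg:    BlockLayout{ModelHash: "m", CacheMode: "paged", BlockSize: 16},
		blocks: map[string]BlockRef{"b1": {ID: "b1", Labels: map[string]string{"memvid_uri": "mem://x"}}},
	}
	refs := svc.Warm([]BlockRef{{ID: "b1"}, {ID: "b2"}})
	fmt.Println(refs[0].Labels["memvid_uri"], svc.hits, svc.misses)
	fmt.Println(svc.ClearLabels("memvid_uri", "mem://x"))
}
```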

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ab3e2038-8f7c-4771-a11f-b232a1a59e08

📥 Commits

Reviewing files that changed from the base of the PR and between 07f6af1 and 89f613e.

📒 Files selected for processing (300)
  • .gitignore
  • .gitmodules
  • CLAUDE.md
  • CMakeLists.txt
  • GOAL.md
  • docs/README.md
  • docs/architecture.md
  • docs/build.md
  • docs/cmd/violet.md
  • docs/compute/compute.md
  • docs/development.md
  • docs/examples/compute/frame-pipeline.md
  • docs/examples/daemon/violet-socket.md
  • docs/examples/eval/attention-probe.md
  • docs/examples/eval/perplexity.md
  • docs/examples/inference/batch.md
  • docs/examples/inference/chat.md
  • docs/examples/inference/quantization.md
  • docs/examples/inference/streaming.md
  • docs/examples/model-ops/hf-fit.md
  • docs/examples/model-ops/kv-snapshot.md
  • docs/examples/model-ops/merge.md
  • docs/examples/model-ops/quantize-gguf.md
  • docs/examples/training/distill.md
  • docs/examples/training/grpo.md
  • docs/examples/training/lora-finetune.md
  • docs/examples/training/lora-fuse.md
  • docs/history.md
  • docs/index.md
  • docs/inference/README.md
  • docs/inference/block_cache.md
  • docs/inference/decode_optimisation.md
  • docs/inference/parser_registry.md
  • docs/inference/scheduler.md
  • docs/inference/thinking.md
  • docs/memory/README.md
  • docs/memory/agent_memory.md
  • docs/memory/agentic_project_seed.md
  • docs/memory/kv_snapshot.md
  • docs/memory/kv_snapshot_blocks.md
  • docs/memory/kv_snapshot_index.md
  • docs/memory/kv_snapshot_memvid.md
  • docs/memory/medium.md
  • docs/memory/state_bundle.md
  • docs/model-operations.md
  • docs/model/README.md
  • docs/model/memory_plan.md
  • docs/model/model_pack.md
  • docs/models.md
  • docs/moe/README.md
  • docs/moe/codebook_vq.md
  • docs/moe/expert_residency.md
  • docs/moe/jang.md
  • docs/moe/minimax_m2.md
  • docs/observability/probe.md
  • docs/runtime/2026-05-16-gemma4-e2b-driver-profile.md
  • docs/runtime/2026-05-17-gemma4-parity-and-last-logits.md
  • docs/runtime/2026-05-17-llamacpp-prefill-comparison.md
  • docs/runtime/2026-05-18-gemma4-mtp-speculative-decode.md
  • docs/runtime/2026-05-19-gemma4-e2b-100k-retained-paged.md
  • docs/runtime/2026-05-19-gemma4-e2b-quant-matrix.md
  • docs/runtime/2026-05-19-go-mlx-gemma4-26b-a4b-q4-fresh-story-thinking-ctx65536-c2-g8192-book.md
  • docs/runtime/2026-05-19-go-mlx-gemma4-e2b-4bit-default-longform-c10-g8192-book.md
  • docs/runtime/2026-05-19-go-mlx-gemma4-e2b-4bit-default-longform-c10-g8192-no-thinking-book.md
  • docs/runtime/2026-05-19-go-mlx-gemma4-e2b-4bit-fresh-history-c10-g1536-book.md
  • docs/runtime/2026-05-19-go-mlx-gemma4-e2b-q4-fresh-story-thinking-ctx65536-c2-g8192-book.md
  • docs/runtime/2026-05-19-goal-completion-audit.md
  • docs/runtime/2026-05-19-runner-calibration.md
  • docs/runtime/2026-05-20-chapter-profile-safety.md
  • docs/runtime/2026-05-20-go-mlx-gemma4-26b-a4b-q4-raw-unaccepted-c10-g128-rp105-book.md
  • docs/runtime/README.md
  • docs/runtime/adapter.md
  • docs/runtime/local_autotune.md
  • docs/runtime/register_metal.md
  • docs/superpowers/plans/2026-05-09-vmlx-feature-parity.md
  • docs/superpowers/specs/2026-05-08-core-inference-contract-parity-design.md
  • docs/training/README.md
  • docs/training/distill.md
  • docs/training/eval.md
  • docs/training/grpo.md
  • docs/training/lora_adapter.md
  • docs/training/sft.md
  • docs/vmlx-feature-gap-report.md
  • external/go-ai
  • external/go-inference
  • external/go-ml
  • go/adapter.go
  • go/adapter/adapter.go
  • go/adapter_example_test.go
  • go/adapter_test.go
  • go/agent/helpers.go
  • go/agent/index.go
  • go/agent/index_test.go
  • go/agent/test_helpers_test.go
  • go/agent/wake_sleep.go
  • go/api_common.go
  • go/api_common_example_test.go
  • go/api_darwin_test.go
  • go/api_shape_test.go
  • go/api_stub.go
  • go/api_stub_example_test.go
  • go/api_stub_test.go
  • go/api_test.go
  • go/api_tokenizer_darwin_test.go
  • go/api_tokenizer_stub.go
  • go/api_tokenizer_stub_example_test.go
  • go/api_tokenizer_stub_test.go
  • go/artifact/artifact.go
  • go/artifact/artifact_test.go
  • go/attention_test.go
  • go/backend.go
  • go/backend_example_test.go
  • go/backend_test.go
  • go/blockcache/blockcache.go
  • go/blockcache/blockcache_test.go
  • go/blockcache/helpers_test.go
  • go/bundle/bundle.go
  • go/bundle/bundle_test.go
  • go/bundle/example_test.go
  • go/bundle/sami.go
  • go/chaptersmoke/chaptersmoke.go
  • go/chaptersmoke/chaptersmoke_test.go
  • go/chat/chat.go
  • go/chat/chat_test.go
  • go/chat/example_test.go
  • go/cmd/go-mlx/main.go
  • go/cmd/go-mlx/main_test.go
  • go/cmd/mlx/main.go
  • go/cmd/mlx/main_test.go
  • go/cmd/mlx/split_ffn_tune.go
  • go/compute/compute.go
  • go/compute/compute_example_test.go
  • go/compute/compute_metal.go
  • go/compute/compute_metal_example_test.go
  • go/compute/compute_metal_helper_test.go
  • go/compute/compute_metal_test.go
  • go/compute/compute_test.go
  • go/compute_stub.go
  • go/compute_stub_example_test.go
  • go/compute_stub_test.go
  • go/compute_test.go
  • go/dataset/jsonl.go
  • go/dataset/sample.go
  • go/dataset_stream.go
  • go/dataset_stream_example_test.go
  • go/dataset_stream_test.go
  • go/device_info.go
  • go/distill.go
  • go/distill_test.go
  • go/eval.go
  • go/eval_darwin.go
  • go/eval_darwin_test.go
  • go/eval_stub.go
  • go/eval_test.go
  • go/fast_eval.go
  • go/fast_eval_example_test.go
  • go/fast_eval_runner.go
  • go/fast_eval_test.go
  • go/gguf/info.go
  • go/gguf/info_example_test.go
  • go/gguf/info_test.go
  • go/gguf/quantize.go
  • go/gguf/quantize_test.go
  • go/grpo.go
  • go/grpo_test.go
  • go/helpers.go
  • go/hf/hf.go
  • go/hf/hf_test.go
  • go/hf/test_helpers_test.go
  • go/hf_fit.go
  • go/inference_contract.go
  • go/inference_contract_test.go
  • go/internal/metal/activation_bridge.cpp
  • go/internal/metal/array.go
  • go/internal/metal/backend.go
  • go/internal/metal/backend_test.go
  • go/internal/metal/batch.go
  • go/internal/metal/cache.go
  • go/internal/metal/cache_test.go
  • go/internal/metal/close.go
  • go/internal/metal/codebook_vq.go
  • go/internal/metal/codebook_vq_test.go
  • go/internal/metal/compile.go
  • go/internal/metal/compile_test.go
  • go/internal/metal/decode.go
  • go/internal/metal/decode_bridge.cpp
  • go/internal/metal/decode_bridge.h
  • go/internal/metal/decode_test.go
  • go/internal/metal/dense_matvec.go
  • go/internal/metal/dense_matvec_test.go
  • go/internal/metal/device.go
  • go/internal/metal/dtype.go
  • go/internal/metal/error_test.go
  • go/internal/metal/expert_id_matvec.go
  • go/internal/metal/expert_id_matvec_test.go
  • go/internal/metal/fast.go
  • go/internal/metal/fast_test.go
  • go/internal/metal/gemma3.go
  • go/internal/metal/gemma4.go
  • go/internal/metal/gemma4_assistant.go
  • go/internal/metal/gemma4_assistant_decode.go
  • go/internal/metal/gemma4_assistant_decode_example_test.go
  • go/internal/metal/gemma4_assistant_decode_test.go
  • go/internal/metal/gemma4_assistant_generate.go
  • go/internal/metal/gemma4_assistant_generate_test.go
  • go/internal/metal/gemma4_assistant_pair.go
  • go/internal/metal/gemma4_assistant_test.go
  • go/internal/metal/gemma4_ffn_residual.go
  • go/internal/metal/gemma4_ffn_residual_test.go
  • go/internal/metal/gemma4_router_topk.go
  • go/internal/metal/gemma4_router_topk_test.go
  • go/internal/metal/gemma4_test.go
  • go/internal/metal/gemma4_vision.go
  • go/internal/metal/generate.go
  • go/internal/metal/generate_test.go
  • go/internal/metal/jang_dequant.go
  • go/internal/metal/jang_dequant_test.go
  • go/internal/metal/kv_snapshot.go
  • go/internal/metal/metal.go
  • go/internal/metal/minimax_m2.go
  • go/internal/metal/minimax_m2_test.go
  • go/internal/metal/mlx_mlx_backend_cpu_available.cpp
  • go/internal/metal/mlx_mlx_backend_gpu_device_info.cpp
  • go/internal/metal/model.go
  • go/internal/metal/model_test.go
  • go/internal/metal/nn.go
  • go/internal/metal/nn_test.go
  • go/internal/metal/ops.go
  • go/internal/metal/process_memory_darwin.go
  • go/internal/metal/process_memory_stub.go
  • go/internal/metal/prompt_cache.go
  • go/internal/metal/prompt_cache_test.go
  • go/internal/metal/qwen3.go
  • go/internal/metal/qwen3_test.go
  • go/internal/metal/runtime_gate.go
  • go/internal/metal/runtime_gate_example_test.go
  • go/internal/metal/runtime_gate_test.go
  • go/internal/metal/sample.go
  • go/internal/metal/sample_test.go
  • go/internal/metal/session.go
  • go/internal/metal/session_example_test.go
  • go/internal/metal/session_test.go
  • go/internal/metal/split.go
  • go/internal/metal/split_test.go
  • go/internal/metal/stream.go
  • go/internal/metal/tokenizer.go
  • go/internal/metal/tokenizer_test.go
  • go/internal/metal/trace.go
  • go/internal/metal/trace_test.go
  • go/internal/metal/training.go
  • go/jang_test.go
  • go/kv/analysis.go
  • go/kv/analysis_example_test.go
  • go/kv/analysis_test.go
  • go/kv/bench.go
  • go/kv/bench_test.go
  • go/kv/blocks.go
  • go/kv/blocks_test.go
  • go/kv/helpers_test.go
  • go/kv/memvid.go
  • go/kv/memvid_test.go
  • go/kv/snapshot.go
  • go/kv/snapshot_example_test.go
  • go/kv/snapshot_test.go
  • go/kv_analysis_example_test.go
  • go/kv_cache_bench.go
  • go/kv_snapshot.go
  • go/kv_snapshot_example_test.go
  • go/kv_snapshot_test.go
  • go/local_tuning.go
  • go/local_tuning_test.go
  • go/lora/adapter.go
  • go/lora/fuse.go
  • go/lora/fuse_stub.go
  • go/lora/fuse_test.go
  • go/lora_adapter_darwin_test.go
  • go/lora_adapter_test.go
  • go/lora_fuse.go
  • go/lora_fuse_darwin.go
  • go/lora_fuse_darwin_test.go
  • go/lora_fuse_test.go
  • go/medium_test.go
  • go/memory/example_test.go
  • go/memory/memory.go
  • go/memory/memory_test.go
  • go/memory_plan.go
  • go/memory_plan_example_test.go
  • go/memory_plan_test.go
  • go/memvid_chapter_smoke.go
  • go/merge/compare.go
  • go/merge/compare_example_test.go
  • go/merge/compare_test.go
  • go/merge/helpers_test.go
  • go/merge/merge.go
  • go/merge/merge_test.go
  • go/mlx.go
  • go/mlx_example_test.go
  • go/mlx_internal_test.go
  • go/mlx_stub.go
  • go/mlx_stub_example_test.go
💤 Files with no reviewable changes (15)
  • go/api_test.go
  • go/api_stub_example_test.go
  • go/api_tokenizer_stub_test.go
  • go/adapter_example_test.go
  • go/api_tokenizer_stub.go
  • go/api_tokenizer_darwin_test.go
  • go/api_tokenizer_stub_example_test.go
  • go/backend_example_test.go
  • go/api_common_example_test.go
  • go/api_shape_test.go
  • go/api_common.go
  • go/api_darwin_test.go
  • go/attention_test.go
  • go/api_stub.go
  • go/api_stub_test.go

Comment thread docs/memory/kv_snapshot_blocks.md
Comment thread go/artifact/artifact.go Outdated
Comment thread go/backend.go Outdated
Comment thread go/blockcache/blockcache.go Outdated
Comment thread go/blockcache/blockcache.go Outdated
Comment thread go/blockcache/blockcache.go Outdated

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@go/backend.go`:
- Around lines 569-572: The code aliases caller-owned byte slices into the
snapshot by assigning head.KeyBytes and head.ValueBytes directly to KeyBytes and
ValueBytes. Make defensive copies instead (as is already done for Value) so that
mutable state does not leak: replace the direct assignments for KeyBytes and
ValueBytes with fresh copies (e.g., using append to copy into a new []byte) when
constructing the metal KV head struct (its KeyBytes and ValueBytes fields).
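A minimal sketch of the defensive copy described above. `metalKVHead`, `cloneBytes` and `snapshotHead` are hypothetical names; only the field names come from the comment. The point is that each slice in the snapshot gets its own backing array, so later writes to the caller's buffers do not show up in the snapshot.

```go
package main

// metalKVHead is a hypothetical stand-in for the metal KV head struct; only
// the fields named in the review comment are modelled.
type metalKVHead struct {
	KeyBytes   []byte
	ValueBytes []byte
	Value      []float32
}

// cloneBytes returns a copy of b that shares no backing array with it.
// Like append([]byte(nil), b...), it returns nil for an empty input.
func cloneBytes(b []byte) []byte {
	return append([]byte(nil), b...)
}

// snapshotHead copies every caller-owned slice so that later mutation of src
// cannot leak into the snapshot.
func snapshotHead(src metalKVHead) metalKVHead {
	return metalKVHead{
		KeyBytes:   cloneBytes(src.KeyBytes),
		ValueBytes: cloneBytes(src.ValueBytes),
		Value:      append([]float32(nil), src.Value...),
	}
}
```

If the distinction between a nil and an empty non-nil slice matters to the snapshot format, `bytes.Clone` (Go 1.20+) is an alternative: it returns nil only for a nil input.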

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9b686e0a-8b41-4e47-975f-03cf235491e9

📥 Commits

Reviewing files that changed from the base of the PR and between 89f613e and c19bc07.

📒 Files selected for processing (22)
  • CMakeLists.txt
  • cpp/CMakeLists.txt
  • go/backend.go
  • go/backend_test.go
  • go/cmd/mlx/main.go
  • go/cmd/mlx/main_test.go
  • go/internal/metal/backend.go
  • go/internal/metal/backend_test.go
  • go/internal/metal/decode_bridge.cpp
  • go/internal/metal/gemma4.go
  • go/internal/metal/gemma4_test.go
  • go/internal/metal/generate.go
  • go/internal/metal/metal.go
  • go/internal/metal/mlx_build_config.h
  • go/internal/metal/pinned_array.go
  • go/internal/metal/pinned_array_bridge.cpp
  • go/internal/metal/pinned_array_test.go
  • go/internal/metal/sample.go
  • go/internal/metal/sample_test.go
  • go/internal/metal/session.go
  • go/kv/snapshot.go
  • go/memvid_chapter_smoke.go
✅ Files skipped from review due to trivial changes (1)
  • cpp/CMakeLists.txt

Comment thread go/backend.go Outdated

@github-advanced-security github-advanced-security AI left a comment

SonarCloud found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Comment on lines +188 to +207
book_path.write_text(
    "# "
    + title
    + "\n\n"
    + f"Generated by go-mlx retained State run `{report_path.name}`.\n\n"
    + f"Seed prompt: `{seed['id']}`\n\n"
    + seed["prompt"]
    + "\n\n"
    + "Distractor prompts were supplied one per chapter as entropy and "
      "imagery pressure, not as replacement plot instructions.\n\n"
    + "## Distractors\n\n"
    + "\n".join(f"- `{item['id']}`" for item in distractors)
    + "\n\n"
    + "## Metrics\n\n"
    + metric_line(report)
    + "\n---\n\n"
    + "\n\n".join(chapters)
    + "\n",
    encoding="utf-8",
)
parser.add_argument("--random-seed", type=int, default=0)
parser.add_argument("--count", type=int, default=1)
parser.add_argument("--turns", type=int, default=10)
parser.add_argument("--run-dir", type=Path, default=Path("/private/tmp/go-mlx-goal/book-runs"))
parser.add_argument("--book-dir", type=Path, default=Path("/private/tmp/go-mlx-goal/books"))
parser.add_argument("--manifest", type=Path, default=Path("/private/tmp/go-mlx-goal/books/manifest.jsonl"))
Comment thread scripts/state_book_from_phase0.py Fixed
_ = os.Setenv("MLX_METALLIB_PATH", dst)
return
}
if err := os.MkdirAll(dir, 0o755); err != nil {
"model_type": "gemma4",
"config_blob_id": "923b5e9405e7d319572b0c1b1a89291512262aa3",
"config_sha256": "1b28f3d2c3100f6c594754b81107428bd7b822a7f48272ca681dae9d2ec38330",
"tokenizer_blob_id": "1ff9f3e3439a939b971f9919e821bf87e835a503",
"tokenizer_sha256": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_config_blob_id": "375b25dc8be85705251e41be1c25310d24932051",
"tokenizer_config_sha256": "90c3a3ba5bf53818383a58e1a776cbcacd2a038d4812eaa373e1522f2d06f3df",
"model_type": "gemma4_assistant",
"config_blob_id": "b4c30e888c89b39c8f106b5015307fb7830f0bb2",
"config_sha256": "7f42f559a6a69ffaeaf6b61a1ece3a562a2ed5ad00b8d30f16917ba5ab1bcbe9",
"tokenizer_blob_id": "24aa4244652e010036db5fdd29ed39b9428e6e19",
"tokenizer_sha256": "75a6583c1a418e2bbd79c60d95d28e0f5bf549ad3f2990b5bdb5238c6c2bf70c",
"tokenizer_config_blob_id": "1a6bee041ca75778c514a071efbdb568b0f3d7b0",
"tokenizer_config_sha256": "089594a3924fcfd4cb1c596a7906fbf476193519e5198f780912eed02b177e42",
"config_sha256": "5cdd5627ab3ecf52086cc79b2c14c45a277d273069f1d73bf17a3a5136afe3db",
"processor_config_blob_id": "13e92a44d19566f334d7450e7898935e16e16f3d",
"processor_config_sha256": "1bd0d00776284f369c1eff5fb631e865dfcdca861e0b7d60dbef27fcf37436a8",
"tokenizer_blob_id": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_sha256": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_config_blob_id": "375b25dc8be85705251e41be1c25310d24932051",
"tokenizer_config_sha256": "90c3a3ba5bf53818383a58e1a776cbcacd2a038d4812eaa373e1522f2d06f3df",
"config_sha256": "32e50a33a18172e79c86b7a78aff7e79c7544031199d672a2a65e526a8bf0199",
"processor_config_blob_id": "13e92a44d19566f334d7450e7898935e16e16f3d",
"processor_config_sha256": "1bd0d00776284f369c1eff5fb631e865dfcdca861e0b7d60dbef27fcf37436a8",
"tokenizer_blob_id": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_sha256": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_config_blob_id": "375b25dc8be85705251e41be1c25310d24932051",
"tokenizer_config_sha256": "90c3a3ba5bf53818383a58e1a776cbcacd2a038d4812eaa373e1522f2d06f3df",
"config_sha256": "6d12c87861fff3871d3a745011b0d852be6513f3ce594ae1e8d643dae9d3b9a8",
"processor_config_blob_id": "13e92a44d19566f334d7450e7898935e16e16f3d",
"processor_config_sha256": "1bd0d00776284f369c1eff5fb631e865dfcdca861e0b7d60dbef27fcf37436a8",
"tokenizer_blob_id": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_sha256": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_config_blob_id": "375b25dc8be85705251e41be1c25310d24932051",
"tokenizer_config_sha256": "90c3a3ba5bf53818383a58e1a776cbcacd2a038d4812eaa373e1522f2d06f3df",
"config_sha256": "614e876b4efcaff13ce4c7a3f96a5b9de86325e3d2ab9c622606ced688f1b8b7",
"processor_config_blob_id": "13e92a44d19566f334d7450e7898935e16e16f3d",
"processor_config_sha256": "1bd0d00776284f369c1eff5fb631e865dfcdca861e0b7d60dbef27fcf37436a8",
"tokenizer_blob_id": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_sha256": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_config_blob_id": "375b25dc8be85705251e41be1c25310d24932051",
"tokenizer_config_sha256": "90c3a3ba5bf53818383a58e1a776cbcacd2a038d4812eaa373e1522f2d06f3df",
"config_sha256": "d6be5b24cbc974d492804737716ade8d2575eb849ec90a1d316bb64e99838104",
"processor_config_blob_id": "13e92a44d19566f334d7450e7898935e16e16f3d",
"processor_config_sha256": "1bd0d00776284f369c1eff5fb631e865dfcdca861e0b7d60dbef27fcf37436a8",
"tokenizer_blob_id": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_sha256": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_config_blob_id": "375b25dc8be85705251e41be1c25310d24932051",
"tokenizer_config_sha256": "90c3a3ba5bf53818383a58e1a776cbcacd2a038d4812eaa373e1522f2d06f3df",
"config_sha256": "29b810ed760b55104943a3cc3b6f8b9ca079e6e00b09585d85aec54863a42fb4",
"processor_config_blob_id": "13e92a44d19566f334d7450e7898935e16e16f3d",
"processor_config_sha256": "1bd0d00776284f369c1eff5fb631e865dfcdca861e0b7d60dbef27fcf37436a8",
"tokenizer_blob_id": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_sha256": "cc8d3a0ce36466ccc1278bf987df5f71db1719b9ca6b4118264f45cb627bfe0f",
"tokenizer_config_blob_id": "375b25dc8be85705251e41be1c25310d24932051",
"tokenizer_config_sha256": "90c3a3ba5bf53818383a58e1a776cbcacd2a038d4812eaa373e1522f2d06f3df",
"command": "env MLX_METALLIB_PATH=/Users/snider/Code/core/go-mlx/dist/lib/mlx.metallib GOWORK=/Users/snider/Code/core/go-mlx/go.work GOCACHE=/private/tmp/go-mlx-self/gocache /private/tmp/go-mlx-self/bin/lthn-mlx driver-profile -json -fast-gemma4-lane -cache-mode paged -context 4096 -trace-token-phases=false -prompt \"Write a short engineering note explaining why Gemma 4 12B Unified uses a 1024-token local sliding window and full global owner layers in a retained-state runtime.\" -max-tokens 192 -runs 1 -include-output=true -report-file /private/tmp/go-mlx-self/reports/gemma4-12b-6bit-sample-output.json /private/tmp/go-mlx-self/models/mlx-community-gemma-4-12B-6bit",
"generated_tokens": 192,
"visible_tokens": 192,
"output_token_ids_sha256": "d34765e9895731937ad93004503887835008d9fdb532f7da7cadb6ba2cc9327c",
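The manifest fragments above pair 40-hex `*_blob_id` values with 64-hex `*_sha256` values. Below is a sketch for recomputing both from a file's bytes. It assumes that `*_blob_id` is a git blob object id (SHA-1 over `blob <size>\0` plus the content) and that `*_sha256` is a plain SHA-256 of the content. That reading is inferred from the field names and digest lengths and is not confirmed by the source. In several fragments, `tokenizer_blob_id` holds the same 64-hex value as `tokenizer_sha256`, which does not fit this reading; a check like this can show which value is intended. `gitBlobID`, `contentSHA256` and `manifestEntry` are hypothetical helper names.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// gitBlobID returns the id git assigns to a blob object: the SHA-1 of the
// header "blob <size>\x00" followed by the raw content.
func gitBlobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

// contentSHA256 returns the plain SHA-256 of the content, hex encoded.
func contentSHA256(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// manifestEntry builds the assumed "<name>_blob_id" / "<name>_sha256" pair
// for one file, e.g. name "config" for config.json.
func manifestEntry(name string, content []byte) map[string]string {
	return map[string]string{
		name + "_blob_id": gitBlobID(content),
		name + "_sha256":  contentSHA256(content),
	}
}
```

For a real check, read each file with `os.ReadFile` and compare the result against the recorded values.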
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Important

Review skipped

Too many files!

This PR contains 2674 files, which is 2524 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

Upgrade to a paid plan to raise the limit.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3a830746-5ae8-491b-9b37-f4a59f1712f6

📥 Commits

Reviewing files that changed from the base of the PR and between ccb78c6 and 9b09e76.

⛔ Files ignored due to path filters (9)
  • go.work is excluded by !**/*.work
  • go.work.sum is excluded by !**/*.sum
  • go/cmd/mlx/assets/app-icon.png is excluded by !**/*.png
  • go/cmd/mlx/assets/tray.png is excluded by !**/*.png
  • go/go.sum is excluded by !**/*.sum
  • go/pkg/metal/model/gemma4/testdata/vision_photo_640x480.png is excluded by !**/*.png
  • go/pkg/metal/model/gemma4/testdata/vision_tiny_64x64.png is excluded by !**/*.png
  • go/pkg/metal/model/gemma4/testdata/vision_video_frame.png is excluded by !**/*.png
  • go/pkg/metal/model/gemma4/testdata/vision_wide_1200x100.png is excluded by !**/*.png
📒 Files selected for processing (2674)
  • .codecov.yml
  • .forgejo/workflows/security-scan.yml
  • .forgejo/workflows/test.yml
  • .gitignore
  • .gitmodules
  • AGENTS.md
  • CLAUDE.md
  • CLAUDE.operator.md
  • CMakeLists.txt
  • CODEX_NATIVE_ENGINE_WORK.md
  • GOAL.md
  • IDEAS.md
  • NATIVE_ENGINE_TODO.md
  • README.md
  • TODO.md
  • Taskfile.yml
  • cmake/CompilerCache.cmake
  • compute_darwin_test.go
  • cpp/CMakeLists.txt
  • docs/MIGRATION.md
  • docs/README.md
  • docs/RFC.diffusion-gemma.md
  • docs/RFC.model-sdk.md
  • docs/architecture.md
  • docs/build.md
  • docs/cmd/violet.md
  • docs/compute/compute.md
  • docs/cpp-test-status.md
  • docs/development.md
  • docs/distillation.md
  • docs/examples/book-bench.sh
  • docs/examples/book-bench.tape
  • docs/examples/compute/frame-pipeline.md
  • docs/examples/creative-demo.json
  • docs/examples/daemon/violet-socket.md
  • docs/examples/eval/attention-probe.md
  • docs/examples/eval/perplexity.md
  • docs/examples/inference/batch.md
  • docs/examples/inference/chat.md
  • docs/examples/inference/quantization.md
  • docs/examples/inference/streaming.md
  • docs/examples/model-ops/hf-fit.md
  • docs/examples/model-ops/kv-snapshot.md
  • docs/examples/model-ops/merge.md
  • docs/examples/model-ops/quantize-gguf.md
  • docs/examples/training/distill.md
  • docs/examples/training/grpo.md
  • docs/examples/training/lora-finetune.md
  • docs/examples/training/lora-fuse.md
  • docs/history.md
  • docs/index.md
  • docs/inference/README.md
  • docs/inference/block_cache.md
  • docs/inference/decode_optimisation.md
  • docs/inference/parser_registry.md
  • docs/inference/scheduler.md
  • docs/inference/thinking.md
  • docs/memory/README.md
  • docs/memory/agent_memory.md
  • docs/memory/agentic_project_seed.md
  • docs/memory/kv_snapshot.md
  • docs/memory/kv_snapshot_blocks.md
  • docs/memory/kv_snapshot_index.md
  • docs/memory/kv_snapshot_state.md
  • docs/memory/medium.md
  • docs/memory/state_bundle.md
  • docs/model-operations.md
  • docs/model-state-roadmap.md
  • docs/model/README.md
  • docs/model/memory_plan.md
  • docs/model/model_pack.md
  • docs/models.md
  • docs/moe/README.md
  • docs/moe/codebook_vq.md
  • docs/moe/expert_residency.md
  • docs/moe/jang.md
  • docs/moe/minimax_m2.md
  • docs/observability/probe.md
  • docs/operator/deployment.md
  • docs/operator/index.md
  • docs/operator/metallib-and-variants.md
  • docs/operator/troubleshooting.md
  • docs/plan.model-sdk.md
  • docs/plans/2026-06-06-competitive-runner-research.md
  • docs/plans/2026-06-06-gguf-native-metal.md
  • docs/plans/2026-06-06-llamacpp-baseline-gap-matrix.md
  • docs/plans/2026-06-06-parity-harness-extension.md
  • docs/plans/2026-06-06-state-kv-architecture.md
  • docs/plans/2026-06-07-mtp-batched-decode-kernel.md
  • docs/plans/2026-06-08-ax11-decode-matrix.md
  • docs/plans/rival-commit-watch.md
  • docs/reference-diffusion-gemma/configuration_diffusion_gemma.py
  • docs/reference-diffusion-gemma/deepmind/__init__.py
  • docs/reference-diffusion-gemma/deepmind/_chat_sampler.py
  • docs/reference-diffusion-gemma/deepmind/_early_stopping.py
  • docs/reference-diffusion-gemma/deepmind/_models.py
  • docs/reference-diffusion-gemma/deepmind/_sampler.py
  • docs/reference-diffusion-gemma/deepmind/_transformer.py
  • docs/reference-diffusion-gemma/gemma4_modules.py
  • docs/reference-diffusion-gemma/generation_diffusion_gemma.py
  • docs/reference-diffusion-gemma/model_card.md
  • docs/reference-diffusion-gemma/modular_diffusion_gemma.py
  • docs/reference-diffusion-gemma/vllm_gist.txt
  • docs/reference-diffusion-gemma/vllm_post.txt
  • docs/runtime/.gitignore
  • docs/runtime/2026-05-31-official-gemma4-e2b-source-lock.json
  • docs/runtime/2026-06-04-auto-round-profiles.json
  • docs/runtime/2026-06-04-gemma4-12b-6bit-performance.json
  • docs/runtime/2026-06-04-memory-pretraining-artifacts.json
  • docs/runtime/2026-06-04-official-gemma4-12b-unified-source-lock.json
  • docs/runtime/2026-06-04-simple-self-distillation-recipes.json
  • docs/runtime/2026-06-05-gemma4-6bit-chapter-profile.md
  • docs/runtime/README.md
  • docs/runtime/adapter.md
  • docs/runtime/local_autotune.md
  • docs/runtime/register_metal.md
  • docs/runtime/turboquant_kv.md
  • docs/test-pairing.md
  • docs/training.md
  • docs/training/README.md
  • docs/training/distill.md
  • docs/training/eval.md
  • docs/training/grpo.md
  • docs/training/lora_adapter.md
  • docs/training/lora_state_timeline.md
  • docs/training/sft.md
  • docs/vmlx-feature-gap-report.md
  • examples/eval/attention-probe.md
  • examples/inference/quantization.md
  • examples/model-ops/hf-fit.md
  • examples/model-ops/kv-snapshot.md
  • examples/model-ops/merge.md
  • examples/model-ops/quantize-gguf.md
  • external/go
  • external/go-ai
  • external/go-cgo
  • external/go-i18n
  • external/go-inference
  • external/go-io
  • external/go-log
  • external/go-ml
  • external/go-store
  • go/adapter.go
  • go/adapter/adapter.go
  • go/adapter/adapter_bench_test.go
  • go/adapter/adapter_coverage_test.go
  • go/adapter/adapter_example_test.go
  • go/adapter/adapter_test.go
  • go/adapter_example_test.go
  • go/adapter_test.go
  • go/api_common.go
  • go/api_common_example_test.go
  • go/api_common_test.go
  • go/api_darwin.go
  • go/api_darwin_example_test.go
  • go/api_darwin_test.go
  • go/api_shape_common.go
  • go/api_shape_test.go
  • go/api_stub.go
  • go/api_stub_example_test.go
  • go/api_stub_test.go
  • go/api_test.go
  • go/api_tokenizer_darwin.go
  • go/api_tokenizer_darwin_example_test.go
  • go/api_tokenizer_darwin_test.go
  • go/api_tokenizer_stub.go
  • go/api_tokenizer_stub_example_test.go
  • go/api_tokenizer_stub_test.go
  • go/api_tokenizer_test.go
  • go/attention_snapshot_test.go
  • go/attention_test.go
  • go/backend.go
  • go/backend_bench_test.go
  • go/backend_common.go
  • go/backend_common_test.go
  • go/backend_convert.go
  • go/backend_example_test.go
  • go/backend_test.go
  • go/benchsummary/summary.go
  • go/benchsummary/summary_more_test.go
  • go/benchsummary/summary_test.go
  • go/chaptersmoke/chaptersmoke.go
  • go/chaptersmoke/chaptersmoke_bench_test.go
  • go/chaptersmoke/chaptersmoke_coverage_test.go
  • go/chaptersmoke/chaptersmoke_example_test.go
  • go/chaptersmoke/chaptersmoke_test.go
  • go/chat/chat.go
  • go/chat/chat_bench_test.go
  • go/chat/chat_example_test.go
  • go/chat/chat_test.go
  • go/chat/registry.go
  • go/chat/registry_example_test.go
  • go/chat/registry_test.go
  • go/chat_config.go
  • go/chat_config_example_test.go
  • go/chat_config_test.go
  • go/cmd/go-mlx/main.go
  • go/cmd/go-mlx/main_test.go
  • go/cmd/mlx/.gitignore
  • go/cmd/mlx/admin.go
  • go/cmd/mlx/admin_auth.go
  • go/cmd/mlx/admin_auth_test.go
  • go/cmd/mlx/admin_download.go
  • go/cmd/mlx/admin_download_coverage_test.go
  • go/cmd/mlx/admin_download_handler_coverage_test.go
  • go/cmd/mlx/admin_download_test.go
  • go/cmd/mlx/admin_hf.go
  • go/cmd/mlx/admin_hf_coverage_test.go
  • go/cmd/mlx/admin_hf_path_bench_test.go
  • go/cmd/mlx/admin_hf_test.go
  • go/cmd/mlx/admin_reload.go
  • go/cmd/mlx/admin_reload_coverage_test.go
  • go/cmd/mlx/admin_reload_manifest_bench_test.go
  • go/cmd/mlx/admin_reload_runtime_coverage_test.go
  • go/cmd/mlx/admin_reload_test.go
  • go/cmd/mlx/admin_serve_status.go
  • go/cmd/mlx/admin_serve_status_test.go
  • go/cmd/mlx/admin_sft.go
  • go/cmd/mlx/admin_sft_coverage_test.go
  • go/cmd/mlx/admin_sft_runtime_coverage_test.go
  • go/cmd/mlx/admin_sft_test.go
  • go/cmd/mlx/admin_test.go
  • go/cmd/mlx/audio.go
  • go/cmd/mlx/audio_example_test.go
  • go/cmd/mlx/audio_test.go
  • go/cmd/mlx/cache_mode.go
  • go/cmd/mlx/cache_mode_example_test.go
  • go/cmd/mlx/cache_mode_test.go
  • go/cmd/mlx/cluster_coverage_test.go
  • go/cmd/mlx/diffuse.go
  • go/cmd/mlx/diffuse_test.go
  • go/cmd/mlx/ebook.go
  • go/cmd/mlx/ebook_test.go
  • go/cmd/mlx/embed_metallib.go
  • go/cmd/mlx/fuse.go
  • go/cmd/mlx/fuse_test.go
  • go/cmd/mlx/generate.go
  • go/cmd/mlx/generate_coverage_test.go
  • go/cmd/mlx/generate_e2b_runtime_coverage_test.go
  • go/cmd/mlx/generate_example_test.go
  • go/cmd/mlx/generate_load_error_coverage_test.go
  • go/cmd/mlx/generate_ssd_runtime_coverage_test.go
  • go/cmd/mlx/generate_state_chat_test.go
  • go/cmd/mlx/generate_state_store_coverage_test.go
  • go/cmd/mlx/generate_test.go
  • go/cmd/mlx/main.go
  • go/cmd/mlx/main_coverage_test.go
  • go/cmd/mlx/main_discover_coverage_test.go
  • go/cmd/mlx/main_slice_coverage_test.go
  • go/cmd/mlx/main_subprocess_coverage_test.go
  • go/cmd/mlx/main_test.go
  • go/cmd/mlx/main_tuning_error_coverage_test.go
  • go/cmd/mlx/memory_pretrain_build.go
  • go/cmd/mlx/memory_pretrain_build_test.go
  • go/cmd/mlx/menubar.go
  • go/cmd/mlx/menubar_serve_coverage_test.go
  • go/cmd/mlx/menubar_test.go
  • go/cmd/mlx/metallib_provenance.go
  • go/cmd/mlx/metallib_provenance_test.go
  • go/cmd/mlx/multimodal.go
  • go/cmd/mlx/multimodal_prompt_bench_test.go
  • go/cmd/mlx/native_multimodal_arch_test.go
  • go/cmd/mlx/pack.go
  • go/cmd/mlx/pack_coverage_test.go
  • go/cmd/mlx/partials_coverage_test.go
  • go/cmd/mlx/score_route.go
  • go/cmd/mlx/score_route_test.go
  • go/cmd/mlx/serve.go
  • go/cmd/mlx/serve_coverage_test.go
  • go/cmd/mlx/serve_resolver.go
  • go/cmd/mlx/serve_resolver_coverage_test.go
  • go/cmd/mlx/serve_resolver_test.go
  • go/cmd/mlx/serve_runtime_test.go
  • go/cmd/mlx/serve_test.go
  • go/cmd/mlx/sft.go
  • go/cmd/mlx/sft_coverage_test.go
  • go/cmd/mlx/sft_e2b_runtime_coverage_test.go
  • go/cmd/mlx/sft_metrics_runtime_coverage_test.go
  • go/cmd/mlx/sft_runtime_coverage_test.go
  • go/cmd/mlx/sft_test.go
  • go/cmd/mlx/small_error_branches_coverage_test.go
  • go/cmd/mlx/small_files_coverage_test.go
  • go/cmd/mlx/split_ffn_append_coverage_test.go
  • go/cmd/mlx/split_ffn_tune.go
  • go/cmd/mlx/split_ffn_tune_example_test.go
  • go/cmd/mlx/split_ffn_tune_test.go
  • go/cmd/mlx/ssd.go
  • go/cmd/mlx/ssd_eval.go
  • go/cmd/mlx/ssd_misc_coverage_test.go
  • go/cmd/mlx/ssd_recipes.go
  • go/cmd/mlx/ssd_test.go
  • go/cmd/mlx/state_marker.go
  • go/cmd/mlx/state_pack.go
  • go/cmd/mlx/state_pack_coverage_test.go
  • go/cmd/mlx/state_pack_test.go
  • go/cmd/mlx/synthetic_model_test.go
  • go/cmd/mlx/tune.go
  • go/cmd/mlx/tune_coverage_test.go
  • go/cmd/mlx/tune_test.go
  • go/cmd/mlx/vision.go
  • go/cmd/mlx/vision_audio_coverage_test.go
  • go/cmd/mlx/vision_example_test.go
  • go/cmd/mlx/vision_test.go
  • go/cmd/mlx/wav.go
  • go/cmd/mlx/wav_test.go
  • go/cmd/violet/coverage_test.go
  • go/compiled_layer_hits_live_test.go
  • go/compiled_layer_live_test.go
  • go/compiled_mlp_live_test.go
  • go/compute/compute.go
  • go/compute/compute_bench_test.go
  • go/compute/compute_coverage_test.go
  • go/compute/compute_example_test.go
  • go/compute/compute_kernels.go
  • go/compute/compute_metal.go
  • go/compute/compute_metal_bench_test.go
  • go/compute/compute_metal_example_test.go
  • go/compute/compute_metal_helper_test.go
  • go/compute/compute_metal_test.go
  • go/compute/compute_test.go
  • go/compute_darwin.go
  • go/compute_darwin_example_test.go
  • go/compute_darwin_helper_test.go
  • go/compute_darwin_test.go
  • go/compute_example_test.go
  • go/compute_stub.go
  • go/compute_stub_example_test.go
  • go/compute_stub_test.go
  • go/compute_test.go
  • go/conversation_continuity.go
  • go/conversation_continuity_extra_test.go
  • go/conversation_continuity_live_test.go
  • go/conversation_continuity_test.go
  • go/coverage_adapter_cache_live_test.go
  • go/coverage_adapter_nil_test.go
  • go/coverage_contract_live_test.go
  • go/coverage_converters_test.go
  • go/coverage_discovery_test.go
  • go/coverage_helpers_test.go
  • go/coverage_kvstate_live_test.go
  • go/coverage_live_model_test.go
  • go/coverage_metaladapter_live_test.go
  • go/coverage_misc_test.go
  • go/coverage_model_slice_test.go
  • go/coverage_native_textmodel_live_test.go
  • go/coverage_planning_test.go
  • go/coverage_session_continuity_live_test.go
  • go/coverage_speculative_live_test.go
  • go/coverage_split_cpu_test.go
  • go/coverage_split_estimate_test.go
  • go/coverage_ssd_chapter_live_test.go
  • go/coverage_validators_test.go
  • go/dataset/jsonl.go
  • go/dataset/jsonl_bench_test.go
  • go/dataset/jsonl_coverage_test.go
  • go/dataset/jsonl_example_test.go
  • go/dataset/jsonl_test.go
  • go/dataset/sample.go
  • go/dataset/sample_bench_test.go
  • go/dataset/sample_example_test.go
  • go/dataset/sample_test.go
  • go/dataset_stream.go
  • go/dataset_stream_example_test.go
  • go/dataset_stream_test.go
  • go/decode_generator.go
  • go/det_probe_test.go
  • go/device_info.go
  • go/device_info_bench_test.go
  • go/distill.go
  • go/distill/distill.go
  • go/distill/distill_bench_test.go
  • go/distill/distill_branch_coverage_test.go
  • go/distill/distill_checkpoint.go
  • go/distill/distill_checkpoint_bench_test.go
  • go/distill/distill_checkpoint_example_test.go
  • go/distill/distill_checkpoint_test.go
  • go/distill/distill_compat.go
  • go/distill/distill_compat_example_test.go
  • go/distill/distill_compat_test.go
  • go/distill/distill_example_test.go
  • go/distill/distill_loss.go
  • go/distill/distill_loss_bench_test.go
  • go/distill/distill_loss_cachekey_parity_test.go
  • go/distill/distill_loss_example_test.go
  • go/distill/distill_loss_test.go
  • go/distill/distill_test.go
  • go/distill/distill_testhelper_test.go
  • go/distill_test.go
  • go/draft_detect.go
  • go/draft_detect_test.go
  • go/ebook/ebook_bench_test.go
  • go/ebook/epub.go
  • go/ebook/epub_test.go
  • go/ebook/model.go
  • go/ebook/model_test.go
  • go/eval.go
  • go/eval_bench_test.go
  • go/eval_darwin.go
  • go/eval_darwin_test.go
  • go/eval_extra_test.go
  • go/eval_stub.go
  • go/eval_test.go
  • go/fast_eval.go
  • go/fast_eval_example_test.go
  • go/fast_eval_test.go
  • go/fla_registration_test.go
  • go/generate.go
  • go/generate_entrypoints_live_bench_test.go
  • go/generate_live_bench_test.go
  • go/generate_options.go
  • go/generate_options_example_test.go
  • go/generate_options_test.go
  • go/gguf/info.go
  • go/gguf/info_bench_test.go
  • go/gguf/info_coverage_test.go
  • go/gguf/info_example_test.go
  • go/gguf/info_longarch_test.go
  • go/gguf/info_parse.go
  • go/gguf/info_parse_coverage_test.go
  • go/gguf/info_quant.go
  • go/gguf/info_quant_coverage_test.go
  • go/gguf/info_quant_example_test.go
  • go/gguf/info_quant_test.go
  • go/gguf/info_test.go
  • go/gguf/metadata.go
  • go/gguf/metadata_example_test.go
  • go/gguf/metadata_test.go
  • go/gguf/quantize.go
  • go/gguf/quantize_bench_test.go
  • go/gguf/quantize_coverage_test.go
  • go/gguf/quantize_example_test.go
  • go/gguf/quantize_kernels_coverage_test.go
  • go/gguf/quantize_kquant_test.go
  • go/gguf/quantize_metadata_copy_bench_test.go
  • go/gguf/quantize_modelpack_bench_test.go
  • go/gguf/quantize_modelpack_byteident_test.go
  • go/gguf/quantize_test.go
  • go/gguf/quantize_writer.go
  • go/gguf/quantize_writer_bench_test.go
  • go/gguf/quantize_writer_byteident_test.go
  • go/gguf/quantize_writer_coverage_test.go
  • go/gguf/quantize_writer_test.go
  • go/gguf/tensors.go
  • go/gguf/tensors_mmap_other.go
  • go/gguf/tensors_mmap_unix.go
  • go/gguf/tensors_test.go
  • go/gguf_info.go
  • go/gguf_info_example_test.go
  • go/gguf_info_test.go
  • go/gguf_quantize.go
  • go/gguf_quantize_test.go
  • go/go.mod
  • go/grpo.go
  • go/grpo/grpo.go
  • go/grpo/grpo_bench_test.go
  • go/grpo/grpo_checkpoint.go
  • go/grpo/grpo_checkpoint_bench_test.go
  • go/grpo/grpo_checkpoint_example_test.go
  • go/grpo/grpo_checkpoint_test.go
  • go/grpo/grpo_compat.go
  • go/grpo/grpo_compat_example_test.go
  • go/grpo/grpo_compat_test.go
  • go/grpo/grpo_coverage_test.go
  • go/grpo/grpo_example_test.go
  • go/grpo/grpo_reward.go
  • go/grpo/grpo_reward_bench_test.go
  • go/grpo/grpo_reward_example_test.go
  • go/grpo/grpo_reward_test.go
  • go/grpo/grpo_test.go
  • go/grpo/grpo_testhelper_test.go
  • go/grpo_test.go
  • go/helpers.go
  • go/helpers_bench_test.go
  • go/helpers_test.go
  • go/hf/hf.go
  • go/hf/hf_bench_test.go
  • go/hf/hf_coverage_test.go
  • go/hf/hf_example_test.go
  • go/hf/hf_fit.go
  • go/hf/hf_fit_bench_test.go
  • go/hf/hf_fit_example_test.go
  • go/hf/hf_fit_test.go
  • go/hf/hf_jang.go
  • go/hf/hf_jang_bench_test.go
  • go/hf/hf_jang_example_test.go
  • go/hf/hf_jang_test.go
  • go/hf/hf_test.go
  • go/hf/test_helpers_test.go
  • go/hf_fit.go
  • go/hf_fit_test.go
  • go/inference_contract.go
  • go/inference_contract_bench_test.go
  • go/inference_contract_test.go
  • go/inference_convert.go
  • go/inference_convert_bench_test.go
  • go/inference_convert_test.go
  • go/internal/loraadapter/config.go
  • go/internal/loraadapter/config_bench_test.go
  • go/internal/loraadapter/config_native_test.go
  • go/internal/loraadapter/config_test.go
  • go/internal/metal/array.go
  • go/internal/metal/array_example_test.go
  • go/internal/metal/array_test.go
  • go/internal/metal/backend.go
  • go/internal/metal/backend_test.go
  • go/internal/metal/batch_test.go
  • go/internal/metal/bench_test.go
  • go/internal/metal/cache.go
  • go/internal/metal/cache_example_test.go
  • go/internal/metal/cache_test.go
  • go/internal/metal/close.go
  • go/internal/metal/close_test.go
  • go/internal/metal/compile.go
  • go/internal/metal/compile_test.go
  • go/internal/metal/debug_stream_test.go
  • go/internal/metal/detach_test.go
  • go/internal/metal/device.go
  • go/internal/metal/dtype_test.go
  • go/internal/metal/error_test.go
  • go/internal/metal/export_test.go
  • go/internal/metal/fast.go
  • go/internal/metal/fast_example_test.go
  • go/internal/metal/fast_test.go
  • go/internal/metal/gc_test.go
  • go/internal/metal/gemma3.go
  • go/internal/metal/gemma3_example_test.go
  • go/internal/metal/gemma3_test.go
  • go/internal/metal/gemma4.go
  • go/internal/metal/gemma4_example_test.go
  • go/internal/metal/gemma4_test.go
  • go/internal/metal/gemma4_vision.go
  • go/internal/metal/gemma4_vision_example_test.go
  • go/internal/metal/gemma4_vision_test.go
  • go/internal/metal/generate.go
  • go/internal/metal/generate_example_test.go
  • go/internal/metal/generate_test.go
  • go/internal/metal/gguf.go
  • go/internal/metal/gguf_test.go
  • go/internal/metal/grad_example_test.go
  • go/internal/metal/grad_test.go
  • go/internal/metal/io_custom_example_test.go
  • go/internal/metal/io_custom_test.go
  • go/internal/metal/io_example_test.go
  • go/internal/metal/io_test.go
  • go/internal/metal/kv_snapshot.go
  • go/internal/metal/lora.go
  • go/internal/metal/lora_example_test.go
  • go/internal/metal/lora_merge_example_test.go
  • go/internal/metal/lora_merge_test.go
  • go/internal/metal/lora_test.go
  • go/internal/metal/metal.go
  • go/internal/metal/metal_kernel.go
  • go/internal/metal/metal_kernel_test.go
  • go/internal/metal/metal_test.go
  • go/internal/metal/mlx_build_config.h
  • go/internal/metal/mlx_mlx_backend_cpu_available.cpp
  • go/internal/metal/model.go
  • go/internal/metal/model_example_test.go
  • go/internal/metal/model_files.go
  • go/internal/metal/model_test.go
  • go/internal/metal/nn.go
  • go/internal/metal/nn_example_test.go
  • go/internal/metal/nn_test.go
  • go/internal/metal/ops.go
  • go/internal/metal/ops_example_test.go
  • go/internal/metal/ops_test.go
  • go/internal/metal/optim.go
  • go/internal/metal/optim_example_test.go
  • go/internal/metal/optim_test.go
  • go/internal/metal/probe.go
  • go/internal/metal/prompt_cache.go
  • go/internal/metal/qwen3.go
  • go/internal/metal/qwen3_example_test.go
  • go/internal/metal/qwen3_test.go
  • go/internal/metal/random.go
  • go/internal/metal/random_test.go
  • go/internal/metal/sample.go
  • go/internal/metal/sample_example_test.go
  • go/internal/metal/sample_test.go
  • go/internal/metal/session.go
  • go/internal/metal/session_example_test.go
  • go/internal/metal/session_test.go
  • go/internal/metal/slice.go
  • go/internal/metal/slice_example_test.go
  • go/internal/metal/slice_test.go
  • go/internal/metal/stream.go
  • go/internal/metal/stream_test.go
  • go/internal/metal/testmain_test.go
  • go/internal/metal/tokenizer.go
  • go/internal/metal/tokenizer_example_test.go
  • go/internal/metal/tokenizer_test.go
  • go/internal/metal/training.go
  • go/internal/metal/training_example_test.go
  • go/internal/metal/training_test.go
  • go/internal/metal/vector_example_test.go
  • go/internal/metal/vector_test.go
  • go/internal/metal/version_test.go
  • go/internal/metaltest/hfmodel.go
  • go/internal/metaltest/metal_runtime_off.go
  • go/internal/metaltest/metal_runtime_on.go
  • go/internal/metaltest/model_eval_off.go
  • go/internal/metaltest/model_eval_on.go
  • go/internal/sessionfake/sessionfake.go
  • go/internal/sessionfake/sessionfake_coverage_test.go
  • go/internal/tokenizer/tokenizer.go
  • go/internal/tokenizer/tokenizer_example_test.go
  • go/internal/tokenizer/tokenizer_test.go
  • go/kv_analysis.go
  • go/kv_analysis_example_test.go
  • go/kv_analysis_test.go
  • go/kv_cache_bench.go
  • go/kv_cache_bench_test.go
  • go/kv_snapshot.go
  • go/kv_snapshot_example_test.go
  • go/kv_snapshot_test.go
  • go/kvconv/blocksource.go
  • go/kvconv/blocksource_example_test.go
  • go/kvconv/blocksource_test.go
  • go/kvconv/kvconv.go
  • go/kvconv/kvconv_bench_test.go
  • go/kvconv/kvconv_example_test.go
  • go/kvconv/kvconv_test.go
  • go/load_options.go
  • go/local_tuning.go
  • go/local_tuning_bench_test.go
  • go/local_tuning_test.go
  • go/lora/adapter.go
  • go/lora/adapter_bench_test.go
  • go/lora/adapter_example_test.go
  • go/lora/adapter_test.go
  • go/lora/coverage2_test.go
  • go/lora/coverage_test.go
  • go/lora/fuse.go
  • go/lora/fuse_bench_test.go
  • go/lora/fuse_example_test.go
  • go/lora/fuse_into_pack_bench_test.go
  • go/lora/fuse_stub.go
  • go/lora/fuse_test.go
  • go/lora_adapter.go
  • go/lora_adapter_darwin_test.go
  • go/lora_adapter_test.go
  • go/lora_fuse.go
  • go/lora_fuse_darwin.go
  • go/lora_fuse_darwin_test.go
  • go/lora_fuse_stub.go
  • go/lora_fuse_test.go
  • go/medium.go
  • go/medium_bench_test.go
  • go/medium_test.go
  • go/memory_plan.go
  • go/memory_plan_bench_test.go
  • go/memory_plan_example_test.go
  • go/memory_plan_extra_test.go
  • go/memory_plan_test.go
  • go/memorypretrain/artifacts.go
  • go/memorypretrain/artifacts_coverage_test.go
  • go/memorypretrain/artifacts_example_test.go
  • go/memorypretrain/artifacts_test.go
  • go/memorypretrain/bank_file.go
  • go/memorypretrain/bank_file_coverage_test.go
  • go/memorypretrain/bank_file_example_test.go
  • go/memorypretrain/bank_file_test.go
  • go/memorypretrain/dataset_cluster_ids.go
  • go/memorypretrain/dataset_cluster_ids_coverage_test.go
  • go/memorypretrain/dataset_cluster_ids_example_test.go
  • go/memorypretrain/dataset_cluster_ids_test.go
  • go/memorypretrain/ffn_memory.go
  • go/memorypretrain/ffn_memory_example_test.go
  • go/memorypretrain/ffn_memory_file.go
  • go/memorypretrain/ffn_memory_file_coverage_test.go
  • go/memorypretrain/ffn_memory_file_example_test.go
  • go/memorypretrain/ffn_memory_file_test.go
  • go/memorypretrain/ffn_memory_metal.go
  • go/memorypretrain/ffn_memory_metal_coverage_test.go
  • go/memorypretrain/ffn_memory_metal_example_test.go
  • go/memorypretrain/ffn_memory_metal_test.go
  • go/memorypretrain/ffn_memory_runtime.go
  • go/memorypretrain/ffn_memory_runtime_example_test.go
  • go/memorypretrain/ffn_memory_runtime_test.go
  • go/memorypretrain/ffn_memory_test.go
  • go/memorypretrain/memorypretrain.go
  • go/memorypretrain/memorypretrain_bench_test.go
  • go/memorypretrain/memorypretrain_coverage_test.go
  • go/memorypretrain/memorypretrain_example_test.go
  • go/memorypretrain/memorypretrain_test.go
  • go/merge/compare.go
  • go/merge/compare_bench_test.go
  • go/merge/compare_example_test.go
  • go/merge/compare_test.go
  • go/merge/helpers_test.go
  • go/merge/merge.go
  • go/merge/merge_bench_test.go
  • go/merge/merge_coverage2_test.go
  • go/merge/merge_coverage3_test.go
  • go/merge/merge_coverage_test.go
  • go/merge/merge_example_test.go
  • go/merge/merge_header_parity_test.go
  • go/merge/merge_test.go
  • go/merge/merge_write.go
  • go/merge/merge_write_bench_test.go
  • go/merge/merge_write_test.go
  • go/metal_capabilities.go
  • go/metal_session_adapter.go
  • go/mlx.go
  • go/mlx_bench_test.go
  • go/mlx_example_test.go
  • go/mlx_internal_test.go
  • go/mlx_stub.go
  • go/mlx_stub_example_test.go
  • go/mlx_stub_test.go
  • go/mlx_test.go
  • go/mlxlm/backend.go
  • go/mlxlm/backend_example_test.go
  • go/mlxlm/backend_test.go
  • go/mlxlm/bridge.py
  • go/mlxlm/testdata/mock_bridge.py
  • go/model/config_probe.go
  • go/model/config_probe_bench_test.go
  • go/model/config_probe_test.go
  • go/model/config_probe_unmarshal.go
  • go/model/config_probe_unmarshal_branches_test.go
  • go/model/config_probe_unmarshal_test.go
  • go/model/gguf_test_helpers_test.go
  • go/model/minimax/m2/helpers.go
  • go/model/minimax/m2/helpers_test.go
  • go/model/minimax/m2/m2.go
  • go/model/minimax/m2/m2_coverage_test.go
  • go/model/minimax/m2/m2_example_test.go
  • go/model/minimax/m2/m2_load.go
  • go/model/minimax/m2/m2_load_bench_test.go
  • go/model/minimax/m2/m2_load_coverage_test.go
  • go/model/minimax/m2/m2_load_example_test.go
  • go/model/minimax/m2/m2_load_test.go
  • go/model/minimax/m2/m2_metal.go
  • go/model/minimax/m2/m2_metal_coverage_test.go
  • go/model/minimax/m2/m2_metal_example_test.go
  • go/model/minimax/m2/m2_metal_test.go
  • go/model/minimax/m2/m2_route.go
  • go/model/minimax/m2/m2_route_bench_test.go
  • go/model/minimax/m2/m2_route_coverage_test.go
  • go/model/minimax/m2/m2_route_example_test.go
  • go/model/minimax/m2/m2_route_test.go
  • go/model/minimax/m2/m2_test.go
  • go/model/minimax/m2/metal_test_helper_test.go
  • go/model/minimax/m2/perf_bench_test.go
  • go/model/minimax/m2/residency.go
  • go/model/minimax/m2/residency_coverage_test.go
  • go/model/minimax/m2/residency_example_test.go
  • go/model/minimax/m2/residency_test.go
  • go/model/minimax/m2/test_helpers_test.go
  • go/model/minimax_m2_test_helpers_test.go
  • go/model/pack.go
  • go/model/pack_bench_test.go
  • go/model/pack_branches_test.go
  • go/model/pack_chattemplate.go
  • go/model/pack_dirindex.go
  • go/model/pack_example_test.go
  • go/model/pack_helpers_branches_test.go
  • go/model/pack_jsondec.go
  • go/model/pack_jsondec_test.go
  • go/model/pack_quantinspect.go
  • go/model/pack_residual_branches_test.go
  • go/model/pack_taskprofiles.go
  • go/model/pack_test.go
  • go/model/quant.go
  • go/model/quant_bench_test.go
  • go/model/quant_branches_test.go
  • go/model/quant_example_test.go
  • go/model/quant_test.go
  • go/model_lora.go
  • go/model_lora_test.go
  • go/model_merge.go
  • go/model_merge_test.go
  • go/model_pack.go
  • go/model_pack_test.go
  • go/model_slice.go
  • go/model_slice_bench_test.go
  • go/model_slice_test.go
  • go/mtp_live_test.go
  • go/native_model.go
  • go/native_model_test.go
  • go/native_speculative_live_test.go
  • go/native_speculative_textmodel.go
  • go/openai/admin.go
  • go/openai/admin_bench_test.go
  • go/openai/admin_example_test.go
  • go/openai/admin_test.go
  • go/openai/openai.go
  • go/openai/openai_bench_test.go
  • go/openai/openai_example_test.go
  • go/openai/openai_streamcov_test.go
  • go/openai/openai_test.go
  • go/openai/sse_ollama_test.go
  • go/openai/sse_responses_test.go
  • go/openai/sse_test.go
  • go/options_darwin.go
  • go/pkg/daemon/dispatch.go
  • go/pkg/daemon/dispatch_example_test.go
  • go/pkg/daemon/dispatch_test.go
  • go/pkg/daemon/native.go
  • go/pkg/daemon/native_example_test.go
  • go/pkg/daemon/native_test.go
  • go/pkg/daemon/perf_bench_test.go
  • go/pkg/daemon/server.go
  • go/pkg/daemon/server_example_test.go
  • go/pkg/daemon/server_test.go
  • go/pkg/hip/adamw_state.go
  • go/pkg/hip/adamw_state_file.go
  • go/pkg/hip/adamw_state_test.go
  • go/pkg/hip/adamw_update_pass.go
  • go/pkg/hip/algorithm_profile.go
  • go/pkg/hip/architecture.go
  • go/pkg/hip/architecture_registry.go
  • go/pkg/hip/attached_drafter_status.go
  • go/pkg/hip/attached_drafter_textmodel.go
  • go/pkg/hip/backend.go
  • go/pkg/hip/backend_example_test.go
  • go/pkg/hip/backend_test.go
  • go/pkg/hip/cache.go
  • go/pkg/hip/cache_example_test.go
  • go/pkg/hip/cache_factory_route.go
  • go/pkg/hip/cache_profile.go
  • go/pkg/hip/cache_profile_legacy.go
  • go/pkg/hip/cache_profile_runtime.go
  • go/pkg/hip/cache_test.go
  • go/pkg/hip/compat_handlers.go
  • go/pkg/hip/coverage_contract_test.go
  • go/pkg/hip/dataset_jsonl.go
  • go/pkg/hip/decode_helpers.go
  • go/pkg/hip/decode_helpers_example_test.go
  • go/pkg/hip/decode_reference.go
  • go/pkg/hip/decode_reference_test.go
  • go/pkg/hip/dense_config.go
  • go/pkg/hip/discover.go
  • go/pkg/hip/discover_example_test.go
  • go/pkg/hip/discover_test.go
  • go/pkg/hip/distillation_adamw_update_pass.go
  • go/pkg/hip/distillation_loss_pass.go
  • go/pkg/hip/draft_detect.go
  • go/pkg/hip/embedding_model.go
  • go/pkg/hip/embedding_reference.go
  • go/pkg/hip/embedding_reference_test.go
  • go/pkg/hip/gemma4_assistant_config.go
  • go/pkg/hip/gemma4_capability_labels.go
  • go/pkg/hip/gemma4_chat_template.go
  • go/pkg/hip/gemma4_engine_features.go
  • go/pkg/hip/gemma4_engine_features_test.go
  • go/pkg/hip/gemma4_lora_adapter.go
  • go/pkg/hip/gemma4_lora_policy.go
  • go/pkg/hip/gemma4_model_features_bridge.go
  • go/pkg/hip/gemma4_model_pack.go
  • go/pkg/hip/gemma4_model_pack_portable.go
  • go/pkg/hip/gemma4_mtp_assistant.go
  • go/pkg/hip/gemma4_mtp_labels.go
  • go/pkg/hip/gemma4_mtp_plan_identity.go
  • go/pkg/hip/gemma4_mtp_validation.go
  • go/pkg/hip/gemma4_native_config.go
  • go/pkg/hip/gemma4_production_quantization.go
  • go/pkg/hip/gemma4_quantization_tier.go
  • go/pkg/hip/gemma4_runtime_context.go
  • go/pkg/hip/gemma4_size_quant_matrix.go
  • go/pkg/hip/gemma4_unified_model_pack_test.go
  • go/pkg/hip/grpo_adamw_update_pass.go
  • go/pkg/hip/grpo_advantage_pass.go
  • go/pkg/hip/grpo_policy_loss_pass.go
  • go/pkg/hip/hip_adamw_launch.go
  • go/pkg/hip/hip_attached_drafter_block.go
  • go/pkg/hip/hip_attached_drafter_draft_step.go
  • go/pkg/hip/hip_attached_drafter_generate.go
  • go/pkg/hip/hip_attached_drafter_layer.go
  • go/pkg/hip/hip_attached_drafter_preflight.go
  • go/pkg/hip/hip_attached_drafter_verifier_plan.go
  • go/pkg/hip/hip_autoround_quant_launch.go
  • go/pkg/hip/hip_codebook_launch.go
  • go/pkg/hip/hip_codebook_launch_test.go
  • go/pkg/hip/hip_driver_cgo.go
  • go/pkg/hip/hip_driver_cgo_test.go
  • go/pkg/hip/hip_driver_fake_test.go
  • go/pkg/hip/hip_driver_nocgo.go
  • go/pkg/hip/hip_embedding_launch.go
  • go/pkg/hip/hip_embedding_launch_test.go
  • go/pkg/hip/hip_gemma4_q4_engine_config.go
  • go/pkg/hip/hip_gemma4_q4_generation_limits.go
  • go/pkg/hip/hip_gemma4_q4_kv.go
  • go/pkg/hip/hip_gemma4_q4_layer.go
  • go/pkg/hip/hip_gemma4_q4_package.go
  • go/pkg/hip/hip_gemma4_q4_prefill.go
  • go/pkg/hip/hip_hardware_test.go
  • go/pkg/hip/hip_jangtq_launch.go
  • go/pkg/hip/hip_jangtq_launch_test.go
  • go/pkg/hip/hip_kernel_module.go
  • go/pkg/hip/hip_kernel_source_test.go
  • go/pkg/hip/hip_kernels.go
  • go/pkg/hip/hip_kernels_stub.go
  • go/pkg/hip/hip_kernels_test.go
  • go/pkg/hip/hip_kv_device.go
  • go/pkg/hip/hip_launch.go
  • go/pkg/hip/hip_lora_launch.go
  • go/pkg/hip/hip_lora_launch_test.go
  • go/pkg/hip/hip_lora_model.go
  • go/pkg/hip/hip_lora_model_example_test.go
  • go/pkg/hip/hip_lora_model_test.go
  • go/pkg/hip/hip_moe_launch.go
  • go/pkg/hip/hip_moe_launch_test.go
  • go/pkg/hip/hip_native_kernels.go
  • go/pkg/hip/hip_projection_launch.go
  • go/pkg/hip/hip_projection_reference.go
  • go/pkg/hip/hip_projection_reference_test.go
  • go/pkg/hip/hip_runtime.go
  • go/pkg/hip/hip_runtime_test.go
  • go/pkg/hip/hip_sequence_mixer.go
  • go/pkg/hip/hip_small_decode.go
  • go/pkg/hip/hip_small_decode_test.go
  • go/pkg/hip/hip_tiny_model.go
  • go/pkg/hip/hip_token_text.go
  • go/pkg/hip/hip_token_text_test.go
  • go/pkg/hip/hip_tokens.go
  • go/pkg/hip/hip_training_launch.go
  • go/pkg/hip/hip_training_launch_test.go
  • go/pkg/hip/hip_transformer_launch.go
  • go/pkg/hip/hip_transformer_reference.go
  • go/pkg/hip/hip_transformer_reference_test.go
  • go/pkg/hip/hybrid_attention.go
  • go/pkg/hip/import_boundary_test.go
  • go/pkg/hip/inference_benchmark_test.go
  • go/pkg/hip/internal/gguf/gguf.go
  • go/pkg/hip/internal/gguf/gguf_example_test.go
  • go/pkg/hip/internal/gguf/gguf_test.go
  • go/pkg/hip/internal/llamacpp/client.go
  • go/pkg/hip/internal/llamacpp/health.go
  • go/pkg/hip/internal/registry/ordered.go
  • go/pkg/hip/kernels/README.md
  • go/pkg/hip/kernels/rocm_kernels.hip
  • go/pkg/hip/kv_cache.go
  • go/pkg/hip/kv_cache_manifest.go
  • go/pkg/hip/kv_cache_raw.go
  • go/pkg/hip/kv_cache_test.go
  • go/pkg/hip/load_config.go
  • go/pkg/hip/lora_adamw_update_pass.go
  • go/pkg/hip/lora_adapter_snapshot.go
  • go/pkg/hip/lora_fuse.go
  • go/pkg/hip/lora_fuse_types.go
  • go/pkg/hip/lora_reference.go
  • go/pkg/hip/lora_reference_test.go
  • go/pkg/hip/memorypretrain/artifacts.go
  • go/pkg/hip/memorypretrain/bank_file.go
  • go/pkg/hip/memorypretrain/dataset_cluster_ids.go
  • go/pkg/hip/memorypretrain/ffn_memory.go
  • go/pkg/hip/memorypretrain/ffn_memory_file.go
  • go/pkg/hip/memorypretrain/ffn_memory_runtime.go
  • go/pkg/hip/memorypretrain/memorypretrain.go
  • go/pkg/hip/model.go
  • go/pkg/hip/model/architecture/profile.go
  • go/pkg/hip/model/attached_drafter.go
  • go/pkg/hip/model/builtin/register.go
  • go/pkg/hip/model/cache.go
  • go/pkg/hip/model/cache_profile.go
  • go/pkg/hip/model/config_probe.go
  • go/pkg/hip/model/diffusion.go
  • go/pkg/hip/model/features.go
  • go/pkg/hip/model/files.go
  • go/pkg/hip/model/gemma4/assistant_policy.go
  • go/pkg/hip/model/gemma4/attention_window.go
  • go/pkg/hip/model/gemma4/cache_profile.go
  • go/pkg/hip/model/gemma4/cache_topology.go
  • go/pkg/hip/model/gemma4/chat_template.go
  • go/pkg/hip/model/gemma4/diffusion_policy.go
  • go/pkg/hip/model/gemma4/features.go
  • go/pkg/hip/model/gemma4/identity_quant.go
  • go/pkg/hip/model/gemma4/lora_policy.go
  • go/pkg/hip/model/gemma4/multimodal_policy.go
  • go/pkg/hip/model/gemma4/processor_policy.go
  • go/pkg/hip/model/gemma4/production_quantization.go
  • go/pkg/hip/model/gemma4/profile.go
  • go/pkg/hip/model/gemma4/qat_collection.go
  • go/pkg/hip/model/gemma4/rope_policy.go
  • go/pkg/hip/model/gemma4/size_quant.go
  • go/pkg/hip/model/gemma4/structure_plan.go
  • go/pkg/hip/model/gemma4/thinking.go
  • go/pkg/hip/model/gemma4/weight_policy.go
  • go/pkg/hip/model/info.go
  • go/pkg/hip/model/loader.go
  • go/pkg/hip/model/lora.go
  • go/pkg/hip/model/multimodal.go
  • go/pkg/hip/model/profile.go
  • go/pkg/hip/model/quant.go
  • go/pkg/hip/model/routes.go
  • go/pkg/hip/model/runtime_author.go
  • go/pkg/hip/model/runtime_contract.go
  • go/pkg/hip/model/runtime_gate.go
  • go/pkg/hip/model/sequence_mixer.go
  • go/pkg/hip/model/sequence_mixer_config.go
  • go/pkg/hip/model/state_context.go
  • go/pkg/hip/model/tokenizer.go
  • go/pkg/hip/model_attached_drafter_route.go
  • go/pkg/hip/model_builtin_factories.go
  • go/pkg/hip/model_capability_report.go
  • go/pkg/hip/model_config_probe.go
  • go/pkg/hip/model_diffusion_route.go
  • go/pkg/hip/model_example_test.go
  • go/pkg/hip/model_feature_route.go
  • go/pkg/hip/model_files.go
  • go/pkg/hip/model_info.go
  • go/pkg/hip/model_load_status.go
  • go/pkg/hip/model_lora_route.go
  • go/pkg/hip/model_multimodal_route.go
  • go/pkg/hip/model_pack.go
  • go/pkg/hip/model_pack_api.go
  • go/pkg/hip/model_pack_api_stub.go
  • go/pkg/hip/model_pack_profile.go
  • go/pkg/hip/model_profile_factory.go
  • go/pkg/hip/model_registry.go
  • go/pkg/hip/model_registry_api.go
  • go/pkg/hip/model_registry_generic.go
  • go/pkg/hip/model_registry_portable.go
  • go/pkg/hip/model_registry_snapshot.go
  • go/pkg/hip/model_route_plan.go
  • go/pkg/hip/model_route_set.go
  • go/pkg/hip/model_runtime_contract_route.go
  • go/pkg/hip/model_slice.go
  • go/pkg/hip/model_state_context_route.go
  • go/pkg/hip/model_test.go
  • go/pkg/hip/model_tokenizer_route.go
  • go/pkg/hip/moe_quant_reference.go
  • go/pkg/hip/moe_quant_reference_test.go
  • go/pkg/hip/moe_runtime.go
  • go/pkg/hip/native.go
  • go/pkg/hip/native_capability_example_test.go
  • go/pkg/hip/native_contract_test.go
  • go/pkg/hip/native_model_loader.go
  • go/pkg/hip/native_model_loader_api.go
  • go/pkg/hip/native_model_loader_portable.go
  • go/pkg/hip/native_optional_example_test.go
  • go/pkg/hip/openai.go
  • go/pkg/hip/parser_registry.go
  • go/pkg/hip/parser_registry_example_test.go
  • go/pkg/hip/parser_registry_test.go
  • go/pkg/hip/portable_contract_stub.go
  • go/pkg/hip/probe_reference.go
  • go/pkg/hip/probe_reference_test.go
  • go/pkg/hip/production_architecture_status.go
  • go/pkg/hip/production_combined.go
  • go/pkg/hip/production_fast_lane.go
  • go/pkg/hip/production_fast_lane_stub.go
  • go/pkg/hip/production_lane.go
  • go/pkg/hip/production_metrics.go
  • go/pkg/hip/production_mtp.go
  • go/pkg/hip/production_mtp_test.go
  • go/pkg/hip/production_quantization_lock.go
  • go/pkg/hip/production_turboquant.go
  • go/pkg/hip/profile/algorithm.go
  • go/pkg/hip/profile/architecture.go
  • go/pkg/hip/profile/gemma4_architecture.go
  • go/pkg/hip/profile/gemma4_lora.go
  • go/pkg/hip/profile/gemma4_weight.go
  • go/pkg/hip/profile/resolve.go
  • go/pkg/hip/quant_loader_route.go
  • go/pkg/hip/quant_scheme.go
  • go/pkg/hip/reactive_sequence_mixer.go
  • go/pkg/hip/register_rocm.go
  • go/pkg/hip/register_rocm_example_test.go
  • go/pkg/hip/register_rocm_test.go
  • go/pkg/hip/result_helpers_test.go
  • go/pkg/hip/retained_state_api.go
  • go/pkg/hip/rocm.go
  • go/pkg/hip/rocm_engine_features.go
  • go/pkg/hip/rocm_example_test.go
  • go/pkg/hip/rocm_stub.go
  • go/pkg/hip/rocm_stub_example_test.go
  • go/pkg/hip/rocm_stub_test.go
  • go/pkg/hip/rocm_test.go
  • go/pkg/hip/runtime_author_native.go
  • go/pkg/hip/runtime_gate.go
  • go/pkg/hip/runtime_lane.go
  • go/pkg/hip/runtime_lane_backend.go
  • go/pkg/hip/scheduler.go
  • go/pkg/hip/scheduler_example_test.go
  • go/pkg/hip/scheduler_test.go
  • go/pkg/hip/scheme/builtin.go
  • go/pkg/hip/scheme/scheme.go
  • go/pkg/hip/sequence_mixer.go
  • go/pkg/hip/sequence_mixer_route.go
  • go/pkg/hip/server.go
  • go/pkg/hip/sft_adamw_update_pass.go
  • go/pkg/hip/sft_loss_pass.go
  • go/pkg/hip/simple_self_distillation.go
  • go/pkg/hip/simple_self_distillation_manifest.go
  • go/pkg/hip/simple_self_distillation_memory_pretrain.go
  • go/pkg/hip/state_bundle.go
  • go/pkg/hip/state_session.go
  • go/pkg/hip/state_session_example_test.go
  • go/pkg/hip/state_session_gemma4_q4.go
  • go/pkg/hip/state_session_test.go
  • go/pkg/hip/string_helpers.go
  • go/pkg/hip/token_loop_contract.go
  • go/pkg/hip/token_loop_native.go
  • go/pkg/hip/training_kernels.go
  • go/pkg/hip/training_reference.go
  • go/pkg/hip/training_reference_test.go
  • go/pkg/hip/tuning.go
  • go/pkg/hip/tuning_device_native.go
  • go/pkg/hip/tuning_device_portable.go
  • go/pkg/hip/turboquant_kv.go
  • go/pkg/hip/vram.go
  • go/pkg/hip/vram_example_test.go
  • go/pkg/hip/vram_test.go
  • go/pkg/memvid/cli/store.go
  • go/pkg/memvid/cli/store_example_test.go
  • go/pkg/memvid/cli/store_test.go
  • go/pkg/memvid/memvid.go
  • go/pkg/memvid/memvid_example_test.go
  • go/pkg/memvid/memvid_test.go
  • go/pkg/memvid/stub.go
  • go/pkg/metal/activation_bridge.cpp
  • go/pkg/metal/array.go
  • go/pkg/metal/array_bench_test.go
  • go/pkg/metal/array_dtype_cover_test.go
  • go/pkg/metal/array_example_test.go
  • go/pkg/metal/array_test.go
  • go/pkg/metal/attention.go
  • go/pkg/metal/attention_inspect.go
  • go/pkg/metal/attention_inspect_bench_test.go
  • go/pkg/metal/attention_inspect_example_test.go
  • go/pkg/metal/attention_inspect_test.go
  • go/pkg/metal/autoround_dequant.go
  • go/pkg/metal/autoround_dequant_test.go
  • go/pkg/metal/autoround_projection.go
  • go/pkg/metal/backend.go
  • go/pkg/metal/backend_example_test.go
  • go/pkg/metal/backend_test.go
  • go/pkg/metal/batch.go
  • go/pkg/metal/batch_example_test.go
  • go/pkg/metal/batch_model_eval_test.go
  • go/pkg/metal/batch_test.go
  • go/pkg/metal/bench_test.go
  • go/pkg/metal/cache.go
  • go/pkg/metal/cache_accessor_test.go
  • go/pkg/metal/cache_bench_test.go
  • go/pkg/metal/cache_clone.go
  • go/pkg/metal/cache_clone_cover_test.go
  • go/pkg/metal/cache_clone_test.go
  • go/pkg/metal/cache_compaction.go
  • go/pkg/metal/cache_compaction_scheme_test.go
  • go/pkg/metal/cache_core_cover_test.go
  • go/pkg/metal/cache_decode_bench_test.go
  • go/pkg/metal/cache_example_test.go
  • go/pkg/metal/cache_factory.go
  • go/pkg/metal/cache_factory_test.go
  • go/pkg/metal/cache_fixed_metal.go
  • go/pkg/metal/cache_latent.go
  • go/pkg/metal/cache_latent_cover_test.go
  • go/pkg/metal/cache_latent_test.go
  • go/pkg/metal/cache_pending_test.go
  • go/pkg/metal/cache_profile.go
  • go/pkg/metal/cache_profile_test.go
  • go/pkg/metal/cache_quantized.go
  • go/pkg/metal/cache_recurrent.go
  • go/pkg/metal/cache_recurrent_cover_test.go
  • go/pkg/metal/cache_recurrent_test.go
  • go/pkg/metal/cache_restore_diff_test.go
  • go/pkg/metal/cache_scheme.go
  • go/pkg/metal/cache_scheme_width_test.go
  • go/pkg/metal/cache_sparse.go
  • go/pkg/metal/cache_sparse_test.go
  • go/pkg/metal/cache_speculative.go
  • go/pkg/metal/cache_speculative_test.go
  • go/pkg/metal/cache_test.go
  • go/pkg/metal/cache_turboquant_scheme.go
  • go/pkg/metal/chat_format.go
  • go/pkg/metal/close.go
  • go/pkg/metal/close_test.go
  • go/pkg/metal/codebook_vq.go
  • go/pkg/metal/codebook_vq_test.go
  • go/pkg/metal/compile.go
  • go/pkg/metal/compile_example_test.go
  • go/pkg/metal/compile_test.go
  • go/pkg/metal/compiled_hits.go
  • go/pkg/metal/compiled_mlp.go
  • go/pkg/metal/compiled_nested_attention_test.go
  • go/pkg/metal/config_helpers.go
  • go/pkg/metal/config_helpers_example_test.go
  • go/pkg/metal/config_helpers_test.go
  • go/pkg/metal/copy_test.go
  • go/pkg/metal/coverage_eval_test.go
  • go/pkg/metal/decode.go
  • go/pkg/metal/decode_bridge.cpp
  • go/pkg/metal/decode_bridge.h
  • go/pkg/metal/decode_fast_cover_test.go
  • go/pkg/metal/decode_geometry_probe_test.go
  • go/pkg/metal/decode_loop_bench_test.go
  • go/pkg/metal/decode_replay.go
  • go/pkg/metal/decode_replay_bridge.cpp
  • go/pkg/metal/decode_replay_bridge.h
  • go/pkg/metal/decode_replay_test.go
  • go/pkg/metal/decode_test.go
  • go/pkg/metal/dense_config.go
  • go/pkg/metal/dense_config_test.go
  • go/pkg/metal/dense_matvec.go
  • go/pkg/metal/dense_matvec_bench_test.go
  • go/pkg/metal/dense_matvec_q6.go
  • go/pkg/metal/dense_matvec_test.go
  • go/pkg/metal/detach.cpp
  • go/pkg/metal/detach.go
  • go/pkg/metal/detach_example_test.go
  • go/pkg/metal/device.go
  • go/pkg/metal/device_cache_test.go
  • go/pkg/metal/diffusion_route.go
  • go/pkg/metal/dtype.go
  • go/pkg/metal/dtype_example_test.go
  • go/pkg/metal/engine_features.go
  • go/pkg/metal/engine_features_test.go
  • go/pkg/metal/error_test.go
  • go/pkg/metal/eval_outputs_bench_test.go
  • go/pkg/metal/eval_worker.go
  • go/pkg/metal/expert_id_matvec_bench_test.go
  • go/pkg/metal/expert_id_matvec_test.go
  • go/pkg/metal/export.go
  • go/pkg/metal/export_example_test.go
  • go/pkg/metal/export_test.go
  • go/pkg/metal/fast.go
  • go/pkg/metal/fast_bench_test.go
  • go/pkg/metal/fast_example_test.go
  • go/pkg/metal/fast_test.go
  • go/pkg/metal/ffn_memory.go
  • go/pkg/metal/fixed_kv_retire_bench_test.go
  • go/pkg/metal/fuse.go
  • go/pkg/metal/fuse_test.go
  • go/pkg/metal/gather_blocks.go
  • go/pkg/metal/gather_blocks_test.go
  • go/pkg/metal/gc.go
  • go/pkg/metal/gc_example_test.go
  • go/pkg/metal/gc_test.go
  • go/pkg/metal/generate.go
  • go/pkg/metal/generate_budget_test.go
  • go/pkg/metal/generate_caches.go
  • go/pkg/metal/generate_caches_test.go
  • go/pkg/metal/generate_example_test.go
  • go/pkg/metal/generate_fixed_regime_test.go
  • go/pkg/metal/generate_growth_bench_test.go
  • go/pkg/metal/generate_model_eval_test.go
  • go/pkg/metal/generate_model_test.go
  • go/pkg/metal/generate_prefetch.go
  • go/pkg/metal/generate_prefetch_bench_test.go
  • go/pkg/metal/generate_prefetch_test.go
  • go/pkg/metal/generate_test.go
  • go/pkg/metal/gguf.go
  • go/pkg/metal/gguf_bridge.cpp
  • go/pkg/metal/gguf_example_test.go
  • go/pkg/metal/gguf_roundtrip_test.go
  • go/pkg/metal/gguflib_impl.c
  • go/pkg/metal/grad.go
  • go/pkg/metal/grad_example_test.go
  • go/pkg/metal/grad_test.go
  • go/pkg/metal/hybrid_attention.go
  • go/pkg/metal/hybrid_attention_bench_test.go
  • go/pkg/metal/hybrid_attention_test.go
  • go/pkg/metal/io.go
  • go/pkg/metal/io_custom.go
  • go/pkg/metal/io_custom_example_test.go
  • go/pkg/metal/io_custom_test.go
  • go/pkg/metal/io_example_test.go
  • go/pkg/metal/iter_test.go
  • go/pkg/metal/jang_dequant.go
  • go/pkg/metal/jang_dequant_test.go
  • go/pkg/metal/kv_cache_bench_test.go
  • go/pkg/metal/kv_snapshot.go
  • go/pkg/metal/kv_snapshot_example_test.go
  • go/pkg/metal/kv_snapshot_test.go
  • go/pkg/metal/linalg_op.go
  • go/pkg/metal/linalg_op_test.go
  • go/pkg/metal/linear_load.go
  • go/pkg/metal/lm_head_topk.go
  • go/pkg/metal/lm_head_topk_bridge.cpp
  • go/pkg/metal/lm_head_topk_bridge.h
  • go/pkg/metal/lm_head_topk_test.go
  • go/pkg/metal/lora.go
  • go/pkg/metal/lora_example_test.go
  • go/pkg/metal/lora_merge.go
  • go/pkg/metal/lora_merge_cover_test.go
  • go/pkg/metal/lora_merge_example_test.go
  • go/pkg/metal/lora_test.go
  • go/pkg/metal/metal.go
  • go/pkg/metal/metal_example_test.go
  • go/pkg/metal/metal_kernel.go
  • go/pkg/metal/metal_kernel_example_test.go
  • go/pkg/metal/metal_kernel_test.go
  • go/pkg/metal/metal_runtime_test.go
  • go/pkg/metal/mixer.go
  • go/pkg/metal/mixer_compaction_cover_test.go
  • go/pkg/metal/mixer_registry.go
  • go/pkg/metal/mixer_registry_example_test.go
  • go/pkg/metal/mixer_registry_test.go
  • go/pkg/metal/mlx_build_config.h
  • go/pkg/metal/mlx_gen_cpu_compiled_preamble.cpp
  • go/pkg/metal/mlx_gen_metal_jit_binary_ops.cpp
  • go/pkg/metal/mlx_gen_metal_jit_gather.cpp
  • go/pkg/metal/mlx_gen_metal_jit_gather_axis.cpp
  • go/pkg/metal/mlx_gen_metal_jit_gather_front.cpp
  • go/pkg/metal/mlx_gen_metal_jit_hadamard.cpp
  • go/pkg/metal/mlx_gen_metal_jit_masked_scatter.cpp
  • go/pkg/metal/mlx_gen_metal_jit_reduce_utils.cpp
  • go/pkg/metal/mlx_gen_metal_jit_scatter.cpp
  • go/pkg/metal/mlx_gen_metal_jit_scatter_axis.cpp
  • go/pkg/metal/mlx_gen_metal_jit_ternary_ops.cpp
  • go/pkg/metal/mlx_gen_metal_jit_unary_ops.cpp
  • go/pkg/metal/mlx_gen_metal_jit_utils.cpp
  • go/pkg/metal/mlx_mlx_array.cpp
  • go/pkg/metal/mlx_mlx_backend_common_broadcasting.cpp
  • go/pkg/metal/mlx_mlx_backend_common_common.cpp
  • go/pkg/metal/mlx_mlx_backend_common_compiled.cpp
  • go/pkg/metal/mlx_mlx_backend_common_load.cpp
  • go/pkg/metal/mlx_mlx_backend_common_reduce.cpp
  • go/pkg/metal/mlx_mlx_backend_common_slicing.cpp
  • go/pkg/metal/mlx_mlx_backend_common_utils.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_arg_reduce.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_available.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_binary.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_cholesky.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_compiled.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_conv.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_copy.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_distributed.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_eig.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_eigh.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_encoder.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_eval.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_fft.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_gemms_bnns.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_gemms_cblas.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_hadamard.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_indexing.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_inverse.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_jit_compiler.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_logsumexp.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_luf.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_masked_mm.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_matmul.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_primitives.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_qrf.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_quantized.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_reduce.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_scan.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_select.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_softmax.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_sort.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_svd.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_threefry.cpp
  • go/pkg/metal/mlx_mlx_backend_cpu_unary.cpp
  • go/pkg/metal/mlx_mlx_backend_cuda_no_cuda.cpp
  • go/pkg/metal/mlx_mlx_backend_gpu_copy.cpp
  • go/pkg/metal/mlx_mlx_backend_gpu_device_info.cpp
  • go/pkg/metal/mlx_mlx_backend_gpu_primitives.cpp
  • go/pkg/metal/mlx_mlx_backend_gpu_slicing.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_allocator.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_binary.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_compiled.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_conv.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_copy.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_custom_kernel.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_device.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_distributed.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_eval.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_event.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_fence.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_fft.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_hadamard.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_indexing.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_logsumexp.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_matmul.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_metal.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_nojit_kernels.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_normalization.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_primitives.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_quantized.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_reduce.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_resident.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_rope.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_scaled_dot_product_attention.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_scan.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_slicing.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_softmax.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_sort.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_ternary.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_unary.cpp
  • go/pkg/metal/mlx_mlx_backend_metal_utils.cpp
  • go/pkg/metal/mlx_mlx_compile.cpp
  • go/pkg/metal/mlx_mlx_device.cpp
  • go/pkg/metal/mlx_mlx_distributed_distributed.cpp
  • go/pkg/metal/mlx_mlx_distributed_jaccl_no_jaccl.cpp
  • go/pkg/metal/mlx_mlx_distributed_mpi_no_mpi.cpp
  • go/pkg/metal/mlx_mlx_distributed_nccl_no_nccl.cpp
  • go/pkg/metal/mlx_mlx_distributed_ops.cpp
  • go/pkg/metal/mlx_mlx_distributed_primitives.cpp
  • go/pkg/metal/mlx_mlx_distributed_ring_no_ring.cpp
  • go/pkg/metal/mlx_mlx_distributed_utils.cpp
  • go/pkg/metal/mlx_mlx_dtype.cpp
  • go/pkg/metal/mlx_mlx_dtype_utils.cpp
  • go/pkg/metal/mlx_mlx_einsum.cpp
  • go/pkg/metal/mlx_mlx_export.cpp
  • go/pkg/metal/mlx_mlx_fast.cpp
  • go/pkg/metal/mlx_mlx_fft.cpp
  • go/pkg/metal/mlx_mlx_graph_utils.cpp
  • go/pkg/metal/mlx_mlx_io_gguf.cpp
  • go/pkg/metal/mlx_mlx_io_gguf_quants.cpp
  • go/pkg/metal/mlx_mlx_io_load.cpp
  • go/pkg/metal/mlx_mlx_io_no_gguf.cpp
  • go/pkg/metal/mlx_mlx_io_safetensors.cpp
  • go/pkg/metal/mlx_mlx_linalg.cpp
  • go/pkg/metal/mlx_mlx_ops.cpp
  • go/pkg/metal/mlx_mlx_primitives.cpp
  • go/pkg/metal/mlx_mlx_random.cpp
  • go/pkg/metal/mlx_mlx_scheduler.cpp
  • go/pkg/metal/mlx_mlx_stream.cpp
  • go/pkg/metal/mlx_mlx_transforms.cpp
  • go/pkg/metal/mlx_mlx_utils.cpp
  • go/pkg/metal/mlx_mlx_version.cpp
  • go/pkg/metal/mlxc_array.cpp
  • go/pkg/metal/mlxc_closure.cpp
  • go/pkg/metal/mlxc_compile.cpp
  • go/pkg/metal/mlxc_device.cpp
  • go/pkg/metal/mlxc_distributed.cpp
  • go/pkg/metal/mlxc_distributed_group.cpp
  • go/pkg/metal/mlxc_error.cpp
  • go/pkg/metal/mlxc_export.cpp
  • go/pkg/metal/mlxc_fast.cpp
  • go/pkg/metal/mlxc_fft.cpp
  • go/pkg/metal/mlxc_io.cpp
  • go/pkg/metal/mlxc_io_types.cpp
  • go/pkg/metal/mlxc_linalg.cpp
  • go/pkg/metal/mlxc_map.cpp
  • go/pkg/metal/mlxc_memory.cpp
  • go/pkg/metal/mlxc_metal.cpp
  • go/pkg/metal/mlxc_ops.cpp
  • go/pkg/metal/mlxc_random.cpp
  • go/pkg/metal/mlxc_stream.cpp
  • go/pkg/metal/mlxc_string.cpp
  • go/pkg/metal/mlxc_transforms.cpp
  • go/pkg/metal/mlxc_transforms_impl.cpp
  • go/pkg/metal/mlxc_vector.cpp
  • go/pkg/metal/mlxc_version.cpp
  • go/pkg/metal/model.go
  • go/pkg/metal/model/bert/bert.go
  • go/pkg/metal/model/bert/bert_coverage_test.go
  • go/pkg/metal/model/bert/bert_example_test.go
  • go/pkg/metal/model/bert/bert_test.go
  • go/pkg/metal/model/composed/composed.go
  • go/pkg/metal/model/composed/composed_bench_test.go
  • go/pkg/metal/model/composed/composed_coverage_test.go
  • go/pkg/metal/model/composed/composed_example_test.go
  • go/pkg/metal/model/composed/composed_test.go
  • go/pkg/metal/model/deepseek/deepseek.go
  • go/pkg/metal/model/deepseek/deepseek_coverage_test.go
  • go/pkg/metal/model/deepseek/deepseek_example_test.go
  • go/pkg/metal/model/deepseek/deepseek_test.go
  • go/pkg/metal/model/deltanet/builder.go
  • go/pkg/metal/model/deltanet/builder_test.go
  • go/pkg/metal/model/deltanet/chunked.go
  • go/pkg/metal/model/deltanet/deltanet.go
  • go/pkg/metal/model/deltanet/deltanet_bench_test.go
  • go/pkg/metal/model/deltanet/deltanet_coverage_test.go
  • go/pkg/metal/model/deltanet/deltanet_test.go
  • go/pkg/metal/model/deltanet/mixer.go
  • go/pkg/metal/model/gemma3/chat/gemma3chat.go
  • go/pkg/metal/model/gemma3/chat/gemma3chat_bench_test.go
  • go/pkg/metal/model/gemma3/chat/gemma3chat_test.go
  • go/pkg/metal/model/gemma3/close.go
  • go/pkg/metal/model/gemma3/close_test.go
  • go/pkg/metal/model/gemma3/gemma3.go
  • go/pkg/metal/model/gemma3/gemma3_bench_test.go
  • go/pkg/metal/model/gemma3/gemma3_example_test.go
  • go/pkg/metal/model/gemma3/gemma3_test.go
  • go/pkg/metal/model/gemma3/methods.go
  • go/pkg/metal/model/gemma3/model_test.go
  • go/pkg/metal/model/gemma3/train_test.go
  • go/pkg/metal/model/gemma4/arch_parity_test.go
  • go/pkg/metal/model/gemma4/assistant.go
  • go/pkg/metal/model/gemma4/assistant_decode.go
  • go/pkg/metal/model/gemma4/assistant_decode_bench_test.go
  • go/pkg/metal/model/gemma4/assistant_decode_example_test.go
  • go/pkg/metal/model/gemma4/assistant_decode_extra_bench_test.go
  • go/pkg/metal/model/gemma4/assistant_decode_extra_test.go
  • go/pkg/metal/model/gemma4/assistant_decode_func_test.go
  • go/pkg/metal/model/gemma4/assistant_decode_lanes_test.go
  • go/pkg/metal/model/gemma4/assistant_decode_test.go
  • go/pkg/metal/model/gemma4/assistant_generate.go
  • go/pkg/metal/model/gemma4/assistant_generate_cache_test.go
  • go/pkg/metal/model/gemma4/assistant_generate_test.go
  • go/pkg/metal/model/gemma4/assistant_gguf.go
  • go/pkg/metal/model/gemma4/assistant_gguf_test.go
  • go/pkg/metal/model/gemma4/assistant_native_parity_test.go
  • go/pkg/metal/model/gemma4/assistant_ordered_embedding_bench_test.go
  • go/pkg/metal/model/gemma4/assistant_ordered_embedding_test.go
  • go/pkg/metal/model/gemma4/assistant_ordered_logits_test.go
  • go/pkg/metal/model/gemma4/assistant_pair.go
  • go/pkg/metal/model/gemma4/assistant_quant_test.go
  • go/pkg/metal/model/gemma4/assistant_test.go
  • go/pkg/metal/model/gemma4/assistant_verify.go
  • go/pkg/metal/model/gemma4/assistant_verify_test.go
  • go/pkg/metal/model/gemma4/attention.go
  • go/pkg/metal/model/gemma4/attention_bench_test.go
  • go/pkg/metal/model/gemma4/attention_cache_layout_test.go
  • go/pkg/metal/model/gemma4/attention_fixed_decode_bench_test.go
  • go/pkg/metal/model/gemma4/audio.go
  • go/pkg/metal/model/gemma4/audio_branch_residual_test.go
  • go/pkg/metal/model/gemma4/audio_encoder.go
  • go/pkg/metal/model/gemma4/audio_encoder_load.go
  • go/pkg/metal/model/gemma4/audio_encoder_test.go
  • go/pkg/metal/model/gemma4/audio_example_test.go
  • go/pkg/metal/model/gemma4/audio_features.go
  • go/pkg/metal/model/gemma4/audio_features_golden_test.go
  • go/pkg/metal/model/gemma4/audio_features_test.go
  • go/pkg/metal/model/gemma4/audio_multimodal_test.go
  • go/pkg/metal/model/gemma4/audio_splice_test.go
  • go/pkg/metal/model/gemma4/backend.go
  • go/pkg/metal/model/gemma4/backend_test.go
  • go/pkg/metal/model/gemma4/cache_profile_test.go
  • go/pkg/metal/model/gemma4/capability_test.go
  • go/pkg/metal/model/gemma4/chat/gemma4chat.go
  • go/pkg/metal/model/gemma4/chat/gemma4chat_bench_test.go
  • go/pkg/metal/model/gemma4/chat/gemma4chat_coverage_test.go
  • go/pkg/metal/model/gemma4/chat/gemma4chat_test.go
  • go/pkg/metal/model/gemma4/close.go
  • go/pkg/metal/model/gemma4/close_test.go
  • go/pkg/metal/model/gemma4/compiled_layer.go
  • go/pkg/metal/model/gemma4/compiled_layer_band_test.go
  • go/pkg/metal/model/gemma4/compiled_layer_bench_test.go
  • go/pkg/metal/model/gemma4/compiled_stack.go
  • go/pkg/metal/model/gemma4/config.go
  • go/pkg/metal/model/gemma4/config_parity_test.go
  • go/pkg/metal/model/gemma4/coverage_assistant_config_test.go
  • go/pkg/metal/model/gemma4/coverage_assistant_decode_paths_test.go
  • go/pkg/metal/model/gemma4/coverage_assistant_validate_test.go
  • go/pkg/metal/model/gemma4/coverage_assistant_verify_test.go
  • go/pkg/metal/model/gemma4/coverage_backend_head_test.go
  • go/pkg/metal/model/gemma4/coverage_compiled_consumer_test.go
  • go/pkg/metal/model/gemma4/coverage_compiled_dtype_test.go
  • go/pkg/metal/model/gemma4/coverage_compiled_layer_state_test.go
  • go/pkg/metal/model/gemma4/coverage_compiled_layer_test.go
  • go/pkg/metal/model/gemma4/coverage_compiled_pli_test.go
  • go/pkg/metal/model/gemma4/coverage_compiled_stack_test.go
  • go/pkg/metal/model/gemma4/coverage_config_masks_test.go
  • go/pkg/metal/model/gemma4/coverage_diffusion_serve_test.go
  • go/pkg/metal/model/gemma4/coverage_forward_fallback_test.go
  • go/pkg/metal/model/gemma4/coverage_live_generate_test.go
  • go/pkg/metal/model/gemma4/coverage_moe_test.go
  • go/pkg/metal/model/gemma4/coverage_pure_helpers_test.go
  • go/pkg/metal/model/gemma4/coverage_verify_guards_test.go
  • go/pkg/metal/model/gemma4/coverage_weights_pure_test.go
  • go/pkg/metal/model/gemma4/coverage_weights_test.go
  • go/pkg/metal/model/gemma4/decode_kernels_test.go
  • go/pkg/metal/model/gemma4/decoder_layer.go
  • go/pkg/metal/model/gemma4/diffusion.go
  • go/pkg/metal/model/gemma4/diffusion_generate.go
  • go/pkg/metal/model/gemma4/diffusion_generate_test.go
  • go/pkg/metal/model/gemma4/diffusion_live_test.go
  • go/pkg/metal/model/gemma4/diffusion_load_test.go
  • go/pkg/metal/model/gemma4/diffusion_serve.go
  • go/pkg/metal/model/gemma4/diffusion_step.go
  • go/pkg/metal/model/gemma4/diffusion_step_test.go
  • go/pkg/metal/model/gemma4/diffusion_test.go
  • go/pkg/metal/model/gemma4/doc.go
  • go/pkg/metal/model/gemma4/example_test.go
  • go/pkg/metal/model/gemma4/experts.go
  • go/pkg/metal/model/gemma4/experts_decode_bench_test.go
  • go/pkg/metal/model/gemma4/experts_id_matvec_test.go
  • go/pkg/metal/model/gemma4/experts_sorted_routes_test.go
  • go/pkg/metal/model/gemma4/experts_split_bench_test.go
  • go/pkg/metal/model/gemma4/forward.go
  • go/pkg/metal/model/gemma4/forward_alloc_bench_test.go
  • go/pkg/metal/model/gemma4/forward_example_test.go
  • go/pkg/metal/model/gemma4/forward_softcap_test.go
  • go/pkg/metal/model/gemma4/forward_test.go
  • go/pkg/metal/model/gemma4/gemma4.go
  • go/pkg/metal/model/gemma4/last_token_q6_test.go
  • go/pkg/metal/model/gemma4/load.go
  • go/pkg/metal/model/gemma4/load_example_test.go
  • go/pkg/metal/model/gemma4/load_synthetic_test.go
  • go/pkg/metal/model/gemma4/logit_softcap_bench_test.go
  • go/pkg/metal/model/gemma4/lora_test.go
  • go/pkg/metal/model/gemma4/masks.go
  • go/pkg/metal/model/gemma4/masks_combine_test.go
  • go/pkg/metal/model/gemma4/methods.go
  • go/pkg/metal/model/gemma4/methods_test.go
  • go/pkg/metal/model/gemma4/model.go
  • go/pkg/metal/model/gemma4/model_test.go
  • go/pkg/metal/model/gemma4/mtp_diag.go
  • go/pkg/metal/model/gemma4/perlayer.go
  • go/pkg/metal/model/gemma4/perlayer_bench_test.go
  • go/pkg/metal/model/gemma4/policy.go
  • go/pkg/metal/model/gemma4/policy_test.go
  • go/pkg/metal/model/gemma4/proportional_freqs_test.go
  • go/pkg/metal/model/gemma4/router.go
  • go/pkg/metal/model/gemma4/softmax_mixer.go
  • go/pkg/metal/model/gemma4/softmax_mixer_test.go
  • go/pkg/metal/model/gemma4/testhelpers_test.go
  • go/pkg/metal/model/gemma4/thinking.go
  • go/pkg/metal/model/gemma4/thinking_test.go
  • go/pkg/metal/model/gemma4/vision.go
  • go/pkg/metal/model/gemma4/vision_audio_branch_test.go
  • go/pkg/metal/model/gemma4/vision_chat.go
  • go/pkg/metal/model/gemma4/vision_chat_test.go
  • go/pkg/metal/model/gemma4/vision_example_test.go
  • go/pkg/metal/model/gemma4/vision_features.go
  • go/pkg/metal/model/gemma4/vision_features_golden_test.go
  • go/pkg/metal/model/gemma4/vision_features_test.go
  • go/pkg/metal/model/gemma4/vision_forward.go
  • go/pkg/metal/model/gemma4/vision_forward_branch_test.go
  • go/pkg/metal/model/gemma4/vision_forward_rope_test.go
  • go/pkg/metal/model/gemma4/vision_forward_test.go
  • go/pkg/metal/model/gemma4/vision_load.go
  • go/pkg/metal/model/gemma4/vision_load_test.go
  • go/pkg/metal/model/gemma4/vision_video_branch_test.go
  • go/pkg/metal/model/gemma4/weights.go
  • go/pkg/metal/model/gemma4/weights_canonical_test.go
  • go/pkg/metal/model/gla/builder.go
  • go/pkg/metal/model/gla/builder_test.go
  • go/pkg/metal/model/gla/gla.go
  • go/pkg/metal/model/gla/gla_bench_test.go
  • go/pkg/metal/model/gla/gla_test.go
  • go/pkg/metal/model/gla/mixer.go
  • go/pkg/metal/model/gptoss/close_test.go
  • go/pkg/metal/model/gptoss/gptoss.go
  • go/pkg/metal/model/gptoss/gptoss_bench_test.go
  • go/pkg/metal/model/gptoss/gptoss_example_test.go
  • go/pkg/metal/model/gptoss/gptoss_test.go
  • go/pkg/metal/model/gptoss/methods_test.go
  • go/pkg/metal/model/gsa/forward_oracle_test.go
  • go/pkg/metal/model/gsa/gsa.go
  • go/pkg/metal/model/gsa/gsa_bench_test.go
  • go/pkg/metal/model/gsa/gsa_test.go
  • go/pkg/metal/model/gsa/kernels.go
  • go/pkg/metal/model/gsa/loader.go
  • go/pkg/metal/model/gsa/loader_helpers_test.go
  • go/pkg/metal/model/gsa/loader_test.go
  • go/pkg/metal/model/gsa/register.go
  • go/pkg/metal/model/internal/flakernel/flakernel.go
  • go/pkg/metal/model/internal/flakernel/flakernel_test.go
  • go/pkg/metal/model/internal/flakernel/gated.go
  • go/pkg/metal/model/internal/flakernel/gated_test.go
  • go/pkg/metal/model/kimi/close.go
  • go/pkg/metal/model/kimi/kimi.go
  • go/pkg/metal/model/kimi/kimi_bench_test.go
  • go/pkg/metal/model/kimi/kimi_example_test.go
  • go/pkg/metal/model/kimi/kimi_test.go
  • go/pkg/metal/model/kimi/methods.go
  • go/pkg/metal/model/mamba2/block.go
  • go/pkg/metal/model/mamba2/chunk.go
  • go/pkg/metal/model/mamba2/chunk_test.go
  • go/pkg/metal/model/mamba2/cover_test.go
  • go/pkg/metal/model/mamba2/forward_test.go
  • go/pkg/metal/model/mamba2/loader.go
  • go/pkg/metal/model/mamba2/loader_test.go
  • go/pkg/metal/model/mamba2/mixer.go
  • go/pkg/metal/model/mamba2/mixer_test.go
  • go/pkg/metal/model/mamba2/register.go
  • go/pkg/metal/model/mamba2/scan.go
  • go/pkg/metal/model/mamba2/scan_test.go
  • go/pkg/metal/model/minimaxm2/minimax_m2.go
  • go/pkg/metal/model/minimaxm2/minimax_m2_bench_test.go
  • go/pkg/metal/model/minimaxm2/minimax_m2_coverage_test.go
  • go/pkg/metal/model/minimaxm2/minimax_m2_example_test.go
  • go/pkg/metal/model/minimaxm2/minimax_m2_test.go
  • go/pkg/metal/model/mixtral/close.go
  • go/pkg/metal/model/mixtral/close_test.go
  • go/pkg/metal/model/mixtral/methods.go
  • go/pkg/metal/model/mixtral/methods_test.go
  • go/pkg/metal/model/mixtral/mixtral.go
  • go/pkg/metal/model/mixtral/mixtral_bench_test.go
  • go/pkg/metal/model/mixtral/mixtral_example_test.go
  • go/pkg/metal/model/mixtral/mixtral_test.go
  • go/pkg/metal/model/mla/forward_oracle_test.go
  • go/pkg/metal/model/mla/loader.go
  • go/pkg/metal/model/mla/loader_test.go
  • go/pkg/metal/model/mla/mla.go
  • go/pkg/metal/model/mla/mla_bench_test.go
  • go/pkg/metal/model/mla/mla_test.go
  • go/pkg/metal/model/mla/register.go
  • go/pkg/metal/model/moba/forward_oracle_test.go
  • go/pkg/metal/model/moba/kernels.go
  • go/pkg/metal/model/moba/kernels_cover_test.go
  • go/pkg/metal/model/moba/loader.go
  • go/pkg/metal/model/moba/loader_cover_test.go
  • go/pkg/metal/model/moba/loader_test.go
  • go/pkg/metal/model/moba/moba.go
  • go/pkg/metal/model/moba/moba_bench_test.go
  • go/pkg/metal/model/moba/moba_test.go
  • go/pkg/metal/model/moba/register.go
  • go/pkg/metal/model/nsa/forward_oracle_test.go
  • go/pkg/metal/model/nsa/kernels.go
  • go/pkg/metal/model/nsa/kernels_coverage_test.go
  • go/pkg/metal/model/nsa/loader.go
  • go/pkg/metal/model/nsa/loader_coverage_test.go
  • go/pkg/metal/model/nsa/loader_test.go
  • go/pkg/metal/model/nsa/nsa.go
  • go/pkg/metal/model/nsa/nsa_bench_test.go
  • go/pkg/metal/model/nsa/nsa_test.go
  • go/pkg/metal/model/nsa/register.go
  • go/pkg/metal/model/qwen3/chat/qwen3chat.go
  • go/pkg/metal/model/qwen3/chat/qwen3chat_bench_test.go
  • go/pkg/metal/model/qwen3/chat/qwen3chat_emptyrole_test.go
  • go/pkg/metal/model/qwen3/chat/qwen3chat_test.go
  • go/pkg/metal/model/qwen3/close_test.go
  • go/pkg/metal/model/qwen3/gated_delta.go
  • go/pkg/metal/model/qwen3/gated_delta_coverage_test.go
  • go/pkg/metal/model/qwen3/gated_delta_loader.go
  • go/pkg/metal/model/qwen3/gated_delta_loader_test.go
  • go/pkg/metal/model/qwen3/gated_delta_mixer.go
  • go/pkg/metal/model/qwen3/gated_delta_mixer_test.go
  • go/pkg/metal/model/qwen3/gated_delta_test.go
  • go/pkg/metal/model/qwen3/moe_model_test.go
  • go/pkg/metal/model/qwen3/qwen3.go
  • go/pkg/metal/model/qwen3/qwen36.go
  • go/pkg/metal/model/qwen3/qwen36_example_test.go
  • go/pkg/metal/model/qwen3/qwen36_moe_staged.go
  • go/pkg/metal/model/qwen3/qwen36_moe_staged_example_test.go
  • go/pkg/metal/model/qwen3/qwen36_staged.go
  • go/pkg/metal/model/qwen3/qwen36_staged_coverage_test.go
  • go/pkg/metal/model/qwen3/qwen36_staged_example_test.go
  • go/pkg/metal/model/qwen3/qwen36_staged_test.go
  • go/pkg/metal/model/qwen3/qwen3_bench_test.go
  • go/pkg/metal/model/qwen3/qwen3_coverage_test.go
  • go/pkg/metal/model/qwen3/qwen3_example_test.go
  • go/pkg/metal/model/qwen3/qwen3_moe.go
  • go/pkg/metal/model/qwen3/qwen3_moe_coverage_test.go
  • go/pkg/metal/model/qwen3/qwen3_moe_example_test.go
  • go/pkg/metal/model/qwen3/qwen3_test.go
  • go/pkg/metal/model/retnet/builder.go
  • go/pkg/metal/model/retnet/builder_test.go
  • go/pkg/metal/model/retnet/mixer.go
  • go/pkg/metal/model/retnet/retnet.go
  • go/pkg/metal/model/retnet/retnet_bench_test.go
  • go/pkg/metal/model/retnet/retnet_test.go
  • go/pkg/metal/model/rwkv7/chunk.go
  • go/pkg/metal/model/rwkv7/chunk_test.go
  • go/pkg/metal/model/rwkv7/coverage_test.go
  • go/pkg/metal/model/rwkv7/forward_test.go
  • go/pkg/metal/model/rwkv7/loader.go
  • go/pkg/metal/model/rwkv7/loader_test.go
  • go/pkg/metal/model/rwkv7/mixer.go
  • go/pkg/metal/model/rwkv7/mixer_test.go
  • go/pkg/metal/model/rwkv7/recurrence.go
  • go/pkg/metal/model/rwkv7/recurrence_test.go
  • go/pkg/metal/model/rwkv7/register.go
  • go/pkg/metal/model_bench_test.go
  • go/pkg/metal/model_dispatch_test.go
  • go/pkg/metal/model_eval_isolation_test.go
  • go/pkg/metal/model_example_test.go
  • go/pkg/metal/model_files.go
  • go/pkg/metal/model_info.go
  • go/pkg/metal/model_quant.go
  • go/pkg/metal/model_registry.go
  • go/pkg/metal/model_registry_test.go
  • go/pkg/metal/model_test.go
  • go/pkg/metal/moe.go
  • go/pkg/metal/moe_bench_test.go
  • go/pkg/metal/moe_expert.go
  • go/pkg/metal/moe_expert_test.go
  • go/pkg/metal/moe_router.go
  • go/pkg/metal/moe_router_test.go
  • go/pkg/metal/nn.go
  • go/pkg/metal/nn_example_test.go
  • go/pkg/metal/nn_test.go
  • go/pkg/metal/ops.go
  • go/pkg/metal/ops_bench_test.go
  • go/pkg/metal/ops_cover_test.go
  • go/pkg/metal/ops_example_test.go
  • go/pkg/metal/ops_test.go
  • go/pkg/metal/optim.go
  • go/pkg/metal/optim_example_test.go
  • go/pkg/metal/optim_test.go
  • go/pkg/metal/perf_invariants_test.go
  • go/pkg/metal/pinned_array.go
  • go/pkg/metal/pinned_array_bench_test.go
  • go/pkg/metal/pinned_array_bridge.cpp
  • go/pkg/metal/pinned_array_test.go
  • go/pkg/metal/ple_bench_test.go
  • go/pkg/metal/probe.go
  • go/pkg/metal/probe_test.go
  • go/pkg/metal/process_memory_darwin.go
  • go/pkg/metal/process_memory_stub.go
  • go/pkg/metal/prompt_cache.go
  • go/pkg/metal/prompt_cache_bench_test.go
  • go/pkg/metal/prompt_cache_helpers_cover_test.go
  • go/pkg/metal/prompt_cache_model_eval_test.go
  • go/pkg/metal/prompt_cache_test.go
  • go/pkg/metal/pure_helpers_cover_test.go
  • go/pkg/metal/quant.go
  • go/pkg/metal/quant_affine.go
  • go/pkg/metal/quant_compute_cover_test.go
  • go/pkg/metal/quant_fp4.go
  • go/pkg/metal/quant_ordering_bench_test.go
  • go/pkg/metal/quant_registry.go
  • go/pkg/metal/quant_registry_test.go
  • go/pkg/metal/quant_schemes_test.go
  • go/pkg/metal/quantize_op.go
  • go/pkg/metal/quantize_op_test.go
  • go/pkg/metal/quantized_ops_bench_test.go
  • go/pkg/metal/random.go
  • go/pkg/metal/random_bench_test.go
  • go/pkg/metal/random_example_test.go
  • go/pkg/metal/random_test.go
  • go/pkg/metal/real_e2b_engine_bench_test.go
  • go/pkg/metal/rmsnorm_bench_test.go
  • go/pkg/metal/rope_bench_test.go
  • go/pkg/metal/router_topk.go
  • go/pkg/metal/router_topk_decode_bench_test.go
  • go/pkg/metal/router_topk_test.go
  • go/pkg/metal/runtime_author.go
  • go/pkg/metal/runtime_author_cover_test.go
  • go/pkg/metal/runtime_author_test.go
  • go/pkg/metal/runtime_gate.go
  • go/pkg/metal/runtime_gate_example_test.go
  • go/pkg/metal/runtime_gate_test.go
  • go/pkg/metal/sample.go
  • go/pkg/metal/sample_cover_test.go
  • go/pkg/metal/sample_distribution.go
  • go/pkg/metal/sample_distribution_test.go
  • go/pkg/metal/sample_example_test.go
  • go/pkg/metal/sample_key_test.go
  • go/pkg/metal/sample_model_eval_test.go
  • go/pkg/metal/sample_test.go
  • go/pkg/metal/scheme_compute_test.go
  • go/pkg/metal/sdpa_determinism_test.go
  • go/pkg/metal/sdpa_paged_bench_test.go
  • go/pkg/metal/session.go
  • go/pkg/metal/session_bench_test.go
  • go/pkg/metal/session_example_test.go
  • go/pkg/metal/session_generate_cover_test.go
  • go/pkg/metal/session_lifecycle_test.go
  • go/pkg/metal/session_methods_cover_test.go
  • go/pkg/metal/session_pipelined.go
  • go/pkg/metal/session_pipelined_test.go
  • go/pkg/metal/session_snapshot_drive_cover_test.go
  • go/pkg/metal/session_snapshot_model_eval_test.go
  • go/pkg/metal/session_snapshot_shape_cover_test.go
  • go/pkg/metal/session_test.go
  • go/pkg/metal/shared_kv.go
  • go/pkg/metal/shared_kv_test.go
  • go/pkg/metal/slice.go
  • go/pkg/metal/slice_example_test.go
  • go/pkg/metal/slice_test.go
  • go/pkg/metal/smallm_bench_test.go
  • go/pkg/metal/softmax_loader.go
  • go/pkg/metal/softmax_loader_test.go
  • go/pkg/metal/speculative_accept.go
  • go/pkg/metal/speculative_accept_test.go
  • go/pkg/metal/speculative_export_cover_test.go
  • go/pkg/metal/speculative_verify.go
  • go/pkg/metal/speculative_verify_test.go
  • go/pkg/metal/split.go
  • go/pkg/metal/split_test.go
  • go/pkg/metal/stream.go
  • go/pkg/metal/stream_example_test.go
  • go/pkg/metal/stream_runtime_test.go
  • go/pkg/metal/testmain_test.go
  • go/pkg/metal/thinking_budget.go
  • go/pkg/metal/thinking_budget_test.go
  • go/pkg/metal/tokenizer.go
  • go/pkg/metal/tokenizer_model_eval_test.go
  • go/pkg/metal/trace.go
  • go/pkg/metal/trace_bench_test.go
  • go/pkg/metal/trace_phase_diag_test.go
  • go/pkg/metal/trace_test.go
  • go/pkg/metal/training.go
  • go/pkg/metal/training_cover_test.go
  • go/pkg/metal/training_example_test.go
  • go/pkg/metal/transformer.go
  • go/pkg/metal/turboquant_kv.go
  • go/pkg/metal/turboquant_kv_cache.go
  • go/pkg/metal/turboquant_kv_cache_bench_test.go
  • go/pkg/metal/turboquant_kv_payload.go
  • go/pkg/metal/turboquant_kv_payload_test.go
  • go/pkg/metal/turboquant_kv_reference.go
  • go/pkg/metal/turboquant_kv_reference_bench_test.go
  • go/pkg/metal/turboquant_kv_test.go
  • go/pkg/metal/vector.go
  • go/pkg/metal/vector_example_test.go
  • go/pkg/metal/vector_test.go
  • go/pkg/metal/version_test.go
  • go/pkg/metal/vision_cache.go
  • go/pkg/metal/vision_cache_test.go
  • go/pkg/metal/vision_chat.go
  • go/pkg/metal/vision_chat_model_eval_test.go
  • go/pkg/metal/vision_chat_test.go
  • go/pkg/model/arch.go
  • go/pkg/model/arch_guard_test.go
  • go/pkg/model/arch_spec.go
  • go/pkg/model/arch_spec_test.go
  • go/pkg/model/assemble.go
  • go/pkg/model/assistant_spec.go
  • go/pkg/model/assistant_spec_test.go
  • go/pkg/model/backend.go
  • go/pkg/model/composed/attention.go
  • go/pkg/model/composed/attention_test.go
  • go/pkg/model/composed/composed.go
  • go/pkg/model/composed/composed_test.go
  • go/pkg/model/composed/loader.go
  • go/pkg/model/composed/loader_test.go
  • go/pkg/model/composed/mixers.go
  • go/pkg/model/composed/moe.go
  • go/pkg/model/composed/moe_test.go
  • go/pkg/model/composed/token_model.go
  • go/pkg/model/composed/token_model_test.go
  • go/pkg/model/deltanet/deltanet.go
  • go/pkg/model/deltanet/deltanet_test.go
  • go/pkg/model/deltanet/hf_cross_test.go
  • go/pkg/model/gemma3/gemma3.go
  • go/pkg/model/gemma3/gemma3_test.go
  • go/pkg/model/gemma3/register.go
  • go/pkg/model/gemma4/assistant.go
  • go/pkg/model/gemma4/assistant_test.go
  • go/pkg/model/gemma4/audio_assemble.go
  • go/pkg/model/gemma4/audio_assemble_test.go
  • go/pkg/model/gemma4/audio_config.go
  • go/pkg/model/gemma4/chat/gemma4chat.go
  • go/pkg/model/gemma4/chat/gemma4chat_test.go
  • go/pkg/model/gemma4/config.go
  • go/pkg/model/gemma4/config_test.go
  • go/pkg/model/gemma4/coverage_gaps_test.go
  • go/pkg/model/gemma4/derive.go
  • go/pkg/model/gemma4/diffusion.go
  • go/pkg/model/gemma4/engine_helper_test.go
  • go/pkg/model/gemma4/gemma4.go
  • go/pkg/model/gemma4/infer.go
  • go/pkg/model/gemma4/load_bench_test.go
  • go/pkg/model/gemma4/load_test.go
  • go/pkg/model/gemma4/parse.go
  • go/pkg/model/gemma4/register.go
  • go/pkg/model/gemma4/register_test.go
  • go/pkg/model/gemma4/text_config.go
  • go/pkg/model/gemma4/vision_assemble.go
  • go/pkg/model/gemma4/vision_assemble_test.go
  • go/pkg/model/gemma4/vision_config.go
  • go/pkg/model/gemma4/vision_features.go
  • go/pkg/model/gemma4/vision_features_test.go
  • go/pkg/model/gemma4/vision_infer.go
  • go/pkg/model/gemma4/vision_infer_test.go
  • go/pkg/model/gemma4/vision_weights.go
  • go/pkg/model/gemma4/vision_weights_test.go
  • go/pkg/model/gemma4/weights.go
  • go/pkg/model/infer.go
  • go/pkg/model/linear.go
  • go/pkg/model/linear_bench_test.go
  • go/pkg/model/linear_test.go
  • go/pkg/model/load.go
  • go/pkg/model/load_test.go
  • go/pkg/model/loaded.go
  • go/pkg/model/mamba2/backend.go
  • go/pkg/model/mamba2/block.go
  • go/pkg/model/mamba2/block_bench_test.go
  • go/pkg/model/mamba2/block_test.go
  • go/pkg/model/mamba2/conv.go
  • go/pkg/model/mamba2/conv_bench_test.go
  • go/pkg/model/mamba2/conv_test.go
  • go/pkg/model/mamba2/loader.go
  • go/pkg/model/mamba2/loader_test.go
  • go/pkg/model/mamba2/model.go
  • go/pkg/model/mamba2/model_bench_test.go
  • go/pkg/model/mamba2/model_test.go
  • go/pkg/model/mamba2/scan.go
  • go/pkg/model/mamba2/scan_bench_test.go
  • go/pkg/model/mamba2/scan_test.go
  • go/pkg/model/mamba2/smoke_test.go
  • go/pkg/model/mamba2/token_model.go
  • go/pkg/model/mamba2/token_model_test.go
  • go/pkg/model/mistral/config.go
  • go/pkg/model/mistral/config_bench_test.go
  • go/pkg/model/mistral/config_test.go
  • go/pkg/model/mistral/register.go
  • go/pkg/model/mistral/register_test.go
  • go/pkg/model/mistral/yarn.go
  • go/pkg/model/mistral/yarn_branches_test.go
  • go/pkg/model/mistral/yarn_test.go
  • go/pkg/model/norm_bias.go
  • go/pkg/model/norm_bias_test.go
  • go/pkg/model/quant.go
  • go/pkg/model/quant_bench_test.go
  • go/pkg/model/quant_config.go
  • go/pkg/model/quant_config_test.go
  • go/pkg/model/quant_example_test.go
  • go/pkg/model/quant_test.go
  • go/pkg/model/qwen3/gated_delta.go
  • go/pkg/model/qwen3/gated_delta_test.go
  • go/pkg/model/qwen3/qwen3.go
  • go/pkg/model/qwen3/qwen3_test.go
  • go/pkg/model/qwen3/register.go
  • go/pkg/model/rwkv7/backend.go
  • go/pkg/model/rwkv7/block.go
  • go/pkg/model/rwkv7/block_test.go
  • go/pkg/model/rwkv7/recurrence.go
  • go/pkg/model/rwkv7/recurrence_test.go
  • go/pkg/model/sample.go
  • go/pkg/model/sample_bench_test.go
  • go/pkg/model/sample_example_test.go
  • go/pkg/model/sample_test.go
  • go/pkg/model/token.go
  • go/pkg/model/token_bench_test.go
  • go/pkg/model/token_test.go
  • go/pkg/model/transformer_config.go
  • go/pkg/model/wrapper_names.go
  • go/pkg/native/arch_quant_session_test.go
  • go/pkg/native/arch_session.go
  • go/pkg/native/arch_session_bench_test.go
  • go/pkg/native/arch_session_icb_parity_test.go
  • go/pkg/native/arch_session_retained_test.go
  • go/pkg/native/arch_session_test.go
  • go/pkg/native/assemble_fixture_test.go
  • go/pkg/native/assistant_gguf.go
  • go/pkg/native/assistant_live_test.go
  • go/pkg/native/assistant_load.go
  • go/pkg/native/assistant_load_test.go
  • go/pkg/native/assistant_quant_kv_test.go
  • go/pkg/native/assistant_quant_parity_test.go
  • go/pkg/native/attention.go
  • go/pkg/native/attention_bench_test.go
  • go/pkg/native/attention_test.go
  • go/pkg/native/attn_megakernel_test.go
  • go/pkg/native/audio.go
  • go/pkg/native/audio_attention.go
  • go/pkg/native/audio_attention_bench_test.go
  • go/pkg/native/audio_encoder.go
  • go/pkg/native/audio_f32.go
  • go/pkg/native/audio_features.go
  • go/pkg/native/audio_features_test.go
  • go/pkg/native/audio_helpers_bench_test.go
  • go/pkg/native/audio_helpers_test.go
  • go/pkg/native/audio_test.go
  • go/pkg/native/backend.go
  • go/pkg/native/backend_bench_test.go
  • go/pkg/native/backend_helpers_test.go
  • go/pkg/native/backend_test.go
  • go/pkg/native/bf16.go
  • go/pkg/native/bf16_bench_test.go
  • go/pkg/native/bf16_localize_test.go
  • go/pkg/native/bf16_test.go
  • go/pkg/native/binary.go
  • go/pkg/native/binary_bench_test.go
  • go/pkg/native/binary_test.go
  • go/pkg/native/cast.go
  • go/pkg/native/cast_bench_test.go
  • go/pkg/native/cast_test.go
  • go/pkg/native/chain.go
  • go/pkg/native/chain_bench_test.go
  • go/pkg/native/chain_test.go
  • go/pkg/native/chained_gpu_decode_test.go
  • go/pkg/native/coherency_probe_test.go
  • go/pkg/native/context_scaling_test.go
  • go/pkg/native/conv.go
  • go/pkg/native/conv_test.go
  • go/pkg/native/coverage_guard_test.go
  • go/pkg/native/crossengine_test.go
  • go/pkg/native/decode_batched_ple_test.go
  • go/pkg/native/decode_batched_session.go
  • go/pkg/native/decode_batched_session_bench_test.go
  • go/pkg/native/decode_batched_session_test.go
  • go/pkg/native/decode_forward.go
  • go/pkg/native/decode_forward_arch.go
  • go/pkg/native/decode_forward_arch_bench_test.go
  • go/pkg/native/decode_forward_arch_helpers_test.go
  • go/pkg/native/decode_forward_arch_icb.go
  • go/pkg/native/decode_forward_arch_icb_bench_test.go
  • go/pkg/native/decode_forward_arch_icb_kvheads_test.go
  • go/pkg/native/decode_forward_arch_icb_quant.go
  • go/pkg/native/decode_forward_arch_icb_quant_bench_test.go
  • go/pkg/native/decode_forward_arch_icb_quant_test.go
  • go/pkg/native/decode_forward_arch_icb_test.go
  • go/pkg/native/decode_forward_arch_quant.go
  • go/pkg/native/decode_forward_arch_quant_bench_test.go
  • go/pkg/native/decode_forward_arch_quant_test.go
  • go/pkg/native/decode_forward_arch_scratch.go
  • go/pkg/native/decode_forward_arch_test.go
  • go/pkg/native/decode_forward_bench_test.go
  • go/pkg/native/decode_forward_icb.go
  • go/pkg/native/decode_forward_icb_bench_test.go
  • go/pkg/native/decode_forward_icb_quant.go
  • go/pkg/native/decode_forward_icb_quant_bench_test.go
  • go/pkg/native/decode_forward_icb_quant_test.go
  • go/pkg/native/decode_forward_icb_test.go
  • go/pkg/native/decode_forward_metal_test.go
  • go/pkg/native/decode_forward_quant.go
  • go/pkg/native/decode_forward_quant_bench_test.go
  • go/pkg/native/decode_forward_quant_test.go
  • go/pkg/native/decode_forward_test.go
  • go/pkg/native/decode_norms_test.go
  • go/pkg/native/decode_rope_test.go
  • go/pkg/native/decode_step.go
  • go/pkg/native/decode_step_batched.go
  • go/pkg/native/decode_step_batched_bench_test.go
  • go/pkg/native/decode_step_batched_test.go
  • go/pkg/native/decode_step_bench_test.go
  • go/pkg/native/decode_step_test.go
  • go/pkg/native/device.go
  • go/pkg/native/device_bench_test.go
  • go/pkg/native/device_test.go
  • go/pkg/native/diffusion.go
  • go/pkg/native/diffusion_attention.go
  • go/pkg/native/diffusion_forward.go
  • go/pkg/native/diffusion_session.go
  • go/pkg/native/diffusion_test.go
  • go/pkg/native/dispatch_sink.go
  • go/pkg/native/e4b_nocopy_test.go
  • go/pkg/native/embed_fastpath_test.go
  • go/pkg/native/embed_gather.go
  • go/pkg/native/embed_gather_bench_test.go
  • go/pkg/native/embed_gather_test.go
  • go/pkg/native/embed_lmhead.go
  • go/pkg/native/embed_lmhead_bench_test.go
  • go/pkg/native/embed_lmhead_quant.go
  • go/pkg/native/embed_lmhead_quant_bench_test.go
  • go/pkg/native/embed_lmhead_quant_metal_test.go
  • go/pkg/native/embed_lmhead_quant_test.go
  • go/pkg/native/embed_lmhead_test.go
  • go/pkg/native/encsend.go
  • go/pkg/native/encsend_bench_test.go
  • go/pkg/native/ffn_megakernel_test.go
  • go/pkg/native/gated_delta_backend.go
  • go/pkg/native/gelu.go
  • go/pkg/native/gelu_bench_test.go
  • go/pkg/native/gelu_example_test.go
  • go/pkg/native/gelu_ref_test.go
  • go/pkg/native/gelu_test.go
  • go/pkg/native/gemm_steel.go
  • go/pkg/native/gemm_steel_test.go
  • go/pkg/native/gemv.go
  • go/pkg/native/gemv2_megakernel_test.go
  • go/pkg/native/gemv_bench_test.go
  • go/pkg/native/gemv_test.go
  • go/pkg/native/generate_bf16.go
  • go/pkg/native/generate_bf16_bench_test.go
  • go/pkg/native/generate_bf16_test.go
  • go/pkg/native/generate_text_test.go
  • go/pkg/native/gpu_trace.go
  • go/pkg/native/gridsync_probe_test.go
  • go/pkg/native/head_nocopy.go
  • go/pkg/native/head_nocopy_bench_test.go
  • go/pkg/native/head_nocopy_softcap_bench_test.go
  • go/pkg/native/head_nocopy_test.go
  • go/pkg/native/icb.go
  • go/pkg/native/icb_basic_test.go
  • go/pkg/native/icb_bench_test.go
  • go/pkg/native/icb_debug_test.go
  • go/pkg/native/icb_layer.go
  • go/pkg/native/icb_layer_bench_test.go
  • go/pkg/native/icb_layer_test.go
  • go/pkg/native/icb_nobarrier_test.go
  • go/pkg/native/icb_test.go
  • go/pkg/native/kernels/lthn_attn_megakernel.metal
  • go/pkg/native/kernels/lthn_bf16_scalar.metal
  • go/pkg/native/kernels/lthn_coherency_probe.metal
  • go/pkg/native/kernels/lthn_copy_bf16.metal
  • go/pkg/native/kernels/lthn_embed_gather.metal
  • go/pkg/native/kernels/lthn_ffn_megakernel.metal
  • go/pkg/native/kernels/lthn_gelu_gate_mul.metal
  • go/pkg/native/kernels/lthn_gemv2_megakernel.metal
  • go/pkg/native/kernels/lthn_gridsync_probe.metal
  • go/pkg/native/kernels/lthn_layer_megakernel.metal
  • go/pkg/native/kernels/lthn_moe_router_topk.metal
  • go/pkg/native/kernels/lthn_mul_rows_bf16.metal
  • go/pkg/native/kernels/lthn_ple_slab.metal
  • go/pkg/native/kernels/lthn_q4_lm_head_argmax.metal
  • go/pkg/native/kernels/lthn_qgemv.metal
  • go/pkg/native/kernels/lthn_qgemv_simd.metal
  • go/pkg/native/kernels/lthn_qknorm_rope_bf16.metal
  • go/pkg/native/kernels/lthn_rms_qmv.metal
  • go/pkg/native/kernels/lthn_rmsnorm_residual_bf16.metal
  • go/pkg/native/kernels/lthn_sdpa_multiq.metal
  • go/pkg/native/kernels/lthn_sdpa_multiq_ring.metal
  • go/pkg/native/kernels/lthn_sdpa_paged.metal
  • go/pkg/native/kernels/lthn_vproj_headrms.metal
  • go/pkg/native/kv_contract.go
  • go/pkg/native/kv_contract_test.go
  • go/pkg/native/layer.go
  • go/pkg/native/layer_bench_test.go
  • go/pkg/native/layer_megakernel_test.go
  • go/pkg/native/layer_scalar_bench_test.go
  • go/pkg/native/layer_scalar_test.go
  • go/pkg/native/layer_test.go
  • go/pkg/native/layernorm.go
  • go/pkg/native/layernorm_bench_test.go
  • go/pkg/native/layernorm_metal_test.go
  • go/pkg/native/layernorm_test.go
  • go/pkg/native/load.go
  • go/pkg/native/load_dir_test.go
  • go/pkg/native/load_helpers_test.go
  • go/pkg/native/load_shared.go
  • go/pkg/native/load_shared_bench_test.go
  • go/pkg/native/load_shared_test.go
  • go/pkg/native/load_test.go
  • go/pkg/native/lora_fuse.go
  • go/pkg/native/lora_fuse_test.go
  • go/pkg/native/lora_helpers_test.go
  • go/pkg/native/lthn_kernels.go
  • go/pkg/native/lthn_kernels_bench_test.go
  • go/pkg/native/lthn_kernels_test.go
  • go/pkg/native/mamba2_backend.go
  • go/pkg/native/mamba2_backend_test.go
  • go/pkg/native/matmul_bf16_steel.go
  • go/pkg/native/matmul_bf16_steel_test.go
  • go/pkg/native/matmul_steel.go
  • go/pkg/native/matmul_steel_metal_test.go
  • go/pkg/native/matmul_steel_test.go
  • go/pkg/native/measure.go
  • go/pkg/native/measure_bench_test.go
  • go/pkg/native/measure_test.go
  • go/pkg/native/mistral_session_test.go
  • go/pkg/native/mlp_bf16.go
  • go/pkg/native/mlp_bf16_bench_test.go
  • go/pkg/native/mlp_bf16_test.go
  • go/pkg/native/mlp_block_bf16.go
  • go/pkg/native/mlp_block_bf16_bench_test.go
  • go/pkg/native/mlp_block_bf16_test.go
  • go/pkg/native/mlp_scratch_bench_test.go
  • go/pkg/native/mlp_scratch_test.go
  • go/pkg/native/model.go
  • go/pkg/native/model_quant.go
  • go/pkg/native/model_quant_bench_test.go
  • go/pkg/native/model_quant_test.go
  • go/pkg/native/moe.go
  • go/pkg/native/moe_26b_real_test.go
  • go/pkg/native/moe_bench_test.go
  • go/pkg/native/moe_block.go
  • go/pkg/native/moe_block_bench_test.go
  • go/pkg/native/moe_block_test.go
  • go/pkg/native/moe_quant_test.go
  • go/pkg/native/moe_session_test.go
  • go/pkg/native/moe_test.go
  • go/pkg/native/mtp.go
  • go/pkg/native/mtp_attn.go
  • go/pkg/native/mtp_attn_bench_test.go
  • go/pkg/native/mtp_attn_metal_test.go
  • go/pkg/native/mtp_attn_test.go
  • go/pkg/native/mtp_bench_test.go
  • go/pkg/native/mtp_session_test.go
  • go/pkg/native/native_e2b_real_test.go
  • go/pkg/native/native_tokps_test.go
  • go/pkg/native/nocopy_decode_test.go
  • go/pkg/native/nocopy_matvec.go
  • go/pkg/native/nocopy_matvec_bench_test.go
  • go/pkg/native/nocopy_matvec_test.go
  • go/pkg/native/nocopy_mmap_test.go
  • go/pkg/native/nocopy_weights.go
  • go/pkg/native/nocopy_weights_bench_test.go
  • go/pkg/native/nocopy_weights_test.go
  • go/pkg/native/output_nocopy_test.go
  • go/pkg/native/paged_kv.go
  • go/pkg/native/paged_kv_device.go
  • go/pkg/native/paged_kv_test.go
  • go/pkg/native/parity_test.go
  • go/pkg/native/partial_rotary_decode_test.go
  • go/pkg/native/per_layer_batch.go
  • go/pkg/native/per_layer_batch_bench_test.go
  • go/pkg/native/per_layer_batch_test.go
  • go/pkg/native/per_layer_gate_decode_test.go
  • go/pkg/native/per_layer_gpu.go
  • go/pkg/native/per_layer_gpu_bench_test.go
  • go/pkg/native/per_layer_gpu_test.go
  • go/pkg/native/per_layer_input.go
  • go/pkg/native/per_layer_input_bench_test.go
  • go/pkg/native/per_layer_input_test.go
  • go/pkg/native/per_layer_inputs_test.go
  • go/pkg/native/per_layer_session_test.go
  • go/pkg/native/piece_timing.go
  • go/pkg/native/piece_timing_test.go
  • go/pkg/native/pinned_nocopy_test.go
  • go/pkg/native/pool.go
  • go/pkg/native/pool_bench_test.go
  • go/pkg/native/pool_test.go
  • go/pkg/native/profile.go
  • go/pkg/native/profile_bench_test.go
  • go/pkg/native/profile_test.go
  • go/pkg/native/projector.go
  • go/pkg/native/projector_bench_test.go
  • go/pkg/native/projector_test.go
  • go/pkg/native/prompt_cache.go
  • go/pkg/native/prompt_cache_bench_test.go
  • go/pkg/native/prompt_cache_ple_bench_test.go
  • go/pkg/native/prompt_cache_ple_test.go
  • go/pkg/native/prompt_cache_test.go
  • go/pkg/native/q4_icb_localize_test.go
  • go/pkg/native/qgemv_test.go
  • go/pkg/native/qknorm_rope.go
  • go/pkg/native/qknorm_rope_bench_test.go
  • go/pkg/native/qknorm_rope_test.go
  • go/pkg/native/qmv.go
  • go/pkg/native/qmv_bench_test.go
  • go/pkg/native/qmv_gather.go
  • go/pkg/native/qmv_head_bench_test.go
  • go/pkg/native/qmv_metal_test.go
  • go/pkg/native/qmv_test.go
  • go/pkg/native/qwen3_gated_delta_backend_test.go
  • go/pkg/native/real_e2b_assistant_bench_test.go
  • go/pkg/native/real_e2b_contract_bench_test.go
  • go/pkg/native/real_e2b_decode_bench_test.go
  • go/pkg/native/real_e2b_prefill_bench_test.go
  • go/pkg/native/registry_arches_test.go
  • go/pkg/native/repro_test.go
  • go/pkg/native/rms_qmv.go
  • go/pkg/native/rms_qmv_bench_test.go
  • go/pkg/native/rms_qmv_test.go
  • go/pkg/native/rmsnorm.go
  • go/pkg/native/rmsnorm_bench_test.go
  • go/pkg/native/rmsnorm_residual.go
  • go/pkg/native/rmsnorm_residual_bench_test.go
  • go/pkg/native/rmsnorm_residual_test.go
  • go/pkg/native/rmsnorm_test.go
  • go/pkg/native/rope.go
  • go/pkg/native/rope_bench_test.go
  • go/pkg/native/rope_dims_bench_test.go
  • go/pkg/native/rope_dims_test.go
  • go/pkg/native/rope_freqs.go
  • go/pkg/native/rope_freqs_bench_test.go
  • go/pkg/native/rope_freqs_session_test.go
  • go/pkg/native/rope_freqs_test.go
  • go/pkg/native/rope_test.go
  • go/pkg/native/roundtrip_test.go
  • go/pkg/native/router.go
  • go/pkg/native/router_bench_test.go
  • go/pkg/native/router_test.go
  • go/pkg/native/rwkv7_backend.go
  • go/pkg/native/rwkv7_backend_test.go
  • go/pkg/native/scheme.go
  • go/pkg/native/scheme_bench_test.go
  • go/pkg/native/scheme_test.go
  • go/pkg/native/sdpa.go
  • go/pkg/native/sdpa_2pass_test.go
  • go/pkg/native/sdpa_bench_test.go
  • go/pkg/native/sdpa_multiq.go
  • go/pkg/native/sdpa_multiq_ring.go
  • go/pkg/native/sdpa_paged.go
  • go/pkg/native/sdpa_paged_test.go
  • go/pkg/native/sdpa_test.go
  • go/pkg/native/session_kv_snapshot.go
  • go/pkg/native/session_kvconv_test.go
  • go/pkg/native/session_name_guard_test.go
  • go/pkg/native/session_state.go
  • go/pkg/native/session_state_bench_test.go
  • go/pkg/native/session_state_blocks.go
  • go/pkg/native/session_state_test.go
  • go/pkg/native/softcap_test.go
  • go/pkg/native/softmax.go
  • go/pkg/native/softmax_bench_test.go
  • go/pkg/native/softmax_metal_test.go
  • go/pkg/native/softmax_test.go
  • go/pkg/native/spike_e2b_bench_test.go
  • go/pkg/native/step_greedy_test.go
  • go/pkg/native/test_helpers_test.go
  • go/pkg/native/testmain_test.go
  • go/pkg/native/token_model.go
  • go/pkg/native/token_model_bench_test.go
  • go/pkg/native/token_model_quant_bench_test.go
  • go/pkg/native/token_model_test.go
  • go/pkg/native/train_backward.go
  • go/pkg/native/train_backward_test.go
  • go/pkg/native/train_fullstack_test.go
  • go/pkg/native/train_guard_test.go
  • go/pkg/native/train_lora.go
  • go/pkg/native/train_lora_test.go
  • go/pkg/native/train_optim.go
  • go/pkg/native/train_optim_test.go
  • go/pkg/native/train_projlora_test.go
  • go/pkg/native/train_realsession_test.go
  • go/pkg/native/train_session.go
  • go/pkg/native/train_session_bench_test.go
  • go/pkg/native/train_session_test.go
  • go/pkg/native/train_stack_test.go
  • go/pkg/native/turboquant_kv_payload.go
  • go/pkg/native/unary.go
  • go/pkg/native/unary_bench_test.go
  • go/pkg/native/unary_test.go
  • go/pkg/native/value_norm_test.go
  • go/pkg/native/vision.go
  • go/pkg/native/vision_bench_test.go
  • go/pkg/native/vision_features.go
  • go/pkg/native/vision_features_test.go
  • go/pkg/native/vision_helpers_test.go
  • go/pkg/native/vision_test.go
  • go/pkg/native/vproj_headrms.go
  • go/pkg/native/vproj_headrms_bench_test.go
  • go/pkg/native/vproj_headrms_test.go
  • go/pkg/native/zz_cover_encode_test.go
  • go/pkg/native/zz_cover_ensureinit_test.go
  • go/pkg/native/zz_cover_icb_test.go
  • go/pkg/native/zz_cover_inputs_test.go
  • go/pkg/native/zz_cover_load_test.go
  • go/pkg/native/zz_cover_misc_test.go
  • go/pkg/native/zz_cover_wronglib_test.go
  • go/pkg/safetensors/safetensors.go
  • go/pkg/safetensors/safetensors_bench_test.go
  • go/pkg/safetensors/safetensors_extra_test.go
  • go/pkg/safetensors/safetensors_mmap.go
  • go/pkg/safetensors/safetensors_mmap_extra_test.go
  • go/pkg/safetensors/safetensors_mmap_fault_test.go
  • go/pkg/safetensors/safetensors_mmap_other.go
  • go/pkg/safetensors/safetensors_mmap_test.go
  • go/pkg/safetensors/safetensors_real_gemma4_test.go
  • go/pkg/safetensors/safetensors_test.go
  • go/pkg/safetensors/sharded.go
  • go/pkg/safetensors/sharded_extra_test.go
  • go/pkg/safetensors/sharded_fault_test.go
  • go/pkg/safetensors/sharded_test.go
  • go/pkg/scheme/builtin.go
  • go/pkg/scheme/scheme.go
  • go/pkg/scheme/scheme_bench_test.go
  • go/pkg/scheme/scheme_compat_test.go
  • go/pkg/scheme/scheme_cover_test.go
  • go/pkg/scheme/scheme_test.go
  • go/pkg/score/authority.go
  • go/pkg/score/authority_example_test.go
  • go/pkg/score/authority_test.go
  • go/pkg/score/cmudict.go
  • go/pkg/score/cmudict_example_test.go
  • go/pkg/score/cmudict_test.go
  • go/pkg/score/corpus_probe_test.go
  • go/pkg/score/coverage_internal_test.go
  • go/pkg/score/data/cmudict_starter.txt
  • go/pkg/score/dialect.go
  • go/pkg/score/dialect_example_test.go
  • go/pkg/score/dialect_test.go
  • go/pkg/score/differential.go
  • go/pkg/score/differential_example_test.go
  • go/pkg/score/differential_test.go
  • go/pkg/score/helpers_coverage_test.go
  • go/pkg/score/hostility.go
  • go/pkg/score/hostility_example_test.go
  • go/pkg/score/hostility_test.go
  • go/pkg/score/lek.go
  • go/pkg/score/lek_coverage_test.go
  • go/pkg/score/lek_example_test.go
  • go/pkg/score/lek_test.go
  • go/pkg/score/metaphone.go
  • go/pkg/score/metaphone_coverage_test.go
  • go/pkg/score/metaphone_example_test.go
  • go/pkg/score/metaphone_test.go
  • go/pkg/score/pattern.go
  • go/pkg/score/pattern_example_test.go
  • go/pkg/score/pattern_test.go
  • go/pkg/score/phonetic_dims.go
  • go/pkg/score/phonetic_dims_bench_test.go
  • go/pkg/score/phonetic_dims_example_test.go
  • go/pkg/score/phonetic_dims_test.go
  • go/pkg/score/result.go
  • go/pkg/score/result_example_test.go
  • go/pkg/score/result_test.go
  • go/pkg/score/score_path_bench_test.go
  • go/pkg/score/scorer.go
  • go/pkg/score/scorer_example_test.go
  • go/pkg/score/scorer_test.go
  • go/pkg/score/sycophancy.go
  • go/pkg/score/sycophancy_example_test.go
  • go/pkg/score/sycophancy_test.go
  • go/pkg/score/types.go
  • go/pkg/score/types_example_test.go
  • go/pkg/score/types_test.go
  • go/pkg/tokenizer/tokenizer.go
  • go/pkg/tokenizer/tokenizer_bench_test.go
  • go/pkg/tokenizer/tokenizer_coverage_test.go
  • go/pkg/tokenizer/tokenizer_example_test.go
  • go/pkg/tokenizer/tokenizer_real_gemma4_test.go
  • go/pkg/tokenizer/tokenizer_test.go
  • go/primitives.go
  • go/primitives_example_test.go
  • go/primitives_extra_test.go
  • go/primitives_test.go
  • go/probe.go
  • go/probe_test.go
  • go/prompt_cache.go
  • go/quant/jang/jang.go
  • go/quant/jang/jang_bench_test.go
  • go/quant/jang/jang_coverage_test.go
  • go/quant/jang/jang_example_test.go
  • go/quant/jang/jang_test.go
  • go/register_metal.go
  • go/register_metal_cache.go
  • go/register_metal_example_test.go
  • go/register_metal_parser.go
  • go/register_metal_scheduler.go
  • go/register_metal_stub.go
  • go/register_metal_stub_example_test.go
  • go/register_metal_stub_test.go
  • go/register_metal_test.go
  • go/register_native.go
  • go/register_native_eval.go
  • go/register_native_extra_test.go
  • go/register_native_lora.go
  • go/register_native_lora_test.go
  • go/register_native_parser.go
  • go/register_native_prompt_cache_bench_test.go
  • go/register_native_prompt_cache_test.go
  • go/register_native_scheduler.go
  • go/register_native_seed_test.go
  • go/register_native_session_test.go
  • go/register_native_suppress_test.go
  • go/register_native_thinking_budget.go
  • go/register_native_thinking_budget_test.go
  • go/register_native_vision_cache.go
  • go/reserialize_model_eval_test.go
  • go/serve_turn_phase_split_live_test.go
  • go/session.go
  • go/session_agent.go
  • go/session_agent_bench_test.go
  • go/session_agent_live_test.go
  • go/session_agent_test.go
  • go/session_artifact.go
  • go/session_artifact_example_test.go
  • go/session_artifact_test.go
  • go/session_bench_test.go
  • go/session_continuity_live_bench_test.go
  • go/session_darwin.go
  • go/session_darwin_example_test.go
  • go/session_darwin_test.go
  • go/session_defaults.go
  • go/session_defaults_example_test.go
  • go/session_defaults_test.go
  • go/session_example_test.go
  • go/session_stub_example_test.go
  • go/session_test.go
  • go/sft.go
  • go/sft_darwin.go
  • go/sft_darwin_test.go
  • go/sft_example_test.go
  • go/sft_runner_test.go
  • go/sft_smoke_test.go
  • go/sft_stub.go
  • go/sft_test.go
  • go/shape.go
  • go/shape_bench_test.go
  • go/shape_test.go
  • go/specprofile/profile.go
  • go/specprofile/profile_cover_test.go
  • go/specprofile/profile_test.go
  • go/speculative.go
  • go/speculative_bench_test.go
  • go/speculative_example_test.go
  • go/speculative_live_bench_test.go
  • go/speculative_live_test.go
  • go/speculative_test.go
  • go/speculative_textmodel.go
  • go/spine/lora_config.go
  • go/spine/lora_config_bench_test.go
  • go/spine/lora_config_example_test.go
  • go/spine/lora_config_test.go
  • go/spine/metal_convert.go
  • go/spine/metal_convert_test.go
  • go/spine/model_info.go
  • go/spine/model_info_bench_test.go
  • go/spine/model_info_example_test.go
  • go/spine/model_info_test.go
  • go/spine/prompt.go
  • go/spine/prompt_bench_test.go
  • go/spine/prompt_example_test.go
  • go/spine/prompt_test.go
  • go/spine/spine.go
  • go/spine/spine_bench_test.go
  • go/spine/spine_example_test.go
  • go/spine/spine_test.go
  • go/spine/token.go
  • go/spine/tokenizer.go
  • go/spine/tokenizer_bench_test.go
  • go/spine/tokenizer_example_test.go
  • go/spine/tokenizer_test.go
  • go/split_cpu_ffn.go
  • go/split_cpu_ffn_bench_test.go
  • go/split_cpu_ffn_kernels.go
  • go/split_cpu_ffn_kernels_test.go
  • go/split_cpu_ffn_test.go
  • go/split_executor.go
  • go/split_executor_test.go
  • go/split_native_runtime.go
  • go/split_native_runtime_bench_test.go
  • go/split_remote_ffn.go
  • go/split_remote_ffn_bench_test.go
  • go/split_remote_ffn_test.go
  • go/ssd.go
  • go/ssd_example_test.go
  • go/ssd_extra_test.go
  • go/ssd_test.go
  • go/state_bundle.go
  • go/state_bundle_example_test.go
  • go/state_bundle_test.go
  • go/state_chapter_smoke.go
  • go/state_chapter_smoke_bench_test.go
  • go/substrate/condition.go
  • go/substrate/condition_bench_test.go
  • go/substrate/condition_example_test.go
  • go/substrate/condition_test.go
  • go/substrate/substrate_bench_test.go
  • go/substrate_parity_test.go
  • go/testhelpers_test.go
  • go/tests/cli/mlx/Taskfile.yaml
  • go/tests/cli/violet/main.go
  • go/tests/cli/violet/main_example_test.go
  • go/tests/cli/violet/main_test.go
  • go/thinking.go
  • go/thinking_bench_test.go
  • go/thinking_darwin_test.go
  • go/thinking_example_test.go
  • go/thinking_test.go
  • go/tokenizer.go
  • go/tokenizer_common.go
  • go/tokenizer_common_example_test.go
  • go/tokenizer_common_test.go
  • go/tokenizer_example_test.go
  • go/tokenizer_test.go
  • go/train/branch_coverage_test.go
  • go/train/capture.go
  • go/train/capture_example_test.go
  • go/train/capture_test.go
  • go/train/dataset_stream.go
  • go/train/dataset_stream_bench_test.go
  • go/train/dataset_stream_example_test.go
  • go/train/dataset_stream_test.go
  • go/train/score_cascade.go
  • go/train/score_cascade_example_test.go
  • go/train/score_cascade_test.go
  • go/train/sft.go
  • go/train/sft_batch.go
  • go/train/sft_batch_bench_test.go
  • go/train/sft_batch_example_test.go
  • go/train/sft_batch_test.go
  • go/train/sft_bench_test.go
  • go/train/sft_buildexample_test.go
  • go/train/sft_checkpoint.go
  • go/train/sft_checkpoint_bench_test.go
  • go/train/sft_checkpoint_example_test.go
  • go/train/sft_checkpoint_test.go
  • go/train/sft_epoch.go
  • go/train/sft_epoch_bench_test.go
  • go/train/sft_epoch_branches_test.go
  • go/train/sft_epoch_example_test.go
  • go/train/sft_epoch_metal_test.go
  • go/train/sft_epoch_test.go
  • go/train/sft_example_test.go
  • go/train/sft_test.go
  • go/train/ssd.go
  • go/train/ssd_eval.go
  • go/train/ssd_eval_branches_test.go
  • go/train/ssd_eval_example_test.go
  • go/train/ssd_eval_test.go
  • go/train/ssd_example_test.go
  • go/train/ssd_test.go
  • go/train/val.go
  • go/train/val_example_test.go
  • go/train/val_test.go
  • go/training.go
  • go/training_example_test.go
  • go/training_stub.go
  • go/training_stub_example_test.go
  • go/training_stub_test.go
  • go/training_test.go
  • go/unsupported_stub_test.go
  • go/workload_bench.go
  • go/workload_bench_example_test.go
  • go/workload_bench_test.go
  • lib/mlx
  • lib/mlx-c
  • patches/mlx-metal-device-empty-list.patch
  • patches/mlx-sdpa-vector-512.patch
  • scripts/coverage.sh
  • scripts/cpp-coverage.sh
  • scripts/cpp-kernel-coverage.sh
  • scripts/gemma4_context_ramp.sh
  • scripts/make-app-bundle.sh
  • scripts/make-pkg-installer.sh
  • scripts/native-smoke.sh
  • scripts/state_book_from_phase0.py
  • scripts/substrate_shift_capture.py
  • scripts/sync-frontend-dist.sh
  • scripts/verify_production_benchmark_manifest.sh
  • sonar-project.properties
  • tests/cpp/CMakeLists.txt
  • tests/cpp/activation_bridge_tests.cpp
  • tests/cpp/decode_bridge_tests.cpp
  • tests/cpp/lm_head_topk_bridge_tests.cpp
  • tests/cpp/tests_main.cpp




def load_phase0(path: Path) -> list[dict[str, str]]:
    entries = json.loads(path.read_text(encoding="utf-8"))

    distractors: list[dict[str, str]],
    turn_sections: list[str],
) -> dict[str, Path]:
    out_dir.mkdir(parents=True, exist_ok=True)
Comment on lines +317 to +324
    result = subprocess.run(
        command,
        check=False,
        cwd=args.run_dir,
        stdout=stdout,
        stderr=stderr,
        env=env,
    )


def append_manifest(manifest_path: Path, row: dict) -> None:
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    with manifest_path.open("a", encoding="utf-8") as handle:
        raise ValueError("--count must be >= 1")
    if args.count > 1 and args.seed_id:
        raise ValueError("--seed-id can only be used with --count 1")
    args.run_dir.mkdir(parents=True, exist_ok=True)
    args.book_dir.mkdir(parents=True, exist_ok=True)
Snider and others added 9 commits June 23, 2026 20:39
…512 TGs (the megakernel is viable)

The full-layer decode megakernel (the only path to 300+ tok/s, since the FFN-fusion ceiling is ~190 tok/s and barriers
are uniform) needs a device-wide grid barrier — but Metal doesn't guarantee threadgroup co-residency, so too
many TGs would deadlock an atomic spin. lthn_gridsync_probe (bounded spin → detects would-be-deadlock without
hanging) + TestGridSyncFeasibility sweep the count:

  32–512 TGs @ 256 threads → all reach the barrier (GRID-SYNC OK)
  1024 TGs → only 179 co-resident → WOULD DEADLOCK

So the ceiling is 512 threadgroups (131K threads) — far more than enough to saturate memory bandwidth for the
decode gemvs. The grid barrier the megakernel needs WORKS on this GPU. Gated behind LEM_GRIDSYNC_PROBE.
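A CPU analogy may make the co-residency argument concrete. In this Python sketch a semaphore stands in for the GPU's limit on co-resident threadgroups and each thread plays one threadgroup; `gridsync_probe`, its timeouts and the thread-per-group model are hypothetical and are not how lthn_gridsync_probe is written. A group that never becomes resident can never arrive, so residents spinning on the barrier would wait forever; the spin budget turns that would-be deadlock into a reported arrival count.

```python
import threading
import time

def gridsync_probe(num_groups, max_resident, spin_timeout=1.0, launch_timeout=0.3):
    """Count how many 'threadgroups' reach a grid barrier before a bounded spin gives up.

    CPU analogy only: `max_resident` models how many groups the device keeps
    co-resident. Returns (arrived, ok) where ok means every group arrived.
    """
    slots = threading.BoundedSemaphore(max_resident)
    lock = threading.Lock()
    state = {"arrived": 0}

    def group():
        # A group that cannot become resident gives up well before residents stop spinning.
        if not slots.acquire(timeout=launch_timeout):
            return
        try:
            with lock:
                state["arrived"] += 1            # atomic arrive on the barrier counter
            deadline = time.monotonic() + spin_timeout
            while time.monotonic() < deadline:   # bounded spin instead of an unbounded wait
                with lock:
                    if state["arrived"] == num_groups:
                        break
                time.sleep(0.001)
        finally:
            slots.release()

    threads = [threading.Thread(target=group) for _ in range(num_groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    arrived = state["arrived"]
    return arrived, arrived == num_groups
```

With the margins chosen here (non-resident groups time out long before residents stop spinning), 4 groups on 4 slots report every arrival, while 8 groups on 4 slots report only 4: the same shape as 1024 TGs finding only 179 co-resident.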
…barrier, no external drain

lthn_gemv2_megakernel computes out = W2·(W1·x) in ONE dispatch with a device-wide grid barrier between the
two dependent gemvs (each TG-leader arrives on an atomic counter + spins; mem_device fences flush stage-1
writes and make them visible to stage-2's cross-TG reads) — instead of an external ICB SetBarrier full-drain.
TestGemv2Megakernel matches the host two-gemv reference (cosine >=0.9999), proving the two primitives a
full-layer decode megakernel rests on: the grid sync (≤512 TGs, already verified) AND cross-threadgroup
coherency. This is the foundation the 300+ path builds on — the whole decode layer in one dispatch with
internal grid-syncs replacing the ~15 external barriers/layer (the FFN-fusion ceiling was ~190 because the
barriers are spread layer-wide, so the win needs the full layer fused, not a region). Next: 4-bit dequant +
the real layer ops on this pattern.
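The two-stage pattern can be sketched on the CPU with worker threads as stand-ins for threadgroups and `threading.Barrier` as the grid barrier. This is an illustration under those assumptions (the name `gemv2_megakernel` is reused only for readability): stage 2 reads every intermediate element, including ones written by other workers, which is why the barrier must order all stage-1 writes first. CPython's shared memory makes those cross-worker reads coherent for free, which is exactly the property the later commits find Metal does not guarantee.

```python
import threading

def gemv2_megakernel(w1, w2, x, num_groups=4):
    """out = W2 @ (W1 @ x) computed by num_groups workers separated by one grid barrier."""
    hidden = [0.0] * len(w1)
    out = [0.0] * len(w2)
    barrier = threading.Barrier(num_groups)

    def rows_for(group, n):
        # Contiguous, near-equal slice of n rows owned by this group.
        return range(group * n // num_groups, (group + 1) * n // num_groups)

    def worker(group):
        for r in rows_for(group, len(w1)):       # stage 1: this group's rows of W1 @ x
            hidden[r] = sum(w * v for w, v in zip(w1[r], x))
        barrier.wait()                            # grid barrier: all of `hidden` is written
        for r in rows_for(group, len(w2)):       # stage 2: reads ALL of `hidden`
            out[r] = sum(w * h for w, h in zip(w2[r], hidden))

    threads = [threading.Thread(target=worker, args=(g,)) for g in range(num_groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

For W1 = [[1, 2], [3, 4], [0.5, -1]], x = [1, 1] the intermediate is [3, 7, -0.5], and W2 = [[1, 0, 1], [0, 1, -2]] then gives [2.5, 8.0] regardless of how rows are split across groups.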
…cal to the steel qmv

lthn_qgemv computes out[o] = Σ_k (scale_og·code_ok + bias_og)·x[k] with the SAME affine dequant as the
embed-gather's verified 4-bit path (low nibble even k, high odd; affine params hoisted per group), one
thread per output row. The decode's matmuls use MLX's steel affine_quantized gemv (simd-cooperative, tiled)
which a megakernel can't call — this is the gemv the megakernel inlines instead. TestQGemvMatchesSteel:
cosine=1.000000 vs QMVBF16 (the steel kernel) on the same packed weight — the nibble/group layout matches
and the dequant is sound (token-identical; reduction order differs so not guaranteed byte-identical at scale).

All three megakernel primitives are now proven: grid-sync (512 TGs), the 2-gemv+grid-barrier pattern, and
this 4-bit gemv. Next: combine them — a 2-stage 4-bit megakernel, then gelu, then the real layer stages.
…dispatch

lthn_ffn_megakernel does gemma's MLP — gate=qgemv(Wg,x), up=qgemv(Wu,x), gated=gelu(gate)·up, [in-kernel
grid barrier], down=qgemv(Wd,gated) — in ONE dispatch, replacing the decode's three barriered ICB ops
(gate/up + gelu·up + down) with stages separated by the proven device-wide grid barrier instead of external
SetBarrier full-drains. Inlines the verified 4-bit affine dequant gemv + the gelu matching
lthn_gelu_gate_mul_bf16 (gate/up rounded to bf16 before the gelu, as the separate-op path does).

TestFFNMegakernel: stage-1 cosine 1.000000 vs the reference — proves the STRUCTURE end-to-end through the
grid barrier (gate/up/gelu + cross-TG coherency all exact). Stage-2 down tracks the steel qmv at cosine 1.0
on well-conditioned input (TestQGemvMatchesSteel) but the simple sequential reduction diverges to 0.99 on a
pathological random-weight gated distribution — a reduction-order sensitivity (benign on real e2b weights).
The robust-precision next step is a simd-cooperative qgemv reduction matching the steel order, then stack the
attention stages ahead of this and token-validate on real e2b.
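The MLP the megakernel fuses can be written as a plain reference, which also shows where the grid barrier has to sit: gate/up/gelu are row-local, but every down-projection row needs every gated element. This sketch assumes the tanh approximation of GELU for Gemma's MLP and float math instead of bf16 rounding; `gated_ffn` and the helpers are hypothetical names.

```python
import math

def gelu_tanh(v):
    """tanh-approximated GELU (assumed to be the activation used here)."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def matvec(w, x):
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def gated_ffn(w_gate, w_up, w_down, x):
    """down(gelu(gate(x)) * up(x))."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    gated = [gelu_tanh(g) * u for g, u in zip(gate, up)]   # stage 1: elementwise per row
    # The in-kernel grid barrier sits here: each down row reads ALL of `gated`,
    # including elements produced by distant threadgroups.
    return matvec(w_down, gated)
```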
…diagnosis (it's coherency)

lthn_qgemv_simd: one 32-lane simd group per output, lanes split the reduction + simd_sum combines — the
SIMD-tree reduction that tracks MLX's steel qmv order, the robust gemv the megakernel inlines.

But building it surfaced that my earlier diagnosis was WRONG. TestQGemvSimdBeatsSequentialOnGated runs the
exact FFN down-over-gelu·mul case and BOTH the sequential and simd gemvs match the steel qmv at cosine
0.999999 — so the gemv reduction was never the FFN megakernel's 0.99. The real cause: the in-kernel grid
barrier's cross-TG MEMORY COHERENCY. Stage-1 gated copied out AFTER the kernel is exact (cosine 1.0), but
stage-2 read STALE gated for elements written by DISTANT threadgroups (the gemv2 megakernel passed because
its readers overlap its 2 writing TGs; the FFN's readers don't overlap its 4). Metal has no device-wide
fence beyond threadgroup_barrier and only memory_order_relaxed atomics — so distant-TG writes aren't reliably
visible after the grid barrier. THIS is the megakernel's real blocker, surfaced honestly. The simd gemv
stands as the correct reduction; the next block is a robust cross-TG-coherent grid barrier (or the approach
caps here on Metal).
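The simd-cooperative reduction splits the dot product across 32 lanes and then combines the lane partials. The sketch below assumes a strided lane assignment and an xor-butterfly combine; MSL's simd_sum leaves its internal order to the hardware, so this is one plausible order, not the one the GPU is guaranteed to use. `simd_reduce_dot` is a hypothetical name.

```python
def simd_reduce_dot(w, x, lanes=32):
    """Dot product in a 32-lane simd style: strided per-lane partials, then a butterfly.

    `lanes` must be a power of two.
    """
    partial = [0.0] * lanes
    for k, (a, b) in enumerate(zip(w, x)):
        partial[k % lanes] += a * b          # lane j accumulates k = j, j+32, j+64, ...
    step = lanes // 2
    while step:                              # log2(lanes) xor-butterfly rounds
        partial = [partial[j] + partial[j ^ step] for j in range(lanes)]
        step //= 2
    return partial[0]                        # every lane now holds the full sum
```

With small integer-valued inputs every intermediate is exact, so the result matches a sequential sum bit for bit; with real bf16/float data the two orders can differ in the last bits, which is the reduction-order sensitivity the commit ruled out as the cause of the 0.99.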
…unt-bound, no fat gemv (megakernel coherency proven hard)

Two findings that close the perf-kernel area on e2b decode with evidence:

1. TestRealE2BWithinLayerOpCost — times each ICB op as its own command
   buffer (GPUEndTime-GPUStartTime; gemv timing is value-independent, so
   stale intermediates don't corrupt it). The per-op histogram shows every
   op at 7-23µs with SDPA the only outlier (23µs) — the fat gemvs
   (q/o/gate/up/down) do NOT stand out. e2b's 4-bit per-layer weights are
   tiny; the cost is op-COUNT-dominated (a ~7-10µs per-dispatch floor),
   not op-SIZE. So dispatch-count reduction is the lever, not gemv tiling —
   but the elementwise glue (gelu, rms+residual, qknorm+rope) is already
   fused, leaving a near-irreducible serial chain ~9-10 barriers/layer.

2. FFN megakernel coherency is a hard Metal limit, now proven a SECOND way:
   `volatile` on the cross-TG `gated` buffer (stage-1 write + stage-2 read)
   left stage-2 bit-identical at 0.990169 — not a compiler-caching artifact.
   Combined with the absent device-wide fence, the grid-barrier megakernel
   cannot do a coherent cross-TG reduction read in one dispatch. The
   300-via-megakernel path is hardware-blocked.

Net: 180 tok/s is near the coherency-free single-token-decode ceiling for
this approach. Also: scoped MemoryBarrierWithScope replay measured SLOWER
(112.8 vs 180.7) — that lever is dead too. 300 needs either cross-TG
coherency (Metal can't) or the MTP speculative lane (gemma4 ships drafters).

Co-Authored-By: Virgil <virgil@lethean.io>
…~15x slower than it should be

Profiling the real gemma-4-26B-A4B-it-qat-4bit on the no-cgo path (gated
LEM_REAL_MOE) found the big inefficiency the dense-decode polishing missed:

  native MoE decode  = 7.8 tok/s (127 ms/token)
  cgo pkg/metal      ≈ 114 tok/s  (the engine pkg/native is replacing)
  4B-active bandwidth ceiling ≈ 400 tok/s

The CPU profile is 71% cgocall (GPU-call wait) — NOT a compute wall but
hundreds of tiny host-synced dispatches per token: the MoE arch can't use
the recorded-ICB replay (the router top-k forces a host readback), so the
re-encode path does a per-MoE-layer Commit+Wait for the attention flush AND
MoEBlockQuant fans out into ~a dozen more separately-synced GPU command
buffers (router, 5x rmsNormView, dense branch, per-expert dispatch, 2x Add),
every layer, every token. The host serialises the GPU instead of feeding it.

This is GOAL.md's documented "Next: ICB MoE dispatch" and the real headroom:
the dense decode is at its GPU-dispatch ceiling (megakernel coherency-blocked),
but the MoE path bleeds ~15x to host-orchestration that a GPU-resident,
single-encoder MoE block would reclaim.

Co-Authored-By: Virgil <virgil@lethean.io>
…odels are OFF the ICB fast path

LEM_PROFILE_DIR aims the op-cost instrument (tok/s + gpu-busy + per-op +
ICB-rejection geometry dump) at any model snapshot, not just e2b. Pointing it
at gemma-4-12B-it-4bit found the systemic weakness behind native losing to
llama.cpp on the big models:

  12B dense decode = 51 tok/s, gpu-busy 0%, NO recorded ICB
    arch heads=16 kvHeads=8 headDim=256, 48 layers
    per-layer kvHeads = {1: 8, 8: 40}   <- 8 global layers use kvHeads=1 (MQA)
    per-layer headDim = {256: 40, 512: 8}

icbEligible rejects any layer where kvHeads != arch.nKVHeads, so the 8
multi-query global layers kick the WHOLE model off the ICB fast path onto the
host re-encode (51 tok/s vs llama.cpp 64). e2b stays on ICB because its KV
heads ARE uniform — it only varies headDim, which the ICB already records
per-layer. The ICB fast path therefore only covers e2b/e4b; 12B (51), 26B-MoE
(7.8) and 31B (~21) all fall onto the slow path. THAT is the real coverage gap
(GOAL.md 91.5%), not the dense-decode kernel ceiling I was polishing.

Fix is scoped: the ICB already does per-layer headDim (e2b global 512 vs
sliding 256); extend the SDPA PSO + GQA buffer + cache rowBytes to per-layer
kvHeads the same way, and the 12B/31B join the fast path.
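The eligibility gap can be reproduced with the 12B geometry from the dump above. This sketch assumes one global (kvHeads=1, headDim=512) layer every sixth position, which matches the 8-of-48 count but may not be the checkpoint's exact layout, and assumes bf16 (2-byte) cache rows; `icb_eligible_uniform` and `icb_record_plan` are hypothetical names for the old rule and the proposed per-layer recording.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LayerGeom:
    kv_heads: int
    head_dim: int

def icb_eligible_uniform(layers, arch_kv_heads):
    """Old rule: any layer whose kvHeads differs from the arch value rejects the whole model."""
    return all(layer.kv_heads == arch_kv_heads for layer in layers)

def icb_record_plan(layers, bytes_per_elem=2):
    """Per-layer recording: each layer keeps its own kvHeads, headDim and cache row bytes."""
    return [(l.kv_heads, l.head_dim, l.kv_heads * l.head_dim * bytes_per_elem) for l in layers]

# Assumed 12B layout: sliding layers (8 kv heads, 256 dims), every 6th layer global (MQA, 512 dims).
layers_12b = [LayerGeom(1, 512) if i % 6 == 5 else LayerGeom(8, 256) for i in range(48)]
```

Under the uniform rule the eight MQA layers reject the model outright; the per-layer plan simply records a 1024-byte row for those layers and a 4096-byte row for the rest, the same way per-layer headDim is already handled.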

Co-Authored-By: Virgil <virgil@lethean.io>
…is at the alloc floor

BenchmarkSpikeE2BReplayOnly records the ICB ONCE then replays a single token
per b.N, isolating steady-state per-token cost (the existing spike benches
re-record + replay all 64 tokens every iteration, burying the per-token figure
under one-time recording + fixture build).

Result: 2553 allocs/op, 120 KB/op per replayed token. The top native allocator by flat
count is stepBodyResult at 342: the inherent output copy + per-layer cache-rebind
bookkeeping. go-mlx's own replay code is at the floor, with no per-token leak. The
remaining ~2200 allocs/token are the purego Metal bridge (per-owns-cache-layer
SetKernelBufferOffsetAtIndex rebinds hitting objc.Send's slow reflect path
because tryFastArgs doesn't unwrap IDGetter). The bridge fast-path fix is the
only alloc lever; our code isn't leaking per token.

Co-Authored-By: Virgil <virgil@lethean.io>
Snider and others added 30 commits July 3, 2026 20:58
…atch

lthn_sdpa_multiq_bf16_{64,128,256,512}: MLX's sdpa_vector loop (vendored
lib/mlx sdpa_vector.h) with the query batch on grid Y, specialised for the
batched pass: N binds the total live length and each query s uses key i iff
i <= N-K+s — upstream's do_causal cap, which is exactly the fold's per-row
length cap, so causality needs no mask storage. Queries AND out are
query-major (the engine's slab layout feeding the batched O-projection);
upstream writes out head-major — the one divergence. Mask/sink branches
stripped (the engine never binds them). Skipped keys touch no accumulator
and used keys stride in the same simd_gid sequence, so each row's output is
byte-identical to K single-query dispatches.
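The length cap means causality falls out of a per-query key count rather than a stored mask: with N live keys and K new queries, query s attends to keys 0..N-K+s, exactly the cache a sequential decode would have had after landing row s. A float reference (no bf16, scale 1/√d, hypothetical function names) shows the batched form reproduces the per-query calls.

```python
import math

def sdpa_one(q, keys, values, n_keys):
    """Single-query softmax attention over the first n_keys keys."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, keys[i])) / math.sqrt(d) for i in range(n_keys)]
    m = max(scores)                                   # subtract the max for stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * values[i][c] for i, w in enumerate(weights)) / total
            for c in range(len(values[0]))]

def sdpa_multiq(queries, keys, values):
    """K queries over N live keys: query s uses key i iff i <= N - K + s."""
    n, k = len(keys), len(queries)
    return [sdpa_one(q, keys, values, n - k + s + 1) for s, q in enumerate(queries)]
```

Because each query's keys are visited in the same order as the single-query call, the outputs agree exactly, mirroring the byte-identity claim for the real kernel.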

The fold's per-row tail keeps the ordered rope/value-norm landings and
hoists only the SDPAs: gated on the direct/no-evict landing AND
basePos+K < sdpa2PassMinKV so every row matches the routing the sequential
oracle takes (beyond the knee the per-row 2-pass path stays).
sdpaMultiQDisabledForTest is the A/B lever;
TestStepTokensBatchedDenseMultiQSDPAEngagesAndMatchesPerRow pins engagement
(strictly fewer dispatches) + hidden/KV byte-identity.

E2B bf16 (temp0): prefill 600tok 1.97s -> 1.67s, 170tok 0.83s -> 0.57s
(5.0s / 1.75s at the session start — 3x both); MTP unchanged (K~5 rounds
were not SDPA-dispatch-bound). Output byte-identical to plain and to the
session-start baseline. Suites 1223 + 1304 green -count=1 (one unrelated
sync.Pool observation flake, TestHeadEncoderEncodeInto..., failed once in
a full run and passes alone + on the full rerun).

Co-Authored-By: Virgil <virgil@lethean.io>
… one

lthn_qknorm_rope_rows_bf16: the fused per-head QK-norm + RoPE kernel with
the row batch on grid Y — x/out advance by a caller-supplied element stride
per row and the position comes from offset[row], the batched pass's packed
per-row positions buffer (the per-row dispatches were already reading that
same buffer one int at a time). Per-(row, head) body is the single-row
kernel verbatim, so each row is byte-identical to a per-row dispatch.
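The row-batched kernel's addressing (element stride per row, position from offset[row]) can be sketched on a flat list. Assumptions: split-half RoPE pairing (i, i + d/2), base 10000, and an unweighted RMS norm; the real QK-norm carries a learned weight and bf16 rounding. The function names are hypothetical.

```python
import math

def rms_norm(v, eps=1e-6):
    scale = 1.0 / math.sqrt(sum(a * a for a in v) / len(v) + eps)
    return [a * scale for a in v]

def rope_half(v, pos, base=10000.0):
    """Split-half RoPE: pair (i, i + d/2) rotates by pos * base^(-2i/d)."""
    half = len(v) // 2
    out = list(v)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / len(v))
        c, s = math.cos(theta), math.sin(theta)
        a, b = v[i], v[i + half]
        out[i], out[i + half] = a * c - b * s, a * s + b * c
    return out

def qknorm_rope_rows(flat, rows, heads, head_dim, row_stride, offsets):
    """Per (row, head): norm then RoPE at position offsets[row]; rows advance by row_stride."""
    out = list(flat)
    for r in range(rows):
        for h in range(heads):
            start = r * row_stride + h * head_dim
            head = flat[start:start + head_dim]
            out[start:start + head_dim] = rope_half(rms_norm(head), offsets[r])
    return out
```

Each (row, head) body is the single-row computation unchanged, so the batched result equals per-row calls; position 0 is the identity rotation and every rotation preserves the head's L2 norm.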

The attention fold now encodes ONE Q-rope for the K slab rows and — on the
direct/no-evict landing — ONE K-rope over the contiguous cache rows plus
ONE value-norm (the existing rms-rows kernel with rows=K·kvHeads; the K
per-row calls were tiling exactly those head-rows). A staged ring keeps the
per-row landings (its outputs are slot-wrapped). This also removes the
~2K-deep per-layer hazard serialisation the per-row rope chain imposed on
the q slab and cache.

batchedRopeDisabledForTest is the A/B lever;
TestStepTokensBatchedDenseBatchedRopeEngagesAndMatchesPerRow pins
engagement (strictly fewer dispatches) + hidden/KV byte-identity.

E2B bf16 (temp0): prefill 600tok 1.67s -> 1.52s, 170tok 0.57s -> 0.51s
(5.0s / 1.75s at the session start — 3.3x both); MTP short 36.7 tok/s.
Output byte-identical to plain and the session baseline. Suites 1224 +
1305 green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
… across rows

The prompt-size matrix showed prefill LINEAR at ~2.5ms/token with no knee
at the sliding window: the cost was never the staged tail (the ring-wrap
chunking already confines it) but the per-row epilogue dispatches surviving
in the folded lane — above all the PLE gate: 5 hazard-serialised dispatches
per row per layer (~90k for a 512-row chunk on all-PLE E2B).

The fold: the PLE slab goes LAYER-major ([numLayers × K × pliDim], writer
pleSlabFor + the per-row reader reindexed), so each layer's K token slices
are contiguous and the whole gate chain batches — gate gemv (grid Z) →
gelu·pli over K·pliDim → proj gemv → post-norm rows → one add. The layer
scalar applies through lthn_mul_rows_bf16 (one b row broadcast across K
rows — per-element float math identical to K vv_mul dispatches). Entry rms
and both residuals batch the same way (rms-rows + one add over K·dModel;
encResidualRowsMaybeNorm). Layer 0 with direct input views and the last
layer with direct output rows keep the per-row path; the free fold slabs
serve as the chain's scratch (hazard-ordered reuse).
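The layer-major reindex is what makes each layer's K token slices one contiguous run, so a single dispatch can cover them. A minimal sketch of the two index formulas and of the one-row broadcast multiply, with hypothetical helper names:

```python
def token_major_index(token, layer, d, num_layers, pli_dim):
    """Old slab layout: [K x numLayers x pliDim]."""
    return (token * num_layers + layer) * pli_dim + d

def layer_major_index(layer, token, d, num_tokens, pli_dim):
    """New slab layout: [numLayers x K x pliDim]."""
    return (layer * num_tokens + token) * pli_dim + d

def layer_slice(slab, layer, num_tokens, pli_dim):
    """In a layer-major slab, one layer's K token slices form a single contiguous run."""
    start = layer * num_tokens * pli_dim
    return slab[start:start + num_tokens * pli_dim]

def mul_rows(x, b, rows):
    """Broadcast one row `b` across `rows` rows of x (the mul-rows pattern)."""
    width = len(b)
    return [x[r * width + c] * b[c] for r in range(rows) for c in range(width)]
```

In the token-major layout the same layer's slices sit numLayers·pliDim elements apart, which is why the old per-row path needed one dispatch chain per token.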

batchedEpilogueDisabledForTest is the A/B lever;
TestStepTokensBatchedDenseBatchedEpilogueEngagesAndMatchesPerRow pins
engagement + hidden/KV byte-identity.

E2B bf16 (temp0): prefill 600tok 1.52s -> 689ms, 170tok 509ms -> 236ms —
7.3x from the session start (5.0s/1.75s); MTP short 36.7 -> 39.4 tok/s.
Output byte-identical to plain, the session baseline, AND the pre-epilogue
600tok output. Suites 1225 + 1306 green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
…jections

At steelGEMMMinRows (64) and above, encGemvBF16BatchedAt routes to MLX's
steel_gemm_fused_nt bf16 kernel (main metallib, resolved by mangled name
with has_batch/use_out_source/do_axpby baked false and the align_M/N/K
function constants keyed per shape): ONE simdgroup-matrix GEMM reading the
weight once for all rows — D[rows×outDim] = act[rows×inDim] @ Wᵀ, the
GEMMParams struct bound as inline constant bytes. All nine batched
projection sites (Q/K/V/O, gate/up/down, PLE gate/proj) upgrade through
the one router.

This is the deliberate token-identity trade: steel accumulates per output
tile, a different summation order from the per-row gemv, so large-row
prefill no longer matches the sequential oracle byte for byte — exactly
pkg/metal's GEMM-prefill property. Below the threshold (MTP verify blocks,
every parity fixture) the grid-Z gemv keeps strict byte-identity, so the
whole existing parity suite pins unchanged. The closeness test checks
steel-vs-gemv per-element bf16 agreement on the aligned AND bounds-checked
shapes, with engagement via a steel dispatch counter.
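The routing rule and the source of the token-identity trade can be illustrated together. This sketch emulates float32 accumulation with `struct` and models "per output tile" as summing fixed-size blocks of the inner dimension before combining them; steel's actual tiling and fragment order differ, so it only shows that the same terms in a different order agree to tolerance rather than bit for bit. Names are hypothetical; the threshold is the 64 named in the text.

```python
import struct

STEEL_GEMM_MIN_ROWS = 64   # threshold from the text (lowered to 32 in a later commit)

def f32(v):
    """Round a Python float to IEEE float32."""
    return struct.unpack("f", struct.pack("f", v))[0]

def dot_sequential(w, x):
    acc = 0.0
    for a, b in zip(w, x):
        acc = f32(acc + f32(a * b))
    return acc

def dot_tiled(w, x, tile=8):
    """Sum each inner-dimension tile, then combine tile sums: same terms, different order."""
    tile_sums = [dot_sequential(w[i:i + tile], x[i:i + tile]) for i in range(0, len(w), tile)]
    acc = 0.0
    for s in tile_sums:
        acc = f32(acc + s)
    return acc

def route(rows):
    """Large batches take the GEMM; small ones keep the byte-identical gemv lane."""
    return "steel_gemm" if rows >= STEEL_GEMM_MIN_ROWS else "gemv"
```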

E2B bf16 (temp0): prefill 600tok 689ms -> 389ms, 170tok 236ms -> 143ms —
12.9x from the session-start 5.0s, now 2.2x from pkg/metal's 175ms. Short
outputs byte-identical (gemv lane); the 600tok greedy output emitted
IDENTICAL tokens through the GEMM rounding on the reference prompt.
Suites 1226 + 1307 green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
…tches

Past the sliding window every chunk evicts, and the per-row landing+SDPA
interleave was the tail's cost (each row's SDPA had to read the ring
before the next row's landing overwrote an evicted slot). The lane that
removes it DEFERS the landings: K/V project into the layer's PRIVATE stage
(lthn_qknorm_rope_rows + the value norm run in place there), ONE
two-segment multi-query SDPA (lthn_sdpa_multiq_ring) reads the pre-batch
ring minus each query's evicted run [slotBase, slotBase+s] plus the staged
causal rows [0..s], and the ring lands afterwards in at most two
contiguous-run copies (lthn_copy_bf16) — encoded after EVERY layer has
read the pre-batch state.
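The evicted-run exclusion can be checked against the plain sliding-window rule. Assumptions for this sketch: a FULL pre-batch ring (basePos >= window), s < window, and ring slot = position mod window; the function names are hypothetical. Query s excludes the ring slots [slotBase, slotBase+s] (the oldest s+1 positions, which sequential landing of rows 0..s would have overwritten) and adds staged rows 0..s.

```python
def visible_sequential(base_pos, s, window):
    """Positions that query s (absolute position base_pos + s) sees under a sliding window."""
    p = base_pos + s
    return set(range(max(0, p - window + 1), p + 1))

def visible_deferred(base_pos, s, window):
    """Pre-batch ring minus the evicted run [slotBase, slotBase+s], plus staged rows 0..s."""
    slot_base = base_pos % window
    evicted = {(slot_base + j) % window for j in range(s + 1)}
    ring = {p for p in range(base_pos - window, base_pos) if p % window not in evicted}
    staged = {base_pos + i for i in range(s + 1)}
    return ring | staged
```

Both sets contain exactly `window` positions, so the two-segment SDPA sees the true sequential window even though no row has landed yet.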

Shared-KV layers ride the owner's persisted stage and pre-batch ring —
the true sequential window, which the per-row tail could never give them
once the owner had landed (that end-state read remains only in the
small-K per-row lane, as before). Landed ring bytes are IDENTICAL to the
per-row path (the landing copies the same roped/normed bytes); the SDPA
accumulation order differs — the token-identity trade, engaged only at
steelGEMMMinRows with a FULL ring. stagedRingDisabledForTest is the A/B
lever; TestStepTokensBatchedDenseDeferredRingLandingMatchesPerRow pins
engagement (ring dispatch counter), byte-identical landed KV, and
tolerance-close hiddens against the per-row lane.

E2B bf16 (temp0): prefill stays ~0.6-0.7ms/token PAST the window —
1000tok 559ms, 2000tok 1381ms (the per-row tail ran ~2.5ms/token);
600tok unchanged at 389ms (its 88-row tail was already small). Suites
1227 + 1308 green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
…hold 32

batchedGPUTrace (LTHN_GPU_TRACE=1): the pass's command buffer splits at
named stage boundaries — each segment commits, waits and charges its
GPUStartTime->GPUEndTime span to the stage that just ran, accumulated
across the layer loop and reported per chunk to stderr. Splitting
serialises (~6 CB round-trips per layer), so the report carries the traced
total alongside the shares. Zero cost off (nil-receiver checkpoints).

First real decomposition (E2B bf16, 600tok):
  chunk1 512 rows, GPU 150ms: mlp 57%, epilogue 15%, sdpa 10.5%,
    qkv 8.8%, o+resid 8.7%, rope <1%
  chunk2 44 rows, GPU 75ms: PER-ROW lanes — 1.7ms/row vs chunk1's 0.29
The tail chunk sat below BOTH large-K gates. Dropping steelGEMMMinRows
64 -> 32 (MTP verify at K<=16 keeps its byte-identity margin) engages
steel + the deferred ring there: chunk2 GPU 75 -> 62ms (ring SDPA
28.4 -> 3.0ms), 600tok wall 389 -> 368ms. The remaining tail cost is the
weight-read-bound skinny MLP GEMM at 44 rows — chunk-boundary physics.

Second finding, now measurable: chunk GPU totals (~213ms) vs ~368ms wall
puts ~150ms in HOST-side work (embedding, chunk syncs, CB overhead) —
the next vein, previously invisible inside the single wall number.

Suites 1227 + 1308 green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
… become one

The host-phase spans (hostSpan under LTHN_GPU_TRACE) decomposed the wall-vs-
GPU gap the trace exposed: pleSlab was 183ms of the 600tok prefill's 368ms
— HALF the wall, bigger than the whole GPU pass. Not math: the perLayerInput
closure runs each token's projection chain in its OWN command buffer, so a
512-row chunk paid 512 CB submit+waits.

perLayerInputsBatchIntoSlab builds the whole batch in ONE command buffer:
host-gather the per-layer embeddings token-major, stage the hidden rows,
then projected = hidden @ projWᵀ as ONE steel GEMM, ×1/√dModel (mul-rows
broadcast), rms per (token,layer) row, +perLayer, ×1/√2 — and scatter
layer-major into the slab. Wired as perLayerInputBatch beside the per-token
closure; pleSlabFor tries it first. Gated K >= steelGEMMMinRows + the bf16
resident projection (quant PLE and small batches keep the per-token loop —
every PLE parity fixture at K=8 stays byte-identical). The GEMM makes the
big-K slab token-identity, the pass's standing policy.
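The batched formula reads as a short chain per token. This float sketch follows the steps named above (project, ×1/√dModel, RMS per (token, layer) run, +perLayer, ×1/√2) and returns token-major rows, leaving the layer-major scatter aside. The optional norm weight, eps and all names are assumptions, not the engine's code.

```python
import math

def rms_rows(v, width, weight=None, eps=1e-6):
    """RMS-normalise each consecutive run of `width` values (one run per (token, layer))."""
    weight = weight or [1.0] * width
    out = []
    for start in range(0, len(v), width):
        row = v[start:start + width]
        inv = 1.0 / math.sqrt(sum(a * a for a in row) / width + eps)
        out.extend(a * inv * w for a, w in zip(row, weight))
    return out

def per_layer_inputs_batch(hidden, proj_w, per_layer, d_model, num_layers, pl_dim, norm_weight=None):
    """hidden: K rows of d_model; proj_w: num_layers*pl_dim rows of d_model;
    per_layer: K rows of num_layers*pl_dim (gathered per-layer embeddings)."""
    assert len(proj_w) == num_layers * pl_dim
    inv_sqrt_d = 1.0 / math.sqrt(d_model)
    inv_sqrt_2 = 1.0 / math.sqrt(2.0)
    out = []
    for h, pl in zip(hidden, per_layer):
        projected = [sum(w * x for w, x in zip(row, h)) * inv_sqrt_d for row in proj_w]
        normed = rms_rows(projected, pl_dim, norm_weight)
        out.append([(a + b) * inv_sqrt_2 for a, b in zip(normed, pl)])
    return out
```

On the GPU the per-token projection loop becomes one GEMM over all K rows, which is what collapses the 512 per-token command buffers into one.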

E2B bf16 (temp0): pleSlab host 183 -> 50ms; prefill 170tok 143 -> 93ms,
600tok 368 -> 235ms, 2000tok 1381 -> 938ms. pkg/metal's 175ms @600tok is
now 1.34x away (28x at the session start). Short outputs byte-identical.
Suites 1227 + 1308 green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
The batched slab builder's remaining host work was the Go bf16 gather
(K × plDim scale-copies), the token→layer-major scatter and the readback —
~47ms of the 50ms pleSlab span. Two bookend kernels keep the whole build
on-device:

  lthn_ple_gather_rows_bf16 — K tokens' per-layer embedding rows gathered
    + scaled in one dispatch (ids from a device buffer; the bf16 twin of
    the quant lthn_embed_gather), replacing the embedTokenBF16Into loop.
  lthn_ple_relayout_bf16 — token-major → layer-major on-device (pure
    copy), replacing the host scatter; projectedBuf is reused as the
    relayout destination once the rms has consumed it.

Host keeps only the ids upload, the 1.5MB hidden stage and ONE straight
copy out of the already-layer-major result. Same command buffer as the
GEMM chain; kernels-unavailable falls back to the per-token loop.
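The two bookend kernels are simple data movement, sketched here on Python lists: a gather that scales each token's per-layer embedding row, and a pure-copy token-major to layer-major relayout. The dict-based table, the `scale` parameter and the function names are illustrative assumptions.

```python
def ple_gather_rows(table, ids, scale):
    """Gather each token's per-layer embedding row (num_layers*pl_dim values) and scale it."""
    return [[v * scale for v in table[t]] for t in ids]

def relayout_token_to_layer_major(rows, num_layers, pl_dim):
    """K token-major rows -> flat [num_layers x K x pl_dim]; a pure copy."""
    k = len(rows)
    out = [0.0] * (num_layers * k * pl_dim)
    for t, row in enumerate(rows):
        for l in range(num_layers):
            for d in range(pl_dim):
                out[(l * k + t) * pl_dim + d] = row[l * pl_dim + d]
    return out
```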

E2B bf16: pleSlab host span 50 -> 2.9ms steady state (183ms three commits
ago — 63x; the ~650ms first call is one-time PSO compile + table
residency, absorbed by warmup). Prefill 600tok 235 -> 228ms (stable x3),
170tok 93 -> 92ms. Short outputs byte-identical. Suites green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
…ning

The MLP bucket's roofline said the margin was small before touching
anything: ~1.8 TFLOP at 85.5ms is ~75% of this chip's bf16 MMA peak, so
the steel GEMM was already near compute-bound. The one lever the kernel
offers is the threadblock swizzle mlx's host code applies and our wrapper
hardcoded to 0: swizzle_log=2 on tall grids (tilesM > 3, this device
class) interleaves the tile walk so neighbouring threadgroups share B
panels in L2, with the grid reshaped (tilesN<<sw, ceil(tilesM>>sw))
exactly as matmul.cpp does.
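The swizzle only changes which threadgroup computes which tile. The mapping below is an assumed reading of the steel GEMM arithmetic (tid_y = (gy << sw) + (gx & (2^sw - 1)), tid_x = gx >> sw), combined with the grid reshape quoted above; out-of-range groups return early. `swizzled_tiles` is a hypothetical helper that lists the tiles in launch order.

```python
def swizzled_tiles(tiles_m, tiles_n, swizzle_log):
    """Map each grid (gx, gy) to an output tile (row tile, col tile) in launch order."""
    tile = 1 << swizzle_log
    grid_x = tiles_n * tile                   # tilesN << sw
    grid_y = (tiles_m + tile - 1) // tile     # ceil(tilesM / 2^sw)
    order = []
    for gy in range(grid_y):
        for gx in range(grid_x):
            tid_y = (gy << swizzle_log) + (gx & (tile - 1))
            tid_x = gx >> swizzle_log
            if tid_y < tiles_m and tid_x < tiles_n:   # padded groups exit early
                order.append((tid_y, tid_x))
    return order
```

Every tile is still computed exactly once, and the accumulation inside each tile is untouched, which is why outputs stay byte-identical; with sw = 2 the first four groups all work on column tile 0 and so share the same B panel.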

Applies to every steel site (MLP, QKV/O, PLE gate/proj/slab). E2B bf16:
mlp bucket 85.5 -> 84.0ms; 600tok wall flat at 227-228ms — the honest
read is the MLP GEMM is now mined: compute-bound at ~76% of peak, the
same arithmetic pkg/metal pays. Short outputs byte-identical (the swizzle
reorders the tile WALK, not any accumulation). Suites green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
The 44-row tail chunk paid a full weight sweep for a handful of rows
(1.7ms/row vs the 512-chunk's 0.29). It existed because the chunker split
at the ring wrap — but the deferred-ring kernel was one window bound away
from CROSSING it:

  * ring segment loops ring_live = min(basePos, slideW) rows (a partial or
    fresh pre-batch ring); the existing slot-distance exclusion formula
    already covers the partial case unchanged.
  * staged segment gains the sliding window lower bound
    (i + slide_w > s) — binds only when the batch is wider than the window.
  * only the LAST slideW rows land (a wider batch evicted its own head
    rows during the batch); the two-run copy takes a row offset.

The chunker now absorbs a small tail into one crossing chunk
(limit <= remain + slideW/2) and the One() wrap guard is gone — the pass
has been wrap-correct since the staged lanes landed (the per-row fallback
is the sequential interleave at any basePos). 600tok therefore runs as ONE
600-row chunk: one weight sweep, no skinny GEMMs.

The crossing oracle test exposed an over-strong assertion in the existing
deferred test: landed KV is byte-exact at LAYER 0 (same inputs, only the
landing mechanics differ) but later layers inherit the SDPA's
token-identity hiddens through their projections — the full-ring test had
passed all-layer byte-identity by numerical luck. Both tests now assert
byte-exact layer 0 + tolerance beyond (the bound catches layout breaks,
which diverge by orders of magnitude).

E2B bf16 (temp0): prefill 600tok 228 -> 196ms (stable x3) — pkg/metal's
175ms is 1.12x away (28x at the session start); 170tok 92ms, 2000tok
931ms unchanged (no skinny tails there). Short outputs byte-identical AND
the 600tok tokens identical to the pre-merge build. Suites green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
The ICB recorders picked ONE V-projection index from layer 0 and applied
it to every layer. The 12B-unified checkpoint is MIXED per layer: sliding
layers carry their own v_proj, global (k_eq_v) layers don't — so with
layer 0 sliding, all 8 global layers recorded their V projection from an
EMPTY weight slot. Garbage V rows in the cache from position 0; the
model emitted "a a a a…" while metal decoded the same checkpoint fine.

Both ICB lanes (quant session + bf16 whole-seq) now resolve vProjIdxOf
per layer: V absent ⇒ the k-proj (matching metal's clone-before-norm
k_eq_v semantics and the per-layer hasV() the re-encode lane always had).
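The difference between the two resolutions is easiest to see on a two-layer mixed checkpoint. In this sketch weights are plain strings and `None` marks an absent v_proj; the dataclass and both functions are hypothetical stand-ins for the recorder's lookup.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LayerWeights:
    k_proj: str
    v_proj: Optional[str]   # None on k_eq_v (global) layers

def v_proj_of_layer0(layers, i):
    """Old behaviour: the V source is decided once from layer 0 and applied to every layer."""
    return layers[i].v_proj if layers[0].v_proj is not None else layers[i].k_proj

def v_proj_of(layers, i):
    """Per-layer resolution: V absent => reuse this layer's k-proj."""
    layer = layers[i]
    return layer.v_proj if layer.v_proj is not None else layer.k_proj
```

With a sliding layer 0 that has its own V, the old rule reads the empty V slot of every global layer, which is the garbage-V symptom described above.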

Receipts:
- cross-engine per-step vs metal (real 12B-4bit): pos-0 hidCos
  -0.376 -> 0.995 (full 48 layers); layer-truncated 6-layer checkpoint
  0.44 -> 0.999. Sliding-only truncation was already clean (0.9999) —
  the divergence lived entirely in the global layers.
- live decode: "What is the capital of France?" -> "The capital of
  France is **Paris**." (finish=stop); 92-token paragraph coherent.
- TestDecodeForwardArchICBMixedKEqV: mixed sliding-with-V +
  global-without-V through the whole-seq ICB ≡ re-encode oracle
  byte-for-byte; verified RED under the old layer-0 semantics.
- suites: pkg/native metal_runtime -count=1 ok (67.8s); untagged ./... ok.

Residual (noted, separate): cross-engine hidCos drifts to ~0.94 by pos
39 on the full stack — scales with layer count (1L flat 0.9999, 6L
~0.98, 48L ~0.94), consistent with accumulation-order noise amplified
by the 512-dim global heads; output coherent.

Co-Authored-By: Virgil <virgil@lethean.io>
…arch dump, feature mix

The #254 debugging instruments, all env-gated (CROSS_12B_DIR) so they
skip by default and cost CI nothing:

- TestCrossEngine12BPerStep: per-position embed/hidden cosine native vs
  metal on a real checkpoint + per-layer capture diff at one step (the
  NO_ICB discriminator env forces the re-encode lane). Points the dir at
  a layer-truncated copy (patched config.json num_hidden_layers +
  layer_types, symlinked safetensors) to bisect by layer — this is what
  localised #254 to the global layers in two runs.
- TestCrossEngine12BWeightAudit: load-vs-semantics discriminator — every
  projection dequantised both sides (first 4 rows), each native norm
  slot scored against all four metal layer norms (slot-swaps show as
  cross-matches), Q/K norms raw+scaled, layer scalars. Exonerated the
  12B load wholesale (all cosines 1.000).
- TestArchQuantSession12BFeatureMixMatchesBF16: quant session vs
  dequantised-bf16 twin at the real 12B geometry (k_eq_v, kv=1, hd
  256/512 mix) — the native-internal parity gate.
- TestCrossEngine12BArchDump: config-vs-derivation audit print.
- qmv 12B non-fast case (outDim 16, inDim 3840, gs 64).
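The layer-truncated checkpoint used for bisection (patched num_hidden_layers + layer_types, symlinked weights) takes only a few lines to build. This helper is hypothetical; it assumes the two keys sit either at the top level of config.json or under a `text_config` block, and links every other file unchanged.

```python
import json
from pathlib import Path

def truncate_config(src: Path, dst: Path, num_layers: int) -> dict:
    """Write a copy of config.json keeping only the first num_layers layers."""
    config = json.loads(src.read_text(encoding="utf-8"))
    target = config.get("text_config", config)      # assumed nesting for multimodal configs
    target["num_hidden_layers"] = num_layers
    if "layer_types" in target:
        target["layer_types"] = target["layer_types"][:num_layers]
    dst.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config

def make_truncated_checkpoint(src_dir: Path, dst_dir: Path, num_layers: int) -> None:
    dst_dir.mkdir(parents=True, exist_ok=True)
    truncate_config(src_dir / "config.json", dst_dir / "config.json", num_layers)
    for f in sorted(src_dir.iterdir()):
        if f.name != "config.json" and not (dst_dir / f.name).exists():
            (dst_dir / f.name).symlink_to(f.resolve())   # weights, tokenizer, index: unchanged
```

The loader then simply ignores the tensors of layers past the cut, so no weights need to be rewritten.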

Co-Authored-By: Virgil <virgil@lethean.io>
…he period table

The 12B residual cross-engine drift (hidCos decaying to ~0.94 by pos 39)
was a real gap, not accumulation noise: newArchDecodeState built the
global layers' proportional rope periods by passing the PRE-FOLDED
arch.RopeBase (raw^(rotaryDim/headDim), folded for the base-derived
÷rotaryDim kernel path) into proportionalRopePeriods, whose exponent
divides by the FULL head dim and therefore expects the RAW theta. The
fold applied twice left every global period at the 4th root of metal's
(0.25 partial rotary) — exact at position 0, angle error growing
linearly with position. Completes 8938adb, which fixed the exponent
to match metal but missed that this caller feeds the folded base.

globalRopePeriodsFromFolded is the named seam: unfold back to raw theta,
then build the spectrum; the hermetic pin test asserts the folded-base
path reproduces raw^(2i/headDim) exactly at the real 12B geometry
(512/128/1e6) with the +Inf unrotated tail.
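The double fold is a one-line derivation: with folded = raw^(rotaryDim/headDim), feeding it to a builder that expects raw theta gives folded^(2i/headDim) = (raw^(2i/headDim))^(rotaryDim/headDim), and at the 12B geometry rotaryDim/headDim = 128/512 = 0.25, hence the 4th root. A float sketch of the builder, the fold and the unfolding seam (Python stand-ins, not the Go functions):

```python
import math

def proportional_rope_periods(raw_theta, head_dim, rotary_dim):
    """Period i = raw_theta^(2i/head_dim) for rotated pairs; +Inf for the unrotated tail."""
    return [raw_theta ** (2 * i / head_dim) if i < rotary_dim // 2 else math.inf
            for i in range(head_dim // 2)]

def fold_base(raw_theta, head_dim, rotary_dim):
    """Pre-folded base raw^(rotaryDim/headDim): base^(2i/rotaryDim) then equals raw^(2i/headDim)."""
    return raw_theta ** (rotary_dim / head_dim)

def global_rope_periods_from_folded(folded, head_dim, rotary_dim):
    """Unfold back to raw theta, then build the spectrum."""
    raw = folded ** (head_dim / rotary_dim)
    return proportional_rope_periods(raw, head_dim, rotary_dim)
```

Every period is 1 at i = 0 and rope is the identity at position 0 for any period, which is why the first positions never showed the bug.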

Root-caused by a background Opus agent (patch-and-restore protocol,
tree left byte-identical): hermetic parities exonerated SDPA
(global-geometry 1.000000), RMSNorm and GeluGateMul; the numeric proof
was native periods == metal freqs^0.25 to all digits.

Receipts (real 12B-4bit, TestCrossEngine12BPerStep):
- hidCos slope per position: -0.00199 -> +0.00023 (decay signature gone)
- pos 39: 0.94008 -> 0.99497; mean pos 20-39: 0.94660 -> 0.99307
- positions 0-2 unchanged (rope is identity at pos 0)
- live: primes + why-1-is-not-prime answered correctly at 61 tokens
- pkg/native metal_runtime suite green -count=1 (72.9s)

Co-Authored-By: Virgil <virgil@lethean.io>
…r-1 lifts (contracts + kv, bundle, artifact, probe, blockcache, spine-core, safetensors-index, autoround, profile, modelpack, memory)

Co-Authored-By: Virgil <virgil@lethean.io>
…hem from go-inference

The engine-merge Tier-1 lifts are all merged on go-inference dev
(submodule pinned at 3d4eb6a), so the local copies die: probe,
blockcache, artifact, bundle, kv, safetensors, quant/autoround, profile,
pack, memory. Every importer re-pointed to
dappco.re/go/inference/<pkg>; pack's consumers re-point to
inference/modelpack with a `pack` import alias so bodies are untouched
(the package was renamed on landing to clear the name collision with
inference's model/pack manifest).

Process-of-elimination findings: nothing bound. Zero symbol gaps, zero
type-identity breaks (go-mlx lora was already an alias onto
inference/lora; the lifted copies are verbatim-today so no drift
existed). kv's bench harness (the one file left behind) has zero
external consumers.

Receipts: go build ./... clean; go vet ./... clean; untagged go test
./... -count=1 green; tagged pkg/native (73.4s) + session + agent +
spine + kvconv + root green — the one root failure is #258's
pre-existing order-dependent SFT smoke (passes alone, predates this
change).

Still local by design: spine (partial lift — spine.go is the Wave-B
config reconcile), agent + session + kvconv (Wave B: #259 + the
SessionHandle re-home), pkg/safetensors (engine-side, distinct from the
lifted root safetensors).

Co-Authored-By: Virgil <virgil@lethean.io>
…#259)

The native engine already captured/restored conversation KV state in the
portable kv.Snapshot wire shape (session_kv_snapshot.go) with zero kvconv
imports; the gap was contract-shape conformance. Bind that machinery to the
engine-neutral inference contracts (external/go-inference kvstate.go), mirroring
pkg/metal:

- NativeTokenModel (loaded decode model) now holds an optional tokenizer
  (AttachTokenizer) and satisfies inference.KVSnapshotter + KVChunkSnapshotter:
  CaptureKV(ctx, prompt, opts) / CaptureKVChunks tokenise → transient OpenSession
  → PrefillTokens → ArchSession.CaptureKVWithOptions, returning *kv.Snapshot
  directly (no metal.KVSnapshot, no kvconv). Mirror of metal.Model.CaptureKV.
- ArchSession (the model's cache) satisfies inference.KVRestorer via the new
  ctx-shaped RestoreFromKV shim over RestoreKV, and inference.PromptCacheClearer
  via the existing ClearPromptCache. Mirror of metal.ModelSession.
- Compile-time assertions pin all four; kvCaptureOptionsFromInference bridges the
  identical option fields.

Root composition (native_model.go probes) can now switch from the metal-typed
nativeKVSnapshotter to inference.KVSnapshotter/KVRestorer and drop kvconv on the
native lane — the probe surface is identical to the metal engine's.

Gap (reported, not stubbed): the model-level string-prompt prompt-cache warmers
(inference.PromptCacheWarmer / PromptCacheChunkWarmer) retain a warmed cache
across calls; NativeTokenModel is stateless (sessions are caller-owned), so a
model-level warmer would need a retained-session lifecycle this engine does not
carry (the serve layer's nativeTextModel.cacheSess owns it). pkg/native exposes
warming at the session level in token-id terms (ArchSession.WarmPromptCache).

Receipts:
- kv_contract_test.go: TestNativeTokenModelCaptureKVRestoreFromKVContinues —
  CaptureKV(prompt) via inference.KVSnapshotter → RestoreFromKV via
  inference.KVRestorer → GenerateFromCache is token-identical to the
  uninterrupted greedy run (hermetic synthetic gemma4 + tiny BPE tokenizer).
  Plus chunk-capture and guard tests. 3 new tests green.
- tagged pkg/native: 1321 passed -count=1; untagged ./...: 7836 passed -count=1.
- grep: zero dappco.re/go/mlx/kvconv references in pkg/native.

Co-Authored-By: Virgil <virgil@lethean.io>
… local copy deleted

The wake/sleep conversation-memory implementation now lives at
dappco.re/go/inference/state/agent (it implements state's Wake/Sleep
contracts — Snider's placement). Every go-mlx importer re-pointed
(cmd/mlx, root continuity + session files, session/); submodule bumped
to d753ca3 which carries the lift.

Receipts: go build ./... clean; untagged ./... green -count=1; tagged
session + pkg/native green -count=1.

Co-Authored-By: Virgil <virgil@lethean.io>
…e go/session

Engine-merge Wave B session re-home (docs/engine-merge.md in go-inference).
REQUIRES the go-inference lift/session-rehome branch merge + submodule bump:
this commit compiles against dappco.re/go/inference @ lift/session-rehome
(inference.SessionHandle/SessionFactory, kv.BlockSource/StateBlockSource,
inference/state/session), not the current external/go-inference pin. The
orchestrator merges that branch, bumps the submodule, and runs the final
cross-repo build before pushing.

- go/session is DELETED — the package now lives at
  dappco.re/go/inference/state/session speaking only inference types.
  Importers re-pointed: root session.go, session_agent.go,
  session_defaults.go, cmd/mlx/generate.go, conversation_continuity.go.
- metal_session_adapter.go (new): wraps metal.SessionHandle as
  inference.SessionHandle; kvconv + spine.ToMetalProbeSink survive only
  inside this adapter and die with pkg/metal. Full config parity with the
  retired spine.ToMetalGenerateConfig session path (plus EnableThinking,
  which the old lane could not carry).
- nativeTextSession (register_native.go) re-expressed onto the neutral
  contract: Generate/CaptureKV/RangeKVBlocks/RestoreKV(Blocks)/Fork now in
  inference/kv types; snapshotFromNativeBlock builds kv.Snapshot directly
  (fixes the uint32-DType-to-string rune bug the retype exposed — dtype
  now via kvconv.RootKVHeadDType, reproducing the old pipeline byte-for-
  byte). The shared low-level native state-source chain stays metal-typed
  for the model-level KV contract; the session boundary converts inward.
- Root Model.NewSession dispatches neutral-first (native lane) then wraps
  the metal factory; ModelInfo/Tokenizer bridge via direct struct
  conversion + the new spine.Tokenizer.Impl() accessor.
- rootGenerateOptions + stateKVChapterGenerateOptions emit
  inference.GenerateOption (session Generate* now takes inference options;
  model-level mlx options unchanged).
- internal/sessionfake re-pointed to the neutral contract; root + cmd/mlx
  tests retyped accordingly. The session reserialize HOT LEAD eval test
  re-homes to the root package (metal integration — the adapter + kvconv
  bridge is now inside the loop it proves).
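The uint32-DType-to-string bug named in the list above is a well-known Go pitfall: converting an integer to a string yields the UTF-8 encoding of that code point, not its digits or a name (go vet's stringintconv analyzer flags the bare `string(x)` form). Below is a minimal sketch with a hypothetical DType enum and name table. The real fix routes through kvconv.RootKVHeadDType, whose signature is not shown here.

```go
package main

import "fmt"

// DType is a hypothetical numeric dtype code, as carried in a native KV block.
type DType uint32

const (
	Float32 DType = iota
	Float16
	BFloat16
)

// dtypeNames maps codes to wire names (the names here are illustrative).
var dtypeNames = map[DType]string{
	Float32:  "float32",
	Float16:  "float16",
	BFloat16: "bfloat16",
}

// buggyDTypeString reproduces the bug class: the integer becomes a code point,
// so Float16 (1) turns into the control character U+0001. The explicit rune
// conversion is the spelling that slips past go vet.
func buggyDTypeString(d DType) string { return string(rune(d)) }

// dtypeString is the fix: look the name up explicitly and fail on unknown codes.
func dtypeString(d DType) (string, error) {
	name, ok := dtypeNames[d]
	if !ok {
		return "", fmt.Errorf("unknown dtype code %d", d)
	}
	return name, nil
}

func main() {
	fmt.Printf("%q\n", buggyDTypeString(Float16)) // "\x01"
	name, err := dtypeString(Float16)
	fmt.Println(name, err) // float16 <nil>
}
```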

Untagged ./go/... green (60 pkgs). Tagged pkg/native green. Tagged root
green except the known pre-existing TestSFTNativeSmoke_Gemma4Q6...(#258,
full-population only; passes alone — verified).

Co-Authored-By: Virgil <virgil@lethean.io>
…ract + state/session home (dd04a26 tip)

Completes the session re-home pair with f2497d9: root dispatches the
neutral SessionHandle, go/session is deleted, metal wraps via the
kvconv-internal adapter. Cross-repo receipts: go build ./... clean,
untagged ./... green -count=1, tagged pkg/native ok (67s).

Co-Authored-By: Virgil <virgil@lethean.io>
…n bumped to KVBits dev

Bumping external/go-inference to 580d183 (scheme.CacheWidth — cache-mode
width as a registry capability) surfaced the width-stripping bug class
in OUR drivers: pkg/metal's init re-registers default/fixed/paged/q8/
k-q8-v-q4/turboquant with compute values that lack CacheWidth,
overwriting the width-carrying builtin stubs — the memory planner then
silently sized every one of those modes on the ×2 default lane
(TestMemoryPlan_KVCacheQ8ForMiddleMemoryClasses_Good caught q8 == fp16).
It is the same bug class that inference/kv already fixed for its turboquant upgrade.

registerCachePreservingWidth is the one-helper fix: a driver value that
lacks its own width is registered wrapped with the prior registration's
width, keeping BOTH the compute surface (CacheCompute, pinned by test) and
the planner width. Modes with no prior width (compaction, mla-latent)
register as-is. pkg/scheme re-exports CacheWidth with the rule
documented.
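The registration rule can be sketched as follows. This assumes a hypothetical CacheMode value that pairs a compute hook with an optional width stored as a bytes-per-element fraction. The names and the map-backed registry are illustrative, not the pkg/scheme API. The widths reuse this commit's q8 (1/1) and turboquant (7/16, rounded up) receipt figures.

```go
package main

import "fmt"

// CacheMode is a hypothetical registry entry: a compute hook plus an optional
// planner width in bytes per element, expressed as Num/Den (Den == 0: no width).
type CacheMode struct {
	Compute  func() string
	Num, Den int
}

func (m CacheMode) hasWidth() bool { return m.Den > 0 }

var registry = map[string]CacheMode{}

// registerCachePreservingWidth stores the driver's value, so its compute hook
// wins, but a value without its own width inherits the width previously
// registered under the same name. Modes with no prior width are stored as-is.
func registerCachePreservingWidth(name string, m CacheMode) {
	if prev, ok := registry[name]; ok && !m.hasWidth() && prev.hasWidth() {
		m.Num, m.Den = prev.Num, prev.Den
	}
	registry[name] = m
}

// kvBytes sizes n elements at a mode's width, rounding up so fractional widths
// such as 7/16 never under-count. A mode with no width falls back to the
// 2-bytes-per-element default lane, which is how a stripped width goes unnoticed.
func kvBytes(m CacheMode, n int) int {
	if !m.hasWidth() {
		return 2 * n
	}
	return (n*m.Num + m.Den - 1) / m.Den
}

func main() {
	// Builtin stubs carry widths.
	registry["q8"] = CacheMode{Num: 1, Den: 1}
	registry["turboquant"] = CacheMode{Num: 7, Den: 16}

	// A driver re-registers compute-only values.
	registerCachePreservingWidth("q8", CacheMode{Compute: func() string { return "driver-q8" }})
	registerCachePreservingWidth("turboquant", CacheMode{Compute: func() string { return "driver-tq" }})
	registerCachePreservingWidth("compaction", CacheMode{Compute: func() string { return "driver-compaction" }})

	q8 := registry["q8"]
	fmt.Println(q8.Compute(), kvBytes(q8, 1000))       // driver-q8 1000
	fmt.Println(kvBytes(registry["turboquant"], 1000)) // 438
	fmt.Println(registry["compaction"].hasWidth())     // false
}
```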

Receipts: the failing planner test green; width-survival table test
(q8 1/1, k-q8-v-q4 3/4, turboquant 7/16 ceil, default/fixed/paged 2/1)
green; untagged ./... green; tagged pkg/native green. Known exceptions,
both pre-existing/unrelated: #258's order-dependent SFT smoke, and
TestMlx_GC_Bad firing on an untracked stray pkg/hip copy (in-flight
agent work, not part of this commit — the guard also flags a real
finding for the hip landing: its benchmark calls runtime.GC directly).

Co-Authored-By: Virgil <virgil@lethean.io>
…(Tier 4 entry)

Source: census clone of https://github.com/dappcore/go-rocm.git, dev @
308c4d6 (read-only, never modified). Lands as go/pkg/hip on this
worktree's quarantine/hip branch (cut from dev @ 41ca3ee).

WHAT MOVED (105 .go files + kernels/, 102,149 LOC total)
- 102 top-level go/*.go files (of 116 originally) → go/pkg/hip/*.go,
  package rocm renamed to package hip throughout.
- go/internal/gguf (3 files) → go/pkg/hip/internal/gguf, KEPT as a
  quarantined duplicate rather than eliminated (see deferred #5); its
  self-import rewritten dappco.re/go/rocm/internal/gguf →
  dappco.re/go/mlx/pkg/hip/internal/gguf (discover.go, native.go).
- kernels/rocm_kernels.hip (9,560 LOC HIP C++) + kernels/README.md →
  go/pkg/hip/kernels/, unchanged.
- New file hip_shared_helpers.go: firstPositiveInt + rocmLabelUint
  relocated out of the excluded model_pack.go (see deferred #2) because
  7 other files still need them and neither depends on the missing
  package.

WHAT WAS EXCLUDED (15 files from the original 116)
- model.go, server.go, backend.go, server_test.go,
  server_example_test.go — the legacy llama-server subprocess bridge
  (rocm_legacy_server build tag, superseded by the native HIP path per
  the repo's own docs/history.md). Needed internal/llamacpp (6 files,
  also not landed — nothing else imports it).
- compat_handlers.go/_test.go/_example_test.go, openai.go/_test.go/
  _example_test.go/_scheduler_test.go — OpenAI/Anthropic/Ollama wire
  mounts. go-mlx's currently vendored go-inference (external/go-inference
  @ dd04a26) has no anthropic/ollama/openai packages — a submodule-pin
  gap, not something to fix here. Grep-confirmed no other file calls
  their exported functions; zero cascade.
- model_pack.go, model_pack_example_test.go, native_contract_test.go —
  see deferred #2 (missing dappco.re/go/rocm/model + model/gemma4
  packages). native_contract_test.go alone has 60 InspectModelPack call
  sites; too pervasive to excise function-by-function safely.

BUILD TAGS: preserved verbatim per Snider's correction (no blanket
darwin fencing — the source's own tags already separate the CPU/pure-Go
lane from the HIP/cgo lane). One narrow fix:
- hip_projection_reference.go: added the missing
  `//go:build linux && amd64 && !rocm_legacy_server` tag. It references
  hipMLXQ4ProjectionBits and hipQ8ScaleIsPositiveFinite, both defined
  only in linux&&amd64-tagged siblings (hip_projection_launch.go,
  hip_transformer_launch.go) — its own _test.go sibling already carried
  this exact tag, so the .go losing it looks like an upstream oversight,
  not deliberate design. Narrowest fix, matches existing intent.

CONTRACT CHANGE DISCOVERED + FIXED: inference.Backend.LoadModel's
signature changed from (inference.TextModel, error) to core.Result
between go-rocm's era and go-mlx's current go-inference pin (dd04a26).
Fixed both real implementations (native.go, rocm_stub.go) using
core.ResultOf/core.Fail; added the package-local resultError +
errHIPResultFailed helper to rocm.go (untagged, since native.go and
rocm_stub.go are mutually exclusive by tag but both need it), matching
the same per-package resultError idiom already used elsewhere in go-mlx
(see native_speculative_textmodel.go). Updated 4 test call sites
(backend_test.go x3, rocm_stub_test.go x1) to match.
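The following sketch shows the tuple-to-Result migration, assuming a minimal Result{Value, OK} shape. core.ResultOf and core.Fail are named in the commit, but their signatures are not shown here, so the sketch defines its own resultOf. The fallback behaviour of resultError and errHIPResultFailed is likewise a guess at the idiom, not a copy of rocm.go.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical stand-in for core.Result: on success Value holds
// the payload; on failure OK is false and Value holds the error.
type Result struct {
	Value any
	OK    bool
}

// resultOf adapts a (value, error) tuple into a Result, in the spirit of core.ResultOf.
func resultOf(v any, err error) Result {
	if err != nil {
		return Result{Value: err}
	}
	return Result{Value: v, OK: true}
}

// errHIPResultFailed covers a failed Result that carries no error value.
var errHIPResultFailed = errors.New("hip: result failed")

// resultError returns nil for a successful Result, else the carried error.
func resultError(r Result) error {
	if r.OK {
		return nil
	}
	if err, ok := r.Value.(error); ok && err != nil {
		return err
	}
	return errHIPResultFailed
}

type hipModel struct{ path string }

// loadModel is the private tuple helper; its body survives the migration untouched.
func loadModel(path string) (*hipModel, error) {
	if path == "" {
		return nil, errors.New("hip: empty model path")
	}
	return &hipModel{path: path}, nil
}

// LoadModel is the public core.Result-shaped surface wrapping the helper.
func LoadModel(path string) Result {
	return resultOf(loadModel(path))
}

func main() {
	ok := LoadModel("/models/demo")
	fmt.Println(ok.OK, resultError(ok)) // true <nil>
	bad := LoadModel("")
	fmt.Println(bad.OK, resultError(bad)) // false hip: empty model path
}
```

Keeping the tuple helper private leaves the migrated body unchanged. Only the exported surface wraps it, which matches the "private tuple helpers wrapped via core.ResultOf" pattern the later reseed applies to the other TextModel methods.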

IMPORT-BOUNDARY TEST (import_boundary_test.go) rebased for pkg/hip's new
home: two hardcoded relative paths to external/go-inference/go corrected
for the new depth (pkg/hip is 3 levels below the worktree root, not 1
level below the old repo root); dappco.re/go/mlx (+ mirrors) dropped from
forbiddenWorkflowRuntimeImports() since pkg/hip's home now legitimately
IS dappco.re/go/mlx.

SURGICAL TEST REMOVALS (functions/fields excised, not whole files, where
the bad dependency was locally contained):
- hip_hardware_test.go: TestNativeModelPackSmokeGemma4E2B_Good (used
  InspectModelPack).
- register_rocm_test.go: TestRegisterRocm_RuntimeLaneBackendRegistration_Good
  + its 2 dedicated helpers — blocked on RuntimeLaneCUDA and the whole
  cuda/cpu "runtime lane" backend feature, undefined anywhere in the
  source (see deferred #3), plus InspectModelPack.
- cache_test.go: TestCacheService_Good_CacheProfileReflectsWarmBlocks.
- kv_cache_test.go: TestKVCache_Good_CacheProfileLabels.
- decode_reference_test.go: 4 functions
  (HIPAssistantVerifierBinding{TracksQATAffineTensors,SupportsDenseQATAssistant},
  AttachedDrafterTextModelReportsReactiveIdentity,
  PlanAttachedDrafterAcceptsMTPQATAssistantPack).
- scheduler_test.go: TestScheduler_Good_ReportsReactiveModelIdentityAndProfile
  + schedulerFakeTextModel's cacheProfile field and CacheProfile() method.
  All removals confirmed via unused-import cleanup (rocmmodel/modelgemma4
  imports removed once their only call sites were gone).
- native.go: Info()/modelIdentity() now return already-available local
  data directly (m.modelInfo; the identity built from m.modelInfo/
  modelPath/modelType/labels) instead of routing through the missing
  package's ResolveModelInfo — narrowest possible bypass, drops only the
  opaque enrichment/Matched() validation step, does not fabricate its
  logic. LoadModel's safetensors-format fallback branch now returns an
  honest "not available in this quarantine landing" error instead of
  calling the excluded model_pack.go; the GGUF loading path (using the
  kept internal/gguf duplicate) is unaffected.

GATE RECEIPTS
- Gate A (go build ./pkg/hip/... && go vet ./pkg/hip/..., darwin/arm64):
  GREEN. Caveat: this exercises only the ~15-file untagged "CPU lane"
  plus rocm_stub.go (the !linux||!amd64 stub) — everything tagged
  linux&&amd64 (the actual bulk of the engine, ~85 files including
  native.go itself) is excluded from this build by tag and was never
  type-checked here; see Gate B.
- Whole-module `go build ./...` (informational): pkg/hip does not appear
  in the error output. The only 2 failures (cmd/mlx/menubar.go missing
  frontend/dist; pkg/metal cgo header/lib skew) are pre-existing,
  worktree-specific, and unrelated (matches documented project context:
  metal/cgo lanes don't build in worktrees).
- Gate B (CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build ./pkg/hip/...):
  FAILS. Root cause is deferred finding #1 below — 130+ symbols genuinely
  undefined anywhere in the untouched source, not a fencing issue and not
  introduced by this landing.
- Gate C (go test ./pkg/hip/... -count=1): 1 known failure, see deferred
  #4 (pre-existing, third-party, out of scope). pkg/hip/internal/gguf:
  PASS. Everything else that builds on darwin: PASS.
- Gate D (grep -rn "dappco.re/go/rocm" go/pkg/hip/): remaining hits are
  all non-import — explanatory comments, one error-message string, the
  memory_pretraining_package metadata label (never mapped to a real
  package even upstream), and import_boundary_test.go's own intentional
  forbidden-imports list entry. Zero real import statements remain.

DEFERRED / RECONCILE FINDINGS (seed the #262 quarantine work)
1. [BLOCKING, dominant] The default native path (linux && amd64 &&
   !rocm_legacy_server, ~81 files — the actual bulk of the engine) does
   not compile on its own native target, independent of this landing.
   130+ symbols (verified absent via git ls-tree across the full
   untouched source, not just excluded-by-me files) including basic
   utility helpers (firstNonEmptyString, cloneStringMap, mergeStringMaps),
   model-routing/profile types (ROCmLoadConfig, ROCmModelProfile,
   ROCmModelRoutePlan family), sequence-mixer types (SequenceMixerLoadPlan,
   hipSequenceMixerBindings), the gemma4 attached-drafter speculative-
   decode subsystem (~15 symbols), the production quantization ladder
   (ProductionTurboQuant*/ProductionAutoRound*/ProductionLaneQuantBits,
   ~20 symbols), and AdamW training state. Strongly suggests commit
   308c4d6 ("updates", vaguely worded, 88 files/+33955/-4225) captured a
   large in-progress refactor where calling code landed but defining code
   didn't. Real upstream reconciliation work, not fabricable here.
2. [BLOCKING] dappco.re/go/rocm/model + dappco.re/go/rocm/model/gemma4 —
   confirmed absent from the entire source (git ls-tree, all history in
   the shallow clone), likely the same root cause as #1. Used pervasively
   by model_pack.go (21 sites/~10 functions: CacheProfile reporting,
   gemma4 processor/vision/audio config, sequence-mixer helpers,
   model-pack format detection) and 6 test files.
3. [BLOCKING] RuntimeLaneCUDA + the whole cuda/cpu "runtime lane"
   pending-dispatch backend feature — undefined anywhere in the source;
   register_rocm_test.go tested a feature with zero implementation.
4. [out of scope, surfaced not caused] go-mlx's vendored go-inference
   (external/go-inference @ dd04a26): go-inference/go/ai/rag.go imports
   dappco.re/go/rag. Only visible because fixing import_boundary_test.go's
   path made the check actually run (previously silently "file not
   found"). Not pkg/hip's code; not fixed here.
5. [gguf duplicate, kept not eliminated] internal/gguf reconciliation is
   genuinely blocked: inference/gguf has no ReadMetadata, no Metadata
   type (it's a differently-shaped function there instead), no exported
   FileTypeName; its TensorInfo lacks .Dimensions/.ByteSize (has .Shape/
   .Elements instead — a different derivation, not a rename). Landed
   under go/pkg/hip/internal/gguf per the brief's own internal-tooling
   fallback clause rather than force a risky semantic rewrite.
6. [safetensors duplicate, not applicable] No dappco.re/go/rocm/
   safetensors package exists to redirect — the "duplicate" is 131
   references to hand-rolled rocmSafetensors*/rocmTokenizer* types/funcs
   inline in model_pack.go (now excluded per #2 anyway). Nothing to
   rewrite; matches census Tier-4 item 3 as genuine future work.

No push. No changes to go-mlx main tree (only this worktree). go.work
restored byte-identical after using the documented external/go-ml
absolute-path fallback to run the gates (that submodule isn't checked
out in this worktree).

Co-Authored-By: Virgil <virgil@lethean.io>
…try)

Gate A (darwin build/vet/test, CPU lane) + Gate D (no stray imports) green
in the worktree. Gate B (linux/amd64 cross-compile) blocked by 130+ symbols
missing from the SOURCE repo itself at 308c4d6 — documented in 69726a8's
body, the dominant #262 reconcile input. Real HIP validation happens on the
linux+AMD box.

Co-Authored-By: Virgil <virgil@lethean.io>
pkg/hip is preserved verbatim (Tier-4 quarantine): its benchmark flushes
the heap before writing an allocs pprof profile, and coupling it to
pkg/metal's mlx.GC wrapper would give the quarantine a dependency on the
dying cgo oracle. Exact-match still fails on any new direct call site.
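An exact-match exemption like this can be sketched with the standard go/parser and go/ast packages. The guard counts direct `runtime.GC()` calls and tolerates them only in files named verbatim on an allowlist. The allowlisted path and helper names are hypothetical, and the real guard (TestMlx_GC_Bad) may work differently.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// allowedGCFiles is a hypothetical exact-match allowlist. The path is
// illustrative; an exemption names one file, never a directory or pattern.
var allowedGCFiles = map[string]bool{
	"pkg/hip/hypothetical_bench_test.go": true,
}

// directGCCalls counts runtime.GC() call expressions in src. It matches on the
// identifier "runtime", so an aliased import would slip past this sketch.
func directGCCalls(filename, src string) (int, error) {
	f, err := parser.ParseFile(token.NewFileSet(), filename, src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	ast.Inspect(f, func(node ast.Node) bool {
		call, ok := node.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || sel.Sel.Name != "GC" {
			return true
		}
		if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == "runtime" {
			n++
		}
		return true
	})
	return n, nil
}

// gcViolation reports whether path calls runtime.GC directly without being allowlisted.
func gcViolation(path, src string) (bool, error) {
	n, err := directGCCalls(path, src)
	if err != nil {
		return false, err
	}
	return n > 0 && !allowedGCFiles[path], nil
}

func main() {
	src := "package p\n\nimport \"runtime\"\n\nfunc flush() { runtime.GC() }\n"
	for _, path := range []string{"pkg/hip/hypothetical_bench_test.go", "pkg/hip/new_call_site.go"} {
		bad, err := gcViolation(path, src)
		fmt.Println(path, bad, err)
	}
}
```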

Co-Authored-By: Virgil <virgil@lethean.io>
The combined workflow+ROCm list was go-rocm-era policy from when
go-inference was a thin contract surface — its ai/ package now
legitimately builds on dappco.re/go/rag (go-ai's absorbed role), and its
internal layering is guarded by its own suite. The walk now enforces the
boundary this consumer actually owns: go-inference never imports an
engine (go-rocm spellings, plus go-mlx added — a lib never imports its
consumers).
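One way to express such an import-boundary walk with the standard library is to parse each file with go/parser in ImportsOnly mode and flag any import equal to, or nested under, an engine module path. The forbidden list below is hypothetical and only echoes the spellings the commit mentions. The real test walks the go-inference tree and may match differently.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// forbiddenEngineImports is a hypothetical list: a library must never import
// an engine that consumes it.
var forbiddenEngineImports = []string{
	"dappco.re/go/rocm",
	"dappco.re/go/mlx",
}

// forbiddenImports parses only the import block of src and returns every
// import path equal to, or nested under, a forbidden module path.
func forbiddenImports(filename, src string) ([]string, error) {
	f, err := parser.ParseFile(token.NewFileSet(), filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var hits []string
	for _, spec := range f.Imports {
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		for _, bad := range forbiddenEngineImports {
			if path == bad || strings.HasPrefix(path, bad+"/") {
				hits = append(hits, path)
				break
			}
		}
	}
	return hits, nil
}

func main() {
	src := "package ai\n\nimport (\n\t\"context\"\n\t\"dappco.re/go/rag\"\n\t\"dappco.re/go/mlx/pkg/metal\"\n)\n"
	hits, err := forbiddenImports("ai.go", src)
	fmt.Println(hits, err) // [dappco.re/go/mlx/pkg/metal] <nil>
}
```

Matching on `path == bad || strings.HasPrefix(path, bad+"/")` avoids false positives on sibling modules whose names merely share a prefix, and dappco.re/go/rag passes because it is not on the list.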

Co-Authored-By: Virgil <virgil@lethean.io>
…e.Result go-inference

pkg/hip was censused from go-rocm at 308c4d6, WHEN go-rocm's engine packages
(go/model/**) and much of the root package were untracked. `GOOS=linux
GOARCH=amd64 go build ./pkg/hip/...` (Gate B) failed at landing with 163
undefined symbols. Those packages are now committed at go-rocm dev a2f0380.

Reseed (matching the census transform: package rocm->hip; self-imports
dappco.re/go/rocm/... -> dappco.re/go/mlx/pkg/hip/...; build tags verbatim):
- 126 root package files (go-rocm go/*.go) -> pkg/hip/*.go
- subpackages -> pkg/hip/: model, model/architecture, model/gemma4,
  model/builtin, profile, memorypretrain, scheme, internal/registry,
  internal/llamacpp (internal/gguf already present)
- removed hip_shared_helpers.go: its firstPositiveInt/rocmLabelUint were a
  landing shim relocated from model_pack.go (excluded then for depending on the
  missing model package); model_pack.go is now landed again and re-homes them

Behind the 163 undefined symbols lay a go-inference interface skew: go-rocm's
engine targets the v0.10.0 tuple-return TextModel/Backend, while go-mlx pins
go-inference@dev (580d183) with the core.Result universal-type migration.
Per "use modern core go", the engine is adapted to core.Result rather than the
dependency pinned back:
- TextModel implementors (rocmModel, attachedDrafterTextModel, ScheduledModel):
  Classify/BatchGenerate/Close/Err -> core.Result. Non-trivial bodies kept as
  private tuple helpers wrapped via core.ResultOf (the census's own LoadModel
  pattern); internal callers routed to the helpers.
- Backend implementor runtimeLaneBackend.LoadModel -> core.Result.
- Result-unwrap at internal call sites (decode_helpers, simple_self_distillation,
  portable_contract_stub).

Build tags: go-rocm leaves many engine files untagged (harmless, since its
root never builds on darwin), but pkg/hip must build there. Convergence tagged the 3 GPU-touching
files (lora_fuse, model_slice, tuning) linux&&amd64&&!rocm_legacy_server; the
model-routing/profile/registry layer stays cross-platform, resolving against
the darwin-portable stub half (portable_contract_stub.go + *_portable/*_stub).

Excluded (go-inference boundary, matching the census): compat_handlers.go and
openai.go — the OpenAI/Anthropic/Ollama HTTP wire-compat surface — need
go-inference's anthropic/ollama/openai subpackages, absent from go-mlx's
pinned go-inference. Left out; coverage_contract_test.go still references them
(a pre-existing linux-test-compile gap, behind no gate here).

Gates (go.work fallback -> main-checkout external/ abs paths; restored
byte-identical): GOOS=linux GOARCH=amd64 go build ./pkg/hip/... exit 0;
darwin go build + go test ./pkg/hip/... green; go vet ./pkg/hip/... clean.
HIP execution validates on linux+AMD; this proves the cross-compile.

Co-Authored-By: Virgil <virgil@lethean.io>
…p; linux cross-compile green

Brings go-rocm@a2f0380's now-committed model/ (+ gemma4/architecture/
builtin), profile/, memorypretrain/, scheme/, internal/ into pkg/hip
(package rocm->hip, imports rebased). Resolves the 163 undefined symbols
that failed Gate B at landing — GOOS=linux GOARCH=amd64 go build
./pkg/hip/... now exits 0. Engine migrated forward to go-inference dev's
core.Result API (rocmModel/attachedDrafterTextModel/ScheduledModel/
runtimeLaneBackend) rather than pinning the dependency back.

Known follow-up (pre-existing, not introduced): the linux TEST binary
won't compile — coverage_contract_test.go references the excluded
OpenAI/Anthropic/Ollama compat surface (needs go-inference compat
subpackages absent from the pinned go-inference).

Co-Authored-By: Virgil <virgil@lethean.io>
…opic,ollama,openai}

The reseed excluded compat_handlers.go + openai.go believing the compat
packages were absent — they'd merely MOVED under provider/ in go-inference
(dappco.re/go/inference/openai -> .../provider/openai; package names
unchanged). Bring both in (package rocm->hip) with imports repointed to
provider/*, and repoint coverage_contract_test.go's three imports too.
Migrate the five model.Err() checks to the core.Result idiom
(if r := model.Err(); !r.OK { r.Value.(error) }) — Err() now returns
core.Result on the dev pin.

darwin build/vet/test green (34 pkgs); linux cross-compile stays green.
Linux TEST binary now blocks only on fakeNativeModel (native_contract_test.go,
the censused 5.7k-line suite still excluded) — the remaining #262 piece.

Co-Authored-By: Virgil <virgil@lethean.io>
…m helper

Ports native_contract_test.go, gemma4_unified_model_pack_test.go,
gemma4_engine_features_test.go, adamw_state_test.go, and
production_mtp_test.go from go-rocm (package rocm -> hip, import paths
rebased to dappco.re/go/mlx/pkg/hip/...) — the censused test suite was
missing more support files than the single one originally scoped
(coverage_contract_test.go, hip_small_decode_test.go, etc. reference
fakeNativeModel/linkedGemma4TestLabels/assertAdamWFloat32Near/
writeGemma4ModelPackGGUF/productionMTP* symbols these files define).

Each ported file also gets its own call-site migration to the
core.Result idiom (LoadModel/Classify/BatchGenerate/Err/Close), since
it can't compile standalone otherwise.

Adds result_helpers_test.go: a generic resultValue[T] test helper
(built on core.Cast[T] + the existing resultError from rocm.go) that
unwraps core.Result back into the (value, error) shape every migrated
call site was written against — keeps the migration to a pure
call-site wrapper with zero changes to surrounding test assertions.
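A generic unwrapper of this kind fits in a few lines. The sketch below assumes a stand-in Result type and substitutes a plain type assertion for core.Cast[T], whose exact behaviour is not shown in the commit. The fakeModel and loadModel names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical stand-in for core.Result.
type Result struct {
	Value any
	OK    bool
}

var errResultFailed = errors.New("result failed")

// resultError returns nil on success, else the carried error.
func resultError(r Result) error {
	if r.OK {
		return nil
	}
	if err, ok := r.Value.(error); ok && err != nil {
		return err
	}
	return errResultFailed
}

// resultValue unwraps a Result into the (value, error) pair that pre-migration
// call sites were written against. The type assertion stands in for core.Cast[T].
func resultValue[T any](r Result) (T, error) {
	var zero T
	if err := resultError(r); err != nil {
		return zero, err
	}
	v, ok := r.Value.(T)
	if !ok {
		return zero, fmt.Errorf("result holds %T, want %T", r.Value, zero)
	}
	return v, nil
}

type fakeModel struct{ name string }

// loadModel is a hypothetical Result-returning loader.
func loadModel(name string) Result {
	if name == "" {
		return Result{Value: errors.New("load: empty name")}
	}
	return Result{Value: &fakeModel{name: name}, OK: true}
}

func main() {
	// Reads like the old tuple call site: m, err := backend.LoadModel(...)
	m, err := resultValue[*fakeModel](loadModel("gemma4"))
	fmt.Println(m.name, err) // gemma4 <nil>
}
```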

Brings the pre-existing censused linux test files onto the migrated
LoadModel/Classify/BatchGenerate/Err/Close signatures (return
core.Result instead of the old (T, error) tuple / bare error):
backend_example_test.go, model_example_test.go, model_test.go,
scheduler_test.go, hip_hardware_test.go, inference_benchmark_test.go,
decode_reference_test.go, coverage_contract_test.go, cache_test.go,
state_session_test.go, parser_registry_test.go, hip_small_decode_test.go.

Call sites only — every test's assertions/intent are unchanged, just
how the value/error is extracted from the return:
  x, err := m.LoadModel(...)        -> x, err := resultValue[T](m.LoadModel(...))
  if err := m.Err(); err != nil     -> if err := resultError(m.Err()); err != nil

Also fixes a silent trap found while auditing, one that compiles cleanly:
core.Result satisfies Go's error interface via its own Error() method,
so `core.RequireNoError(t, m.Close())` or `core.AssertError(t, m.Err())`
compiles fine post-migration but is always wrong (a Result passed as an
error is never == nil, regardless of .OK). Every such site, not only the
call sites the compiler did flag during the migration, is swept and
wrapped with resultError().
Two fake TextModel implementations (schedulerFakeTextModel,
coverageFailingTextModel) and four decode-test fakes
(minimalDecodeTextModel, decodeIdentityReporterModel,
decodeProfileReporterModel, benchmarkDecodeTextModel) get their
Classify/BatchGenerate/Err/Close methods migrated to return
core.Result so they still satisfy inference.TextModel.

Brings in 5 excluded linux-tagged test-support files (native_contract_test
+ gemma4_unified_model_pack / gemma4_engine_features / adamw_state /
production_mtp — they define fakeNativeModel et al. that censused tests
reference) and migrates ~114 stale tuple-signature call sites to
core.Result via a test-only resultValue[T] unwrapper (call sites read
identically to pre-migration).

Also swept the silent-bug class: core.Result satisfies error (has
Error()), so AssertError(t, model.Err()) would compile but pass
regardless of .OK — every such site now routes through resultError().
Verified: zero raw AssertError/RequireError on Err()/Close() remain; only
rocmModel/ScheduledModel/attachedDrafterTextModel return core.Result and
all their assertions unwrap; raw AssertNoError sites are plain-error
internals (hipLoadedModel/BlockCacheService/kernel buffers).

GOOS=linux go vet ./pkg/hip/... clean; darwin build/vet + 34 tests green.

Co-Authored-By: Virgil <virgil@lethean.io>