fix(capabilities): accept MLA architectures when full geometry is exposed #137

Open
mvkorobkov wants to merge 1 commit into chrishayuk:main from mvkorobkov:fix/capabilities-accept-mla-with-geometry

@mvkorobkov

Summary

PR #96 added MLA absorption (mla_absorb::absorb) to the f32 weight writer, but the entry-point gate (ensure_standard_attention_supported) was never updated to let MLA architectures through. Result: the absorption path is unreachable from larql extract --level inference/attention/all — the CLI refuses MLA up front.

Reproduces with any DeepSeek-V2/V3/Kimi-K2 GGUF:
```
$ larql extract --level inference DeepSeek-V2-Lite-Chat.Q4_K.gguf
Error: unsupported architecture 'deepseek' for extract pipeline:
multi-head latent attention (MLA) is not implemented
```

Fix

Narrow the gate to keep rejecting only MLA archs whose pre-absorption geometry fields are missing (where there is no defensible split for qk_head_dim). Complete-geometry MLA archs pass through, and write_f32 runs the absorption code PR #96 already shipped.

```rust
// after this PR:
if arch.uses_mla() {
    let has_geom = arch.mla_qk_nope_head_dim().is_some()
        && arch.mla_qk_rope_head_dim().is_some()
        && arch.mla_v_head_dim().is_some();
    if !has_geom { return Err(...); }
}
```
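To make the accept/reject behaviour concrete, here is a minimal, self-contained sketch of the narrowed gate. It is not the project's code: `ArchSketch`, `gate_sketch`, and the error string are hypothetical stand-ins, and only the four accessor names are taken from the snippet above.

```rust
/// Hypothetical stand-in for the project's architecture type.
/// Only the accessor names below come from the PR snippet.
#[derive(Debug, Clone, Copy, Default)]
struct ArchSketch {
    uses_mla: bool,
    qk_nope_head_dim: Option<usize>,
    qk_rope_head_dim: Option<usize>,
    v_head_dim: Option<usize>,
}

impl ArchSketch {
    fn uses_mla(&self) -> bool {
        self.uses_mla
    }
    fn mla_qk_nope_head_dim(&self) -> Option<usize> {
        self.qk_nope_head_dim
    }
    fn mla_qk_rope_head_dim(&self) -> Option<usize> {
        self.qk_rope_head_dim
    }
    fn mla_v_head_dim(&self) -> Option<usize> {
        self.v_head_dim
    }
}

/// Sketch of the narrowed gate: non-MLA archs pass, and MLA archs pass
/// only when all three geometry fields are present.
fn gate_sketch(arch: &ArchSketch) -> Result<(), String> {
    if arch.uses_mla() {
        let has_geom = arch.mla_qk_nope_head_dim().is_some()
            && arch.mla_qk_rope_head_dim().is_some()
            && arch.mla_v_head_dim().is_some();
        if !has_geom {
            // Placeholder message, not the real error text.
            return Err("MLA geometry incomplete".to_string());
        }
    }
    Ok(())
}

fn main() {
    // Geometry values taken from the DeepSeek-V2-Lite run described in this PR.
    let full = ArchSketch {
        uses_mla: true,
        qk_nope_head_dim: Some(128),
        qk_rope_head_dim: Some(64),
        v_head_dim: Some(128),
    };
    // MLA arch with no geometry fields, like the existing rejection fixture.
    let bare = ArchSketch { uses_mla: true, ..Default::default() };
    // Non-MLA arch: the gate does not look at the geometry at all.
    let dense = ArchSketch::default();

    assert!(gate_sketch(&full).is_ok());
    assert!(gate_sketch(&bare).is_err());
    assert!(gate_sketch(&dense).is_ok());
}
```

Note that a partially populated arch (for example only `qk_nope_head_dim` set) is still rejected, since the check requires all three fields.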

End-to-end verification

Built locally on top of #135 (which surfaces the MLA fields from GGUF metadata), extracted DeepSeek-V2-Lite-Chat Q4_K (10.4 GB single-file GGUF, 27 layers, kv_lora=512, qk_nope=128, qk_rope=64, v_head=128) → 1.13 GB inference-level vindex. index.json confirms extract_level: "inference", family: "deepseek", model_type: "deepseek_v2". attn_weights.bin is 216 MB (post-absorption standard QKVO).

Tests

  • mla_with_full_geometry_is_accepted_so_absorption_can_run: ensure_standard_attention_supported accepts complete-geometry MLA
  • extract_level_inference_accepts_mla_with_full_geometry: drives the CLI-facing gate at both Inference and All levels
  • All existing rejection tests stay green: incomplete-geometry MLA (the existing mla_arch() fixture has none of qk_nope/qk_rope/v_head) remains rejected. 9/9 pass.

Stacking

This is the third in a small stack (#135 → #136 → this) that together restore the full DeepSeek-V2/V3/Kimi-K2 extraction pipeline. Each PR is logically independent, so any merge order works, but the chain only completes end-to-end when all three land.

PR chrishayuk#96 wired MLA absorption (`mla_absorb::absorb`) into the f32 weight
writer: when an architecture reports `uses_mla() == true` AND exposes
all three of `mla_qk_nope_head_dim` / `mla_qk_rope_head_dim` /
`mla_v_head_dim`, the writer fuses the four low-rank tensors
(q_a / q_b / kv_a / kv_b) into standard dense Q/K/V/O at write time.
The on-disk manifest after that is a standard Q/K/V/O vindex —
exactly what `ensure_standard_attention_supported` is gating against.
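As a rough illustration of what collapsing a low-rank pair into one dense matrix means, the sketch below multiplies an up-projection by a down-projection and checks that the fused matrix gives the same output as applying the two in sequence. This is not `mla_absorb::absorb`: `Mat` and `fuse_low_rank` are hypothetical helpers, and the sketch ignores any normalization between the projections, the nope/rope split, and per-head layout.

```rust
/// Row-major dense matrix; a hypothetical helper type for this sketch only.
#[derive(Debug, Clone, PartialEq)]
struct Mat {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Mat {
    fn new(rows: usize, cols: usize, data: Vec<f32>) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must be rows * cols");
        Mat { rows, cols, data }
    }

    /// Plain triple-loop matrix product `self * rhs`.
    fn matmul(&self, rhs: &Mat) -> Mat {
        assert_eq!(self.cols, rhs.rows, "inner dimensions must match");
        let mut out = vec![0.0f32; self.rows * rhs.cols];
        for i in 0..self.rows {
            for k in 0..self.cols {
                let lhs = self.data[i * self.cols + k];
                for j in 0..rhs.cols {
                    out[i * rhs.cols + j] += lhs * rhs.data[k * rhs.cols + j];
                }
            }
        }
        Mat { rows: self.rows, cols: rhs.cols, data: out }
    }

    /// Matrix-vector product `self * x`.
    fn apply(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(self.cols, x.len(), "vector length must equal cols");
        (0..self.rows)
            .map(|i| {
                (0..self.cols)
                    .map(|k| self.data[i * self.cols + k] * x[k])
                    .sum::<f32>()
            })
            .collect()
    }
}

/// Collapse an up-projection `b` (out x rank) applied after a
/// down-projection `a` (rank x in) into one dense (out x in) matrix.
fn fuse_low_rank(b: &Mat, a: &Mat) -> Mat {
    b.matmul(a)
}

fn main() {
    // Toy shapes: in = 2, rank = 1, out = 2.
    let a = Mat::new(1, 2, vec![1.0, 2.0]);
    let b = Mat::new(2, 1, vec![3.0, 4.0]);
    let dense = fuse_low_rank(&b, &a);

    let x = [1.0f32, 1.0];
    let two_step = b.apply(&a.apply(&x));
    let fused = dense.apply(&x);
    // Both paths give the same result for this purely linear toy case.
    assert_eq!(two_step, fused);
    println!("fused = {:?}", fused);
}
```

The fused matrix has the shape a standard attention reader expects, which is why the on-disk result can be treated as ordinary Q/K/V/O once the fusion has happened at write time.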

But the gate still hard-rejected every `uses_mla()` arch, so the
absorption path was unreachable from `larql extract --level
inference/attention/all` for any DeepSeek-V2/V3/Kimi-K2 model. The CLI
failed before the writer could even try:

  $ larql extract --level inference DS-V2-Lite-Chat.Q4_K.gguf
  Error: unsupported architecture 'deepseek' for extract pipeline:
  multi-head latent attention (MLA) is not implemented

This commit narrows the gate to keep rejecting *only* MLA archs whose
geometry fields are missing (where absorption can't safely guess a
qk_head_dim split). Complete-geometry MLA archs pass through, and
`write_f32` runs the absorption path that PR chrishayuk#96 already shipped.

End-to-end verification: built locally on top of chrishayuk#135 (which surfaces
the MLA fields from GGUF metadata), extracted DeepSeek-V2-Lite-Chat
Q4_K (10.4 GB, 27 layers, kv_lora=512, qk_nope=128, qk_rope=64,
v_head=128) → 1.13 GB inference-level vindex with `attn_weights.bin`
sized at 216 MB (post-absorption standard QKVO).

Tests:
- `mla_with_full_geometry_is_accepted_so_absorption_can_run` — proves
  the lower-level `ensure_standard_attention_supported` accepts the
  complete-geometry case.
- `extract_level_inference_accepts_mla_with_full_geometry` — drives
  the CLI-facing gate at Inference AND All levels.
- All existing rejection tests still pass: incomplete-geometry MLA
  (the existing `mla_arch()` fixture has none of qk_nope/qk_rope/v_head)
  remains rejected at every level above Browse. 9/9 pass.

Combined with chrishayuk#133 / chrishayuk#135 / chrishayuk#136 this completes `larql extract` for
DeepSeek-V2 family GGUFs (and clears the last gate for Kimi K2 once
chrishayuk#135 + chrishayuk#136 merge).