fix(capabilities): accept MLA architectures when full geometry is exposed by mvkorobkov · Pull Request #137 · chrishayuk/larql

mvkorobkov · 2026-05-24T12:00:47Z

Summary

PR #96 added MLA absorption (mla_absorb::absorb) to the f32 weight writer, but the entry-point gate (ensure_standard_attention_supported) was never updated to let MLA architectures through. Result: the absorption path is unreachable from larql extract --level inference/attention/all — the CLI refuses MLA up front.

Reproduces with any DeepSeek-V2/V3/Kimi-K2 GGUF:
```
$ larql extract --level inference DeepSeek-V2-Lite-Chat.Q4_K.gguf
Error: unsupported architecture 'deepseek' for extract pipeline:
multi-head latent attention (MLA) is not implemented
```

Fix

Narrow the gate to keep rejecting only MLA archs whose pre-absorption geometry fields are missing (where there is no defensible split for qk_head_dim). Complete-geometry MLA archs pass through, and write_f32 runs the absorption code PR #96 already shipped.

```rust
// after this PR:
if arch.uses_mla() {
    let has_geom = arch.mla_qk_nope_head_dim().is_some()
        && arch.mla_qk_rope_head_dim().is_some()
        && arch.mla_v_head_dim().is_some();
    if !has_geom {
        return Err(...);
    }
}
```
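For readers who want to run the rule in isolation, here is a minimal self-contained sketch of the narrowed gate. `ArchSketch`, `has_full_mla_geometry`, `ensure_supported_sketch`, and the error string are hypothetical stand-ins invented for illustration; the real architecture type and error type in larql are not shown in this PR, so treat this as an approximation of the behaviour rather than the actual code.

```rust
/// Hypothetical stand-in for the architecture description.
/// The real larql type exposes these as methods; plain fields are an
/// assumption made to keep the sketch self-contained.
#[derive(Default)]
struct ArchSketch {
    uses_mla: bool,
    qk_nope_head_dim: Option<usize>,
    qk_rope_head_dim: Option<usize>,
    v_head_dim: Option<usize>,
}

impl ArchSketch {
    /// True only when all three pre-absorption geometry fields are present.
    fn has_full_mla_geometry(&self) -> bool {
        self.qk_nope_head_dim.is_some()
            && self.qk_rope_head_dim.is_some()
            && self.v_head_dim.is_some()
    }
}

/// Sketch of the narrowed gate: non-MLA archs and complete-geometry MLA
/// archs pass; MLA archs with any missing field are still rejected.
/// The `String` error is a placeholder, not the project's error type.
fn ensure_supported_sketch(arch: &ArchSketch) -> Result<(), String> {
    if arch.uses_mla && !arch.has_full_mla_geometry() {
        return Err("MLA geometry incomplete".to_string());
    }
    Ok(())
}

fn main() {
    // Geometry values taken from the DeepSeek-V2-Lite run in this PR.
    let full = ArchSketch {
        uses_mla: true,
        qk_nope_head_dim: Some(128),
        qk_rope_head_dim: Some(64),
        v_head_dim: Some(128),
    };
    // Mirrors a fixture with none of the three fields set.
    let bare = ArchSketch { uses_mla: true, ..Default::default() };

    assert!(ensure_supported_sketch(&full).is_ok());
    assert!(ensure_supported_sketch(&bare).is_err());
    println!("gate sketch ok");
}
```

Note that a partially populated architecture (for example, only `v_head_dim` missing) is rejected the same way as one with no geometry at all, which matches the all-three-or-nothing condition in the snippet above.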

End-to-end verification

Built locally on top of #135 (which surfaces the MLA fields from GGUF metadata), extracted DeepSeek-V2-Lite-Chat Q4_K (10.4 GB single-file GGUF, 27 layers, kv_lora=512, qk_nope=128, qk_rope=64, v_head=128) → 1.13 GB inference-level vindex. index.json confirms extract_level: "inference", family: "deepseek", model_type: "deepseek_v2". attn_weights.bin is 216 MB (post-absorption standard QKVO).

Tests

- mla_with_full_geometry_is_accepted_so_absorption_can_run: ensure_standard_attention_supported accepts complete-geometry MLA
- extract_level_inference_accepts_mla_with_full_geometry: drives the CLI-facing gate at both Inference and All levels
- All existing rejection tests stay green: incomplete-geometry MLA (the existing mla_arch() fixture has none of qk_nope/qk_rope/v_head) remains rejected. 9/9 pass.

Stacking

This is the third in a small stack (#135 → #136 → this) that together restore the full DeepSeek-V2/V3/Kimi-K2 extraction pipeline. Each PR is logically independent — any merge order works — but the chain only completes end-to-end when all three land.

…osed

PR chrishayuk#96 wired MLA absorption (`mla_absorb::absorb`) into the f32 weight writer: when an architecture reports `uses_mla() == true` AND exposes all three of `mla_qk_nope_head_dim` / `mla_qk_rope_head_dim` / `mla_v_head_dim`, the writer fuses the four low-rank tensors (q_a / q_b / kv_a / kv_b) into standard dense Q/K/V/O at write time. The on-disk manifest after that is a standard Q/K/V/O vindex, which is exactly the layout `ensure_standard_attention_supported` exists to guarantee.

But the gate still hard-rejected every `uses_mla()` arch, so the absorption path was unreachable from `larql extract --level inference/attention/all` for any DeepSeek-V2/V3/Kimi-K2 model. The CLI failed before the writer could even try:

    $ larql extract --level inference DS-V2-Lite-Chat.Q4_K.gguf
    Error: unsupported architecture 'deepseek' for extract pipeline:
    multi-head latent attention (MLA) is not implemented

This commit narrows the gate to keep rejecting *only* MLA archs whose geometry fields are missing (where absorption can't safely guess a qk_head_dim split). Complete-geometry MLA archs pass through, and `write_f32` runs the absorption path that PR chrishayuk#96 already shipped.

End-to-end verification: built locally on top of chrishayuk#135 (which surfaces the MLA fields from GGUF metadata), extracted DeepSeek-V2-Lite-Chat Q4_K (10.4 GB, 27 layers, kv_lora=512, qk_nope=128, qk_rope=64, v_head=128) → 1.13 GB inference-level vindex with `attn_weights.bin` sized at 216 MB (post-absorption standard QKVO).

Tests:
- `mla_with_full_geometry_is_accepted_so_absorption_can_run`: proves the lower-level `ensure_standard_attention_supported` accepts the complete-geometry case.
- `extract_level_inference_accepts_mla_with_full_geometry`: drives the CLI-facing gate at Inference AND All levels.
- All existing rejection tests still pass: incomplete-geometry MLA (the existing `mla_arch()` fixture has none of qk_nope/qk_rope/v_head) remains rejected at every level above Browse. 9/9 pass.

Combined with chrishayuk#133 / chrishayuk#135 / chrishayuk#136 this completes `larql extract` for DeepSeek-V2 family GGUFs (and clears the last gate for Kimi K2 once chrishayuk#135 + chrishayuk#136 merge).
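The idea of fusing low-rank tensors into one dense matrix, as described in the commit message above, can be pictured with a toy example. This is a sketch only, and it is not the `mla_absorb::absorb` implementation: `Mat`, `fuse_low_rank`, and `matvec` are hypothetical helpers, and the sketch assumes a purely linear pair of projections (a down-projection `a` followed by an up-projection `b`), ignoring any normalization between them and any RoPE / non-RoPE split that a real MLA absorption has to handle.

```rust
/// Row-major dense matrix; a hypothetical helper type for this sketch.
struct Mat {
    rows: usize,
    cols: usize,
    data: Vec<f32>,
}

impl Mat {
    fn new(rows: usize, cols: usize, data: Vec<f32>) -> Self {
        assert_eq!(data.len(), rows * cols, "shape/data mismatch");
        Mat { rows, cols, data }
    }

    fn at(&self, r: usize, c: usize) -> f32 {
        self.data[r * self.cols + c]
    }
}

/// Fuse a down-projection `a` (rank x hidden) and an up-projection `b`
/// (out x rank) into one dense (out x hidden) matrix W = b * a.
fn fuse_low_rank(b: &Mat, a: &Mat) -> Mat {
    assert_eq!(b.cols, a.rows, "inner dimensions must agree");
    let mut out = vec![0.0f32; b.rows * a.cols];
    for i in 0..b.rows {
        for k in 0..b.cols {
            let bik = b.at(i, k);
            for j in 0..a.cols {
                out[i * a.cols + j] += bik * a.at(k, j);
            }
        }
    }
    Mat::new(b.rows, a.cols, out)
}

/// Plain matrix-vector product, used to compare both paths.
fn matvec(m: &Mat, x: &[f32]) -> Vec<f32> {
    assert_eq!(m.cols, x.len());
    (0..m.rows)
        .map(|r| (0..m.cols).map(|c| m.at(r, c) * x[c]).sum())
        .collect()
}

fn main() {
    // Toy sizes: hidden = 3, rank = 2, out = 2.
    let a = Mat::new(2, 3, vec![1.0, 0.0, 2.0, 0.0, 1.0, 1.0]);
    let b = Mat::new(2, 2, vec![1.0, 1.0, 0.0, 2.0]);
    let w = fuse_low_rank(&b, &a);

    let x = [1.0f32, 2.0, 3.0];
    let two_step = matvec(&b, &matvec(&a, &x));
    let fused = matvec(&w, &x);

    // Applying the fused matrix matches applying a then b.
    assert_eq!(two_step, fused);
    println!("fused = {:?}", fused);
}
```

The point of fusing at write time is that a reader of the resulting file only ever sees one dense matrix per projection, so downstream code needs no MLA-specific handling.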

mvkorobkov mentioned this pull request May 24, 2026

fix(gguf): fall back to expert_feed_forward_length for MoE-only configs #138

Open

mvkorobkov wants to merge 1 commit into chrishayuk:main from mvkorobkov:fix/capabilities-accept-mla-with-geometry
