fix(gguf): fall back to expert_feed_forward_length for MoE-only configs by mvkorobkov · Pull Request #138 · chrishayuk/larql

mvkorobkov · 2026-05-24T12:19:31Z

Summary

The DeepSeek-V4 family (and other MoE-only GGUFs) emits only `{arch}.expert_feed_forward_length` and never sets the global `{arch}.feed_forward_length`, because there is no dense FFN layer above the per-expert size. The current loader reads only the global key, so `intermediate_size` comes back as `0` and config validation rejects the model:

```
$ larql extract --level browse \
DeepSeek-V4-Flash-Q3_K_M-00001-of-00003.gguf
Error: failed to load GGUF model: config validation failed:
[ConfigValidationError { field: "intermediate_size",
message: "must be greater than 0" }]
```

Fix

In every llama.cpp-supported MoE architecture the per-expert FFN inner dim matches what a non-MoE variant would call `intermediate_size` (DS-V2-Lite-Chat happens to emit both at the same value when MoE is active). Fall back to the per-expert key only when the global key is absent.

```rust
let intermediate_size = {
    // A missing key reads as 0, so 0 here means the global key was not emitted.
    let global = get_arch_u32(GGUF_FEED_FORWARD_LENGTH);
    if global > 0 { global } else { get_arch_u32(GGUF_EXPERT_FEED_FORWARD_LENGTH) }
};
```
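For readers without the loader source at hand, the fallback can be sketched as a standalone function. This is an illustration under stated assumptions, not the actual larql code: the `HashMap` metadata store, the key-suffix constants, and the three-argument `get_arch_u32` signature are all hypothetical stand-ins, and the only behaviour taken from the PR is that an absent key reads as `0`.

```rust
use std::collections::HashMap;

// Hypothetical key suffixes; the real constants are defined in the loader.
const FEED_FORWARD_LENGTH: &str = "feed_forward_length";
const EXPERT_FEED_FORWARD_LENGTH: &str = "expert_feed_forward_length";

/// Reads `{arch}.{suffix}` from the metadata map, returning 0 when the key is absent.
/// (Assumed signature; the real helper closes over the parsed GGUF metadata.)
fn get_arch_u32(meta: &HashMap<String, u32>, arch: &str, suffix: &str) -> u32 {
    meta.get(&format!("{arch}.{suffix}")).copied().unwrap_or(0)
}

/// Prefers the global FFN length and falls back to the per-expert length
/// only when the global key is absent (or zero).
fn intermediate_size(meta: &HashMap<String, u32>, arch: &str) -> u32 {
    let global = get_arch_u32(meta, arch, FEED_FORWARD_LENGTH);
    if global > 0 {
        global
    } else {
        get_arch_u32(meta, arch, EXPERT_FEED_FORWARD_LENGTH)
    }
}
```

With only `arch.expert_feed_forward_length` set, `intermediate_size(&meta, "arch")` returns the per-expert value. With both keys set, it returns the global value. With neither, it still returns `0`, so config validation continues to reject metadata that carries no FFN size at all.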

Tests

- `test_gguf_to_config_json_falls_back_to_expert_feed_forward_length_on_moe`: synthesises DS-V4-Flash-shaped metadata (no `feed_forward_length`, `expert_feed_forward_length: 2048`); verifies `intermediate_size: 2048` and that the validated detection path now accepts the config.
- `test_gguf_to_config_json_prefers_global_feed_forward_length_when_both_present`: when both keys are emitted, the global key still wins, so there is no behaviour change for non-MoE-only models.

281/281 larql-models tests pass.

Context

Part of the DeepSeek-V2/V3/Kimi-K2/V4 GGUF chain I've been working through: #133 (merged) for GGUF input, #135 for MLA metadata, #136 for multi-shard splits, #137 for the MLA capabilities gate. This is the last "small" piece — the only remaining blocker after this is the per-tensor streaming GGUF reader needed to fit 500 GB+ models in available RAM, which is a separate larger PR.

DeepSeek-V4 family (and other MoE-only GGUFs) emit only `{arch}.expert_feed_forward_length`; they never set the global `{arch}.feed_forward_length` because there is no dense FFN layer above the per-expert size. The current loader reads only the global key, so `intermediate_size` came back as `0` and config validation rejected the model with:

`Error: failed to load GGUF model: config validation failed: [ConfigValidationError { field: "intermediate_size", message: "must be greater than 0" }]`

Repro: `larql extract --level browse DeepSeek-V4-Flash-Q3_K_M-00001-of-00003.gguf`

The HF config exposes `intermediate_size` as a single number, and in every llama.cpp-supported architecture the per-expert FFN inner dim matches what a non-MoE variant would call `intermediate_size` (DS-V2-Lite-Chat happens to emit both at the same value when MoE is active). Falling back to the per-expert key on absence of the global key is the correct semantic.

Tests:

- `test_gguf_to_config_json_falls_back_to_expert_feed_forward_length_on_moe`: synthesises DS-V4-Flash-shaped metadata (no `feed_forward_length`, `expert_feed_forward_length: 2048`); verifies `intermediate_size: 2048` and that the validated detection path now accepts the config.
- `test_gguf_to_config_json_prefers_global_feed_forward_length_when_both_present`: when both keys are emitted (some hybrid configs do), the global key still wins, so there is no behaviour change for non-MoE-only models.

281/281 larql-models tests pass.

This unblocks DeepSeek-V4-Flash extraction. Combined with chrishayuk#135 / chrishayuk#136 / chrishayuk#137, the only remaining piece for full Kimi-K2/DS-V4-Flash inference extraction is the per-tensor streaming GGUF reader (still in the in-memory load path; tracked separately).

DeepSeek-V4 family emits only `{arch}.expert_feed_forward_length`, never the global `{arch}.feed_forward_length`, because no dense FFN layer exists above the per-expert size. The current loader reads only the global key, so `intermediate_size` came back as `0` and config validation rejected the model:

`Error: failed to load GGUF model: config validation failed: [ConfigValidationError { field: "intermediate_size", message: "must be greater than 0" }]`

This is the same fix as upstream PR chrishayuk#138, applied directly to this branch so DS-V4-Flash can flow through the streaming-GGUF path. (chrishayuk#138 will land independently; this commit is a no-op once it merges.)

mvkorobkov wants to merge 1 commit into chrishayuk:main from mvkorobkov:fix/gguf-intermediate-size-moe-fallback
