fix(gguf): fall back to expert_feed_forward_length for MoE-only configs #138

Open
mvkorobkov wants to merge 1 commit into chrishayuk:main from mvkorobkov:fix/gguf-intermediate-size-moe-fallback

Conversation

@mvkorobkov

Summary

The DeepSeek-V4 family (and other MoE-only GGUFs) emits only `{arch}.expert_feed_forward_length` and never sets the global `{arch}.feed_forward_length`, because there is no dense FFN layer above the per-expert size. The current loader reads only the global key, so `intermediate_size` comes back as 0 and config validation rejects the model:

```
$ larql extract --level browse \
    DeepSeek-V4-Flash-Q3_K_M-00001-of-00003.gguf
Error: failed to load GGUF model: config validation failed:
[ConfigValidationError { field: "intermediate_size",
 message: "must be greater than 0" }]
```
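To make the failure mode concrete, here is a minimal sketch of a validator that would produce an error of the shape shown in the log. The `ConfigValidationError` field names are taken from the log output; the field types and the `validate` function are assumptions for illustration, not the actual larql-models implementation.

```rust
/// Hypothetical mirror of the error shape seen in the log above.
/// Field names come from the log; the types are assumed.
#[derive(Debug, PartialEq)]
struct ConfigValidationError {
    field: &'static str,
    message: &'static str,
}

/// Hypothetical validator: rejects a zero `intermediate_size`.
fn validate(intermediate_size: u32) -> Result<(), Vec<ConfigValidationError>> {
    let mut errors = Vec::new();
    if intermediate_size == 0 {
        errors.push(ConfigValidationError {
            field: "intermediate_size",
            message: "must be greater than 0",
        });
    }
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}

fn main() {
    // A missing `{arch}.feed_forward_length` reads back as 0 and is rejected.
    println!("{:?}", validate(0));
    // A populated value passes validation.
    println!("{:?}", validate(2048));
}
```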

Fix

In every llama.cpp-supported MoE architecture, the per-expert FFN inner dimension matches what a non-MoE variant would call `intermediate_size` (DS-V2-Lite-Chat happens to emit both keys at the same value when MoE is active). The loader now falls back to the per-expert key only when the global key is absent, which it detects as a value of 0.

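```rust
let intermediate_size = {
    // Prefer the global key; a missing key reads back as 0.
    let global = get_arch_u32(GGUF_FEED_FORWARD_LENGTH);
    // MoE-only GGUFs emit just the per-expert size, so fall back to it.
    if global > 0 { global } else { get_arch_u32(GGUF_EXPERT_FEED_FORWARD_LENGTH) }
};
```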

Tests

  • `test_gguf_to_config_json_falls_back_to_expert_feed_forward_length_on_moe`: synthesises DS-V4-Flash-shaped metadata (no `feed_forward_length`, `expert_feed_forward_length: 2048`); verifies `intermediate_size: 2048` and that the validated detection path now accepts the config.
  • `test_gguf_to_config_json_prefers_global_feed_forward_length_when_both_present`: when both keys are emitted, global still wins — no behaviour change for non-MoE-only models.
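The two cases above can be reproduced in isolation with a self-contained sketch. It assumes the metadata is a flat map of `{arch}.{suffix}` keys to `u32` values and that a missing key reads as 0. The `get_arch_u32` signature, the constant values, the `demo_arch` name, and the 4096/1024 values are all hypothetical stand-ins rather than the real loader API; only 2048 comes from the test description.

```rust
use std::collections::HashMap;

// Assumed key suffixes; the constant names mirror the snippet in the Fix section.
const GGUF_FEED_FORWARD_LENGTH: &str = "feed_forward_length";
const GGUF_EXPERT_FEED_FORWARD_LENGTH: &str = "expert_feed_forward_length";

/// Hypothetical stand-in for the loader's `get_arch_u32`: looks up
/// `{arch}.{suffix}` and returns 0 when the key is missing.
fn get_arch_u32(meta: &HashMap<String, u32>, arch: &str, suffix: &str) -> u32 {
    meta.get(&format!("{arch}.{suffix}")).copied().unwrap_or(0)
}

/// The global key wins; the per-expert key is used only when the
/// global key is absent (or zero).
fn intermediate_size(meta: &HashMap<String, u32>, arch: &str) -> u32 {
    let global = get_arch_u32(meta, arch, GGUF_FEED_FORWARD_LENGTH);
    if global > 0 {
        global
    } else {
        get_arch_u32(meta, arch, GGUF_EXPERT_FEED_FORWARD_LENGTH)
    }
}

fn main() {
    // MoE-only shape: only the per-expert key is present.
    let mut moe_only = HashMap::new();
    moe_only.insert("demo_arch.expert_feed_forward_length".to_string(), 2048);
    println!("{}", intermediate_size(&moe_only, "demo_arch")); // prints 2048

    // Both keys present: the global key still wins.
    let mut both = HashMap::new();
    both.insert("demo_arch.feed_forward_length".to_string(), 4096);
    both.insert("demo_arch.expert_feed_forward_length".to_string(), 1024);
    println!("{}", intermediate_size(&both, "demo_arch")); // prints 4096
}
```

Note that with neither key present this sketch still returns 0, so validation would continue to reject such a config.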

281/281 larql-models tests pass.

Context

Part of the DeepSeek-V2/V3/Kimi-K2/V4 GGUF chain I've been working through: #133 (merged) for GGUF input, #135 for MLA metadata, #136 for multi-shard splits, #137 for the MLA capabilities gate. This is the last "small" piece — the only remaining blocker after this is the per-tensor streaming GGUF reader needed to fit 500 GB+ models in available RAM, which is a separate larger PR.

DeepSeek-V4 family (and other MoE-only GGUFs) emit only
`{arch}.expert_feed_forward_length` — they never set the global
`{arch}.feed_forward_length` because there is no dense FFN layer
above the per-expert size. The current loader reads only the global
key, so `intermediate_size` comes back as `0` and config validation
rejects the model with:

    Error: failed to load GGUF model: config validation failed:
    [ConfigValidationError { field: "intermediate_size",
     message: "must be greater than 0" }]

repro: `larql extract --level browse \
            DeepSeek-V4-Flash-Q3_K_M-00001-of-00003.gguf`

The HF config exposes `intermediate_size` as a single number, and
in every llama.cpp-supported architecture the per-expert FFN inner
dim matches what a non-MoE variant would call `intermediate_size`
(DS-V2-Lite-Chat happens to emit both at the same value when MoE
is active). Falling back to the per-expert key on absence of the
global key is the correct semantic.

Tests:
- `test_gguf_to_config_json_falls_back_to_expert_feed_forward_length_on_moe`:
  synthesises DS-V4-Flash-shaped metadata (no `feed_forward_length`,
  `expert_feed_forward_length: 2048`); verifies `intermediate_size: 2048`
  and that the validated detection path now accepts the config.
- `test_gguf_to_config_json_prefers_global_feed_forward_length_when_both_present`:
  when both keys are emitted (some hybrid configs do), the global key
  still wins — no behaviour change for non-MoE-only models.

281/281 larql-models tests pass.

This unblocks DeepSeek-V4-Flash extraction. Combined with chrishayuk#135 / chrishayuk#136 /
chrishayuk#137, the only remaining piece for full Kimi-K2/DS-V4-Flash inference
extraction is the per-tensor streaming GGUF reader (loading still uses
the in-memory path; tracked separately).
mvkorobkov pushed a commit to mvkorobkov/larql that referenced this pull request May 24, 2026
DeepSeek-V4 family emits only `{arch}.expert_feed_forward_length` —
never the global `{arch}.feed_forward_length` — because no dense FFN
layer exists above the per-expert size. The current loader reads only
the global key, so `intermediate_size` comes back as `0` and config
validation rejects the model:

  Error: failed to load GGUF model: config validation failed:
  [ConfigValidationError { field: "intermediate_size",
   message: "must be greater than 0" }]

This is the same fix as upstream PR chrishayuk#138, applied directly to this
branch so DS-V4-Flash can flow through the streaming-GGUF path. (chrishayuk#138
will land independently; this commit is a no-op once it merges.)