fix(gguf): fall back to expert_feed_forward_length for MoE-only configs #138
Open
mvkorobkov wants to merge 1 commit into
DeepSeek-V4 family (and other MoE-only GGUFs) emit only
`{arch}.expert_feed_forward_length` — they never set the global
`{arch}.feed_forward_length` because there is no dense FFN layer
above the per-expert size. The current loader reads only the global
key, so `intermediate_size` came back as `0` and config validation
rejected the model with:
Error: failed to load GGUF model: config validation failed:
[ConfigValidationError { field: "intermediate_size",
message: "must be greater than 0" }]
Repro: `larql extract --level browse DeepSeek-V4-Flash-Q3_K_M-00001-of-00003.gguf`
The HF config exposes `intermediate_size` as a single number, and
in every llama.cpp-supported architecture the per-expert FFN inner
dim matches what a non-MoE variant would call `intermediate_size`
(DS-V2-Lite-Chat happens to emit both at the same value when MoE
is active). Falling back to the per-expert key when the global key
is absent is therefore the correct semantics.
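The fallback rule can be sketched in isolation as below. This is a minimal illustration, not larql's loader: it assumes GGUF metadata has already been decoded into a `HashMap<String, u32>` and that a missing key reads as `0` (which is how `intermediate_size` ended up as `0` before this fix). `arch_key` and `resolve_intermediate_size` are hypothetical names, and `deepseek2` is used only as an example architecture prefix.

```rust
use std::collections::HashMap;

/// Hypothetical helper: build an architecture-prefixed GGUF key such as
/// `deepseek2.feed_forward_length`.
fn arch_key(arch: &str, suffix: &str) -> String {
    format!("{arch}.{suffix}")
}

/// Hypothetical sketch of the fallback: prefer the global FFN length and use
/// the per-expert length only when the global key is missing.
fn resolve_intermediate_size(meta: &HashMap<String, u32>, arch: &str) -> u32 {
    // Treat a missing key as 0, mirroring the `global > 0` check in the fix.
    let get = |suffix: &str| meta.get(&arch_key(arch, suffix)).copied().unwrap_or(0);
    let global = get("feed_forward_length");
    if global > 0 {
        global
    } else {
        // MoE-only GGUFs only carry the per-expert inner dimension.
        get("expert_feed_forward_length")
    }
}
```

For example, metadata holding only `deepseek2.expert_feed_forward_length = 2048` resolves to `2048`, while metadata holding both keys resolves to the global value, matching the two test cases in this PR.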
Tests:
- `test_gguf_to_config_json_falls_back_to_expert_feed_forward_length_on_moe`:
synthesises DS-V4-Flash-shaped metadata (no `feed_forward_length`,
`expert_feed_forward_length: 2048`); verifies `intermediate_size: 2048`
and that the validated detection path now accepts the config.
- `test_gguf_to_config_json_prefers_global_feed_forward_length_when_both_present`:
when both keys are emitted (some hybrid configs do), the global key
still wins — no behaviour change for non-MoE-only models.
281/281 larql-models tests pass.
This unblocks DeepSeek-V4-Flash extraction. Combined with chrishayuk#135 / chrishayuk#136 /
chrishayuk#137, the only remaining piece for full Kimi-K2/DS-V4-Flash inference
extraction is the per-tensor streaming GGUF reader (loading still goes
through the in-memory path; tracked separately).
mvkorobkov pushed a commit to mvkorobkov/larql that referenced this pull request on May 24, 2026
DeepSeek-V4 family emits only `{arch}.expert_feed_forward_length` —
never the global `{arch}.feed_forward_length` — because no dense FFN
layer exists above the per-expert size. The current loader reads only
the global key, so `intermediate_size` came back as `0` and config
validation rejected the model with:
Error: failed to load GGUF model: config validation failed:
[ConfigValidationError { field: "intermediate_size",
message: "must be greater than 0" }]
This is the same fix as upstream PR chrishayuk#138, applied directly to this
branch so DS-V4-Flash can flow through the streaming-GGUF path. (chrishayuk#138
will land independently; this commit becomes a no-op once it merges.)
Summary
DeepSeek-V4 family (and other MoE-only GGUFs) emit only `{arch}.expert_feed_forward_length` and never set the global `{arch}.feed_forward_length` — there's no dense FFN layer above the per-expert size. The current loader reads only the global key, so `intermediate_size` comes back as 0 and config validation rejects the model:
```
$ larql extract --level browse \
DeepSeek-V4-Flash-Q3_K_M-00001-of-00003.gguf
Error: failed to load GGUF model: config validation failed:
[ConfigValidationError { field: "intermediate_size",
message: "must be greater than 0" }]
```
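To show why a zero `intermediate_size` is fatal, the sketch below reproduces the shape of that error with a hypothetical validator. Only the `ConfigValidationError` field names and message are taken from the output above; the `Config` struct and `validate` function are assumptions for illustration and may not match larql's actual validation code.

```rust
/// Field names mirror the error output above; the real larql type may differ.
#[derive(Debug)]
struct ConfigValidationError {
    field: &'static str,
    message: String,
}

/// Hypothetical subset of the model config relevant to this PR.
struct Config {
    intermediate_size: u32,
}

/// Hypothetical validator: collects every failure rather than stopping at the first.
fn validate(cfg: &Config) -> Result<(), Vec<ConfigValidationError>> {
    let mut errors = Vec::new();
    // Before the fallback, MoE-only GGUFs reached this point with a width of 0.
    if cfg.intermediate_size == 0 {
        errors.push(ConfigValidationError {
            field: "intermediate_size",
            message: "must be greater than 0".to_string(),
        });
    }
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

Formatting the returned error vector with `{:?}` yields the same `[ConfigValidationError { field: "intermediate_size", message: "must be greater than 0" }]` text shown in the error, which is why a loader that silently reads a missing key as `0` surfaces as a validation failure rather than a missing-key error.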
Fix
In every llama.cpp-supported MoE architecture the per-expert FFN inner dim matches what a non-MoE variant would call `intermediate_size` (DS-V2-Lite-Chat happens to emit both at the same value when MoE is active). Fall back to the per-expert key only when the global key is absent.
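```rust
// Prefer the global `{arch}.feed_forward_length`. MoE-only GGUFs omit it, and a
// missing key reads as 0, so fall back to `{arch}.expert_feed_forward_length`.
let intermediate_size = {
    let global = get_arch_u32(GGUF_FEED_FORWARD_LENGTH);
    if global > 0 {
        global
    } else {
        get_arch_u32(GGUF_EXPERT_FEED_FORWARD_LENGTH)
    }
};
```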
Tests
281/281 larql-models tests pass.
Context
Part of the DeepSeek-V2/V3/Kimi-K2/V4 GGUF chain I've been working through: #133 (merged) for GGUF input, #135 for MLA metadata, #136 for multi-shard splits, #137 for the MLA capabilities gate. This is the last "small" piece — the only remaining blocker after this is the per-tensor streaming GGUF reader needed to fit 500 GB+ models in available RAM, which is a separate larger PR.