feat(model): mark speculative draft heads (DFlash/MTP) as non-standalone by rjckkkkk · Pull Request #82 · Approaching-AI/AIMA

rjckkkkk · 2026-06-09T06:28:36Z

Problem

The model scanner is filesystem-first: every weight artifact on disk becomes a separate, independently deployable model card. Two consequences surfaced while onboarding models on a Strix Halo box:

1. Speculative draft heads get a deploy button. Qwen3.6-35B-A3B-DFlash (safetensors, ~0.9G) and Qwen3.6-35B-A3B-DFlash-Q4_K_M (gguf, ~0.3M) showed up as standalone models with a "Deploy" (部署) button. A DFlash/MTP head is a speculative-decoding companion of its parent model and cannot run on its own.
2. (Not in this PR) One logical model appears as many cards: qwen3.6-35b-a3b was scanned as 8 entries (safetensors, bf16/bf16-unfused gguf, q4_k_m/ud-q4_k_m quants, and 2 DFlash copies). See Follow-up below.

Fix (scope: #1, the clear correctness bug)

The catalog already names each draft through its parent variant's speculative_config.model (e.g. qwen3.6-35b-a3b.yaml → model: /models/Qwen3.6-35B-A3B-DFlash), so detection is fully knowledge-driven, with no new per-model YAML and no hardcoded names (INV-1/2):

- knowledge.NormalizeModelKey: lowercases a name and strips quant/precision/layout suffixes (q4_k_m, bf16, ud, unfused, …) while keeping role tokens like dflash, so every on-disk artifact of one logical draft shares a key and stays distinct from the parent. (glm-4.7-flash is left intact: flash is part of the model's identity, not a quant.)
- Catalog.SpeculativeDraftModelKeys: harvests speculative_config.model across all variants into a set of normalized draft keys.
- annotateModelsFromCatalog: a scanned model whose normalized name is a draft key gets standalone_deploy=false and ui.role=draft (only when those fields are not already set). The embedded UI already hides the deploy button when standalone_deploy=false.
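A minimal sketch of the two knowledge-side helpers, assuming hyphen-separated names and a hand-picked strip list. The catalog struct shapes (Catalog, ModelAsset, Variant, SpeculativeConfig) and the quantToken pattern are hypothetical stand-ins, not AIMA's real types; only the function names and the behavior described above come from the PR.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// quantToken matches a single hyphen-separated token that names a
// quantization, precision, or layout variant rather than model identity.
// ASSUMPTION: illustrative pattern, not AIMA's actual strip table.
var quantToken = regexp.MustCompile(`^(i?q[0-9]+(_[a-z0-9]+)*|bf16|fp16|f16|fp32|f32|fp8|ud|unfused)$`)

// NormalizeModelKey lowercases a model name (or catalog path), keeps only the
// last path element, and strips trailing quant/precision/layout tokens.
// Role tokens such as "dflash" and identity tokens such as "flash" never
// match quantToken, so they survive.
func NormalizeModelKey(name string) string {
	base := strings.ToLower(path.Base(strings.TrimSpace(name)))
	tokens := strings.Split(base, "-")
	end := len(tokens)
	// Keep at least one token so a name can never normalize to "".
	for end > 1 && quantToken.MatchString(tokens[end-1]) {
		end--
	}
	return strings.Join(tokens[:end], "-")
}

// Hypothetical, trimmed-down catalog shapes (the real YAML-backed structs differ).
type SpeculativeConfig struct {
	Model string // e.g. "/models/Qwen3.6-35B-A3B-DFlash"
}

type Variant struct {
	Name              string
	SpeculativeConfig *SpeculativeConfig
}

type ModelAsset struct {
	Name     string
	Variants []Variant
}

type Catalog struct {
	Models []ModelAsset
}

// SpeculativeDraftModelKeys returns the normalized key of every draft model
// referenced by any variant's speculative_config.model. A nil catalog yields
// an empty, non-nil set.
func (c *Catalog) SpeculativeDraftModelKeys() map[string]struct{} {
	keys := make(map[string]struct{})
	if c == nil {
		return keys
	}
	for _, m := range c.Models {
		for _, v := range m.Variants {
			if v.SpeculativeConfig == nil || v.SpeculativeConfig.Model == "" {
				continue
			}
			keys[NormalizeModelKey(v.SpeculativeConfig.Model)] = struct{}{}
		}
	}
	return keys
}

func main() {
	cat := &Catalog{Models: []ModelAsset{{
		Name: "qwen3.6-35b-a3b",
		Variants: []Variant{{
			Name:              "dflash",
			SpeculativeConfig: &SpeculativeConfig{Model: "/models/Qwen3.6-35B-A3B-DFlash"},
		}},
	}}}
	drafts := cat.SpeculativeDraftModelKeys()
	for _, name := range []string{
		"Qwen3.6-35B-A3B-DFlash",
		"Qwen3.6-35B-A3B-DFlash-Q4_K_M",
		"Qwen3.6-35B-A3B-UD-Q4_K_M",
		"glm-4.7-flash",
	} {
		key := NormalizeModelKey(name)
		_, isDraft := drafts[key]
		fmt.Printf("%-30s key=%-24s draft=%v\n", name, key, isDraft)
	}
}
```

Both DFlash artifacts map to qwen3.6-35b-a3b-dflash and are flagged as drafts; the UD-Q4_K_M parent quant maps to qwen3.6-35b-a3b, and glm-4.7-flash stays intact, so neither is flagged. Stripping only trailing tokens is a deliberate choice in this sketch: a quant-looking token in the middle of a name is treated as identity and left alone.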

Parent models and quant variants are untouched; DB rows and deploy-by-name paths are unchanged (annotation is applied at model.list time only).

Tests

- NormalizeModelKey table cases (incl. the glm-4.7-flash negative case).
- SpeculativeDraftModelKeys harvest + nil-catalog guard.
- annotateModelsFromCatalog wiring: drafts → non-standalone draft, parent stays deployable.

go test ./..., go build ./..., go vet, and gofmt are all clean.

Follow-up (separate PR)

Variant grouping for #2: fold bf16 / bf16-unfused / q4_k_m / ud-q4_k_m of one model into a single card with a variant/quant selector. This needs a model.list grouping shape plus a UI change; dropping distinct quants outright would be wrong, since they are genuinely different deployables. The NormalizeModelKey introduced here is the intended base-key primitive for that work.

AIMA's model scanner lists every weight artifact on disk as an independently deployable model. Speculative draft heads (DFlash / MTP) only make sense paired with their parent model for speculative decoding, yet they showed up as standalone models with a deploy button (e.g. Qwen3.6-35B-A3B-DFlash, Qwen3.6-35B-A3B-DFlash-Q4_K_M).

The catalog already names each draft via its parent variant's speculative_config.model, so detection needs no new per-model YAML:

- knowledge.NormalizeModelKey: lowercases a model name and strips quantization/precision/layout suffixes (q4_k_m, bf16, ud, unfused, ...) while keeping role tokens like "dflash", so all on-disk artifacts of one logical draft share a key and stay distinct from the parent.
- Catalog.SpeculativeDraftModelKeys: harvests speculative_config.model across all variants into a set of normalized draft keys.
- annotateModelsFromCatalog: a scanned model whose normalized name is a draft key gets standalone_deploy=false + ui.role=draft (only when not already set); the embedded UI already hides the deploy button on standalone_deploy=false.

Knowledge-driven (INV-1/2): no hardcoded model names, derived entirely from the catalog. Parent models and quantization variants are unaffected, and DB rows / deploy-by-name paths are untouched (annotation is applied at model.list time only).

Tests: NormalizeModelKey table cases (incl. "glm-4.7-flash" not stripped), SpeculativeDraftModelKeys harvest + nil-catalog, and the annotate wiring (drafts -> non-standalone draft, parent stays deployable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rjckkkkk wants to merge 1 commit into develop from feat/speculative-draft-detection
