Eval bug: dflash: target and drafter vocab are incompatible; DFlash cannot retokenize draft outputs #51

@ovadmani-sudo

Name and Version

ggml_cuda_init: found 1 ROCm devices (Total VRAM: 118784 MiB):
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 118784 MiB
version: 9459 (07ac3ce)
built with Clang 21.1.8 for Linux x86_64

Thank you,
ovadmani

Operating systems

Linux

GGML backends

HIP

Hardware

Ryzen AI Max+ 395

Models

Qwen3.5-122B-A10B

Problem description & steps to reproduce

The problem occurs only with this specific version (9459), with any model and any draft quant.

ARGS=(
#-m /home/ovadm/models/unsloth--Qwen3.5-122B-A10B-GGUF/MXFP4_MOE/Qwen3.5-122B-A10B-MXFP4_MOE-00001-of-00003.gguf
-m /home/ovadm/models/unsloth--Qwen3.5-122B-A10B-GGUF/UD-IQ4_NL/Qwen3.5-122B-A10B-UD-IQ4_NL-00001-of-00003.gguf
#-m /home/ovadm/models/unsloth--Qwen3.5-122B-A10B-GGUF/Q4_K_M/Qwen3.5-122B-A10B-Q4_K_M-00001-of-00003.gguf
--alias local_model
-np 1 #second app
--spec-draft-model /home/ovadm/models/z-lab--Qwen3.5-122B-A10B-DFlash/qwen3.5-dflash.v2.q5_k_m.gguf
--spec-type dflash
-ngl 99

First Bad Commit

It works fine with version 9344 (75ae2a6).
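
To narrow down the exact first bad commit between the two builds, something like the following could help. This is only a sketch: `bisect_verdict` and `server.log` are hypothetical names, and it assumes the server output of each test build is saved to a file. It only looks for the error string from the log below, so a build that fails for some other reason would still be reported as "good" and should be checked by hand.

```shell
#!/usr/bin/env bash
# Sketch: classify one build's server log during a manual `git bisect`.
# ERR_PATTERN is copied from the log output in this report.
ERR_PATTERN='dflash: target and drafter vocab are incompatible'

# bisect_verdict LOGFILE
# Prints "bad" if the log contains the DFlash vocab error, otherwise "good".
# (Hypothetical helper; it does not detect unrelated failures.)
bisect_verdict() {
  if grep -qF "$ERR_PATTERN" "$1"; then
    echo bad
  else
    echo good
  fi
}

# Manual loop, run inside the repository checkout (hashes from this report):
#   git bisect start 07ac3ce 75ae2a6      # bad build (9459) first, then good build (9344)
#   ...rebuild, start the server with the ARGS above, save its output to server.log...
#   git bisect "$(bisect_verdict server.log)"
#   ...repeat until git prints the first bad commit, then: git bisect reset
```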

Relevant log output

terminate called after throwing an instance of 'std::runtime_error'
what(): dflash: target and drafter vocab are incompatible; DFlash cannot retokenize draft outputs (target_vocab=248320 drafter_vocab=248320)
