Eval bug: llama.cpp (AMD GPU): partial layers run on CPU when loading dflash model, cannot fully offload to GPU

### Name and Version

 .\llama-cli.exe --version
ggml_cuda_init: found 1 ROCm devices (Total VRAM: 89976 MiB):
  Device 0: AMD Radeon(TM) 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, VRAM: 89976 MiB
version: 9459 (07ac3cec6)
built with Clang 22.0.0 for Windows AMD64

### Operating systems

Windows

### GGML backends

HIP

### Hardware

OS: Microsoft Windows 11 Pro 10.0.26200 26200
CPU: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S 
GPU: AMD Radeon(TM) 8060S Graphics 4293918720 32.0.31007.5012


### Models

Qwen3.6-27B-Q5_K_S.gguf
Qwen3.6-27B-DFlash-Q4_K_M.gguf

### Problem description & steps to reproduce

`.\llama-server.exe -m C:\Users\xh\Desktop\work\code\novamax\data\models_dir\llm\unsloth\Qwen3.6-27B-GGUF\Qwen3.6-27B-Q5_K_S.gguf --mmproj C:\Users\xh\Desktop\work\code\novamax\data\models_dir\llm\unsloth\Qwen3.5-27B-GGUF\mmproj-BF16.gguf --no-mmproj-offload --spec-draft-model C:\Users\xh\Downloads\Qwen3.6-27B-DFlash-Q4_K_M.gguf --spec-type dflash --spec-dflash-cross-ctx 1024 --host 0.0.0.0 --port 1234 --parallel 1 --kv-unified --n-gpu-layers all  --spec-draft-ngl all  -b 2048 -ub 512 --ctx-size 102400  --cache-type-k q5_0 --cache-type-v q4_1  --flash-attn on --cache-ram 0 --jinja --no-mmap  --mlock --temperature 0.6 --reasoning off --top-p 1.0 --top-k 20 --min-p 0.0 --spec-draft-n-max 8 `

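On an AMD GPU system, when llama.cpp loads and runs the DFlash draft model with the command above, only some of the model layers are offloaded to the GPU and the rest still execute on the CPU, even though both `--n-gpu-layers all` and `--spec-draft-ngl all` are set. GPU utilization is only partial and CPU load stays high, so inference is slower than expected with full offload.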

[dflash_model.log](https://github.com/user-attachments/files/28383312/dflash_model.log)

<img width="1049" height="804" alt="Image" src="https://github.com/user-attachments/assets/a2ef3dd8-ba82-44fc-b39a-16e665e74e05" />
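A quick way to check the offload claim against the attached log is to look for the layer-offload summary that llama.cpp normally prints while loading each model. The sketch below is only a rough helper and assumes the log contains lines of the form `offloaded N/M layers to GPU`; the exact wording can differ between builds, and the function names are made up for illustration. It would not detect the case where all layers are offloaded but individual ops fall back to the CPU.

```python
import re
import sys

# Assumption: llama.cpp logs a summary such as
#   "load_tensors: offloaded 65/65 layers to GPU"
# once per loaded model (main model and draft model). Adjust the
# pattern if the attached log uses different wording.
OFFLOAD_RE = re.compile(r"offloaded\s+(\d+)/(\d+)\s+layers\s+to\s+GPU")


def find_offload_lines(log_text):
    """Return one (offloaded, total) pair per matching log line."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in OFFLOAD_RE.finditer(log_text)]


def partially_offloaded(log_text):
    """Return only the pairs where fewer layers were offloaded than exist."""
    return [(done, total)
            for done, total in find_offload_lines(log_text)
            if done < total]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_offload.py dflash_model.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    for done, total in find_offload_lines(text):
        status = "full" if done == total else "PARTIAL"
        print(f"{done}/{total} layers on GPU ({status})")
```

If every summary line reports full offload while CPU load is still high, the slowdown more likely comes from operations that the HIP backend does not support and schedules on the CPU, which this check cannot see.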

### First Bad Commit

_No response_

### Relevant log output

<details>
<summary>Logs</summary>


```console

```
</details>
