[CI] Fix CT W4A16 e2e memory-wait timeout on Strix Halo CI runner by mgehre-amd · Pull Request #991 · ROCm/vllm

mgehre-amd · 2026-06-08T22:52:38Z

On Strix Halo runners, amdsmi reports only 512 MiB of dedicated VRAM, of which ~314 MiB (61%) is already occupied by the ROCm baseline before any test runs. The memory-wait threshold is (1 - gpu_memory_utilization) of the reported VRAM, so with gpu_memory_utilization=0.5 the threshold was 50% = 256 MiB. That is below the 314 MiB floor, so the wait could never be satisfied within its 120 s limit.

Lower gpu_memory_utilization to 0.35, giving threshold = 65% = 332 MiB, which is safely above the 314 MiB baseline. vLLM uses unified memory on Strix Halo (PyTorch sees ~28+ GiB), so the KV-cache budget remains positive at this utilization ratio.
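
The arithmetic above can be checked with a short sketch. It assumes the wait succeeds once used VRAM is at or below (1 - gpu_memory_utilization) times the total that amdsmi reports. The function names are hypothetical and are not the actual vLLM test helpers.

```python
# Illustrative only: hypothetical helpers, not part of vLLM or amdsmi.

def memory_wait_threshold_mib(total_vram_mib: float,
                              gpu_memory_utilization: float) -> float:
    """Used-VRAM ceiling (MiB), assuming threshold = (1 - utilization) * total."""
    return (1.0 - gpu_memory_utilization) * total_vram_mib


def wait_can_succeed(total_vram_mib: float,
                     baseline_used_mib: float,
                     gpu_memory_utilization: float) -> bool:
    """True if the always-present baseline usage fits under the threshold."""
    threshold = memory_wait_threshold_mib(total_vram_mib, gpu_memory_utilization)
    return baseline_used_mib <= threshold


if __name__ == "__main__":
    TOTAL_MIB = 512     # dedicated VRAM reported on the Strix Halo runner
    BASELINE_MIB = 314  # approximate ROCm baseline usage before any test
    for util in (0.5, 0.35):
        threshold = memory_wait_threshold_mib(TOTAL_MIB, util)
        ok = wait_can_succeed(TOTAL_MIB, BASELINE_MIB, util)
        print(f"util={util}: threshold={threshold:.1f} MiB, satisfiable={ok}")
```

With these inputs, the old setting of 0.5 cannot be satisfied while 0.35 can, leaving a margin of roughly 18 MiB over the baseline.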

Fixes CI failures such as https://github.com/ROCm/vllm/actions/runs/27040820558 when running on the runner with 512 MiB VRAM.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>

mgehre-amd requested a review from AndreasKaratzas as a code owner June 8, 2026 22:52

mgehre-amd requested review from amd-callumm and removed request for AndreasKaratzas June 8, 2026 22:52

amd-callumm approved these changes Jun 9, 2026

mgehre-amd merged commit 2ead733 into gfx11 Jun 10, 2026
5 of 7 checks passed

mgehre-amd mentioned this pull request Jun 10, 2026

[ROCm][gfx11] Restore TRITON_ATTN priority on gfx11 #994

Merged

1 task

mgehre-amd merged 1 commit into gfx11 from matthias.fix-ct-w4a16-memory-wait
