
[CI] Fix CT W4A16 e2e memory-wait timeout on Strix Halo CI runner#991

Merged
mgehre-amd merged 1 commit into gfx11 from matthias.fix-ct-w4a16-memory-wait on Jun 10, 2026


mgehre-amd commented Jun 8, 2026


On Strix Halo runners, amdsmi reports only 512 MiB of dedicated VRAM, of which ~314 MiB (61%) is already occupied by the ROCm baseline before any test runs. The memory-wait threshold is (1 - gpu_memory_utilization) of total VRAM, so with gpu_memory_utilization=0.5 the threshold was 50% = 256 MiB. That is below the 314 MiB baseline, so the wait condition could never be met and the check timed out after 120 s.
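The failure mode can be sketched as a polling loop. This is a minimal sketch, assuming the check waits until used VRAM is at most `total * (1 - gpu_memory_utilization)`, as the description states; `wait_for_free_memory` and its `read_used_mib` callback are hypothetical names, not the actual CI helper or an amdsmi API.

```python
import time
from typing import Callable


def wait_for_free_memory(
    read_used_mib: Callable[[], float],  # hypothetical: returns current used VRAM in MiB
    total_mib: float,
    gpu_memory_utilization: float,
    timeout_s: float = 120.0,
    poll_s: float = 1.0,
) -> bool:
    """Poll until used VRAM drops to the threshold; return False on timeout."""
    # Assumed semantics: at most (1 - utilization) of total VRAM may be in use.
    threshold_mib = total_mib * (1.0 - gpu_memory_utilization)
    deadline = time.monotonic() + timeout_s
    while True:
        if read_used_mib() <= threshold_mib:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# A baseline that never drops below 314 MiB cannot satisfy a 256 MiB threshold,
# no matter how long the loop waits (short timeout here to keep the demo fast).
baseline = lambda: 314.0
print(wait_for_free_memory(baseline, 512, 0.5, timeout_s=0.05, poll_s=0.01))
print(wait_for_free_memory(baseline, 512, 0.35, timeout_s=0.05, poll_s=0.01))
```

Because the baseline is constant, waiting longer does not help at 0.5; only raising the threshold (lowering the utilization ratio) makes the condition reachable.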

Lower gpu_memory_utilization to 0.35, giving a threshold of 65% of 512 MiB ≈ 332.8 MiB, about 18 MiB above the 314 MiB baseline. vLLM uses unified memory on Strix Halo (PyTorch sees ~28+ GiB), so the KV-cache budget remains positive at this utilization ratio.
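The threshold arithmetic, and the largest ratio that would still clear the baseline, can be checked directly. This is a small sketch using the figures reported in the description (512 MiB total, ~314 MiB baseline); `threshold_mib` and `max_utilization` are illustrative helper names, not part of vLLM or the CI scripts.

```python
TOTAL_VRAM_MIB = 512.0  # dedicated VRAM reported by amdsmi on the runner
BASELINE_MIB = 314.0    # approximate ROCm baseline usage before any test


def threshold_mib(total_mib: float, util: float) -> float:
    """Maximum used VRAM (MiB) the memory wait accepts for a given ratio."""
    return total_mib * (1.0 - util)


def max_utilization(total_mib: float, baseline_mib: float) -> float:
    """Largest gpu_memory_utilization whose threshold still covers the baseline."""
    return 1.0 - baseline_mib / total_mib


for util in (0.5, 0.35):
    t = threshold_mib(TOTAL_VRAM_MIB, util)
    print(f"util={util}: threshold={t:.1f} MiB, margin={t - BASELINE_MIB:+.1f} MiB")
print(f"max util = {max_utilization(TOTAL_VRAM_MIB, BASELINE_MIB):.4f}")
# util=0.5: threshold=256.0 MiB, margin=-58.0 MiB
# util=0.35: threshold=332.8 MiB, margin=+18.8 MiB
# max util = 0.3867
```

Any ratio below roughly 0.387 would pass the wait on this runner; 0.35 leaves a modest margin in case the baseline fluctuates slightly.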

Fixes CI failures such as https://github.com/ROCm/vllm/actions/runs/27040820558 on runners that report 512 MiB of VRAM.


Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
mgehre-amd requested review from amd-callumm and removed the request for AndreasKaratzas on June 8, 2026 22:52
mgehre-amd merged commit 2ead733 into gfx11 on Jun 10, 2026
5 of 7 checks passed