Add GLM GQA FP8 KV paged attention test by ThomasNing · Pull Request #3609 · ROCm/aiter

ThomasNing · 2026-06-08T10:03:06Z

Summary

Add a GLM-style paged_attention_ragged regression test for GQA FP8 KV cache
Covers BF16 query/output, FP8 K/V cache, NHD layout, page size 1, 96 query heads, 8 KV heads, and head size 128
Compares native FP8-KV decode against the same FP8 values dequantized to BF16
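
The comparison described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the test in op_tests/test_pa_ragged.py: the sizes are shrunk, integer-grid rounding stands in for FP8, and the names `quantize`, `decode_attention`, and `page_ids` are hypothetical. It shows why the two paths should agree closely: both consume the same quantized values, so quantization error cancels and only arithmetic-order differences remain.

```python
import math
import random


def quantize(x, scale):
    # Stand-in for FP8 rounding (assumption): snap to a clamped integer grid.
    return max(-127, min(127, round(x / scale)))


def decode_attention(q, k_cache, v_cache, page_ids, group, k_scale=1.0, v_scale=1.0):
    """Single-token decode over a page-size-1 cache laid out as [page][kv_head][dim]."""
    head_dim = len(q[0])
    out = []
    for h, q_h in enumerate(q):
        kv_h = h // group  # GQA: consecutive query heads share one KV head
        scores = [
            k_scale * sum(a * b for a, b in zip(q_h, k_cache[p][kv_h])) / math.sqrt(head_dim)
            for p in page_ids
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([
            v_scale * sum(w_i * v_cache[p][kv_h][d] for w_i, p in zip(w, page_ids)) / total
            for d in range(head_dim)
        ])
    return out


random.seed(0)
num_q_heads, num_kv_heads, head_dim, num_pages = 4, 2, 8, 6
group = num_q_heads // num_kv_heads
scale = 0.05


def random_cache():
    # Quantized cache values; with page size 1, each page holds exactly one token.
    return [[[quantize(random.uniform(-1, 1), scale) for _ in range(head_dim)]
             for _ in range(num_kv_heads)] for _ in range(num_pages)]


def dequantize(cache):
    return [[[x * scale for x in head] for head in page] for page in cache]


q = [[random.uniform(-1, 1) for _ in range(head_dim)] for _ in range(num_q_heads)]
k_q, v_q = random_cache(), random_cache()
page_ids = [4, 0, 5, 2]  # this sequence owns four non-contiguous pages

# "Native" path: scales are applied inside the attention computation.
native = decode_attention(q, k_q, v_q, page_ids, group, k_scale=scale, v_scale=scale)
# Reference path: the same quantized values, dequantized before attention.
reference = decode_attention(q, dequantize(k_q), dequantize(v_q), page_ids, group)

max_err = max(abs(a - b) for n, r in zip(native, reference) for a, b in zip(n, r))
print(f"max abs difference: {max_err:.3e}")
```

In a real kernel the two paths also differ in accumulation precision, so a test of this kind would compare within a tolerance rather than for exact equality.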

Why

GLM-4.5 style serving uses GQA with 96 Q heads / 8 KV heads / D=128. This pins official AITER coverage for direct FP8 KV decode so framework integrations can route to AITER without dequantizing the cache first.
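
As a rough illustration of this head configuration, the sketch below maps query heads to KV heads and counts KV cache bytes per token per layer. It assumes the common convention that consecutive query heads share one KV head; the helper names are hypothetical and are not part of AITER.

```python
NUM_Q_HEADS = 96
NUM_KV_HEADS = 8
HEAD_DIM = 128

# Number of query heads that read from each KV head.
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS


def kv_head_for(q_head):
    # Assumed mapping: query heads 0..11 use KV head 0, 12..23 use KV head 1, and so on.
    return q_head // GROUP_SIZE


def kv_bytes_per_token(bytes_per_elem):
    # One K vector and one V vector per KV head, per token, per layer.
    return 2 * NUM_KV_HEADS * HEAD_DIM * bytes_per_elem


print(GROUP_SIZE)             # 12
print(kv_head_for(95))        # 7
print(kv_bytes_per_token(2))  # BF16: 4096
print(kv_bytes_per_token(1))  # FP8: 2048
```

Storing K/V in FP8 halves the cache footprint relative to BF16, which is part of why decoding directly from the FP8 cache is attractive.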

Test

On MI300X gfx942 with lmsysorg/sglang:v0.5.9-rocm700-mi30x:
python3 -m pytest -q op_tests/test_pa_ragged.py::test_paged_attention_ragged_glm_gqa_fp8_kv_nhd -s
Result: 1 passed, 2 warnings in 25.67s

github-actions · 2026-06-08T10:03:21Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-300x`	Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
`ci:sglang`	SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy
`ci:atom`	ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B
`ci:atom_full`	ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json`
`ci:vllm`	vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5
`ci:all`	All standard extended tests (excludes `ci:atom_full`)

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or with `gh pr edit 3609 --add-label <label>`.

ThomasNing · 2026-06-08T10:06:09Z

I should follow up with one more SGLang patch to prevent the FP8 KV cache from being dequantized.

Add GLM GQA FP8 KV paged attention test

ebc3353

ThomasNing requested a review from a team June 8, 2026 10:03

ThomasNing wants to merge 1 commit into ROCm:main from ThomasNing:thomas/glm-gqa-fp8-pa-ragged-test-upstream
