
[Feat] Support FP4 gather_kv_b_proj#3597

Open
qichu-yun wants to merge 1 commit into main from gather_kv_b_proj

@qichu-yun

Motivation

Add an MXFP4 (FP4 with per-1x32 block scaling) path to gather_kv_b_proj so that cached MLA KV expansion can run with both raw and preshuffled FP4 weights.

Technical Details

The change introduces a dedicated Triton FP4 gather path that handles per-1x32 MXFP4 block scaling and supports both raw and preshuffled FP4 weight layouts. The dispatcher now detects FP4 weight inputs and routes them to the new kernel path, while preserving the existing behavior for non-FP4 weights.
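For readers unfamiliar with the format, per-1x32 block scaling can be illustrated with a pure-Python reference decode. This is a hedged sketch, not the Triton kernel from this PR. It follows the OCP Microscaling (MX) definitions of FP4 E2M1 elements and E8M0 shared scales, and it assumes a raw layout with two codes packed per byte (even element in the low nibble) and one scale byte per 32 consecutive elements along K. The helper names (dequant_mxfp4_row, dequant_mxfp4) are hypothetical, and the preshuffled layout is not modeled.

```python
# Hedged sketch: reference decode of an MXFP4 weight in a raw (non-preshuffled) layout.
# Assumptions (not taken from the PR): two E2M1 codes per byte, even element in the
# low nibble; each row of K elements has K // 32 E8M0 scale bytes (bias 127).

# Magnitudes of the 8 non-negative E2M1 codes; bit 3 of the nibble is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32  # MXFP4 block size: one shared scale per 32 consecutive elements


def decode_e2m1(nibble):
    """Decode one 4-bit E2M1 code into a Python float."""
    magnitude = E2M1_MAGNITUDES[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def decode_e8m0(byte):
    """Decode one E8M0 scale byte as 2**(byte - 127); 0xFF encodes NaN."""
    if byte == 0xFF:
        return float("nan")
    return 2.0 ** (byte - 127)


def dequant_mxfp4_row(packed, scales):
    """Dequantize one row: len(packed) bytes -> 2 * len(packed) floats."""
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0xF))  # assumed: low nibble holds the even element
        values.append(decode_e2m1(byte >> 4))
    assert len(values) == BLOCK * len(scales), "expected one scale per 32 elements"
    # Every element in block i shares the scale scales[i].
    return [v * decode_e8m0(scales[i // BLOCK]) for i, v in enumerate(values)]


def dequant_mxfp4(packed_rows, scale_rows):
    """Dequantize an [N, K // 2] packed weight with [N, K // 32] scale bytes."""
    return [dequant_mxfp4_row(p, s) for p, s in zip(packed_rows, scale_rows)]
```

A fused kernel would normally decode tiles on the fly inside the GEMM rather than materializing the full-precision weight; a reference like this is mainly useful for checking results.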

Test Plan

  • Verified the FP4 gather kernel with synthetic numerical accuracy tests.
  • Compared FP4 gather output against the expected reference behavior.
  • Verified that the existing non-FP4 gather path is still preserved.
  • Checked the updated files with linter diagnostics.
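A synthetic accuracy test of this kind could look roughly like the following. This is a hypothetical harness, not the PR's test code. It quantizes a random float weight to MXFP4 using the OCP MX scale rule (shared exponent = floor(log2(max |w|)) - 2, round-to-nearest-even on the E2M1 grid, saturating at ±6), gathers rows of a random latent cache by index, and checks that the expansion with the dequantized weight stays within the error bound implied by the per-element quantization error. The row-major [out_dim][rank] weight layout and all function names are assumptions.

```python
import math
import random

# Hypothetical reference harness (names and layouts are assumptions, not the PR's API).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32      # elements sharing one E8M0 scale
EMAX_ELEM = 2   # exponent of the largest E2M1 value (6.0 = 1.5 * 2**2)


def quantize_block(values):
    """Quantize one block of floats to (4-bit E2M1 codes, E8M0 scale byte)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0] * len(values), 127  # all zeros: any scale works
    _, exp = math.frexp(max_abs)  # max_abs = mant * 2**exp with 0.5 <= mant < 1
    shared_exp = max(-127, min(127, (exp - 1) - EMAX_ELEM))  # floor(log2(max_abs)) - 2
    scale = 2.0 ** shared_exp
    codes = []
    for v in values:
        s = min(abs(v) / scale, 6.0)  # saturate at the largest E2M1 magnitude
        # Nearest grid point; on a tie prefer the even code (mantissa bit 0).
        idx = min(range(8), key=lambda k: (abs(E2M1_MAGNITUDES[k] - s), k & 1))
        codes.append(idx | (0x8 if v < 0 else 0))
    return codes, shared_exp + 127


def dequantize_block(codes, scale_byte):
    """Decode E2M1 codes with a shared E8M0 scale back to floats."""
    scale = 2.0 ** (scale_byte - 127)
    return [(-1.0 if c & 0x8 else 1.0) * E2M1_MAGNITUDES[c & 0x7] * scale for c in codes]


def quant_dequant_row(row):
    """MXFP4 round trip of one weight row whose length is a multiple of 32."""
    assert len(row) % BLOCK == 0
    out = []
    for b in range(0, len(row), BLOCK):
        out.extend(dequantize_block(*quantize_block(row[b:b + BLOCK])))
    return out


def gather_expand(cache, indices, weight):
    """out[t][o] = sum_r cache[indices[t]][r] * weight[o][r]."""
    return [[sum(c * w for c, w in zip(cache[i], row)) for row in weight] for i in indices]


def synthetic_accuracy_check(seed=0, rank=64, out_dim=8, slots=16, tokens=5):
    """Return the number of outputs checked; raise AssertionError on a bound violation."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-1.0, 1.0) for _ in range(rank)] for _ in range(out_dim)]
    cache = [[rng.uniform(-1.0, 1.0) for _ in range(rank)] for _ in range(slots)]
    indices = [rng.randrange(slots) for _ in range(tokens)]

    deq = [quant_dequant_row(row) for row in weight]
    # Per-element error is at most half the largest |w| in that element's block.
    err_bound = [
        [max(abs(x) for x in row[(r // BLOCK) * BLOCK:(r // BLOCK + 1) * BLOCK]) / 2
         for r in range(rank)]
        for row in weight
    ]

    ref = gather_expand(cache, indices, weight)
    got = gather_expand(cache, indices, deq)
    checked = 0
    for t, i in enumerate(indices):
        for o in range(out_dim):
            bound = sum(abs(c) * e for c, e in zip(cache[i], err_bound[o]))
            assert abs(ref[t][o] - got[t][o]) <= bound
            checked += 1
    return checked
```

Because the per-element error is at most half of the block's largest magnitude, the output tolerance is derived rather than hand-tuned. A real test would instead compare the Triton kernel's output against gather_expand run on the dequantized weight.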

Test Result

Synthetic accuracy validation passed for the new FP4 gather path. No linter errors were reported for the modified files.


qichu-yun requested a review from a team on June 8, 2026 06:50
github-actions (bot) commented on Jun 8, 2026

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
|---|---|
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| ci:atom | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| ci:atom_full | ATOM accuracy suite for PR and main models from ATOM models_accuracy.json |
| ci:vllm | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| ci:all | All standard extended tests (excludes ci:atom_full) |

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or with gh pr edit 3597 --add-label <label>.

Phi-C force-pushed the gather_kv_b_proj branch from 91d90ce to 4f16389 on June 8, 2026 07:04
