MOE: add AITER_MOE_FORCE_BF16_ACT to force bf16 activations (opt-in) by sphinx07 · Pull Request #3593 · ROCm/aiter

sphinx07 · 2026-06-08T04:59:29Z

Summary

Adds an opt-in environment variable, AITER_MOE_FORCE_BF16_ACT, that forces
bf16 activations for the per_1x32 (MXFP4) SwiGLU MoE paths in fused_moe_.

By default the activation dtype is auto-selected:

- separated SwiGLU: bf16 for small M, fp4x2 for large M (prefill)
- interleaved / non-separated SwiGLU: fp8 on gfx950 for large M

Make the per_1x32 SwiGLU bf16→fp8 activation threshold configurable via
AITER_MOE_BF16_FP8_BOUND (default 512) to avoid the gfx950 fp8 prefill
regression for MXFP4 w4a16. AITER_MOE_FORCE_BF16_ACT lets deployments that
need bf16 activations at every M opt in without code changes.
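
As a rough illustration of the default auto-selection described above, the sketch below models the choice in plain Python. It is not the aiter implementation: `read_bound` and `auto_select_act_dtype` are hypothetical names, dtypes are represented as strings, and several details are assumptions (that "large M" means `m >= bound`, that the same bound applies to the separated path, and that every case the description does not list falls back to bf16).

```python
import os

DEFAULT_BF16_FP8_BOUND = 512  # default stated in this PR description


def read_bound(env=None):
    """Read AITER_MOE_BF16_FP8_BOUND, falling back to 512 when unset or blank."""
    env = os.environ if env is None else env
    raw = env.get("AITER_MOE_BF16_FP8_BOUND", "").strip()
    return int(raw) if raw else DEFAULT_BF16_FP8_BOUND


def auto_select_act_dtype(m, separated, arch, bound):
    """Hypothetical model of the default per_1x32 SwiGLU activation choice.

    Assumptions (not confirmed by the PR): "large M" means m >= bound, the
    same bound applies to the separated path, and every case the description
    does not list falls back to bf16.
    """
    large_m = m >= bound
    if separated:
        return "fp4x2" if large_m else "bf16"
    if arch == "gfx950" and large_m:
        return "fp8"
    return "bf16"
```

For example, `auto_select_act_dtype(4096, False, "gfx950", read_bound())` models a prefill-sized batch on the interleaved path.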

Behavior

- AITER_MOE_FORCE_BF16_ACT=1 → q_dtype_a = bf16 on the per_1x32 SwiGLU paths, for both small and large M.
- Unset / 0 (default) → no change; the existing auto-selection logic runs exactly as before.
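
The override semantics can be sketched as a thin wrapper around whatever dtype the auto-selection produced. This is a minimal model, not the code in this PR: `force_bf16_act` and `resolve_q_dtype_a` are hypothetical helpers, dtypes are strings, and treating only the exact value `1` as "on" is an assumption about how the variable is parsed.

```python
import os


def force_bf16_act(env=None):
    """True only when AITER_MOE_FORCE_BF16_ACT is exactly "1" (assumed parsing)."""
    env = os.environ if env is None else env
    return env.get("AITER_MOE_FORCE_BF16_ACT", "0").strip() == "1"


def resolve_q_dtype_a(auto_dtype, env=None):
    """Return "bf16" when the flag is set, otherwise the auto-selected dtype."""
    return "bf16" if force_bf16_act(env) else auto_dtype
```

With the flag unset or `0`, `resolve_q_dtype_a` returns its input unchanged, which mirrors the "no change" default.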

The flag is process-scoped and default-off, so it has no impact on existing
users, other models, quant types, or architectures.
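
Because the flag is an environment variable, one way to confine it to a single job is to set it only in that job's child process. The sketch below uses the standard `subprocess` module; `run_with_flag` and `flag_seen_by_child` are hypothetical helper names, and the probe only reads the variable back. When aiter itself reads the variable (import time or call time) is not stated in this PR, so setting it before the process starts is the conservative choice.

```python
import os
import subprocess
import sys

FLAG = "AITER_MOE_FORCE_BF16_ACT"


def run_with_flag(code, value="1"):
    """Run a Python snippet in a child process with the flag set to `value`."""
    child_env = dict(os.environ)  # copy, so the parent environment is untouched
    child_env[FLAG] = value
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def flag_seen_by_child(value="1"):
    """Report the flag value that the child process observes."""
    probe = f"import os; print(os.environ.get({FLAG!r}, 'unset'))"
    return run_with_flag(probe, value)
```

The parent process keeps its original environment, so other jobs launched from it still get the default auto-selection.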

Trade-off

Forcing bf16 activations skips the fp4x2/fp8 fast paths, trading throughput for
activation precision. It is intentionally opt-in so only deployments that need
it pay that cost.

Test plan

- Default (flag unset): activation dtype selection is unchanged across per_1x32 SwiGLU separated/interleaved cases (bf16/fp4x2/fp8 as before).
- AITER_MOE_FORCE_BF16_ACT=1: verified the MoE dispatch stays on the bf16 activation path for both decode (small M) and prefill (large M) on gfx950 with an MXFP4 W4A16 model; no fp4x2/fp8 activation path taken.

github-actions · 2026-06-08T04:59:50Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3593 --add-label <label>`.

moe: add AITER_MOE_FORCE_BF16_ACT to force bf16 activations (opt-in)

b0752af

sphinx07 requested a review from a team June 8, 2026 04:59

sphinx07 wants to merge 1 commit into ROCm:main from sphinx07:main
