gpt-oss WAs + moe a8w4 gemm support by ahmed-bsod · Pull Request #1067 · ROCm/ATOM

ahmed-bsod · 2026-06-04T05:26:09Z

No description provided.

Copilot

Pull request overview

This PR introduces new environment-driven workarounds and an opt-in MoE compute path aimed at supporting GPT-OSS MXFP4 MoE with an FP8-activation × MXFP4-weight (a8w4) kernel, plus gfx1250-specific reroutes for known HIP kernel faults.

Changes:

Add env toggles for the GPT-OSS a8w4 MoE path and for the gfx1250 workarounds (RMSNorm and sampler fallbacks).
Route temperature sampling to a torch argmax fallback when ATOM_USE_TORCH_SAMPLER is enabled.
Add an a8w4 MoE execution path (weight prep + forward) and a Triton/Gluon RMSNorm reroute when enabled.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

File	Description
`atom/utils/envs.py`	Adds new env toggles for a8w4 MoE and gfx1250 workaround routing.
`atom/model_ops/sampler.py`	Adds a torch greedy-argmax fallback for temperature sampling under a gfx1250 workaround flag.
`atom/model_ops/moe.py`	Adds an opt-in a8w4 path (weight prep + forward dispatch) for MXFP4 MoE.
`atom/model_ops/layernorm.py`	Adds a Triton/Gluon RMSNorm reroute under a gfx1250 workaround flag.
`atom/model_ops/fused_moe_triton.py`	Implements the a8w4 two-GEMM fused-experts path using AITER’s `moe_gemm_a8w4`.
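
The greedy-argmax fallback described for `atom/model_ops/sampler.py` can be illustrated with a dependency-free sketch. This is not the PR's code: the function name `greedy_sample` is hypothetical, and plain Python lists stand in for the torch tensors the real sampler would use. It relies only on the fact that dividing logits by a positive temperature does not change which index is largest, so a greedy fallback is deterministic regardless of temperature.

```python
from typing import List


def greedy_sample(logits: List[List[float]], temperatures: List[float]) -> List[int]:
    """Return the index of the largest logit in each row.

    Hypothetical stand-in for a torch argmax fallback. `temperatures` is
    accepted only to mirror a temperature-sampling signature; for positive
    temperatures, logits / T has the same argmax as logits, so it is unused.
    """
    # max() with a key returns the first maximal index on ties.
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


batch_logits = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]]
token_ids = greedy_sample(batch_logits, temperatures=[0.7, 1.3])
```

One consequence worth keeping in mind when such a fallback is enabled: output no longer varies between runs for the same logits, because no random draw takes place.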

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

+        if self.use_a8w4 and activation == ActivationType.Swiglu:
+            # gpt-oss MXFP4 MoE via AITER fp8-act x mxfp4-weight gluon kernel.
+            # Gated on SwiGLU: the a8w4 fast-path hardcodes gpt-oss SwiGLU, so
+            # non-SwiGLU layers fall through to the matmul_ogs path below.
+            from atom.model_ops.fused_moe_triton import aiter_a8w4_fused_experts
+
+            return aiter_a8w4_fused_experts(
+                x,
+                layer,
+                router_logits,
+                top_k,
+                renormalize,
+            )
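
The gating in the hunk above can be reduced to a small truth table. The sketch below is a minimal model of that condition only, assuming a two-member stand-in for `ActivationType` (the real enum and its members other than `Swiglu` are not shown in this PR page) and a hypothetical helper `select_moe_path` that returns a label instead of calling a kernel.

```python
from enum import Enum, auto


class ActivationType(Enum):
    # Hypothetical stand-in for the project's enum; only Swiglu appears in the diff.
    Swiglu = auto()
    Gelu = auto()


def select_moe_path(use_a8w4: bool, activation: ActivationType) -> str:
    """Name the path the dispatch condition would select."""
    # Both conditions must hold: the flag is opt-in, and the a8w4 fast path
    # is described as hardcoding SwiGLU.
    if use_a8w4 and activation == ActivationType.Swiglu:
        return "a8w4"
    # Everything else falls through to the existing path.
    return "matmul_ogs"


paths = {
    (flag, act.name): select_moe_path(flag, act)
    for flag in (False, True)
    for act in ActivationType
}
```

Only the combination of the flag being set and a SwiGLU activation selects the a8w4 path; the other three combinations keep the `matmul_ogs` route, which matches the comment in the diff.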
+


+        # Opt-in: use AITER's triton/gluon moe_op_gemm_a8w4 (fp8 act x mxfp4 weight) kernel
+        # for the gpt-oss swiglu MoE instead of triton_kernels matmul_ogs. otherwise stays on matmul_ogs.
+        self.use_a8w4 = envs.ATOM_USE_TRITON_A8W4_MOE



+    # Use Triton/gluon moe_op_gemm_a8w4 (fp8 activation x mxfp4 weight) kernel
+    # for the gpt-oss (swiglu) MXFP4 MoE instead of the triton_kernels matmul_ogs
+    # (bf16 x mxfp4) path. falls back to matmul_ogs when off.
+    "ATOM_USE_TRITON_A8W4_MOE": lambda: os.getenv("ATOM_USE_TRITON_A8W4_MOE", "0")
+    == "1",
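
The entry above compares the raw string against `"1"`, so values such as `true` or `yes` leave the flag off. A minimal sketch of that behavior follows, assuming a registry dict of lazily evaluated lambdas; the dict name `environment_variables` and the helper `read_flag` are hypothetical and are not taken from `atom/utils/envs.py`.

```python
import os

# Hypothetical miniature registry: each value is a lambda, so the
# environment is read at lookup time rather than at import time.
environment_variables = {
    "ATOM_USE_TRITON_A8W4_MOE": lambda: os.getenv("ATOM_USE_TRITON_A8W4_MOE", "0")
    == "1",
}


def read_flag(name: str) -> bool:
    """Evaluate the registered lambda for `name` against the current environment."""
    return environment_variables[name]()


os.environ.pop("ATOM_USE_TRITON_A8W4_MOE", None)
default_value = read_flag("ATOM_USE_TRITON_A8W4_MOE")  # unset: defaults to "0"

os.environ["ATOM_USE_TRITON_A8W4_MOE"] = "1"
enabled_value = read_flag("ATOM_USE_TRITON_A8W4_MOE")  # exact match on "1"

os.environ["ATOM_USE_TRITON_A8W4_MOE"] = "true"
truthy_string_value = read_flag("ATOM_USE_TRITON_A8W4_MOE")  # not "1", so off
```

In practice this means the path is enabled with `ATOM_USE_TRITON_A8W4_MOE=1` and nothing else.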


ahmed-bsod force-pushed the ahmed/gpt-oss-WA-new branch from cbc5e0f to 9a0e5f5 Compare June 4, 2026 19:38

ahmed-bsod marked this pull request as ready for review June 4, 2026 21:05

Copilot AI review requested due to automatic review settings June 4, 2026 21:05

Copilot started reviewing on behalf of ahmed-bsod June 4, 2026 21:05

Copilot AI reviewed Jun 4, 2026

Comment thread atom/model_ops/moe.py Outdated

Comment thread atom/model_ops/moe.py Outdated

Comment thread atom/model_ops/fused_moe_triton.py Outdated

ahmed-bsod added 6 commits June 5, 2026 15:11

WA's needed for gpt-oss accuracy on gfx1250

b408891

add support for triton moe_a8w4

7c85eef

move gen_fake guards to aiter and fix the import names. also don't need gfx1250 check anymore since the aiter wrapper handles it

7e32a3c

add_residual parameter renamed to swiglu_add_residual in aiter need to rename it here as well

c977f06

black format fixes

d056f08

address copilot comments. add an additional condition for ActivationType to be swiglu to use the aiter triton a8w4 path

2208dcb

ahmed-bsod force-pushed the ahmed/gpt-oss-WA-new branch from 9a0e5f5 to 02755eb Compare June 5, 2026 15:12

Copilot AI review requested due to automatic review settings June 5, 2026 15:13

ahmed-bsod force-pushed the ahmed/gpt-oss-WA-new branch from 02755eb to 2208dcb Compare June 5, 2026 15:13

Copilot started reviewing on behalf of ahmed-bsod June 5, 2026 15:13

Copilot AI reviewed Jun 5, 2026

ahmed-bsod wants to merge 6 commits into main from ahmed/gpt-oss-WA-new
