gpt-oss WAs + moe a8w4 gemm support #1067
Open
ahmed-bsod wants to merge 6 commits into
cbc5e0f to 9a0e5f5
Pull request overview
This PR introduces new environment-driven workarounds and an opt-in MoE compute path aimed at supporting GPT-OSS MXFP4 MoE with an FP8-activation × MXFP4-weight (a8w4) kernel, plus gfx1250-specific reroutes for known HIP kernel faults.
Changes:
- Add env toggles for GPT-OSS a8w4 MoE and gfx1250 workarounds (RMSNorm + sampler fallbacks).
- Route temperature sampling to a torch argmax fallback when ATOM_USE_TORCH_SAMPLER is enabled.
- Add an a8w4 MoE execution path (weight prep + forward) and a Triton/Gluon RMSNorm reroute when enabled.
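The sampler fallback described above is a greedy argmax, so its result does not depend on the temperature value. The following is a minimal sketch of that property using plain Python lists instead of torch tensors. The helper names `greedy_argmax` and `scale_by_temperature` are hypothetical and are not taken from the PR, whose sampler code is not shown on this page.

```python
from typing import List, Sequence


def greedy_argmax(logits: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the largest logit in each row (first index wins ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def scale_by_temperature(
    logits: Sequence[Sequence[float]], temperature: float
) -> List[List[float]]:
    """Divide every logit by a positive temperature, as temperature sampling does."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [[value / temperature for value in row] for row in logits]


# Illustrative batch of two rows over a three-token vocabulary.
example_logits = [[0.1, 2.0, -1.0], [3.0, 3.0, 0.5]]
baseline = greedy_argmax(example_logits)
cold = greedy_argmax(scale_by_temperature(example_logits, 0.7))
hot = greedy_argmax(scale_by_temperature(example_logits, 1.5))
```

Dividing by a positive temperature preserves the ordering of the logits, so all three calls select the same token ids. In other words, a greedy fallback is deterministic where temperature sampling would otherwise be stochastic.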
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| atom/utils/envs.py | Adds new env toggles for a8w4 MoE and gfx1250 workaround routing. |
| atom/model_ops/sampler.py | Adds a torch greedy-argmax fallback for temperature sampling under a gfx1250 workaround flag. |
| atom/model_ops/moe.py | Adds an opt-in a8w4 path (weight prep + forward dispatch) for MXFP4 MoE. |
| atom/model_ops/layernorm.py | Adds a Triton/Gluon RMSNorm reroute under a gfx1250 workaround flag. |
| atom/model_ops/fused_moe_triton.py | Implements the a8w4 two-GEMM fused-experts path using AITER's moe_gemm_a8w4. |
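When a normalization kernel is rerouted, as the layernorm.py row describes, a small reference implementation can help check the two paths against each other. The sketch below is the standard RMSNorm formula in pure Python. The function name `rmsnorm_reference` and the default `eps` are assumptions for illustration; the PR's actual Triton/Gluon kernel and its epsilon are not shown on this page.

```python
import math
from typing import List, Sequence


def rmsnorm_reference(
    x: Sequence[float], weight: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Compute y_i = x_i / sqrt(mean(x^2) + eps) * weight_i for one vector."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    if not x:
        raise ValueError("x must not be empty")
    mean_square = sum(value * value for value in x) / len(x)
    inverse_rms = 1.0 / math.sqrt(mean_square + eps)
    return [value * inverse_rms * w for value, w in zip(x, weight)]


# With unit weights and eps = 0 the output has a mean square of exactly 1
# (up to floating-point rounding).
normalized = rmsnorm_reference([3.0, 4.0], [1.0, 1.0], eps=0.0)
```

A comparison against a GPU kernel would normally use a tolerance that reflects the kernel's lower-precision arithmetic rather than exact equality.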
…ed gfx1250 check anymore since the aiter wrapper handles it
…o rename it here as well
…Type to be swiglu to use the aiter triton a8w4 path
9a0e5f5 to 02755eb
02755eb to 2208dcb
Comment on lines +1050 to +1063

    if self.use_a8w4 and activation == ActivationType.Swiglu:
        # gpt-oss MXFP4 MoE via AITER fp8-act x mxfp4-weight gluon kernel.
        # Gated on SwiGLU: the a8w4 fast-path hardcodes gpt-oss SwiGLU, so
        # non-SwiGLU layers fall through to the matmul_ogs path below.
        from atom.model_ops.fused_moe_triton import aiter_a8w4_fused_experts

        return aiter_a8w4_fused_experts(
            x,
            layer,
            router_logits,
            top_k,
            renormalize,
        )
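The gate in the snippet above can be summarized as a small truth table: the a8w4 path is taken only when the toggle is on and the activation is SwiGLU. The sketch below reproduces only that routing decision. The `ActivationType` enum here is a hypothetical stand-in for the real one, and `select_moe_path` is an illustrative helper that returns a path name instead of running a kernel.

```python
from enum import Enum, auto


class ActivationType(Enum):
    # Hypothetical stand-in; the real enum's other members are not shown in the PR.
    Swiglu = auto()
    Other = auto()


def select_moe_path(use_a8w4: bool, activation: ActivationType) -> str:
    """Return the name of the path that the gate in the diff would select."""
    if use_a8w4 and activation == ActivationType.Swiglu:
        return "aiter_a8w4_fused_experts"
    # Every other combination falls through to the existing path.
    return "matmul_ogs"


# Enumerate all four combinations of toggle and activation.
routing = {
    (flag, act.name): select_moe_path(flag, act)
    for flag in (False, True)
    for act in ActivationType
}
```

Only the `(True, "Swiglu")` entry maps to the a8w4 path, which matches the comment in the diff that non-SwiGLU layers fall through to matmul_ogs.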
Comment on lines +765 to 768

    # Opt-in: use AITER's triton/gluon moe_op_gemm_a8w4 (fp8 act x mxfp4 weight) kernel
    # for the gpt-oss swiglu MoE instead of triton_kernels matmul_ogs. otherwise stays on matmul_ogs.
    self.use_a8w4 = envs.ATOM_USE_TRITON_A8W4_MOE
Comment on lines +38 to +42

    # Use Triton/gluon moe_op_gemm_a8w4 (fp8 activation x mxfp4 weight) kernel
    # for the gpt-oss (swiglu) MXFP4 MoE instead of the triton_kernels matmul_ogs
    # (bf16 x mxfp4) path. falls back to matmul_ogs when off.
    "ATOM_USE_TRITON_A8W4_MOE": lambda: os.getenv("ATOM_USE_TRITON_A8W4_MOE", "0")
    == "1",
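The toggle entry above is a lambda, so the environment is read when the lambda is called, and only the exact string "1" enables the path. The sketch below shows that behaviour with a one-entry registry. The `environment_variables` dict and the `get_env` accessor are hypothetical stand-ins for whatever atom/utils/envs.py actually uses; only the `ATOM_USE_TRITON_A8W4_MOE` entry is copied from the diff.

```python
import os
from typing import Any, Callable, Dict

TOGGLE = "ATOM_USE_TRITON_A8W4_MOE"

# Hypothetical registry; the single entry mirrors the lambda in the diff.
environment_variables: Dict[str, Callable[[], Any]] = {
    TOGGLE: lambda: os.getenv(TOGGLE, "0") == "1",
}


def get_env(name: str) -> Any:
    """Evaluate a toggle at call time so later changes to os.environ are visible."""
    try:
        return environment_variables[name]()
    except KeyError:
        raise AttributeError(f"unknown environment toggle: {name}") from None


def probe_values() -> Dict[str, bool]:
    """Record what the toggle evaluates to for several raw values."""
    previous = os.environ.get(TOGGLE)
    results: Dict[str, bool] = {}
    try:
        for raw in (None, "0", "1", "true"):
            if raw is None:
                os.environ.pop(TOGGLE, None)
            else:
                os.environ[TOGGLE] = raw
            results[str(raw)] = get_env(TOGGLE)
    finally:
        # Restore the caller's environment.
        if previous is None:
            os.environ.pop(TOGGLE, None)
        else:
            os.environ[TOGGLE] = previous
    return results


toggle_results = probe_values()
```

Values such as "true" leave the toggle off under this comparison, so enabling the path means exporting `ATOM_USE_TRITON_A8W4_MOE=1` before the process starts.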
Comment on lines +38 to +40

    # Use Triton/gluon moe_op_gemm_a8w4 (fp8 activation x mxfp4 weight) kernel
    # for the gpt-oss (swiglu) MXFP4 MoE instead of the triton_kernels matmul_ogs
    # (bf16 x mxfp4) path. falls back to matmul_ogs when off.
No description provided.