gpt-oss WAs + moe a8w4 gemm support #1067

Open
ahmed-bsod wants to merge 6 commits into main from ahmed/gpt-oss-WA-new

No description provided.

@ahmed-bsod ahmed-bsod force-pushed the ahmed/gpt-oss-WA-new branch from cbc5e0f to 9a0e5f5 Compare June 4, 2026 19:38
@ahmed-bsod ahmed-bsod marked this pull request as ready for review June 4, 2026 21:05
Copilot AI review requested due to automatic review settings June 4, 2026 21:05
Copilot AI left a comment

Pull request overview

This PR introduces new environment-driven workarounds and an opt-in MoE compute path aimed at supporting GPT-OSS MXFP4 MoE with an FP8-activation × MXFP4-weight (a8w4) kernel, plus gfx1250-specific reroutes for known HIP kernel faults.

Changes:

  • Add env toggles for the GPT-OSS a8w4 MoE path and for the gfx1250 workarounds (RMSNorm and sampler fallbacks).
  • Route temperature sampling to a torch argmax fallback when ATOM_USE_TORCH_SAMPLER is enabled.
  • Add an a8w4 MoE execution path (weight prep + forward) and a Triton/Gluon RMSNorm reroute when enabled.
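
The sampler reroute in the second bullet can be sketched roughly as follows. This is a plain-Python illustration rather than the PR's code: `use_torch_sampler`, `greedy_argmax`, `fused_sampler`, and `sample` are hypothetical names, a list-based argmax stands in for the torch call, and parsing `ATOM_USE_TORCH_SAMPLER` as the literal string `"1"` is an assumption that mirrors how `ATOM_USE_TRITON_A8W4_MOE` is read elsewhere in this PR.

```python
import os


def use_torch_sampler() -> bool:
    # Assumption: the flag is enabled only by the literal string "1".
    return os.getenv("ATOM_USE_TORCH_SAMPLER", "0") == "1"


def greedy_argmax(logits):
    # Plain-Python stand-in for an argmax over the vocabulary axis:
    # for each row, return the index of the largest logit.
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def fused_sampler(logits, temperature):
    # Hypothetical placeholder for the default fused sampling kernel.
    raise NotImplementedError("default fused sampler is not modelled here")


def sample(logits, temperature, fused=fused_sampler):
    # When the workaround flag is set, bypass the fused kernel entirely.
    if use_torch_sampler():
        return greedy_argmax(logits)
    return fused(logits, temperature)
```

In this sketch the fallback ignores `temperature`, which is how a greedy argmax behaves; whether the real fallback does the same for non-zero temperatures is not stated in the review summary.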

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| atom/utils/envs.py | Adds new env toggles for a8w4 MoE and gfx1250 workaround routing. |
| atom/model_ops/sampler.py | Adds a torch greedy-argmax fallback for temperature sampling under a gfx1250 workaround flag. |
| atom/model_ops/moe.py | Adds an opt-in a8w4 path (weight prep + forward dispatch) for MXFP4 MoE. |
| atom/model_ops/layernorm.py | Adds a Triton/Gluon RMSNorm reroute under a gfx1250 workaround flag. |
| atom/model_ops/fused_moe_triton.py | Implements the a8w4 two-GEMM fused-experts path using AITER’s moe_gemm_a8w4. |

Comment thread atom/model_ops/moe.py Outdated
Comment thread atom/model_ops/moe.py Outdated
Comment thread atom/model_ops/fused_moe_triton.py Outdated
@ahmed-bsod ahmed-bsod force-pushed the ahmed/gpt-oss-WA-new branch from 9a0e5f5 to 02755eb Compare June 5, 2026 15:12
Copilot AI review requested due to automatic review settings June 5, 2026 15:13
@ahmed-bsod ahmed-bsod force-pushed the ahmed/gpt-oss-WA-new branch from 02755eb to 2208dcb Compare June 5, 2026 15:13
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment thread atom/model_ops/moe.py
Comment on lines +1050 to +1063
if self.use_a8w4 and activation == ActivationType.Swiglu:
    # gpt-oss MXFP4 MoE via AITER fp8-act x mxfp4-weight gluon kernel.
    # Gated on SwiGLU: the a8w4 fast-path hardcodes gpt-oss SwiGLU, so
    # non-SwiGLU layers fall through to the matmul_ogs path below.
    from atom.model_ops.fused_moe_triton import aiter_a8w4_fused_experts

    return aiter_a8w4_fused_experts(
        x,
        layer,
        router_logits,
        top_k,
        renormalize,
    )
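
The gating in this snippet reduces to a two-input decision, which a small sketch can make explicit. The `ActivationType` enum below is a hypothetical stand-in for the project's own (only `Swiglu` appears in the diff; `Gelu` is added purely for illustration), and `select_moe_path` is an invented helper that returns a label instead of launching a kernel.

```python
from enum import Enum, auto


class ActivationType(Enum):
    # Hypothetical stand-in; only Swiglu is named in the quoted diff.
    Swiglu = auto()
    Gelu = auto()


def select_moe_path(use_a8w4: bool, activation: ActivationType) -> str:
    # The a8w4 fast path is taken only when the opt-in flag is set AND the
    # layer uses SwiGLU; every other combination falls through.
    if use_a8w4 and activation == ActivationType.Swiglu:
        return "a8w4"
    return "matmul_ogs"
```

Read this way, enabling the flag on a model with non-SwiGLU MoE layers would leave those layers on the matmul_ogs path rather than raising an error.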

Comment thread atom/model_ops/moe.py
Comment on lines +765 to 768
# Opt-in: use AITER's triton/gluon moe_op_gemm_a8w4 (fp8 act x mxfp4 weight) kernel
# for the gpt-oss swiglu MoE instead of triton_kernels matmul_ogs. Otherwise the layer stays on matmul_ogs.
self.use_a8w4 = envs.ATOM_USE_TRITON_A8W4_MOE

Comment thread atom/utils/envs.py
Comment on lines +38 to +42
# Use Triton/gluon moe_op_gemm_a8w4 (fp8 activation x mxfp4 weight) kernel
# for the gpt-oss (swiglu) MXFP4 MoE instead of the triton_kernels matmul_ogs
# (bf16 x mxfp4) path. Falls back to matmul_ogs when off.
"ATOM_USE_TRITON_A8W4_MOE": lambda: os.getenv("ATOM_USE_TRITON_A8W4_MOE", "0")
== "1",
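
Because the entry stores a lambda rather than a value, the variable is presumably read when it is looked up rather than at import time. A minimal sketch of that pattern follows; the `environment_variables` dict name and the `get_env` helper are assumptions for illustration, not names taken from `atom/utils/envs.py`.

```python
import os

# Hypothetical registry: each entry maps a variable name to a zero-argument
# callable, so the environment is consulted at lookup time.
environment_variables = {
    "ATOM_USE_TRITON_A8W4_MOE": lambda: os.getenv("ATOM_USE_TRITON_A8W4_MOE", "0")
    == "1",
}


def get_env(name: str) -> bool:
    # Evaluate the stored callable on every access.
    return environment_variables[name]()
```

One consequence of comparing against `"1"` is that values such as `true` or `yes` leave the toggle off, so the a8w4 path would be enabled only with `ATOM_USE_TRITON_A8W4_MOE=1`.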
Comment thread atom/utils/envs.py
Comment on lines +38 to +40
# Use Triton/gluon moe_op_gemm_a8w4 (fp8 activation x mxfp4 weight) kernel
# for the gpt-oss (swiglu) MXFP4 MoE instead of the triton_kernels matmul_ogs
# (bf16 x mxfp4) path. Falls back to matmul_ogs when off.