[op_tests] Refactor MoE legacy UT into per-quant smoke sweep #3585
Open
zhiding512 wants to merge 4 commits into
Replace the global CLI-default sweep in test_moe_2stage.py with a QUANT_DEFAULTS table that pins a representative production shape (dim/E/topk/pad/preshuffle/act/strict_accuracy) per quant triple. CLI flags (-dim/-e/-k/-hip/-p) still override the defaults globally when supplied.

- _iter_legacy_cases now drives a single itertools.product loop off the per-quant config instead of per-triple if/elif branches.
- Kernel-forced activations (a16w4 -> Swiglu, a16wi4 -> Silu) are encoded in the table and ignore -a; other quants honor -a.
- strict_accuracy is gated per quant (enabled for the fp4-weight a4w4 / a8w4-mxfp paths, warn-only elsewhere).
- test_fmoe now compares only the real (un-padded) model_dim region, since some kernels leave the padded tail uninitialized/NaN.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
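The table-driven sweep described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the PR: the table keys, the shape values, and the helper name `iter_cases` are hypothetical. Only the overall idea (per-quant defaults, a global override when a CLI flag is supplied, one `itertools.product` loop) comes from the description.

```python
import itertools

# Hypothetical per-quant defaults; the quant names and values are placeholders,
# not the production shapes pinned in the PR.
QUANT_DEFAULTS = {
    "a4w4": {"dim": [(4096, 1024)], "E": [32], "topk": [4], "preshuffle": [False, True]},
    "a8w8": {"dim": [(2048, 512)], "E": [8], "topk": [2], "preshuffle": [True]},
}


def iter_cases(cli_overrides=None):
    """Yield (quant, params) pairs from a single itertools.product loop.

    cli_overrides maps a parameter name to a list of values. A key that is
    present and not None replaces the per-quant default for every quant.
    """
    cli_overrides = cli_overrides or {}
    for quant, defaults in QUANT_DEFAULTS.items():
        # The CLI value wins only when the flag was actually supplied.
        axes = {
            name: cli_overrides[name] if cli_overrides.get(name) is not None else values
            for name, values in defaults.items()
        }
        names = list(axes)
        for combo in itertools.product(*(axes[n] for n in names)):
            yield quant, dict(zip(names, combo))


cases = list(iter_cases())                      # per-quant defaults only
overridden = list(iter_cases({"E": [16, 64]}))  # as if an expert-count flag were passed
```

Keeping every axis as a list, even when it holds a single value, is what lets one product loop replace per-triple branches: a quant that should not vary a parameter simply lists one value for it.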
Pull request overview
This PR refactors the legacy MoE 2-stage op smoke sweep in op_tests/test_moe_2stage.py from a global CLI-default parameter grid into a per-quant configuration table (QUANT_DEFAULTS), aiming to exercise representative production-like shapes per quantization triple while still allowing CLI flags to override defaults.
Changes:
- Introduces `QUANT_DEFAULTS` and rewrites `_iter_legacy_cases()` to generate cases via a unified `itertools.product` loop driven by per-quant defaults.
- Encodes kernel-imposed activation constraints in the per-quant table (e.g., a16w4→Swiglu, a16wi4→Silu) and gates `strict_accuracy` per-quant.
- Updates `test_fmoe` accuracy checking to compare only the unpadded `model_dim` region (avoiding NaNs from uninitialized padded tails).
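As a rough sketch of the second point, the per-quant table can carry both the forced activation and the strictness flag. Everything named here (`QUANT_POLICY`, `resolve_act`, `check_accuracy`, the `"Silu"` fallback, the tolerance handling) is hypothetical and only illustrates the behaviour described in the overview: a forced activation ignores the CLI value, and a failed accuracy check raises only for quants marked strict.

```python
import warnings

# Hypothetical table. A non-None "act" means the kernel forces that activation;
# None means the quant honors the value passed on the command line.
QUANT_POLICY = {
    "a16w4": {"act": "Swiglu", "strict_accuracy": False},
    "a16wi4": {"act": "Silu", "strict_accuracy": False},
    "a4w4": {"act": None, "strict_accuracy": True},
}


def resolve_act(quant, cli_act, fallback="Silu"):
    """Pick the activation for one quant (the fallback value is an assumption)."""
    forced = QUANT_POLICY[quant]["act"]
    if forced is not None:
        return forced  # kernel-imposed, the CLI value is ignored
    return cli_act if cli_act is not None else fallback


def check_accuracy(quant, max_err, tol):
    """Return True on pass; on failure raise if strict, otherwise only warn."""
    if max_err <= tol:
        return True
    msg = f"{quant}: max error {max_err} exceeds tolerance {tol}"
    if QUANT_POLICY[quant]["strict_accuracy"]:
        raise AssertionError(msg)
    warnings.warn(msg)
    return False
```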
Comment on lines +372 to +374

    real_model_dim = model_dim - hidden_pad
    out2_ref = out2_ref[:, :real_model_dim]
    out2_ck = out2_ck[:, :real_model_dim]
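To see why the slice matters, here is a small standalone sketch that uses plain Python lists as a stand-in for the test's tensors. The helper names and the data are made up for illustration. A NaN in the padded tail poisons a whole-row comparison, while comparing only the first `model_dim - hidden_pad` columns reports the real error.

```python
import math


def unpadded(rows, model_dim, hidden_pad):
    """Keep only the first (model_dim - hidden_pad) columns of each row."""
    real_model_dim = model_dim - hidden_pad
    return [row[:real_model_dim] for row in rows]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference; a NaN anywhere yields NaN."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return float("nan") if any(math.isnan(d) for d in diffs) else max(diffs)


nan = float("nan")
# model_dim = 4 with hidden_pad = 2: the last two columns are padding.
out_ref = [[1.0, 2.0, 0.0, 0.0], [3.0, 4.0, 0.0, 0.0]]
out_ck = [[1.0, 2.0, nan, nan], [3.0, 4.5, nan, 7.0]]

full = max_abs_diff(out_ref, out_ck)  # polluted by the padded tail
real = max_abs_diff(unpadded(out_ref, 4, 2), unpadded(out_ck, 4, 2))
```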
Comment on lines +533 to +535

    help="""Whether to use pre-shuffle weight mode. If unset, each quant uses
    its per-quant default (only a4w4 varies preshuffle; others require shuffled
    weights for correctness).
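The "if unset" behaviour in this help text is commonly implemented with `default=None` in `argparse`, so that a flag that was never passed can be told apart from an explicit false. The sketch below assumes that approach. The long option name `--preshuffle` and the helpers `str2bool` and `resolve_preshuffle` are hypothetical and not taken from the PR.

```python
import argparse


def str2bool(text):
    """Parse common true/false spellings for a command-line flag."""
    value = text.lower()
    if value in ("1", "true", "t", "yes", "y"):
        return True
    if value in ("0", "false", "f", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    parser = argparse.ArgumentParser()
    # default=None distinguishes "flag not given" from an explicit false.
    parser.add_argument("-p", "--preshuffle", type=str2bool, default=None)
    return parser


def resolve_preshuffle(cli_value, quant_default):
    """Return the list of preshuffle values to sweep for one quant."""
    return quant_default if cli_value is None else [cli_value]


args_unset = build_parser().parse_args([])
args_set = build_parser().parse_args(["-p", "false"])
```

With this shape, a quant whose default is a two-value list keeps sweeping both values when `-p` is omitted, and every quant collapses to the single supplied value when it is given.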