support modelopt W4A16 NVFP4 export for grouped MoE weights by HollowMan6 · Pull Request #4566 · NVIDIA-NeMo/Megatron-Bridge

HollowMan6 · 2026-06-29T02:47:18Z

What does this PR do?

Support ModelOpt W4A16 NVFP4 export for grouped MoE weights.
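For readers unfamiliar with the format, here is a minimal pure-Python sketch of weight-only NVFP4 quantization with two-level scaling. It is not code from this PR. It assumes the commonly described NVFP4 layout: E2M1 4-bit values, 16-element blocks, one per-block scale (the role played by `weight_scale`) and one per-tensor scale (the role played by `weight_scale_2`). The function names are hypothetical, and the sketch keeps the block scale as a Python float instead of rounding it to FP8 E4M3 as a real export would.

```python
# Illustrative sketch only; names and structure are assumptions, not the PR's code.
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # representable FP4 magnitudes
BLOCK_SIZE = 16     # elements sharing one block scale
E2M1_MAX = 6.0      # largest FP4 (E2M1) magnitude
E4M3_MAX = 448.0    # largest FP8 (E4M3) magnitude, bounds the block scale


def round_to_e2m1(x):
    """Round x to the nearest signed E2M1 value."""
    mag = min(E2M1_VALUES, key=lambda v: abs(v - abs(x)))
    return -mag if x < 0 else mag


def quantize_nvfp4(weights):
    """Return (codes, block_scales, scale_2) for a flat list of floats."""
    assert len(weights) % BLOCK_SIZE == 0, "pad to a multiple of the block size"
    amax = max(abs(w) for w in weights)
    # Per-tensor scale, chosen so every block scale fits in the E4M3 range.
    scale_2 = amax / (E2M1_MAX * E4M3_MAX) if amax > 0 else 1.0
    codes, block_scales = [], []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        block_amax = max(abs(w) for w in block)
        # A real export would round this value to FP8 E4M3; the sketch does not.
        block_scale = block_amax / (E2M1_MAX * scale_2)
        block_scales.append(block_scale)
        step = block_scale * scale_2
        codes.extend(round_to_e2m1(w / step) if step > 0 else 0.0 for w in block)
    return codes, block_scales, scale_2


def dequantize_nvfp4(codes, block_scales, scale_2):
    """Reconstruct approximate weights from codes and the two scale levels."""
    return [c * block_scales[i // BLOCK_SIZE] * scale_2 for i, c in enumerate(codes)]
```

In a W4A16 setup only the weights go through this path; activations stay in 16-bit, so the consumer dequantizes the weights (or uses a kernel that does so on the fly) before the matmul.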

Changelog

Extend AutoBridge.export_hf_weights_modelopt to support quant_mode="w4a16_nvfp4" (NVIDIA/Model-Optimizer@a451a2b).
Collect ModelOpt quant metadata from the actual quantized weight/module, including NVFP4 weight_scale_2.
Map Megatron ModelOpt metadata onto exported HF names before quantization.
Support grouped MoE expert exports by syncing EP metadata, stacking per-expert metadata, and emitting vLLM ModelOpt fused-MoE tensor names:
- *.experts.w13_weight
- *.experts.w13_weight_scale
- *.experts.w13_weight_scale_2
- *.experts.w2_weight
- *.experts.w2_weight_scale
- *.experts.w2_weight_scale_2
Keep ignored weights unquantized and skip quantizer tensors during export.
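The fused-MoE naming in the list above can be illustrated with a small name-mapping sketch. It is not the PR's implementation. The per-expert input names (`...experts.<idx>.gate_proj.weight` and so on), the gate-then-up ordering inside `w13`, and both function names are assumptions chosen for illustration; only the six fused output suffixes come from the changelog.

```python
# Illustrative sketch only; input naming and ordering are assumptions.
import re
from collections import defaultdict

# Hypothetical per-expert HF-style name, e.g.
# "model.layers.0.mlp.experts.3.up_proj.weight_scale_2"
_EXPERT_KEY = re.compile(
    r"^(?P<prefix>.*\.experts)\.(?P<idx>\d+)\."
    r"(?P<proj>gate_proj|up_proj|down_proj)\."
    r"(?P<kind>weight|weight_scale|weight_scale_2)$"
)
_FUSED_GROUP = {"gate_proj": "w13", "up_proj": "w13", "down_proj": "w2"}
_SHARD_ORDER = {"gate_proj": 0, "up_proj": 1, "down_proj": 0}


def fused_moe_name(name):
    """Map a per-expert tensor name to (fused_name, expert_index, projection).

    Returns None for tensors that are not per-expert MoE projections.
    """
    m = _EXPERT_KEY.match(name)
    if m is None:
        return None
    fused = f"{m['prefix']}.{_FUSED_GROUP[m['proj']]}_{m['kind']}"
    return fused, int(m["idx"]), m["proj"]


def plan_fused_tensors(names):
    """Group source names by fused tensor, ordered by expert then shard."""
    plan = defaultdict(list)
    for name in names:
        parsed = fused_moe_name(name)
        if parsed is not None:
            fused, idx, proj = parsed
            plan[fused].append((idx, _SHARD_ORDER[proj], name))
    return {fused: [n for _, _, n in sorted(srcs)] for fused, srcs in plan.items()}
```

Sorting by expert index before stacking matters under expert parallelism, where each rank holds only a subset of experts and the gathered entries may arrive out of order.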

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

Related to # (issue)

- Extend AutoBridge.export_hf_weights_modelopt to support quant_mode="w4a16_nvfp4".
- Collect ModelOpt quant metadata from the actual quantized weight/module, including NVFP4 weight_scale_2.
- Map Megatron ModelOpt metadata onto exported HF names before quantization.
- Support grouped MoE expert exports by syncing EP metadata, stacking per-expert metadata, and emitting vLLM ModelOpt fused-MoE tensor names:
  - *.experts.w13_weight
  - *.experts.w13_weight_scale
  - *.experts.w13_weight_scale_2
  - *.experts.w2_weight
  - *.experts.w2_weight_scale
  - *.experts.w2_weight_scale_2
- Keep ignored weights unquantized and skip quantizer tensors during export.

Signed-off-by: Hollow Man <hollowman@opensuse.org>

HollowMan6 mentioned this pull request Jun 29, 2026

feat(modelopt): support real NVFP4 QAT rollout for MoE and Mamba NVIDIA-NeMo/RL#2983 (Draft, 4 tasks)

HollowMan6 requested a review from mxinO June 29, 2026 03:01

HollowMan6 wants to merge 1 commit into NVIDIA-NeMo:main from HollowMan6:modelopt
