feat(modelopt): support real NVFP4 QAT rollout for MoE and Mamba by HollowMan6 · Pull Request #2983 · NVIDIA-NeMo/RL

HollowMan6 · 2026-06-29T03:01:17Z

Waiting for NVIDIA-NeMo/Megatron-Bridge#4566

What does this PR do?

Add ModelOpt NVFP4 real-quant vLLM reload support for W4A16 fused-MoE rollout weights.
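Some background on the format: NVFP4 stores each weight as a 4-bit E2M1 value and shares one scale across each block of 16 consecutive elements. "W4A16" means only the weights are quantized to 4 bits, while activations stay in 16-bit. Below is a simplified, pure-Python sketch of that weight-side quantize/dequantize round trip. It is not ModelOpt's or vLLM's kernel. As a labeled simplification, it keeps the per-block scale as a Python float instead of the FP8 E4M3 block scale plus FP32 per-tensor scale that NVFP4 actually uses.

```python
import math

# Magnitudes representable by FP4 E2M1 (the sign is handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive elements


def quantize_block(block):
    """Return (scale, codes) for one block; codes are signed E2M1 values."""
    amax = max(abs(x) for x in block)
    # Simplification: real NVFP4 stores this scale in FP8 E4M3 together with
    # a per-tensor FP32 scale; here it stays a plain Python float.
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        # Round |x| / scale to the nearest representable E2M1 magnitude.
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(math.copysign(mag, x))
    return scale, codes


def fake_quant_weights(weights):
    """Quantize, then dequantize, a flat list of weights block by block."""
    out = []
    for start in range(0, len(weights), BLOCK_SIZE):
        scale, codes = quantize_block(weights[start:start + BLOCK_SIZE])
        out.extend(c * scale for c in codes)
    return out
```

In this example, a value such as 0.1 in a block whose largest magnitude is 1.0 rounds to 0.5 / 6 ≈ 0.083 after the round trip. That rounding error is what QAT trains the policy to tolerate.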

Issues

List issues that this PR closes:

Resolves #1750 (Support QAT for low-precision RL)

Summary

- Add a Nano3-specific real-quant ignore profile so that Mamba, attention, gates/routers, shared experts, norms, and selected sensitive layers stay in BF16.
- Wire Megatron real-quant refit through export_hf_weights_modelopt(..., quant_mode="w4a16_nvfp4").
- Add a Nano3 W4A16 real-quant recipe and a nightly test entry.
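An ignore profile can be thought of as a list of module-name patterns. Modules that match stay in BF16, and everything else is exported as NVFP4. The sketch below illustrates that partitioning using Python's glob-style fnmatch. This is an assumption: the PR does not say how NANO3_NVFP4_IGNORE entries are matched. The patterns and module names are hypothetical placeholders, not the profile's real contents, and the sketch omits the "selected sensitive layers" that the profile also keeps in BF16.

```python
from fnmatch import fnmatch

# Hypothetical patterns standing in for NANO3_NVFP4_IGNORE; the real
# profile's entries and matching rules may differ.
EXAMPLE_IGNORE = [
    "*.mamba.*",          # Mamba mixer blocks
    "*.self_attn.*",      # attention projections
    "*.router",           # MoE gates/routers
    "*.shared_expert.*",  # shared experts
    "*norm*",             # normalization layers
]


def keeps_bf16(module_name, ignore_patterns=EXAMPLE_IGNORE):
    """Return True if the module should be left unquantized (BF16)."""
    return any(fnmatch(module_name, p) for p in ignore_patterns)


def split_for_export(module_names, ignore_patterns=EXAMPLE_IGNORE):
    """Partition module names into (nvfp4, bf16) lists, preserving order."""
    nvfp4, bf16 = [], []
    for name in module_names:
        (bf16 if keeps_bf16(name, ignore_patterns) else nvfp4).append(name)
    return nvfp4, bf16
```

With these placeholder patterns, only the routed-expert weights (for example, a hypothetical model.layers.1.mlp.experts.3.up_proj) end up on the NVFP4 side. This matches the PR's intent of quantizing the fused-MoE weights while leaving the rest of the model in BF16.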

Usage

Run the Nano3 W4A16 real-quant rollout recipe directly:

```bash
uv run --no-sync examples/run_grpo.py \
  --config examples/configs/recipes/llm/grpo-nanov3-30ba3b-4n4g-megatron-qa-nvfp4-w4a16-real.yaml
```

Or enable real-quant rollout on an existing Megatron + vLLM recipe:

```yaml
policy:
  quant_cfg: examples/modelopt/quant_configs/nano3_nvfp4_weightonly.yaml
  generation:
    backend: vllm
    quant_cfg: examples/modelopt/quant_configs/nano3_nvfp4_weightonly.yaml
    real_quant: true
    real_quant_ignore: NANO3_NVFP4_IGNORE
    vllm_cfg:
      gpu_memory_utilization: 0.35
      enable_prefix_caching: false
```
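The fragment above only needs to name the keys it changes. One way to picture how it applies to an existing recipe is a recursive dictionary merge. The sketch below assumes the recipe is a plain nested dict. deep_merge is a hypothetical helper, not NeMo RL's actual config loader.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with `override` recursively layered onto `base`."""
    merged = copy.deepcopy(base)  # never mutate the original recipe
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


QUANT_CFG = "examples/modelopt/quant_configs/nano3_nvfp4_weightonly.yaml"

# The real-quant fragment from this PR description, written as a Python dict.
REAL_QUANT_OVERRIDE = {
    "policy": {
        "quant_cfg": QUANT_CFG,
        "generation": {
            "backend": "vllm",
            "quant_cfg": QUANT_CFG,
            "real_quant": True,
            "real_quant_ignore": "NANO3_NVFP4_IGNORE",
            "vllm_cfg": {
                "gpu_memory_utilization": 0.35,
                "enable_prefix_caching": False,
            },
        },
    },
}
```

Under this model, keys that the fragment does not mention keep their recipe values, while overlapping keys such as gpu_memory_utilization take the fragment's value.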

Before your PR is "Ready for review"

Pre-checks:

- Make sure you read and followed the Contributor guidelines.
- Did you write any new necessary tests?
- Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
- Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build, and test the docs.

Additional Information

Signed-off-by: Hollow Man <hollowman@opensuse.org>

copy-pr-bot · 2026-06-29T03:01:21Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

feat(modelopt): support real NVFP4 QAT rollout for MoE and Mamba

3f4e3d8

Signed-off-by: Hollow Man <hollowman@opensuse.org>

HollowMan6 requested a review from mxinO June 29, 2026 03:01

github-actions bot added the Documentation label Jun 29, 2026

HollowMan6 added the CI:L1 and Feature labels Jun 29, 2026

HollowMan6 wants to merge 1 commit into NVIDIA-NeMo:main from HollowMan6:real_quant