feat(modelopt): support real NVFP4 QAT rollout for MoE and Mamba #2983

Draft
HollowMan6 wants to merge 1 commit into NVIDIA-NeMo:main from HollowMan6:real_quant

@HollowMan6

Member

Waiting for NVIDIA-NeMo/Megatron-Bridge#4566

What does this PR do?

Add ModelOpt NVFP4 real-quant vLLM reload support for W4A16 fused-MoE rollout weights.
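
For context, W4A16 keeps weights in 4-bit and activations in 16-bit. The sketch below illustrates block-scaled FP4 rounding under simplifying assumptions: it uses the E2M1 value grid and 16-element blocks commonly described for NVFP4, but it keeps the block scale as a plain Python float rather than an FP8 value and omits the per-tensor scale. The function names are hypothetical, and this is not the ModelOpt or vLLM implementation.

```python
# Magnitudes representable by a 4-bit E2M1 float (the sign is stored separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumed block size for this sketch


def quantize_block(block):
    """Round one block of floats to the E2M1 grid with a shared scale.

    Simplification: the scale stays a Python float; a real format would
    also quantize the scale itself.
    """
    amax = max(abs(x) for x in block)
    scale = amax / E2M1_MAGNITUDES[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        # Pick the nearest representable magnitude, then restore the sign.
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate float values from a scale and E2M1 codes."""
    return [scale * c for c in codes]


example = [6.0, -3.0, 0.5, 0.0] + [0.0] * (BLOCK_SIZE - 4)
example_scale, example_codes = quantize_block(example)
example_restored = dequantize_block(example_scale, example_codes)
```

In a "fake-quant" setup the dequantized values are used directly in 16-bit math; a "real-quant" rollout instead stores the packed codes and scales, which is what the reload path in this PR targets.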

Issues

List issues that this PR closes (syntax):

Summary

  • Add a Nano3-specific real-quant ignore profile so that Mamba, attention, gates/routers, shared experts, norms, and selected sensitive layers stay in BF16.
  • Wire Megatron real-quant refit through export_hf_weights_modelopt(..., quant_mode="w4a16_nvfp4").
  • Add a Nano3 W4A16 real-quant recipe and a nightly test entry.
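
As a rough illustration of how an ignore profile can keep selected modules in BF16, the sketch below filters parameter names with glob patterns. The patterns, the parameter names, and the helper `should_quantize` are all hypothetical; the actual contents of the Nano3 profile are not shown in this description.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration only; not the real Nano3 profile.
EXAMPLE_IGNORE_PATTERNS = [
    "*.mamba.*",           # Mamba mixer layers
    "*.self_attn.*",       # attention projections
    "*.router.*",          # MoE gates/routers
    "*.shared_experts.*",  # shared experts
    "*norm*",              # normalization layers
]


def should_quantize(param_name, ignore_patterns=EXAMPLE_IGNORE_PATTERNS):
    """Return True if no ignore pattern matches, i.e. the weight may go to FP4."""
    return not any(fnmatchcase(param_name, p) for p in ignore_patterns)


# Hypothetical parameter names.
names = [
    "layers.3.experts.7.up_proj.weight",
    "layers.3.shared_experts.up_proj.weight",
    "layers.3.self_attn.q_proj.weight",
    "layers.3.input_layernorm.weight",
]
quantized = [n for n in names if should_quantize(n)]
```

With these example patterns only the routed-expert weight remains eligible for quantization; everything else stays in its original precision.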

Usage

Run the Nano3 W4A16 real-quant rollout recipe directly:

uv run --no-sync examples/run_grpo.py \
  --config examples/configs/recipes/llm/grpo-nanov3-30ba3b-4n4g-megatron-qa-nvfp4-w4a16-real.yaml

Or enable real-quant rollout on an existing Megatron + vLLM recipe:

policy:
  quant_cfg: examples/modelopt/quant_configs/nano3_nvfp4_weightonly.yaml
  generation:
    backend: vllm
    quant_cfg: examples/modelopt/quant_configs/nano3_nvfp4_weightonly.yaml
    real_quant: true
    real_quant_ignore: NANO3_NVFP4_IGNORE
    vllm_cfg:
      gpu_memory_utilization: 0.35
      enable_prefix_caching: false
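
A small pre-flight check can catch mismatched settings before launching a run. The sketch below works on a plain dict that mirrors the YAML fragment above. The rules it enforces are assumptions inferred from the example (vLLM backend, identical quant_cfg on both sides, a non-empty real_quant_ignore), not constraints documented by this PR, and `check_real_quant_config` is a hypothetical helper.

```python
def check_real_quant_config(policy):
    """Return a list of problems found in a policy config dict.

    Assumed rules (not confirmed by the PR): real_quant needs the vLLM
    backend, matching quant_cfg paths, and an ignore profile name.
    """
    problems = []
    generation = policy.get("generation", {})
    if not generation.get("real_quant", False):
        return problems  # nothing to check when real-quant rollout is off
    if generation.get("backend") != "vllm":
        problems.append("real_quant is set but generation.backend is not vllm")
    if generation.get("quant_cfg") != policy.get("quant_cfg"):
        problems.append("policy.quant_cfg and generation.quant_cfg differ")
    if not generation.get("real_quant_ignore"):
        problems.append("real_quant_ignore is missing")
    return problems


# Dict mirroring the YAML fragment above.
QUANT_CFG = "examples/modelopt/quant_configs/nano3_nvfp4_weightonly.yaml"
example_policy = {
    "quant_cfg": QUANT_CFG,
    "generation": {
        "backend": "vllm",
        "quant_cfg": QUANT_CFG,
        "real_quant": True,
        "real_quant_ignore": "NANO3_NVFP4_IGNORE",
        "vllm_cfg": {"gpu_memory_utilization": 0.35, "enable_prefix_caching": False},
    },
}
example_problems = check_real_quant_config(example_policy)
```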

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information


Signed-off-by: Hollow Man <hollowman@opensuse.org>
@copy-pr-bot

copy-pr-bot Bot commented Jun 29, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@HollowMan6 HollowMan6 requested a review from mxinO June 29, 2026 03:01
@github-actions github-actions Bot added the Documentation Improvements or additions to documentation label Jun 29, 2026
@HollowMan6 HollowMan6 added CI:L1 Run doctests, unit tests, and functional tests Feature labels Jun 29, 2026

Development

Successfully merging this pull request may close these issues.

Support QAT for low precision RL

1 participant