Skip to content

feat: R3 gym notq router replay#2915

Open
zyzhou5 wants to merge 4 commits into
NVIDIA-NeMo:mainfrom
zyzhou5:r3-gym-notq-router-replay
Open

feat: R3 gym notq router replay#2915
zyzhou5 wants to merge 4 commits into
NVIDIA-NeMo:mainfrom
zyzhou5:r3-gym-notq-router-replay

Conversation

@zyzhou5

@zyzhou5 zyzhou5 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

What does this PR do ?

Adds no-TQ async Gym path support for R3 router replay.

When policy.router_replay.enabled=true, async Gym rollouts now request routed expert indices from Gym, validate that they are present, and convert them into Nemo-RL message logs. The routed expert indices are sliced across prompt and generation tokens for each Gym model call so the existing R3 training path can consume them.

This also updates the async OpenAI-compatible vLLM endpoint to return routed expert indices in chat response messages when requested.

@zyzhou5 zyzhou5 requested review from a team as code owners June 24, 2026 19:11
@copy-pr-bot

copy-pr-bot Bot commented Jun 24, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@zyzhou5 zyzhou5 added the CI:L1 Run doctests, unit tests, and functional tests label Jun 24, 2026
@zyzhou5 zyzhou5 changed the title R3 gym notq router replay feat: R3 gym notq router replay Jun 24, 2026
@zyzhou5

zyzhou5 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 763fb26

@zyzhou5

zyzhou5 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test c289b97

@zyzhou5

zyzhou5 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test df70c93

@zyzhou5

zyzhou5 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test f594c2d

Comment thread nemo_rl/models/generation/vllm/vllm_worker_async.py
Comment thread nemo_rl/models/generation/vllm/utils.py
ZhiyuLi-Nvidia
ZhiyuLi-Nvidia previously approved these changes Jun 25, 2026

@ZhiyuLi-Nvidia ZhiyuLi-Nvidia left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2 nit.
LGTM.

@zyzhou5

zyzhou5 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 3879443

@zyzhou5 zyzhou5 added the r0.7.0 label Jun 26, 2026
@zyzhou5 zyzhou5 requested a review from a team as a code owner June 26, 2026 20:00
@zyzhou5

zyzhou5 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test f15074c

@github-actions

Copy link
Copy Markdown

✅ Submodule Fast-Forward Check Results

Check based on commit: f15074c (PR #2915 from r3-gym-notq-router-replay)

✅ Submodules that are properly updated:

Gym: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@zyzhou5 zyzhou5 force-pushed the r3-gym-notq-router-replay branch from f15074c to 338f8fc Compare June 26, 2026 20:06
@zyzhou5

zyzhou5 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 338f8fc

@github-actions

Copy link
Copy Markdown

✅ Submodule Fast-Forward Check Results

Check based on commit: 338f8fc (PR #2915 from r3-gym-notq-router-replay)

✅ Submodules that are properly updated:

Gym: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@zyzhou5

zyzhou5 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 20f7389

@github-actions

Copy link
Copy Markdown

✅ Submodule Fast-Forward Check Results

Check based on commit: 20f7389 (PR #2915 from r3-gym-notq-router-replay)

✅ Submodules that are properly updated:

Gym: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@zyzhou5

zyzhou5 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test fe5f434

@github-actions

Copy link
Copy Markdown

✅ Submodule Fast-Forward Check Results

Check based on commit: fe5f434 (PR #2915 from r3-gym-notq-router-replay)

✅ Submodules that are properly updated:

Gym: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

Copy link
Copy Markdown

✅ Submodule Fast-Forward Check Results

Check based on commit: 2ff1ffc (PR #2915 from r3-gym-notq-router-replay)

✅ Submodules that are properly updated:

Gym: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@zyzhou5

zyzhou5 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test 2ff1ffc

zyzhou5 added 4 commits June 29, 2026 09:47
Signed-off-by: Zeyu Zhou <zezhou@nvidia.com>
Signed-off-by: Zeyu Zhou <zezhou@nvidia.com>
Signed-off-by: Zeyu Zhou <zezhou@nvidia.com>
Signed-off-by: Zeyu Zhou <zezhou@nvidia.com>
@zyzhou5 zyzhou5 added CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) and removed CI:L1 Run doctests, unit tests, and functional tests labels Jun 29, 2026
@zyzhou5 zyzhou5 force-pushed the r3-gym-notq-router-replay branch from 2ff1ffc to c25a0bf Compare June 29, 2026 17:00
@zyzhou5

zyzhou5 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

/ok to test c25a0bf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) r0.7.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants