Fix eos issues in sglang / vllm samplers during on-policy rollout#1148

Merged
copybara-service[bot] merged 2 commits into google:main from precur-ai:fix_eos_issue_sgl_vllm
Feb 25, 2026

Conversation

@yixinw (Collaborator) commented Feb 25, 2026

The existing implementation that uses the sglang/vllm samplers during rollout mishandles EOS tokens (e.g., <|im_end|> in Qwen models), which can cause problems during training. This PR addresses this by removing the unnecessary (and sometimes wrong) EOS tokens from the input, making RL more stable and effective.
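To illustrate the idea of dropping stray EOS tokens from the input, here is a minimal Python sketch. It is not the code from this PR: the helper name `strip_trailing_eos`, the choice to strip only trailing tokens, and the token ids are all assumptions made for illustration. The actual change may remove EOS tokens in a different place or by a different rule.

```python
from typing import Iterable, List


def strip_trailing_eos(token_ids: List[int], eos_ids: Iterable[int]) -> List[int]:
    """Return a copy of token_ids with any trailing EOS ids removed.

    Hypothetical helper: only EOS ids at the end of the sequence are
    dropped; EOS ids in the middle (e.g. closing earlier turns) are kept.
    """
    eos = set(eos_ids)
    end = len(token_ids)
    # Walk back from the end while the last kept token is an EOS id.
    while end > 0 and token_ids[end - 1] in eos:
        end -= 1
    return token_ids[:end]


# Placeholder ids: 2 stands in for an EOS id such as <|im_end|>.
EOS_ID = 2
prompt_ids = [5, 6, EOS_ID, 7, EOS_ID, EOS_ID]
print(strip_trailing_eos(prompt_ids, [EOS_ID]))  # prints [5, 6, 2, 7]
```

In a setup like this, the cleaned ids would be what gets passed to the sampler, so generation does not start from a prompt that already looks finished.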

On one benchmark we experimented with, this change led to a significant improvement in learning efficiency and quality.

[Image: 353071771984433_ pic]

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and all unit tests pass.
  • I have added all appropriate doc-strings/documentation.
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have signed the Contributor License Agreement.
  • I have followed Contribution Guidelines.

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@tianshub (Collaborator) left a comment

ah wow! thanks for the fix Bethany, this is a huge improvement.

@copybara-service copybara-service bot merged commit 7a1ae80 into google:main Feb 25, 2026
9 of 10 checks passed