[vLLM-ATOM] Enable DBO for vLLM plugin #1103

Draft
kliuae wants to merge 17 commits into ROCm:main from kliuae:kliuae/plugin_enable_dbo_merge_merge

Contributor

@kliuae kliuae commented Jun 5, 2026

Motivation

This PR enables DBO for vLLM-ATOM. It depends on DP+EP enablement and currently includes the changes from that enablement PR.

Technical Details

Test Plan

Test Result

deepseek-ai/DeepSeek-R1-0528 DP8+EP+DBO

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.953 | 0.0058 |
| | | strict-match | 5 | exact_match | 0.950 | 0.0060 |

openai/gpt-oss-120b DP2+EP+DBO

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.4723 | 0.0138 |
| | | strict-match | 5 | exact_match | 0.3237 | 0.0129 |

Submission Checklist

zejunchen-zejun and others added 15 commits May 28, 2026 17:38
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
@zejunchen-zejun zejunchen-zejun requested a review from gbyu-amd June 5, 2026 12:18
"""Relax vLLM's DeepEP-only gate so ATOM plugin mode can run DBO over mori.

vLLM has a hard-assert that when DBO is enabled, the all2all backend is one
of the two DeepEP backends (deepep_low_latency or deepep_high_throughput)
Collaborator

Does this hard assert still exist in vLLM 0.22.0?

Contributor Author

Yes, this assert still exists in vLLM v0.22.0.

        _orig_post_init(self)
    finally:
        if spoofed:
            pc.all2all_backend = restore_backend
Collaborator

@zejunchen-zejun zejunchen-zejun Jun 5, 2026

We restore the all2all backend so that vLLM does not create another mori all2all manager, which would leave the system with two mori all2all managers. That said, this should not be able to happen in the first place: specifying --all2all-backend=mori --enable-dbo should raise an unsupported error, because vLLM does not support mori with DBO for now.

Could you add a recipe for atom-vllm DBO usage? Users may specify the mori all2all backend when launching the vLLM server and then hit an error. Which arguments should an atom-vllm frontend user specify?

Contributor Author

Actually, restore_backend here is set to fall back to AgRs on the vLLM side in all cases. So when users explicitly specify mori as all2all_backend, vLLM does not construct another mori a2a manager. Effectively, any all2all_backend specified at the vLLM frontend is swapped out for the lightweight AgRs. But sure, I can add a recipe to spell this out.

@zejunchen-zejun
Collaborator

Thank you for helping enable DBO.
Could you help with the following:

  • Check the DBO perf gain at high concurrency; you can use the GPT-OSS model.
  • Add DeepSeek-V3.2 DBO to the atom-vllm nightly accuracy check.

kliuae added 2 commits June 8, 2026 13:31
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

2 participants