[AMD] improve dsr1 fp4 disagg perf on mi355x #983

Open
billishyahao wants to merge 15 commits into main from amd/mi355x-dsfp4-march30
Collaborator

@billishyahao billishyahao commented Mar 31, 2026

This patch adds the following optimizations:

  • "Bump SGL mori image to March 27"
  • "Add more low latency sweep configs"
  • "Enable v2 mxfp4 DSR1 0528 model"
  • "Enable fp4 disp feature on mori"

billishyahao and others added 12 commits March 16, 2026 08:36
…transformers v5

Transformers v5 incorrectly rebuilds pre_tokenizer/decoder components for
models like DeepSeek-R1 that use LlamaTokenizerFast with a non-Llama
tokenizer architecture. The sglang server fixes this at startup, but the
benchmark client loads the tokenizer without these fixes, causing a ~5x
token count inflation (e.g. 7000 tokens -> 35000 tokens) and false
performance regressions in TTFT and throughput benchmarks.

Apply the same tokenizer fixes (pre_tokenizer/decoder restoration and
add_bos_token recovery) that sglang server applies, so client and server
tokenize identically. No-op on transformers v4.

Made-with: Cursor
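
The inflation described in the commit message above can be caught with a simple ratio check before trusting TTFT or throughput numbers. The following is a minimal sketch, not code from this PR: the helper names `token_inflation_ratio` and `tokenizers_agree` and the 1% tolerance are assumptions made for illustration, and the counts are the ones quoted in the commit message.

```python
def token_inflation_ratio(client_tokens: int, server_tokens: int) -> float:
    """Ratio of client-side to server-side token count for the same prompt.

    Hypothetical helper: a ratio near 1.0 means both sides tokenize alike.
    """
    if server_tokens <= 0:
        raise ValueError("server_tokens must be positive")
    return client_tokens / server_tokens


def tokenizers_agree(client_tokens: int, server_tokens: int,
                     tolerance: float = 0.01) -> bool:
    """True when the counts differ by at most `tolerance` (relative).

    The 1% default is an assumed threshold, not a value from the PR.
    """
    ratio = token_inflation_ratio(client_tokens, server_tokens)
    return abs(ratio - 1.0) <= tolerance


# Figures quoted in the commit message: 7000 tokens inflated to 35000.
print(token_inflation_ratio(35000, 7000))  # 5.0
print(tokenizers_agree(35000, 7000))       # False
print(tokenizers_agree(7000, 7000))        # True
```

A benchmark client could run a check like this on one known prompt at startup and abort if the counts disagree, so a tokenizer mismatch is not mistaken for a performance regression.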
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you


  dsr1-fp8-mi355x-sglang-disagg:
-   image: rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0227-2
+   image: rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0327
Contributor

Hi @billishyahao,

In early March, you said that, after consulting with @HaiShaw and others in the org, you would be using upstream images by the end of March. Can you please update this to use upstream nightly images instead of second-class forks?

Let's ensure that we work towards AMD being a first-class platform on SGLang instead of continuing to submit second-class forks.

3 participants