[Do not merge][AMD] Update cli args for qwen3.5 #980
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you
```yaml
description:
  - "Update cli args of Qwen3.5 FP8 and BF16 SGLang benchmarks for MI355X to achieve better performance"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/942
```
🟡 The new perf-changelog entry references pr-link #942, but the CLI arg changes it describes (adding --tokenizer-worker-num, --enable-aiter-allreduce-fusion, --cuda-graph-max-bs, etc.) are physically introduced in this PR (#980). The pr-link should point to #980 to maintain accurate changelog cross-referencing.
Extended reasoning:
What the bug is: The perf-changelog.yaml entry added by this PR (#980) contains pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/942. The description says 'Update cli args of Qwen3.5 FP8 and BF16 SGLang benchmarks for MI355X to achieve better performance', which is precisely what this PR does.
The specific code path: The diff in this PR adds seven new CLI arguments to both qwen3.5_bf16_mi355x.sh and qwen3.5_fp8_mi355x.sh: --tokenizer-worker-num 6, --enable-aiter-allreduce-fusion, --cuda-graph-max-bs $CONC, --context-length $CONTEXT_LENGTH, --disable-radix-cache, --max-prefill-tokens $MAX_PREFILL_TOKENS, and --scheduler-recv-interval 30. These are the exact changes described in the perf-changelog entry.
Why existing code doesn't prevent it: The perf-changelog is a manually maintained YAML file. There's no automated validation that a pr-link matches the PR number that introduces the described changes.
Addressing the refutation: The refutation argues that because the PR title contains '[Do not merge]', PR #942 is the intended merge vehicle, making the pr-link intentional. However, this reasoning is speculative. Looking at the existing changelog history, PR #942 has no prior entry—there is an entry pointing to PR #943 (a separate SGLang image update), but nothing for #942. If #942 were the real merge vehicle, it would already have its own changelog entry reflecting any prior changes. The '[Do not merge]' prefix is commonly used for testing/review branches, and doesn't conclusively establish that #942 will absorb these exact changes. If this PR is ever merged or its changes cherry-picked into another PR with a different number, the #942 link will be a broken cross-reference.
Impact: If the changelog entry is merged with a link to #942, anyone trying to trace the origin of these CLI arg changes will land on the wrong PR. This is a documentation/traceability issue with no functional impact.
How to fix: Change pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/942 to pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/980 (or whichever PR number this branch ultimately merges as).
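Since, as noted above, nothing validates that a pr-link matches the PR that introduces the described changes, a lightweight check could be scripted. This is only a sketch: the demo changelog under /tmp and the expected PR number are illustrative assumptions, not existing repo tooling.

```shell
#!/usr/bin/env bash
# Create a tiny changelog for the demo so the sketch is self-contained;
# the real file would be the repo's perf-changelog.yaml.
cat > /tmp/perf-changelog-demo.yaml <<'EOF'
description:
  - "Update cli args of Qwen3.5 FP8 and BF16 SGLang benchmarks for MI355X to achieve better performance"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/942
EOF

# The PR number that actually introduces the change (per the review comment).
EXPECTED_PR=980

# Grab the newest entry's pr-link line and compare it against the expected PR.
latest_link=$(grep -o 'pr-link: .*' /tmp/perf-changelog-demo.yaml | tail -n 1)
if [[ "$latest_link" == *"/pull/${EXPECTED_PR}"* ]]; then
  echo "pr-link OK"
else
  echo "pr-link mismatch: ${latest_link}"
fi
```

Running this against the current entry prints a mismatch, which is exactly the inconsistency the review flags.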
Step-by-step proof:
- PR #980 ([Do not merge][AMD] Update cli args for qwen3.5) modifies `qwen3.5_bf16_mi355x.sh`, adding 7 new CLI args.
- PR #980 modifies `qwen3.5_fp8_mi355x.sh`, adding the same 7 CLI args.
- PR #980 adds a perf-changelog entry describing "Update cli args of Qwen3.5 FP8 and BF16 SGLang benchmarks for MI355X".
- The pr-link in that entry is `pull/942`, not `pull/980`.
- Searching the existing perf-changelog shows no prior entry for PR #942 ([WIP][AMD] feat: enhance Qwen benchmark scripts with additional parameters), ruling out that #942 is a known related PR.
|
/sweep test-config --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml --config-keys qwen3.5-bf16-mi355x-sglang qwen3.5-fp8-mi355x-sglang |
|
@zhentaocc Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23739126783 |
|
/sweep test-config --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml --config-keys qwen3.5-bf16-mi355x-sglang qwen3.5-fp8-mi355x-sglang |
|
@zhentaocc Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23751700196 |
|
@zhentaocc these runs are hanging and clogging up the CI queue: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23751700196/job/69195336352 |
| --tokenizer-worker-num 6 \ | ||
| --enable-aiter-allreduce-fusion \ | ||
| --cuda-graph-max-bs $CONC \ | ||
| --context-length $CONTEXT_LENGTH \ | ||
| --disable-radix-cache \ | ||
| --max-prefill-tokens $MAX_PREFILL_TOKENS \ | ||
| --scheduler-recv-interval 30 \ |
There was a problem hiding this comment.
awesome work! @zhentaocc
can u create a PR first in the sglang cookbook https://github.com/sgl-project/sgl-cookbook before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you
seems like the sglang recipe dont have it yet
https://cookbook.sglang.io/autoregressive/Qwen/Qwen3.5

|
/sweep test-config --config-files .github/configs/amd-master.yaml --runner-config .github/configs/runners.yaml --config-keys qwen3.5-bf16-mi355x-sglang qwen3.5-fp8-mi355x-sglang |
|
@zhentaocc Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23790183854 |
* Added CONTEXT_LENGTH and MAX_PREFILL_TOKENS variables for better configuration. * Updated launch_server command with new options: --tokenizer-worker-num, --enable-aiter-allreduce-fusion, --cuda-graph-max-bs, --context-length, --disable-radix-cache, --max-prefill-tokens, and --scheduler-recv-interval.
… benchmark configurations for MI355X, enhancing performance with updated CLI arguments.
….yaml to v0.5.9, ensuring compatibility with recent changes.
… and BF16 SGLang benchmarks on MI355X, ensuring accurate tracking of performance enhancements.
… configurations and adjust perf-changelog.yaml to reflect the changes, ensuring accurate performance tracking and compatibility.
…ngelog.yaml to reflect improved CLI arguments for MI355X, ensuring better performance tracking.
…ter and adjusting memory fraction. Updated launch_server command to include data-parallel-size and improved context length handling for better performance.
…chmarks, increasing conc-end values and adding new entries for improved performance tuning on MI355X and MI300X.
…cripts for MI355X to streamline configuration and improve performance.
a350dc3 to
6fe8422
Compare
New perf data:
BF16 local results
conc 64, 1k1k, TPUT 501.29->661.46 tokens/s/gpu, 31.95% boost. @functionstackx
FP8 local test results
conc 64, 1k1k, TPUT 708.75tokens/s/gpu @functionstackx