
feat(mesh): add v4 1p1d/2p1d slurm scripts with nightly docker image #1124

Open
Jasen2201 wants to merge 1 commit into main from Jasen/add_dsv4_slurm

Conversation

@Jasen2201
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings June 8, 2026 02:46
Contributor

Copilot AI left a comment


Pull request overview

Adds new SLURM job scripts under atom/mesh/scripts/ to run DeepSeek-V4-Pro PD-disaggregated (prefill/decode) benchmarks on ATOM using a specified nightly ROCm Docker image, including an optional GSM8K accuracy run and a serving benchmark loop.

Changes:

  • Add a 1P+1D (TP-only) SLURM script for DeepSeek-V4-Pro PD disaggregation on ATOM.
  • Add a 2P+1D (DP-attention enabled) SLURM script for DeepSeek-V4-Pro PD disaggregation on ATOM.
  • Include end-to-end orchestration: node discovery, container launch, server/router startup, readiness checks, GSM8K eval, and benchmarking.
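The readiness-check step in the orchestration above could look roughly like the following. This is a minimal sketch, not code taken from the scripts in this PR: `wait_until_ready` is a hypothetical helper name, and the host variable, port, and `/health` path in the example call are assumptions.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: wait_until_ready is a hypothetical helper,
# not a function from the scripts added in this PR.

# Run a probe command repeatedly until it succeeds.
#   $1    = maximum number of attempts
#   $2    = seconds to sleep after each failed attempt
#   $3... = probe command and its arguments
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
wait_until_ready() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example call (host variable, port, and /health path are assumptions):
#   wait_until_ready 120 5 curl -sf "http://${PREFILL_HOST}:8000/health"
```

Passing the probe as a command keeps the retry logic independent of how each server reports health, so the same helper could gate the prefill server, the decode server, and the router before the GSM8K run and the benchmark loop start.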

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `atom/mesh/scripts/ds_v4_2p_tp8_1d_tp8_atom_dpa_slurm.sh` | New 3-node (2 prefill + 1 decode) DP-attention PD-disaggregation SLURM workflow using a nightly Docker image. |
| `atom/mesh/scripts/ds_v4_1p_tp8_1d_tp8_atom_tp_slurm.sh` | New 2-node (1 prefill + 1 decode) pure-TP PD-disaggregation SLURM workflow using a nightly Docker image. |
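The two node layouts described above (2 prefill + 1 decode, and 1 prefill + 1 decode) can be illustrated with a small role-assignment sketch. `assign_roles` is a hypothetical helper, and the prefill-first ordering is an assumption; the scripts in this PR may map nodes to roles differently.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: assign_roles is a hypothetical helper, and the
# "prefill nodes first, decode nodes last" ordering is an assumption.

# assign_roles NUM_PREFILL HOST...
# Prints one "role hostname" line per host: the first NUM_PREFILL hosts are
# labelled prefill, the remaining hosts are labelled decode.
assign_roles() {
  local num_prefill="$1"
  shift
  local idx=0
  local host
  for host in "$@"; do
    if [ "$idx" -lt "$num_prefill" ]; then
      printf 'prefill %s\n' "$host"
    else
      printf 'decode %s\n' "$host"
    fi
    idx=$((idx + 1))
  done
}

# Inside a SLURM allocation the host list would typically come from scontrol:
#   assign_roles 2 $(scontrol show hostnames "$SLURM_JOB_NODELIST")
```

With three allocated hosts and `NUM_PREFILL=2` this gives the 2P+1D layout; with two hosts and `NUM_PREFILL=1` it gives the 1P+1D layout.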


Comment on lines +21 to +23
# Usage:
# mkdir -p /it-share/yajizhan/slurm_logs
# sbatch ds_v4_2p_tp8_1d_tp8_atom_dpa_slurm.sh

Comment on lines +20 to +22
# Usage:
# mkdir -p /it-share/yajizhan/slurm_logs
# sbatch ds_v4_1p_tp8_1d_tp8_atom_tp_slurm.sh
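Both usage blocks create a user-specific log directory before calling `sbatch`, presumably because the job's output path must exist at submission time. One way to avoid a hard-coded path is sketched below. `submit_job`, `LOG_DIR`, and the `SBATCH` override are hypothetical names that do not appear in the PR; `--output`, `%x` (job name), and `%j` (job ID) are standard `sbatch` features.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: submit_job, LOG_DIR, and SBATCH are hypothetical
# names introduced here, not part of the scripts in this PR.

# Log directory, overridable from the environment instead of being hard-coded.
LOG_DIR="${LOG_DIR:-$HOME/slurm_logs}"

# submit_job SCRIPT
# Creates the log directory, then submits SCRIPT with an explicit output path.
# An --output given on the command line takes precedence over an
# "#SBATCH --output" directive inside the script.
submit_job() {
  local script="$1"
  mkdir -p "$LOG_DIR" || return 1
  # %x expands to the job name and %j to the job ID.
  "${SBATCH:-sbatch}" --output="$LOG_DIR/%x-%j.out" "$script"
}

# Example call:
#   LOG_DIR=/path/to/shared/logs submit_job ds_v4_1p_tp8_1d_tp8_atom_tp_slurm.sh
```

Setting `SBATCH=echo` prints the submission arguments instead of submitting, which is handy for a dry run.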

2 participants