starter task: MVP port mi355 deepseek disagg recipe to mi300 #982

@functionstackx

Description


After porting the mi355 recipe to mi325, port it to mi300.

  1. The config entry to port (mi355 disagg fp8 DeepSeek, non-MTP and MTP) over to mi325 (CDNA3):

     ```yaml
     dsr1-fp8-mi355x-sglang-disagg:
       image: rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0227-2
       model: deepseek-ai/DeepSeek-R1-0528
       model-prefix: dsr1
       runner: mi355x-disagg
       precision: fp8
       framework: sglang-disagg
       multinode: true
       disagg: true
       seq-len-configs:
       - isl: 1024
         osl: 1024
         search-space:
         # non-MTP configurations
         # "Top of curve" (1 prefill worker at DEP8, 1 decode worker at DEP16)
         - spec-decoding: "none"
           conc-list: [ 1024, 2048 ]
           prefill:
             num-worker: 1
             tp: 8
             ep: 1
             dp-attn: false
             additional-settings:
             - "PREFILL_NODES=1"
           decode:
             num-worker: 1
             tp: 8
             ep: 8
             dp-attn: true
             additional-settings:
             - "DECODE_NODES=2"
             - "DECODE_MTP_SIZE=0"
         # "Middle of curve" (1 prefill worker at TP8, 2 decode workers at DEP8)
         - spec-decoding: "none"
           conc-list: [ 1536, 1024, 512 ]
           prefill:
             num-worker: 1
             tp: 8
             ep: 1
             dp-attn: false
             additional-settings:
             - "PREFILL_NODES=1"
           decode:
             num-worker: 2
             tp: 8
             ep: 8
             dp-attn: true
             additional-settings:
             - "DECODE_NODES=2"
             - "DECODE_MTP_SIZE=0"
     ```
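     As a rough starting point, the ported entry might look like the sketch below. Only the hardware-specific fields should need to change; the image tag and runner name here are placeholders, not confirmed values (see step 2 for finding the real mi30x image):

     ```yaml
     # Hypothetical mi300 port of the entry above; image and runner are
     # placeholder assumptions, everything else carries over unchanged.
     dsr1-fp8-mi300x-sglang-disagg:
       image: <mi30x sglang image with MoRI, see step 2>   # assumption: TBD
       model: deepseek-ai/DeepSeek-R1-0528
       model-prefix: dsr1
       runner: mi300x-disagg        # assumption: runner label for the mi300 pool
       precision: fp8
       framework: sglang-disagg
       multinode: true
       disagg: true
       # seq-len-configs / search-space carried over from the mi355x entry,
       # starting with the 1k/1k case for the fast debugging loop.
     ```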
  2. The launcher script: https://github.com/SemiAnalysisAI/InferenceX/blob/main/benchmarks/multi_node/dsr1_fp8_mi355x_sglang-disagg.sh (the sweep-generation Python script calls this launcher). It uses the image rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-0227-2, but @JordanNanos you will probably need to find the mi30x equivalent. Check whether the upstream nightly images (https://hub.docker.com/r/lmsysorg/sglang-daily/tags) include MoRI; if not, build one from https://github.com/akao-amd/sglang/blob/main/docker/rocm.Dockerfile, and make sure you build it for the correct NIC.
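     If a local build is needed, a minimal sketch follows, assuming the Dockerfile sits at docker/rocm.Dockerfile in that repo. Whether NIC selection happens via build args is not confirmed here; check the Dockerfile before building. The commands are echoed (dry-run) rather than executed so the sketch is safe to paste anywhere:

     ```shell
     # Dry-run sketch: prints the build commands instead of running them.
     # The local image tag is a hypothetical placeholder; inspect
     # docker/rocm.Dockerfile for any NIC-related build ARGs first.
     set -eu

     SGLANG_REPO="https://github.com/akao-amd/sglang"
     IMAGE_TAG="sgl-dev:rocm-mi30x-local"   # hypothetical local tag

     echo "git clone ${SGLANG_REPO} sglang"
     echo "docker build -f docker/rocm.Dockerfile -t ${IMAGE_TAG} sglang"
     ```

     Drop the `echo`s once the Dockerfile's build arguments (and the NIC they target) have been verified.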
  3. The launcher calls the files in https://github.com/SemiAnalysisAI/InferenceX/tree/main/benchmarks/multi_node/amd_utils, which are based on Bill's repo (https://github.com/billishyahao/sglang_disagg). For a first attempt it may be easier to run Bill's repo locally, without the runner / config-generation abstractions.

Probably start with 1k/1k on 1P1D first, since it is a faster debugging loop.
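For that 1P1D loop, the relevant knobs are the `PREFILL_NODES` / `DECODE_NODES` / `DECODE_MTP_SIZE` settings that appear in the config above. A minimal sketch of a 1P1D, non-MTP setup follows; whether the launcher consumes these exactly as environment variables like this is an assumption to verify against dsr1_fp8_mi355x_sglang-disagg.sh:

```shell
# 1P1D, 1k/1k, non-MTP sketch. The env var names come from the
# additional-settings in the config above; how the launcher actually
# reads them should be checked against the launcher script (step 2).
export PREFILL_NODES=1
export DECODE_NODES=1      # 1 decode worker instead of 2 for the debug loop
export DECODE_MTP_SIZE=0   # non-MTP
ISL=1024
OSL=1024

echo "launching with ${PREFILL_NODES}P${DECODE_NODES}D, isl=${ISL} osl=${OSL}"
# ./dsr1_fp8_mi355x_sglang-disagg.sh ...   # actual invocation/args: see step 2
```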
