[Issue]: vLLM serve with tensor-parallel-size=8 on Kubernetes + vGPU fails: NCCL TCPStore broken pipe, EngineCore initialization failed #42389 #2170

@sumufuyun

Description

How is this issue impacting you?

Lower performance than expected

Share Your Debug Logs

1

Steps to Reproduce the Issue

vllm-project/vllm#42389

NCCL Version

2.28.9

Your platform details

No response

Error Message & Behavior

No response
