[Question]: Does NCCL have any plan to provide QP counter metrics for GDAKI's QPs? #2181

@baymaxhuang

Question

AFAIK, GDAKI's QPs are ultimately created through DEVX, and we cannot see any counters for DEVX-created QPs in /sys/class/infiniband/mlx5_xxx/ports/1/hw_counters/, such as out_of_sequence, req_cqe_error, and local_ack_timeout_err. To get counters for DEVX QPs, a QP counter set ID has to be bound to the QPs that DEVX creates. Does NCCL have any plan to expose QP counter metrics for GDAKI's QPs, so that network issues can be observed easily when using GDAKI?
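
For reference, here is a minimal sketch of how the port-level counters mentioned above are typically read today. It only parses the sysfs files; it does not bind a counter set to DEVX QPs, so under the behavior described above it would not reflect GDAKI traffic. The helper name `read_hw_counters` and the device name `mlx5_0` are hypothetical, chosen for illustration.

```python
from pathlib import Path


def read_hw_counters(counters_dir, names=None):
    """Return {counter_name: int} for the counter files under counters_dir.

    counters_dir: a hw_counters directory such as
        /sys/class/infiniband/<device>/ports/1/hw_counters
    names: optional iterable of counter names to keep (None keeps all).
    """
    base = Path(counters_dir)
    if not base.is_dir():
        # Device or port not present on this host.
        return {}

    wanted = set(names) if names is not None else None
    counters = {}
    for path in sorted(base.iterdir()):
        if not path.is_file():
            continue
        if wanted is not None and path.name not in wanted:
            continue
        try:
            counters[path.name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Skip entries that are unreadable or not plain integers.
            continue
    return counters


if __name__ == "__main__":
    # "mlx5_0" is a placeholder device name; substitute the local device.
    sysfs_dir = "/sys/class/infiniband/mlx5_0/ports/1/hw_counters"
    interesting = ["out_of_sequence", "req_cqe_error", "local_ack_timeout_err"]
    for name, value in read_hw_counters(sysfs_dir, interesting).items():
        print(f"{name}: {value}")
```

Polling this before and after a GDAKI run and diffing the values is one way to check whether the DEVX QPs are visible in these counters on a given setup.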
