Summary:

# Context

Add a `comms_id` to PyTorch profiler traces that uniquely identifies each collective/P2P communication operation across all ranks. This enables trace analysis tools to correlate the same operation across different ranks when debugging distributed training performance.

# How comms_id is computed

`comms_id = hash(pg_name, seqNumber, isP2P, globalRankStart, globalRankStride, worldSize)`

- `pg_name` — identifies the process group
- `seqNumber` — per-PG operation counter; identifies which operation within the PG
- `isP2P` — distinguishes P2P ops (send/recv) from collectives (allreduce, etc.), since they use separate sequence number counters
- `globalRankStart`, `globalRankStride`, `worldSize` — encode the communicator topology, disambiguating cases where one PG creates multiple communicators (e.g., comm splits)

# Changes by layer

1. Data model (`ParamCommsUtils.hpp`/`.cpp`) — added `seqNumber` and `isP2P` fields to `ParamCommsDebugInfo`, the class that carries communication metadata through the profiling stack.
2. Hash computation (`profiler/util.cpp`/`.h`) — in `saveNcclMeta()`, computes `comms_id` from the six fields above and emits it as "Comms Id" in the profiler metadata map.
3. Trace output (`output_json.cpp`) — Kineto reads "Comms Id" from the metadata and writes it into the Chrome trace JSON, making it visible in trace viewers.
4. Tests (`comms_id.cpp`, `CuptiActivityProfilerTest.cpp`) — nine unit tests covering:
   - storage/retrieval of `seqNumber` and `isP2P`
   - default values
   - end-to-end: `comms_id` appears in `saveNcclMeta()` output with the correct hash
   - determinism across instances
   - uniqueness across different PG names, sequence numbers, P2P vs. collective ops, and communicator topologies

Differential Revision: D95659539
ycui1984 added a commit to ycui1984/pytorch that referenced this pull request on Mar 7, 2026:
Summary: X-link: pytorch/kineto#1286 — same summary as the PR description above.

Test Plan: added unit tests.

Differential Revision: D95659539
ycui1984 added a commit to ycui1984/kineto that referenced this pull request on Mar 11, 2026:
Summary: This is part of a larger effort to expose `comms_id` in profiler traces so that the same communication operation can be correlated across ranks.

This diff adds the Kineto side: reading "Comms Id" from the profiler metadata and writing it into the Chrome trace JSON output. When the metadata key is present, it is included in the trace event args, making it visible in trace viewers.

The PyTorch side (computing and emitting the `comms_id`) will land in a follow-up diff after the Kineto submodule is updated.

Differential Revision: D96153960