Move Stream Sync events to a new row in JSON trace export (#1356) #1356
Closed
jiannanWang wants to merge 1 commit into pytorch:main from
@jiannanWang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100110463.
bab6507 to 0c1c2d2
jiannanWang added a commit to jiannanWang/kineto that referenced this pull request Apr 14, 2026
0c1c2d2 to 4e3b58f
This pull request has been merged in 041e7ce.
scotts added a commit to scotts/pytorch that referenced this pull request Apr 16, 2026
Includes the following commits:
- Fix stream wait events referencing future correlation IDs (pytorch/kineto#1339) 23b5bb5
- Remove kineto tb_plugin directory entirely (pytorch/kineto#1368) 9497960
- Move Stream Sync events to a new row in JSON trace export (pytorch/kineto#1356) 041e7ce
- Expose isGpuCollectionStopped() through Kineto's public API (pytorch/kineto#1367) 17708f5
- Fix toggle test (pytorch/kineto#1369) ee2103c
- Link to correct fmt repo (pytorch/kineto#1345) 3447834
- Fix data race on CuptiActivityApi::externalCorrelationEnabled_ (pytorch/kineto#1365) 0e86499
- Stop allocating CUPTI buffers after exceeding max buffer count (pytorch/kineto#1362) 666f62c
- Add XPU workflow (pytorch/kineto#1302) 11cc1e0
- Remove RocprofActivity.h/RoctracerActivity.h from RocmActivityProfiler.h (pytorch/kineto#1357) 896068d
- Split ActivityProfilerController into Sync and Async Handlers (pytorch/kineto#1269) 6d7f045
- Add priority field to kernel metadata (pytorch/kineto#1361) f2a7423
- Add kineto-release skill (pytorch/kineto#1360) 675b6cd
Authored with Claude.
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Apr 17, 2026
Includes the following commits:
- Fix stream wait events referencing future correlation IDs (pytorch/kineto#1339) 23b5bb5
- Remove kineto tb_plugin directory entirely (pytorch/kineto#1368) 9497960
- Move Stream Sync events to a new row in JSON trace export (pytorch/kineto#1356) 041e7ce
- Expose isGpuCollectionStopped() through Kineto's public API (pytorch/kineto#1367) 17708f5
- Fix toggle test (pytorch/kineto#1369) ee2103c
- Link to correct fmt repo (pytorch/kineto#1345) 3447834
- Fix data race on CuptiActivityApi::externalCorrelationEnabled_ (pytorch/kineto#1365) 0e86499
- Stop allocating CUPTI buffers after exceeding max buffer count (pytorch/kineto#1362) 666f62c
- Add XPU workflow (pytorch/kineto#1302) 11cc1e0
- Remove RocprofActivity.h/RoctracerActivity.h from RocmActivityProfiler.h (pytorch/kineto#1357) 896068d
- Split ActivityProfilerController into Sync and Async Handlers (pytorch/kineto#1269) 6d7f045
- Add priority field to kernel metadata (pytorch/kineto#1361) f2a7423
- Add kineto-release skill (pytorch/kineto#1360) 675b6cd
Authored with Claude.
Pull Request resolved: #180606
Approved by: https://github.com/ryanzhang22, https://github.com/Skylion007
Summary:
Stream Sync events (from cudaStreamSynchronize) are placed on the same GPU
stream row as kernel events in Chrome Trace JSON output. Since Stream Sync
uses CPU-side timestamps that overlap with still-running kernels, Perfetto
drops the overlapping events, making Stream Sync invisible in the trace
viewer.
Fix: In the JSON export path only, emit Stream Sync events on a virtual tid
(streamId + 1000000) labeled "stream N (sync)". The virtual row reuses the
sort_index of the original stream, which places it right below that stream
in Perfetto. The in-memory event data and protobuf export are unchanged.
Reviewed By: ryanzhang22
Differential Revision: D100110463
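
For readers who want to see the tid remapping concretely, here is a minimal Python sketch of the idea described in the summary. It is not the Kineto implementation, which is C++ and is not shown on this page. The function names `export_tid` and `sync_row_metadata`, and the `is_stream_sync` flag, are hypothetical. Only the 1000000 offset, the "stream N (sync)" label, and the shared sort_index come from the summary. The `thread_name` and `thread_sort_index` metadata events are standard Chrome Trace Event Format records.

```python
# Offset stated in the PR summary: virtual tid = streamId + 1000000.
SYNC_TID_OFFSET = 1_000_000


def export_tid(stream_id, is_stream_sync):
    """Return the tid to write in the JSON export for an event on stream_id.

    Hypothetical helper: Stream Sync events move to a virtual row, while
    every other event keeps its original stream id.
    """
    if is_stream_sync:
        return stream_id + SYNC_TID_OFFSET
    return stream_id


def sync_row_metadata(pid, stream_id, sort_index):
    """Build Chrome Trace metadata events that name and order the sync row.

    sort_index is assumed to be the value already used for the original
    stream row, so the viewer keeps the two rows adjacent.
    """
    tid = export_tid(stream_id, is_stream_sync=True)
    return [
        {"ph": "M", "name": "thread_name", "pid": pid, "tid": tid,
         "args": {"name": f"stream {stream_id} (sync)"}},
        {"ph": "M", "name": "thread_sort_index", "pid": pid, "tid": tid,
         "args": {"sort_index": sort_index}},
    ]
```

Because the remap happens only when events are serialized, the stream id stored on the in-memory event is never modified, which matches the statement that the in-memory data and protobuf export are unchanged.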