
Move Stream Sync events to a new row in JSON trace export (#1356)

Closed
jiannanWang wants to merge 1 commit into pytorch:main from jiannanWang:export-D100110463

jiannanWang (Contributor) commented Apr 9, 2026

Summary:

Stream Sync events (from cudaStreamSynchronize) are placed on the same GPU
stream row as kernel events in Chrome Trace JSON output. Since Stream Sync
uses CPU-side timestamps that overlap with still-running kernels, Perfetto
drops the overlapping events, making Stream Sync invisible in the trace
viewer.
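
To make the overlap concrete, here is a minimal sketch (not Kineto or Perfetto code) that flags complete events on the same row that overlap without one fully containing the other. The helper name `find_partial_overlaps` and the sample events are hypothetical; the only assumption about the format is the standard Chrome Trace complete event (`"ph": "X"` with `ts` and `dur`).

```python
def find_partial_overlaps(events):
    """Return (earlier, later) pairs of complete events on the same
    (pid, tid) row that overlap without being properly nested."""
    rows = {}
    for ev in events:
        if ev.get("ph") == "X":
            rows.setdefault((ev["pid"], ev["tid"]), []).append(ev)

    conflicts = []
    for row in rows.values():
        # Sort by start time; on ties, longer events first so parents precede children.
        ordered = sorted(row, key=lambda e: (e["ts"], -e["dur"]))
        stack = []  # currently open events, innermost last
        for ev in ordered:
            start, end = ev["ts"], ev["ts"] + ev["dur"]
            # Close every open event that ended before this one starts.
            while stack and stack[-1]["ts"] + stack[-1]["dur"] <= start:
                stack.pop()
            if stack and end > stack[-1]["ts"] + stack[-1]["dur"]:
                # Starts inside the open event but ends after it: not nestable.
                conflicts.append((stack[-1], ev))
                continue
            stack.append(ev)
    return conflicts


if __name__ == "__main__":
    # Hypothetical timestamps: the sync starts while the kernel is still running.
    sample = [
        {"ph": "X", "name": "my_kernel", "pid": 0, "tid": 7, "ts": 100, "dur": 50},
        {"ph": "X", "name": "Stream Sync", "pid": 0, "tid": 7, "ts": 120, "dur": 60},
    ]
    for earlier, later in find_partial_overlaps(sample):
        print(f'{later["name"]} partially overlaps {earlier["name"]}')
```

Running a check like this over an exported trace is one way to see which rows contain events that cannot be drawn as properly nested slices.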

Fix: In the JSON export path only, emit Stream Sync events on a virtual tid
(streamId + 1000000) with a "stream N (sync)" label. The same sort_index as
the original stream places the sync row right below it in Perfetto. The
in-memory event data and protobuf export are unchanged.
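
The remapping described above can be sketched as a post-processing step over a list of Chrome Trace events. This is an illustration only, not the code in this PR (Kineto's exporter is C++). The function name `remap_stream_sync` is hypothetical, matching sync events by the name `"Stream Sync"` is an assumption, and so is using the stream id as the original row's `sort_index`. The `thread_name` and `thread_sort_index` metadata events are part of the Chrome Trace Event format.

```python
SYNC_TID_OFFSET = 1_000_000  # offset given in the PR summary


def remap_stream_sync(events):
    """Return a new event list with Stream Sync events moved to a
    virtual tid (streamId + 1000000). The input list is not modified."""
    out = []
    labeled = set()  # (pid, virtual_tid) rows that already have metadata
    for ev in events:
        # Assumption: sync events are complete events named "Stream Sync".
        if ev.get("ph") == "X" and ev.get("name") == "Stream Sync":
            stream_id = ev["tid"]
            virtual_tid = stream_id + SYNC_TID_OFFSET
            row = (ev["pid"], virtual_tid)
            if row not in labeled:
                labeled.add(row)
                # Label the new row "stream N (sync)".
                out.append({
                    "ph": "M", "name": "thread_name",
                    "pid": ev["pid"], "tid": virtual_tid,
                    "args": {"name": f"stream {stream_id} (sync)"},
                })
                # Assumption: the original stream row uses its id as sort_index,
                # so reusing it keeps the sync row next to that stream.
                out.append({
                    "ph": "M", "name": "thread_sort_index",
                    "pid": ev["pid"], "tid": virtual_tid,
                    "args": {"sort_index": stream_id},
                })
            ev = {**ev, "tid": virtual_tid}  # copy; leave the source event untouched
        out.append(ev)
    return out


if __name__ == "__main__":
    import json

    sample = [
        {"ph": "X", "name": "my_kernel", "pid": 0, "tid": 7, "ts": 100, "dur": 50},
        {"ph": "X", "name": "Stream Sync", "pid": 0, "tid": 7, "ts": 120, "dur": 60},
    ]
    print(json.dumps({"traceEvents": remap_stream_sync(sample)}, indent=2))
```

Copying each sync event instead of mutating it mirrors the stated intent that only the exported JSON changes while the in-memory event data stays as it was.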

Reviewed By: ryanzhang22

Differential Revision: D100110463

meta-cla bot added the cla signed label Apr 9, 2026

meta-codesync bot commented Apr 9, 2026

@jiannanWang has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100110463.

meta-codesync bot changed the title from "Move Stream Sync events to a new row in JSON trace export" to "Move Stream Sync events to a new row in JSON trace export (#1356)" Apr 14, 2026
jiannanWang added a commit to jiannanWang/kineto that referenced this pull request Apr 14, 2026

meta-codesync bot commented Apr 16, 2026

This pull request has been merged in 041e7ce.

scotts added a commit to scotts/pytorch that referenced this pull request Apr 16, 2026
Includes the following commits:

- Fix stream wait events referencing future correlation IDs (pytorch/kineto#1339) 23b5bb5
- Remove kineto tb_plugin directory entirely (pytorch/kineto#1368) 9497960
- Move Stream Sync events to a new row in JSON trace export (pytorch/kineto#1356) 041e7ce
- Expose isGpuCollectionStopped() through Kineto's public API (pytorch/kineto#1367) 17708f5
- Fix toggle test (pytorch/kineto#1369) ee2103c
- Link to correct fmt repo (pytorch/kineto#1345) 3447834
- Fix data race on CuptiActivityApi::externalCorrelationEnabled_ (pytorch/kineto#1365) 0e86499
- Stop allocating CUPTI buffers after exceeding max buffer count (pytorch/kineto#1362) 666f62c
- Add XPU workflow (pytorch/kineto#1302) 11cc1e0
- Remove RocprofActivity.h/RoctracerActivity.h from RocmActivityProfiler.h (pytorch/kineto#1357) 896068d
- Split ActivityProfilerController into Sync and Async Handlers (pytorch/kineto#1269) 6d7f045
- Add priority field to kernel metadata (pytorch/kineto#1361) f2a7423
- Add kineto-release skill (pytorch/kineto#1360) 675b6cd

Authored with Claude.
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Apr 17, 2026
Pull Request resolved: #180606
Approved by: https://github.com/ryanzhang22, https://github.com/Skylion007