Fix toggle test #1369

Closed
scotts wants to merge 3 commits into pytorch:main from scotts:fix_toggle_test

Conversation

Contributor

@scotts scotts commented Apr 16, 2026

Fix test_dynamic_toggle failure caused by toggle's own CUDA events leaking into trace.

PR #1362 (D100731798) fixed a bug in activityBuffers() where readyGpuTraceBuffers_ was ignored when allocatedGpuTraceBuffers_ was empty. A side effect is that CUPTI buffers flushed during toggleCollectionDynamic(false) are now correctly returned during trace processing. This surfaced a latent issue: synchronizeGpuDevice() calls cudaDeviceSynchronize() while CUPTI is still enabled, so CUPTI captures that call as a "cudaDeviceSynchronize" runtime activity. The event contains "cuda" in its name and fails the PyTorch test_dynamic_toggle assertion that no GPU events appear after the toggle.

Fix by disabling CUPTI activity collection before the device sync when toggling off. The sync still serves its purpose — draining in-flight GPU work and flushing pre-existing CUPTI buffers — but the sync call itself is no longer instrumented.

@meta-cla meta-cla bot added the cla signed label Apr 16, 2026
scotts added 2 commits April 15, 2026 20:36
…aking into trace


meta-codesync bot commented Apr 16, 2026

@scotts has imported this pull request. If you are a Meta employee, you can view this in D101110738.

@scotts scotts marked this pull request as ready for review April 16, 2026 11:01
@meta-codesync meta-codesync bot closed this in ee2103c Apr 16, 2026

meta-codesync bot commented Apr 16, 2026

@scotts merged this pull request in ee2103c.

scotts added a commit to scotts/pytorch that referenced this pull request Apr 16, 2026
Includes the following commits:

- Fix stream wait events referencing future correlation IDs (pytorch/kineto#1339) 23b5bb5
- Remove kineto tb_plugin directory entirely (pytorch/kineto#1368) 9497960
- Move Stream Sync events to a new row in JSON trace export (pytorch/kineto#1356) 041e7ce
- Expose isGpuCollectionStopped() through Kineto's public API (pytorch/kineto#1367) 17708f5
- Fix toggle test (pytorch/kineto#1369) ee2103c
- Link to correct fmt repo (pytorch/kineto#1345) 3447834
- Fix data race on CuptiActivityApi::externalCorrelationEnabled_ (pytorch/kineto#1365) 0e86499
- Stop allocating CUPTI buffers after exceeding max buffer count (pytorch/kineto#1362) 666f62c
- Add XPU workflow (pytorch/kineto#1302) 11cc1e0
- Remove RocprofActivity.h/RoctracerActivity.h from RocmActivityProfiler.h (pytorch/kineto#1357) 896068d
- Split ActivityProfilerController into Sync and Async Handlers (pytorch/kineto#1269) 6d7f045
- Add priority field to kernel metadata (pytorch/kineto#1361) f2a7423
- Add kineto-release skill (pytorch/kineto#1360) 675b6cd

Authored with Claude.