Enable CUPTI teardown by default for CUDA 12.6+ #1264
sraikund16 wants to merge 1 commit into pytorch:main
Summary: Previously, CUPTI teardown (calling cuptiFinalize() after profiling) was only enabled when the TEARDOWN_CUPTI=1 env var was explicitly set. This meant users on CUDA 12.6+ still experienced post-profiling QPS degradation unless they knew to set the env var. This change makes teardown the default for CUDA 12.6+, where it is known to work reliably, while keeping it off by default for older versions. The TEARDOWN_CUPTI env var continues to override the default in either direction for all CUDA versions.
Differential Revision: D93661386
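The decision described in the summary can be sketched roughly as follows. This is an illustration only, not the code in this diff: the helper name `shouldTeardownCupti` and the constant `kMinCudaVersionForDefaultTeardown` are hypothetical, and the sketch assumes that `TEARDOWN_CUPTI=0` is the value that forces teardown off and that any other unrecognized value falls back to the version-based default.

```cpp
#include <cstdlib>
#include <cstring>

// CUDA_VERSION encodes major * 1000 + minor * 10, so CUDA 12.6 is 12060.
constexpr int kMinCudaVersionForDefaultTeardown = 12060;

// Hypothetical helper: returns true if cuptiFinalize() should be called
// after profiling. envValue is the raw TEARDOWN_CUPTI value, or nullptr
// when the variable is unset.
inline bool shouldTeardownCupti(int cudaVersion, const char* envValue) {
  if (envValue != nullptr) {
    // An explicit setting overrides the default on every CUDA version.
    if (std::strcmp(envValue, "1") == 0) {
      return true;
    }
    // Assumption: "0" is the value that disables teardown.
    if (std::strcmp(envValue, "0") == 0) {
      return false;
    }
  }
  // No recognized override: on by default for CUDA 12.6+, off otherwise.
  return cudaVersion >= kMinCudaVersionForDefaultTeardown;
}

// Convenience overload that reads the environment variable itself.
inline bool shouldTeardownCupti(int cudaVersion) {
  return shouldTeardownCupti(cudaVersion, std::getenv("TEARDOWN_CUPTI"));
}
```

Taking the env value as a parameter keeps the version check and the override logic testable without touching the process environment; a caller would pass `CUDA_VERSION` (or the runtime-reported version) as `cudaVersion`.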
@sraikund16 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D93661386.
aaronenyeshi left a comment
Review automatically exported from Phabricator review in Meta.
Hi @sraikund16! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.
Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations, and the pull request will then be tagged accordingly.
If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!