From 913971c5fd4391ea1755df59a4666454da323f44 Mon Sep 17 00:00:00 2001
From: Ivana
Date: Wed, 3 Jun 2026 05:34:18 +0000
Subject: [PATCH 1/2] cmake: replace hardcoded CUDA native arch with portable fat-binary default
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous unconditional `set(CMAKE_CUDA_ARCHITECTURES native)` had
three problems for redistributable PyPI wheels built on GPU-less CI
runners:

1. `native` queries the local GPU at configure time, so the build fails
   outright on a machine with no NVIDIA device.
2. Even when it succeeds, `native` bakes in only the build host's
   architecture: the resulting wheel does not run on any other GPU
   generation.
3. Because it was a plain (non-cache) `set()`, it overrode any value
   supplied via -D, a CMakePresets.json cacheVariables entry, the
   CUDAARCHS environment variable, or the cache, so there was no way to
   override it without editing the file.

Replace it with a guarded default that runs before enable_language(CUDA):

    if(NOT CMAKE_CUDA_ARCHITECTURES AND NOT DEFINED ENV{CUDAARCHS})
      set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86-real 89-real 90-real 90-virtual)
    endif()
    enable_language(CUDA)

The guard must run before enable_language(CUDA): afterwards
CMAKE_CUDA_ARCHITECTURES is never empty (CMake fills in its own
default), so the "not specified" case can no longer be detected.

enable_language(CUDA) already reads the CUDAARCHS environment variable
on its own, so the guard only has to avoid shadowing it; there is no
need to copy CUDAARCHS into the variable, and no need to honor a
CMAKE_CUDA_ARCHITECTURES environment variable (CMake defines no such
variable; only CUDAARCHS is standard). A -D flag, the cache, or a
preset's cacheVariables populate the normal/cache variable, so the
NOT CMAKE_CUDA_ARCHITECTURES guard lets them win.

The default targets Volta through Hopper (sm_70 is the minimum required
by cuTENSOR/cuQuantum), with 90-virtual PTX so the driver can
JIT-compile for GPUs newer than Hopper without a rebuild.

Also replace the now-false "native" justification in the comment on the
cmake_minimum_required line: the comment now notes that
cmake_language(EVAL CODE ...) is the feature that sets the 3.18 lower
bound, and keeps 3.24 as the tested minimum.

Closes #870.

Co-authored-by: Claude
---
 CMakeLists.txt | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index a5a86523f..f5096314e 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -7,7 +7,7 @@ message(STATUS "")
 # #####################################################################
 # ## CMAKE and CXX VERSION
 # #####################################################################
-cmake_minimum_required(VERSION 3.24) # require for the "native" value of CUDA_ARCHITECTURES
+cmake_minimum_required(VERSION 3.24) # 3.18+ required for cmake_language(EVAL CODE ...); 3.24 is a tested minimum
 set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules")
 
 # Inria's morse_cmake provides an up-to-date FindLAPACKE (and helpers) that
@@ -185,7 +185,18 @@ project(CYTNX VERSION ${CYTNX_VERSION} LANGUAGES CXX C)
 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
 
 if(USE_CUDA)
-  set(CMAKE_CUDA_ARCHITECTURES native)
+  # Default to a portable fat binary unless the caller picked architectures via
+  # -D, the cache, a preset's cacheVariables, or the CUDAARCHS environment
+  # variable. This must run before enable_language(CUDA): afterwards
+  # CMAKE_CUDA_ARCHITECTURES is never empty (CMake fills in its own default), so
+  # the "not specified" case can no longer be detected. enable_language(CUDA)
+  # reads CUDAARCHS on its own, so we only avoid shadowing it here, not copy it.
+  # The default embeds SASS for each supported real architecture (Volta sm_70 is
+  # the floor required by cuTENSOR/cuQuantum, up through Hopper sm_90) plus PTX
+  # of the newest (90-virtual) so the driver can JIT for newer/unknown GPUs.
+  if(NOT CMAKE_CUDA_ARCHITECTURES AND NOT DEFINED ENV{CUDAARCHS})
+    set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86-real 89-real 90-real 90-virtual)
+  endif()
   enable_language(CUDA)
   # Disable generation of "--option-file" flag in compile_commands.json.
   # This workaround helps VSCode's cpptools extension correctly locate CUDA

From 57b7618b844a7f192d8da756116b61e196bdbbd8 Mon Sep 17 00:00:00 2001
From: Ivana
Date: Wed, 3 Jun 2026 09:23:56 +0000
Subject: [PATCH 2/2] cmake: require CMake 3.25 for CUDA device LTO support

CMake 3.25 is the first release where CMAKE_INTERPROCEDURAL_OPTIMIZATION
(and the INTERPROCEDURAL_OPTIMIZATION target property) activate CUDA
device link-time optimization (nvcc -dlto) in addition to host C++ LTO.
On earlier CMake versions the same setting silently produces no device
LTO for CUDA targets, so the optimization the build asks for via
CMAKE_INTERPROCEDURAL_OPTIMIZATION=TRUE would be quietly dropped.

Raise the floor from 3.24 to 3.25 so the requested device LTO is
actually emitted rather than ignored. The previous 3.24 floor existed
only for the removed "native" CUDA architecture keyword; the remaining
version-sensitive feature, cmake_language(EVAL CODE ...), needs 3.18,
which 3.25 also covers.

Co-authored-by: Claude
---
 CMakeLists.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CMakeLists.txt b/CMakeLists.txt
index f5096314e..0d99ea934 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -7,7 +7,7 @@ message(STATUS "")
 # #####################################################################
 # ## CMAKE and CXX VERSION
 # #####################################################################
-cmake_minimum_required(VERSION 3.24) # 3.18+ required for cmake_language(EVAL CODE ...); 3.24 is a tested minimum
+cmake_minimum_required(VERSION 3.25) # 3.25 added CUDA device LTO via INTERPROCEDURAL_OPTIMIZATION
 set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake/Modules")
 
 # Inria's morse_cmake provides an up-to-date FindLAPACKE (and helpers) that