This update modifies the `gpu/install_gpu_driver.sh` script to enhance support for newer CUDA versions and GPU architectures, including initial support for NVIDIA Blackwell.
## Changes:
1. **Expanded CUDA Version Support:**
* Added mappings for CUDA versions 12.8, 12.9, 13.0, and 13.1 to the `DRIVER_FOR_CUDA`, `DRIVER_SUBVER`, `CUDNN_FOR_CUDA`, `NCCL_FOR_CUDA`, and `CUDA_SUBVER` arrays. This lets the script select matching driver, cuDNN, and NCCL versions for these newer CUDA toolkits.
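
For illustration, a lookup against one of these arrays might look like the sketch below. It assumes the arrays are Bash associative arrays keyed by `major.minor` (Bash 4 or later). The helper names `cuda_major_minor` and `driver_for_cuda` are hypothetical, and every value is a placeholder, not a real driver version.

```shell
#!/usr/bin/env bash
# Sketch only: the array name follows the PR description, but the values are
# placeholders and do NOT correspond to real driver releases.
declare -A DRIVER_FOR_CUDA=(
  ["12.9"]="DRIVER_PLACEHOLDER_12_9"
  ["13.0"]="DRIVER_PLACEHOLDER_13_0"
)

# Hypothetical helper: reduce a full version such as 13.0.1 to its major.minor key.
cuda_major_minor() {
  echo "$1" | cut -d. -f1,2
}

# Hypothetical helper: print the mapped driver, or fail loudly on an unknown key.
driver_for_cuda() {
  local key
  key="$(cuda_major_minor "$1")"
  if [[ -z "${DRIVER_FOR_CUDA[${key}]:-}" ]]; then
    echo "no driver mapping for CUDA ${key}" >&2
    return 1
  fi
  echo "${DRIVER_FOR_CUDA[${key}]}"
}
```

Failing on a missing key, rather than falling back silently, makes an unmapped CUDA version visible at install time.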
2. **Updated Default CUDA Version:**
* Changed the `DEFAULT_CUDA_VERSION` for Dataproc 2.2 and 2.3 image versions to `13.0.1`. This makes CUDA 13 the default for newer Dataproc images, likely to improve compatibility with the latest GPU hardware.
3. **Refined NCCL Build Flags (`NVCC_GENCODE`):**
* The logic that sets `NVCC_GENCODE` in the `install_nvidia_nccl` function now branches on the CUDA version.
* Volta architectures (`sm_70`, `sm_72`) are now only included for CUDA versions less than 13.0.
* Blackwell architecture (`sm_110`) is now included for CUDA versions greater than or equal to 13.0.
* The commented-out section for `sm_101` remains, suggesting it's not yet fully supported or tested.
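
The version gating described above could be sketched roughly as follows. This is not the script's actual code: the function name `nccl_gencode_for_cuda` is hypothetical, only the architectures named in this description are listed, and the comparison uses the CUDA major version alone, which is enough for a 13.0 threshold.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print NVCC_GENCODE flags for a given CUDA version.
# Only architectures named in the description appear; the real list is longer.
nccl_gencode_for_cuda() {
  local cuda_version="$1"
  local cuda_major="${cuda_version%%.*}"   # "13.0.1" -> "13"
  local -a flags=()

  if (( cuda_major < 13 )); then
    # Volta targets are only built for CUDA releases below 13.0.
    flags+=("-gencode=arch=compute_70,code=sm_70")
    flags+=("-gencode=arch=compute_72,code=sm_72")
  else
    # sm_110 is only added for CUDA 13.0 and later.
    flags+=("-gencode=arch=compute_110,code=sm_110")
  fi

  # Join the flags with single spaces.
  echo "${flags[*]}"
}
```

A caller could then pass the result to the NCCL build, for example `make NVCC_GENCODE="$(nccl_gencode_for_cuda 13.0.1)"`.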
4. **Script Robustness:**
* In `install_nvidia_userspace_runfile`, the variables `local_tarball` and `gcs_tarball` are now explicitly initialized to empty strings, preventing potential unbound variable errors.
* The `make clean` command within the `install_nvidia_nccl` function is now non-fatal. If it fails, a warning is printed, but the script continues execution. This prevents build failures due to a missing `doc` directory, which doesn't affect the creation of the necessary Debian/RPM packages.
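
Both robustness patterns can be shown in a minimal sketch. The helper names `run_nonfatal` and `describe_tarballs` are hypothetical and the bodies are illustrative, not the script's real logic. The point is that, under `set -u`, a `local` declared without a value is still unset, so initializing it to an empty string keeps later expansions safe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, print a warning if it fails, never fail.
run_nonfatal() {
  if ! "$@"; then
    echo "WARNING: '$*' failed; continuing" >&2
  fi
  return 0
}

# Hypothetical function: explicit empty-string initialization of locals.
describe_tarballs() {
  # Without the ="" assignments, expanding these under `set -u` would abort
  # whenever the branch below did not assign them.
  local local_tarball=""
  local gcs_tarball=""

  if [[ -n "${1:-}" ]]; then
    local_tarball="$1"
  fi

  echo "local='${local_tarball}' gcs='${gcs_tarball}'"
}
```

With this shape, the cleanup step would be called as `run_nonfatal make clean`, so a failing clean target produces a warning instead of aborting the install.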