docs: troubleshoot CUDA device-LTO elfLink failure on apt nvidia-cuda-toolkit #876
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -342,6 +342,60 @@ In the case that Cytnx is installed locally from binary build, not from anaconda | |
| CYTNX_ROOT is the path where Cytnx is installed from binary build. | ||
|
|
||
|
|
||
| Build troubleshooting | ||
| ************************************* | ||
|
|
||
| CUDA device link fails with ``elfLink linker library load error`` | ||
| ------------------------------------------------------------------------------------- | ||
|
|
||
| **Symptom.** A CUDA-enabled build (``-DUSE_CUDA=ON``) configures successfully, | ||
| but the CUDA *device link* step aborts with:: | ||
|
|
||
| nvlink fatal : elfLink linker library load error | ||
|
|
||
| On non-Apple builds, Cytnx turns on interprocedural optimization | ||
| (``CMAKE_INTERPROCEDURAL_OPTIMIZATION``), which, together with CUDA separable | ||
| compilation, enables CUDA *device* link-time optimization (``nvcc -dlto``). The | ||
| device link step then asks ``nvlink`` to load the NVVM library, and the error | ||
| above means that ``nvlink`` could not load it. | ||
|
|
||
| **Cause.** This is *not* caused by the empty ``libpthread.a`` / ``librt.a`` / | ||
| ``libdl.a`` stub archives that glibc 2.34+ ships -- those are tolerated by | ||
| ``nvlink``. It is a layout problem specific to the Debian/Ubuntu | ||
| ``nvidia-cuda-toolkit`` apt package. That package installs ``libnvvm.so`` into | ||
| the multiarch directory ``/usr/lib/x86_64-linux-gnu/`` but does not place it | ||
| in the ``lib64`` subdirectory of the NVVM path that ``nvcc`` hands to | ||
| ``nvlink``. ``nvcc`` passes ``-nvvmpath=/usr/lib/nvidia-cuda-toolkit``, | ||
| so ``nvlink`` tries to open ``/usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so`` | ||
| and finds nothing. Regular (non-LTO) device linking does not load NVVM, which is | ||
| why the failure appears only once device LTO is enabled. | ||
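To confirm this cause before changing anything, it can help to test for the file at the location described above. The following is a minimal sketch, not part of the Cytnx build: ``check_nvvm`` is a hypothetical helper name, and the default path is simply the ``-nvvmpath`` value quoted in the text, which may differ on other systems.

```shell
#!/bin/sh
# Hypothetical helper: report whether libnvvm.so exists under <nvvmpath>/lib64.
# The default nvvmpath is the value quoted in the text; pass another as $1.
check_nvvm() {
    nvvm_path="${1:-/usr/lib/nvidia-cuda-toolkit}"
    candidate="$nvvm_path/lib64/libnvvm.so"
    # -e follows symlinks, so a dangling symlink also counts as missing.
    if [ -e "$candidate" ]; then
        echo "found: $candidate"
        return 0
    fi
    echo "missing: $candidate"
    return 1
}

# Run the check once; "|| true" keeps the script going when the file is absent.
check_nvvm "$@" || true
```

A ``missing:`` line for the default path is consistent with the layout problem described here; a ``found:`` line suggests the failure has a different cause.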
|
|
||
| **Fix (recommended): use a complete CUDA toolkit.** Install CUDA from conda or | ||
| NVIDIA's official installer instead of the distribution's | ||
| ``nvidia-cuda-toolkit`` package, and make sure its ``nvcc`` is first on | ||
| ``PATH``: | ||
|
|
||
| .. code-block:: shell | ||
|
|
||
| $conda install -c nvidia cuda | ||
|
|
||
| A complete toolkit keeps ``libnvvm.so`` under ``nvvm/lib64``, where | ||
| ``nvlink`` expects it, so device LTO works with no further action. This is also | ||
| the layout the CUDA build presets assume. | ||
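One way to check which ``nvcc`` is first on ``PATH`` and whether its toolkit has the expected layout is sketched below. It assumes the toolkit root is the parent of the ``bin`` directory that contains ``nvcc`` and does not resolve symlinks; ``toolkit_root`` and ``has_nvvm`` are hypothetical helper names, not Cytnx or CUDA commands.

```shell
#!/bin/sh
# Assumed layout: <root>/bin/nvcc, so the root is two directory levels up.
toolkit_root() {
    dirname "$(dirname "$1")"
}

# Succeed when <root>/nvvm/lib64 contains at least one libnvvm.so* file.
has_nvvm() {
    ls "$1"/nvvm/lib64/libnvvm.so* >/dev/null 2>&1
}

# Report the first nvcc on PATH and whether its toolkit ships libnvvm.so.
nvcc_path="$(command -v nvcc || true)"
if [ -z "$nvcc_path" ]; then
    echo "nvcc is not on PATH"
elif has_nvvm "$(toolkit_root "$nvcc_path")"; then
    echo "nvcc: $nvcc_path (libnvvm.so found under nvvm/lib64)"
else
    echo "nvcc: $nvcc_path (no libnvvm.so under nvvm/lib64)"
fi
```

If the reported ``nvcc`` is still the distribution one, put the new toolkit's ``bin`` directory earlier in ``PATH`` and re-run the CMake configure step from a clean build directory.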
|
|
||
| **Workaround: keep the apt package and add the missing path.** If you must build | ||
| against the distribution package, create the directory ``nvlink`` searches and | ||
| symlink the packaged ``libnvvm.so`` into it: | ||
|
|
||
| .. code-block:: shell | ||
|
|
||
| $sudo mkdir -p /usr/lib/nvidia-cuda-toolkit/lib64 | ||
| $sudo ln -s /usr/lib/x86_64-linux-gnu/libnvvm.so /usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so | ||
|
On stock Debian/Ubuntu apt installs, |
||
|
|
||
| Then re-run the build. On non-x86_64 hosts the multiarch directory differs; | ||
| locate the real library first with ``find /usr -name 'libnvvm.so*'``. | ||
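For hosts where the multiarch directory is not known in advance, the locate-then-link steps can be combined. This is a sketch under stated assumptions: ``find_nvvm`` and ``link_nvvm`` are hypothetical helpers, the first match returned by ``find`` is taken to be the right library, and the target directory is the one given in the workaround above.

```shell
#!/bin/sh
# Print one libnvvm.so* path under root ($1), skipping anything already
# inside the destination directory ($2) so a re-run does not link to itself.
find_nvvm() {
    find "$1" -name 'libnvvm.so*' ! -path "$2/*" 2>/dev/null | head -n 1
}

# Create dest ($2) and symlink the located library there as libnvvm.so.
link_nvvm() {
    root="$1"
    dest="$2"
    src="$(find_nvvm "$root" "$dest")"
    if [ -z "$src" ]; then
        echo "no libnvvm.so found under $root" >&2
        return 1
    fi
    mkdir -p "$dest" && ln -sf "$src" "$dest/libnvvm.so"
}

# Example for the apt layout described above (run as root):
#   link_nvvm /usr /usr/lib/nvidia-cuda-toolkit/lib64
```

If ``find`` reports more than one copy, pass a narrower search root so that the library linked is the one installed by the distribution package.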
|
|
||
|
|
||
| Check Cytnx version | ||
| ************************************* | ||
| The current version of the library can be printed by: | ||
|
|
||