Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions docs/source/adv_install.rst
Original file line number Diff line number Diff line change
Expand Up @@ -342,6 +342,60 @@ In the case that Cytnx is installed locally from binary build, not from anaconda
CYTNX_ROOT is the path where Cytnx is installed from binary build.


Build troubleshooting
*************************************

CUDA device link fails with ``elfLink linker library load error``
-------------------------------------------------------------------------------------

**Symptom.** A CUDA-enabled build (``-DUSE_CUDA=ON``) configures successfully,
but the CUDA *device link* step aborts with::

nvlink fatal : elfLink linker library load error

On non-Apple builds Cytnx turns on interprocedural optimization
(``CMAKE_INTERPROCEDURAL_OPTIMIZATION``) which, together with CUDA separable
compilation, enables CUDA *device* link-time optimization (``nvcc -dlto``). The
device link step then asks ``nvlink`` to load the NVVM library, and the error
above means it could not.

**Cause.** This is *not* caused by the empty ``libpthread.a`` / ``librt.a`` /
``libdl.a`` stub archives that glibc 2.34+ ships -- those are tolerated by
``nvlink``. It is a layout problem specific to the Debian/Ubuntu
``nvidia-cuda-toolkit`` apt package. That package installs ``libnvvm.so`` into
the multiarch directory ``/usr/lib/x86_64-linux-gnu/`` but does not place it
under the toolkit's ``lib64`` directory, which is where ``nvcc`` tells
``nvlink`` to look. ``nvcc`` passes ``-nvvmpath=/usr/lib/nvidia-cuda-toolkit``,
so ``nvlink`` tries to open ``/usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so``
and finds nothing. Regular (non-LTO) device linking does not load NVVM, which is
why the failure appears only once device LTO is enabled.

**Fix (recommended): use a complete CUDA toolkit.** Install CUDA from conda or
NVIDIA's official installer instead of the distribution's
``nvidia-cuda-toolkit`` package, and make sure its ``nvcc`` is first on
``PATH``:

.. code-block:: shell

$conda install -c nvidia cuda
Comment thread
IvanaGyro marked this conversation as resolved.

A toolkit laid out this way keeps ``libnvvm.so`` under ``nvvm/lib64`` where
``nvlink`` expects it, so device LTO works with no further action. This is also
the layout the CUDA build presets assume.

**Workaround: keep the apt package and add the missing path.** If you must build
against the distribution package, create the directory ``nvlink`` searches and
symlink the packaged ``libnvvm.so`` into it:

.. code-block:: shell

$sudo mkdir -p /usr/lib/nvidia-cuda-toolkit/lib64
$sudo ln -s /usr/lib/x86_64-linux-gnu/libnvvm.so /usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Link to the versioned libnvvm soname

On stock Debian/Ubuntu apt installs, libnvvm4 provides /usr/lib/x86_64-linux-gnu/libnvvm.so.4 and .4.0.0, not an unversioned /usr/lib/x86_64-linux-gnu/libnvvm.so (Ubuntu jammy, Debian sid). In that environment this ln -s command creates a dangling /usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so, so the documented workaround still leaves nvlink unable to load NVVM. Please point the symlink at the actual versioned file found by find or make the command use that discovered path.

Useful? React with 👍 / 👎.


Then re-run the build. On non-x86_64 hosts the multiarch directory differs;
locate the real library first with ``find /usr -name 'libnvvm.so*'``.


Check Cytnx version
*************************************
The current version of the library can be printed by:
Expand Down
Loading