docs: troubleshoot CUDA device-LTO elfLink failure on apt nvidia-cuda-toolkit#876

Open
IvanaGyro wants to merge 1 commit into master from claude/docs-cuda-lto-troubleshooting
IvanaGyro (Collaborator) commented Jun 3, 2026

Summary

Adds a Build troubleshooting section to docs/source/adv_install.rst documenting the CUDA device-LTO failure:

nvlink fatal : elfLink linker library load error

This surfaced while preparing a CUDA build on Ubuntu and was initially misattributed to glibc 2.34+ empty stub archives. The doc records the real cause and the fix so others don't chase the same dead end.

What the section explains

  • Symptom — a -DUSE_CUDA=ON build configures fine but the device-link step aborts with the elfLink error. It only appears because Cytnx enables CMAKE_INTERPROCEDURAL_OPTIMIZATION on non-Apple builds, which (with CUDA separable compilation) turns on CUDA device LTO (nvcc -dlto).
  • Cause — the Debian/Ubuntu nvidia-cuda-toolkit apt package ships libnvvm.so in /usr/lib/x86_64-linux-gnu/ but not under the toolkit's lib64/, where nvcc points nvlink via -nvvmpath=/usr/lib/nvidia-cuda-toolkit. nvlink then can't open /usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so. The empty glibc stub archives are explicitly called out as not the cause (nvlink tolerates them).
  • Fix (recommended) — install a complete CUDA toolkit via conda (conda install -c nvidia cuda) or NVIDIA's installer, which keep libnvvm.so under nvvm/lib64.
  • Workaround — symlink the packaged libnvvm.so into the path nvlink searches.
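
A quick way to check a machine for the layout described in the Cause bullet is to test whether the file nvlink tries to open actually exists. The following is a minimal sketch, not part of the PR: the function name `check_nvvm` is hypothetical, and the default root is taken from the `-nvvmpath` value quoted above.

```shell
#!/bin/sh
# check_nvvm ROOT: report whether ROOT/lib64/libnvvm.so can be opened.
# ROOT defaults to the -nvvmpath value quoted in the PR description.
# (check_nvvm is an illustrative helper name, not an existing tool.)
check_nvvm() {
    root="${1:-/usr/lib/nvidia-cuda-toolkit}"
    target="$root/lib64/libnvvm.so"
    # -e follows symlinks, so a dangling symlink counts as missing.
    if [ -e "$target" ]; then
        echo "found: $target"
        return 0
    fi
    echo "missing: $target"
    return 1
}
```

Calling `check_nvvm` with no argument inspects the apt layout; passing another toolkit root (for example a conda prefix) checks that location instead.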

Evidence

All checks ran on glibc 2.39. Device LTO links cleanly on conda CUDA 12.0–12.9 (even with the empty stub archives on the link line) and on a standard install; it fails only on the apt-packaged 12.0, with or without the stubs. strace confirmed that the single failing openat is /usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so, and adding that one symlink makes the link succeed.
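
The strace check can be repeated roughly as follows. This is a sketch under assumptions: it expects the failing link command to have been traced into a log file first (the file name `nvlink.trace` and the helper name `failed_nvvm_opens` are hypothetical), and it only filters that log.

```shell
#!/bin/sh
# failed_nvvm_opens LOG: print openat calls on libnvvm that failed with ENOENT.
# LOG is an strace output file, e.g. produced with
#   strace -f -e trace=openat -o nvlink.trace <failing link command>
# (the log name and this helper are illustrative, not from the PR).
failed_nvvm_opens() {
    log="$1"
    # Keep openat lines that mention libnvvm and ended in ENOENT.
    grep 'openat(' "$log" | grep 'libnvvm' | grep 'ENOENT'
}
```

The function exits non-zero when no such line is present, so it can also serve as a yes/no check in a script.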

Notes

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds a "Build troubleshooting" section to the advanced installation documentation, explaining how to resolve a CUDA device link error (elfLink linker library load error) caused by a layout issue in the Debian/Ubuntu nvidia-cuda-toolkit package. The feedback suggests formatting the shell code blocks with a space after the $ prompt for consistency, and using the unversioned libnvvm.so as the symlink source to make the workaround more robust and future-proof.

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 29.49%. Comparing base (ee9d856) to head (f0dc9c1).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #876   +/-   ##
=======================================
  Coverage   29.49%   29.49%           
=======================================
  Files         241      241           
  Lines       35512    35512           
  Branches    14777    14777           
=======================================
  Hits        10475    10475           
  Misses      17784    17784           
  Partials     7253     7253           
Flag     Coverage Δ
cpp      29.09% <ø> (ø)
python   52.71% <ø> (ø)

Flags with carried forward coverage won't be shown.

Components        Coverage Δ
C++ backend       30.74% <ø> (ø)
Python bindings   17.09% <ø> (ø)
Python package    52.71% <ø> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

…lkit

Add a "Build troubleshooting" section to the advanced install guide covering
"nvlink fatal : elfLink linker library load error" during a CUDA-enabled build.

On non-Apple builds Cytnx enables CMAKE_INTERPROCEDURAL_OPTIMIZATION, which
together with CUDA separable compilation turns on CUDA device LTO (nvcc -dlto).
The device link step loads NVVM. The Debian/Ubuntu nvidia-cuda-toolkit apt
package installs libnvvm.so under /usr/lib/x86_64-linux-gnu/ but not under the
toolkit's lib64 directory, where nvcc points nvlink via
-nvvmpath=/usr/lib/nvidia-cuda-toolkit; nvlink then fails to open
/usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so. The empty glibc 2.34+ stub
archives (libpthread.a / librt.a / libdl.a) are unrelated and are tolerated by
nvlink, so they are explicitly called out as not being the cause.

Document the recommended fix (install a complete CUDA toolkit from conda or
NVIDIA so libnvvm.so sits in nvvm/lib64) and a symlink workaround for the
distribution package.

Co-authored-by: Claude <noreply@anthropic.com>
@IvanaGyro IvanaGyro force-pushed the claude/docs-cuda-lto-troubleshooting branch from 58c6c3b to f0dc9c1 Compare June 3, 2026 11:52
@IvanaGyro IvanaGyro marked this pull request as ready for review June 3, 2026 16:16
@IvanaGyro IvanaGyro requested review from manuschneider and pcchen June 3, 2026 16:17
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f0dc9c1986

.. code-block:: shell

$sudo mkdir -p /usr/lib/nvidia-cuda-toolkit/lib64
$sudo ln -s /usr/lib/x86_64-linux-gnu/libnvvm.so /usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so

P2: Link to the versioned libnvvm soname

On stock Debian/Ubuntu apt installs, libnvvm4 provides /usr/lib/x86_64-linux-gnu/libnvvm.so.4 and .4.0.0, not an unversioned /usr/lib/x86_64-linux-gnu/libnvvm.so (Ubuntu jammy, Debian sid). In that environment this ln -s command creates a dangling /usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so, so the documented workaround still leaves nvlink unable to load NVVM. Please point the symlink at the actual versioned file found by find or make the command use that discovered path.
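
One way to follow this suggestion is to locate the real library file first and link to whatever is found, rather than hard-coding a name. The following is a minimal sketch, not the PR's wording: `link_nvvm` and its two arguments are hypothetical, absolute paths are assumed, and on the apt layout the commands would need root privileges.

```shell
#!/bin/sh
# link_nvvm SEARCH_ROOT TOOLKIT_ROOT
# Find a real libnvvm.so* file under SEARCH_ROOT and symlink it to
# TOOLKIT_ROOT/lib64/libnvvm.so. Both arguments are illustrative and should be
# absolute paths; for the apt layout discussed here they would be /usr/lib and
# /usr/lib/nvidia-cuda-toolkit.
link_nvvm() {
    search_root="$1"
    toolkit_root="$2"
    # -type f skips symlinks, so the match is an actual library file.
    src=$(find "$search_root" -type f -name 'libnvvm.so*' | sort | head -n 1)
    if [ -z "$src" ]; then
        echo "no libnvvm.so* found under $search_root" >&2
        return 1
    fi
    mkdir -p "$toolkit_root/lib64"
    # -f replaces a stale or dangling link left by an earlier attempt.
    ln -sf "$src" "$toolkit_root/lib64/libnvvm.so"
    echo "$toolkit_root/lib64/libnvvm.so -> $src"
}
```

Because the source path is discovered rather than assumed, the same helper covers both an unversioned `libnvvm.so` and a versioned file such as the `.4.0.0` one mentioned above.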

pcchen (Collaborator) commented Jun 7, 2026

Review: PR #876 — "docs: troubleshoot CUDA device-LTO elfLink failure on apt nvidia-cuda-toolkit"

Overview

Docs-only addition of a "Build troubleshooting" section covering the "nvlink fatal : elfLink linker library load error" failure that hits users building with the Debian/Ubuntu nvidia-cuda-toolkit apt package and CUDA device LTO enabled.


Accuracy

The technical content is correct and well-researched:

  • Root cause (apt package puts libnvvm.so in /usr/lib/x86_64-linux-gnu/ instead of lib64/) is accurate and precisely described.
  • "Not the empty glibc stub archives" callout is valuable — this is a common red herring and explicitly ruling it out saves debugging time.
  • Conda fix and symlink workaround are both sound. The find /usr -name 'libnvvm.so*' tip for non-x86_64 hosts is a good touch.
  • The note that regular (non-LTO) device linking never loads NVVM is accurate, and it explains why the error only appears once CMAKE_INTERPROCEDURAL_OPTIMIZATION is on.

Two minor issues

1. Missing space after $ prompt in code blocks

Both shell examples are formatted as $conda ... and $sudo ... (no space), which looks like a variable expansion rather than a shell prompt. The rest of adv_install.rst uses $ command (with a space). Should be:

    $ conda install -c nvidia cuda
    $ sudo mkdir -p /usr/lib/nvidia-cuda-toolkit/lib64
    $ sudo ln -s /usr/lib/x86_64-linux-gnu/libnvvm.so /usr/lib/nvidia-cuda-toolkit/lib64/libnvvm.so

2. Missing third option: disable device LTO

Users who need a quick escape hatch without changing their toolkit installation could simply disable LTO at configure time. Worth adding a brief note after the workaround:

**Alternative: disable device LTO.** If neither option above is practical,
configure with ``-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=OFF`` to skip device LTO
entirely. The build will complete but without link-time optimizations.
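
To confirm that the flag reached the build tree, the CMake cache can be inspected after configuring. This is a sketch under assumptions: the helper name `ipo_cache_value` and the build directory name are hypothetical, and a cache entry only shows what was passed at configure time. If the project sets the variable as a normal (non-cache) variable in its CMakeLists.txt, that setting could still take precedence, which would need to be checked separately.

```shell
#!/bin/sh
# ipo_cache_value BUILD_DIR: print the cached CMAKE_INTERPROCEDURAL_OPTIMIZATION
# value from BUILD_DIR/CMakeCache.txt, or "unset" if there is no cache entry.
# Assumes the tree was configured with something like
#   cmake -S . -B build -DUSE_CUDA=ON -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=OFF
# (ipo_cache_value and the "build" directory name are illustrative).
ipo_cache_value() {
    cache="$1/CMakeCache.txt"
    # Cache lines have the form NAME:TYPE=VALUE; print only VALUE.
    value=$(sed -n 's/^CMAKE_INTERPROCEDURAL_OPTIMIZATION:[A-Z]*=//p' "$cache")
    echo "${value:-unset}"
}
```

If the value printed is not `OFF` after configuring with the flag, the configure command line is worth rechecking before rebuilding.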

RST structure

Section hierarchy (* for level 1, - for subsection) is consistent with the rest of the file. Underline lengths are correct. Blank lines after titles are present. No structural issues.


Summary

| Item | Status |
| --- | --- |
| Technical accuracy | ✅ Correct and well-verified |
| glibc stub red-herring callout | ✅ Valuable addition |
| RST structure | ✅ Consistent |
| $conda / $sudo prompt spacing | ⚠️ Missing space; should be $ conda, $ sudo |
| No mention of -DCMAKE_INTERPROCEDURAL_OPTIMIZATION=OFF escape hatch | ⚠️ Worth adding |

Posted by Claude Code on behalf of @pcchen
