I am encountering issues running nvkind on Fedora 42 with the latest Docker and nvidia-smi 580.95.05.
These are the latest drivers and nvidia-container-toolkit provided by RPM Fusion.
❯ nvidia-ctk --version
NVIDIA Container Toolkit CLI version 1.17.4
❯ uname -a
Linux hostname 6.16.12-200.fc42.x86_64
Following the README, the first problem I hit is the instruction to run sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled. This produces an /etc/docker/daemon.json that sets the default runtime to nvidia. Running a containerized nvidia-smi with either the --gpus or the --device flag then fails because nvidia-container-runtime is missing. It is not provided by the latest nvidia-container-toolkit package, presumably because this runtime implementation has been retired in favor of CDI.
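To make the first failure easier to reproduce, here is a minimal shell sketch that reads the default runtime from daemon.json and checks whether a matching binary is on PATH. The function names are hypothetical helpers, not part of nvkind or the NVIDIA Container Toolkit. The sketch assumes the "default-runtime" key and its value sit on a single line and that the nvidia runtime maps to a binary called nvidia-container-runtime.

```shell
#!/bin/sh
# Hypothetical helpers, not part of nvkind or the NVIDIA Container Toolkit.

# Print the value of "default-runtime" from a Docker daemon.json, if present.
# Uses sed rather than jq to avoid extra dependencies; assumes the key and
# its value are on the same line.
default_runtime_from() {
    sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# Succeed only if the named binary can be found on PATH.
runtime_available() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether the configured default runtime has a binary behind it.
# Assumption: the "nvidia" runtime is backed by nvidia-container-runtime.
check_default_runtime() {
    config="${1:-/etc/docker/daemon.json}"
    runtime="$(default_runtime_from "$config")"
    if [ -z "$runtime" ]; then
        echo "no default-runtime set in $config"
    elif [ "$runtime" = "nvidia" ] && ! runtime_available nvidia-container-runtime; then
        echo "default-runtime is nvidia, but nvidia-container-runtime is not on PATH"
    else
        echo "default-runtime is $runtime"
    fi
}
```

Calling check_default_runtime with no arguments inspects /etc/docker/daemon.json; passing a path checks a different file.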
If I revert the changes to daemon.json and use CDI (--device) with the default runtime, a containerized nvidia-smi runs and I can move on to creating an nvkind cluster. That is where I run into a second error:
> nvkind cluster create
[...]
time="2025-10-24T13:35:24Z" level=info msg="Using config version 3"
time="2025-10-24T13:35:24Z" level=info msg="Using CRI runtime plugin name \"io.containerd.cri.v1.runtime\""
time="2025-10-24T13:35:24Z" level=info msg="Wrote updated config to /etc/containerd/config.d/99-nvidia.toml"
time="2025-10-24T13:35:24Z" level=info msg="It is recommended that containerd daemon be restarted."
umount: /proc/driver/nvidia: not mounted
F1024 09:35:24.918249 3194125 main.go:45] Error: patching /proc/driver/nvidia on node 'nvkind-xjqfc-worker': running script on nvkind-xjqfc-worker: executing command: exit status 1
Inside the worker, /proc/driver/nvidia is populated, but nothing is mounted directly at that path. Having a mount there seems to be an implementation detail specific to nvidia-container-runtime.
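The "populated but not a mount point" observation can be checked with a small shell sketch along these lines. It is a diagnostic under assumptions, not nvkind's actual patching script. The helper names are hypothetical, and the guarded umount only illustrates how a missing mount could be tolerated rather than treated as fatal.

```shell
#!/bin/sh
# Hypothetical helpers, not nvkind's actual script.

# Succeed if the path ($1) appears as a mount point in a mountinfo file ($2,
# defaulting to /proc/self/mountinfo). Field 5 of each line is the mount point.
is_mount_point() {
    awk -v target="$1" '$5 == target { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mountinfo}"
}

# Unmount only when something is mounted there, so a missing mount is
# reported instead of causing a non-zero exit.
umount_if_mounted() {
    if is_mount_point "$1"; then
        umount -R "$1"
    else
        echo "$1 is not a mount point, skipping umount"
    fi
}
```

Run inside the worker node, is_mount_point /proc/driver/nvidia would be expected to fail in the CDI setup described here, which matches the "umount: /proc/driver/nvidia: not mounted" message in the log.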
Am I overlooking an option that makes nvkind work with CDI rather than nvidia-container-runtime? Or am I expected to get an older NCT (NVIDIA Container Toolkit) version from another source? It seems to me that nvidia-container-runtime is on its way out.
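For reference, a CDI-based smoke test of the kind described above might look like the following sketch. The helper is hypothetical and only prints the command instead of running it. The device name nvidia.com/gpu=all and the ubuntu image are assumptions, not values taken from this report, and the sketch further assumes that a CDI spec has already been generated and that CDI is enabled in Docker.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a docker command that requests
# GPUs through CDI with the default runtime instead of the nvidia runtime.
cdi_smoke_test_cmd() {
    device="${1:-nvidia.com/gpu=all}"   # assumed CDI device name
    image="${2:-ubuntu}"                # assumed image; any small image should do
    printf 'docker run --rm --device %s %s nvidia-smi\n' "$device" "$image"
}
```

Printing the command first makes it easy to compare against what nvkind does when it starts worker nodes.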