Daemon freezes after some time #53

@lprobsth

On my system nvx works as expected after booting:

  • nvidia dGPU is disabled
  • nvx on works: turns dGPU on
  • nvx off works: turns dGPU off
  • nvx start works: launches program, turns on dGPU, and then turns off dGPU when program finishes

But after some time the daemon becomes unresponsive:

  File "/usr/bin/nvx", line 299, in <module>
    sock.recv(1024).decode("utf-8")
    ~~~~~~~~~^^^^^^
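The traceback is truncated, but it shows the client blocked in `sock.recv`. As a minimal sketch (not nvx's actual code), the client could bound that wait with a socket timeout so a frozen daemon does not also hang the CLI. The helper name `recv_reply` and the 5 second default are hypothetical:

```python
import socket
from typing import Optional


def recv_reply(sock: socket.socket, timeout_s: float = 5.0) -> Optional[str]:
    """Return the daemon's reply, or None if nothing arrives within timeout_s.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(1024).decode("utf-8")
    except socket.timeout:
        # The daemon did not answer in time; let the caller report it.
        return None
```

With this, `nvx` could print an error such as "daemon not responding" instead of blocking until interrupted.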

I looked at the source code, and the log suggests the problem occurs in the "remove PCI device" step.

This is what I can see after the daemon freezes:

  • the PCI device of the Nvidia dGPU is still available
  • the "remove" interface of the Nvidia dGPU is missing
  • the PCI bridge is still available
  • writing to "[bridge path]/power/control" (e.g. "auto") freezes the terminal
  • the "nvidia_drm", "nvidia_modeset", and "nvidia" kernel modules are still loaded
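The checks in this list can be collected with a small read-only script. This is a sketch under assumptions: the PCI addresses are the ones used in the commands in this report, the function name `gpu_state` is hypothetical, and it deliberately never touches `power/control`, since that is what freezes the terminal:

```python
from pathlib import Path

NVIDIA_MODULES = {"nvidia_drm", "nvidia_modeset", "nvidia"}


def gpu_state(gpu="0000:01:00.0", bridge="0000:00:1c.0",
              sysfs=Path("/sys/bus/pci/devices"),
              proc_modules=Path("/proc/modules")):
    """Report which sysfs entries exist and which NVIDIA modules are loaded.

    Only checks for existence and reads /proc/modules; nothing is written.
    """
    # The first field of every /proc/modules line is the module name.
    loaded = {line.split()[0]
              for line in proc_modules.read_text().splitlines()
              if line.strip()}
    return {
        "gpu_present": (sysfs / gpu).exists(),
        "gpu_remove_present": (sysfs / gpu / "remove").exists(),
        "bridge_present": (sysfs / bridge).exists(),
        "nvidia_modules": sorted(loaded & NVIDIA_MODULES),
    }


if __name__ == "__main__":
    for key, value in gpu_state().items():
        print(f"{key}: {value}")
```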

What I tried:

Resetting the nvidia dGPU via its PCI sysfs interface:

echo 1 | sudo tee /sys/bus/pci/devices/0000:01:00.0/reset

This fails because the reset interface is missing (already reset?).

Powering down the PCI bridge:

echo auto | sudo tee /sys/bus/pci/devices/0000:00:1c.0/power/control

This hangs / freezes the console.

Unloading the kernel modules unfreezes the terminals and allows restarting nvx:

sudo modprobe --remove --remove-holders nvidia_drm
sudo modprobe --remove --remove-holders nvidia_modeset

I think that, for some reason, nvx fails to unload the kernel modules before turning off the dGPU via the PCI calls, and the daemon then freezes.
Restarting nvx makes the daemon freeze again because it does not unload the kernel modules on start.

Questions:

  • Should the daemon unload the modules on start (to avoid the deadlock)?
  • Why could the daemon fail to unload the modules on exit (race condition)?
  • Is this caused by my setup? To me it looks like the modules are simply not unloaded, e.g. on start. nvx also works normally for some time, so I do not think a module is missing from the nvx config.
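To make the first question concrete, here is a rough sketch of what "unload the modules on start" could look like. It is not taken from nvx: `unload_order` and `unload_on_start` are hypothetical names, and the module list is just the three modules I see loaded. The fourth field of each `/proc/modules` line lists the modules that use that module, so those holders have to be removed first:

```python
import subprocess
from pathlib import Path

TARGETS = ("nvidia_drm", "nvidia_modeset", "nvidia")


def unload_order(proc_modules_text, targets=TARGETS):
    """Return the loaded target modules, holders before the modules they use."""
    holders = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in targets:
            continue
        # Field 4 is "-" or a comma-terminated list such as "nvidia_drm,".
        used_by = [m for m in fields[3].split(",") if m and m != "-"]
        # Holders outside `targets` are ignored here (an assumption); if one
        # exists, unloading will fail and should be reported to the user.
        holders[fields[0]] = [m for m in used_by if m in targets]

    order, seen = [], set()

    def visit(name):
        if name in seen or name not in holders:
            return
        seen.add(name)
        for holder in holders[name]:
            visit(holder)
        order.append(name)

    for name in targets:
        visit(name)
    return order


def unload_on_start():
    """Unload any leftover NVIDIA modules before touching the PCI device."""
    text = Path("/proc/modules").read_text()
    for name in unload_order(text):
        # Needs root; check=True raises if the module is still in use.
        subprocess.run(["modprobe", "--remove", name], check=True)
```

If unloading fails, the daemon could refuse to continue with the PCI calls instead of running into the freeze.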
