Remove sh -c wrappers from operand-asset DaemonSets (Part 2) #2437

@rajathagasthya

Description

Part of NVIDIA/cloud-native-team#299.

Once the operand images (mig-parted, nvidia-container-toolkit,
k8s-driver-manager, k8s-device-plugin) drop their /bin/sh busybox
symlink, the remaining sh -c wrappers in gpu-operator's operand
asset DaemonSets will break. These need to be converted to direct
binary invocations or to rmglob-style static helpers (modeled on
the rmglob introduced in PR #2434).
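
As a rough illustration of what an rmglob-style static helper could look like, the sketch below expands glob patterns in-process and removes the matches, so no shell is needed for wildcard expansion. It is not the rmglob from PR #2434; the function name `removeGlobs` and the `rm -f`-like error handling are assumptions made for this example.

```go
// Hypothetical rmglob-style helper: remove every path matching the glob
// patterns passed as arguments, without relying on a shell to expand them.
// Illustrative only; this is not the implementation from PR #2434.
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeGlobs expands each pattern with filepath.Glob and removes the matches.
// Like `rm -f`, a pattern with no matches or a file that has already vanished
// is not treated as an error. It returns the number of paths removed.
func removeGlobs(patterns []string) (int, error) {
	removed := 0
	for _, pattern := range patterns {
		matches, err := filepath.Glob(pattern)
		if err != nil {
			return removed, fmt.Errorf("bad pattern %q: %w", pattern, err)
		}
		for _, path := range matches {
			err := os.Remove(path)
			if err != nil && !errors.Is(err, fs.ErrNotExist) {
				return removed, fmt.Errorf("remove %q: %w", path, err)
			}
			if err == nil {
				removed++
			}
		}
	}
	return removed, nil
}

func main() {
	if _, err := removeGlobs(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Built with CGO_ENABLED=0, a helper like this has no dependency on /bin/sh or libc in the operand image, which is the property the conversion is after.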

Scope (assets/state-*/):

  • state-driver/0500_daemonset.yaml: nvidia-driver probe_nvidia_peermem, lsmod | grep nvidia_fs, lsmod | grep gdrdrv, rm -f /run/.../driver-ctr-ready preStop
  • state-vfio-manager/0500_daemonset.yaml: vfio-manage bind --all && while true; do sleep …
  • state-mig-manager/0600_daemonset.yaml
  • state-vgpu-manager/0500_daemonset.yaml
  • state-vgpu-device-manager/0600_daemonset.yaml
  • state-sandbox-device-plugin/0500_daemonset.yaml
  • state-cc-manager/0500_daemonset.yaml
  • state-dcgm/0400_dcgm.yml
  • state-dcgm-exporter/0800_daemonset.yaml
  • state-mps-control-daemon/0400_daemonset.yaml
  • state-container-toolkit/0500_daemonset.yaml
  • state-device-plugin/0500_daemonset.yaml
  • gpu-feature-discovery/0500_daemonset.yaml
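
For the "run once, then keep the container alive" pattern seen in the vfio-manager entry above (`vfio-manage bind --all && while true; do sleep …`), one possible shell-free replacement is a small static wrapper that runs the command and then blocks until the container is signalled. The sketch below is only an assumption about how that could be done; `runThenWait` and `splitCommand` are hypothetical names, and the real fix might instead live inside the operand binary itself.

```go
// Hypothetical wrapper: run a command once, then block until SIGTERM/SIGINT.
// Stands in for `cmd && while true; do sleep ...; done` without a shell.
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// splitCommand separates the program name from its arguments.
func splitCommand(argv []string) (string, []string, error) {
	if len(argv) == 0 {
		return "", nil, errors.New("no command given")
	}
	return argv[0], argv[1:], nil
}

// runThenWait runs the command once and, if it succeeds, blocks until the
// process receives SIGTERM or SIGINT.
func runThenWait(argv []string) error {
	name, args, err := splitCommand(argv)
	if err != nil {
		return err
	}

	// Register for signals before starting the command so that a signal
	// delivered while it is running is buffered rather than lost.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)

	cmd := exec.Command(name, args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("%s: %w", name, err)
	}

	<-stop
	return nil
}

func main() {
	if err := runThenWait(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With a wrapper like this, the container command would become an argv list (wrapper path followed by `vfio-manage`, `bind`, `--all`) rather than a string handed to sh -c.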

Acceptance:

  • None of the listed manifests wrap operand binaries in sh -c
  • lsmod | grep <module> checks replaced by Go-based module
    checks or sentinel-file-based readiness
  • preStop rm -f calls replaced with rmglob or equivalent
    static binary
  • e2e against a real GPU node passes
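
A Go-based module check of the kind mentioned in the acceptance criteria could read /proc/modules directly, which is the same data lsmod formats. The sketch below is a minimal version under that assumption; the binary name `modcheck` and the function `moduleListed` are hypothetical. Note that it matches the module name exactly, whereas `lsmod | grep nvidia_fs` matches any line containing that substring.

```go
// Hypothetical module check: exit 0 if the named kernel module is listed in
// /proc/modules, non-zero otherwise. Intended as a probe command that does
// not need a shell, lsmod, or grep.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// moduleListed reports whether name appears as the first field of any line in
// procModules, which is expected to hold the contents of /proc/modules.
func moduleListed(procModules, name string) bool {
	scanner := bufio.NewScanner(strings.NewReader(procModules))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) > 0 && fields[0] == name {
			return true
		}
	}
	return false
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: modcheck <module>")
		os.Exit(2)
	}
	data, err := os.ReadFile("/proc/modules")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !moduleListed(string(data), os.Args[1]) {
		os.Exit(1)
	}
}
```

Keeping the parsing in a function that takes the file contents as a string makes it testable without a GPU node; the e2e run would still be needed to confirm probe behaviour on real hardware.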

Metadata

Labels

enhancement: Improvements to existing features, performance, or usability (not bug fixes or new features).
