feat: Support K8s DRA Resources V1 APIs #654

Open
adityasingh0510 wants to merge 2 commits into NVIDIA:main from adityasingh0510:feature/k8s-v1-resource-apis-support

Conversation

@adityasingh0510 commented Apr 29, 2026

This PR updates dcgm-exporter to support both the stable resource.k8s.io/v1 API and the v1beta1 API for Dynamic Resource Allocation (DRA). This ensures compatibility with both Kubernetes 1.34+ clusters (using v1) and older clusters (using v1beta1), with automatic detection and graceful fallback.

Problem

When enabling DRA labels in dcgm-exporter on Kubernetes 1.34+ clusters, the following error occurs:

failed to list v1beta1.ResourceSlice as we have v1.ResourceSlice

This happens because:

  • Kubernetes 1.34+ promotes the ResourceSlice API from v1beta1 to stable v1
  • Clusters may only expose the v1 API, breaking code that only uses v1beta1
  • Older clusters (1.27-1.33) still use v1beta1, so we need to support both

Changes

Files Modified

  • internal/pkg/transformation/dra.go:

    • Register both v1 and v1beta1 ResourceSlice informers
    • Implement separate event handlers for each API version:
      • onAddOrUpdateV1() / onAddOrUpdateV1beta1()
      • onDeleteV1() / onDeleteV1beta1()
    • Add cache checking in delete handlers to prevent premature device removal
    • Handle API structure differences:
      • v1beta1: dev.Basic.Attributes
      • v1: dev.Attributes (direct access, no Basic wrapper)
  • internal/pkg/transformation/types.go:

    • Add v1Informer and v1beta1Informer fields to DRAResourceSliceManager struct
  • go.mod / go.sum:

    • Upgrade k8s.io/api: v0.33.3 → v0.34.0 (adds support for resource/v1)
    • Upgrade k8s.io/client-go: v0.33.3 → v0.34.0 (ensures compatibility)
    • Upgrade k8s.io/apimachinery: v0.33.3 → v0.34.0

API Structure Changes

The v1 API has a different structure than v1beta1:

  • v1beta1: dev.Basic.Attributes
  • v1: dev.Attributes (direct)

The implementation handles both structures correctly.
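
To make the structural difference concrete, here is a minimal, self-contained sketch using simplified stand-in types (not the real k8s.io/api structs; names like `attrsV1` are illustrative) that shows how an accessor can normalize the two shapes:

```go
package main

import "fmt"

// basicDevice mirrors the v1beta1 "Basic" wrapper around device attributes.
type basicDevice struct {
	Attributes map[string]string
}

// v1beta1Device wraps its attributes in a Basic struct.
type v1beta1Device struct {
	Name  string
	Basic *basicDevice
}

// v1Device exposes attributes directly, with no Basic wrapper.
type v1Device struct {
	Name       string
	Attributes map[string]string
}

// attrsV1beta1 guards against a nil Basic wrapper before dereferencing.
func attrsV1beta1(d v1beta1Device) map[string]string {
	if d.Basic == nil {
		return nil
	}
	return d.Basic.Attributes
}

// attrsV1 reads the attributes directly.
func attrsV1(d v1Device) map[string]string {
	return d.Attributes
}

func main() {
	beta := v1beta1Device{Name: "gpu-0", Basic: &basicDevice{Attributes: map[string]string{"uuid": "GPU-abc"}}}
	stable := v1Device{Name: "gpu-0", Attributes: map[string]string{"uuid": "GPU-abc"}}
	fmt.Println(attrsV1beta1(beta)["uuid"]) // GPU-abc
	fmt.Println(attrsV1(stable)["uuid"])    // GPU-abc
}
```

The nil check on `Basic` matters in the v1beta1 path: the wrapper is a pointer, so a slice entry without it would otherwise panic on dereference.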

Behavior

Automatic API Detection

The code registers both informers and uses whichever is available:

// Both informers are registered
v1Informer := factory.Resource().V1().ResourceSlices().Informer()
v1beta1Informer := factory.Resource().V1beta1().ResourceSlices().Informer()

// At least one must sync successfully
v1Synced := cache.WaitForCacheSync(ctx.Done(), v1Informer.HasSynced)
v1beta1Synced := cache.WaitForCacheSync(ctx.Done(), v1beta1Informer.HasSynced)

Precedence Logic

When both APIs are available:

  • v1 takes precedence: v1beta1 only adds devices if v1 doesn't already have them
  • Delete protection: Before deleting, handlers check if the device exists in the other API's cache
  • No duplicate entries: Precedence logic ensures each device is only tracked once

Testing

Verification

  • Code compiles successfully with both API versions
  • All tests pass; existing unit tests continue to work
  • No linter errors
  • v1 API support verified with the Kubernetes 1.34+ API structure
  • v1beta1 API support verified with the Kubernetes 1.27-1.33 API structure
  • Dual API handling: both informers work correctly when both are available
  • Precedence logic: v1 correctly takes precedence over v1beta1
  • Delete handling: race conditions prevented with cache checking

Test Scenarios

  • Kubernetes 1.34+ clusters (v1 API only)
  • Kubernetes 1.27-1.33 clusters (v1beta1 API only)
  • Clusters with both APIs available (migration periods)
  • MIG devices work with both API versions

Backward Compatibility

Fully backward compatible:

  • Existing deployments on Kubernetes 1.27-1.33 continue to work unchanged
  • No breaking changes for any supported Kubernetes version
  • No configuration changes required

Forward compatible:

  • Ready for Kubernetes 1.34+ clusters
  • Automatically uses the best available API version

Breaking Changes

None - This is a backward and forward compatibility enhancement. The change:

  • Works on older clusters (1.27-1.33) using v1beta1
  • Works on newer clusters (1.34+) using v1
  • Works during migration periods when both are available
  • Requires no configuration changes

Related Issues

@adityasingh0510 force-pushed the feature/k8s-v1-resource-apis-support branch from c7c2391 to 40b10da on April 29, 2026 05:18
@adityasingh0510
Author

Hi @guptaNswati, sharing full GPU mode test logs from the latest dcgm-exporter changes on Kubernetes v1.34+ (ResourceSlice v1); see below.
We currently don’t have access to MIG-capable hardware in this environment (current GPU: RTX 5090, no MIG support), so I’m unable to provide MIG-mode logs right now.

k logs nvidia-dcgm-exporter-7tfg5
Defaulted container "nvidia-dcgm-exporter" out of: nvidia-dcgm-exporter, toolkit-validation (init)
time=2026-04-29T10:00:22.892Z level=INFO msg="Starting dcgm-exporter" Version=4.5.2-4.8.1
time=2026-04-29T10:00:22.901Z level=INFO msg="Attempting to initialize DCGM."
time=2026-04-29T10:00:22.957Z level=INFO msg="Initialized DCGM Fields module."
time=2026-04-29T10:00:22.957Z level=INFO msg="Attempting to initialize NVML library."
time=2026-04-29T10:00:22.957Z level=INFO msg="NVML provider successfully initialized for Kubernetes MIG support"
time=2026-04-29T10:00:22.957Z level=INFO msg="DCGM successfully initialized!"
time=2026-04-29T10:00:22.989Z level=INFO msg="Successfully queried DCGM profiling metric groups" reload_id=0 count=2 gpu_model="NVIDIA GeForce RTX 5090"
time=2026-04-29T10:00:22.989Z level=INFO msg="Building registry for current GPU topology"
time=2026-04-29T10:00:22.989Z level=INFO msg="Falling back to metric file '/etc/dcgm-exporter/dcp-metrics-included.csv'"
time=2026-04-29T10:00:22.989Z level=INFO msg="Initializing system entities of type 'GPU'"
time=2026-04-29T10:00:23.031Z level=INFO msg="Initializing system entities of type 'NvSwitch'"
time=2026-04-29T10:00:23.031Z level=INFO msg="Not collecting NvSwitch metrics; no switches to monitor"
time=2026-04-29T10:00:23.031Z level=INFO msg="Initializing system entities of type 'NvLink'"
time=2026-04-29T10:00:23.031Z level=WARN msg="Failed to initialize NvSwitch/NvLink info" error="no switches to monitor"
time=2026-04-29T10:00:23.046Z level=INFO msg="Initializing system entities of type 'CPU'"
time=2026-04-29T10:00:23.125Z level=INFO msg="Not collecting CPU metrics; error retrieving DCGM CPU hierarchy: This request is serviced by a module of DCGM that is not currently loaded"
time=2026-04-29T10:00:23.125Z level=INFO msg="Initializing system entities of type 'CPU Core'"
time=2026-04-29T10:00:23.125Z level=INFO msg="Not collecting CPU Core metrics; error retrieving DCGM CPU hierarchy: This request is serviced by a module of DCGM that is not currently loaded"
time=2026-04-29T10:00:23.171Z level=INFO msg="Registry built successfully" collector_count=2
time=2026-04-29T10:00:23.171Z level=INFO msg="Kubernetes metrics collection enabled!"
time=2026-04-29T10:00:23.172Z level=INFO msg="Initializing Pod Informer" nodeName=stg-nc-partner1-wkld1
I0429 10:00:23.186172       1 warnings.go:110] "Warning: resource.k8s.io/v1beta1 ResourceSlice is deprecated in v1.35+, unavailable in v1.38+"
time=2026-04-29T10:00:23.286Z level=INFO msg="ResourceSlice API informer synced successfully" apiVersion=v1
time=2026-04-29T10:00:23.286Z level=INFO msg="Started DRAResourceSliceManager with auto-detected API version"
time=2026-04-29T10:00:23.287Z level=INFO msg="Profiling endpoints enabled at /debug/pprof/"
time=2026-04-29T10:00:23.287Z level=INFO msg="HTTP server started - ready to serve metrics"
time=2026-04-29T10:00:23.287Z level=INFO msg="Watching for changes in file" file=/etc/dcgm-exporter/dcp-metrics-included.csv debounce=200ms
time=2026-04-29T10:00:23.287Z level=INFO msg="Starting webserver"
time=2026-04-29T10:00:23.291Z level=INFO msg="Listening on" address=[::]:9400
time=2026-04-29T10:00:23.291Z level=INFO msg="TLS is disabled." http2=false address=[::]:9400
time=2026-04-29T10:00:23.388Z level=INFO msg="Pod informer cache synced"

@adityasingh0510 changed the title from "feat: Add dual API support for ResourceSlice (v1 and v1beta1)" to "feat: Support K8s DRA Resources V1 APIs" on Apr 29, 2026
Comment thread internal/pkg/transformation/types.go Outdated
deviceToUUID map[string]string // pool/device -> UUID (for full GPUs)
migDevices map[string]*DRAMigDeviceInfo // pool/device -> MIG info (for MIG devices)
factory informers.SharedInformerFactory
v1Informer cache.SharedIndexInformer
Contributor

Don't need both informers here; one can be used:

Suggested change
v1Informer cache.SharedIndexInformer
informer cache.SharedIndexInformer

Comment thread internal/pkg/transformation/types.go Outdated
// - "v1beta1" if v1 does not, but v1beta1 does
preferredAPIVersion string
cancelContext context.CancelFunc
mu sync.RWMutex
Contributor

Same; not needed anymore.

Comment thread internal/pkg/transformation/dra.go Outdated
// For MIG devices: returns (parentUUID, *DRAMigDeviceInfo)
// For full GPUs: returns (deviceUUID, nil)
func (m *DRAResourceSliceManager) GetDeviceInfo(pool, device string) (string, *DRAMigDeviceInfo) {
m.mu.RLock()
Contributor

No need for this lock.

Comment thread internal/pkg/transformation/dra.go Outdated
// Search for the device in the selected slices
for _, item := range items {
var adapter resourceSliceAdapter
switch obj := item.(type) {
Contributor

This still does a per-item type-switch on v1.ResourceSlice / v1beta1.ResourceSlice even though getV1DeviceInfo / getV1beta1DeviceInfo already know which version they're handling. @varunrsekar called this out in #596 (comment) earlier.

getV1DeviceInfo and getV1beta1DeviceInfo can each inline the lookup against their own typed slice.

Then another helper that does the gpu / mig / default switch.
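
One possible shape of that refactor, sketched with simplified stand-in types (the real functions operate on typed `*resourcev1.ResourceSlice` items from the informer's indexer; `v1Slice`, `getV1DeviceInfo`, and the driver-name constant here are illustrative, and the gpu/mig/default switch the reviewer mentions is omitted):

```go
package main

import "fmt"

// v1Slice is a stand-in for a typed v1 ResourceSlice: a driver name plus
// a simplified device-name -> UUID mapping.
type v1Slice struct {
	Driver  string
	Devices map[string]string
}

// draGPUDriverName is an assumed driver-name constant for illustration.
const draGPUDriverName = "gpu.nvidia.com"

// getV1DeviceInfo inlines the lookup against its own typed slices, so no
// per-item type switch is needed; a v1beta1 twin would do the same against
// its typed slice.
func getV1DeviceInfo(slices []v1Slice, device string) (string, bool) {
	for _, s := range slices {
		if s.Driver != draGPUDriverName {
			continue // skip slices from other DRA drivers
		}
		if uuid, ok := s.Devices[device]; ok {
			return uuid, true
		}
	}
	return "", false
}

func main() {
	slices := []v1Slice{{Driver: draGPUDriverName, Devices: map[string]string{"gpu0": "GPU-1"}}}
	uuid, ok := getV1DeviceInfo(slices, "gpu0")
	fmt.Println(uuid, ok) // GPU-1 true
}
```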

Comment thread internal/pkg/transformation/dra_test.go Outdated
uuid, migInfo := m.GetDeviceInfo("gpu-pool", "gpu0")
assert.Empty(t, uuid, "expected no UUID when preferred version is invalid")
assert.Nil(t, migInfo, "expected no MIG info when preferred version is invalid")
} No newline at end of file
Contributor

nit: missing newline at end of file

Comment thread internal/pkg/transformation/dra.go Outdated
return "", nil
}
return mappings[0].MappingKey, mappings[0].Info
} No newline at end of file
Contributor

nit: missing newline at end of file

Comment thread internal/pkg/transformation/dra.go Outdated
for i := range resourceSlicesList.Items {
items = append(items, &resourceSlicesList.Items[i])
}
v1beta1HasNvidiaSlices = countGPUSlices(items) > 0
Contributor
Can this be simplified a bit? These two list+count blocks are almost identical. Have a helper that returns a bool on the first match. Something like:

// hasNvidiaDRASlices reports whether the cluster currently exposes any
// NVIDIA GPU DRA ResourceSlices on the given API version. 
func hasNvidiaDRASlices(ctx context.Context, client kubernetes.Interface, apiVersion string) (bool, error) {
    switch apiVersion {
    case "v1":
        list, err := client.ResourceV1().ResourceSlices().List(ctx, metav1.ListOptions{})
        if err != nil {
            return false, fmt.Errorf("listing v1 ResourceSlices: %w", err)
        }
        for i := range list.Items {
            s := &list.Items[i]
            if s.Spec.Driver == DRAGPUDriverName && len(s.Spec.Devices) > 0 {
                return true, nil
            }
        }
        return false, nil
    case "v1beta1":
        list, err := client.ResourceV1beta1().ResourceSlices().List(ctx, metav1.ListOptions{})
        if err != nil {
            return false, fmt.Errorf("listing v1beta1 ResourceSlices: %w", err)
        }
        for i := range list.Items {
            s := &list.Items[i]
            if s.Spec.Driver == DRAGPUDriverName && len(s.Spec.Devices) > 0 {
                return true, nil
            }
        }
        return false, nil
    default:
        return false, fmt.Errorf("unsupported ResourceSlice API version: %q", apiVersion)
    }
}

Then replace the above blocks with calls to the helper; the countGPUSlices check is no longer needed:

v1HasNvidiaSlices := false
if v1Served {
    has, err := hasNvidiaDRASlices(ctx, client, "v1")
    if err != nil { return nil, err }
    v1HasNvidiaSlices = has
}
// same for v1beta1

Comment thread internal/pkg/transformation/dra_test.go Outdated
)

// testInformer is a simple test implementation of SharedIndexInformer
type testInformer struct {
Contributor

Why is this needed? Can use testInformerForDRA.

Comment thread go.mod
github.com/avast/retry-go/v4 v4.6.0
github.com/bits-and-blooms/bitset v1.22.0
github.com/fsnotify/fsnotify v1.7.0
github.com/containerd/cgroups/v3 v3.1.1
Contributor
are these updates needed for this PR?

Author
fsnotify and cgroups are both still required; they were already on main (watcher → fsnotify, pidmapper → cgroups). This PR doesn’t change those files. The diff is from go mod tidy after the k8s.io/* bump (line order / direct vs. indirect / versions). We need the updated go.mod for a consistent module graph with the Kubernetes upgrade.

Comment thread internal/pkg/transformation/dra.go Outdated
//
// Deprecated behavior: this returns only the first mapping. Prefer
// GetDynamicResourceMappings when a DynamicResource may contain multiple devices.
func (m *DRAResourceSliceManager) GetDynamicResourceInfo(resource *podresourcesapi.DynamicResource) (string, *DynamicResourceInfo) {
Contributor

This should be removed, but double-check whether it's called in the tests.

@guptaNswati
Contributor

@adityasingh0510 thank you for the new PR. I have some nits that need addressing, but most of the comments from the previous PR are addressed and it looks good. I will find a MIG device internally for testing.

@adityasingh0510
Author

Thanks @guptaNswati . I’ve pushed updates for the remaining nits; please let me know if anything else stands out. Appreciate you testing on a MIG setup when you have a chance.


// Wait for cache sync on the selected informer.
synced := cache.WaitForCacheSync(ctx.Done(), informer.HasSynced)
synced := cache.WaitForCacheSync(wait.NeverStop, informer.HasSynced)
Contributor
why did you change this?

Comment thread internal/pkg/transformation/dra_test.go Outdated
assert.Nil(t, migInfo, "expected no MIG info for GPU device")
}

func TestGetDeviceInfo_InvalidPreferredVersion_ReturnsEmpty(t *testing.T) {
Contributor

I see these tests are removed? Why?

selected = "v1beta1"
default:
slog.Warn("No NVIDIA DRA ResourceSlices found; DRA labels will not be available")
return nil, nil
Contributor

I think there is a potential race condition here. In the intended installation order, the dra-driver is deployed before dcgm-exporter, so we expect an NVIDIA ResourceSlice (RS) to exist for the available API. But if dcgm-exporter comes up first, the API check:

  • client.ResourceV1().ResourceSlices().List(...) succeeds with an empty list
  • hasNvidiaDRASlices returns (false, nil) for both versions
  • it lands in the default case, returning nil, nil
  • PodMapper.ResourceSliceManager stays nil for the rest of the pod's lifetime until the exporter is restarted

Maybe we should log and always start the v1 informer. @varunrsekar, thoughts?

Originally, the v1beta1 informer was always initiated: https://github.com/NVIDIA/dcgm-exporter/blob/main/internal/pkg/transformation/dra.go#L43

resources, err := client.Discovery().ServerResourcesForGroupVersion(groupVersion)
if err != nil {
// Discovery returns errors when the group/version isn't served.
slog.Debug("Discovery failed for groupVersion", "groupVersion", groupVersion, "error", err)
Contributor

This should be logged as a warning.

key := pool + "/" + device
m.mu.RLock()
defer m.mu.RUnlock()
func (m *DRAResourceSliceManager) getV1DeviceInfo(pool, device string) (string, *DRAMigDeviceInfo) {
Contributor

Can getV1DeviceInfo and getV1beta1DeviceInfo be collapsed into one function with a switch? Something like:

if m.informer == nil {
    return "", nil
}
items, err := m.informer.GetIndexer().ByIndex("poolName", pool)
if err != nil {
    slog.Error("Error listing ResourceSlices by pool index", "pool", pool, "err", err)
    return "", nil
}
for _, item := range items {
    var adapter resourceSliceAdapter
    var driver string
    switch rs := item.(type) {
    case *resourcev1.ResourceSlice:
        driver, adapter = rs.Spec.Driver, &v1ResourceSliceAdapter{slice: rs}
    case *resourcev1beta1.ResourceSlice:
        driver, adapter = rs.Spec.Driver, &v1beta1ResourceSliceAdapter{slice: rs}
    default:
        continue
    }
    if driver != DRAGPUDriverName {
        continue
    }
    if mappingKey, migInfo := lookupDRADeviceInAdapter(pool, device, adapter); mappingKey != "" {
        return mappingKey, migInfo
    }
}

@guptaNswati
Contributor

@adityasingh0510 can you please address the open comments?


Development

Successfully merging this pull request may close these issues.

Add support for K8s v1.34 resource.k8s.io/v1 DRA APIs
