feat: Support K8s DRA Resources V1 APIs #654
Hi @guptaNswati, sharing Full GPU mode test logs from the latest dcgm-exporter changes on Kubernetes v1.34+ (ResourceSlice v1); see below.
deviceToUUID map[string]string            // pool/device -> UUID (for full GPUs)
migDevices   map[string]*DRAMigDeviceInfo // pool/device -> MIG info (for MIG devices)
factory      informers.SharedInformerFactory
v1Informer   cache.SharedIndexInformer
Don't need both informers here. Can be:

- v1Informer cache.SharedIndexInformer
+ informer   cache.SharedIndexInformer
| // - "v1beta1" if v1 does not, but v1beta1 does | ||
| preferredAPIVersion string | ||
| cancelContext context.CancelFunc | ||
| mu sync.RWMutex |
Same, not needed anymore.
// For MIG devices: returns (parentUUID, *DRAMigDeviceInfo)
// For full GPUs: returns (deviceUUID, nil)
func (m *DRAResourceSliceManager) GetDeviceInfo(pool, device string) (string, *DRAMigDeviceInfo) {
	m.mu.RLock()
// Search for the device in the selected slices
for _, item := range items {
	var adapter resourceSliceAdapter
	switch obj := item.(type) {
This still does a per-item type-switch on v1.ResourceSlice / v1beta1.ResourceSlice even though getV1DeviceInfo / getV1beta1DeviceInfo already know which version they're handling; @varunrsekar called this out earlier in #596 (comment).
getV1DeviceInfo and getV1beta1DeviceInfo can each inline the lookup against their own typed slice, then a separate helper can do the gpu / mig / default switch; see the sketch below.
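A minimal sketch of the inlined v1 variant, assuming the poolName index, DRAGPUDriverName, and DRAMigDeviceInfo from this PR; classifyDRADevice is a hypothetical name for the shared gpu / mig / default helper:

func (m *DRAResourceSliceManager) getV1DeviceInfo(pool, device string) (string, *DRAMigDeviceInfo) {
	items, err := m.informer.GetIndexer().ByIndex("poolName", pool)
	if err != nil {
		slog.Error("Error listing v1 ResourceSlices by pool index", "pool", pool, "err", err)
		return "", nil
	}
	for _, item := range items {
		// The informer is typed, so no v1/v1beta1 type-switch is needed here.
		rs, ok := item.(*resourcev1.ResourceSlice)
		if !ok || rs.Spec.Driver != DRAGPUDriverName {
			continue
		}
		for _, dev := range rs.Spec.Devices {
			if dev.Name == device {
				// Hypothetical shared helper doing the gpu / mig / default switch.
				return classifyDRADevice(pool, device, dev.Attributes)
			}
		}
	}
	return "", nil
}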
uuid, migInfo := m.GetDeviceInfo("gpu-pool", "gpu0")
assert.Empty(t, uuid, "expected no UUID when preferred version is invalid")
assert.Nil(t, migInfo, "expected no MIG info when preferred version is invalid")
}
| return "", nil | ||
| } | ||
| return mappings[0].MappingKey, mappings[0].Info | ||
| } No newline at end of file |
for i := range resourceSlicesList.Items {
	items = append(items, &resourceSlicesList.Items[i])
}
v1beta1HasNvidiaSlices = countGPUSlices(items) > 0
Can this be simplified a bit? These two list+count blocks are almost identical. Have a helper that returns a bool on the first match. Something like:
// hasNvidiaDRASlices reports whether the cluster currently exposes any
// NVIDIA GPU DRA ResourceSlices on the given API version.
func hasNvidiaDRASlices(ctx context.Context, client kubernetes.Interface, apiVersion string) (bool, error) {
	switch apiVersion {
	case "v1":
		list, err := client.ResourceV1().ResourceSlices().List(ctx, metav1.ListOptions{})
		if err != nil {
			return false, fmt.Errorf("listing v1 ResourceSlices: %w", err)
		}
		for i := range list.Items {
			s := &list.Items[i]
			if s.Spec.Driver == DRAGPUDriverName && len(s.Spec.Devices) > 0 {
				return true, nil
			}
		}
		return false, nil
	case "v1beta1":
		list, err := client.ResourceV1beta1().ResourceSlices().List(ctx, metav1.ListOptions{})
		if err != nil {
			return false, fmt.Errorf("listing v1beta1 ResourceSlices: %w", err)
		}
		for i := range list.Items {
			s := &list.Items[i]
			if s.Spec.Driver == DRAGPUDriverName && len(s.Spec.Devices) > 0 {
				return true, nil
			}
		}
		return false, nil
	default:
		return false, fmt.Errorf("unsupported ResourceSlice API version: %q", apiVersion)
	}
}

Then replace the above blocks with calls to it; no countGPUSlices check needed anymore:
v1HasNvidiaSlices := false
if v1Served {
	has, err := hasNvidiaDRASlices(ctx, client, "v1")
	if err != nil {
		return nil, err
	}
	v1HasNvidiaSlices = has
}
// same for v1beta1
)

// testInformer is a simple test implementation of SharedIndexInformer
type testInformer struct {
Why is this needed? Can use testInformerForDRA.
github.com/avast/retry-go/v4 v4.6.0
github.com/bits-and-blooms/bitset v1.22.0
github.com/fsnotify/fsnotify v1.7.0
github.com/containerd/cgroups/v3 v3.1.1
Are these updates needed for this PR?
fsnotify and cgroups are both still required; they were already on main (watcher → fsnotify, pidmapper → cgroups). This PR doesn't change those files. The diff is from go mod tidy after the k8s.io/* bump (line order, direct vs. indirect, versions). We need the updated go.mod for a consistent module graph with the Kubernetes upgrade.
//
// Deprecated behavior: this returns only the first mapping. Prefer
// GetDynamicResourceMappings when a DynamicResource may contain multiple devices.
func (m *DRAResourceSliceManager) GetDynamicResourceInfo(resource *podresourcesapi.DynamicResource) (string, *DynamicResourceInfo) {
This should be removed, but double-check whether it's called in the tests.
@adityasingh0510 thank you for the new PR. I have some nits that need addressing, but most of the comments from the previous PR are addressed and it looks good. I will find a MIG device internally for testing.
Thanks @guptaNswati. I've pushed updates for the remaining nits; please let me know if anything else stands out. Appreciate you testing on a MIG setup when you have a chance.
  // Wait for cache sync on the selected informer.
- synced := cache.WaitForCacheSync(ctx.Done(), informer.HasSynced)
+ synced := cache.WaitForCacheSync(wait.NeverStop, informer.HasSynced)
Why did you change this?
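For context, a hedged sketch of the two variants being compared (helper name hypothetical): with ctx.Done() the sync wait aborts when the manager's context is cancelled, while wait.NeverStop can block forever if the informer never syncs.

// waitSynced is a sketch only, not the PR's code.
func waitSynced(ctx context.Context, informer cache.SharedIndexInformer) bool {
	// Pre-change behavior: respects cancellation of the manager's context.
	return cache.WaitForCacheSync(ctx.Done(), informer.HasSynced)
	// Post-change equivalent:
	// return cache.WaitForCacheSync(wait.NeverStop, informer.HasSynced)
}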
assert.Nil(t, migInfo, "expected no MIG info for GPU device")
}

func TestGetDeviceInfo_InvalidPreferredVersion_ReturnsEmpty(t *testing.T) {
I see these tests are removed? Why?
| selected = "v1beta1" | ||
| default: | ||
| slog.Warn("No NVIDIA DRA ResourceSlices found; DRA labels will not be available") | ||
| return nil, nil |
I think there is a potential race condition here. In the intended installation order, dra-driver is deployed before dcgm-exporter, so we expect the NVIDIA ResourceSlice (RS) to exist for whichever API is available. But if dcgm-exporter is up first:
- the API check client.ResourceV1().ResourceSlices().List(...) succeeds with an empty list
- hasNvidiaDRASlices returns (false, nil) for both versions
- it lands in the default case, returning nil, nil
- PodMapper.ResourceSliceManager stays nil for the rest of the pod's lifetime until the exporter is restarted

Maybe we should log and always start the v1 informer. @varunsekar Thoughts?
Originally, the v1beta1 informer is always initiated: https://github.com/NVIDIA/dcgm-exporter/blob/main/internal/pkg/transformation/dra.go#L43
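A minimal sketch of the suggested selection logic (names hypothetical, not the PR's code); the point is that an empty list on a served API should still pick an informer rather than disable DRA labels for the pod's lifetime:

// pickDRAAPIVersion returns the API version to watch and whether to start an
// informer at all.
func pickDRAAPIVersion(v1Served, v1Has, v1beta1Served, v1beta1Has bool) (string, bool) {
	switch {
	case v1Has:
		return "v1", true
	case v1beta1Has:
		return "v1beta1", true
	case v1Served:
		// No NVIDIA slices yet (e.g. dra-driver deployed after dcgm-exporter);
		// watch v1 anyway so slices published later are picked up without a restart.
		return "v1", true
	case v1beta1Served:
		return "v1beta1", true
	default:
		return "", false
	}
}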
resources, err := client.Discovery().ServerResourcesForGroupVersion(groupVersion)
if err != nil {
	// Discovery returns errors when the group/version isn't served.
	slog.Debug("Discovery failed for groupVersion", "groupVersion", groupVersion, "error", err)
This should be logged as a warning.
key := pool + "/" + device
m.mu.RLock()
defer m.mu.RUnlock()

func (m *DRAResourceSliceManager) getV1DeviceInfo(pool, device string) (string, *DRAMigDeviceInfo) {
Can getV1DeviceInfo and getV1beta1DeviceInfo be collapsed into one function with a switch? Something like:
if m.informer == nil {
	return "", nil
}
items, err := m.informer.GetIndexer().ByIndex("poolName", pool)
if err != nil {
	slog.Error("Error listing ResourceSlices by pool index", "pool", pool, "err", err)
	return "", nil
}
for _, item := range items {
	var adapter resourceSliceAdapter
	var driver string
	switch rs := item.(type) {
	case *resourcev1.ResourceSlice:
		driver, adapter = rs.Spec.Driver, &v1ResourceSliceAdapter{slice: rs}
	case *resourcev1beta1.ResourceSlice:
		driver, adapter = rs.Spec.Driver, &v1beta1ResourceSliceAdapter{slice: rs}
	default:
		continue
	}
	if driver != DRAGPUDriverName {
		continue
	}
	if mappingKey, migInfo := lookupDRADeviceInAdapter(pool, device, adapter); mappingKey != "" {
		return mappingKey, migInfo
	}
}
return "", nil
@adityasingh0510 can you please address the open comments?
This PR updates dcgm-exporter to support both the stable resource.k8s.io/v1 API and the v1beta1 API for Dynamic Resource Allocation (DRA). This ensures compatibility with both Kubernetes 1.34+ clusters (using v1) and older clusters (using v1beta1), with automatic detection and graceful fallback.

Problem

When enabling DRA labels in dcgm-exporter on Kubernetes 1.34+ clusters, the ResourceSlice watch fails. This happens because the exporter previously initiated only the v1beta1 informer, which breaks on clusters that serve the stable resource.k8s.io/v1 API.
Changes

Files Modified

internal/pkg/transformation/dra.go:
- onAddOrUpdateV1() / onAddOrUpdateV1beta1()
- onDeleteV1() / onDeleteV1beta1()
- dev.Basic.Attributes → dev.Attributes (direct access, no Basic wrapper)

internal/pkg/transformation/types.go:
- added v1Informer and v1beta1Informer fields to the DRAResourceSliceManager struct

go.mod / go.sum:
- k8s.io/api: v0.33.3 → v0.34.0 (adds support for resource/v1)
- k8s.io/client-go: v0.33.3 → v0.34.0 (ensures compatibility)
- k8s.io/apimachinery: v0.33.3 → v0.34.0

API Structure Changes
The v1 API has a different structure than v1beta1:

- v1beta1: dev.Basic.Attributes
- v1: dev.Attributes (direct)

The implementation handles both structures correctly.
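For illustration, a hedged sketch of reading a string attribute under both shapes; the "uuid" attribute name is an assumption for the example, not taken from this PR:

import (
	resourcev1 "k8s.io/api/resource/v1"
	resourcev1beta1 "k8s.io/api/resource/v1beta1"
)

// uuidFromV1 reads the attribute directly off the device (v1 shape).
func uuidFromV1(dev resourcev1.Device) string {
	if attr, ok := dev.Attributes["uuid"]; ok && attr.StringValue != nil {
		return *attr.StringValue
	}
	return ""
}

// uuidFromV1beta1 goes through the Basic wrapper first (v1beta1 shape).
func uuidFromV1beta1(dev resourcev1beta1.Device) string {
	if dev.Basic == nil {
		return ""
	}
	if attr, ok := dev.Basic.Attributes["uuid"]; ok && attr.StringValue != nil {
		return *attr.StringValue
	}
	return ""
}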
Behavior
Automatic API Detection
The code registers both informers and uses whichever is available; a sketch of the underlying discovery check follows.
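A minimal sketch of the discovery check (helper name hypothetical), mirroring the ServerResourcesForGroupVersion call shown in the review diff above:

// groupVersionServed reports whether the API server serves the given
// group/version, e.g. "resource.k8s.io/v1".
func groupVersionServed(client kubernetes.Interface, groupVersion string) bool {
	_, err := client.Discovery().ServerResourcesForGroupVersion(groupVersion)
	return err == nil
}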
Precedence Logic
When both APIs are available, v1 takes precedence over v1beta1.
Testing
Verification
- Code compiles successfully with both API versions
- All tests pass; existing unit tests continue to work
- No linter errors
- v1 API support verified against the Kubernetes 1.34+ API structure
- v1beta1 API support verified against the Kubernetes 1.27-1.33 API structure
- Dual API handling: both informers work correctly when both are available
- Precedence logic: v1 correctly takes precedence over v1beta1
- Delete handling: race conditions prevented with cache checking
Test Scenarios
Backward Compatibility
Fully backward compatible: clusters serving v1beta1 continue to work as before.
Forward compatible: clusters serving the stable v1 API are now supported.
Breaking Changes
None. This is a backward and forward compatibility enhancement.
Related Issues