
Feat: Add DpuNetwork CR, Per-network device pools and Accelerated NF support#636

Open
alkama-hasan wants to merge 3 commits into openshift:main from alkama-hasan:dpunetwork-controller

@alkama-hasan
Contributor

alkama-hasan commented Mar 3, 2026

This PR introduces a cluster-scoped DpuNetwork CRD (API group config.openshift.io), together with a controller and a device plugin manager, for per-network DPU resource management. It also covers end-to-end deployment of an Accelerated NF.

Changes

  • DpuNetwork API: spec fields nodeSelector, dpuSelector (with vfId range support), and isAccelerated; status fields resourceName, selectedVFs, and conditions.
  • Controller: writes per-network entries to the dpu-device-plugin-config ConfigMap, creates a NetworkAttachmentDefinition (NAD) per DpuNetwork, and manages a finalizer for cleanup on delete.
  • Device Plugin Manager: polls the ConfigMap to start/stop per-network and accelerated device plugins; gracefully restarts the default plugin when all networks are removed.
  • Accelerated NF Mode: per-VF CreateNetworkFunction/DeleteNetworkFunction with bridgeID plumbed through CNI config, BridgePort, and NF APIs.
  • Device Recovery: recovers orphaned VF representors on namespace teardown and graceful daemon shutdown via releaseNfDevices and recoverOrphanedDevice.
  • CNI Config: added bridgeID and isAccelerated fields to NetConf; BridgePort now uses bridgeID instead of hardcoded vf+2.
  • VendorPlugin Interface: extended with bridgeID parameter and SetDpuNetworkConfig RPC; GrpcPlugin wires DpuNetworkConfigServiceClient.
  • Marvell VSP: implements SetDpuNetworkConfig gRPC to toggle accelerated mode and report accelerated devices.
  • Host/DPU Side Managers: create a Kubernetes client for config-driven device plugin registration, with a fallback on failure.
  • PathManager: unique socket paths per network (dpuNet-<name>.sock).
  • RBAC: controller permissions for dpunetworks, ConfigMaps, NADs; daemon role updated with ConfigMap access.
  • CRD/Examples: OpenAPI schema, print columns, dpunetwork-net1.yaml, dpunetwork-net2.yaml, host-pod.yaml, nf-pod.yaml.
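The PathManager bullet gives the socket naming pattern `dpuNet-<name>.sock`. A minimal sketch of such a helper, assuming the caller passes the kubelet device-plugin directory (commonly `/var/lib/kubelet/device-plugins`) and using a hypothetical `networkSocketPath` name, could also validate the network name so it cannot escape that directory:

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// validNetworkName is a simplified DNS-1123-style check (an assumption of this
// sketch): lowercase alphanumerics, '-' and '.', starting and ending with an
// alphanumeric. It rejects '/' so the name cannot escape the socket directory.
var validNetworkName = regexp.MustCompile(`^[a-z0-9]([-a-z0-9.]*[a-z0-9])?$`)

// networkSocketPath returns a per-network device plugin socket path following
// the dpuNet-<name>.sock pattern. baseDir is supplied by the caller.
func networkSocketPath(baseDir, networkName string) (string, error) {
	if !validNetworkName.MatchString(networkName) {
		return "", fmt.Errorf("invalid DpuNetwork name %q", networkName)
	}
	return filepath.Join(baseDir, "dpuNet-"+networkName+".sock"), nil
}
```

With this sketch, `networkSocketPath("/var/lib/kubelet/device-plugins", "net1")` returns `/var/lib/kubelet/device-plugins/dpuNet-net1.sock`, so two DpuNetworks never share a socket.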

openshift-ci bot requested review from bn222 and vrindle March 3, 2026 10:31
openshift-ci bot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Mar 3, 2026
@openshift-ci

openshift-ci bot commented Mar 3, 2026

Hi @alkama-hasan. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

alkama-hasan force-pushed the dpunetwork-controller branch from d5837b7 to 6f82b32 March 3, 2026 10:37
@wizhaoredhat
Contributor

/approve

@wizhaoredhat
Contributor

/ok-to-test

openshift-ci bot added the ok-to-test label and removed the needs-ok-to-test label Mar 5, 2026
@openshift-ci

openshift-ci bot commented Mar 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alkama-hasan, wizhaoredhat

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label Mar 5, 2026
alkama-hasan force-pushed the dpunetwork-controller branch 3 times, most recently from a8da5d9 to fe9f635 March 18, 2026 07:05
@alkama-hasan
Contributor Author

/retest

alkama-hasan changed the title from "Add DpuNetwork CRD and controller for DPU network configuration" to "Add DpuNetwork CRD ,controller and device-plugin for DPU network configuration" Mar 25, 2026
alkama-hasan force-pushed the dpunetwork-controller branch from bcf1947 to 075b3a4 March 25, 2026 17:19
@alkama-hasan
Contributor Author

/retest

1 similar comment
@alkama-hasan
Contributor Author

/retest

@alkama-hasan
Contributor Author

/test make-test

alkama-hasan force-pushed the dpunetwork-controller branch 3 times, most recently from 20b15b1 to a7e0074 April 7, 2026 09:45
alkama-hasan changed the title from "Add DpuNetwork CRD ,controller and device-plugin for DPU network configuration" to "Feat: Add DpuNetwork CR, Per-network device pools and Accelerated NF support" Apr 7, 2026
alkama-hasan force-pushed the dpunetwork-controller branch from a7e0074 to be1c641 April 7, 2026 11:03
Introduce DpuNetwork custom resource and reconciler: maintains
dpu-device-plugin-config ConfigMap and NADs per network, with
finalizer cleanup on delete. Includes RBAC, examples and test.

Signed-off-by: Alkama Hasan <alkamah@marvell.com>
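As a hedged illustration of the reconcile flow this commit describes (a ConfigMap entry per network, cleaned up behind a finalizer), here is a stdlib-only Go sketch. It is not the PR's implementation: the entry's JSON shape, the finalizer string, and the `dpuNetwork` stand-in type are all hypothetical, and NAD handling is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// networkPoolEntry is a hypothetical shape for one per-network entry in the
// dpu-device-plugin-config ConfigMap; the PR's real layout may differ.
type networkPoolEntry struct {
	ResourceName  string `json:"resourceName"`
	VFIDs         []int  `json:"vfIds"`
	IsAccelerated bool   `json:"isAccelerated"`
}

// finalizerName is an assumed finalizer string, used only in this sketch.
const finalizerName = "config.openshift.io/dpunetwork-cleanup"

// dpuNetwork is a minimal stand-in for the CR; Deleting plays the role of a
// non-nil metadata.deletionTimestamp.
type dpuNetwork struct {
	Name       string
	Finalizers []string
	Deleting   bool
	Entry      networkPoolEntry
}

func hasFinalizer(fs []string, f string) bool {
	for _, x := range fs {
		if x == f {
			return true
		}
	}
	return false
}

// removeFinalizer returns a new slice without f, leaving the input untouched.
func removeFinalizer(fs []string, f string) []string {
	out := make([]string, 0, len(fs))
	for _, x := range fs {
		if x != f {
			out = append(out, x)
		}
	}
	return out
}

// reconcile writes the network's ConfigMap entry (keyed by CR name) while the
// CR exists, and deletes it before releasing the finalizer on delete.
func reconcile(nw *dpuNetwork, cmData map[string]string) error {
	if nw.Deleting {
		if hasFinalizer(nw.Finalizers, finalizerName) {
			delete(cmData, nw.Name) // NAD cleanup would also happen here
			nw.Finalizers = removeFinalizer(nw.Finalizers, finalizerName)
		}
		return nil
	}
	if !hasFinalizer(nw.Finalizers, finalizerName) {
		nw.Finalizers = append(nw.Finalizers, finalizerName)
	}
	raw, err := json.Marshal(nw.Entry)
	if err != nil {
		return fmt.Errorf("encode entry for %s: %w", nw.Name, err)
	}
	cmData[nw.Name] = string(raw)
	return nil
}
```

In a real controller-runtime reconciler the finalizer would normally be persisted with an update before the ConfigMap write, so that a delete arriving in between still triggers cleanup.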
Introduce a ConfigMap-based device plugin per DpuNetwork CR.

Signed-off-by: Alkama Hasan <alkamah@marvell.com>
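The per-network plugin lifecycle in this commit (and the Device Plugin Manager bullet in the description) comes down to comparing what is running with what the ConfigMap lists on each poll. A minimal sketch, using a hypothetical `planPluginChanges` helper and assuming that the default plugin runs only when no networks are configured:

```go
package main

import "sort"

// planPluginChanges compares the per-network plugins currently running with
// the network names from the latest ConfigMap poll. It returns the plugins to
// start and stop (sorted for deterministic handling) and whether the default
// plugin should run, which this sketch assumes is only when no networks exist.
// running must contain per-network plugins only, not the default plugin.
func planPluginChanges(running map[string]bool, desired []string) (start, stop []string, runDefault bool) {
	want := make(map[string]bool, len(desired))
	for _, n := range desired {
		if want[n] {
			continue // ignore duplicate names in the ConfigMap
		}
		want[n] = true
		if !running[n] {
			start = append(start, n)
		}
	}
	for n := range running {
		if !want[n] {
			stop = append(stop, n)
		}
	}
	sort.Strings(start)
	sort.Strings(stop)
	return start, stop, len(want) == 0
}
```

A manager loop would call this on every poll tick, stop and start plugins accordingly, and (under this sketch's assumption) bring the default plugin back once `runDefault` becomes true after the last network is removed.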
Plumb bridgeID across CNI and network function APIs. Add per-VF
accelerated mode and orphaned VF representor recovery on shutdown.
Signed-off-by: Alkama Hasan <alkamah@marvell.com>
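For the bridgeID plumbing, a CNI plugin would read the new fields from the network configuration it receives on stdin. The sketch below assumes JSON keys `bridgeID` and `isAccelerated` and treats a missing bridgeID as an error, since the port is no longer derived as vf+2; the `netConf` struct and that validation policy are illustrative, not the PR's actual NetConf.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// netConf is a hypothetical subset of a CNI network configuration carrying
// the two fields this PR adds. The JSON keys are assumptions.
type netConf struct {
	CNIVersion    string `json:"cniVersion"`
	Name          string `json:"name"`
	Type          string `json:"type"`
	BridgeID      *int   `json:"bridgeID,omitempty"` // pointer so "absent" differs from 0
	IsAccelerated bool   `json:"isAccelerated,omitempty"`
}

// parseNetConf decodes the config and requires an explicit, non-negative
// bridgeID instead of computing the bridge port from the VF index.
func parseNetConf(raw []byte) (*netConf, error) {
	var c netConf
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, fmt.Errorf("decode netconf: %w", err)
	}
	if c.BridgeID == nil {
		return nil, fmt.Errorf("netconf %q: bridgeID is required", c.Name)
	}
	if *c.BridgeID < 0 {
		return nil, fmt.Errorf("netconf %q: bridgeID must be non-negative, got %d", c.Name, *c.BridgeID)
	}
	return &c, nil
}
```

With `"bridgeID": 3` in the config, the BridgePort would be created with ID 3 regardless of the VF index.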
alkama-hasan force-pushed the dpunetwork-controller branch from be1c641 to 3ce6ba1 April 7, 2026 12:02
@openshift-ci

openshift-ci bot commented Apr 7, 2026

@alkama-hasan: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/verify-deps | be1c641 | link | true | /test verify-deps |
| ci/prow/make-vendor-check | be1c641 | link | true | /test make-vendor-check |
| ci/prow/make-e2e-test-marvell | be1c641 | link | true | /test make-e2e-test-marvell |
| ci/prow/make-test | be1c641 | link | true | /test make-test |

Full PR test history. Your PR dashboard.



Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.


2 participants