CORENET-7091: Add enhancement proposal to productize ovn-kubernetes MCP tools #2002
arkadeepsen wants to merge 1 commit into
Conversation
@arkadeepsen: This pull request references CORENET-7091 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
| OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows) while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface. | ||
| The primary motivation for landing these tools in upstream kubernetes-mcp-server is **productization via downstream sync into openshift-mcp-server**. By first integrating the OVN toolset upstream, OpenShift can ship and support the same upstream code through the established downstream pipeline. |
Instead of productization, can we say that keeping all OpenShift related MCP servers in a single repository is the main motivation? Or we can keep both.
Added a line stating that the OVN-K tools can be consumed from the same OCP MCP server.
| kms --> Sync | ||
| ``` | ||
| **Downstream.** openshift-mcp-server consumes kubernetes-mcp-server changes through its normal fork sync or vendor workflow (exact mechanics follow that repository’s documented process). |
Don't we want to add more implementation details, like which exact tools will be added and what purpose they may serve?
I have added that all the tools under the ovn and ovs packages will be added to the OCP MCP server. I have added some more details about how to go about the implementation. I didn't want to add specific details of the local PoC I did, as that might not be the only way of implementing the integration.
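To give a rough idea of the shape (deliberately not the PoC specifics), here is a minimal sketch of how the imported handlers could be wired into a kubernetes-mcp-server toolset. All type and function names below are illustrative assumptions, not existing APIs in either repo:

```go
// Hypothetical sketch of an "ovn-kubernetes" toolset for kubernetes-mcp-server.
// The names below are assumptions for illustration; the real integration would
// follow whatever toolset registration mechanism kubernetes-mcp-server exposes.
package ovnkubernetes

import "context"

// ExecInPod is the execution path the host server injects: run a command in a
// container of a target pod and return its output. In the real integration it
// would be backed by kubernetes-mcp-server's existing pod exec.
type ExecInPod func(ctx context.Context, namespace, pod, container string, cmd []string) (string, error)

// Tool pairs an MCP tool name with its handler, mirroring how a toolset entry
// could be exposed to MCP clients.
type Tool struct {
	Name    string
	Handler func(ctx context.Context, args map[string]string) (string, error)
}

// NewOVNToolset lists only the supported tools; each handler delegates to the
// imported ovn-kubernetes-mcp handler logic (stubbed here as nbShow) and to the
// injected executor instead of opening its own Kubernetes client.
func NewOVNToolset(exec ExecInPod) []Tool {
	return []Tool{
		{
			Name: "ovn_nb_show",
			Handler: func(ctx context.Context, args map[string]string) (string, error) {
				return nbShow(ctx, exec, args["namespace"], args["pod"])
			},
		},
		// ...additional OVN/OVS tools registered the same way.
	}
}

// nbShow stands in for handler logic imported from ovn-kubernetes-mcp
// (for example something under pkg/ovn/mcp): it builds the command line and
// delegates all cluster I/O to the injected executor. Container and command
// here are illustrative.
func nbShow(ctx context.Context, exec ExecInPod, namespace, pod string) (string, error) {
	return exec(ctx, namespace, pod, "nbdb", []string{"ovn-nbctl", "show"})
}
```

The point of the sketch is only that the toolset explicitly lists the handlers we intend to support, and that every handler receives its cluster access from the host server rather than creating its own client.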
| - Add an `ovn-kubernetes` toolset to kubernetes-mcp-server that reuses the existing OVN MCP tool implementations from ovn-kubernetes-mcp, rather than re-implementing equivalent functionality. | ||
| - Enable kubernetes-mcp-server to execute OVN tool commands in-cluster using its existing pod-exec capabilities, with only minor upstream refactoring required in the imported OVN tools. | ||
| - Import the OVN and OVS layers from ovn-kubernetes-mcp incrementally (starting with core OVN/OVS troubleshooting tools), expanding coverage as dependencies and eval coverage mature. | ||
| - Make the toolset available to OpenShift users through openshift-mcp-server via downstream sync from kubernetes-mcp-server. |
Is having an automated sync mechanism between ovn-mcp-server, kubernetes-mcp-server and openshift-mcp-server also a goal of this feature?
The current plan is to import the packages from the ovn-kubernetes-mcp repo. Thus, whenever we need the latest changes in kubernetes-mcp-server, the go.mod and go.sum files can be updated to refer to the latest changes from the ovn-kubernetes-mcp repo. Regarding the automation, since kubernetes-mcp-server is in a separate upstream repo where we are not maintainers, I am not sure whether adding the automatic sync process as part of this EP would be appropriate. We can figure that part out, if needed, in the future. For now, we'll just bump the import as we do for the Kubernetes bump in the different repos.
Given that must-gather is downstream specific, bringing it into the kubernetes-mcp-server would not be a problem, right?
There's already an existing downstream effort for must-gather. It differs from how it's been implemented in ovn-kubernetes-mcp repo. If we want to integrate the networking bits from the must-gather tool, we'll have to do it in the openshift-mcp-server directly, as kubernetes-mcp-server won't have must-gather related tools.
Okay, we can skip using the must-gather tool from ovn-kubernetes-mcp and use the existing one. We can try to directly add the networking bits to kubernetes-mcp-server to imitate the behaviour in ovn-kubernetes-mcp. Can we consider this one of the goals?
It will not work on kubernetes-mcp-server as the must-gather implementation is in downstream openshift-mcp-server.
Force-pushed from aaccb39 to bbf81a5 (Compare)
| - Add an `ovn-kubernetes` toolset to kubernetes-mcp-server that reuses the existing OVN MCP tool implementations from ovn-kubernetes-mcp, rather than re-implementing equivalent functionality. | ||
| - Enable kubernetes-mcp-server to execute OVN/OVS tool commands in-cluster using its existing pod-exec capabilities, with only minor refactoring required in **ovn-kubernetes-mcp** and **kubernetes-mcp-server** to integrate that pod-exec path cleanly. | ||
| - Import the full OVN and OVS handler set from ovn-kubernetes-mcp (`pkg/ovn/mcp` and `pkg/ovs/mcp`) into the `ovn-kubernetes` toolset, while other upstream packages stay excluded per Non-Goals. |
As per today's discussion, we should mention the kernel and sosreport tools, which are helpful for exploring a node's kernel resources.
This is already changed.
| - Full parity in the first iteration with every tool category shipped by the standalone ovn-kubernetes-mcp binary (for example kernel diagnostics, optional images such as pwru/tcpdump, must-gather, sosreport) where those require separate dependencies, images, or workflows. | ||
| - New Kubernetes or OpenShift APIs, CRDs, operators, or cluster-side agents solely for this feature. | ||
| - Replacing existing CLI-based troubleshooting; MCP tools are an additional interface. | ||
| - Importing ovn-kubernetes-mcp tools under `kernel` and `network-tools` packages in the first iteration, since those tools depend on a node debugging capability (for example a node-debug tool) that is not currently available in kubernetes-mcp-server. |
This is already changed.
| ### Non-Goals | ||
| - Full parity in the first iteration with every tool category shipped by the standalone ovn-kubernetes-mcp binary (for example kernel diagnostics, optional images such as pwru/tcpdump, must-gather, sosreport) where those require separate dependencies, images, or workflows. |
Since ovn-kubernetes-mcp is an upstream repo, we can't expect all current and future tools to be applicable to an OpenShift environment.
Given that we plan to import the packages from ovn-kubernetes-mcp repo, how should we control access to tools that may not be supported?
We're only going to call the handlers of the tools which are supported. The import is for the packages where these handlers are defined. Unsupported handlers should not be used.
| - https://redhat.atlassian.net/browse/CORENET-7091 | ||
| see-also: | ||
| - https://github.com/ovn-kubernetes/ovn-kubernetes-mcp | ||
| - https://github.com/containers/kubernetes-mcp-server |
NIT: do we need kubernetes-mcp-server and openshift-mcp-server here?
Yes. Since the implementation of the EP will impact all the repos, we need all of them to be included here.
| ### User Stories | ||
| - As a cluster administrator or platform engineer, I want OVN-Kubernetes MCP troubleshooting tools in the same MCP server I already use for Kubernetes resources, so that I do not have to deploy, operate, or manage authentication for a second MCP server dedicated only to OVN-Kubernetes. | ||
| - As a support engineer, I want MCP clients to expose the full ovn-kubernetes-mcp troubleshooting surface that kubernetes-mcp-server imports—NB/SB inspection and related `ovn-*` workflows (including `get`, `lflow-list`, `trace` where those tools apply), OVS bridge and OpenFlow helpers, and **`kernel`** / **`network-tools`** host and capture tooling—so that assisted troubleshooting matches how other cluster operations are automated without switching servers or credentials mid-incident. |
| - As a support engineer, I want MCP clients to expose the full ovn-kubernetes-mcp troubleshooting surface that kubernetes-mcp-server imports—NB/SB inspection and related `ovn-*` workflows (including `get`, `lflow-list`, `trace` where those tools apply), OVS bridge and OpenFlow helpers, and **`kernel`** / **`network-tools`** host and capture tooling—so that assisted troubleshooting matches how other cluster operations are automated without switching servers or credentials mid-incident. | |
| - As a support engineer, I want MCP clients to expose the full ovn-kubernetes-mcp troubleshooting surface that kubernetes-mcp-server imports (NB/SB inspection and related `ovn-*` workflows (including `get`, `lflow-list`, `trace` where those tools apply), OVS bridge and OpenFlow helpers, and **`kernel`** / **`network-tools`** host and capture tooling) so that assisted troubleshooting matches how other cluster operations are automated without switching servers or credentials mid-incident. |
There's already a bracket between the dashes.
| **Importing upstream tools into kubernetes-mcp-server.** The OVN troubleshooting MCP tools already exist in ovn-kubernetes-mcp. The integration approach for kubernetes-mcp-server is to add an `ovn-kubernetes` toolset that reuses those implementations as imported packages and exposes them through kubernetes-mcp-server’s tool registration. | ||
| **Command execution strategy.** OVN/OVS tools run commands inside OVN-Kubernetes pods via kubernetes-mcp-server’s pod exec. **`kernel`** and **`network-tools`** handlers use the node-level execution contract wired up in the same integration (for example debug pod or node-targeted exec, as the upstream packages require). Imported libraries should delegate all cluster I/O to kubernetes-mcp-server rather than opening separate Kubernetes client connections. Expect **refactoring in ovn-kubernetes-mcp and kubernetes-mcp-server** so each category uses a clear, single host-supplied execution path per invocation. |
I understand that kubernetes-mcp-server is building its own node-debug method to allow host access using the kubectl/oc CLI. However, in ovn-kubernetes-mcp we use a different method to do node debug for kernel and other network tools. I wonder how we can use tools from ovn-kubernetes-mcp while using the utility from kubernetes-mcp-server, considering it's downstream of ovn-kubernetes-mcp.
The same way we'll use pod-exec from kubernetes-mcp-server for the OVN/OVS tools. The function definition should be similar, that is, the argument list and the return type should be the same in both ovn-kubernetes-mcp and kubernetes-mcp-server for the node-debug function, which will be called by the kernel and the network-tools handlers.
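To illustrate what "same argument list and return type" could mean in practice, here is a minimal sketch of a shared execution contract. The names are assumptions; neither repo defines these types today:

```go
// Hypothetical sketch of the execution contract shared between the two repos.
// None of these names exist today; they only illustrate the idea that the
// signatures are agreed upon in ovn-kubernetes-mcp and implemented by
// kubernetes-mcp-server.
package exec

import "context"

// PodExecFunc runs a command in a container of an OVN-Kubernetes pod.
// OVN/OVS handlers in ovn-kubernetes-mcp would accept this and never open
// their own Kubernetes client connections.
type PodExecFunc func(ctx context.Context, namespace, pod, container string, command []string) (stdout string, err error)

// NodeDebugFunc runs a command on a node (for example via a privileged
// debug pod). The kernel and network-tools handlers would depend on this;
// it only becomes usable once kubernetes-mcp-server grows an equivalent
// node-debug capability.
type NodeDebugFunc func(ctx context.Context, node string, command []string) (stdout string, err error)
```

kubernetes-mcp-server would then pass its own implementations of these function types when constructing the toolset, which also makes explicit that a node-debug capability has to exist upstream before the kernel and network-tools handlers can be wired in.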
Shall we mention this explicitly in the document? From what I understand, the current kubernetes-mcp-server does not have any node-debug capability so far, so if that needs to be implemented it is worth calling it out in this section.
| **Command execution strategy.** OVN/OVS tools run commands inside OVN-Kubernetes pods via kubernetes-mcp-server’s pod exec. **`kernel`** and **`network-tools`** handlers use the node-level execution contract wired up in the same integration (for example debug pod or node-targeted exec, as the upstream packages require). Imported libraries should delegate all cluster I/O to kubernetes-mcp-server rather than opening separate Kubernetes client connections. Expect **refactoring in ovn-kubernetes-mcp and kubernetes-mcp-server** so each category uses a clear, single host-supplied execution path per invocation. | ||
| **Scope.** All troubleshooting tools under ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** belong to this effort (NB/SB inspection, logical flows, OVN trace, OVS bridge and OpenFlow helpers, kernel-oriented diagnostics, and **`network-tools`**-style capture where applicable). Other ovn-kubernetes-mcp surfaces—must-gather, sosreport, and similar—remain out of scope unless separately agreed; see Non-Goals. |
| **Scope.** All troubleshooting tools under ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** belong to this effort (NB/SB inspection, logical flows, OVN trace, OVS bridge and OpenFlow helpers, kernel-oriented diagnostics, and **`network-tools`**-style capture where applicable). Other ovn-kubernetes-mcp surfaces—must-gather, sosreport, and similar—remain out of scope unless separately agreed; see Non-Goals. | |
| **Scope.** All troubleshooting tools under ovn-kubernetes-mcp **`ovn`**, **`ovs`**, **`kernel`**, and **`network-tools`** belong to this effort (NB/SB inspection, logical flows, OVN trace, OVS bridge and OpenFlow helpers, kernel-oriented diagnostics, and **`network-tools`**-style capture where applicable). Other ovn-kubernetes-mcp surfaces (must-gather, sosreport, and similar) remain out of scope unless separately agreed; see Non-Goals. |
@arkadeepsen: all tests passed! Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
| ## Open Questions | ||
| - How to structure mcpchecker suites or task labels so OVN/OVS, **`kernel`**, and **`network-tools`** coverage stays maintainable under kubernetes-mcp-server’s pass-rate gates, given differing cluster prerequisites? |
For the mcpchecker structure — since kernel and network-tools require privileged node access which may not be available in all CI environments, would it make sense to have separate suites for OVN/OVS and kernel/network-tools so their pass rates are tracked independently?
I am more inclined towards creating a separate suite for each layer of the OVN-K MCP server tools. That is, for each of OVN, OVS, kernel, and network-tools, we'll have a separate evals suite. But we can take a call when working on the evals for the tools.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: taanyas

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
mattedallo
left a comment
lgtm
I added some non-blocking comments.
| OVN-Kubernetes operators and support engineers often need Northbound and Southbound database views (`ovn-nbctl`, `ovn-sbctl`, traces, logical flows), host-oriented diagnostics, and packet or kernel-level capture workflows while investigating connectivity and routing. These tools are already implemented in ovn-kubernetes-mcp, but OpenShift users benefit from consuming them via a **single MCP server** that shares authentication, tool governance, and documentation with the rest of the platform troubleshooting surface. | ||
| The primary motivation for landing these tools in upstream kubernetes-mcp-server is **productization via downstream sync into openshift-mcp-server**. By first integrating the OVN toolset upstream, OpenShift can ship and support the same upstream code through the established downstream pipeline. This also lets OpenShift customers consume the OVN-Kubernetes tools from the same MCP server as the rest of the platform troubleshooting surface, openshift-mcp-server, after downstream sync. |
Nit: maybe we can expand a bit on what cost we are saving by exploiting the existing openshift-mcp-server productization pipeline. That will strengthen the motivation for integrating versus keeping it separate.
| None. This work adds MCP tools only and does not extend the OpenShift or Kubernetes API surface. | ||
| ### Topology Considerations |
Minor note: the topology section seems written with the local binary deployment model in mind. It might be worth a brief mention that the same considerations apply to in-cluster deployments, or a note that the OVN-K tools inherit whatever cluster-access model kubernetes-mcp-server provides.
| **Importing upstream tools into kubernetes-mcp-server.** The OVN troubleshooting MCP tools already exist in ovn-kubernetes-mcp. The integration approach for kubernetes-mcp-server is to add an `ovn-kubernetes` toolset that reuses those implementations as imported packages and exposes them through kubernetes-mcp-server’s tool registration. | ||
| **Command execution strategy.** OVN/OVS tools run commands inside OVN-Kubernetes pods via kubernetes-mcp-server’s pod exec. **`kernel`** and **`network-tools`** handlers use the node-level execution contract wired up in the same integration (for example debug pod or node-targeted exec, as the upstream packages require). Imported libraries should delegate all cluster I/O to kubernetes-mcp-server rather than opening separate Kubernetes client connections. Expect **refactoring in ovn-kubernetes-mcp and kubernetes-mcp-server** so each category uses a clear, single host-supplied execution path per invocation. |
| **Split of work:** kubernetes-mcp-server decides how each capability is exposed to MCP users (tool names and parameters). ovn-kubernetes-mcp keeps handler logic that validates inputs, builds command lines, and defines execution contracts; kubernetes-mcp-server integrates by calling those libraries and supplying pod exec, node-level debugging, or other supported cluster operations against the target cluster. | ||
| ```mermaid |
On the diagram a few things tripped me up:
- The main call relationship (kubernetes-mcp-server's tool handler calling ovn-kubernetes-mcp's imported handler logic) isn't shown, and that's the core of the integration.
- "delegated_in_cluster_execution" sits inside the ovn-kubernetes-mcp box, but the actual execution will happen in kubernetes-mcp-server's client AFAIU. ovn-kubernetes-mcp defines the contract/interface; kubernetes-mcp-server implements it.
- The box only shows "OVN_OVS", but kernel and network-tools are also in scope, with a different execution path (node-debug vs pod-exec).
- The two subgraphs connected by a dotted arrow could be read as two separate services communicating at runtime, when in practice ovn-kubernetes-mcp will be compiled into kubernetes-mcp-server as an imported Go package.

Would something like this be more accurate? Let me know your thoughts:
```mermaid
flowchart TB
    subgraph kms [kubernetes-mcp-server process]
        ToolHandler["Tool handler\n(defines MCP tool name, schema)"]
        subgraph ovnkLib ["ovn-kubernetes-mcp (imported Go package)"]
            HandlerLogic["Handler logic\n(validates inputs, builds commands)"]
        end
        subgraph executor [kubernetes-mcp-server K8s client]
            PodExec["PodExec\n(OVN/OVS tools)"]
            NodeDebug["NodeDebug\n(kernel / network-tools)"]
        end
        ToolHandler -->|"calls imported package"| HandlerLogic
        HandlerLogic -->|"calls injected executor"| PodExec
        HandlerLogic -->|"calls injected executor"| NodeDebug
        PodExec -->|"exec in ovnkube pod"| Cluster["Cluster"]
        NodeDebug -->|"privileged debug pod on node"| Cluster
    end
```