feat: detect and warn when OVN routingViaHost is not enabled #386

Open
Bobbins228 wants to merge 1 commit into kagenti:main from Bobbins228:feat/rhaieng-5326-ovn-network-check
Contributor

@Bobbins228 Bobbins228 commented May 29, 2026

Summary

Adds a startup-time network configuration check that detects when OVN-Kubernetes routingViaHost is not enabled on OpenShift clusters. Without this setting, Istio ambient mode's ztunnel cannot intercept pod-to-pod traffic and mTLS/authorization policies are silently bypassed. The operator logs a warning at startup — no per-CR conditions, no per-reconcile overhead.

Changes

  • network_check.go: NetworkOperatorCRDExists (CRD discovery at startup) and CheckOVNNetworkConfig (reads network.operator.openshift.io/cluster, returns a warning string if misconfigured)
  • cmd/main.go: runs the check once at startup after CRD discovery, logs a warning if misconfigured
  • RBAC: get on operator.openshift.io/networks in config/rbac/role.yaml and the Helm chart
  • network_check_test.go: 5 unit tests covering misconfigured OVN, correct config, non-OVN clusters, and missing resource
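
For illustration, the string-returning check described above could look roughly like the sketch below. This is not the PR's code: `checkOVNNetworkConfig` is a hypothetical stand-in that takes the raw JSON of the cluster Network operator object instead of using a Kubernetes client, and the warning text is invented. It assumes the relevant fields are `spec.defaultNetwork.type` and `spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig.routingViaHost`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkOVNNetworkConfig is a hypothetical, simplified stand-in for the PR's
// CheckOVNNetworkConfig. It decodes the JSON of the cluster-scoped Network
// operator object and returns a warning string, or "" when there is nothing
// to report (non-OVN cluster, or routingViaHost already enabled).
func checkOVNNetworkConfig(raw []byte) (string, error) {
	// Only the fields needed for the check are modelled here.
	var obj struct {
		Spec struct {
			DefaultNetwork struct {
				Type                string `json:"type"`
				OVNKubernetesConfig *struct {
					GatewayConfig *struct {
						RoutingViaHost bool `json:"routingViaHost"`
					} `json:"gatewayConfig"`
				} `json:"ovnKubernetesConfig"`
			} `json:"defaultNetwork"`
		} `json:"spec"`
	}
	if err := json.Unmarshal(raw, &obj); err != nil {
		return "", fmt.Errorf("decoding network config: %w", err)
	}

	dn := obj.Spec.DefaultNetwork
	if dn.Type != "OVNKubernetes" {
		return "", nil // non-OVN cluster: the check does not apply
	}
	ovn := dn.OVNKubernetesConfig
	if ovn != nil && ovn.GatewayConfig != nil && ovn.GatewayConfig.RoutingViaHost {
		return "", nil // correctly configured
	}
	// Illustrative wording only; the real message lives in the PR.
	return "OVN-Kubernetes routingViaHost is not enabled; " +
		"Istio ambient ztunnel may not intercept pod-to-pod traffic", nil
}
```

An empty return value means "no issue", so the caller only needs a single `if warning != ""` before logging.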

Contributor

@ChristianZaccaria ChristianZaccaria left a comment

Thanks for tackling this. We may need to reconsider the approach; please see the comments below for more info.

}

// 4.2. Check OVN network configuration (OpenShift only).
r.checkNetworkConfig(ctx, rt)
Contributor

We should reconsider the approach here. For context, the network check is a cluster-level fact, not a per-workload concern. It doesn't vary per agent or runtime, so the agentruntime_controller shouldn't be responsible for performing it.

Moreover, with the changes in agentruntime_controller, N agents would mean N API calls on every reconcile, which can add up quickly.

Simpler approach: do the check once at startup in main.go and log a warning if necessary.

Run CheckOVNNetworkConfig once in main.go, right after NetworkOperatorCRDExists, and log a clear warning. This should come to under about 50 lines of code in total, with zero per-reconcile overhead and no per-CR conditions to maintain. Operators see it in the pod logs.
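
To make the suggestion concrete, here is a rough sketch of the startup flow. It is not taken from the PR: `runStartupNetworkCheck` and its function-typed parameters are hypothetical, used so the control flow can be shown without a Kubernetes client. In main.go the two callbacks would presumably wrap NetworkOperatorCRDExists and CheckOVNNetworkConfig.

```go
package main

// runStartupNetworkCheck is a hypothetical helper illustrating the
// "check once at startup, log a warning" flow. The callbacks stand in for
// the real CRD discovery and config check, whose signatures are assumed.
func runStartupNetworkCheck(
	crdExists func() (bool, error),
	checkConfig func() (string, error),
	logf func(format string, args ...interface{}),
) {
	ok, err := crdExists()
	if err != nil {
		// Discovery failure should not block operator startup.
		logf("skipping OVN network check: CRD discovery failed: %v", err)
		return
	}
	if !ok {
		return // Network operator CRD absent: not an OpenShift cluster
	}

	warning, err := checkConfig()
	if err != nil {
		logf("OVN network check failed: %v", err)
		return
	}
	if warning != "" {
		// Logged once per process start, never per reconcile.
		logf("WARNING: %s", warning)
	}
}
```

Because this runs once per process rather than once per AgentRuntime, the API cost stays constant no matter how many agents exist.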

}

// NetworkCheckResult represents the outcome of checking OVN network configuration.
type NetworkCheckResult struct {
Contributor

We can remove the whole struct.
Since we probably won't propagate the conditions or messages to the AgentRuntime CR, we can drop them entirely and just return a simple string for the operator to log.

Add a reconcile-time check to the AgentRuntime controller that reads
network.operator.openshift.io/cluster on OpenShift clusters and surfaces
a NetworkReady status condition when OVN-Kubernetes is present but
routingViaHost is not configured. Without this setting, Istio ambient
mode's ztunnel cannot intercept pod-to-pod traffic, silently bypassing
mTLS and authorization policies.

The check is automatically enabled via CRD discovery at startup (same
pattern as TektonConfig) and only requires read-only RBAC on
operator.openshift.io/networks. No cluster infrastructure is mutated.

Closes: RHAIENG-5326
Spike: RHAIENG-4900

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Bobbins228 <mcampbel@redhat.com>
@Bobbins228 Bobbins228 force-pushed the feat/rhaieng-5326-ovn-network-check branch from 4db79bb to 8d864c9 on May 29, 2026 13:32
@Bobbins228
Contributor Author

Addressed both review comments:

  1. Moved check to startup: CheckOVNNetworkConfig now runs once in main.go right after NetworkOperatorCRDExists. No per-reconcile overhead, no per-CR conditions to maintain. The reconciler is untouched.

  2. Removed NetworkCheckResult struct: CheckOVNNetworkConfig returns a plain warning string (empty = no issue). ~350 lines removed from the PR.

RBAC narrowed to just get (single read at startup, no list/watch needed).
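
For reference, if config/rbac/role.yaml is generated from kubebuilder markers (an assumption; the PR text does not say how the role is maintained), the narrowed permission might be expressed along these lines. The package and the placeholder function are hypothetical; only the group, resource, and verb come from this PR.

```go
package main

// Read-only access to the cluster Network operator config. A single get at
// startup is enough, so list and watch are intentionally not requested.
// +kubebuilder:rbac:groups=operator.openshift.io,resources=networks,verbs=get

// setupNetworkCheck is a placeholder so the marker has a home in this sketch.
func setupNetworkCheck() {}
```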

Labels

None yet

Projects

Status: New /:ToDo

3 participants