Skip to content

HIVE-3131: Add ValidatingWebhookConfiguration for ClusterPool#2879

Merged
openshift-merge-bot[bot] merged 1 commit intoopenshift:masterfrom
2uasimojo:HIVE-3131/cp-vwhc
Mar 25, 2026
Merged

HIVE-3131: Add ValidatingWebhookConfiguration for ClusterPool#2879
openshift-merge-bot[bot] merged 1 commit intoopenshift:masterfrom
2uasimojo:HIVE-3131/cp-vwhc

Conversation

@2uasimojo
Copy link
Copy Markdown
Member

@2uasimojo 2uasimojo commented Mar 24, 2026

Turns out we've been running without this... forever, despite having plenty of code for it in our admission server.

Summary by CodeRabbit

  • New Features
    • Added a validation webhook for cluster pool resources. This webhook validates creation, update, and deletion operations through the Kubernetes admission control system. It ensures cluster pool configurations are validated before being accepted by the cluster.

Turns out we've been running without this... forever, despite having
plenty of code for it in our admission server.
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 24, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Mar 24, 2026

@2uasimojo: This pull request references HIVE-3131 which is a valid jira issue.

Details

In response to this:

Turns out we've been running without this... forever, despite having plenty of code for it in our admission server.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from dlom and jstuever March 24, 2026 16:29
@2uasimojo
Copy link
Copy Markdown
Member Author

/assign @suhanime

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

Adds a new Kubernetes ValidatingWebhookConfiguration for clusterpool resource validation. The webhook manifest is embedded as a binary asset and integrated into the HiveAdmission operator's deployment flow to validate CREATE, UPDATE, and DELETE operations on clusterpools.

Changes

Cohort / File(s) Summary
Webhook Manifest
config/hiveadmission/clusterpool-webhook.yaml
New ValidatingWebhookConfiguration resource that validates CREATE, UPDATE, and DELETE operations on clusterpools in the hive.openshift.io API group, communicating with the Kubernetes service in the default namespace.
Binary Asset Integration
pkg/operator/assets/bindata.go
Generated code updated to embed the new webhook manifest as a binary asset, adding accessor functions and registering the asset in the bindata index for runtime retrieval.
Operator Deployment
pkg/operator/hive/hiveadmission.go
Webhook asset manifest added to the webhookAssets list, enabling the HiveAdmission operator to deploy the new clusterpool validation webhook during reconciliation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A webhook hops into the cluster so bright,
Clusterpools guarded with validation might,
Create, update, delete—all checked with care,
One manifest file, zero bugs to repair! ✓

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a ValidatingWebhookConfiguration for ClusterPool, which matches the core modifications across all three files.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 24, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Mar 24, 2026

@2uasimojo: This pull request references HIVE-3131 which is a valid jira issue.

Details

In response to this:

Turns out we've been running without this... forever, despite having plenty of code for it in our admission server.

Summary by CodeRabbit

  • New Features
  • Added a validation webhook for cluster pool resources. This webhook validates creation, update, and deletion operations through the Kubernetes admission control system. It ensures cluster pool configurations are validated before being accepted by the cluster.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@2uasimojo
Copy link
Copy Markdown
Member Author

👀 e2e-pool...

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
config/hiveadmission/clusterpool-webhook.yaml (1)

8-9: Add v1 to admissionReviewVersions for forward compatibility.

The webhook only specifies v1beta1 for admission review versions. Modern Kubernetes best practice is to list v1 as the preferred version, optionally including v1beta1 for backward compatibility with older clients. The project targets Kubernetes 1.34, which fully supports v1 for admission review versions.

Suggested change
   admissionReviewVersions:
+  - v1
   - v1beta1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/hiveadmission/clusterpool-webhook.yaml` around lines 8 - 9, The
manifest currently lists only admissionReviewVersions: - v1beta1; update the
admissionReviewVersions array to include v1 (preferably first) for forward
compatibility with Kubernetes 1.34 while optionally keeping v1beta1 for older
clients — e.g., change the admissionReviewVersions entries to include "v1" and
"v1beta1" (reference the admissionReviewVersions key and the existing "v1beta1"
entry to locate the change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@config/hiveadmission/clusterpool-webhook.yaml`:
- Around line 8-9: The manifest currently lists only admissionReviewVersions: -
v1beta1; update the admissionReviewVersions array to include v1 (preferably
first) for forward compatibility with Kubernetes 1.34 while optionally keeping
v1beta1 for older clients — e.g., change the admissionReviewVersions entries to
include "v1" and "v1beta1" (reference the admissionReviewVersions key and the
existing "v1beta1" entry to locate the change).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8b095168-3632-47e6-b42e-228ac6494450

📥 Commits

Reviewing files that changed from the base of the PR and between 14bd5ee and f2e18e7.

📒 Files selected for processing (3)
  • config/hiveadmission/clusterpool-webhook.yaml
  • pkg/operator/assets/bindata.go
  • pkg/operator/hive/hiveadmission.go

@2uasimojo
Copy link
Copy Markdown
Member Author

@coderabbitai if possible, it sure would be helpful for you to hop on over to pkg/validating-webhooks/hive/v1/clusterpool_validating_admission_hook.go and trace the validations we're going to suddenly be performing as the result of this PR. Since we haven't been running this code in production, there is a chance it has drifted and that some of it is incorrect. We're less worried about missing checks at the moment -- after all, we've been missing everything up to now -- but rather, we're concerned that there may be checks in place that will reject valid configurations, thereby causing regressions in behavior in the field.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me pull up the validation hook file and any relevant types so I can trace through all the checks.


🏁 Script executed:

#!/bin/bash
# Read the main validation hook file
cat pkg/validating-webhooks/hive/v1/clusterpool_validating_admission_hook.go

echo "======= ClusterPool types ======="
fd -e go -g '*clusterpool*' --full-path

Length of output: 9519


The key function validateClusterPoolPlatform isn't defined in that file. Let me find it.


🏁 Script executed:

#!/bin/bash
echo "======= validateClusterPoolPlatform definition ======="
rg -n "func validateClusterPoolPlatform" --type go -A 80

echo ""
echo "======= ClusterPool API types ======="
fd -e go --full-path | xargs grep -l "type ClusterPool struct" 2>/dev/null

echo ""
echo "======= ClusterPoolPlatform type ======="
rg -n "type ClusterPoolPlatform" --type go -A 30

echo ""
echo "======= DNS1123LabelMaxLength value ======="
rg -n "DNS1123LabelMaxLength" --type go | head -5

echo ""
echo "======= Other validation helpers used in this webhook package ======="
rg -n "^func validate" pkg/validating-webhooks/hive/v1/ --type go

Length of output: 17638


🏁 Script executed:

#!/bin/bash
echo "======= validatePlatformConfiguration ======="
sed -n '476,590p' pkg/validating-webhooks/hive/v1/clusterdeployment_validating_admission_hook.go

echo ""
echo "======= ClusterPool types ======="
cat apis/hive/v1/clusterpool_types.go

echo ""
echo "======= Platform type (to understand what platforms ClusterPool can use) ======="
rg -n "type Platform struct" apis/ --type go -A 60

Length of output: 49493


@2uasimojo: Here's a detailed trace of all validations that will now be enforced. The call chain is:

validateCreate / validateUpdatevalidateClusterPoolPlatformvalidatePlatformConfiguration


CREATE checks

1. Name length (validateCreate only)
Names longer than 63 characters (DNS1123LabelMaxLength) are rejected. This is a straightforward, unambiguous check — no regression risk unless someone has existing pools with names > 63 chars (unlikely to be valid Kubernetes names anyway, since metadata.name already has this constraint at the API layer).

2. Platform configuration (all platforms below)

The validatePlatformConfiguration function enforces "exactly one platform must be specified" and per-platform field checks. Here's a breakdown by platform:

Platform Required fields enforced Regression concern?
AWS credentialsSecretRef.name OR credentialsAssumeRole (mutually exclusive), region Low — both paths are well-defined
Azure credentialsSecretRef.name, region Low
GCP credentialsSecretRef.name, region ⚠️ See below
OpenStack credentialsSecretRef.name, cloud; if certificatesSecretRef pointer is set, its name must be non-empty Low
vSphere credentialsSecretRef.name, certificatesSecretRef.name, vCenter, datacenter, defaultDatastore ⚠️ See below
IBMCloud credentialsSecretRef.name, region Low
Nutanix credentialsSecretRef.name, prismCentral.address, prismCentral.port (non-zero) Low
BareMetal / AgentBareMetal / None Only counted (no field validation) None

⚠️ Potential regression risks

GCP — credentials secret marked optional in the type:

In apis/hive/v1/gcp/platform.go, CredentialsSecretRef carries // +optional. This means any GCP ClusterPool using Workload Identity (without a credentials secret) would be rejected by the validator with "must specify secrets for GCP access". If any production GCP pools omit this field today, they'll fail to create or update after this PR.

// In gcp/platform.go:
// +optional
CredentialsSecretRef corev1.LocalObjectReference `json:"credentialsSecretRef"`

// In the validator — always required:
if gcp.CredentialsSecretRef.Name == "" {
    allErrs = append(allErrs, field.Required(..., "must specify secrets for GCP access"))
}

vSphere — certificatesSecretRef required but non-pointer:

The vsphere.Platform type has CertificatesSecretRef corev1.LocalObjectReference as a non-pointer, non-optional field, and the validator requires its .Name to be non-empty. If users have vSphere ClusterPools without this populated (relying on defaults/empty), they'll now fail. Worth checking whether the CRD OpenAPI schema already enforces this field.


UPDATE path — all CREATE platform checks re-run on every update

This is the subtler risk. validateUpdate calls validateClusterPoolPlatform on every update — there's no check like "only validate platform if platform changed." This means if any existing ClusterPool has a platform configuration that would fail the above checks (most notably GCP without credentialsSecretRef), any update to that pool — even a trivial change to spec.size or spec.labels — will be rejected.


Minor observation (semantic, not a regression risk)

The AWS mutual-exclusivity error uses field.Required with the message "cannot specify assume role when credentials secret is provided" — that's semantically incorrect (should be field.Invalid), but it only fires when both are specified, so it won't cause false rejections of valid configs.


Summary of what to verify before merging:

  1. Do any production GCP ClusterPools omit credentialsSecretRef (using Workload Identity)?
  2. Do any production vSphere ClusterPools omit certificatesSecretRef?
  3. The e2e-pool tests you're watching should cover basic CREATE/UPDATE paths — but if they only test AWS, the GCP gap could be missed.

@2uasimojo
Copy link
Copy Markdown
Member Author

Thanks, @coderabbitai. The paths you've flagged also exist for ClusterDeployment, which we have been validating, and all seem appropriate. I think we're good here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

Tip

For best results, initiate chat on the files or code changes.

@2uasimojo: That's reassuring — good to know those same constraints have been battle-tested via ClusterDeployment validation already. Thanks for the context, and good luck with the e2e-pool run! 🚀

@2uasimojo
Copy link
Copy Markdown
Member Author

@coderabbitai Meanwhile, I've opened https://redhat.atlassian.net/browse/HIVE-3132 to track adding v1 to our admissionReviewVersions (for all our ValidatingWebhookConfigurations, not just this one). We can ignore that issue for this PR.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

Tip

For best results, initiate chat on the files or code changes.

@2uasimojo: Sounds good — noted that HIVE-3132 is tracking the admissionReviewVersions: v1 upgrade across all webhooks as a separate effort. I'll leave it out of scope for this PR. Good luck with the e2e run! 🙌

@2uasimojo
Copy link
Copy Markdown
Member Author

/hold for QE

/cc @jianping-shu

@openshift-ci openshift-ci bot requested a review from jianping-shu March 24, 2026 17:57
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.29%. Comparing base (14bd5ee) to head (f2e18e7).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
pkg/operator/assets/bindata.go 0.00% 9 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2879      +/-   ##
==========================================
- Coverage   50.31%   50.29%   -0.02%     
==========================================
  Files         280      280              
  Lines       34314    34323       +9     
==========================================
  Hits        17264    17264              
- Misses      15689    15698       +9     
  Partials     1361     1361              
Files with missing lines Coverage Δ
pkg/operator/hive/hiveadmission.go 0.00% <ø> (ø)
pkg/operator/assets/bindata.go 0.00% <0.00%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jianping-shu
Copy link
Copy Markdown
Contributor

@2uasimojo ClusterPool shares the real validation functions with ClusterDeployment so I only ran a lightweight test with test case OCP-88412, PTAL

@2uasimojo
Copy link
Copy Markdown
Member Author

Thanks @jianping-shu. Based on the error you saw from the vSphere test (spec.platform.vsphere.vCenter: Required) I assume you ran this on a build without #2851?

/hold cancel

/assign @dlom

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 25, 2026
@dlom
Copy link
Copy Markdown
Contributor

dlom commented Mar 25, 2026

From a code maintenance standpoint, this change doesn't introduce any new business logic; we're just calling into code we already have. I imagine there could be repercussions down the line (people potentially relying on malformed cluster pool specs? unlikely tbh), but we need to have this for completeness.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 25, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Mar 25, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 2uasimojo, dlom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Mar 25, 2026

@2uasimojo: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 8296ffc into openshift:master Mar 25, 2026
25 checks passed
@2uasimojo 2uasimojo deleted the HIVE-3131/cp-vwhc branch March 25, 2026 22:20
@jianping-shu
Copy link
Copy Markdown
Contributor

Based on the error you saw from the vSphere test (spec.platform.vsphere.vCenter: Required) I assume you ran this on a build without #2851?

@2uasimojo Yes

2uasimojo added a commit to 2uasimojo/hive that referenced this pull request Mar 26, 2026
This was missed in openshift#1672.

Side quest: Dropping `DELETE` from the ClusterPool VWHC added in openshift#2879.
It didn't have functional impact, but would have been wastefully sending
Delete requests to our hiveadmission service.

Also adding some coderabbit config intended to catch this kind of miss
in the future.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants