Skip to content

[DNM][main] Rework webhook TLS and RBAC: replace dynamiclistener, bake WebhookConfigurations#1502

Draft
crobby wants to merge 8 commits into
rancher:mainfrom
crobby:webhookconfig-rework
Draft

[DNM][main] Rework webhook TLS and RBAC: replace dynamiclistener, bake WebhookConfigurations#1502
crobby wants to merge 8 commits into
rancher:mainfrom
crobby:webhookconfig-rework

Conversation

@crobby
Copy link
Copy Markdown
Collaborator

@crobby crobby commented May 26, 2026

Problem

The webhook currently uses dynamiclistener for TLS certificate management and ships with a cluster-admin ClusterRoleBinding. This is more privilege than necessary and dynamiclistener adds complexity that can be replaced with simpler file-based TLS.

Solution

  • Replace dynamiclistener + secretHandler with file-based TLS
  • Bake WebhookConfigurations and needacert annotation into the chart (instead of runtime generation)
  • Narrow RBAC from cluster-admin to an explicit ClusterRole with only required permissions
  • Start TLS server in a goroutine so informers can start independently
  • Make securityContext unconditional (only NET_BIND_SERVICE capability is conditional)
  • Populate caBundle from TLS secret after helm upgrade

Related PRs

Testing

  • Webhook starts and serves TLS with file-based certs
  • caBundle populated correctly after helm upgrade
  • RBAC scoped correctly — webhook can perform all required operations
  • securityContext applied unconditionally
  • No regression in admission webhook behavior

crobby added 8 commits May 26, 2026 16:56
Remove leader election, secretHandler, ensureWebhookConfiguration, and
dynamiclistener dependency. The webhook now reads serving certs from
mounted files (/tmp/k8s-webhook-server/serving-certs/) populated by
needacert via a projected Secret volume. Cert rotation is handled by
re-reading the files on each TLS handshake. WebhookConfiguration
ownership has moved to the rancher-webhook Helm chart.
NewErrorChecker initializes with a not-ready error. The old
secretHandler.sync() cleared it; with that gone the health endpoint
returned 500 permanently. Clear the error after the serving cert is
successfully loaded, just before ListenAndServeTLS.
Full ValidatingWebhookConfiguration (31 entries) and
MutatingWebhookConfiguration (9 entries) with per-entry failurePolicy
preserved. MCM-only entries gated on .Values.mcm.enabled. Service
annotated for needacert, deployment mounts cattle-webhook-tls secret
at /tmp/k8s-webhook-server/serving-certs.
Replaces the cluster-admin ClusterRoleBinding with a scoped
ClusterRole + renamed ClusterRoleBinding (rancher-webhook-binding).
Helm will prune the old rancher-webhook CRB on upgrade since roleRef
is immutable. Enumerates built-in rke-machine types; custom node
drivers will need a chart update to add their machine resource.
ListenAndServeTLS blocks, so clients.Start (which starts informer
caches) never ran. Validators that read from caches (cluster lookups,
RBAC checks, etc.) silently returned empty results. Move the listener
into a goroutine, start clients after, and block on ctx.Done.
@crobby crobby requested a review from a team as a code owner May 26, 2026 20:57
@crobby crobby changed the title Rework webhook TLS and RBAC: replace dynamiclistener, bake WebhookConfigurations [DNM][main] Rework webhook TLS and RBAC: replace dynamiclistener, bake WebhookConfigurations May 26, 2026
@crobby crobby marked this pull request as draft May 26, 2026 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant