Skip to content

feat: configurable data-plane Service shape via GatewayClassParameters (#70)#71

Open
oysteins10 wants to merge 17 commits into
varnish:mainfrom
oysteins10:feat/configurable-service-shape
Open

feat: configurable data-plane Service shape via GatewayClassParameters (#70)#71
oysteins10 wants to merge 17 commits into
varnish:mainfrom
oysteins10:feat/configurable-service-shape

Conversation

@oysteins10
Copy link
Copy Markdown

@oysteins10 oysteins10 commented May 25, 2026

Summary

Fixes #70 — exposes the data-plane Service shape through GatewayClassParameters.spec.service, unblocking gateway-behind-gateway, bare-metal MetalLB, and internal-cache deployment patterns.

  • New ServiceConfig CRD type with six fields: type, annotations, labels, loadBalancerClass, loadBalancerSourceRanges, externalTrafficPolicy
  • CEL admission-time validation rules enforce field-type compatibility (e.g., loadBalancerClass only with type: LoadBalancer)
  • Per-Gateway overlay for labels and annotations via the upstream Gateway.spec.infrastructure field; other knobs stay GatewayClass-level
  • Sentinel-based merger (varnish.io/managed-{annotations,labels}) preserves annotations added by cloud controllers (AWS LB Controller, MetalLB, etc.) across reconciles
  • Service-shape changes do not trigger pod restarts (Service decoupled from infra hash; locked by reflection-based regression test)
  • Backwards compatible: a GatewayClassParameters with no service field produces an identical Type: LoadBalancer Service to before

Architecture

  • Pure resolveServiceConfig(gateway, params) layers defaults → GatewayClass config → Gateway.spec.infrastructure overlay for labels/annotations
  • buildService consumes the resolved config; Spec.Selector is built only from controller-managed labels (user labels can never break pod selection)
  • needsServiceUpdate extended to detect drift in all owned fields plus the managed subset of annotations/labels; cloud-controller-added keys are ignored
  • Update path re-merges annotations/labels via the sentinel; unconditionally re-applies controller-managed labels as a defense against hand-edited Services

Test plan

  • Unit tests: 9 resolver, 10 merger, 10 buildService, 14 needsServiceUpdate cases, 1 controller-managed-labels invariant, 1 infra-hash regression (reflection)
  • envtest scenarios: defaults to LoadBalancer, type transition (LB → ClusterIP preserves ports), cloud-controller annotation preservation
  • make build-go, make test-go, make test-envtest, make vet, make verify-manifests all green
  • Conformance suite (make test-conformance) not run — requires a live cluster. Recommend running before merge.
  • Manual e2e worth doing pre-merge on a real cluster:
    • LoadBalancer → ClusterIP → LoadBalancer cycle preserves clusterIP and re-populates loadBalancer.ingress
    • ClusterIP gateway reachable via in-cluster DNS
    • MetalLB-annotated Service receives its requested IP

Notes

  • Pre-existing make staticcheck failures (deprecation warnings for Requeue and fake.NewSimpleClientset) are in untouched files; none of the new/modified files trigger any lint findings.
  • The plan and design spec used to drive the work are intentionally not part of this PR (kept as local working notes).

🤖 Generated with Claude Code

oysteins10 and others added 16 commits May 25, 2026 16:32
Pure function resolving GatewayClassParameters.spec.service + Gateway.spec.infrastructure
into a fully-defaulted ResolvedServiceConfig. Defaults Type to LoadBalancer when omitted;
overlays Gateway-level annotations/labels on top of class-level defaults without aliasing inputs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l merger

Implements the sentinel-based merge function that writes desired keys onto
the existing map, prunes previously-managed keys no longer in desired,
leaves unmanaged external keys untouched, and enforces protected-key policy.
Eight tests cover first-apply, drift-add, drift-remove, external key
preservation, empty desired pruning, sorted sentinel, protected key dropping,
and nil existing map handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire the ResolvedServiceConfig produced by resolveServiceConfig through
buildService: service type, LoadBalancer knobs, ExternalTrafficPolicy,
and user annotations/labels (via managed-keys sentinel). Selector is
built from controller-managed labels only so user labels can never break
pod selection. Update all five call sites (4 in resources_test, 1 in
gateway_controller_test) and add a placeholder in reconcileResources
(Task 6 replaces with real resolver invocation). Add controllerManagedLabelKeys
helper and 5 new TestBuildService_* tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce shape

Extend needsServiceUpdate to compare Type, LoadBalancerClass,
LoadBalancerSourceRanges, ExternalTrafficPolicy, and managed
annotations/labels (via sentinel) in addition to ports. Add
stringPtrEqual, stringSliceEqual, and managedMapEqual helpers.

Replace the Service update branch in reconcileResource to apply all
operator-owned fields when drift is detected, using desiredAnnotationsFromSentinel,
desiredLabelsFromSentinel, and buildLabelsForService helpers to
preserve cloud-controller-managed state during updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hoist params out of the short-variable-declaration if-block so it is in
scope at the buildService call site, then replace the hardcoded
LoadBalancer default with resolveServiceConfig(gateway, params). The
resolver is nil-safe, so gateways without GatewayClassParameters
continue to produce an identical LoadBalancer Service.

Add TestInfraHash_ServiceConfigDoesNotAffectHash to pin the design
invariant that Service shape (Type, annotations, labels, LB knobs) is
deliberately excluded from InfrastructureConfig and must never trigger
pod restarts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@oysteins10
Copy link
Copy Markdown
Author

Conformance run (local, k3d)

Ran make test-conformance against a k3d v5.7.4 cluster (k3s v1.30.4) with Gateway API v1.5.0 CRDs and MetalLB, operator built from this branch (VERSION=k3d).

Profile Result Passed Failed Skipped
Core success 33 0 0
Extended success 9 0 0

Profile: GATEWAY-HTTP. 27 test functions PASS, 37 SKIP (TLSRoute, UDPRoute, ListenerSet, Timeout, Mirror — all intentionally unsupported by this operator), 0 FAIL. Runtime 129s.

🤖 Generated with Claude Code

@codecov
Copy link
Copy Markdown

codecov Bot commented May 25, 2026

Codecov Report

❌ Patch coverage is 88.54167% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.77%. Comparing base (dc332ae) to head (b6e6709).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/controller/gateway_controller.go 80.00% 15 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #71      +/-   ##
==========================================
+ Coverage   68.02%   68.77%   +0.74%     
==========================================
  Files          39       40       +1     
  Lines        6096     6267     +171     
==========================================
+ Hits         4147     4310     +163     
- Misses       1618     1620       +2     
- Partials      331      337       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Three findings from review of the configurable Service shape feature.

1. Labels sentinel was written into .metadata.labels with a
   comma-separated key list. Commas and slashes are not valid in
   Kubernetes label values, so the API server rejected any Service
   produced by a config with >= 2 user labels or any DNS-prefixed
   label key (e.g. app.kubernetes.io/component). The Gateway
   reconcile failed every loop. Both sentinels now live in
   .metadata.annotations; the user-supplied label values still live
   in .Labels and are looked up via the annotation-borne key list.

2. The Service update path replaced Spec.Ports wholesale with the
   desired slice, which has NodePort: 0. On every reconcile for a
   NodePort/LoadBalancer Service the API server would reallocate node
   ports, disconnecting clients connected via the old allocation.
   Added mergeServicePorts to carry existing NodePort allocations
   over by port name.

3. Protected-label collisions were silently dropped despite the API
   doc claiming a log warning. buildService now logs a Warn entry
   per collision, naming the gateway and the dropped key.

Tests:
- TestBuildService_MultipleLabels_SentinelIsValidLabelValue
- TestServiceShape_MultipleLabels_Accepted_Envtest (would fail with
  API-server "Invalid value" prior to the fix)
- TestMergeServicePorts_* (preserve, desired-wins, renamed)

Co-Authored-By: Claude <noreply@anthropic.com>
@oysteins10
Copy link
Copy Markdown
Author

@perbu hva er status her?

@perbu
Copy link
Copy Markdown
Contributor

perbu commented May 29, 2026

@oysteins10 I've cleaned up the PR, it is ready to merged. This is a not a bug fix, so it will require a major release. I have a couple of bugs I wanna squash in this cycle before I merge this.

@oysteins10
Copy link
Copy Markdown
Author

Makes sense, crd changes should be major 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GatewayClassParameters: configurable Service shape (type, annotations) for internal / behind-another-L7 deployments

2 participants