feat(e2e): support running e2e tests on real OCP clusters with Prometheus alert validation #205
rlobillo wants to merge 4 commits into
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
❌ Generated Files Verification Failed: one or more generated files in this PR are out of sync. Please regenerate the files locally and commit the changes.
Force-pushed from ddf1ff8 to 2aa2c76.
/hold Tests need to be updated once #207 is available in a build.
Add `make run-e2e` target that runs ginkgo directly against whatever cluster KUBECONFIG points to, without the Kind setup/teardown cycle.

Adapt test setup to be non-destructive on real clusters:
- `drift_test.go` and `anti_thrashing_e2e_test.go` use `ensureHCOExists()` and `patchAutopilotAndWait()` instead of deleting/recreating HCO
- `crd_lifecycle_test.go` skips automatically on OCP (detected via ClusterVersion CRD presence), since CRD creation/deletion is only meaningful on Kind
- Add `isOpenShiftCluster()` helper to `helpers_test.go`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
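The OCP detection and non-destructive setup described in this commit can be sketched as follows. This is a minimal illustration, not the PR's code: `crdChecker`, `objectStore`, `ensureExists`, and `fakeCluster` are hypothetical stand-ins for the real client calls in `helpers_test.go`, and `kubevirt-hyperconverged` is used only as an illustrative HCO name. The one fixed fact it relies on is that OpenShift's `ClusterVersion` resource is served by the `clusterversions.config.openshift.io` CRD, which Kind clusters lack.

```go
package main

import "fmt"

// clusterVersionCRD backs OpenShift's ClusterVersion resource; Kind clusters do not have it.
const clusterVersionCRD = "clusterversions.config.openshift.io"

// crdChecker is a hypothetical abstraction over the cluster client; a real
// helper would query the apiextensions API instead.
type crdChecker interface {
	CRDExists(name string) (bool, error)
}

// isOpenShiftCluster treats the cluster as OCP when the ClusterVersion CRD exists.
func isOpenShiftCluster(c crdChecker) (bool, error) {
	return c.CRDExists(clusterVersionCRD)
}

// objectStore is a hypothetical stand-in for the API calls behind ensureHCOExists().
type objectStore interface {
	Exists(name string) (bool, error)
	Create(name string) error
}

// ensureExists creates the object only when it is missing and never deletes an
// existing one, so a pre-provisioned HCO on a real cluster is left intact.
// It reports whether it had to create the object.
func ensureExists(s objectStore, name string) (bool, error) {
	found, err := s.Exists(name)
	if err != nil {
		return false, fmt.Errorf("checking %s: %w", name, err)
	}
	if found {
		return false, nil
	}
	if err := s.Create(name); err != nil {
		return false, fmt.Errorf("creating %s: %w", name, err)
	}
	return true, nil
}

// fakeCluster is an in-memory implementation of both interfaces for demonstration.
type fakeCluster struct {
	crds    map[string]bool
	objects map[string]bool
}

func (f *fakeCluster) CRDExists(name string) (bool, error) { return f.crds[name], nil }
func (f *fakeCluster) Exists(name string) (bool, error)    { return f.objects[name], nil }
func (f *fakeCluster) Create(name string) error {
	f.objects[name] = true
	return nil
}

func main() {
	ocp := &fakeCluster{
		crds:    map[string]bool{clusterVersionCRD: true},
		objects: map[string]bool{"kubevirt-hyperconverged": true},
	}
	isOCP, err := isOpenShiftCluster(ocp)
	if err != nil {
		panic(err)
	}
	if isOCP {
		// In a Ginkgo suite, CRD-lifecycle specs would call Skip() here.
		fmt.Println("OpenShift detected: skipping CRD lifecycle tests")
	}
	created, err := ensureExists(ocp, "kubevirt-hyperconverged")
	if err != nil {
		panic(err)
	}
	fmt.Println("HCO created:", created) // false: the existing HCO is reused
}
```

Detecting OCP by CRD presence avoids depending on cluster-specific names or versions; the trade-off is that any cluster with that CRD installed would be treated as OpenShift.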
…, VirtPlatformDependencyMissing)

Table-driven tests for VirtPlatformSyncFailed using blocking webhooks to trigger SSA failures, with a CNV-89450 workaround (delete the resource to bypass dry-run). Passive test for VirtPlatformDependencyMissing when optional CRDs are absent. Tests skip on Kind (no Prometheus) and when gate CRDs are missing.

Includes helpers for Prometheus querying via the thanos-querier route with an SA token, PrometheusRule patching, and a metrics infrastructure readiness check (CNV-89454).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `unparam` linter flagged that the `namespace` parameter always receives `operatorNamespace`. Use the package-level variable directly instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from f2e3f9f to c90c024.
The blocking webhook only intercepted Create operations, but SSA dry-run uses Patch (Update). With the CNV-89450 fix now setting compliance_status=0 on dry-run failure, the webhook must also block Update so DetectDrift() errors out and the VirtPlatformSyncFailed alert fires. This also removes the old workaround of deleting the resource.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All tests are passing on a real OCP cluster.
Short Description
Enable running e2e tests against real OCP clusters (not just Kind) and add Prometheus alert tests for `VirtPlatformSyncFailed` and `VirtPlatformDependencyMissing`.

More details
The existing e2e suite was designed for Kind only. This PR adapts the test lifecycle to be non-destructive on real clusters and adds a new `make run-e2e-tests-only` target that runs Ginkgo directly against whatever cluster `KUBECONFIG` points to, producing JUnit XML and JSON reports in `_output/`.

Two new Prometheus alert tests are added:
- `VirtPlatformSyncFailed`: creates a `ValidatingWebhookConfiguration` that blocks SSA `Create` and `Update` for each managed asset, then triggers reconciliation so the SSA dry-run fails and the CNV-89450 fix sets `compliance_status=0`. Asserts that the alert fires with the correct labels (`kind`, `name`, `severity: critical`, `operator: virt-platform-autopilot`). Assets whose CRD or gate CRD is missing are skipped automatically.
- `VirtPlatformDependencyMissing`: checks the `/metrics` endpoint for `kubevirt_autopilot_missing_dependency == 1`, then verifies that a `warning` alert fires for each missing optional CRD individually.

What this PR does / why we need it
Commit 1:
`feat(e2e): support running e2e tests against real OCP clusters`
- New `make run-e2e-tests-only` target (runs Ginkgo with JUnit/JSON reports)
- `drift_test.go` and `anti_thrashing_e2e_test.go` now use `ensureHCOExists()`/`patchAutopilotAndWait()` instead of deleting/recreating HCO (non-destructive on real clusters)
- `crd_lifecycle_test.go` skips on OCP (CRD creation/deletion is only meaningful on Kind)
- `isOpenShiftCluster()` helper (detects OCP via ClusterVersion CRD presence)

Commit 2:
`feat(e2e): add Prometheus alert tests for OCP`
- `VirtPlatformSyncFailed` tests for all phase-1 assets (MachineConfig, KubeletConfig, NodeHealthCheck, UIPlugin, KubeDescheduler)
- `queryFiringAlert()` returns `map[string]string` labels using PromQL label filters, with concise logs (`"not firing yet"` / `"firing — kind=X name=Y severity=Z"`)
- `queryMetricExists()` with clean logs (`"not found yet"` / `"found (N series)"`)
- `touchHCO()` triggers reconciliation before the metrics wait (handles idle pods with no recent metrics)
- `getMissingDependenciesFromMetrics()` parses `/metrics` for individual missing dependencies
- `webhookCreated` flag guards `AfterEach` cleanup (prevents timeouts on skipped/failed tests)
- `for` durations reduced to 15s for faster test feedback

Commit 3:
`fix(e2e): remove unused namespace param from captureAssetMetrics`
- Fix `unparam` lint: the `captureAssetMetrics` `namespace` parameter always received `operatorNamespace`; use the package-level variable directly

Commit 4:
`fix(e2e): block Update ops in webhook to trigger SSA dry-run failure`
- The webhook only intercepted `Create` operations, but SSA dry-run uses `Patch` (`Update`). With the CNV-89450 fix now setting `compliance_status=0` on dry-run failure, the webhook must also block `Update` so `DetectDrift()` errors out and the `VirtPlatformSyncFailed` alert fires.

🤖 Generated with Claude Code