Skip to content

[CCXDEV-16390] [Proposal] Add KinD scripts for running DVO e2e tests locally#676

Draft
ikerreyes wants to merge 1 commit into
app-sre:masterfrom
ikerreyes:kind-cluster
Draft

[CCXDEV-16390] [Proposal] Add KinD scripts for running DVO e2e tests locally#676
ikerreyes wants to merge 1 commit into
app-sre:masterfrom
ikerreyes:kind-cluster

Conversation

@ikerreyes

@ikerreyes ikerreyes commented May 4, 2026

Copy link
Copy Markdown

Summary

Add scripts to create a KinD cluster, deploy the DVO operator, and run the
existing e2e test suite without requiring a full OpenShift cluster.

  • kind/config.yaml: KinD cluster config with NodePort mapping for metrics
  • kind/setup.sh: creates cluster, builds and loads images, deploys DVO
  • kind/run-tests.sh: runs tests from the test container image
  • kind/teardown.sh: deletes the cluster
  • kind/README.md: usage and configuration documentation

Usage:

./kind/setup.sh
./kind/run-tests.sh
./kind/teardown.sh

Dependencies

This MR requires the corresponding changes in:

  • ccx-ocp-testlib: Fix Kubernetes compatibility issues for KinD testing (see MR)
  • deployment-validation-operator-tests: Support running tests on vanilla Kubernetes (see MR)

How to test locally

The test container image (quay.io/redhatqe/deployment-validation-operator-tests:latest)
does not include the dependency changes yet. To test end-to-end, build a local
test image that overlays the unreleased changes.

1. Clone the repos side by side

parent/
├── ccx-ocp-testlib/                       # with the KinD compatibility branch
├── deployment-validation-operator/        # this MR
└── deployment-validation-operator-tests/  # with the vanilla K8s support branch

2. Build a local test image

A kind/Dockerfile.tests is provided for this purpose:

FROM quay.io/redhatqe/deployment-validation-operator-tests:latest

USER 0

COPY ccx-ocp-testlib /tmp/ccx-ocp-testlib
COPY deployment-validation-operator-tests/tests /insights-operator-tests/tests
COPY deployment-validation-operator-tests/shared /insights-operator-tests/shared

RUN pip install --no-cache-dir -e /tmp/ccx-ocp-testlib

USER 1001

Build from the parent directory so all repos are in the build context:

cd parent/
podman build -f deployment-validation-operator/kind/Dockerfile.tests \
    -t localhost/dvo-tests:dev .

3. Run the setup and tests

cd deployment-validation-operator/
./kind/setup.sh
./kind/run-tests.sh --image localhost/dvo-tests:dev

4. Clean up

./kind/teardown.sh

Assisted-by: Claude Opus 4.6

Summary by CodeRabbit

  • Documentation

    • Added guide for running the test suite on a local KinD cluster with prerequisites and configurable options.
  • Chores

    • Added automation scripts for KinD cluster provisioning, test execution, and cleanup with environment variable support.

Add scripts to create a KinD cluster, deploy the DVO operator, and run
the existing e2e test suite without requiring a full OpenShift cluster.

- kind/config.yaml: KinD cluster config with NodePort mapping for metrics
- kind/setup.sh: creates cluster, builds and loads images, deploys DVO
- kind/run-tests.sh: runs tests from the test container image
- kind/teardown.sh: deletes the cluster
- kind/README.md: usage and configuration documentation

Usage:
  ./kind/setup.sh
  ./kind/run-tests.sh
  ./kind/teardown.sh

Assisted-by: Claude Opus 4.6
@openshift-ci openshift-ci Bot requested review from ncaak and opokornyy May 4, 2026 11:49
@openshift-ci

openshift-ci Bot commented May 4, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ikerreyes
Once this PR has been reviewed and has the lgtm label, please assign ncaak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci Bot commented May 4, 2026

Copy link
Copy Markdown

Hi @ikerreyes. Thanks for your PR.

I'm waiting for a app-sre member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai Bot commented May 4, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

This PR introduces KinD (Kubernetes-in-Docker) infrastructure and scripting to enable local end-to-end testing of the deployment-validation-operator without an OpenShift cluster. It includes cluster configuration, provisioning, test execution, teardown scripts, and supporting documentation.

Changes

KinD Local Testing Infrastructure

Layer / File(s) Summary
Infrastructure Configuration
kind/config.yaml
KinD cluster definition with a single control-plane node and TCP port mapping (30383 → 8383) for metrics access.
Cluster Setup & Provisioning
kind/setup.sh
Provisions KinD cluster, builds and loads operator and HTTPD images, applies RBAC/config manifests with namespace patching, deploys operator with KinD-specific adaptations (NodePort metrics service, removal of node selectors/tolerations/affinity), and waits for operator readiness.
Test Execution
kind/run-tests.sh
Runs test suite from a container image with temporary kubeconfig injection, environment variable configuration for metrics URL and HTTPD image, and support for custom test image and pytest argument passthrough.
Cluster Teardown
kind/teardown.sh
Deletes the KinD cluster by name, with configurable cluster name via environment variable.
Documentation
kind/README.md
Documents prerequisites, three-step usage flow, environment variables, script responsibilities, and KinD-specific adaptations (NodePort Routes, manifest stripping).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A cluster hops into place, so light and small,
KinD brings the tests without OpenShift's call!
Setup, run, teardown—bash scripts dancing free,
Local validation ops, fast as can be! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding KinD scripts to enable local e2e testing without full OpenShift infrastructure.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
kind/run-tests.sh (1)

8-8: ⚡ Quick win

DVO_METRICS_URL is hardcoded and not env-overridable.

All other configurable values in this script are derived from environment variables with defaults (e.g., CLUSTER_NAME, TEST_IMAGE, HTTPD_IMAGE). DVO_METRICS_URL is the sole exception. If someone changes hostPort in kind/config.yaml (e.g., because 8383 is already in use), they must also edit this script.

♻️ Proposed fix
-DVO_METRICS_URL="http://localhost:8383"
+DVO_METRICS_URL="${DVO_METRICS_URL:-http://localhost:8383}"

Also add a row to the README env-var table:

| `DVO_METRICS_URL` | `http://localhost:8383` | Metrics endpoint for the test suite |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@kind/run-tests.sh` at line 8, Replace the hardcoded DVO_METRICS_URL
assignment in kind/run-tests.sh with an environment-overridable default (use the
existing pattern used for CLUSTER_NAME/TEST_IMAGE/HTTPD_IMAGE so DVO_METRICS_URL
can be overridden when needed) and update any references to use that variable
(look for DVO_METRICS_URL in the script). Also add a row to the README env-var
table documenting DVO_METRICS_URL with default "http://localhost:8383" and a
short description like "Metrics endpoint for the test suite".
kind/setup.sh (1)

79-91: Consider folding the "2m" timeout substitution into the Python YAML transform.

The sed pattern s|value: "2m"|value: "10s"| (line 81) is unanchored — it relies on a single exact string match rather than targeting a specific field. While only one env var (VALIDATION_CHECK_INTERVAL) currently holds this value, the pattern remains fragile for future maintenance. Since the Python YAML transform already processes the document, the change can be done precisely there by targeting the specific env var by name.

♻️ Proposed refactor — move timeout change into the Python block
 sed -e "s|quay.io/deployment-validation-operator/dv-operator:latest|${DVO_IMAGE}|" \
     -e "s|imagePullPolicy: Always|imagePullPolicy: IfNotPresent|" \
-    -e 's|value: "2m"|value: "10s"|' \
     "${DEPLOY_DIR}/operator.yaml" | \
     python3 -c "
 import sys, yaml
 doc = yaml.safe_load(sys.stdin)
 spec = doc['spec']['template']['spec']
 spec.pop('nodeSelector', None)
 spec.pop('tolerations', None)
 spec.pop('affinity', None)
+for env in spec.get('containers', [{}])[0].get('env', []):
+    if env.get('name') == 'VALIDATION_CHECK_INTERVAL' and env.get('value') == '2m':
+        env['value'] = '10s'
 yaml.dump(doc, sys.stdout, default_flow_style=False)
 " | kubectl apply -f - -n "${DVO_NAMESPACE}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@kind/setup.sh` around lines 79 - 91, The sed-based substitution for value:
"2m" is brittle; remove the sed pattern and instead update the Python YAML
transform to find the container env list in the document and change the env var
named VALIDATION_CHECK_INTERVAL from "2m" to "10s" (e.g., locate
doc['spec']['template']['spec'] → containers → env and mutate the entry where
env['name']=="VALIDATION_CHECK_INTERVAL"). Also keep the other edits (image and
pullPolicy) in sed, but delete the unanchored s|value: "2m"|...|; ensure the
Python block writes the modified YAML to stdout and preserves existing
spec.pop(...) removals (nodeSelector, tolerations, affinity).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@kind/run-tests.sh`:
- Around line 53-69: The temporary kubeconfig created by KUBECONFIG_TMP via
mktemp can leak if any command (kind get kubeconfig, chmod, or the
${CONTAINER_ENGINE} run) fails; add a trap to ensure cleanup on any exit.
Specifically, define KUBECONFIG_TMP="$(mktemp)", then immediately register a
trap like trap 'rm -f "${KUBECONFIG_TMP}"' EXIT (or a function that removes the
file) before calling kind get kubeconfig and before running ${CONTAINER_ENGINE}
run, so rm -f "${KUBECONFIG_TMP}" is executed on normal exit and on errors; keep
the existing final rm -f as a no-op fallback.

In `@kind/teardown.sh`:
- Around line 6-8: The teardown script currently calls kind delete cluster
--name "${CLUSTER_NAME}" unconditionally which fails if the cluster doesn't
exist; modify teardown.sh to first check for the cluster's existence (e.g.,
using kind get clusters or kind get cluster --name "${CLUSTER_NAME}" and test
its exit/status) and only run kind delete cluster --name "${CLUSTER_NAME}" when
the cluster is present, mirroring the guard logic used in setup.sh so the script
doesn't exit non-zero under set -euo pipefail when the cluster was never
created.

---

Nitpick comments:
In `@kind/run-tests.sh`:
- Line 8: Replace the hardcoded DVO_METRICS_URL assignment in kind/run-tests.sh
with an environment-overridable default (use the existing pattern used for
CLUSTER_NAME/TEST_IMAGE/HTTPD_IMAGE so DVO_METRICS_URL can be overridden when
needed) and update any references to use that variable (look for DVO_METRICS_URL
in the script). Also add a row to the README env-var table documenting
DVO_METRICS_URL with default "http://localhost:8383" and a short description
like "Metrics endpoint for the test suite".

In `@kind/setup.sh`:
- Around line 79-91: The sed-based substitution for value: "2m" is brittle;
remove the sed pattern and instead update the Python YAML transform to find the
container env list in the document and change the env var named
VALIDATION_CHECK_INTERVAL from "2m" to "10s" (e.g., locate
doc['spec']['template']['spec'] → containers → env and mutate the entry where
env['name']=="VALIDATION_CHECK_INTERVAL"). Also keep the other edits (image and
pullPolicy) in sed, but delete the unanchored s|value: "2m"|...|; ensure the
Python block writes the modified YAML to stdout and preserves existing
spec.pop(...) removals (nodeSelector, tolerations, affinity).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Enterprise

Run ID: 07ffe223-4fea-4159-9362-42c529f05370

📥 Commits

Reviewing files that changed from the base of the PR and between 534fb89 and df4310d.

📒 Files selected for processing (5)
  • kind/README.md
  • kind/config.yaml
  • kind/run-tests.sh
  • kind/setup.sh
  • kind/teardown.sh

Comment thread kind/run-tests.sh
Comment on lines +53 to +69
KUBECONFIG_TMP="$(mktemp)"
kind get kubeconfig --name "${CLUSTER_NAME}" > "${KUBECONFIG_TMP}"
chmod 644 "${KUBECONFIG_TMP}"

echo "==> Running tests from image: ${TEST_IMAGE}"
echo " Metrics: ${DVO_METRICS_URL}"

${CONTAINER_ENGINE} run --rm \
--network host \
-v "${KUBECONFIG_TMP}:/tmp/kubeconfig:Z" \
-e KUBECONFIG=/tmp/kubeconfig \
-e DVO_METRICS_URL="${DVO_METRICS_URL}" \
-e HTTPD_TEST_IMAGE="${HTTPD_IMAGE}" \
"${TEST_IMAGE}" \
pytest.sh "${PYTEST_ARGS[@]}"

rm -f "${KUBECONFIG_TMP}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Temp kubeconfig is leaked when any subsequent command fails.

mktemp creates the file at line 53, but set -euo pipefail means the rm -f at line 69 is never reached if kind get kubeconfig, the chmod, or the ${CONTAINER_ENGINE} run exits non-zero. The file persists on disk indefinitely. Use trap to guarantee cleanup on any exit path.

🛡️ Proposed fix
 KUBECONFIG_TMP="$(mktemp)"
+trap 'rm -f "${KUBECONFIG_TMP}"' EXIT
 kind get kubeconfig --name "${CLUSTER_NAME}" > "${KUBECONFIG_TMP}"
 chmod 644 "${KUBECONFIG_TMP}"
 
 # ... rest of the script ...
 
-rm -f "${KUBECONFIG_TMP}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
KUBECONFIG_TMP="$(mktemp)"
kind get kubeconfig --name "${CLUSTER_NAME}" > "${KUBECONFIG_TMP}"
chmod 644 "${KUBECONFIG_TMP}"
echo "==> Running tests from image: ${TEST_IMAGE}"
echo " Metrics: ${DVO_METRICS_URL}"
${CONTAINER_ENGINE} run --rm \
--network host \
-v "${KUBECONFIG_TMP}:/tmp/kubeconfig:Z" \
-e KUBECONFIG=/tmp/kubeconfig \
-e DVO_METRICS_URL="${DVO_METRICS_URL}" \
-e HTTPD_TEST_IMAGE="${HTTPD_IMAGE}" \
"${TEST_IMAGE}" \
pytest.sh "${PYTEST_ARGS[@]}"
rm -f "${KUBECONFIG_TMP}"
KUBECONFIG_TMP="$(mktemp)"
trap 'rm -f "${KUBECONFIG_TMP}"' EXIT
kind get kubeconfig --name "${CLUSTER_NAME}" > "${KUBECONFIG_TMP}"
chmod 644 "${KUBECONFIG_TMP}"
echo "==> Running tests from image: ${TEST_IMAGE}"
echo " Metrics: ${DVO_METRICS_URL}"
${CONTAINER_ENGINE} run --rm \
--network host \
-v "${KUBECONFIG_TMP}:/tmp/kubeconfig:Z" \
-e KUBECONFIG=/tmp/kubeconfig \
-e DVO_METRICS_URL="${DVO_METRICS_URL}" \
-e HTTPD_TEST_IMAGE="${HTTPD_IMAGE}" \
"${TEST_IMAGE}" \
pytest.sh "${PYTEST_ARGS[@]}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@kind/run-tests.sh` around lines 53 - 69, The temporary kubeconfig created by
KUBECONFIG_TMP via mktemp can leak if any command (kind get kubeconfig, chmod,
or the ${CONTAINER_ENGINE} run) fails; add a trap to ensure cleanup on any exit.
Specifically, define KUBECONFIG_TMP="$(mktemp)", then immediately register a
trap like trap 'rm -f "${KUBECONFIG_TMP}"' EXIT (or a function that removes the
file) before calling kind get kubeconfig and before running ${CONTAINER_ENGINE}
run, so rm -f "${KUBECONFIG_TMP}" is executed on normal exit and on errors; keep
the existing final rm -f as a no-op fallback.

Comment thread kind/teardown.sh
Comment on lines +6 to +8
echo "==> Deleting KinD cluster '${CLUSTER_NAME}'..."
kind delete cluster --name "${CLUSTER_NAME}"
echo "==> Done."

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

kind delete cluster fails when the cluster doesn't exist.

Unlike setup.sh, which guards deletion with an existence check, teardown.sh calls kind delete cluster unconditionally. With set -euo pipefail, this causes the script to exit non-zero if the cluster was never created (e.g., after a failed setup run), which is an annoying failure mode.

🛡️ Proposed fix — guard deletion like setup.sh does
-echo "==> Deleting KinD cluster '${CLUSTER_NAME}'..."
-kind delete cluster --name "${CLUSTER_NAME}"
+if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
+    echo "==> Deleting KinD cluster '${CLUSTER_NAME}'..."
+    kind delete cluster --name "${CLUSTER_NAME}"
+else
+    echo "==> No cluster '${CLUSTER_NAME}' found, nothing to do."
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "==> Deleting KinD cluster '${CLUSTER_NAME}'..."
kind delete cluster --name "${CLUSTER_NAME}"
echo "==> Done."
if kind get clusters 2>/dev/null | grep -q "^${CLUSTER_NAME}$"; then
echo "==> Deleting KinD cluster '${CLUSTER_NAME}'..."
kind delete cluster --name "${CLUSTER_NAME}"
else
echo "==> No cluster '${CLUSTER_NAME}' found, nothing to do."
fi
echo "==> Done."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@kind/teardown.sh` around lines 6 - 8, The teardown script currently calls
kind delete cluster --name "${CLUSTER_NAME}" unconditionally which fails if the
cluster doesn't exist; modify teardown.sh to first check for the cluster's
existence (e.g., using kind get clusters or kind get cluster --name
"${CLUSTER_NAME}" and test its exit/status) and only run kind delete cluster
--name "${CLUSTER_NAME}" when the cluster is present, mirroring the guard logic
used in setup.sh so the script doesn't exit non-zero under set -euo pipefail
when the cluster was never created.

@opokornyy

Copy link
Copy Markdown
Collaborator

/ok-to-test

@codecov-commenter

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.16%. Comparing base (a7230ea) to head (df4310d).
⚠️ Report is 3 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #676   +/-   ##
=======================================
  Coverage   46.16%   46.16%           
=======================================
  Files          22       22           
  Lines        1083     1083           
=======================================
  Hits          500      500           
  Misses        553      553           
  Partials       30       30           
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants