NETOBSERV-2387: Backend tests migration #2586
Conversation
Skipping CI for Draft Pull Request.
@oliver-smakal: This pull request references NETOBSERV-2387 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #2586 +/- ##
==========================================
+ Coverage 72.50% 72.58% +0.08%
==========================================
Files 107 107
Lines 11482 11481 -1
==========================================
+ Hits 8325 8334 +9
+ Misses 2658 2651 -7
+ Partials 499 496 -3
Flags with carried forward coverage won't be shown. Click here to find out more.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (7)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (6)
📝 Walkthrough
Adds a backend e2e Go module with Ginkgo suite, shared helpers (operators, Loki, Kafka, networking/UDN, cloud STS/WIF), Makefile/lint/OWNERS, and extensive YAML templates/fixtures for scenarios.

Changes
Backend E2E module and helpers
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
@oliver-smakal: This pull request references NETOBSERV-2387 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
thanks @oliver-smakal, some questions/comments.
/cc @jotak @OlivierCazade @leandroberetta @jpinsonneau
JSYK, in terms of code, the files in testdata and all the code have already been reviewed in openshift-tests-private, except for a couple of new files (version_checker.go, backend_suite_test.go); besides those new files, the most important thing to review here is go.mod for new dependencies.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
47e5bd2 to 0f3384f (Compare)
Actionable comments posted: 23
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (30)
integration-tests/backend/testdata/networking/baselineadminnetworkPolicy.yaml-19-26 (1)
19-26: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove duplicate name key in second ingress rule.

Line 26 redefines name in the same mapping as Line 20. This is invalid/ambiguous YAML and can break template processing or silently override the earlier value.

Proposed fix

  - action: Deny
    name: default-deny-ingress2
    from:
    - namespaces:
        namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: ${CLIENT2_NS}
-   name: default-deny-ns

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/networking/baselineadminnetworkPolicy.yaml` around lines 19 - 26, The YAML contains a duplicate "name" key in the second ingress rule: remove the redundant "name: default-deny-ns" that duplicates the earlier "name: default-deny-ingress2" mapping (or, if a distinct name was intended, replace it with the correct unique name); update the ingress rule containing namespaces→namespaceSelector→matchLabels referencing ${CLIENT2_NS} so it only has a single valid "name" field (identify the mapping by the existing name "default-deny-ingress2" and the duplicate "default-deny-ns").

integration-tests/backend/testdata/virtualization/test-vm-localnet_template.yaml-49-50 (1)
49-50: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin the container disk image to an immutable reference.

Using a floating image here makes backend e2e non-deterministic and can break tests when upstream changes.

Proposed fix

- image: quay.io/containerdisks/fedora
+ image: quay.io/containerdisks/fedora@sha256:<pinned-digest>

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/virtualization/test-vm-localnet_template.yaml` around lines 49 - 50, The image field currently uses a floating reference "quay.io/containerdisks/fedora" for the container disk (name: rootdisk); replace that with an immutable reference (either a specific version tag or a digest) such as "quay.io/containerdisks/fedora:<fixed-tag>" or "quay.io/containerdisks/fedora@sha256:<digest>" so the virtualization template (the image entry for rootdisk) is pinned and e2e tests become deterministic—update the template entry where image: quay.io/containerdisks/fedora is defined and ensure any test fixtures that rely on this template use the same pinned reference.

integration-tests/backend/virtualization.go-36-46 (1)
36-46: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle empty worker-node lists explicitly.

If no worker nodes are returned, this currently reports true, which can mis-gate bare-metal-only flows.

Proposed fix

 func hasMetalWorkerNodes(oc *exutil.CLI) bool {
 	workers, err := compat_otp.GetClusterNodesBy(oc, "worker")
 	o.Expect(err).NotTo(o.HaveOccurred())
+	if len(workers) == 0 {
+		e2e.Logf("Cluster has no worker nodes")
+		return false
+	}
 	for _, w := range workers {

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/virtualization.go` around lines 36 - 46, The function that checks for metal worker nodes uses compat_otp.GetClusterNodesBy and currently returns true when the returned workers slice is empty; add an explicit check after obtaining workers (e.g., if len(workers) == 0) to return false (or fail the check) so empty worker lists do not incorrectly pass, then proceed to the existing loop that queries each node with oc.AsAdmin().WithoutNamespace().Run("get") and inspects the node label node.kubernetes.io/instance-type for "metal".

integration-tests/backend/testdata/networking/test-client-DSCP.yaml-14-17 (1)
14-17: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix shell redirection order and use block scalar for clarity

Line 16 has reversed redirection: 2>&1 > /dev/null sends stderr to the terminal before redirecting stdout to /dev/null, allowing stderr to leak. Use >/dev/null 2>&1 instead. Also replace the fragile escaped quoted string with a YAML block scalar (|) for better readability and maintainability.

Proposed fix

   command:
     - sh
     - -c
-    - "
-      \ while : ; do\n
-      \ curl nginx-service.test-server-68125.svc:80/data/100K 2>&1 > /dev/null ; sleep 5 \n
-      \ done"
+    - |
+      while :; do
+        curl nginx-service.test-server-68125.svc:80/data/100K >/dev/null 2>&1
+        sleep 5
+      done

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/networking/test-client-DSCP.yaml` around lines 14 - 17, Replace the escaped quoted shell string block with a YAML block scalar (|) containing the loop, and fix the redirection order in the curl command: change the fragment that contains "while : ; do ... curl nginx-service.test-server-68125.svc:80/data/100K 2>&1 > /dev/null ; sleep 5 ; done" so it reads as a multi-line block scalar with the curl redirection as >/dev/null 2>&1 (i.e., redirect stdout to /dev/null first, then stderr to stdout); update the YAML entry that currently holds the escaped string accordingly.

integration-tests/backend/testdata/flowcollectorSlice_v1alpha1_template.yaml-13-13 (1)
13-13: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Conflicting template syntax for SubnetLabels.

Using "${{SubnetLabels}}" combines quotes with double-brace parameter substitution. The double braces instruct OpenShift to parse the parameter value as YAML/JSON, but the surrounding quotes force string conversion.

If SubnetLabels is intended to be an array or object, the quotes will break the parsing.

🔧 Proposed fix

Remove the quotes to allow proper YAML parsing:

- subnetLabels: "${{SubnetLabels}}"
+ subnetLabels: ${{SubnetLabels}}

Alternatively, if it must be a string, use single braces:

- subnetLabels: "${{SubnetLabels}}"
+ subnetLabels: "${SubnetLabels}"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/flowcollectorSlice_v1alpha1_template.yaml` at line 13, The template uses subnetLabels: "${{SubnetLabels}}" which wraps the ${{...}} parameter in quotes, forcing string conversion and breaking YAML/JSON parsing for objects/arrays; change it to subnetLabels: ${{SubnetLabels}} so the SubnetLabels parameter is injected and parsed as YAML/JSON, or if SubnetLabels must be a literal string instead, use subnetLabels: "${SubnetLabels}" (single-brace parameter) to preserve it as a string.

integration-tests/backend/ip_utils.go-35-56 (1)
35-56: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Array bounds error and confusing IP ordering logic.

Line 42 accesses .status.podIPs[1].ip without verifying the array has two elements. If the pod has only one IP (e.g., during initialization or in single-stack mode miscategorized as dual-stack), this will fail.

Additionally, the function fetches podIPs[1] first, then podIPs[0], then conditionally reorders based on IP version. This is confusing and error-prone.

🛡️ Proposed fix with bounds checking and clearer logic
 func getPodIP(oc *exutil.CLI, namespace, podName, ipStack string) (string, string) {
 	if (ipStack == "ipv6single") || (ipStack == "ipv4single") {
 		podIP, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("pod", "-n", namespace, podName, "-o=jsonpath={.status.podIPs[0].ip}").Output()
 		o.Expect(err).NotTo(o.HaveOccurred())
 		e2e.Logf("The pod %s IP in namespace %s is %q", podName, namespace, podIP)
 		return podIP, ""
 	} else if ipStack == "dualstack" {
+		// Fetch all IPs as JSON array
+		podIPsJSON, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("pod", "-n", namespace, podName, "-o=jsonpath={.status.podIPs}").Output()
+		o.Expect(err).NotTo(o.HaveOccurred())
+		// Parse and validate we have exactly 2 IPs, then order by IPv4/IPv6
+		// ... (add JSON parsing logic)
-		podIP1, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("pod", "-n", namespace, podName, "-o=jsonpath={.status.podIPs[1].ip}").Output()
-		o.Expect(err).NotTo(o.HaveOccurred())
-		e2e.Logf("The pod's %s 1st IP in namespace %s is %q", podName, namespace, podIP1)
-		podIP2, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("pod", "-n", namespace, podName, "-o=jsonpath={.status.podIPs[0].ip}").Output()
-		o.Expect(err).NotTo(o.HaveOccurred())
-		e2e.Logf("The pod's %s 2nd IP in namespace %s is %q", podName, namespace, podIP2)
-		if netutils.IsIPv6String(podIP1) {
-			e2e.Logf("This is IPv4 primary dual stack cluster with IP %s", podIP1)
-			return podIP1, podIP2
-		}
-		e2e.Logf("This is IPv6 primary dual stack cluster with IP %s", podIP2)
-		return podIP2, podIP1
 	}
 	return "", ""
 }
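For the parsing step the diff leaves elided, a minimal sketch of what the JSON decoding could look like; orderDualStackIPs and podIPEntry are hypothetical names, and the IPv4-first return order is an assumption to verify against what callers expect:

package backend

import (
	"encoding/json"
	"fmt"

	netutils "k8s.io/utils/net"
)

// podIPEntry mirrors one element of the pod's .status.podIPs array.
type podIPEntry struct {
	IP string `json:"ip"`
}

// orderDualStackIPs decodes the podIPs JSON array and returns the IPv4
// address first and the IPv6 address second, with explicit bounds checks.
func orderDualStackIPs(podIPsJSON string) (string, string, error) {
	var entries []podIPEntry
	if err := json.Unmarshal([]byte(podIPsJSON), &entries); err != nil {
		return "", "", fmt.Errorf("parsing podIPs: %w", err)
	}
	if len(entries) < 2 {
		return "", "", fmt.Errorf("expected 2 pod IPs on a dual-stack cluster, got %d", len(entries))
	}
	if netutils.IsIPv6String(entries[0].IP) {
		// IPv6-primary cluster: swap so the IPv4 address comes first.
		return entries[1].IP, entries[0].IP, nil
	}
	return entries[0].IP, entries[1].IP, nil
}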
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/ip_utils.go` around lines 35 - 56, The getPodIP function is accessing .status.podIPs[1] without bounds checks and makes confusing repeated calls and ordering; fix it by fetching the podIPs array once (use a jsonpath like {.status.podIPs[*].ip} or retrieve the whole status.podIPs), verify the slice length is >=2 before accessing index 1 and handle the single-IP case gracefully (return the single IP and empty string or surface an error), then determine which IP is primary by calling netutils.IsIPv6String on the two retrieved IPs and return them in the correct primary,secondary order; update references in getPodIP to stop making two separate oc.Run("get") calls and add the length check to avoid out-of-bounds panics.

integration-tests/backend/ip_utils.go-14-25 (1)
14-25: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Replace character counting with proper JSON array parsing.

The serviceNetwork field returns a JSON array format like ["10.0.0.0/16","fd00::/48"]. Counting colons and dots in this JSON string is unreliable—brackets and quotes affect the count, and the logic is semantically incorrect.

Parse the JSON array and iterate through each CIDR to determine IPv4/IPv6 presence. Consider using net.ParseCIDR() or similar to validate and classify each network:

var networks []string
err := json.Unmarshal([]byte(svcNetwork), &networks)
o.Expect(err).NotTo(o.HaveOccurred())
// Then check each network for IPv4/IPv6
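A fuller sketch of that classification, keeping the return labels the existing function uses (the helper name is illustrative):

package backend

import (
	"encoding/json"
	"net"
)

// classifyStack parses the serviceNetwork JSON array and classifies the
// cluster by the address families of its CIDRs.
func classifyStack(svcNetwork string) (string, error) {
	var networks []string
	if err := json.Unmarshal([]byte(svcNetwork), &networks); err != nil {
		return "", err
	}
	var hasV4, hasV6 bool
	for _, cidr := range networks {
		ip, _, err := net.ParseCIDR(cidr)
		if err != nil {
			return "", err
		}
		if ip.To4() != nil {
			hasV4 = true
		} else {
			hasV6 = true
		}
	}
	switch {
	case hasV4 && hasV6:
		return "dualstack", nil
	case hasV6:
		return "ipv6single", nil
	default:
		return "ipv4single", nil
	}
}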
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/ip_utils.go` around lines 14 - 25, The current checkIPStackType uses character counting on svcNetwork JSON output which is unreliable; instead, unmarshal svcNetwork into a []string (e.g., var networks []string and json.Unmarshal([]byte(svcNetwork), &networks)) inside checkIPStackType, o.Expect no error, then iterate networks and for each use net.ParseCIDR to determine if it is IPv4 or IPv6, track presence of v4 and v6 and return "dualstack", "ipv4single", or "ipv6single" accordingly (ensure you still call oc.WithoutNamespace().AsAdmin().Run(...).Output() to get svcNetwork and use the same function name checkIPStackType for the fix).

integration-tests/backend/testdata/netobserv-loki-reader-multitenant-crb.yaml-9-9 (1)
9-9: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parameterize ClusterRoleBinding name to avoid cross-test conflicts.

A static cluster-scoped name can be overwritten when tests bind different users, causing flaky auth behavior and permission bleed.

Proposed diff

 metadata:
-  name: netobserv-user-reader
+  name: "netobserv-user-reader-${USERNAME}"

Also applies to: 17-20

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/netobserv-loki-reader-multitenant-crb.yaml` at line 9, The ClusterRoleBinding uses a static name ("netobserv-user-reader") which causes cross-test conflicts; change the ClusterRoleBinding name to be unique per-test (e.g., use metadata.generateName or a templated/parameterized name like "{{testID}}-netobserv-user-reader" or include the release/test name) wherever the static name appears (the "name: netobserv-user-reader" entries and the ClusterRoleBinding resource referenced around lines 17-20); ensure any code that references this binding (subjects/roleRef) uses the same generated/templated value so each test gets an isolated, non-conflicting ClusterRoleBinding.

integration-tests/backend/testdata/networking/nmstate/operatorgroup-template.yaml-9-11 (1)
9-11: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove redundant generateName field and hardcoded suffix.

When name is present, Kubernetes ignores generateName per the ObjectMeta specification. Remove the pointless generateName line and the brittle -dzrmx suffix to use a single, deterministic naming strategy for repeatable tests.

Proposed diff

 metadata:
-  name: "${NAME}-dzrmx"
+  name: "${NAME}"
   namespace: "${NAMESPACE}"
-  generateName: "${NAME}-"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/networking/nmstate/operatorgroup-template.yaml` around lines 9 - 11, Remove the redundant generateName field and the hardcoded "-dzrmx" suffix: update the resource metadata by deleting the generateName line and change the name value from "${NAME}-dzrmx" to a deterministic "${NAME}" (or another stable token) so tests use a single, repeatable naming strategy; ensure you only modify the metadata.name and remove generateName to avoid Kubernetes ignoring generateName when name is present.

integration-tests/backend/testdata/test-tls-server_template.yaml-59-63 (1)
59-63: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add fsGroup to pod securityContext to enable non-root init container writes to emptyDir.

With runAsNonRoot: true and no fsGroup defined, the init container cannot reliably write certificates to /etc/nginx/ssl. Kubernetes uses fsGroup to set group ownership on emptyDir volumes, enabling the non-root process to write.

Proposed diff

   spec:
     securityContext:
       runAsNonRoot: true
+      fsGroup: 1001
       seccompProfile:
         type: RuntimeDefault

Also applies to: lines 73–79, 98–103

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/test-tls-server_template.yaml` around lines 59 - 63, Add an fsGroup entry to the pod-level securityContext so non-root init containers can write to emptyDir volumes: update the securityContext block where runAsNonRoot: true is set (the same blocks near initContainers and the other occurrences) to include fsGroup: <appropriate-gid> (matching the group owner used by the containers) so Kubernetes will chown the emptyDir volume (e.g., the /etc/nginx/ssl mount) and allow non-root init container writes.

integration-tests/backend/testdata/test-nginx-server_template.yaml-20-23 (1)
20-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix invalid securityContext placement and privileged field nesting.

securityContext at lines 20–23 is misplaced at Deployment spec level; it must be under spec.template.spec. The privileged field at line 39 is incorrectly nested under capabilities (which only accepts add/drop); it must be a sibling of capabilities under securityContext. These structural errors will cause the manifest to be rejected or silently drop security constraints.

Proposed diff

 spec:
-  securityContext:
-    runAsNonRoot: true
-    seccompProfile:
-      type: RuntimeDefault
   replicas: 1
@@
   template:
     metadata:
       labels:
         app: nginx
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       containers:
       - name: nginx
         securityContext:
           allowPrivilegeEscalation: false
+          privileged: false
           capabilities:
             drop: ["ALL"]
-          privileged: false

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/test-nginx-server_template.yaml` around lines 20 - 23, The manifest places securityContext at the Deployment spec root and nests privileged under capabilities; move the top-level securityContext into the Pod template (place it under spec.template.spec.securityContext) so pod-level settings are applied, and for the container-level security settings move the privileged boolean out of capabilities and make it a sibling of capabilities under the container's securityContext (i.e., container.securityContext.privileged alongside container.securityContext.capabilities). Ensure you update the blocks that reference securityContext, spec.template.spec, capabilities, and privileged accordingly so the structure matches Kubernetes schema.

integration-tests/backend/testdata/exporters/ipfix-collector.yaml-20-20 (1)
20-20: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin the image tag to a specific version.

Using :latest makes tests non-deterministic and can cause unexpected failures when the upstream image changes.

📌 Proposed fix

- image: antrea/ipfix-collector:latest
+ image: antrea/ipfix-collector:v0.7.0

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/exporters/ipfix-collector.yaml` at line 20, Replace the non-deterministic image reference "image: antrea/ipfix-collector:latest" with a specific, immutable tag or digest (e.g., "image: antrea/ipfix-collector:vX.Y.Z" or "image: antrea/ipfix-collector@sha256:...") so tests are reproducible; update the ipfix-collector image entry in the YAML and ensure the chosen version is recorded in test dependencies or CI config.

integration-tests/backend/testdata/exporters/ipfix-collector.yaml-18-32 (1)
18-32: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add securityContext and resource limits.

The container lacks security constraints and resource limits. For test reliability and cluster safety, add:

- securityContext with runAsNonRoot, allowPrivilegeEscalation: false, dropped capabilities
- resources with limits and requests

🛡️ Proposed fix

       containers:
       - name: ipfix-collector
         image: antrea/ipfix-collector:latest
+        securityContext:
+          runAsNonRoot: true
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop: ["ALL"]
+          seccompProfile:
+            type: RuntimeDefault
+        resources:
+          limits:
+            memory: 256Mi
+            cpu: 200m
+          requests:
+            memory: 128Mi
+            cpu: 100m
         args:

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/exporters/ipfix-collector.yaml` around lines 18 - 32, The ipfix-collector container spec (container name "ipfix-collector") is missing securityContext and resource limits; update the container definition to add a securityContext that sets runAsNonRoot: true, allowPrivilegeEscalation: false and drops capabilities (e.g., capabilities.drop: ["ALL"]) and optionally a seccompProfile/runtimeClass if desired, and add a resources block with sensible requests and limits (cpu and memory) for both requests and limits to protect cluster resources; apply these changes in the container stanza that contains image: antrea/ipfix-collector:latest and the args list so the securityContext and resources are applied to that container.

integration-tests/backend/testdata/testuser-template-crb.yaml-9-10 (1)
9-10: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid static cluster-scoped RBAC names to prevent collisions.

Fixed names for ClusterRole/ClusterRoleBinding can conflict in parallel CI runs or reruns. Add a suffix parameter and include it in both names and roleRef.

Proposed fix

-  name: testuser-templating-cr
+  name: testuser-templating-cr-${RESOURCE_SUFFIX}
@@
-  name: testuser-templating-crb
+  name: testuser-templating-crb-${RESOURCE_SUFFIX}
@@
-  name: testuser-templating-cr
+  name: testuser-templating-cr-${RESOURCE_SUFFIX}
@@
 parameters:
 - name: USERNAME
   required: true
+- name: RESOURCE_SUFFIX
+  required: true

Also applies to: 23-27, 32-34

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/testuser-template-crb.yaml` around lines 9 - 10, The ClusterRole and ClusterRoleBinding use static names (e.g., "testuser-templating-cr") which can collide; add a template parameter (e.g., a suffix variable like "{{ .Values.suffix }}" or similar used in this repo) and append it to the ClusterRole metadata.name, the ClusterRoleBinding metadata.name, and the roleRef.name so all three match the suffixed name; update every occurrence (including the other instances noted around lines 23-27 and 32-34) to use the same suffix variable so parallel CI runs produce unique names.

integration-tests/backend/alerts.go-24-24 (1)
24-24: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Error from json.Unmarshal is silently ignored.

If unmarshaling fails, alertStatus will be empty and the function will return an empty map, masking the real error. This makes debugging difficult.

🛡️ Proposed fix

 var alertStatus []interface{}
-_ = json.Unmarshal([]byte(alertOut), &alertStatus)
+if err := json.Unmarshal([]byte(alertOut), &alertStatus); err != nil {
+	return make(map[string]interface{}), err
+}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/alerts.go` at line 24, The json.Unmarshal call currently ignores its returned error which can mask failures: capture the error from json.Unmarshal([]byte(alertOut), &alertStatus) and handle it (e.g., return the error from the surrounding function or fail the test/log a descriptive message) instead of discarding it; update the function that uses alertStatus so it propagates or reports the unmarshalling error to callers/tests to avoid returning an empty map silently.

integration-tests/backend/multitenants.go-73-75 (1)
73-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop shell-wrapped htpasswd execution.

Using bash -c with dynamic strings is unnecessary and expands injection risk. Pass arguments directly to exec.Command.

Proposed fix

-	cmd := fmt.Sprintf("htpasswd -b %v %v %v", *passwdFile, users[i].Username, users[i].Password)
-	err := exec.Command("bash", "-c", cmd).Run()
+	err := exec.Command("htpasswd", "-b", *passwdFile, users[i].Username, users[i].Password).Run()
 	if err != nil {
 		return err
 	}
@@
-	cmd := fmt.Sprintf("htpasswd -D %v %v", usersHTpassFile, users[i].Username)
-	err := exec.Command("bash", "-c", cmd).Run()
+	err := exec.Command("htpasswd", "-D", usersHTpassFile, users[i].Username).Run()
 	o.Expect(err).NotTo(o.HaveOccurred())

Also applies to: lines 124-126

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/multitenants.go` around lines 73 - 75, The test currently constructs a shell command string in variable cmd and runs it via exec.Command("bash","-c", cmd).Run(), which risks shell injection; change the call to pass arguments directly to exec.Command by invoking exec.Command with the program "htpasswd" and the argument list "-b", *passwdFile, users[i].Username, users[i].Password (and analogous replacements for the similar block at lines 124-126), removing the intermediate cmd string so exec.Command receives explicit args instead of a shell-wrapped command.

integration-tests/backend/loki.go-73-78 (1)
73-78: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Readiness failures are overwritten

Lines 73-78 overwrite err on each iteration. A failed component can be masked by a later successful check, returning nil incorrectly.

Suggested fix

 for _, deploy := range []string{l.Name + "-distributor", l.Name + "-gateway", l.Name + "-querier", l.Name + "-query-frontend"} {
-	err = waitForDeploymentPodsToBeReady(oc, l.Namespace, deploy)
+	if err = waitForDeploymentPodsToBeReady(oc, l.Namespace, deploy); err != nil {
+		return err
+	}
 }
 for _, ss := range []string{l.Name + "-compactor", l.Name + "-index-gateway", l.Name + "-ingester"} {
-	err = waitForStatefulsetReady(oc, l.Namespace, ss)
+	if err = waitForStatefulsetReady(oc, l.Namespace, ss); err != nil {
+		return err
+	}
 }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/loki.go` around lines 73 - 78, The loops calling waitForDeploymentPodsToBeReady and waitForStatefulsetReady currently assign to err on each iteration, allowing later successes to overwrite earlier failures; change the logic in the block that iterates over []string{l.Name + "-distributor", ...} and []string{l.Name + "-compactor", ...} to preserve errors (e.g., if err != nil keep/aggregate it or return early) instead of blindly reassigning err each time, so failures from waitForDeploymentPodsToBeReady or waitForStatefulsetReady are not lost.

integration-tests/backend/operator.go-171-175 (1)
171-175: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

NotFound detection is broken due to malformed formatting

Line 173 uses fmt.Sprint("v%", sub), producing a literal string and preventing reliable NotFound matching at Line 174. This can skip subscription creation.

Suggested fix

-	msg := fmt.Sprint("v%", sub)
+	msg := fmt.Sprintf("%v", err)
 	if strings.Contains(msg, "NotFound") {

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/operator.go` around lines 171 - 175, The NotFound detection is broken because msg is built with fmt.Sprint("v%", sub) which produces a literal string; replace that with a proper formatting or conversion (e.g., fmt.Sprintf("%v", sub) or fmt.Sprint(sub)) and change the strings.Contains check to inspect the actual output variable (sub) or the corrected msg; update the block around oc.AsAdmin().WithoutNamespace().Run("get").Args("sub", "-n", so.Namespace, so.PackageName).Output(), the variable sub, and the strings.Contains(msg, "NotFound") check so subscription creation via so.setCatalogSourceObjects(oc) triggers correctly.

integration-tests/backend/azure_utils.go-296-296 (1)
296-296: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Region query can return invalid multi-value output

Line 296 uses .items[]...region, which can return multiple values (space-concatenated). That can break Azure SDK calls expecting one region string.

Suggested fix

-func getAzureClusterRegion(oc *exutil.CLI) (string, error) {
-	return oc.AsAdmin().WithoutNamespace().Run("get").Args("node", `-ojsonpath={.items[].metadata.labels.topology\.kubernetes\.io/region}`).Output()
-}
+func getAzureClusterRegion(oc *exutil.CLI) (string, error) {
+	return oc.AsAdmin().WithoutNamespace().Run("get").Args("infrastructure", "cluster", "-o=jsonpath={.status.platformStatus.azure.region}").Output()
+}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/azure_utils.go` at line 296, The current call using oc.AsAdmin().WithoutNamespace().Run("get").Args("node", `-ojsonpath={.items[].metadata.labels.topology\.kubernetes\.io/region}`).Output() can return multiple space-concatenated regions; change the query to return a single region (e.g. select the first item) or fetch JSON and parse to a single value so Azure SDK receives one region string. Update the Args(...) jsonpath to target {.items[0].metadata.labels.topology.kubernetes.io/region} or replace the jsonpath call with -o json and parse the first item's label in Go before returning to ensure only one region is returned by the function.

integration-tests/backend/loki_client.go-274-274 (1)
274-274: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout to HTTP client to prevent indefinite test hangs

The http.Client created at line 274 is missing a Timeout field. If the endpoint becomes stalled, test execution will hang indefinitely.

Suggested fix

-client := &http.Client{Transport: tr}
+client := &http.Client{
+	Transport: tr,
+	Timeout:   30 * time.Second,
+}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/loki_client.go` at line 274, The http.Client created as client := &http.Client{Transport: tr} lacks a Timeout causing potential test hangs; update the initializer to include a sensible Timeout (e.g., Timeout: 10 * time.Second) and add the time import if it's not present so the client is created as &http.Client{Transport: tr, Timeout: <duration>}. Ensure the change is applied where client is declared so all requests from this loki client use the timeout.

integration-tests/backend/loki_client.go-265-270 (1)
265-270: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove hardcoded InsecureSkipVerify from Loki client or add justification
Lines 265 and 270 disable TLS certificate verification unconditionally. Even in integration tests, TLS verification should be enabled—configure self-signed certificates for the test environment instead. If disabling verification is necessary, add a comment explaining why and consider making it conditional on an explicit flag.
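If disabling verification must stay available for self-signed test endpoints, a flag-gated sketch could look like the following; LOKI_INSECURE_TLS and the helper name are hypothetical, not part of the current code:

package backend

import (
	"crypto/tls"
	"net/http"
	"os"
	"time"
)

// newLokiHTTPClient keeps TLS verification on by default and only skips it
// when the harness explicitly opts in, e.g. for self-signed test endpoints.
func newLokiHTTPClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				// Opt-in escape hatch for test clusters with self-signed certs.
				InsecureSkipVerify: os.Getenv("LOKI_INSECURE_TLS") == "true",
				MinVersion:         tls.VersionTLS12,
			},
		},
		Timeout: 30 * time.Second,
	}
}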
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/loki_client.go` around lines 265 - 270, The http.Transport creation in integration-tests/backend/loki_client.go currently sets TLSClientConfig: &tls.Config{InsecureSkipVerify: true} (both when Proxy: http.ProxyURL(proxyURL) is used and in the else branch); remove this hardcoded InsecureSkipVerify or make it conditional: either enable normal TLS verification and wire the test harness to use self-signed certs for the Loki test server, or gate InsecureSkipVerify behind an explicit test flag (e.g., a boolean like allowInsecureTLS) and document the justification in a comment next to the TLSClientConfig line; update any code paths that create tr (the http.Transport) so they use the new flag/behavior instead of always disabling verification.

integration-tests/backend/test_flowcollectorslice.go-543-543 (1)
543-543: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not log authentication tokens.

Printing user0token exposes credentials in test logs.

Proposed fix

-e2e.Logf("testuser token: %s", user0token)
+e2e.Logf("retrieved testuser token")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/test_flowcollectorslice.go` at line 543, The test currently logs sensitive credentials via e2e.Logf("testuser token: %s", user0token); remove or replace that call so the raw user0token is never printed — e.g., log a non-sensitive message (like "testuser token created" or the token length) or mask the token before logging; keep any assertion that user0token is non-empty (e.g., require.NotEmpty or t.Fatalf checks) but do not output the token value. Ensure you update the e2e.Logf invocation that references user0token accordingly.

integration-tests/backend/test_flowmetrics.go-112-113 (1)
112-113: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dashboard existence is not asserted.

The result of checkResourceExists is discarded, so this verification is currently ineffective.

Proposed fix

-_, _ = checkResourceExists(oc, "cm", "netobserv-"+dashName, "openshift-config-managed")
+exists, err := checkResourceExists(oc, "cm", "netobserv-"+dashName, "openshift-config-managed")
+o.Expect(err).NotTo(o.HaveOccurred())
+o.Expect(exists).To(o.BeTrue(), "expected dashboard configmap netobserv-"+dashName+" to exist")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/test_flowmetrics.go` around lines 112 - 113, The call to checkResourceExists(oc, "cm", "netobserv-"+dashName, "openshift-config-managed") discards its (bool, error) result so the test doesn't actually assert dashboard existence; capture its return values (e.g., exists, err := checkResourceExists(...)) and then fail the test if err != nil or exists is false (use t.Fatalf or your test assertion helper like require.NoError/True) so the dashboard presence is asserted for the "netobserv-"+dashName configmap.

integration-tests/backend/udn.go-327-351 (1)
327-351: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Layer2 dual-stack UDN path skips readiness wait.

The layer2+dualstack path calls createLayer2DualStackUDNCRD(oc) but omits the waitUDNCRDApplied call, while the layer2+default path and all other paths correctly include it. This creates a race condition where subsequent steps may proceed before the CRD is fully applied.

Add the wait call after createLayer2DualStackUDNCRD(oc) to match the pattern in the default path within the same switch statement.
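A sketch of the fixed branch, reusing the identifiers mentioned in the prompt below; the exact receiver and field names are assumptions:

	case "dualstack":
		udncrd.createLayer2DualStackUDNCRD(oc)
		// Block until the CRD reports Applied, matching the default branch.
		err = waitUDNCRDApplied(oc, namespace, udncrd.crdname)
		o.Expect(err).NotTo(o.HaveOccurred())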
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/udn.go` around lines 327 - 351, The layer2+dualstack branch calls udnCRDResource.createLayer2DualStackUDNCRD(oc) but fails to wait for the CRD to be applied; after calling createLayer2DualStackUDNCRD(oc) add the same readiness check used in the default branch: call waitUDNCRDApplied(oc, namespace, udncrd.crdname), capture the error into err, and assert o.Expect(err).NotTo(o.HaveOccurred()); this mirrors the behavior in the default path and prevents the race.

integration-tests/backend/util.go-71-75 (1)
71-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix duplicated condition in proxy fallback logic.

Line 73 checks http_proxy again instead of https_proxy, making the https_proxy fallback unreachable.

Proposed fix

 if os.Getenv("http_proxy") != "" {
 	proxy = os.Getenv("http_proxy")
-} else if os.Getenv("http_proxy") != "" {
+} else if os.Getenv("https_proxy") != "" {
 	proxy = os.Getenv("https_proxy")
 }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/util.go` around lines 71 - 75, The proxy fallback logic uses the wrong env check: change the second condition so it tests os.Getenv("https_proxy") instead of repeating "http_proxy" so the https_proxy branch becomes reachable; update the block around the proxy variable assignment in integration-tests/backend/util.go (the if/else that sets proxy) to check "https_proxy" in the else-if and keep the assignment to proxy = os.Getenv("https_proxy").

integration-tests/backend/udn.go-409-425 (1)
409-425: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Trim pod IP command output before using it.

execCommandInSpecificPod returns untrimmed shell output which typically includes trailing newlines. Passing these directly to net.JoinHostPort creates malformed addresses (e.g., "192.168.1.1\n:8080"), breaking the curl commands in CurlPod2PodFailUDN and CurlPod2PodPassUDN.

Add strings.TrimSpace() calls after each execCommandInSpecificPod invocation for both podIPv4 and podIPv6.
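An illustrative excerpt with the outputs trimmed, assuming execCommandInSpecificPod returns (string, error) as the prompt below implies; ipv4Cmd and ipv6Cmd are placeholder command variables:

	// Trim shell output before it reaches net.JoinHostPort or the curl helpers.
	podIPv4, err := execCommandInSpecificPod(oc, namespace, podName, ipv4Cmd)
	o.Expect(err).NotTo(o.HaveOccurred())
	podIPv4 = strings.TrimSpace(podIPv4)

	podIPv6, err := execCommandInSpecificPod(oc, namespace, podName, ipv6Cmd)
	o.Expect(err).NotTo(o.HaveOccurred())
	podIPv6 = strings.TrimSpace(podIPv6)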
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/udn.go` around lines 409 - 425, The pod IP strings returned by execCommandInSpecificPod are not trimmed and may include trailing newlines, which breaks address construction used by CurlPod2PodFailUDN and CurlPod2PodPassUDN; after every call to execCommandInSpecificPod that assigns podIPv4 or podIPv6 (including the "ipv4single", "ipv6single", and default branches in the switch) wrap the returned value with strings.TrimSpace() before logging and returning so podIPv4 and podIPv6 contain clean IPs for subsequent net.JoinHostPort/curl usage.

integration-tests/backend/flowcollector_utils.go-335-340 (1)
335-340: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Accumulate Loki records across streams instead of overwriting.

Each loop iteration overwrites flowRecords; only the last stream survives. This loses data when Loki returns multiple result streams, causing tests to fail or see incomplete results.

Applies to lines 335-340, 370-375, and 393-398.

Proposed fix

 for _, result := range res.Data.Result {
-	flowRecords, err = getFlowRecords(result.Values)
+	recs, err := getFlowRecords(result.Values)
 	if err != nil {
 		return []FlowRecord{}, err
 	}
+	flowRecords = append(flowRecords, recs...)
 }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/flowcollector_utils.go` around lines 335 - 340, The loop that processes Loki result streams overwrites flowRecords each iteration, losing earlier streams; modify the code that iterates over res.Data.Result (and the similar loops around getFlowRecords calls) to append into a single slice instead of reassigning: call getFlowRecords(result.Values), check err, then extend the existing flowRecords slice (e.g., use append(flowRecords, returnedSlice...)) so all streams are accumulated across iterations while preserving error handling in the same places where getFlowRecords is invoked.

integration-tests/backend/flowcollector.go-225-231 (1)
225-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Readiness check can pass on non-ready states.

strings.Contains(status, "Ready") returns true for "NotReady" and similar values. Use exact string comparison instead:

Proposed fix

-	if strings.Contains(status, "Ready") {
+	if strings.Trim(status, "'") == "Ready" {
 		return true, nil
 	}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/flowcollector.go` around lines 225 - 231, The readiness check currently uses strings.Contains(status, "Ready") which will incorrectly match values like "NotReady"; update the check to perform an exact comparison: normalize the returned status string from the jsonpath (trim surrounding quotes/apostrophes and whitespace from the status variable) and then compare equality to "Ready" (e.g. status == "Ready"). Locate the status variable and the conditional in flowcollector.go (the call that sets status and the if strings.Contains(...) block) and replace the Contains-based test with the trimmed exact equality check so only the literal "Ready" passes.

integration-tests/backend/kafka.go-179-182 (1)
179-182: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a containment check for Ready condition.

Line 181 compares status == "Ready". Condition output can include multiple entries, causing false negatives and unnecessary timeouts.

Proposed fix

-	if status == "Ready" {
+	if strings.Contains(status, "Ready") {
 		return true, nil
 	}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/kafka.go` around lines 179 - 182, The equality check if status == "Ready" can miss cases where the status string contains multiple entries; change the condition to a containment check (e.g., use strings.Contains(status, "Ready")) and optionally trim whitespace/newlines on status before checking. Update the code around the status variable (created with strings.Replace) and the if statement that currently uses equality so it uses strings.Contains(status, "Ready") (or strings.Contains(strings.TrimSpace(status), "Ready")) instead of ==; keep the e2e.Logf call as-is.

integration-tests/backend/kafka.go-155-160 (1)
155-160: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't terminate poll loops on transient oc get errors.

Returning non-nil errors from the condition function stops polling immediately instead of retrying until timeout. This causes premature test failures when resources are not yet visible or API calls encounter transient errors.

Proposed fix

+import apierrors "k8s.io/apimachinery/pkg/api/errors"
@@
-	if err != nil {
-		e2e.Logf("kafka status ready error: %v", err)
-		return false, err
-	}
+	if err != nil {
+		if apierrors.IsNotFound(err) {
+			return false, nil
+		}
+		e2e.Logf("kafka status ready error: %v", err)
+		return false, nil
+	}
@@
-	if err != nil {
-		e2e.Logf("kafka Topic status ready error: %v", err)
-		return false, err
-	}
+	if err != nil {
+		if apierrors.IsNotFound(err) {
+			return false, nil
+		}
+		e2e.Logf("kafka Topic status ready error: %v", err)
+		return false, nil
+	}

Applies to lines 155-160 and 172-177.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/kafka.go` around lines 155 - 160, The poll condition passed to wait.PollUntilContextTimeout is returning non-nil errors on transient oc get failures which aborts polling; instead, catch errors from oc.AsAdmin().WithoutNamespace().Run("get").Args(...).Output() in the condition functions (both the block around kafka status and the later block at 172-177), log the error with e2e.Logf as you already do, and return (false, nil) so the poll continues retrying until timeout; only return a non-nil error when the error is terminal (not transient) or when you intentionally want to abort the poll.
🟡 Minor comments (9)
integration-tests/backend/testdata/virtualization/test-vm-localnet_template.yaml-54-56 (1)
54-56: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid secret-like password literals in committed testdata.

This value is likely synthetic, but it still trips secret scanners. Use a clearly fake value or parameterize it.

Proposed fix

- password: byje-7cd2-i8et
+ password: test-password

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/virtualization/test-vm-localnet_template.yaml` around lines 54 - 56, The test data contains a secret-like literal in the password field (keys: user, password, chpasswd) — replace the current realistic-looking password with a clearly fake placeholder or parameterize it so secret scanners won't flag it; update the YAML entry for password to use a non-secret value like a constant "CHANGE_ME"/"fake-password" or a template variable (e.g., {{VM_ROOT_PASSWORD}}) and ensure any test harness reads the parameterized value from test config or environment instead of committing real-looking credentials.

integration-tests/backend/sctp.go-58-58 (1)
58-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in skip message.

"woker" should be "worker".

📝 Proposed fix

- g.Skip("Can not find any woker nodes in the cluster")
+ g.Skip("Can not find any worker nodes in the cluster")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/sctp.go` at line 58, The skip message passed to g.Skip contains a typo; update the string used in the g.Skip(...) call in integration-tests/backend/sctp.go (the g.Skip invocation) to replace "woker" with "worker" (i.e., "Can not find any worker nodes in the cluster"), keeping the rest of the message unchanged.

integration-tests/backend/sctp.go-31-31 (1)
31-31: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in skip message.

"Yum insall" should be "Yum install".

📝 Proposed fix

- g.Skip("Yum insall to enable sctp cannot work in a disconnected cluster, skip the test!!!")
+ g.Skip("Yum install to enable sctp cannot work in a disconnected cluster, skip the test!!!")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/sctp.go` at line 31, Update the skip message passed to g.Skip to correct the typo: change "Yum insall to enable sctp cannot work in a disconnected cluster, skip the test!!!" to "Yum install to enable sctp cannot work in a disconnected cluster, skip the test!!!" so the g.Skip(...) call uses the correct word "install".

integration-tests/backend/testdata/test-nginx-client_template.yaml-30-37 (1)
30-37: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Move privileged field to correct level.

privileged: false on line 34 is incorrectly nested under capabilities. It should be a sibling of allowPrivilegeEscalation at the securityContext level.

🔧 Proposed fix

         securityContext:
           allowPrivilegeEscalation: false
+          privileged: false
           capabilities:
             drop: ["ALL"]
-          privileged: false
           runAsNonRoot: true

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/test-nginx-client_template.yaml` around lines 30 - 37, The `privileged: false` property is incorrectly nested under `capabilities`; move it out so that `privileged` is a direct child of `securityContext` (i.e., a sibling of `allowPrivilegeEscalation` and `runAsNonRoot`) rather than nested under `capabilities` — update the YAML structure around `securityContext`, `allowPrivilegeEscalation`, `capabilities`, `privileged`, and `runAsNonRoot` so `capabilities` contains only `drop` and `privileged` sits at the securityContext level.

integration-tests/backend/testdata/testuser-client-server_template.yaml-89-90 (1)
integration-tests/backend/testdata/testuser-client-server_template.yaml-89-90 (1)

89-90: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix redirection order to silence both stdout and stderr.

The current `2>&1 > /dev/null` redirects stderr to the original stdout before stdout goes to /dev/null, leaving stderr on the terminal. Use `> /dev/null 2>&1` instead.

Proposed fix
- \ curl nginx-service.${SERVER_NS}.svc:8080/data/${OBJECT_SIZE} 2>&1 > /dev/null ; sleep 5 \n
+ \ curl nginx-service.${SERVER_NS}.svc:8080/data/${OBJECT_SIZE} > /dev/null 2>&1 ; sleep 5 \n

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/testuser-client-server_template.yaml` around lines 89-90, the shell command string containing "curl nginx-service.${SERVER_NS}.svc:8080/data/${OBJECT_SIZE} 2>&1 > /dev/null ; sleep 5" uses the wrong redirection order so stderr still prints; change the redirection to send stdout to /dev/null first and then redirect stderr to stdout by replacing "2>&1 > /dev/null" with "> /dev/null 2>&1" in the command string (the loop/command that invokes curl) so both stdout and stderr are silenced.
integration-tests/backend/testdata/test-tls-client_template.yaml-32-36 (1)

32-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the redirect order.

The stderr/stdout redirects should be `> /dev/null 2>&1`, not `2>&1 > /dev/null`. The current order redirects stderr to the original stdout before redirecting stdout to /dev/null, leaving stderr visible.

🔧 Proposed fix
- curl -sk --tlsv1.3 https://tls-server-service.${SERVER_NS}.svc:443/ 2>&1 > /dev/null
+ curl -sk --tlsv1.3 https://tls-server-service.${SERVER_NS}.svc:443/ > /dev/null 2>&1
  sleep 5
- curl -sk --tlsv1.2 --tls-max 1.2 https://tls-server-service.${SERVER_NS}.svc:443/ 2>&1 > /dev/null
+ curl -sk --tlsv1.2 --tls-max 1.2 https://tls-server-service.${SERVER_NS}.svc:443/ > /dev/null 2>&1
  sleep 5
- curl -s http://tls-server-service.${SERVER_NS}.svc:80/ 2>&1 > /dev/null
+ curl -s http://tls-server-service.${SERVER_NS}.svc:80/ > /dev/null 2>&1

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/test-tls-client_template.yaml` around lines 32-36, the curl invocations (e.g., the lines starting with "curl -sk --tlsv1.3 https://tls-server-service.${SERVER_NS}.svc:443/", "curl -sk --tlsv1.2 --tls-max 1.2 https://tls-server-service.${SERVER_NS}.svc:443/", and "curl -s http://tls-server-service.${SERVER_NS}.svc:80/") currently use the wrong redirect order ("2>&1 > /dev/null"); change each to redirect stdout first and then stderr to stdout by using "> /dev/null 2>&1" so stderr is also discarded.
integration-tests/backend/util.go-292-301 (1)

292-301: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add timeout to HTTP client in tests.

The client lacks a timeout (line 301), which can cause test hangs. However, `InsecureSkipVerify: true` is likely necessary for test cluster endpoints using self-signed certificates; replacing it with `MinVersion: tls.VersionTLS12` would not preserve the same behavior and would break certificate validation. If test infrastructure supports proper certificate setup, that should be addressed separately.

Add `Timeout: 30 * time.Second` (or an appropriate duration) to the `http.Client` creation.
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/util.go` around lines 292-301, the http.Client created using the local variable tr (Transport) lacks a timeout and can hang tests; modify the client construction (where client := &http.Client{Transport: tr}) to include a timeout, e.g. Timeout: 30 * time.Second, while keeping the existing Transport and TLSClientConfig (InsecureSkipVerify: true) intact; ensure the time package is imported if not already.
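For clarity, a minimal sketch of what the fixed client construction could look like; the helper name is invented for illustration and is not the repo's actual code.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// newTestHTTPClient keeps the insecure transport for self-signed test
// endpoints but bounds the whole request with a Timeout, so a hung
// endpoint fails the test quickly instead of blocking it forever.
func newTestHTTPClient() *http.Client {
	tr := &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // test clusters use self-signed certs
	}
	return &http.Client{
		Transport: tr,
		Timeout:   30 * time.Second, // covers dial, TLS handshake, and reading the body
	}
}

func main() {
	client := newTestHTTPClient()
	fmt.Println("client timeout:", client.Timeout)
}
```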
integration-tests/backend/flowcollector.go-186-191 (1)

186-191: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix field filtering logic to avoid including empty slices in parameters.

Line 187's comparison of every field to `""` doesn't panic, but it incorrectly treats empty/nil slices as non-empty. This causes commands to include parameters like `EBPFeatures=[]`, which shouldn't be added. Use `field.IsZero()` instead to properly skip zero-valued fields.

Proposed fix
  for i := 0; i < flowCollector.NumField(); i++ {
-     if flowCollector.Field(i).Interface() != "" {
-         if flowCollector.Type().Field(i).Name != "Template" {
-             parameters = append(parameters, fmt.Sprintf("%s=%s", flowCollector.Type().Field(i).Name, flowCollector.Field(i).Interface()))
-         }
-     }
+     field := flowCollector.Field(i)
+     name := flowCollector.Type().Field(i).Name
+     if name == "Template" || field.IsZero() {
+         continue
+     }
+     parameters = append(parameters, fmt.Sprintf("%s=%v", name, field.Interface()))
  }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/flowcollector.go` around lines 186-191, the loop building parameters incorrectly checks fields against the empty string, which treats empty or nil slices as non-empty; update the loop that iterates over flowCollector (using flowCollector.NumField(), flowCollector.Field(i) and flowCollector.Type().Field(i).Name) to skip zero-valued fields by using flowCollector.Field(i).IsZero() and continue when true, and still ensure you exclude the "Template" field by name before appending to the parameters slice so empty slices (e.g., EBPFeatures) are not added.
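A standalone sketch of why `IsZero()` is the right predicate here; the `config` struct below is invented for illustration, with only `EBPFeatures` mirroring a field name from the finding.

```go
package main

import (
	"fmt"
	"reflect"
)

// config is a stand-in for the flow collector parameters struct.
type config struct {
	Name        string
	EBPFeatures []string
}

func main() {
	v := reflect.ValueOf(config{Name: "lokistack"})
	for i := 0; i < v.NumField(); i++ {
		f := v.Field(i)
		// A nil []string boxed in an interface{} never equals "", so the
		// old `!= ""` check treats the empty slice as a value to emit;
		// IsZero reports it as zero and lets the loop skip it.
		fmt.Printf("%s: not-empty-string=%v IsZero=%v\n",
			v.Type().Field(i).Name, f.Interface() != "", f.IsZero())
	}
	// Prints:
	// Name: not-empty-string=true IsZero=false
	// EBPFeatures: not-empty-string=true IsZero=true
}
```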
integration-tests/backend/test_exporters.go-240-247 (1)

240-247: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Simplify grep calls by passing arguments directly instead of shell-wrapping.

The `bash -c` approach is unnecessary here since `textToExist`, `textToNotExist`, and `podLogs` are all internal, non-user-controlled values. Direct argument passing is cleaner and follows Go best practices.

Proposed fix
-grepCmd := fmt.Sprintf("grep %s %s", textToExist, podLogs)
-textToExistLogs, err := exec.Command("bash", "-c", grepCmd).Output()
+textToExistLogs, err := exec.Command("grep", textToExist, podLogs).Output()
 ...
-grepCmd = fmt.Sprintf("grep %s %s || true", textToNotExist, podLogs)
-textToNotExistLogs, err := exec.Command("bash", "-c", grepCmd).Output()
-o.Expect(err).ToNot(o.HaveOccurred())
+textToNotExistLogs, err := exec.Command("grep", textToNotExist, podLogs).Output()
+if err != nil {
+    // grep exits 1 when no matches; that's expected for the negative case
+    if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 1 {
+        textToNotExistLogs = []byte{}
+    } else {
+        o.Expect(err).ToNot(o.HaveOccurred())
+    }
+}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/test_exporters.go` around lines 240-247, replace the shell-wrapped grep invocations with direct exec.Command calls: instead of exec.Command("bash","-c", grepCmd) build the command with exec.Command("grep", textToExist, podLogs) and call Output() to get textToExistLogs; for the second case (previously "grep ... || true") use exec.Command("grep", textToNotExist, podLogs) and capture output (Output() or CombinedOutput()) but do not treat a non-zero exit error as fatal: assign textToNotExistLogs from the command output even if err != nil so the subsequent length check still works; update references to grepCmd only if still used or remove it. Ensure you use the existing symbols textToExistLogs, textToNotExistLogs and the exec.Command calls in the same function.
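A self-contained sketch of the exit-code handling the proposed fix relies on; the helper name and paths are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// grepFile runs grep directly (no shell wrapping) and treats exit code 1,
// grep's "no matches" status, as an empty result rather than a failure.
func grepFile(pattern, path string) ([]byte, error) {
	out, err := exec.Command("grep", pattern, path).Output()
	if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 1 {
		return nil, nil // no matches: fine for a "text must not exist" check
	}
	return out, err // real failures (missing file, exit code 2) still surface
}

func main() {
	out, err := grepFile("flowlogs-pipeline", "/var/log/example.log")
	fmt.Printf("matches=%q err=%v\n", out, err)
}
```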
Description
This PR is intended to show what the e2e backend test migration could look like, and to provide a place where we can discuss the details.
Changes compared to the openshift-tests-private repo:
- `ginkgo` is used directly, without a wrapper
- `BeforeSuite` is possible, but we need to be careful about not using `compat_otp.NewCLI` and similar utilities which have `g.BeforeEach` calls (see the sketch after this list)
- Testcases were migrated; some changes were made because the golangci linter complained. Minor fixes were applied, but some complaints, such as passing heavy structs without using `*`, I don't really see as a problem -> I have adjusted the Makefile to omit the integration-tests from linting/testing.
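To make the `BeforeSuite` point concrete, a minimal sketch of a suite entry point with one-time setup; the package name and setup steps are assumptions, not code from this PR.

```go
package backend_test

import (
	"testing"

	g "github.com/onsi/ginkgo/v2"
	o "github.com/onsi/gomega"
)

// One-time suite setup replaces the per-test g.BeforeEach work that
// wrappers like compat_otp.NewCLI perform implicitly; whatever those
// helpers did would be done here explicitly, once.
var _ = g.BeforeSuite(func() {
	// e.g. build clients, resolve kubeconfig, create shared namespaces
})

func TestBackend(t *testing.T) {
	o.RegisterFailHandler(g.Fail)
	g.RunSpecs(t, "Backend Suite")
}
```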
This PR can be tested with a flexy cluster via the Network Observability Backend Tests.
Review tips
Important files to definitely check:
To check changes in the testcases, check out the latest `openshift-tests-private` on the main branch and use:

git diff --no-index ~/Repos/openshift-tests-private/test/extended/netobserv/test_flowcollector.go test_flowcollector.go

to check a single file diff, or the same `git diff --no-index` invocation pointed at the two directories to check whole dirs.
How to run locally
Either:
or if the ginkgo cli is installed:
Other standard ginkgo flags such as `--dry-run` or `-v` also work.

Example output
The following is an example output of a run:
This implementation cannot control the `[1776103444] Backend Suite - 3/7488 specs` line. I don't think it will be an issue in any way, as the rest of the report makes it really clear what is actually being run. For running it in prow it should also not be a problem if we preserve the way it works with the `openshift-tests-private` repo and use the junit report.

More examples:
How it could be used in CI.
Currently the `openshift-tests-private` implementation uses junit to handle the result (see lines 398-459: the junit result is generated and parsed, and later used by the openshift-e2e-test-qe-report step to fail the prow job if necessary). For the nice output in prow, it seems the `handle_result` function in the `openshift-extended-test` step is used, as it appears to format and rename the junit file with the help of the `handleresult.py` python script.
handle_resultin theopenshift-extended-teststep is used as it seems to be formatting and renaming the junit file with the help ofhandleresult.pypython script.Though this is probably not directly reusable for implementation as the junit in
openshift-test-privatedoes not use default ginkgo implementation, we could use the same idea and use the junit output with minor transformation in prow to report result in a nice way.Alternative approaches
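For reference, a minimal sketch of emitting a junit file from the default ginkgo v2 runner via its `reporters` package (the report file name is arbitrary); the ginkgo CLI's `--junit-report` flag achieves the same thing.

```go
package backend_test

import (
	g "github.com/onsi/ginkgo/v2"
	"github.com/onsi/ginkgo/v2/reporters"
)

// Runs once after the whole suite and writes junit XML that a prow step
// could pick up, transform, and use to fail the job if needed.
var _ = g.ReportAfterSuite("junit", func(report g.Report) {
	if err := reporters.GenerateJUnitReport(report, "junit_backend.xml"); err != nil {
		g.GinkgoWriter.Printf("failed to write junit report: %v\n", err)
	}
})
```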
Alternative approaches

We could create something a bit more custom using lower-level ginkgo functionality, as in commit fbb6fc36b17bc37b6eb847b8e51f42d26a9b29d8, to remove things like the `[1776103444] Backend Suite - 3/7488 specs` line. However, I don't think it would be worth the tradeoff of not using the default ginkgo runner via `RunSpecs()`, as we would lose the ability to run tests in parallel and some other features.

Summary by CodeRabbit
Chores
Tests