NETOBSERV-2387: Backend tests migration #2586
Conversation
Skipping CI for Draft Pull Request.
@oliver-smakal: This pull request references NETOBSERV-2387 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #2586 +/- ##
==========================================
+ Coverage 72.50% 72.58% +0.08%
==========================================
Files 107 107
Lines 11482 11481 -1
==========================================
+ Hits 8325 8334 +9
+ Misses 2658 2651 -7
+ Partials 499 496 -3
Flags with carried forward coverage won't be shown. Click here to find out more.
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (7)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (6)
📝 Walkthrough
Adds a backend e2e Go module with Ginkgo suite, shared helpers (operators, Loki, Kafka, networking/UDN, cloud STS/WIF), Makefile/lint/OWNERS, and extensive YAML templates/fixtures for scenarios.

Changes
Backend E2E module and helpers
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
@oliver-smakal: This pull request references NETOBSERV-2387 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
thanks @oliver-smakal, some questions/comments.
/cc @jotak @OlivierCazade @leandroberetta @jpinsonneau
JSYK, in terms of code, the files in testdata and all the code have already been reviewed in openshift-tests-private, except for a couple of new files (version_checker.go, backend_suite_test.go); besides those new files, the most important thing to review here is go.mod for new dependencies.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
47e5bd2 to 0f3384f (Compare)
Actionable comments posted: 23
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (30)
integration-tests/backend/testdata/networking/baselineadminnetworkPolicy.yaml-19-26 (1)
19-26: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove duplicate name key in second ingress rule.

Line 26 redefines name in the same mapping as Line 20. This is invalid/ambiguous YAML and can break template processing or silently override the earlier value.

Proposed fix

  - action: Deny
    name: default-deny-ingress2
    from:
    - namespaces:
        namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: ${CLIENT2_NS}
-   name: default-deny-ns

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/networking/baselineadminnetworkPolicy.yaml` around lines 19 - 26, The YAML contains a duplicate "name" key in the second ingress rule: remove the redundant "name: default-deny-ns" that duplicates the earlier "name: default-deny-ingress2" mapping (or, if a distinct name was intended, replace it with the correct unique name); update the ingress rule containing namespaces→namespaceSelector→matchLabels referencing ${CLIENT2_NS} so it only has a single valid "name" field (identify the mapping by the existing name "default-deny-ingress2" and the duplicate "default-deny-ns").

integration-tests/backend/testdata/virtualization/test-vm-localnet_template.yaml-49-50 (1)
49-50: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin the container disk image to an immutable reference.

Using a floating image here makes backend e2e non-deterministic and can break tests when upstream changes.

Proposed fix

- image: quay.io/containerdisks/fedora
+ image: quay.io/containerdisks/fedora@sha256:<pinned-digest>

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/virtualization/test-vm-localnet_template.yaml` around lines 49 - 50, The image field currently uses a floating reference "quay.io/containerdisks/fedora" for the container disk (name: rootdisk); replace that with an immutable reference (either a specific version tag or a digest) such as "quay.io/containerdisks/fedora:<fixed-tag>" or "quay.io/containerdisks/fedora@sha256:<digest>" so the virtualization template (the image entry for rootdisk) is pinned and e2e tests become deterministic—update the template entry where image: quay.io/containerdisks/fedora is defined and ensure any test fixtures that rely on this template use the same pinned reference.

integration-tests/backend/virtualization.go-36-46 (1)
36-46: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle empty worker-node lists explicitly.

If no worker nodes are returned, this currently reports true, which can mis-gate bare-metal-only flows.

Proposed fix

 func hasMetalWorkerNodes(oc *exutil.CLI) bool {
 	workers, err := compat_otp.GetClusterNodesBy(oc, "worker")
 	o.Expect(err).NotTo(o.HaveOccurred())
+	if len(workers) == 0 {
+		e2e.Logf("Cluster has no worker nodes")
+		return false
+	}
 	for _, w := range workers {

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/virtualization.go` around lines 36 - 46, The function that checks for metal worker nodes uses compat_otp.GetClusterNodesBy and currently returns true when the returned workers slice is empty; add an explicit check after obtaining workers (e.g., if len(workers) == 0) to return false (or fail the check) so empty worker lists do not incorrectly pass, then proceed to the existing loop that queries each node with oc.AsAdmin().WithoutNamespace().Run("get") and inspects the node label node.kubernetes.io/instance-type for "metal".

integration-tests/backend/testdata/networking/test-client-DSCP.yaml-14-17 (1)
14-17: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix shell redirection order and use block scalar for clarity

Line 16 has reversed redirection: 2>&1 > /dev/null sends stderr to the terminal before redirecting stdout to /dev/null, allowing stderr to leak. Use >/dev/null 2>&1 instead. Also replace the fragile escaped quoted string with a YAML block scalar (|) for better readability and maintainability.

Proposed fix

   command:
     - sh
     - -c
-    - "
-      \ while : ; do\n
-      \ curl nginx-service.test-server-68125.svc:80/data/100K 2>&1 > /dev/null ; sleep 5 \n
-      \ done"
+    - |
+      while :; do
+        curl nginx-service.test-server-68125.svc:80/data/100K >/dev/null 2>&1
+        sleep 5
+      done

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/networking/test-client-DSCP.yaml` around lines 14 - 17, Replace the escaped quoted shell string block with a YAML block scalar (|) containing the loop, and fix the redirection order in the curl command: change the fragment that contains "while : ; do ... curl nginx-service.test-server-68125.svc:80/data/100K 2>&1 > /dev/null ; sleep 5 ; done" so it reads as a multi-line block scalar with the curl redirection as >/dev/null 2>&1 (i.e., redirect stdout to /dev/null first, then stderr to stdout); update the YAML entry that currently holds the escaped string accordingly.

integration-tests/backend/testdata/flowcollectorSlice_v1alpha1_template.yaml-13-13 (1)
13-13: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Conflicting template syntax for SubnetLabels.

Using "${{SubnetLabels}}" combines quotes with double-brace parameter substitution. The double braces instruct OpenShift to parse the parameter value as YAML/JSON, but the surrounding quotes force string conversion.

If SubnetLabels is intended to be an array or object, the quotes will break the parsing.

🔧 Proposed fix

Remove the quotes to allow proper YAML parsing:

- subnetLabels: "${{SubnetLabels}}"
+ subnetLabels: ${{SubnetLabels}}

Alternatively, if it must be a string, use single braces:

- subnetLabels: "${{SubnetLabels}}"
+ subnetLabels: "${SubnetLabels}"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/flowcollectorSlice_v1alpha1_template.yaml` at line 13, The template uses subnetLabels: "${{SubnetLabels}}" which wraps the ${{...}} parameter in quotes, forcing string conversion and breaking YAML/JSON parsing for objects/arrays; change it to subnetLabels: ${{SubnetLabels}} so the SubnetLabels parameter is injected and parsed as YAML/JSON, or if SubnetLabels must be a literal string instead, use subnetLabels: "${SubnetLabels}" (single-brace parameter) to preserve it as a string.

integration-tests/backend/ip_utils.go-35-56 (1)
35-56: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Array bounds error and confusing IP ordering logic.

Line 42 accesses .status.podIPs[1].ip without verifying the array has two elements. If the pod has only one IP (e.g., during initialization or in single-stack mode miscategorized as dual-stack), this will fail.

Additionally, the function fetches podIPs[1] first, then podIPs[0], then conditionally reorders based on IP version. This is confusing and error-prone.

🛡️ Proposed fix with bounds checking and clearer logic
 func getPodIP(oc *exutil.CLI, namespace, podName, ipStack string) (string, string) {
 	if (ipStack == "ipv6single") || (ipStack == "ipv4single") {
 		podIP, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("pod", "-n", namespace, podName, "-o=jsonpath={.status.podIPs[0].ip}").Output()
 		o.Expect(err).NotTo(o.HaveOccurred())
 		e2e.Logf("The pod %s IP in namespace %s is %q", podName, namespace, podIP)
 		return podIP, ""
 	} else if ipStack == "dualstack" {
+		// Fetch all IPs as JSON array
+		podIPsJSON, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("pod", "-n", namespace, podName, "-o=jsonpath={.status.podIPs}").Output()
+		o.Expect(err).NotTo(o.HaveOccurred())
+		// Parse and validate we have exactly 2 IPs, then order by IPv4/IPv6
+		// ... (add JSON parsing logic)
-		podIP1, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("pod", "-n", namespace, podName, "-o=jsonpath={.status.podIPs[1].ip}").Output()
-		o.Expect(err).NotTo(o.HaveOccurred())
-		e2e.Logf("The pod's %s 1st IP in namespace %s is %q", podName, namespace, podIP1)
-		podIP2, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("pod", "-n", namespace, podName, "-o=jsonpath={.status.podIPs[0].ip}").Output()
-		o.Expect(err).NotTo(o.HaveOccurred())
-		e2e.Logf("The pod's %s 2nd IP in namespace %s is %q", podName, namespace, podIP2)
-		if netutils.IsIPv6String(podIP1) {
-			e2e.Logf("This is IPv4 primary dual stack cluster with IP %s", podIP1)
-			return podIP1, podIP2
-		}
-		e2e.Logf("This is IPv6 primary dual stack cluster with IP %s", podIP2)
-		return podIP2, podIP1
 	}
 	return "", ""
 }
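For the parsing step the diff leaves elided, a minimal sketch of what the JSON decoding could look like; orderDualStackIPs and podIPEntry are hypothetical names, and the IPv4-first return order is an assumption to verify against what callers expect:

package backend

import (
	"encoding/json"
	"fmt"

	netutils "k8s.io/utils/net"
)

// podIPEntry mirrors one element of the pod's .status.podIPs array.
type podIPEntry struct {
	IP string `json:"ip"`
}

// orderDualStackIPs decodes the podIPs JSON array and returns the IPv4
// address first and the IPv6 address second, with explicit bounds checks.
func orderDualStackIPs(podIPsJSON string) (string, string, error) {
	var entries []podIPEntry
	if err := json.Unmarshal([]byte(podIPsJSON), &entries); err != nil {
		return "", "", fmt.Errorf("parsing podIPs: %w", err)
	}
	if len(entries) < 2 {
		return "", "", fmt.Errorf("expected 2 pod IPs on a dual-stack cluster, got %d", len(entries))
	}
	if netutils.IsIPv6String(entries[0].IP) {
		// IPv6-primary cluster: swap so the IPv4 address comes first.
		return entries[1].IP, entries[0].IP, nil
	}
	return entries[0].IP, entries[1].IP, nil
}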
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/ip_utils.go` around lines 35 - 56, The getPodIP function is accessing .status.podIPs[1] without bounds checks and makes confusing repeated calls and ordering; fix it by fetching the podIPs array once (use a jsonpath like {.status.podIPs[*].ip} or retrieve the whole status.podIPs), verify the slice length is >=2 before accessing index 1 and handle the single-IP case gracefully (return the single IP and empty string or surface an error), then determine which IP is primary by calling netutils.IsIPv6String on the two retrieved IPs and return them in the correct primary,secondary order; update references in getPodIP to stop making two separate oc.Run("get") calls and add the length check to avoid out-of-bounds panics.

integration-tests/backend/ip_utils.go-14-25 (1)
14-25: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Replace character counting with proper JSON array parsing.

The serviceNetwork field returns a JSON array format like ["10.0.0.0/16","fd00::/48"]. Counting colons and dots in this JSON string is unreliable—brackets and quotes affect the count, and the logic is semantically incorrect.

Parse the JSON array and iterate through each CIDR to determine IPv4/IPv6 presence. Consider using net.ParseCIDR() or similar to validate and classify each network:

var networks []string
err := json.Unmarshal([]byte(svcNetwork), &networks)
o.Expect(err).NotTo(o.HaveOccurred())
// Then check each network for IPv4/IPv6
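A fuller sketch of that classification, keeping the return labels the existing function uses (the helper name is illustrative):

package backend

import (
	"encoding/json"
	"net"
)

// classifyStack parses the serviceNetwork JSON array and classifies the
// cluster by the address families of its CIDRs.
func classifyStack(svcNetwork string) (string, error) {
	var networks []string
	if err := json.Unmarshal([]byte(svcNetwork), &networks); err != nil {
		return "", err
	}
	var hasV4, hasV6 bool
	for _, cidr := range networks {
		ip, _, err := net.ParseCIDR(cidr)
		if err != nil {
			return "", err
		}
		if ip.To4() != nil {
			hasV4 = true
		} else {
			hasV6 = true
		}
	}
	switch {
	case hasV4 && hasV6:
		return "dualstack", nil
	case hasV6:
		return "ipv6single", nil
	default:
		return "ipv4single", nil
	}
}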
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/ip_utils.go` around lines 14 - 25, The current checkIPStackType uses character counting on svcNetwork JSON output which is unreliable; instead, unmarshal svcNetwork into a []string (e.g., var networks []string and json.Unmarshal([]byte(svcNetwork), &networks)) inside checkIPStackType, o.Expect no error, then iterate networks and for each use net.ParseCIDR to determine if it is IPv4 or IPv6, track presence of v4 and v6 and return "dualstack", "ipv4single", or "ipv6single" accordingly (ensure you still call oc.WithoutNamespace().AsAdmin().Run(...).Output() to get svcNetwork and use the same function name checkIPStackType for the fix).

integration-tests/backend/testdata/netobserv-loki-reader-multitenant-crb.yaml-9-9 (1)
9-9: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parameterize ClusterRoleBinding name to avoid cross-test conflicts.

A static cluster-scoped name can be overwritten when tests bind different users, causing flaky auth behavior and permission bleed.

Proposed diff

 metadata:
-  name: netobserv-user-reader
+  name: "netobserv-user-reader-${USERNAME}"

Also applies to: 17-20

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/netobserv-loki-reader-multitenant-crb.yaml` at line 9, The ClusterRoleBinding uses a static name ("netobserv-user-reader") which causes cross-test conflicts; change the ClusterRoleBinding name to be unique per-test (e.g., use metadata.generateName or a templated/parameterized name like "{{testID}}-netobserv-user-reader" or include the release/test name) wherever the static name appears (the "name: netobserv-user-reader" entries and the ClusterRoleBinding resource referenced around lines 17-20); ensure any code that references this binding (subjects/roleRef) uses the same generated/templated value so each test gets an isolated, non-conflicting ClusterRoleBinding.

integration-tests/backend/testdata/networking/nmstate/operatorgroup-template.yaml-9-11 (1)
9-11: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove redundant generateName field and hardcoded suffix.

When name is present, Kubernetes ignores generateName per the ObjectMeta specification. Remove the pointless generateName line and the brittle -dzrmx suffix to use a single, deterministic naming strategy for repeatable tests.

Proposed diff

 metadata:
-  name: "${NAME}-dzrmx"
+  name: "${NAME}"
   namespace: "${NAMESPACE}"
-  generateName: "${NAME}-"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/networking/nmstate/operatorgroup-template.yaml` around lines 9 - 11, Remove the redundant generateName field and the hardcoded "-dzrmx" suffix: update the resource metadata by deleting the generateName line and change the name value from "${NAME}-dzrmx" to a deterministic "${NAME}" (or another stable token) so tests use a single, repeatable naming strategy; ensure you only modify the metadata.name and remove generateName to avoid Kubernetes ignoring generateName when name is present.

integration-tests/backend/testdata/test-tls-server_template.yaml-59-63 (1)
59-63: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add fsGroup to pod securityContext to enable non-root init container writes to emptyDir.

With runAsNonRoot: true and no fsGroup defined, the init container cannot reliably write certificates to /etc/nginx/ssl. Kubernetes uses fsGroup to set group ownership on emptyDir volumes, enabling the non-root process to write.

Proposed diff

   spec:
     securityContext:
       runAsNonRoot: true
+      fsGroup: 1001
       seccompProfile:
         type: RuntimeDefault

Also applies to: lines 73–79, 98–103

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/test-tls-server_template.yaml` around lines 59 - 63, Add an fsGroup entry to the pod-level securityContext so non-root init containers can write to emptyDir volumes: update the securityContext block where runAsNonRoot: true is set (the same blocks near initContainers and the other occurrences) to include fsGroup: <appropriate-gid> (matching the group owner used by the containers) so Kubernetes will chown the emptyDir volume (e.g., the /etc/nginx/ssl mount) and allow non-root init container writes.

integration-tests/backend/testdata/test-nginx-server_template.yaml-20-23 (1)
20-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix invalid securityContext placement and privileged field nesting.

securityContext at lines 20–23 is misplaced at Deployment spec level; it must be under spec.template.spec. The privileged field at line 39 is incorrectly nested under capabilities (which only accepts add/drop); it must be a sibling of capabilities under securityContext. These structural errors will cause the manifest to be rejected or silently drop security constraints.

Proposed diff

 spec:
-  securityContext:
-    runAsNonRoot: true
-    seccompProfile:
-      type: RuntimeDefault
   replicas: 1
@@
   template:
     metadata:
       labels:
         app: nginx
     spec:
+      securityContext:
+        runAsNonRoot: true
+        seccompProfile:
+          type: RuntimeDefault
       containers:
       - name: nginx
         securityContext:
           allowPrivilegeEscalation: false
+          privileged: false
           capabilities:
             drop: ["ALL"]
-          privileged: false

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/test-nginx-server_template.yaml` around lines 20 - 23, The manifest places securityContext at the Deployment spec root and nests privileged under capabilities; move the top-level securityContext into the Pod template (place it under spec.template.spec.securityContext) so pod-level settings are applied, and for the container-level security settings move the privileged boolean out of capabilities and make it a sibling of capabilities under the container's securityContext (i.e., container.securityContext.privileged alongside container.securityContext.capabilities). Ensure you update the blocks that reference securityContext, spec.template.spec, capabilities, and privileged accordingly so the structure matches Kubernetes schema.

integration-tests/backend/testdata/exporters/ipfix-collector.yaml-20-20 (1)
20-20: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pin the image tag to a specific version.

Using :latest makes tests non-deterministic and can cause unexpected failures when the upstream image changes.

📌 Proposed fix

- image: antrea/ipfix-collector:latest
+ image: antrea/ipfix-collector:v0.7.0

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/exporters/ipfix-collector.yaml` at line 20, Replace the non-deterministic image reference "image: antrea/ipfix-collector:latest" with a specific, immutable tag or digest (e.g., "image: antrea/ipfix-collector:vX.Y.Z" or "image: antrea/ipfix-collector@sha256:...") so tests are reproducible; update the ipfix-collector image entry in the YAML and ensure the chosen version is recorded in test dependencies or CI config.

integration-tests/backend/testdata/exporters/ipfix-collector.yaml-18-32 (1)
18-32: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add securityContext and resource limits.

The container lacks security constraints and resource limits. For test reliability and cluster safety, add:

- securityContext with runAsNonRoot, allowPrivilegeEscalation: false, dropped capabilities
- resources with limits and requests

🛡️ Proposed fix

       containers:
       - name: ipfix-collector
         image: antrea/ipfix-collector:latest
+        securityContext:
+          runAsNonRoot: true
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop: ["ALL"]
+          seccompProfile:
+            type: RuntimeDefault
+        resources:
+          limits:
+            memory: 256Mi
+            cpu: 200m
+          requests:
+            memory: 128Mi
+            cpu: 100m
         args:

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/exporters/ipfix-collector.yaml` around lines 18 - 32, The ipfix-collector container spec (container name "ipfix-collector") is missing securityContext and resource limits; update the container definition to add a securityContext that sets runAsNonRoot: true, allowPrivilegeEscalation: false and drops capabilities (e.g., capabilities.drop: ["ALL"]) and optionally a seccompProfile/runtimeClass if desired, and add a resources block with sensible requests and limits (cpu and memory) for both requests and limits to protect cluster resources; apply these changes in the container stanza that contains image: antrea/ipfix-collector:latest and the args list so the securityContext and resources are applied to that container.

integration-tests/backend/testdata/testuser-template-crb.yaml-9-10 (1)
9-10: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid static cluster-scoped RBAC names to prevent collisions.

Fixed names for ClusterRole/ClusterRoleBinding can conflict in parallel CI runs or reruns. Add a suffix parameter and include it in both names and roleRef.

Proposed fix

-  name: testuser-templating-cr
+  name: testuser-templating-cr-${RESOURCE_SUFFIX}
@@
-  name: testuser-templating-crb
+  name: testuser-templating-crb-${RESOURCE_SUFFIX}
@@
-  name: testuser-templating-cr
+  name: testuser-templating-cr-${RESOURCE_SUFFIX}
@@
 parameters:
 - name: USERNAME
   required: true
+- name: RESOURCE_SUFFIX
+  required: true

Also applies to: 23-27, 32-34

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/testuser-template-crb.yaml` around lines 9 - 10, The ClusterRole and ClusterRoleBinding use static names (e.g., "testuser-templating-cr") which can collide; add a template parameter (e.g., a suffix variable like "{{ .Values.suffix }}" or similar used in this repo) and append it to the ClusterRole metadata.name, the ClusterRoleBinding metadata.name, and the roleRef.name so all three match the suffixed name; update every occurrence (including the other instances noted around lines 23-27 and 32-34) to use the same suffix variable so parallel CI runs produce unique names.

integration-tests/backend/alerts.go-24-24 (1)
24-24: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Error from json.Unmarshal is silently ignored.

If unmarshaling fails, alertStatus will be empty and the function will return an empty map, masking the real error. This makes debugging difficult.

🛡️ Proposed fix

 var alertStatus []interface{}
-_ = json.Unmarshal([]byte(alertOut), &alertStatus)
+if err := json.Unmarshal([]byte(alertOut), &alertStatus); err != nil {
+	return make(map[string]interface{}), err
+}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/alerts.go` at line 24, The json.Unmarshal call currently ignores its returned error which can mask failures: capture the error from json.Unmarshal([]byte(alertOut), &alertStatus) and handle it (e.g., return the error from the surrounding function or fail the test/log a descriptive message) instead of discarding it; update the function that uses alertStatus so it propagates or reports the unmarshalling error to callers/tests to avoid returning an empty map silently.

integration-tests/backend/multitenants.go-73-75 (1)
73-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop shell-wrapped htpasswd execution.

Using bash -c with dynamic strings is unnecessary and expands injection risk. Pass arguments directly to exec.Command.

Proposed fix

-	cmd := fmt.Sprintf("htpasswd -b %v %v %v", *passwdFile, users[i].Username, users[i].Password)
-	err := exec.Command("bash", "-c", cmd).Run()
+	err := exec.Command("htpasswd", "-b", *passwdFile, users[i].Username, users[i].Password).Run()
 	if err != nil {
 		return err
 	}
@@
-	cmd := fmt.Sprintf("htpasswd -D %v %v", usersHTpassFile, users[i].Username)
-	err := exec.Command("bash", "-c", cmd).Run()
+	err := exec.Command("htpasswd", "-D", usersHTpassFile, users[i].Username).Run()
 	o.Expect(err).NotTo(o.HaveOccurred())

Also applies to: lines 124-126

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/multitenants.go` around lines 73 - 75, The test currently constructs a shell command string in variable cmd and runs it via exec.Command("bash","-c", cmd).Run(), which risks shell injection; change the call to pass arguments directly to exec.Command by invoking exec.Command with the program "htpasswd" and the argument list "-b", *passwdFile, users[i].Username, users[i].Password (and analogous replacements for the similar block at lines 124-126), removing the intermediate cmd string so exec.Command receives explicit args instead of a shell-wrapped command.

integration-tests/backend/loki.go-73-78 (1)
73-78: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Readiness failures are overwritten

Lines 73-78 overwrite err on each iteration. A failed component can be masked by a later successful check, returning nil incorrectly.

Suggested fix

 for _, deploy := range []string{l.Name + "-distributor", l.Name + "-gateway", l.Name + "-querier", l.Name + "-query-frontend"} {
-	err = waitForDeploymentPodsToBeReady(oc, l.Namespace, deploy)
+	if err = waitForDeploymentPodsToBeReady(oc, l.Namespace, deploy); err != nil {
+		return err
+	}
 }
 for _, ss := range []string{l.Name + "-compactor", l.Name + "-index-gateway", l.Name + "-ingester"} {
-	err = waitForStatefulsetReady(oc, l.Namespace, ss)
+	if err = waitForStatefulsetReady(oc, l.Namespace, ss); err != nil {
+		return err
+	}
 }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/loki.go` around lines 73 - 78, The loops calling waitForDeploymentPodsToBeReady and waitForStatefulsetReady currently assign to err on each iteration, allowing later successes to overwrite earlier failures; change the logic in the block that iterates over []string{l.Name + "-distributor", ...} and []string{l.Name + "-compactor", ...} to preserve errors (e.g., if err != nil keep/aggregate it or return early) instead of blindly reassigning err each time, so failures from waitForDeploymentPodsToBeReady or waitForStatefulsetReady are not lost.

integration-tests/backend/operator.go-171-175 (1)
171-175: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

NotFound detection is broken due to malformed formatting

Line 173 uses fmt.Sprint("v%", sub), producing a literal string and preventing reliable NotFound matching at Line 174. This can skip subscription creation.

Suggested fix

-	msg := fmt.Sprint("v%", sub)
+	msg := fmt.Sprintf("%v", err)
 	if strings.Contains(msg, "NotFound") {

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/operator.go` around lines 171 - 175, The NotFound detection is broken because msg is built with fmt.Sprint("v%", sub) which produces a literal string; replace that with a proper formatting or conversion (e.g., fmt.Sprintf("%v", sub) or fmt.Sprint(sub)) and change the strings.Contains check to inspect the actual output variable (sub) or the corrected msg; update the block around oc.AsAdmin().WithoutNamespace().Run("get").Args("sub", "-n", so.Namespace, so.PackageName).Output(), the variable sub, and the strings.Contains(msg, "NotFound") check so subscription creation via so.setCatalogSourceObjects(oc) triggers correctly.

integration-tests/backend/azure_utils.go-296-296 (1)
296-296: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Region query can return invalid multi-value output

Line 296 uses .items[]...region, which can return multiple values (space-concatenated). That can break Azure SDK calls expecting one region string.

Suggested fix

-func getAzureClusterRegion(oc *exutil.CLI) (string, error) {
-	return oc.AsAdmin().WithoutNamespace().Run("get").Args("node", `-ojsonpath={.items[].metadata.labels.topology\.kubernetes\.io/region}`).Output()
-}
+func getAzureClusterRegion(oc *exutil.CLI) (string, error) {
+	return oc.AsAdmin().WithoutNamespace().Run("get").Args("infrastructure", "cluster", "-o=jsonpath={.status.platformStatus.azure.region}").Output()
+}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/azure_utils.go` at line 296, The current call using oc.AsAdmin().WithoutNamespace().Run("get").Args("node", `-ojsonpath={.items[].metadata.labels.topology\.kubernetes\.io/region}`).Output() can return multiple space-concatenated regions; change the query to return a single region (e.g. select the first item) or fetch JSON and parse to a single value so Azure SDK receives one region string. Update the Args(...) jsonpath to target {.items[0].metadata.labels.topology.kubernetes.io/region} or replace the jsonpath call with -o json and parse the first item's label in Go before returning to ensure only one region is returned by the function.

integration-tests/backend/loki_client.go-274-274 (1)
274-274: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add timeout to HTTP client to prevent indefinite test hangs

The http.Client created at line 274 is missing a Timeout field. If the endpoint becomes stalled, test execution will hang indefinitely.

Suggested fix

-client := &http.Client{Transport: tr}
+client := &http.Client{
+	Transport: tr,
+	Timeout:   30 * time.Second,
+}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/loki_client.go` at line 274, The http.Client created as client := &http.Client{Transport: tr} lacks a Timeout causing potential test hangs; update the initializer to include a sensible Timeout (e.g., Timeout: 10 * time.Second) and add the time import if it's not present so the client is created as &http.Client{Transport: tr, Timeout: <duration>}. Ensure the change is applied where client is declared so all requests from this loki client use the timeout.

integration-tests/backend/loki_client.go-265-270 (1)
265-270: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove hardcoded InsecureSkipVerify from Loki client or add justification
Lines 265 and 270 disable TLS certificate verification unconditionally. Even in integration tests, TLS verification should be enabled—configure self-signed certificates for the test environment instead. If disabling verification is necessary, add a comment explaining why and consider making it conditional on an explicit flag.
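If disabling verification must stay available for self-signed test endpoints, a flag-gated sketch could look like the following; LOKI_INSECURE_TLS and the helper name are hypothetical, not part of the current code:

package backend

import (
	"crypto/tls"
	"net/http"
	"os"
	"time"
)

// newLokiHTTPClient keeps TLS verification on by default and only skips it
// when the harness explicitly opts in, e.g. for self-signed test endpoints.
func newLokiHTTPClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{
				// Opt-in escape hatch for test clusters with self-signed certs.
				InsecureSkipVerify: os.Getenv("LOKI_INSECURE_TLS") == "true",
				MinVersion:         tls.VersionTLS12,
			},
		},
		Timeout: 30 * time.Second,
	}
}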
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/loki_client.go` around lines 265 - 270, The http.Transport creation in integration-tests/backend/loki_client.go currently sets TLSClientConfig: &tls.Config{InsecureSkipVerify: true} (both when Proxy: http.ProxyURL(proxyURL) is used and in the else branch); remove this hardcoded InsecureSkipVerify or make it conditional: either enable normal TLS verification and wire the test harness to use self-signed certs for the Loki test server, or gate InsecureSkipVerify behind an explicit test flag (e.g., a boolean like allowInsecureTLS) and document the justification in a comment next to the TLSClientConfig line; update any code paths that create tr (the http.Transport) so they use the new flag/behavior instead of always disabling verification.

integration-tests/backend/test_flowcollectorslice.go-543-543 (1)
543-543: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not log authentication tokens.

Printing user0token exposes credentials in test logs.

Proposed fix

-e2e.Logf("testuser token: %s", user0token)
+e2e.Logf("retrieved testuser token")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/test_flowcollectorslice.go` at line 543, The test currently logs sensitive credentials via e2e.Logf("testuser token: %s", user0token); remove or replace that call so the raw user0token is never printed — e.g., log a non-sensitive message (like "testuser token created" or the token length) or mask the token before logging; keep any assertion that user0token is non-empty (e.g., require.NotEmpty or t.Fatalf checks) but do not output the token value. Ensure you update the e2e.Logf invocation that references user0token accordingly.

integration-tests/backend/test_flowmetrics.go-112-113 (1)
112-113: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dashboard existence is not asserted.

The result of checkResourceExists is discarded, so this verification is currently ineffective.

Proposed fix

-_, _ = checkResourceExists(oc, "cm", "netobserv-"+dashName, "openshift-config-managed")
+exists, err := checkResourceExists(oc, "cm", "netobserv-"+dashName, "openshift-config-managed")
+o.Expect(err).NotTo(o.HaveOccurred())
+o.Expect(exists).To(o.BeTrue(), "expected dashboard configmap netobserv-"+dashName+" to exist")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/test_flowmetrics.go` around lines 112 - 113, The call to checkResourceExists(oc, "cm", "netobserv-"+dashName, "openshift-config-managed") discards its (bool, error) result so the test doesn't actually assert dashboard existence; capture its return values (e.g., exists, err := checkResourceExists(...)) and then fail the test if err != nil or exists is false (use t.Fatalf or your test assertion helper like require.NoError/True) so the dashboard presence is asserted for the "netobserv-"+dashName configmap.

integration-tests/backend/udn.go-327-351 (1)
327-351: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Layer2 dual-stack UDN path skips readiness wait.

The layer2+dualstack path calls createLayer2DualStackUDNCRD(oc) but omits the waitUDNCRDApplied call, while the layer2+default path and all other paths correctly include it. This creates a race condition where subsequent steps may proceed before the CRD is fully applied.

Add the wait call after createLayer2DualStackUDNCRD(oc) to match the pattern in the default path within the same switch statement.
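A sketch of the fixed branch, reusing the identifiers mentioned in the prompt below; the exact receiver and field names are assumptions:

	case "dualstack":
		udncrd.createLayer2DualStackUDNCRD(oc)
		// Block until the CRD reports Applied, matching the default branch.
		err = waitUDNCRDApplied(oc, namespace, udncrd.crdname)
		o.Expect(err).NotTo(o.HaveOccurred())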
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/udn.go` around lines 327 - 351, The layer2+dualstack branch calls udnCRDResource.createLayer2DualStackUDNCRD(oc) but fails to wait for the CRD to be applied; after calling createLayer2DualStackUDNCRD(oc) add the same readiness check used in the default branch: call waitUDNCRDApplied(oc, namespace, udncrd.crdname), capture the error into err, and assert o.Expect(err).NotTo(o.HaveOccurred()); this mirrors the behavior in the default path and prevents the race.

integration-tests/backend/util.go-71-75 (1)
71-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix duplicated condition in proxy fallback logic.

Line 73 checks http_proxy again instead of https_proxy, making the https_proxy fallback unreachable.

Proposed fix

 if os.Getenv("http_proxy") != "" {
 	proxy = os.Getenv("http_proxy")
-} else if os.Getenv("http_proxy") != "" {
+} else if os.Getenv("https_proxy") != "" {
 	proxy = os.Getenv("https_proxy")
 }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/util.go` around lines 71 - 75, The proxy fallback logic uses the wrong env check: change the second condition so it tests os.Getenv("https_proxy") instead of repeating "http_proxy" so the https_proxy branch becomes reachable; update the block around the proxy variable assignment in integration-tests/backend/util.go (the if/else that sets proxy) to check "https_proxy" in the else-if and keep the assignment to proxy = os.Getenv("https_proxy").

integration-tests/backend/udn.go-409-425 (1)
409-425: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Trim pod IP command output before using it.

execCommandInSpecificPod returns untrimmed shell output which typically includes trailing newlines. Passing these directly to net.JoinHostPort creates malformed addresses (e.g., "192.168.1.1\n:8080"), breaking the curl commands in CurlPod2PodFailUDN and CurlPod2PodPassUDN.

Add strings.TrimSpace() calls after each execCommandInSpecificPod invocation for both podIPv4 and podIPv6.
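An illustrative excerpt with the outputs trimmed, assuming execCommandInSpecificPod returns (string, error) as the prompt below implies; ipv4Cmd and ipv6Cmd are placeholder command variables:

	// Trim shell output before it reaches net.JoinHostPort or the curl helpers.
	podIPv4, err := execCommandInSpecificPod(oc, namespace, podName, ipv4Cmd)
	o.Expect(err).NotTo(o.HaveOccurred())
	podIPv4 = strings.TrimSpace(podIPv4)

	podIPv6, err := execCommandInSpecificPod(oc, namespace, podName, ipv6Cmd)
	o.Expect(err).NotTo(o.HaveOccurred())
	podIPv6 = strings.TrimSpace(podIPv6)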
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/udn.go` around lines 409 - 425, The pod IP strings returned by execCommandInSpecificPod are not trimmed and may include trailing newlines, which breaks address construction used by CurlPod2PodFailUDN and CurlPod2PodPassUDN; after every call to execCommandInSpecificPod that assigns podIPv4 or podIPv6 (including the "ipv4single", "ipv6single", and default branches in the switch) wrap the returned value with strings.TrimSpace() before logging and returning so podIPv4 and podIPv6 contain clean IPs for subsequent net.JoinHostPort/curl usage.

integration-tests/backend/flowcollector_utils.go-335-340 (1)
335-340: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Accumulate Loki records across streams instead of overwriting.

Each loop iteration overwrites flowRecords; only the last stream survives. This loses data when Loki returns multiple result streams, causing tests to fail or see incomplete results.

Applies to lines 335-340, 370-375, and 393-398.

Proposed fix

 for _, result := range res.Data.Result {
-	flowRecords, err = getFlowRecords(result.Values)
+	recs, err := getFlowRecords(result.Values)
 	if err != nil {
 		return []FlowRecord{}, err
 	}
+	flowRecords = append(flowRecords, recs...)
 }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/flowcollector_utils.go` around lines 335 - 340, The loop that processes Loki result streams overwrites flowRecords each iteration, losing earlier streams; modify the code that iterates over res.Data.Result (and the similar loops around getFlowRecords calls) to append into a single slice instead of reassigning: call getFlowRecords(result.Values), check err, then extend the existing flowRecords slice (e.g., use append(flowRecords, returnedSlice...)) so all streams are accumulated across iterations while preserving error handling in the same places where getFlowRecords is invoked.

integration-tests/backend/flowcollector.go-225-231 (1)
225-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Readiness check can pass on non-ready states.

strings.Contains(status, "Ready") returns true for "NotReady" and similar values. Use exact string comparison instead:

Proposed fix

-	if strings.Contains(status, "Ready") {
+	if strings.Trim(status, "'") == "Ready" {
 		return true, nil
 	}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/flowcollector.go` around lines 225 - 231, The readiness check currently uses strings.Contains(status, "Ready") which will incorrectly match values like "NotReady"; update the check to perform an exact comparison: normalize the returned status string from the jsonpath (trim surrounding quotes/apostrophes and whitespace from the status variable) and then compare equality to "Ready" (e.g. status == "Ready"). Locate the status variable and the conditional in flowcollector.go (the call that sets status and the if strings.Contains(...) block) and replace the Contains-based test with the trimmed exact equality check so only the literal "Ready" passes.

integration-tests/backend/kafka.go-179-182 (1)
179-182: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use a containment check for Ready condition.

Line 181 compares status == "Ready". Condition output can include multiple entries, causing false negatives and unnecessary timeouts.

Proposed fix

-	if status == "Ready" {
+	if strings.Contains(status, "Ready") {
 		return true, nil
 	}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/kafka.go` around lines 179 - 182, The equality check if status == "Ready" can miss cases where the status string contains multiple entries; change the condition to a containment check (e.g., use strings.Contains(status, "Ready")) and optionally trim whitespace/newlines on status before checking. Update the code around the status variable (created with strings.Replace) and the if statement that currently uses equality so it uses strings.Contains(status, "Ready") (or strings.Contains(strings.TrimSpace(status), "Ready")) instead of ==; keep the e2e.Logf call as-is.

integration-tests/backend/kafka.go-155-160 (1)
155-160: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't terminate poll loops on transient oc get errors.

Returning non-nil errors from the condition function stops polling immediately instead of retrying until timeout. This causes premature test failures when resources are not yet visible or API calls encounter transient errors.

Proposed fix

+import apierrors "k8s.io/apimachinery/pkg/api/errors"
@@
-	if err != nil {
-		e2e.Logf("kafka status ready error: %v", err)
-		return false, err
-	}
+	if err != nil {
+		if apierrors.IsNotFound(err) {
+			return false, nil
+		}
+		e2e.Logf("kafka status ready error: %v", err)
+		return false, nil
+	}
@@
-	if err != nil {
-		e2e.Logf("kafka Topic status ready error: %v", err)
-		return false, err
-	}
+	if err != nil {
+		if apierrors.IsNotFound(err) {
+			return false, nil
+		}
+		e2e.Logf("kafka Topic status ready error: %v", err)
+		return false, nil
+	}

Applies to lines 155-160 and 172-177.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/kafka.go` around lines 155 - 160, The poll condition passed to wait.PollUntilContextTimeout is returning non-nil errors on transient oc get failures which aborts polling; instead, catch errors from oc.AsAdmin().WithoutNamespace().Run("get").Args(...).Output() in the condition functions (both the block around kafka status and the later block at 172-177), log the error with e2e.Logf as you already do, and return (false, nil) so the poll continues retrying until timeout; only return a non-nil error when the error is terminal (not transient) or when you intentionally want to abort the poll.
🟡 Minor comments (9)
integration-tests/backend/testdata/virtualization/test-vm-localnet_template.yaml-54-56 (1)
54-56: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid secret-like password literals in committed testdata.

This value is likely synthetic, but it still trips secret scanners. Use a clearly fake value or parameterize it.

Proposed fix

- password: byje-7cd2-i8et
+ password: test-password

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/virtualization/test-vm-localnet_template.yaml` around lines 54 - 56, The test data contains a secret-like literal in the password field (keys: user, password, chpasswd) — replace the current realistic-looking password with a clearly fake placeholder or parameterize it so secret scanners won't flag it; update the YAML entry for password to use a non-secret value like a constant "CHANGE_ME"/"fake-password" or a template variable (e.g., {{VM_ROOT_PASSWORD}}) and ensure any test harness reads the parameterized value from test config or environment instead of committing real-looking credentials.

integration-tests/backend/sctp.go-58-58 (1)
58-58: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in skip message.

"woker" should be "worker".

📝 Proposed fix

- g.Skip("Can not find any woker nodes in the cluster")
+ g.Skip("Can not find any worker nodes in the cluster")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/sctp.go` at line 58, The skip message passed to g.Skip contains a typo; update the string used in the g.Skip(...) call in integration-tests/backend/sctp.go (the g.Skip invocation) to replace "woker" with "worker" (i.e., "Can not find any worker nodes in the cluster"), keeping the rest of the message unchanged.

integration-tests/backend/sctp.go-31-31 (1)
31-31: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix typo in skip message.

"Yum insall" should be "Yum install".

📝 Proposed fix

- g.Skip("Yum insall to enable sctp cannot work in a disconnected cluster, skip the test!!!")
+ g.Skip("Yum install to enable sctp cannot work in a disconnected cluster, skip the test!!!")

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/sctp.go` at line 31, Update the skip message passed to g.Skip to correct the typo: change "Yum insall to enable sctp cannot work in a disconnected cluster, skip the test!!!" to "Yum install to enable sctp cannot work in a disconnected cluster, skip the test!!!" so the g.Skip(...) call uses the correct word "install".

integration-tests/backend/testdata/test-nginx-client_template.yaml-30-37 (1)
30-37: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Move privileged field to correct level.

privileged: false on line 34 is incorrectly nested under capabilities. It should be a sibling of allowPrivilegeEscalation at the securityContext level.

🔧 Proposed fix

         securityContext:
           allowPrivilegeEscalation: false
+          privileged: false
           capabilities:
             drop: ["ALL"]
-          privileged: false
           runAsNonRoot: true

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/test-nginx-client_template.yaml` around lines 30 - 37, The `privileged: false` property is incorrectly nested under `capabilities`; move it out so that `privileged` is a direct child of `securityContext` (i.e., a sibling of `allowPrivilegeEscalation` and `runAsNonRoot`) rather than nested under `capabilities` — update the YAML structure around `securityContext`, `allowPrivilegeEscalation`, `capabilities`, `privileged`, and `runAsNonRoot` so `capabilities` contains only `drop` and `privileged` sits at the securityContext level.

integration-tests/backend/testdata/testuser-client-server_template.yaml-89-90 (1)
integration-tests/backend/testdata/testuser-client-server_template.yaml-89-90 (1)

89-90: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix redirection order to silence both stdout and stderr.

The current `2>&1 > /dev/null` redirects stderr to the original stdout before stdout goes to /dev/null, leaving stderr on the terminal. Use `> /dev/null 2>&1` instead.

Proposed fix
- \ curl nginx-service.${SERVER_NS}.svc:8080/data/${OBJECT_SIZE} 2>&1 > /dev/null ; sleep 5 \n
+ \ curl nginx-service.${SERVER_NS}.svc:8080/data/${OBJECT_SIZE} > /dev/null 2>&1 ; sleep 5 \n

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/testuser-client-server_template.yaml` around lines 89-90, the shell command string containing "curl nginx-service.${SERVER_NS}.svc:8080/data/${OBJECT_SIZE} 2>&1 > /dev/null ; sleep 5" uses the wrong redirection order so stderr still prints; change the redirection to send stdout to /dev/null first and then redirect stderr to stdout by replacing "2>&1 > /dev/null" with "> /dev/null 2>&1" in the command string (the loop/command that invokes curl) so both stdout and stderr are silenced.
integration-tests/backend/testdata/test-tls-client_template.yaml-32-36 (1)

32-36: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the redirect order.

The stderr/stdout redirects should be `> /dev/null 2>&1`, not `2>&1 > /dev/null`. The current order redirects stderr to the original stdout before redirecting stdout to /dev/null, leaving stderr visible.

🔧 Proposed fix
- curl -sk --tlsv1.3 https://tls-server-service.${SERVER_NS}.svc:443/ 2>&1 > /dev/null
+ curl -sk --tlsv1.3 https://tls-server-service.${SERVER_NS}.svc:443/ > /dev/null 2>&1
  sleep 5
- curl -sk --tlsv1.2 --tls-max 1.2 https://tls-server-service.${SERVER_NS}.svc:443/ 2>&1 > /dev/null
+ curl -sk --tlsv1.2 --tls-max 1.2 https://tls-server-service.${SERVER_NS}.svc:443/ > /dev/null 2>&1
  sleep 5
- curl -s http://tls-server-service.${SERVER_NS}.svc:80/ 2>&1 > /dev/null
+ curl -s http://tls-server-service.${SERVER_NS}.svc:80/ > /dev/null 2>&1

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/testdata/test-tls-client_template.yaml` around lines 32-36, the curl invocations (e.g., the lines starting with "curl -sk --tlsv1.3 https://tls-server-service.${SERVER_NS}.svc:443/", "curl -sk --tlsv1.2 --tls-max 1.2 https://tls-server-service.${SERVER_NS}.svc:443/", and "curl -s http://tls-server-service.${SERVER_NS}.svc:80/") currently use the wrong redirect order ("2>&1 > /dev/null"); change each to redirect stdout first and then stderr to stdout by using "> /dev/null 2>&1" so stderr is also discarded.
integration-tests/backend/util.go-292-301 (1)

292-301: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add timeout to HTTP client in tests.

The client lacks a timeout (line 301), which can cause test hangs. However, `InsecureSkipVerify: true` is likely necessary for test cluster endpoints using self-signed certificates; replacing it with `MinVersion: tls.VersionTLS12` would not preserve the same behavior and would break certificate validation. If test infrastructure supports proper certificate setup, that should be addressed separately.

Add `Timeout: 30 * time.Second` (or an appropriate duration) to the `http.Client` creation.
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/util.go` around lines 292-301, the http.Client created using the local variable tr (Transport) lacks a timeout and can hang tests; modify the client construction (where client := &http.Client{Transport: tr}) to include a timeout, e.g. Timeout: 30 * time.Second, while keeping the existing Transport and TLSClientConfig (InsecureSkipVerify: true) intact; ensure the time package is imported if not already.
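For clarity, a minimal sketch of what the fixed client construction could look like; the helper name is invented for illustration and is not the repo's actual code.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// newTestHTTPClient keeps the insecure transport for self-signed test
// endpoints but bounds the whole request with a Timeout, so a hung
// endpoint fails the test quickly instead of blocking it forever.
func newTestHTTPClient() *http.Client {
	tr := &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // test clusters use self-signed certs
	}
	return &http.Client{
		Transport: tr,
		Timeout:   30 * time.Second, // covers dial, TLS handshake, and reading the body
	}
}

func main() {
	client := newTestHTTPClient()
	fmt.Println("client timeout:", client.Timeout)
}
```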
integration-tests/backend/flowcollector.go-186-191 (1)

186-191: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix field filtering logic to avoid including empty slices in parameters.

Line 187's comparison of every field to `""` doesn't panic, but it incorrectly treats empty/nil slices as non-empty. This causes commands to include parameters like `EBPFeatures=[]`, which shouldn't be added. Use `field.IsZero()` instead to properly skip zero-valued fields.

Proposed fix
  for i := 0; i < flowCollector.NumField(); i++ {
-     if flowCollector.Field(i).Interface() != "" {
-         if flowCollector.Type().Field(i).Name != "Template" {
-             parameters = append(parameters, fmt.Sprintf("%s=%s", flowCollector.Type().Field(i).Name, flowCollector.Field(i).Interface()))
-         }
-     }
+     field := flowCollector.Field(i)
+     name := flowCollector.Type().Field(i).Name
+     if name == "Template" || field.IsZero() {
+         continue
+     }
+     parameters = append(parameters, fmt.Sprintf("%s=%v", name, field.Interface()))
  }

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/flowcollector.go` around lines 186-191, the loop building parameters incorrectly checks fields against the empty string, which treats empty or nil slices as non-empty; update the loop that iterates over flowCollector (using flowCollector.NumField(), flowCollector.Field(i) and flowCollector.Type().Field(i).Name) to skip zero-valued fields by using flowCollector.Field(i).IsZero() and continue when true, and still ensure you exclude the "Template" field by name before appending to the parameters slice so empty slices (e.g., EBPFeatures) are not added.
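A standalone sketch of why `IsZero()` is the right predicate here; the `config` struct below is invented for illustration, with only `EBPFeatures` mirroring a field name from the finding.

```go
package main

import (
	"fmt"
	"reflect"
)

// config is a stand-in for the flow collector parameters struct.
type config struct {
	Name        string
	EBPFeatures []string
}

func main() {
	v := reflect.ValueOf(config{Name: "lokistack"})
	for i := 0; i < v.NumField(); i++ {
		f := v.Field(i)
		// A nil []string boxed in an interface{} never equals "", so the
		// old `!= ""` check treats the empty slice as a value to emit;
		// IsZero reports it as zero and lets the loop skip it.
		fmt.Printf("%s: not-empty-string=%v IsZero=%v\n",
			v.Type().Field(i).Name, f.Interface() != "", f.IsZero())
	}
	// Prints:
	// Name: not-empty-string=true IsZero=false
	// EBPFeatures: not-empty-string=true IsZero=true
}
```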
integration-tests/backend/test_exporters.go-240-247 (1)

240-247: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Simplify grep calls by passing arguments directly instead of shell-wrapping.

The `bash -c` approach is unnecessary here since `textToExist`, `textToNotExist`, and `podLogs` are all internal, non-user-controlled values. Direct argument passing is cleaner and follows Go best practices.

Proposed fix
-grepCmd := fmt.Sprintf("grep %s %s", textToExist, podLogs)
-textToExistLogs, err := exec.Command("bash", "-c", grepCmd).Output()
+textToExistLogs, err := exec.Command("grep", textToExist, podLogs).Output()
 ...
-grepCmd = fmt.Sprintf("grep %s %s || true", textToNotExist, podLogs)
-textToNotExistLogs, err := exec.Command("bash", "-c", grepCmd).Output()
-o.Expect(err).ToNot(o.HaveOccurred())
+textToNotExistLogs, err := exec.Command("grep", textToNotExist, podLogs).Output()
+if err != nil {
+    // grep exits 1 when no matches; that's expected for the negative case
+    if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 1 {
+        textToNotExistLogs = []byte{}
+    } else {
+        o.Expect(err).ToNot(o.HaveOccurred())
+    }
+}

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integration-tests/backend/test_exporters.go` around lines 240-247, replace the shell-wrapped grep invocations with direct exec.Command calls: instead of exec.Command("bash","-c", grepCmd) build the command with exec.Command("grep", textToExist, podLogs) and call Output() to get textToExistLogs; for the second case (previously "grep ... || true") use exec.Command("grep", textToNotExist, podLogs) and capture output (Output() or CombinedOutput()) but do not treat a non-zero exit error as fatal: assign textToNotExistLogs from the command output even if err != nil so the subsequent length check still works; update references to grepCmd only if still used or remove it. Ensure you use the existing symbols textToExistLogs, textToNotExistLogs and the exec.Command calls in the same function.
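A self-contained sketch of the exit-code handling the proposed fix relies on; the helper name and paths are hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// grepFile runs grep directly (no shell wrapping) and treats exit code 1,
// grep's "no matches" status, as an empty result rather than a failure.
func grepFile(pattern, path string) ([]byte, error) {
	out, err := exec.Command("grep", pattern, path).Output()
	if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 1 {
		return nil, nil // no matches: fine for a "text must not exist" check
	}
	return out, err // real failures (missing file, exit code 2) still surface
}

func main() {
	out, err := grepFile("flowlogs-pipeline", "/var/log/example.log")
	fmt.Printf("matches=%q err=%v\n", out, err)
}
```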
Description
This PR is intended to show what the e2e backend test migration could look like, and to provide a place where we can discuss the details.
Changes compared to the openshift-tests-private repo:
- `ginkgo` is used directly, without a wrapper
- `BeforeSuite` is possible, but we need to be careful about not using `compat_otp.NewCLI` and similar utilities which have `g.BeforeEach` calls (see the sketch after this list)
- Testcases were migrated; some changes were made because the golangci linter complained. Minor fixes were applied, but some complaints, such as passing heavy structs without using `*`, I don't really see as a problem -> I have adjusted the Makefile to omit the integration-tests from linting/testing.
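To make the `BeforeSuite` point concrete, a minimal sketch of a suite entry point with one-time setup; the package name and setup steps are assumptions, not code from this PR.

```go
package backend_test

import (
	"testing"

	g "github.com/onsi/ginkgo/v2"
	o "github.com/onsi/gomega"
)

// One-time suite setup replaces the per-test g.BeforeEach work that
// wrappers like compat_otp.NewCLI perform implicitly; whatever those
// helpers did would be done here explicitly, once.
var _ = g.BeforeSuite(func() {
	// e.g. build clients, resolve kubeconfig, create shared namespaces
})

func TestBackend(t *testing.T) {
	o.RegisterFailHandler(g.Fail)
	g.RunSpecs(t, "Backend Suite")
}
```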
This PR can be tested with a flexy cluster via the Network Observability Backend Tests.
Review tips
Important files to definitely check:
To check changes in the testcases, check out the latest `openshift-tests-private` on the main branch and use:

git diff --no-index ~/Repos/openshift-tests-private/test/extended/netobserv/test_flowcollector.go test_flowcollector.go

to check a single file diff, or the same `git diff --no-index` invocation pointed at the two directories to check whole dirs.
How to run locally
Either:
or if the ginkgo cli is installed:
Other standard ginkgo flags such as `--dry-run` or `-v` also work.

Example output
The following is an example output of a run:
This implementation cannot control the `[1776103444] Backend Suite - 3/7488 specs` line. I don't think it will be an issue in any way, as the rest of the report makes it really clear what is actually being run. For running it in prow it should also not be a problem if we preserve the way it works with the `openshift-tests-private` repo and use the junit report.

More examples:
How it could be used in CI.
Currently the `openshift-tests-private` implementation uses junit to handle the result (see lines 398-459: the junit result is generated and parsed, and later used by the openshift-e2e-test-qe-report step to fail the prow job if necessary). For the nice output in prow, it seems the `handle_result` function in the `openshift-extended-test` step is used, as it appears to format and rename the junit file with the help of the `handleresult.py` python script.
handle_resultin theopenshift-extended-teststep is used as it seems to be formatting and renaming the junit file with the help ofhandleresult.pypython script.Though this is probably not directly reusable for implementation as the junit in
openshift-test-privatedoes not use default ginkgo implementation, we could use the same idea and use the junit output with minor transformation in prow to report result in a nice way.Alternative approaches
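For reference, a minimal sketch of emitting a junit file from the default ginkgo v2 runner via its `reporters` package (the report file name is arbitrary); the ginkgo CLI's `--junit-report` flag achieves the same thing.

```go
package backend_test

import (
	g "github.com/onsi/ginkgo/v2"
	"github.com/onsi/ginkgo/v2/reporters"
)

// Runs once after the whole suite and writes junit XML that a prow step
// could pick up, transform, and use to fail the job if needed.
var _ = g.ReportAfterSuite("junit", func(report g.Report) {
	if err := reporters.GenerateJUnitReport(report, "junit_backend.xml"); err != nil {
		g.GinkgoWriter.Printf("failed to write junit report: %v\n", err)
	}
})
```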
Alternative approaches

We could create something a bit more custom using lower-level ginkgo functionality, as in commit fbb6fc36b17bc37b6eb847b8e51f42d26a9b29d8, to remove things like the `[1776103444] Backend Suite - 3/7488 specs` line. However, I don't think it would be worth the tradeoff of not using the default ginkgo runner via `RunSpecs()`, as we would lose the ability to run tests in parallel and some other features.

Summary by CodeRabbit
Chores
Tests