WINC-1778: [ote] Migrate logging & monitoring tests to OTE (batch 3/6)#3921
WINC-1778: [ote] Migrate logging & monitoring tests to OTE (batch 3/6)#3921weinliu wants to merge 1 commit intoopenshift:masterfrom
Conversation
|
@weinliu: This pull request references WINC-1778 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis pull request extends the WMCO (Windows Machine Config Operator) test suite with additional observability and validation checks. A new batch of five test specifications was registered in the extension test harness, each invoking a corresponding check function. A new 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: weinliu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@weinliu: This pull request references WINC-1778 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
ote/test/extended/logging_monitoring.go (1)
23-35: Avoid re-fetching token and route on every metric query
queryThanosrunsoc create tokenandoc get routeeach call, which scales poorly across pods/metrics. Resolve these once per check and reuse.As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ote/test/extended/logging_monitoring.go` around lines 23 - 35, queryThanos currently calls oc create token and oc get route on every invocation which is inefficient; refactor so the token and thanosHost are retrieved once and reused: add a helper like getThanosCredentials (or use a sync.Once-backed package-level cache) that calls oc.AsAdmin().Output for the token and route a single time, then change queryThanos to accept token and thanosHost as parameters (or read the cached values) and update its callers to supply them; ensure error handling remains and that the cache refresh strategy (if any) is documented where you add getThanosCredentials/queryThanos changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ote/test/extended/logging_monitoring.go`:
- Around line 60-68: thanosHasResult currently only unmarshals into Data.Result
and treats any non-result as "no metric"; update it to also unmarshal/inspect
top-level fields (e.g., status and error) from the Thanos response and return an
error when status == "error" or an error field is present. Specifically, extend
the result struct (or unmarshal into a wrapper) to include Status string and
Error (or ErrorType/ ErrorObject) fields, check those after json.Unmarshal of
response in thanosHasResult, and when an error-status is detected return a
formatted error containing the Thanos error details instead of returning false.
- Line 234: The PromQL selector passed to queryThanos is only filtering by pod
(query+`{pod="`+pod+`"}`), which can match same-named pods in other namespaces;
update the selector to include namespace by changing the call(s) to include
namespace=" + wmcoNamespace (e.g.
query+`{pod="`+pod+`",namespace="`+wmcoNamespace+`"}`) so the query is scoped to
the WMCO namespace; update both occurrences where queryThanos is invoked with
the pod-only selector (including the call around the symbol queryThanos at the
earlier occurrence and the similar call at the later occurrence).
- Around line 46-55: The current client.Do(req) response handling treats non-200
HTTP responses as normal payloads; after performing client.Do(req) and before
returning the body in the code around resp, err := client.Do(req) / defer
resp.Body.Close(), add a status check: if resp.StatusCode is not 200 (or not in
the expected 2xx range) read a short snippet of resp.Body (e.g., first ~200
bytes), and return a formatted error that includes resp.Status and the trimmed
snippet to fail fast on auth/outage errors; keep the existing error wraps for
io.ReadAll and ensure resp.Body is still closed (defer) after the nil err check.
- Around line 153-154: The test currently uses a fixed destDir (destDir :=
"/tmp/must-gather-33783") which can collide across runs; replace this by
creating a unique temp directory via os.MkdirTemp (e.g., call os.MkdirTemp("",
"must-gather-") and assign to destDir) and keep the deferred cleanup via defer
os.RemoveAll(destDir) to remove the created directory; update references that
use destDir accordingly and handle the MkdirTemp error (fail the test or
t.Fatal) before deferring RemoveAll.
- Around line 41-46: The HTTP client currently disables TLS verification, has no
timeout, and doesn't validate HTTP status codes; update the client creation to
remove InsecureSkipVerify (use default TLS config or the cluster CA), add a
sensible timeout on the http.Client (e.g., Client{Timeout: ...}), and after
client.Do(req) check resp != nil then validate resp.StatusCode is 2xx—on non-2xx
return a clear error (including status and body) and ensure resp.Body is closed
before returning; reference the Transport/TLSClientConfig/InsecureSkipVerify,
http.Client{Timeout: ...}, and the client.Do(req)/resp handling code paths to
locate and fix the logic.
---
Nitpick comments:
In `@ote/test/extended/logging_monitoring.go`:
- Around line 23-35: queryThanos currently calls oc create token and oc get
route on every invocation which is inefficient; refactor so the token and
thanosHost are retrieved once and reused: add a helper like getThanosCredentials
(or use a sync.Once-backed package-level cache) that calls oc.AsAdmin().Output
for the token and route a single time, then change queryThanos to accept token
and thanosHost as parameters (or read the cached values) and update its callers
to supply them; ensure error handling remains and that the cache refresh
strategy (if any) is documented where you add getThanosCredentials/queryThanos
changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 04d2ffb6-21e0-43ba-8be2-0b21061beaba
📒 Files selected for processing (2)
ote/cmd/wmco-tests-ext/main.goote/test/extended/logging_monitoring.go
| client := &http.Client{ | ||
| Transport: &http.Transport{ | ||
| TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, //nolint:gosec | ||
| }, | ||
| } | ||
| resp, err := client.Do(req) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the full context of the queryThanos function and surrounding code
cat -n ote/test/extended/logging_monitoring.go | head -80Repository: openshift/windows-machine-config-operator
Length of output: 3080
🏁 Script executed:
# Check if there are any environment-specific comments or configurations explaining the insecure client
rg -A 5 -B 5 "InsecureSkipVerify" ote/test/extended/logging_monitoring.goRepository: openshift/windows-machine-config-operator
Length of output: 442
🏁 Script executed:
# Check if CA certificates or kubelet configuration is available in the test environment
rg -i "ca.*cert|kubelet|tls.*config" ote/test/extended/ | head -20Repository: openshift/windows-machine-config-operator
Length of output: 940
🏁 Script executed:
# Look for any timeout or context cancellation mechanisms in queryThanos or its callers
rg -A 10 "func queryThanos" ote/test/extended/logging_monitoring.goRepository: openshift/windows-machine-config-operator
Length of output: 555
🏁 Script executed:
# Check if this test runs against real/self-signed certs or mocked endpoints
rg -i "thanos|prometheus" ote/test/extended/logging_monitoring.go | grep -E "(url|endpoint|host|cert)" | head -10Repository: openshift/windows-machine-config-operator
Length of output: 179
🏁 Script executed:
# Check if CA certificates are typically available in OpenShift clusters
# Look for any documentation or other test patterns handling TLS
rg -i "kubernetes.io.*ca|serviceaccount.*ca" ote/test/ | head -10Repository: openshift/windows-machine-config-operator
Length of output: 67
🏁 Script executed:
# Check if there are other HTTP clients in the test code with proper TLS handling
rg -B 3 -A 3 "http.Client\{" ote/test/Repository: openshift/windows-machine-config-operator
Length of output: 617
🏁 Script executed:
# Check if there are timeout patterns used elsewhere in test code
rg "time.Second|Timeout:" ote/test/ | head -20Repository: openshift/windows-machine-config-operator
Length of output: 67
🏁 Script executed:
# Look at the actual Thanos API usage - does the code check status codes?
cat -n ote/test/extended/logging_monitoring.go | sed -n '46,56p'Repository: openshift/windows-machine-config-operator
Length of output: 442
🏁 Script executed:
# Check if there's context passed to queryThanos that could provide timeout info
rg "queryThanos\(" ote/test/extended/logging_monitoring.go | head -5Repository: openshift/windows-machine-config-operator
Length of output: 260
Secure Thanos client with proper TLS verification, timeout, and error handling
The HTTP client is missing three critical safeguards:
- Line 43:
InsecureSkipVerify: truedisables certificate validation, exposing the bearer token to MITM attacks. OpenShift clusters provide proper CA certificates in-cluster; use them. - Lines 41–45: No timeout configured. The client will hang indefinitely if Thanos is slow or unresponsive, causing test flakiness.
- Line 46–55: No HTTP status code validation. Errors like 401 (auth failures) or 500 (server errors) are silently treated as valid responses and passed to the JSON parser.
Suggested fix
import (
"context"
"crypto/tls"
"encoding/json"
"fmt"
"io"
"net/http"
"net/url"
"os"
"strconv"
"strings"
+ "time"
@@
client := &http.Client{
+ Timeout: 30 * time.Second,
Transport: &http.Transport{
- TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, //nolint:gosec
+ TLSClientConfig: &tls.Config{},
},
}
resp, err := client.Do(req)
if err != nil {
return "", fmt.Errorf("thanos query failed: %w", err)
}
+ if resp.StatusCode != http.StatusOK {
+ return "", fmt.Errorf("thanos returned %d: %s", resp.StatusCode, http.StatusText(resp.StatusCode))
+ }
defer resp.Body.Close()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| client := &http.Client{ | |
| Transport: &http.Transport{ | |
| TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, //nolint:gosec | |
| }, | |
| } | |
| resp, err := client.Do(req) | |
| import ( | |
| "context" | |
| "crypto/tls" | |
| "encoding/json" | |
| "fmt" | |
| "io" | |
| "net/http" | |
| "net/url" | |
| "os" | |
| "strconv" | |
| "strings" | |
| "time" | |
| ) | |
| client := &http.Client{ | |
| Timeout: 30 * time.Second, | |
| Transport: &http.Transport{ | |
| TLSClientConfig: &tls.Config{}, | |
| }, | |
| } | |
| resp, err := client.Do(req) | |
| if err != nil { | |
| return "", fmt.Errorf("thanos query failed: %w", err) | |
| } | |
| if resp.StatusCode != http.StatusOK { | |
| return "", fmt.Errorf("thanos returned %d: %s", resp.StatusCode, http.StatusText(resp.StatusCode)) | |
| } | |
| defer resp.Body.Close() |
🧰 Tools
🪛 ast-grep (0.42.0)
[warning] 42-42: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{InsecureSkipVerify: true}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures
(missing-ssl-minversion-go)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ote/test/extended/logging_monitoring.go` around lines 41 - 46, The HTTP
client currently disables TLS verification, has no timeout, and doesn't validate
HTTP status codes; update the client creation to remove InsecureSkipVerify (use
default TLS config or the cluster CA), add a sensible timeout on the http.Client
(e.g., Client{Timeout: ...}), and after client.Do(req) check resp != nil then
validate resp.StatusCode is 2xx—on non-2xx return a clear error (including
status and body) and ensure resp.Body is closed before returning; reference the
Transport/TLSClientConfig/InsecureSkipVerify, http.Client{Timeout: ...}, and the
client.Do(req)/resp handling code paths to locate and fix the logic.
| resp, err := client.Do(req) | ||
| if err != nil { | ||
| return "", fmt.Errorf("thanos query failed: %w", err) | ||
| } | ||
| defer resp.Body.Close() | ||
| body, err := io.ReadAll(resp.Body) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to read thanos response: %w", err) | ||
| } | ||
| return string(body), nil |
There was a problem hiding this comment.
Validate HTTP status codes from Thanos
On Line 46, non-200 responses are currently treated like normal payloads, which can misclassify auth/outage failures as “no metric”. Fail fast on unexpected status with a short body snippet.
Suggested fix
resp, err := client.Do(req)
if err != nil {
return "", fmt.Errorf("thanos query failed: %w", err)
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return "", fmt.Errorf("failed to read thanos response: %w", err)
}
+ if resp.StatusCode != http.StatusOK {
+ snippet := string(body)
+ if len(snippet) > 200 {
+ snippet = snippet[:200]
+ }
+ return "", fmt.Errorf("thanos query returned HTTP %d: %s", resp.StatusCode, snippet)
+ }
return string(body), nil📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| resp, err := client.Do(req) | |
| if err != nil { | |
| return "", fmt.Errorf("thanos query failed: %w", err) | |
| } | |
| defer resp.Body.Close() | |
| body, err := io.ReadAll(resp.Body) | |
| if err != nil { | |
| return "", fmt.Errorf("failed to read thanos response: %w", err) | |
| } | |
| return string(body), nil | |
| resp, err := client.Do(req) | |
| if err != nil { | |
| return "", fmt.Errorf("thanos query failed: %w", err) | |
| } | |
| defer resp.Body.Close() | |
| body, err := io.ReadAll(resp.Body) | |
| if err != nil { | |
| return "", fmt.Errorf("failed to read thanos response: %w", err) | |
| } | |
| if resp.StatusCode != http.StatusOK { | |
| snippet := string(body) | |
| if len(snippet) > 200 { | |
| snippet = snippet[:200] | |
| } | |
| return "", fmt.Errorf("thanos query returned HTTP %d: %s", resp.StatusCode, snippet) | |
| } | |
| return string(body), nil |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ote/test/extended/logging_monitoring.go` around lines 46 - 55, The current
client.Do(req) response handling treats non-200 HTTP responses as normal
payloads; after performing client.Do(req) and before returning the body in the
code around resp, err := client.Do(req) / defer resp.Body.Close(), add a status
check: if resp.StatusCode is not 200 (or not in the expected 2xx range) read a
short snippet of resp.Body (e.g., first ~200 bytes), and return a formatted
error that includes resp.Status and the trimmed snippet to fail fast on
auth/outage errors; keep the existing error wraps for io.ReadAll and ensure
resp.Body is still closed (defer) after the nil err check.
| var result struct { | ||
| Data struct { | ||
| Result []interface{} `json:"result"` | ||
| } `json:"data"` | ||
| } | ||
| if err := json.Unmarshal([]byte(response), &result); err != nil { | ||
| return false, fmt.Errorf("failed to parse thanos response: %w", err) | ||
| } | ||
| return len(result.Data.Result) > 0, nil |
There was a problem hiding this comment.
Handle Thanos API error payloads explicitly
thanosHasResult only checks data.result. If Thanos returns {"status":"error",...}, this is reported as missing metric instead of query failure. Parse and validate top-level status/error fields.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ote/test/extended/logging_monitoring.go` around lines 60 - 68,
thanosHasResult currently only unmarshals into Data.Result and treats any
non-result as "no metric"; update it to also unmarshal/inspect top-level fields
(e.g., status and error) from the Thanos response and return an error when
status == "error" or an error field is present. Specifically, extend the result
struct (or unmarshal into a wrapper) to include Status string and Error (or
ErrorType/ ErrorObject) fields, check those after json.Unmarshal of response in
thanosHasResult, and when an error-status is detected return a formatted error
containing the Thanos error details instead of returning false.
| destDir := "/tmp/must-gather-33783" | ||
| defer os.RemoveAll(destDir) |
There was a problem hiding this comment.
Use a unique must-gather destination directory
On Line 153, fixed path /tmp/must-gather-33783 can collide across concurrent runs and stale leftovers. Use os.MkdirTemp and defer cleanup of the created dir.
Suggested fix
- destDir := "/tmp/must-gather-33783"
- defer os.RemoveAll(destDir)
+ destDir, err := os.MkdirTemp("", "must-gather-33783-*")
+ if err != nil {
+ return fmt.Errorf("failed to create temp must-gather directory: %w", err)
+ }
+ defer os.RemoveAll(destDir)
- msg, err := oc.AsAdmin().Output("adm", "must-gather", "--dest-dir="+destDir)
+ msg, err := oc.AsAdmin().Output("adm", "must-gather", "--dest-dir="+destDir)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| destDir := "/tmp/must-gather-33783" | |
| defer os.RemoveAll(destDir) | |
| destDir, err := os.MkdirTemp("", "must-gather-33783-*") | |
| if err != nil { | |
| return fmt.Errorf("failed to create temp must-gather directory: %w", err) | |
| } | |
| defer os.RemoveAll(destDir) | |
| msg, err := oc.AsAdmin().Output("adm", "must-gather", "--dest-dir="+destDir) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ote/test/extended/logging_monitoring.go` around lines 153 - 154, The test
currently uses a fixed destDir (destDir := "/tmp/must-gather-33783") which can
collide across runs; replace this by creating a unique temp directory via
os.MkdirTemp (e.g., call os.MkdirTemp("", "must-gather-") and assign to destDir)
and keep the deferred cleanup via defer os.RemoveAll(destDir) to remove the
created directory; update references that use destDir accordingly and handle the
MkdirTemp error (fail the test or t.Fatal) before deferring RemoveAll.
| storage, err := oc.AsAdmin().Output("get", "node", host, | ||
| "-o=jsonpath={.status.allocatable['ephemeral-storage']}") | ||
| if err != nil { | ||
| return fmt.Errorf("failed to get ephemeral storage for node %s: %w", host, err) | ||
| } | ||
| val, err := strconv.ParseInt(strings.TrimSuffix(storage, "Ki"), 10, 64) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to parse storage value %q for node %s: %w", storage, host, err) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's locate and examine the file
fd -t f "logging_monitoring.go" | head -5Repository: openshift/windows-machine-config-operator
Length of output: 124
🏁 Script executed:
# Check the file size before reading
wc -l ote/test/extended/logging_monitoring.goRepository: openshift/windows-machine-config-operator
Length of output: 128
🏁 Script executed:
# Read the specific lines in question
sed -n '190,210p' ote/test/extended/logging_monitoring.goRepository: openshift/windows-machine-config-operator
Length of output: 735
🏁 Script executed:
# Check current imports in the file
head -30 ote/test/extended/logging_monitoring.goRepository: openshift/windows-machine-config-operator
Length of output: 918
🏁 Script executed:
# Search for other uses of resource.ParseQuantity in the codebase to see if this pattern is used elsewhere
rg "resource\.ParseQuantity" --type goRepository: openshift/windows-machine-config-operator
Length of output: 332
🏁 Script executed:
# Check if k8s.io/apimachinery is already a dependency in the project
fd -t f "go.mod" | xargs head -50Repository: openshift/windows-machine-config-operator
Length of output: 2834
🏁 Script executed:
# Check what dependencies ote tests typically need
grep -r "import" ote/test/extended/*.go | head -20Repository: openshift/windows-machine-config-operator
Length of output: 171
🏁 Script executed:
# Look for any other quantity/resource parsing in ote module
rg -t go "ParseQuantity|ParseInt|resource\." ote/Repository: openshift/windows-machine-config-operator
Length of output: 198
🏁 Script executed:
# Check if ote module would be able to add k8s.io/apimachinery as a dependency by checking its usage elsewhere
grep -r "k8s.io" ote/ --type goRepository: openshift/windows-machine-config-operator
Length of output: 202
🏁 Script executed:
# Verify the exact Kubernetes quantity format that gets returned from the jsonpath query
# by looking at how ephemeral-storage is documented in similar code
rg -t go "ephemeral-storage" -B 3 -A 5Repository: openshift/windows-machine-config-operator
Length of output: 8946
Use resource.ParseQuantity to handle all Kubernetes quantity formats, not just Ki
The current code breaks for quantities returned in formats other than Ki (e.g., Gi, G, Mi, M). The jsonpath query for ephemeral-storage returns a Kubernetes quantity, which can be expressed with binary (Ki, Mi, Gi, Ti, Pi, Ei) or decimal (K, M, G, T, P, E) suffixes. Replace the fragile strings.TrimSuffix(storage, "Ki") approach with resource.ParseQuantity().
This requires adding k8s.io/apimachinery to ote/go.mod (the root module already has it at v0.35.2, and the pattern is already used in pkg/services/services.go). The fix also updates the sign check from numeric to use Quantity.Sign():
Suggested fix
import (
@@
"strconv"
"strings"
"github.com/openshift/windows-machine-config-operator/ote/test/extended/cli"
+ "k8s.io/apimachinery/pkg/api/resource"
)
@@
- val, err := strconv.ParseInt(strings.TrimSuffix(storage, "Ki"), 10, 64)
+ q, err := resource.ParseQuantity(storage)
if err != nil {
return fmt.Errorf("failed to parse storage value %q for node %s: %w", storage, host, err)
}
- if val <= 0 {
- return fmt.Errorf("expected positive storage value for node %s, got %d", host, val)
+ if q.Sign() <= 0 {
+ return fmt.Errorf("expected positive storage value for node %s, got %q", host, storage)
}|
|
||
| for _, pod := range pods { | ||
| for metricName, query := range metrics { | ||
| resp, err := queryThanos(oc, query+`{pod="`+pod+`"}`) |
There was a problem hiding this comment.
Scope PromQL to WMCO namespace to avoid false positives
Both queries filter only by pod. A same-named pod in another namespace can satisfy the check. Add namespace="<wmcoNamespace>" to the selector.
Suggested fix
- resp, err := queryThanos(oc, query+`{pod="`+pod+`"}`)
+ resp, err := queryThanos(oc, query+`{pod="`+pod+`",namespace="`+wmcoNamespace+`"}`)
@@
- resp, err := queryThanos(oc, query+`{pod="`+pod+`"}`)
+ resp, err := queryThanos(oc, query+`{pod="`+pod+`",namespace="`+wmcoNamespace+`"}`)Also applies to: 272-272
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ote/test/extended/logging_monitoring.go` at line 234, The PromQL selector
passed to queryThanos is only filtering by pod (query+`{pod="`+pod+`"}`), which
can match same-named pods in other namespaces; update the selector to include
namespace by changing the call(s) to include namespace=" + wmcoNamespace (e.g.
query+`{pod="`+pod+`",namespace="`+wmcoNamespace+`"}`) so the query is scoped to
the WMCO namespace; update both occurrences where queryThanos is invoked with
the pod-only selector (including the call around the symbol queryThanos at the
earlier occurrence and the similar call at the later occurrence).
4956075 to
a7e9d6b
Compare
|
@weinliu: This pull request references WINC-1778 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest security |
|
@weinliu: The Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@weinliu: This pull request references WINC-1778 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Migrate the following logging & monitoring smokerun tests from openshift-tests-private to the OTE extension binary: - OCP-33779: Retrieving Windows node logs (oc adm node-logs) - OCP-33783: Enable must gather on Windows node [Slow][Disruptive] - OCP-73595: Verify Display of Filesystem Graphs for Windows Nodes [Serial] - OCP-70922: Monitor CPU, Memory, and Filesystem graphs for Windows Pods - OCP-73752: Monitor Network In, and Network Out graphs for Windows Pods OCP-70922 and OCP-73752 query Thanos directly via the thanos-querier route using a prometheus-k8s service account token, avoiding the compat_otp PrometheusMonitor dependency.
a7e9d6b to
23749c5
Compare
|
@weinliu: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Closing, please open a separate request if this work is still required. |
Summary
Migrates the Logging & Monitoring batch of WINC tests as part of WINC-1778.
Batch 3 — Logging & Monitoring (3 tests):
oc adm node-logs)[Slow][Disruptive][Serial]Planned migration batches (WINC-1778):
Test Results (OCP 4.21 nightly, IPI on AWS)
Full test output
OCP-33779
[{"name": "[sig-windows] Windows_Containers Author:sgao-Smokerun-Critical-33779-[wmco] Retrieving Windows node logs [OTP]", "duration": 2631, "startTime": "2026-04-02 22:57:23.184498 UTC", "endTime": "2026-04-02 22:57:25.816161 UTC", "result": "passed", "output": ""}]OCP-33783
[{"name": "[sig-windows] Windows_Containers Author:sgao-Smokerun-Critical-33783-[wmco] Enable must gather on Windows node [Slow][Disruptive] [OTP]", "duration": 366717, "startTime": "2026-04-02 22:59:45.240881 UTC", "endTime": "2026-04-02 23:05:51.958555 UTC", "result": "passed", "output": ""}]OCP-73595
[{"name": "[sig-windows] Windows_Containers Author:weinliu-Smokerun-Medium-73595-[wmco] Verify Display of Filesystem Graphs (metrics) for Windows Nodes [Serial] [OTP]", "duration": 424, "startTime": "2026-04-02 22:57:31.315903 UTC", "endTime": "2026-04-02 22:57:31.740254 UTC", "result": "passed", "output": ""}]Cluster: winc-regression 6211 (OCP 4.21.0-0.nightly-2026-04-02, IPI on AWS)