OCPEDGE-2711 : register pacemaker alert agents for fencing events to taint/untaint nodes#1625
OCPEDGE-2711 : register pacemaker alert agents for fencing events to taint/untaint nodes#1625vimauro wants to merge 5 commits into
Conversation
|
Skipping CI for Draft Pull Request. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds ConfigureAlerts to register two pacemaker alert agents for fencing; includes helpers to check script executability, detect/create/update alerts via pcs and cibadmin, and inject optional CIB filters. TNF setup and update-setup call ConfigureAlerts after fencing.
Changes
Pacemaker Alert Agent Configuration
Layer / File(s)
Summary
Alert configuration contract and test pkg/tnf/pkg/pcs/alerts.go, pkg/tnf/pkg/pcs/alerts_test.go
Defines alertConfig and the static alertConfigs list containing two alert IDs, expected script filesystem paths, and optional <select> XML fragments. Adds TestAlertConfigs asserting exactly two entries and validating their fields.
ConfigureAlerts entrypoint and runner integration pkg/tnf/pkg/pcs/alerts.go, pkg/tnf/setup/runner.go, pkg/tnf/update-setup/runner.go
Adds ConfigureAlerts(ctx) which logs start/success and iterates configured alerts, returning on first error. RunTnfSetup and RunTnfUpdateSetup now call pcs.ConfigureAlerts(ctx) immediately after fencing and abort on error.
Alert creation and helpers pkg/tnf/pkg/pcs/alerts.go
configureAlert verifies the alert script is executable on the host, checks for existing alerts via pcs alert config --output-format json (unmarshals JSON), updates existing alerts via cibadmin --modify --xml-text or creates alerts with a fixed timeout and optionally applies a CIB <select> filter. Helper functions applyAlertSelectFilter, alertExists, pcsAlertConfigOutput, and fileExistsOnHost were added.
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
[!IMPORTANT]
Pre-merge checks failed
Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (2 errors, 1 warning)
Check name
Status
Explanation
Resolution
Container-Privileges
❌ Error
PR adds Kubernetes manifests with privileged: true, allowPrivilegeEscalation: true, and hostPID: true without justification in job.yaml manifest.
Add justification comments to bindata/tnfdeployment/job.yaml explaining why these privileged settings are required, matching the pattern in cronjob.yaml.
No-Sensitive-Data-In-Logs
❌ Error
Code logs command stdout/stderr (lines 75, 105, 120, 137 in alerts.go) which could expose sensitive operational data, command output, or system details in error messages.
Remove or sanitize logging of stdout/stderr from command executions; log only error codes or sanitized error messages without raw command output.
Docstring Coverage
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/tnf/pkg/pcs/alerts.go`:
- Around line 84-89: The fileExistsOnHost function currently treats any
exec.Execute error as "file missing"; instead, update fileExistsOnHost to
distinguish a normal non-zero exit (command ran but file absent) from real
execution failures: call exec.Execute(ctx, cmd) and if err is nil return true,
nil; if err represents a process exit/non-zero status treat that as file not
present and return false, nil; otherwise return false, err to propagate real
runtime/exec errors. Ensure you reference the fileExistsOnHost function and the
exec.Execute call when making this change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 07b04fa5-83d2-4758-9ba1-dbe9f780a6a6
📒 Files selected for processing (4)
pkg/tnf/pkg/pcs/alerts.gopkg/tnf/pkg/pcs/alerts_test.gopkg/tnf/setup/runner.gopkg/tnf/update-setup/runner.go
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/tnf/pkg/pcs/alerts.go (1)
117-122:⚠️ Potential issue | 🟠 Major | ⚡ Quick winOnly treat
test -xexit code1as "not present".Lines 117-120 still map every
*os/exec.ExitErrortofalse, nil. Fortest -x, exit code1means "missing/not executable", but exit code2means a real command error. Collapsing both can still hide broken command construction and turn an actionable failure into a misleading retry.💡 Minimal fix
if err != nil { var exitErr *osexec.ExitError if errors.As(err, &exitErr) { - return false, nil + if exitErr.ExitCode() == 1 { + return false, nil + } } return false, fmt.Errorf("failed to check file %s on host: %w", path, err) }#!/bin/bash set -euo pipefail printf '\n== exec.Execute implementation ==\n' rg -n -C4 --type go '^func Execute\(' printf '\n== current ExitError handling ==\n' rg -n -C3 --type go 'ExitError|ExitCode\(' pkg/tnf/pkgAs per coding guidelines,
**/*.go: Go security (prodsec-skills):Never ignore error returns.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tnf/pkg/pcs/alerts.go` around lines 117 - 122, The current errors.As block in the file-checking logic erroneously treats all *os/exec.ExitError values as success (return false, nil); change the handling in the function that checks a path (uses variables path, err and the exitErr *osexec.ExitError) to inspect the exit code via exitErr.ExitCode(): if ExitCode() == 1 return false, nil (file missing/not executable), but for any other exit code return false with the wrapped error (fmt.Errorf("failed to check file %s on host: %w", path, err)) so real command failures (e.g., exit 2) are surfaced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@pkg/tnf/pkg/pcs/alerts.go`:
- Around line 117-122: The current errors.As block in the file-checking logic
erroneously treats all *os/exec.ExitError values as success (return false, nil);
change the handling in the function that checks a path (uses variables path, err
and the exitErr *osexec.ExitError) to inspect the exit code via
exitErr.ExitCode(): if ExitCode() == 1 return false, nil (file missing/not
executable), but for any other exit code return false with the wrapped error
(fmt.Errorf("failed to check file %s on host: %w", path, err)) so real command
failures (e.g., exit 2) are surfaced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 982cc16f-9d35-4a3b-bd33-435735d38f94
📒 Files selected for processing (2)
pkg/tnf/pkg/pcs/alerts.gopkg/tnf/pkg/pcs/alerts_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/tnf/pkg/pcs/alerts_test.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/tnf/pkg/pcs/alerts.go (1)
110-118: ⚡ Quick winInclude command output in
alertExistsfailures.If
pcsemits unexpected text, the setup flows fail here but the returned error drops the stdout that explains what actually came back. Include both streams in the exec-failure and JSON-parse paths.💡 Proposed fix
stdOut, stdErr, err := exec.Execute(ctx, cmd) if err != nil { - return false, fmt.Errorf("pcs alert config failed: stderr=%s: %w", stdErr, err) + return false, fmt.Errorf("pcs alert config failed: stdout=%q stderr=%q: %w", stdOut, stdErr, err) } var config pcsAlertConfigOutput if err := json.Unmarshal([]byte(stdOut), &config); err != nil { - return false, fmt.Errorf("failed to parse pcs alert config JSON: %w", err) + return false, fmt.Errorf("failed to parse pcs alert config JSON: stdout=%q stderr=%q: %w", stdOut, stdErr, err) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tnf/pkg/pcs/alerts.go` around lines 110 - 118, In alertExists, include both stdOut and stdErr in the returned errors: when exec.Execute(ctx, cmd) returns err, wrap it with both stdout and stderr (use stdOut and stdErr) instead of only stderr; similarly, when json.Unmarshal into pcsAlertConfigOutput fails, return an error that contains the raw stdOut and stdErr so callers see what pcs actually emitted. Update the error messages around exec.Execute and json.Unmarshal to concatenate or format stdOut/stdErr alongside the original error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/tnf/pkg/pcs/alerts.go`:
- Around line 110-118: In alertExists, include both stdOut and stdErr in the
returned errors: when exec.Execute(ctx, cmd) returns err, wrap it with both
stdout and stderr (use stdOut and stdErr) instead of only stderr; similarly,
when json.Unmarshal into pcsAlertConfigOutput fails, return an error that
contains the raw stdOut and stdErr so callers see what pcs actually emitted.
Update the error messages around exec.Execute and json.Unmarshal to concatenate
or format stdOut/stdErr alongside the original error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d6df571c-91a5-4551-9020-ba890f61b515
📒 Files selected for processing (1)
pkg/tnf/pkg/pcs/alerts.go
|
/label tide/merge-method-squash |
…y for existing alerts
|
@vimauro: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary by CodeRabbit
New Features
Tests