
fix(setup.sh): guard SIGPIPE in site-agent pod lookup (spurious [7h] Code 141)#2676

Open
kirson-git wants to merge 2 commits into NVIDIA:main from kirson-git:fix/setup-sigpipe-head


@kirson-git


Problem

In the [7h] site-agent verification, the pod lookup is:

_POD="$(kubectl get pods -n nico-rest -l ... -o name 2>/dev/null | head -1)"

The script runs under set -euo pipefail (line 77). head -1 exits after printing the first line, so kubectl's next write to the closed pipe raises SIGPIPE and kubectl exits with status 141. pipefail propagates that status, the command substitution fails, and the phase aborts with SETUP FAILED ... Code: 141, even though site-agent deployed fine (StatefulSet ready, Site CR HandshakeComplete). Reproduced on a clean v0.10.3 install; the same code is present on main.
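The failure can be reproduced without a cluster. This is a minimal sketch that assumes `yes` is an acceptable stand-in for kubectl: it keeps writing after `head -1` has exited, which is exactly the condition that raises SIGPIPE. The pod name `pod/site-agent-0` is made up for illustration.

```shell
#!/usr/bin/env bash
# Minimal reproduction of the [7h] failure mode. `yes` stands in for kubectl
# (assumption: any producer that keeps writing after `head -1` exits behaves
# the same way). No cluster is needed.
set -euo pipefail

unguarded_rc=0
# Run the unguarded lookup in a subshell so its failure does not end this
# script; `|| unguarded_rc=$?` records the subshell's exit status and also
# keeps errexit from firing here. _POD is discarded along with the subshell.
( _POD="$(yes pod/site-agent-0 2>/dev/null | head -1)" ) || unguarded_rc=$?

# Normally 141 = 128 + 13: the producer was killed by SIGPIPE (signal 13 on Linux).
echo "unguarded lookup exit status: ${unguarded_rc}"
```

If the calling environment ignores SIGPIPE, the producer instead sees a write error and exits with its own non-zero status (GNU `yes` exits 1), which pipefail propagates just the same.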

Fix

Append || true so the SIGPIPE-induced 141 is tolerated.
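The guard itself, sketched in isolation. The diff is not reproduced on this page, so placing `|| true` inside the command substitution is an assumption; putting it after the assignment (`_POD="$(...)" || true`) also keeps errexit from firing. Either placement also swallows genuine kubectl failures (for example, an unreachable API server), leaving `_POD` empty. The second half shows a stricter alternative built on a hypothetical `first_line` helper that accepts status 141 from the producer but still propagates any other failure; `yes` and `false` stand in for kubectl.

```shell
#!/usr/bin/env bash
set -euo pipefail

# 1) The guard as described: `|| true` turns the pipeline's 141 into success,
#    and the first line that head already printed is still captured.
_POD="$(yes pod/site-agent-0 2>/dev/null | head -1 || true)"
echo "guarded: _POD=${_POD}"

# 2) Hypothetical stricter helper: prints the first line of "$@"'s output,
#    treats producer status 141 (SIGPIPE) as success, propagates anything else.
#    Assumes the caller runs with pipefail on, as setup.sh does.
first_line() {
  local -a status
  set +o pipefail               # pipeline status becomes head's, so errexit stays quiet
  "$@" 2>/dev/null | head -1
  status=("${PIPESTATUS[@]}")   # must be read immediately; the next command overwrites it
  set -o pipefail
  local producer_rc=${status[0]} head_rc=${status[1]}
  if (( head_rc != 0 )); then
    return "$head_rc"
  fi
  if (( producer_rc == 0 || producer_rc == 141 )); then
    return 0
  fi
  return "$producer_rc"
}

_POD2="$(first_line yes pod/site-agent-0)"
echo "strict: _POD2=${_POD2}"

# A producer that fails outright (here `false`, exit 1) is still reported.
strict_failure_rc=0
_POD3="$(first_line false)" || strict_failure_rc=$?
echo "strict, failing producer: exit status ${strict_failure_rc}"
```

The stricter form costs a helper function; in exchange, a real kubectl error still fails the phase instead of surfacing later as an empty `_POD`.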

🤖 Generated with Claude Code

…lure)

`_POD="$(kubectl get pods ... -o name 2>/dev/null | head -1)"` runs under
`set -euo pipefail`. `head -1` closes the pipe early, kubectl gets SIGPIPE and
exits 141; pipefail propagates it, so the [7h] site-agent phase aborts with
'SETUP FAILED ... Code: 141' even though site-agent deployed successfully
(StatefulSet ready, Site CR HandshakeComplete). Add '|| true' so the pipe
result is tolerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: kirson-git <ekirson@nvidia.com>
@kirson-git kirson-git requested review from a team and shayan1995 as code owners on June 17, 2026 14:51
@copy-pr-bot

copy-pr-bot Bot commented Jun 17, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c59e7f19-210a-4741-89a6-f562306d111b

📥 Commits

Reviewing files that changed from the base of the PR and between bd899c6 and ad52d81.

📒 Files selected for processing (1)
  • helm-prereqs/setup.sh

Summary by CodeRabbit

  • Bug Fixes
    • Improved setup script resilience by enhancing error handling during deployment initialization, preventing script failures when pod selection returns no results and ensuring more reliable setup processes.

Walkthrough

In helm-prereqs/setup.sh, the _POD assignment command within the NICo REST site-agent gRPC connection verification loop is amended by appending || true to the kubectl ... | head -1 pipeline, preventing a non-zero exit code from terminating the script under set -euo pipefail when no matching pods are found.

Changes

NICo REST Site-Agent gRPC Verification Loop

Layer / File(s): Guard kubectl pipeline against empty pod list (helm-prereqs/setup.sh)
Summary: || true is appended to the _POD assignment pipeline so that a zero-result kubectl get pods invocation does not propagate a non-zero exit code and abort the script under set -euo pipefail.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately identifies the core issue (SIGPIPE guard in site-agent pod lookup) and references the spurious exit code (141), directly corresponding to the primary change in the changeset. |
| Description check | ✅ Passed | The description provides a comprehensive explanation of the problem, root cause analysis, and the applied fix, all directly related to the changeset modification. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@mxh-0xbb


/ok to test ad52d81

@github-actions


🔍 Container Scan Summary

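| Service | Total | Critical | High | Medium | Low | Other |
|---|---|---|---|---|---|---|
| boot-artifacts-aarch64 | 3 | 0 | 0 | 3 | 0 | 0 |
| boot-artifacts-x86_64 | 3 | 0 | 0 | 3 | 0 | 0 |
| forge-admin-cli-x86_64 | 264 | 6 | 23 | 99 | 6 | 130 |
| machine-validation-runner | 704 | 34 | 183 | 258 | 35 | 194 |
| machine_validation | 704 | 34 | 183 | 258 | 35 | 194 |
| nvmetal-carbide | 704 | 34 | 183 | 258 | 35 | 194 |
| TOTAL | 2382 | 108 | 572 | 879 | 111 | 712 |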

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.


Labels: None yet
Projects: None yet
Development: no linked issues
3 participants