feat(tier4): settle_tier4_pr.py — one-command --settle-apply + receipt-backed preapproval gate#8382
an0mium wants to merge 38 commits into
2cad937 to 7d50312
Collapses the manual per-PR Tier 4 settlement dance (record-settlement → post aragora/human-settlement status → look up failed quorum run id → rerun) into a single gated command, eliminating the error-prone run-id substitution step.

`--settle-apply --pr N --head <sha>`:
- reuses the existing --settle-only trusted-invoker gate (two-pass evaluate_tier4_settlement_preconditions; refuses unless the live gh login is in the trusted operator allowlist), so it never merges and never elevates privilege;
- writes the durable external-settlement receipt via `review-queue record-settlement` (reason templated, overridable with --reason);
- posts the exact-head settlement comment/status (idempotent: skipped when aragora/human-settlement is already success at head);
- reruns only the failed aragora-merge-quorum jobs, looking up the run id automatically from the head's check-runs (no-op when there is no failed run; the gate re-evaluates on its own).

The tested --settle-only and --merge-apply paths are left byte-for-byte unchanged; --settle-apply is a new mutually-exclusive mode with its own main() branch.

Adds 9 unit tests (url/run-id parsing, rerun targeting, happy path, idempotent skip, untrusted-invoker refusal).

Part of the Tier 4 settlement ergonomics; reduces operator toil draining the ODR settlement queue.

https://claude.ai/code/session_018jfJj5gb9VoLs6VBMnzhrP
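The chained steps in this commit message can be sketched as a small planner that decides which commands to run. This is an illustration only, not the code in `scripts/settle_tier4_pr.py`: the function name `plan_settle_apply` and its parameters are hypothetical, the trusted-invoker gate is left out, and the command arguments are limited to the ones quoted in this PR (any elided arguments are not filled in).

```python
from typing import List, Optional


def plan_settle_apply(
    head_sha: str,
    reason: str,
    status_already_success: bool,
    failed_run_id: Optional[int],
    repo: str = "synaptent/aragora",
) -> List[List[str]]:
    """Return the argv lists a settle-apply run would execute, in order.

    Hypothetical helper: it only plans commands and never runs them.
    """
    steps: List[List[str]] = []

    # 1. Durable settlement receipt (always written).
    steps.append([
        "review-queue", "record-settlement",
        "--head-sha", head_sha,
        "--action", "approve",
        "--reason", reason,
        "--json",
    ])

    # 2. Exact-head status: skipped when it is already success (idempotent).
    if not status_already_success:
        steps.append([
            "gh", "api", "--method", "POST",
            f"repos/{repo}/statuses/{head_sha}",
            "-f", "state=success",
            "-f", "context=aragora/human-settlement",
        ])

    # 3. Rerun only the failed quorum jobs: no-op when no failed run exists.
    if failed_run_id is not None:
        steps.append(["gh", "run", "rerun", str(failed_run_id), "--failed"])

    return steps
```

Keeping the plan separate from execution makes the idempotent-skip and no-op branches easy to unit test without shelling out to `gh`.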
7d50312 to 5a71d7b
settle_tier4_pr.py --check accepted a Tier-4 PR on the strength of the repo-visible settlement comment + aragora/human-settlement status alone, while review_queue's merge-packet correctly requires those signals to be backed by a local, operator-controlled human-risk receipt (human_preapproval_recorded). Because the settle helper's own --settle-only path posts the status WITHOUT writing that receipt, the two tools diverged: --check authorized PRs that merge-packet still blocked, and a receipt-less automated settlement could self-clear Tier-4.

Tighten the lax checker to match the strict gate: --check now refuses when merge-packet reports requires_human_preapproval=true and human_preapproval_recorded=false, pointing at the existing atomic writer (review-queue record-settlement --action approve --post-github-status). This preserves the local receipt as the one signal automation cannot forge from GitHub comments/status alone, and keeps settle_tier4_pr.py --check in agreement with merge-packet.

https://claude.ai/code/session_018jfJj5gb9VoLs6VBMnzhrP
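The refusal rule described in this commit message reduces to a two-flag truth table. A minimal sketch, assuming the merge-packet is available as a mapping carrying the two fields named above; the function name `preapproval_blockers` and the blocker wording are hypothetical, not the script's actual output:

```python
from typing import Any, List, Mapping

# Remediation command as quoted in the commit message.
REMEDIATION = "review-queue record-settlement --action approve --post-github-status"


def preapproval_blockers(merge_packet: Mapping[str, Any]) -> List[str]:
    """Return blocker messages; an empty list means this gate is clear.

    Missing keys are treated as False, so a packet that does not require
    preapproval never produces a blocker.
    """
    requires = bool(merge_packet.get("requires_human_preapproval"))
    recorded = bool(merge_packet.get("human_preapproval_recorded"))
    if requires and not recorded:
        return [
            "human preapproval required but no local receipt is recorded; "
            f"record it with: {REMEDIATION}"
        ]
    return []
```

Only the combination "required and not recorded" blocks; a repo-visible comment or status plays no part in this check, which is the point of the receipt.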
Grok independent model review
Reviewer: grok (xai) - independent adversarial model review via the Aragora Grok reviewer, grounded on the exact PR head.
Verdict: PASS
This is review evidence only; it is not merge authorization and not Tier 4 settlement/preapproval authorization.
dogfood: yes
Claude independent model review
Reviewer: claude (anthropic) - independent adversarial model review via the Aragora Claude reviewer, grounded on the exact PR head.
Verdict: PASS
No blocking issues found. The change is well-structured, the new receipt-backed blocker correctly mirrors
Minor observations (non-blocking):
Security/regression checks I specifically looked for and cleared:
This is review evidence only; it is not merge authorization and not Tier 4 settlement/preapproval authorization.
dogfood: yes
Aragora Code Review
Advisory-only review. No issues found.
Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Codex/OpenAI current-head review
Reviewer: codex (openai) - independent current-head review from Codex Desktop, grounded on the exact PR head after the merge-conflict repair.
Verdict: CHANGES-REQUESTED
Files reviewed:
Findings:
Validation reviewed:
This is review evidence only. It is not merge authorization and not Tier 4 settlement/preapproval authorization.
dogfood: yes
Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Prefer the Actions details URL when deriving the rerunnable workflow run id for aragora-merge-quorum, and preserve explicit --repo routing for the settlement receipt helper. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Grok independent model review
Reviewer: grok (xai) - independent adversarial model review via the Aragora Grok reviewer, grounded on the exact PR head.
Verdict: PASS
dogfood: yes
Post the Tier-4 settlement comment before durable settlement recording, delegate the exact-head human-settlement status to the review-queue record-settlement path, reject --reason outside settle-apply, and paginate merge-quorum check-run lookup. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Claude independent model review
Reviewer: claude (anthropic) - independent adversarial model review via the Aragora Claude reviewer, grounded on the exact PR head.
Verdict: PASS
No blocking issues. The new guard (
Minor nits (non-blocking):
dogfood: yes
Update the human-preapproval receipt blocker to show the actual review-queue record-settlement positional PR syntax, and cover the remediation command text with a focused regression assertion. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Grok independent model review
Reviewer: grok (xai) - independent adversarial model review via the Aragora Grok reviewer, grounded on the exact PR head.
Verdict: PASS
This is review evidence only; it is not merge authorization and not Tier 4 settlement/preapproval authorization.
dogfood: yes
Claude independent model review
Reviewer: claude (anthropic) - independent adversarial model review via the Aragora Claude reviewer, grounded on the exact PR head.
Verdict: PASS
This is review evidence only; it is not merge authorization and not Tier 4 settlement/preapproval authorization.
dogfood: yes
Ensure --settle-apply posts the exact-head settlement comment when only the human-settlement status already exists, and rerun the newest failed aragora-merge-quorum workflow run instead of the first failed run returned by the API. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
…2khm5j # Conflicts: # docs/METRICS.md
Raise pydantic-settings to the fixed 2.14.2 release so the locked dependency audit no longer reports GHSA-4xgf-cpjx-pc3j. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
…2khm5j # Conflicts: # uv.lock
Regenerate docs/METRICS.md against the current merge context so metrics validation stays green after resolving PR #8382 against origin/main. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
…2khm5j # Conflicts: # docs/METRICS.md
…2khm5j # Conflicts: # docs/METRICS.md # scripts/settle_tier4_pr.py
Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Ensure --settle-apply does not report success until merge-packet recognizes exact-head Tier 4 preapproval, while preserving receipt-backed diagnostics for recovery. Make quorum rerun selection fall back to numeric check-run/workflow IDs when timestamps are absent. Co-authored-by: codex[bot] <197425009+codex[bot]@users.noreply.github.com>
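The rerun-selection rule in this commit (newest failed run, with a numeric-id fallback when timestamps are absent) might look like the sketch below. It assumes check-run records shaped like the GitHub check-runs API response (`name`, `conclusion`, `completed_at`, `started_at`, `id`). The helper names are hypothetical, and so are two design choices: treating only `conclusion == "failure"` as failed, and falling back to ids whenever any candidate lacks a timestamp.

```python
from typing import Any, Mapping, Optional, Sequence


def _timestamp(run: Mapping[str, Any]) -> str:
    # ISO-8601 UTC strings of equal shape sort correctly as plain strings.
    return str(run.get("completed_at") or run.get("started_at") or "")


def _numeric_id(run: Mapping[str, Any]) -> int:
    try:
        return int(run.get("id") or 0)
    except (TypeError, ValueError):
        return 0


def newest_failed_run(
    check_runs: Sequence[Mapping[str, Any]],
    name: str = "aragora-merge-quorum",
) -> Optional[Mapping[str, Any]]:
    """Pick the newest failed check-run with the given name, or None."""
    failed = [
        run for run in check_runs
        if run.get("name") == name and run.get("conclusion") == "failure"
    ]
    if not failed:
        return None
    if all(_timestamp(run) for run in failed):
        # Every candidate is timestamped: newest timestamp wins, id breaks ties.
        return max(failed, key=lambda run: (_timestamp(run), _numeric_id(run)))
    # Some timestamp is missing: compare by numeric id alone.
    return max(failed, key=_numeric_id)
```

Using `max` with an explicit key avoids depending on the order in which the API happens to return check-runs.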
Grok independent model review
Reviewer: grok (xai) - independent adversarial model review via direct Grok CLI harness, grounded on the exact PR head.
Verdict: PASS
No findings.
This is review evidence only; it is not merge authorization and not Tier 4 settlement/preapproval authorization.
dogfood: yes
Handle GitHub Actions run URLs with query strings or fragments when resolving failed merge-quorum runs. Add focused regressions for query, fragment, and job URL suffix forms. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
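One way to make run-id extraction tolerant of the query, fragment, and job-suffix forms named in this commit is to parse the URL first and match only on its path. A minimal sketch, assuming the usual Actions URL shape `https://github.com/<owner>/<repo>/actions/runs/<run_id>[/job/<job_id>]`; the name `parse_run_id` is hypothetical and this is not the script's actual parser:

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Matches ".../actions/runs/<digits>" followed by "/" or the end of the path.
_RUN_PATH = re.compile(r"/actions/runs/(\d+)(?:/|$)")


def parse_run_id(details_url: str) -> Optional[int]:
    """Return the workflow run id from an Actions details URL, else None."""
    # urlsplit separates ?query and #fragment, so neither can reach the regex.
    path = urlsplit(details_url).path
    match = _RUN_PATH.search(path)
    return int(match.group(1)) if match else None
```

Returning `None` for anything that is not an Actions run URL lets the caller treat "no rerunnable run found" as a no-op rather than passing a bad id to `gh run rerun`.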
Queue-drain close: freezing this Tier 4 PR instead of continuing settlement/evidence churn.
Reason: exact head
No branch was deleted. No Tier 4 settlement, mark-ready, CI rerun, or merge was attempted. Reopen only with explicit human preapproval and a narrower replacement plan for the settlement-tooling surface.
Queue-recovery re-close. This PR was already classified as a frozen churner in the drain ledger and was reopened without an operator note rescinding that freeze decision. Re-closing to keep queue pressure durable.
Preserved branch:
No branch was deleted. This close preserves the remote branch/SHA recovery path; reopen only with an explicit operator note rescinding the frozen-churner classification and naming the new owner/path to green.
Problem

Settling a Tier 4 PR currently means running a fragile multi-step dance per PR, as the active `scarmani` identity:

1. `review-queue record-settlement --head-sha <HEAD> --action approve --reason ... --json`
2. `gh api --method POST repos/synaptent/aragora/statuses/<HEAD> -f state=success -f context=aragora/human-settlement ...`
3. Look up the failed `aragora-merge-quorum` run id by hand from the check-runs URL
4. `gh run rerun <RUN_ID> --failed`

Step 3 is the error-prone one: pasting a `<RUN_ID>` placeholder verbatim fails, and the per-PR repetition adds real operator toil while draining the ODR settlement queue.

Change (commit 1: `--settle-apply`)

Adds a `--settle-apply` mode to `scripts/settle_tier4_pr.py` that chains all four steps in one gated command:

- Reuses the existing `--settle-only` trusted-invoker gate (two-pass `evaluate_tier4_settlement_preconditions`): refuses unless the live `gh` login is in the trusted operator allowlist. It never merges and never elevates privilege.
- Writes the durable external-settlement receipt via `review-queue record-settlement` (reason templated, overridable with `--reason`).
- Posts the exact-head settlement comment/status. Idempotent: skipped when `aragora/human-settlement` is already `success` at the head.
- Reruns only the failed `aragora-merge-quorum` jobs, looking up the run id automatically from the head's check-runs. No-op when there is no failed run.

The tested `--settle-only` and `--merge-apply` paths are left byte-for-byte unchanged.

Change (commit 2: receipt-backed preapproval gate)

Closes a real divergence between the two Tier-4 settlement tools. `settle_tier4_pr.py --check` accepted a Tier-4 PR on the strength of the repo-visible settlement comment + `aragora/human-settlement` status alone, while `review_queue`'s merge-packet correctly requires those signals to be backed by a local, operator-controlled human-risk receipt (`human_preapproval_recorded`). Because the settle helper's own `--settle-only` path posts the status without writing that receipt, the two tools diverged: `--check` would authorize a PR that merge-packet still blocks, and a receipt-less automated settlement could self-clear Tier-4.

This tightens the lax checker to match the strict gate: `--check` now refuses when merge-packet reports `requires_human_preapproval=true` and `human_preapproval_recorded=false`, and points the operator at the existing atomic writer (`review-queue record-settlement --action approve --post-github-status`, which writes the receipt and status together so a green status can never exist without its backing receipt). This preserves the local receipt as the one signal automation cannot forge from GitHub comments/status alone.

Relationship to #8406

#8406 fixes the same divergence in the opposite direction: it loosens `review_queue` merge-packet so the repo-visible comment+status pair alone clears `human_preapproval_recorded`, dropping the local-receipt requirement. That defeats the documented intent of the receipt (`review_queue.py:_has_recorded_human_risk_settlement`: "must not infer approval from GitHub comments alone") and lets a receipt-less automated settlement self-clear Tier-4. This PR's commit 2 is the safer resolution (raise the lax tool to the strict requirement, not lower the strict tool to the lax one). Recommend dropping #8406's `review_queue.py` preapproval hunk while keeping its genuinely-good REST-fallback / strict-branch-protection fail-closed parts.

Validation

- `pytest tests/scripts/test_settle_tier4_pr.py` → 64 passed (+12 total new across both commits), 1 pre-existing environmental failure (`test_untrusted_member_comment_does_not_authorize` shells out to a real `gh` binary not present in the sandbox; fails identically on unmodified `main`).
- A missing receipt blocks `--check` with the receipt blocker and yields no authorized actions; a recorded receipt clears it; helper truth-table (requires/recorded, non-preapproval PR, unknown PR).
- `ruff check` + `ruff format --check` clean on both files.

Tier

Touches `scripts/settle_tier4_pr.py`, a Tier 4 merge-authority surface, so this PR will itself require Tier 4 settlement.

Generated by Claude Code