fix(settlement): route Tier-4 read probes through app auth#8405
fix(settlement): route Tier-4 read probes through app auth#8405an0mium wants to merge 6 commits into
Conversation
Route Tier-4 settlement helper read-only GitHub probes through the existing GitHub App auth path while preserving user auth for identity checks and write operations. Add a masked token-capacity doctor for diagnosing repeated GraphQL exhaustion across operator and App identities. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Fix the Tier-4 helper text-command stdin path and make the GitHub token doctor report rate-limit probe timeouts as unavailable sources instead of crashing. Add focused regressions for both review findings. Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Grok independent model reviewReviewer: grok (xai) — independent adversarial model review via the Aragora Grok reviewer, grounded on the exact PR head. Verdict: PASS
dogfood: yes |
OpenAI independent model reviewReviewer: openai (openai) — independent adversarial model review via Codex CLI OpenAI harness, grounded on the exact PR head. Verdict: PASS
dogfood: yes |
Aragora Code ReviewAdvisory-only review. Findings are surfaced for follow-up and do not fail this workflow. Security Review: GitHub Token Doctor PR (Round 2 — Skeptical Self-Review)What this round doesThe Round 2 critic largely endorsed my revised position (S3 MEDIUM with the minter-exception path as load-bearing, Q1, and Q3-as-needs-verification). When a critic agrees with you, the skeptical move is not to take a victory lap — it's to turn the skepticism inward and stress-test the claims I'm most confident about. I did exactly that, and I found I had over-claimed in two places. I'm correcting them below, and I'm also surfacing edge cases that both prior rounds missed entirely. Where I now think I was wrong (self-correction)Correction 1 — I overstated that the rollout makes leakage "actively materializing." This is weaker than I claimed.My Round 1 language ("the triggering conditions are now actively materializing rather than hypothetical") quietly smuggled in an unproven causal chain. Let me lay out the chain honestly and mark each link:
The CVE establishes that link 2 has happened in real GitHub tooling (Composer), which is genuinely useful — it proves the bug class is not imaginary. But it does not establish that this minter does any of links 1–3. My Round 1 phrasing blurred "this failure mode is real for GitHub tokens generally" into "this script is at active risk," and those are different claims. The honest statement is: the bug class is demonstrated; this instance is unverified. That is precisely the definition of a defensive-hardening finding — which is what the very first critique called it, and on reflection that framing was more accurate than I credited. Net effect: I'm keeping S3 actionable but I am lowering its severity to LOW–MEDIUM, and I'll explain why I don't drop it all the way to LOW below. Correction 2 — I missed that
|
| ID | Category | Severity | Confidence | Action |
|---|---|---|---|---|
| S3 | Security | LOW–MEDIUM (↓ from MEDIUM) | Certain: error text printed unfiltered. Speculative: this minter leaks a token | Use type(exc).__name__ on the minter path (free, robust); redaction optional + tested for false positives |
| N1 | Quality | LOW | High (visible in diff) | Pass source into _rate_limit_for_token; removes empty-source footgun and Q1 |
| Q3 | Quality | LOW | Needs verification | Raising stub over None; narrow to ImportError |
| Q1 | Quality | LOW | High | dataclasses.replace (or fold into N1) |
| S2 | Security | LOW | High | Clarify ineffective finally: token="" |
| P1 | Perf | LOW | High | Optional parallelize |
What changed and why
- Downgraded S3 to LOW–MEDIUM and reframed it. On skeptical re-reading, my Round 1 "actively materializing" language overstated an unproven causal chain. I separated the certain property (all
errortext prints unfiltered) from the speculative one (this minter leaks a token). The first critique's original framing — "defensive hardening, not a likely vulnerability" — was closer to correct than I conceded, and I should say so plainly. - Replaced redaction-first with not-interpolating-first.
error=f"mint failed: {type(exc).__name__}"is a zero-cost fix with no false-positive surface, and it avoids the exact "redactor won't keep pace with new token formats" trap I myself raised in S2. This is the more defensible recommendation. - Added N1, a real, diff-visible fragility (empty-
sourcesentinel) that both prior rounds missed while arguing about token formats. - Recorded N2 as a checked non-issue for transparency.
Where I disagree / hold firm
- I disagree with dropping S3 to pure LOW. The certain property (no enforcement boundary on a script whose entire reason to exist is non-leakage) plus a free fix justifies keeping it actionable above LOW. But I concede MEDIUM was too high without instance-level evidence.
- A note on the untrusted trending context (the "ponytail/laziest senior dev" item): it's irrelevant to this review and I've disregarded it as tainted data per instructions — flagging only to confirm I didn't let "the best code is code you never write" rhetoric talk me out of the N1 restructuring, which is justified on its own merits.
Scope and limitations
I still cannot see aragora.swarm.github_app_auth, get_github_app_installation_token()'s exception behavior, or the gh_subprocess_run call sites. The CVE citation establishes the bug class for GitHub tokens; it does not prove this specific minter leaks — which is exactly why I now recommend the cheapest leak-proof fix (type(exc).__name__) rather than a higher-confidence severity rating. Code-structure findings (N1, Q1–Q5, S2, P1, N2) are
1 finding(s) across the diff
[CRITICAL] Finding
Finding
Generated by Aragora multi-agent code review
1 similar comment
…-routing-20260614 # Conflicts: # scripts/settle_tier4_pr.py
…-routing-20260614
Codex/OpenAI focused dogfood evidenceReviewer: Codex/OpenAI independent lane Blocking findings: none found. Focused adversarial dogfood verdict: PASS for the GitHub App read-routing and token-doctor change under GitHub GraphQL exhaustion. The diff keeps Tier 4 operator identity checks on user auth via prefer_app=false, routes read-only helper probes through the app-auth path with cwd preservation, and leaves write paths using write_op=true so the existing non-App write identity boundary is preserved. The token doctor reports capacity by source without printing token material, and the tests cover app read routing, user-auth identity/write paths, cwd forwarding, and masked quota reporting. Validation observed: PR body reports focused pytest/preflight validation; live required non-quorum checks are green; merge-packet and failed aragora-merge-quorum job both report the current blocker as missing model quorum/focused evidence for this exact head. I did not rerun CI or local tests in this evidence pass. dogfood: yes This is not merge authorization. |
Claude independent model reviewReviewer: claude (anthropic) — independent adversarial model review via the Aragora Claude reviewer, grounded on the exact PR head. Verdict: PASS No blocking issues. A few minor observations:
Identity-preservation logic is correct: dogfood: yes |
Grok independent model reviewReviewer: grok (xai) — independent adversarial model review via the Aragora Grok reviewer, grounded on the exact PR head. Verdict: PASS
dogfood: yes |
…-routing-20260614 # Conflicts: # scripts/settle_tier4_pr.py
Codex/OpenAI focused dogfood evidenceReviewer: Codex/OpenAI independent lane Blocking findings: none found. Focused adversarial dogfood verdict: PASS for the GitHub App read routing, token-doctor, and Tier 4 settlement helper behavior under GitHub GraphQL exhaustion after the origin/main restack. The diff preserves the operator/user identity boundary for Tier 4 settlement writes and identity checks while allowing read-only GitHub probes to use App-backed quota; Validation observed: PR body reports targeted pytest, pre-commit, automation preflight, and pre-push validation; live required checks show all non-quorum required checks passing, with only dogfood: yes This is not merge authorization. |
Summary
scripts/settle_tier4_pr.pyread-only GitHub probes through the existing GitHub App auth path so--checkdoes not burn the active operator PAT GraphQL bucket.gh api userand all settlement/merge/protection writes force user auth instead of App auth.scripts/github_token_doctor.pyto report masked core/GraphQL capacity foran0mium,scarmani, and the GitHub App token without printing token values.Live proof
Before this change,
settle_tier4_pr.py --checkfor PR #8392 failed when the activean0miumGraphQL bucket was exhausted. With the helper using App-backed reads, the same command reaches settlement diagnostics instead of transport failure.Current token doctor showed separate buckets:
an0mium: GraphQL capacity was low/degraded during this run.scarmani: GraphQL capacity available.corelimit 15000,graphqllimit 5000).Validation
python3 -m pytest tests/scripts/test_settle_tier4_pr.py tests/swarm/test_github_app_auth.py tests/scripts/test_github_token_doctor.py-> 70 passedpre-commit run --files scripts/settle_tier4_pr.py aragora/swarm/github_app_auth.py scripts/github_token_doctor.py tests/scripts/test_settle_tier4_pr.py tests/swarm/test_github_app_auth.py tests/scripts/test_github_token_doctor.py-> passedbash scripts/automation_pr_preflight.sh origin/main HEAD-> passedGovernance
No settlement artifacts posted, no PR marked ready, no merge attempted, no branch protection/workflow changes.