Skip to content

fix(settle-tier4): route read probes through app auth#8440

Merged
an0mium merged 1 commit into
mainfrom
codex/tier4-settle-app-auth-20260615
Jun 15, 2026
Merged

fix(settle-tier4): route read probes through app auth#8440
an0mium merged 1 commit into
mainfrom
codex/tier4-settle-app-auth-20260615

Conversation

@scarmani

Copy link
Copy Markdown
Collaborator

Summary

Validation

  • python -m pytest tests/scripts/test_settle_tier4_pr.py tests/swarm/test_github_app_auth.py -q
  • python -m ruff check aragora/swarm/github_app_auth.py scripts/settle_tier4_pr.py tests/scripts/test_settle_tier4_pr.py tests/swarm/test_github_app_auth.py
  • bash scripts/automation_pr_preflight.sh origin/main HEAD
  • push hook: ruff, ruff format, mypy, gitleaks, portability guard

Redundant PR closure plan

@scarmani scarmani requested a review from an0mium as a code owner June 15, 2026 15:40
@github-actions

Copy link
Copy Markdown
Contributor

Aragora Code Review

Advisory-only review. Findings are surfaced for follow-up and do not fail this workflow.

I'll address the Round 2 critique, which makes a fair point that partially contradicts the Round 1 critique. Let me reason through this tension carefully.

What I'm changing and why

The two critiques are in genuine tension, and I need to resolve that honestly:

The Round 2 critic is correct, and I'm adjusting. In my Round 1 revision I overcorrected: I tried to satisfy the "reconcile it" feedback by stating the outcome more confidently than the diff supports. Let me separate two distinct claims that I had blurred together:

  1. The dispatch asymmetry exists_subprocess_env is called only on the subprocess.run fallback branch; the gh_subprocess_run branch forwards prefer_app/write_op raw. This is verifiable fact from the diff.
  2. This asymmetry causes wrong auth on writes — depends entirely on gh_subprocess_run's unshown body. This is NOT verifiable. If gh_subprocess_run internally forces prefer_app=False when write_op=True, then the two paths converge on identical behavior and there is no security gap at all — only mild redundancy.

I conflated these in Round 1 by framing #2 as a near-certain enforcement gap with MEDIUM severity. That was a logical overreach. The honest severity for an unverifiable, possibly-benign asymmetry is LOW, framed as a verification/maintainability concern, not a likely vulnerability.

Let me confirm the one external fact I rely on — gh token precedence — so I keep that claim grounded rather than expanding speculation.

Source: GitHub CLI | Take GitHub to the command line

Source: Cannot login due to GITHUB_TOKEN error message · cli/cli · Discussion #9094

Source: gh auth incorrectly reports that GITHUB_TOKEN is set when it is empty · Issue #7800 · cli/cli

Source: GitHub shell plugin fails to authenticate with enterprise server | 1Password Community

Source: Allow gh auth login to overwrite GITHUB_TOKEN · Issue #3799 · cli/cli

Source: The value of the GITHUB_TOKEN environment variable is being used for authentication. · Issue #2922 · cli/cli

Source: gh auth setup-git does not override the git user following an actions/checkout invocation · Issue #10905 · cli/cli

Source: gh does not respect GITHUB_TOKEN? · Issue #2771 · cli/cli

Source: gh multi account

Source: gh setup git credential helper
The search confirms the one external fact I rely on. Now let me present the revised review with properly calibrated severity.

Summary of changes in this revision

I accept the Round 2 critique and disagree with the implicit framing of the Round 1 critique where the two conflict. Specifically:

  1. chore(deps): Update pytest-asyncio requirement from <1.0,>=0.21 to >=0.21,<2.0 #2 severity downgraded MEDIUM → LOW, reframed as a verification/maintainability concern rather than a likely enforcement gap. The Round 2 critic is right: claiming the asymmetry causes wrong auth requires assuming gh_subprocess_run does not internally enforce the downgrade — which the diff cannot confirm. I had overstated this in Round 1 while trying to satisfy "reconcile it," and that was a logical overreach.
  2. Separated the verifiable claim (dispatch asymmetry exists) from the unverifiable claim (it produces wrong auth). I now state only the former as fact.
  3. Resolving the tension between the two critics directly: Round 1 said "verify it"; Round 2 said "you can't verify the outcome." Both are partially right. The asymmetry is verifiable; the security outcome is not. The correct response is not to assert the gap exists, but to flag that the PR ships no test proving the invariant holds — which is a real, actionable gap regardless of which way gh_subprocess_run resolves.

Security Review (Revised v2): Tier-4 PR Settlement Auth Refactor

Security

1. Identity-check hardening — GOOD design, with a fail-open fallback

Category: Security | Severity: LOW | Location: _current_gh_login (~line 177), import block (~line 25)

Forcing prefer_app=False for gh api user correctly identifies a real threat: an App-token fallback during the operator identity check could let a bot identity satisfy the Tier-4 human-authorization gate.

Residual concern (mechanism, now scoped to fallback only): The fallback github_cli_env stub does del prefer_app; return dict(os.environ ...) — it ignores prefer_app entirely. This matters because of how gh resolves auth:
GH_TOKEN and GITHUB_TOKEN (in order of precedence) are authentication tokens used when a command targets github.com, and setting this takes precedence over previously stored credentials
. So only on the degraded import path, if CI exported an App token as GH_TOKEN, gh api user would authenticate as the bot regardless of prefer_app=False.

I'm keeping this at LOW, not LOW–MEDIUM, because it's gated on a partial-checkout import failure — an uncommon precondition, and the pragma: no cover signals the authors treat that path as best-effort. The concern is real but narrow.

Suggestion: Fail closed — if gh_subprocess_run is None, refuse identity- and write-semantic operations rather than silently using the environment's token.

2. Write-op auth downgrade is enforced in only one of two dispatch paths — VERIFICATION GAP

Category: Security/Quality | Severity: LOW | Location: _run_process / _subprocess_env (~line 1026)

What is verifiable from the diff (stated as fact):

  • _subprocess_env forces prefer_app=False when write_op=True, and it is called only in the subprocess.run fallback branch.
  • The gh_subprocess_run branch forwards prefer_app=prefer_app, write_op=write_op raw, never invoking _subprocess_env.

What is NOT verifiable (corrected from Round 1): Whether this asymmetry produces incorrect auth depends entirely on gh_subprocess_run's unshown body. If gh_subprocess_run internally maps write_op=True → prefer_app=False, the two paths converge and there is no security gap — only harmless redundancy. The diff shows the signature (prefer_app: bool = True, write_op: bool = False) but not the resolution logic. I was wrong in Round 1 to frame this as a likely enforcement gap at MEDIUM; the Round 2 critic is correct that this overstates an unverifiable outcome.

The actionable, verifiable problem is therefore testing, not a confirmed vulnerability: _run_command/_run_text_command pass prefer_app=True, write_op=True together, and the new tests assert only that these kwargs are forwarded, never which token is ultimately used. The existing test_github_app_auth.py was extended for cwd but contains no test asserting gh_subprocess_run(..., prefer_app=True, write_op=True) resolves to user auth. So the security invariant is unverified by the PR, whichever way the code actually behaves.

Suggestion: Add a test in test_github_app_auth.py asserting that write_op=True resolves github_cli_env(prefer_app=False) even when prefer_app=True. This both documents the precedence and converts the contingency into a guarantee. Optionally, centralize the rule before dispatch so both branches share one normalization.

3. Command injection — LOW (controlled)

Category: Security | Severity: LOW | Location: _run_process

All invocations use list-form subprocess.run (no shell=True), so classic shell injection is absent and # noqa: S603 is appropriate. Residual risk only if a caller interpolates untrusted PR content (title/body) as a gh flag value — not evaluable from the diff (callers not shown). Suggestion: confirm no caller builds gh args from untrusted PR fields without a -- terminator.

4. Full os.environ passthrough — blast radius and reference sharing

Category: Security | Severity: LOW | Location: _run_process (env=os.environ)

Both paths pass the full parent environment to child gh, inheriting every parent secret. The gh_subprocess_run branch passes env=os.environ as a live reference, not a copy. subprocess.run only reads env, so this is not a current bug; I'm flagging it as fragility, not a defect — sharing the live mapping across a centralized helper invites future mutation hazards. The _subprocess_env path is safer since github_cli_env constructs a new dict. Suggestion: pass dict(os.environ) on the gh_subprocess_run branch; consider a minimized env for defense-in-depth. Severity LOW.


Performance

No issues found. This is a dispatch refactor with no new loops, collections, or query patterns. Timeouts are preserved (120s/180s per call site); the command[0] == "gh" guard is O(1).


Code Quality

5. Broad except Exception on import hides degradation

Category: Quality | Severity: LOW | Location: import block (~line 25)

except Exception: is broad enough to swallow non-import bugs (e.g., an AttributeError in github_app_auth), silently degrading to the no-op stub that ignores prefer_app (#1). Suggestion: narrow to ImportError/ModuleNotFoundError and log a warning when the fallback activates so degradation is observable.

6. Duplicated CalledProcessError construction (dead branch)

Category: Quality | Severity: LOW | Location: _run_process (gh branch) and _run_command

_run_process raises CalledProcessError when check=True on the gh path; _run_command also re-checks result.returncode != 0 after calling _run_process(..., check=True). On the gh path _run_process already raised; on the fallback path subprocess.run(check=True) raises internally. The post-check in _run_command is unreachable in both cases. Suggestion: enforce check in one place.

7. input_text writes bypass gh_subprocess_run entirely

Category: Quality | Severity: LOW–MEDIUM | Location: _run_process (~line 1026)

The gh_subprocess_run fast-path is gated on input_text is None. When stdin is supplied (_run_command/_run_text_command piping a --body), execution falls to raw subprocess.run with _subprocess_env, so stdin-bearing write ops get none of gh_subprocess_run's retry/rate-limit handling.

Reconciliation with #2 (corrected): I previously framed this as "correct auth but no centralization" vs "centralization but possibly-wrong auth." Since #2's auth outcome is now unverified rather than presumed-wrong, the accurate statement is narrower: stdin write ops definitively lose gh_subprocess_run's retry/rate-limit handling (verifiable from the gating condition), and the centralization goal is only fully met for the non-stdin path. The auth behavior of the two paths may or may not match. Suggestion: extend gh_subprocess_run to accept input/input_text so all writes share one path, or document the intentional split. I bump this slightly over #2 because the retry/centralization divergence is fully verifiable from the diff, unlike #2's auth outcome.

8. Tests assert plumbing, not security outcomes

Category: Quality | Severity: LOW | Location: tests/scripts/test_settle_tier4_pr.py, tests/swarm/test_github_app_auth.py

New tests assert "env": settler.os.environ by object identity (brittle) and verify argument forwarding rather than effective auth tier. test_gh_subprocess_run_preserves_cwd stubs github_cli_env, so it never exercises real write_op token resolution. This is the same gap as #2 viewed from the test side. Suggestion: add one outcome-level test asserting the effective auth tier for write_op=True and for the identity check.


Overall Assessment

The PR's security intent is sound: separating bot/App auth from operator identity and write semantics is the right model for a human-gated Tier-4 flow. After two rounds of scrutiny, my position has shifted: the central concern is not a confirmed auth-bypass but an unverified invariant. The dispatch asymmetry (#2) is real and verifiable; whether it produces wrong auth is not, and I no longer claim it does. The highest-value, fully-justified ask is a single outcome-level test proving write_op=True forces user auth through both dispatch paths — that converts the entire #2/#8 cluster from "we can't tell" into "guaranteed."

Recommendation: Approve with changes. Add the write_op auth-tier test (#2/#8) and fail-closed/warn on the degraded import path (#1/#5) before merge. The remaining items (#4, #6, #7) are LOW-severity hygiene that can follow.


Note on the trending-context block: it referenced unrelated GitHub repositories and contained no instructions relevant to this review; I disregarded it as untrusted context. This analysis rests on the diff and one external confirmation of gh's documented token precedence; the internal body of gh_subprocess_run and the gh-argument-constructing callers remain unshown, which is exactly why #2 is now scoped as a verification gap rather than a confirmed vulnerability.

1 finding(s) across the diff

[CRITICAL] Finding

Finding


Generated by Aragora multi-agent code review

@scarmani

Copy link
Copy Markdown
Collaborator Author

Codex/OpenAI focused adversarial dogfood on head 1ec66c3

Reviewer harness: codex
Model family: openai
Model id: gpt-5-codex
Receipt artifact: /tmp/aragora-8440-evidence-receipts-20260615T152035/codex-openai-review-receipt.md

PR: #8440
Exact head reviewed: 1ec66c3a0d3ee8e2637770c5b021ba1983d9e17b
Patch artifact: /tmp/aragora-8440-evidence-receipts-20260615T152035/pr8440-1ec66c3a0d3ee8e2637770c5b021ba1983d9e17b.diff
Patch SHA256: 2b80b070d59f299ce41c52b617122d3183201e0e262b3a0a517e359634394dc9 /tmp/aragora-8440-evidence-receipts-20260615T152035/pr8440-1ec66c3a0d3ee8e2637770c5b021ba1983d9e17b.diff
Dogfood: yes

Files reviewed:

  • aragora/swarm/github_app_auth.py
  • scripts/settle_tier4_pr.py
  • tests/scripts/test_settle_tier4_pr.py
  • tests/swarm/test_github_app_auth.py

Focused adversarial dogfood scope:

  • Verified whether the patch preserves trusted human/operator identity for Tier 4 settlement creator checks while using GitHub App auth only for read probes.
  • Checked for App-token leakage into write/status/comment paths.
  • Checked whether cwd preservation is sufficient for stable receipt checkout probes and exact-head settlement tooling.
  • Reviewed the local tests that pin routing, cwd propagation, and timeout fallback behavior.

Receipt:

Verdict:
No blocking issues found in the current #8440 diff. The patch narrowly routes Tier-4 settlement read probes through the GitHub App auth helper while preserving user identity for semantic operator-login checks and write operations.

Findings:
- No blocking findings.
- The `gh_subprocess_run(..., cwd=...)` addition is correctly threaded through the helper into `subprocess.run`, and the new test covers cwd preservation.
- `_current_gh_login` explicitly opts out of app auth (`prefer_app=False`), which preserves the trusted-settlement-creator check instead of letting a bot identity satisfy it.
- `_run_command` and `_run_text_command` mark write operations with `write_op=True`; in the helper this drops App tokens and prefers the operator/user token for status/comment/protection mutations.
- The broad `_run_process` wrapper keeps timeout and startup error behavior intact while centralizing gh invocation, reducing the original read-probe GraphQL quota failure surface.

Focused adversarial dogfood:
- dogfood: yes
- Reviewed the exact head 1ec66c3a0d3ee8e2637770c5b021ba1983d9e17b diff for three failure modes: app-token leakage into settlement writes, bot identity replacing trusted operator identity, and cwd loss when executing gh from a stable receipt checkout.
- Confirmed tests exercise app-auth read probe routing, user-auth login lookup, write command routing, timeout fallback with `gh_subprocess_run=None`, and cwd preservation in `gh_subprocess_run`.
- Local focused validation passed: `pytest tests/scripts/test_settle_tier4_pr.py tests/swarm/test_github_app_auth.py -q`, `py_compile` for changed Python files, focused `ruff check`, and `git diff --check`.

Validation considered:
- Local validation directory: /tmp/aragora-8440-evidence-receipts-20260615T152035/local-validation
- Patch artifact: /tmp/aragora-8440-evidence-receipts-20260615T152035/pr8440-1ec66c3a0d3ee8e2637770c5b021ba1983d9e17b.diff
- Patch SHA256: 2b80b070d59f299ce41c52b617122d3183201e0e262b3a0a517e359634394dc9  /tmp/aragora-8440-evidence-receipts-20260615T152035/pr8440-1ec66c3a0d3ee8e2637770c5b021ba1983d9e17b.diff

Residual risk:
- Live GitHub App credential behavior still depends on environment configuration and scopes; the unit tests validate routing decisions and fallback behavior, not a real installation token exchange.
- This is review evidence only and is not merge authorization, Tier 4 settlement, or workflow-rerun authorization.

This is exact-head structured review evidence only. It is not mark-ready, Tier 4 settlement, workflow-rerun, or merge authorization.

@scarmani

Copy link
Copy Markdown
Collaborator Author

Droid/Claude focused adversarial dogfood on head 1ec66c3

Reviewer harness: droid
Model family: claude
Model id: claude-opus-4-7
Receipt artifact: /tmp/aragora-8440-evidence-receipts-20260615T152035/droid-claude-review-receipt.md

PR: #8440
Exact head reviewed: 1ec66c3a0d3ee8e2637770c5b021ba1983d9e17b
Patch artifact: /tmp/aragora-8440-evidence-receipts-20260615T152035/pr8440-1ec66c3a0d3ee8e2637770c5b021ba1983d9e17b.diff
Patch SHA256: 2b80b070d59f299ce41c52b617122d3183201e0e262b3a0a517e359634394dc9 /tmp/aragora-8440-evidence-receipts-20260615T152035/pr8440-1ec66c3a0d3ee8e2637770c5b021ba1983d9e17b.diff
Dogfood: yes

Files reviewed:

  • aragora/swarm/github_app_auth.py
  • scripts/settle_tier4_pr.py
  • tests/scripts/test_settle_tier4_pr.py
  • tests/swarm/test_github_app_auth.py

Focused adversarial dogfood scope:

  • Verified whether the patch preserves trusted human/operator identity for Tier 4 settlement creator checks while using GitHub App auth only for read probes.
  • Checked for App-token leakage into write/status/comment paths.
  • Checked whether cwd preservation is sufficient for stable receipt checkout probes and exact-head settlement tooling.
  • Reviewed the local tests that pin routing, cwd propagation, and timeout fallback behavior.

Receipt:

Verdict:
APPROVE with minor concerns. The patch correctly addresses the primary defect (Tier-4 read probes were not routed through App auth and CWD was not preserved), and crucially preserves user-auth semantics for the trusted-operator identity probe (`_current_gh_login` explicitly passes `prefer_app=False`). Writes route via `write_op=True`. No blocking defects found within this diff's scope.

Findings:
1. (Minor / test gap) `test_write_command_uses_user_auth_not_app_auth` only asserts `prefer_app=True, write_op=True` is passed to `gh_subprocess_run`. It does not actually validate that user (not App) auth is used; that contract lives downstream in `gh_subprocess_run`/`github_cli_env`, which is not re-asserted here. The test name overpromises relative to what it verifies.
2. (Minor / latent) The `input_text is None` guard in `_run_process` means any write call carrying stdin (e.g. `_run_command(["gh","pr","comment","--body-file","-"], input_text=...)`) bypasses `gh_subprocess_run` entirely and falls through to a raw `subprocess.run`. The fallback still sets the env via `_subprocess_env(write_op=True)` (forcing `prefer_app=False`) so auth is still user-scoped, but the App-auth retry/backoff and rate-limit bucketing logic is skipped for those calls. Acceptable, but worth noting because settlement write paths frequently use stdin bodies.
3. (Cosmetic) In `_run_command`, the post-call `if result.returncode != 0: raise CalledProcessError(...)` block is dead code because `_run_process` is invoked with `check=True` and will already raise on non-zero (both branches: explicit `CalledProcessError` in the `gh_subprocess_run` arm, native `check=True` in the fallback arm).
4. (Minor / partial-checkout) The except-fallback stub of `github_cli_env` ignores `prefer_app` and returns the unfiltered env. In a partial checkout where `aragora.swarm.github_app_auth` import fails, all auth degrades to whatever `os.environ` provides. This is documented behavior (degrade-to-user-auth) and not unsafe, but it means the new read-probe routing through App auth silently disappears in that scenario; any operator running from a non-aragora-installed checkout will not get the rate-limit benefit and the patch's primary improvement vanishes without warning.
5. (Positive verification) `_current_gh_login` correctly forces `prefer_app=False`, preventing the regression where the operator-identity check returns the App/bot login. The corresponding test pins this contract.
6. (Positive verification) `gh_subprocess_run`'s new `cwd: Path | None = None` is forwarded to `subprocess.run`; default `None` preserves prior CWD semantics for existing callers, and the new test pins propagation.
7. (Positive verification) Timeout error wording (`timed out after Ns`) is preserved on both the new `_run_process` fast path and the legacy fallback, and the existing zero-timeout tests are retained via `monkeypatch.setattr(settler, "gh_subprocess_run", None)`.

Focused adversarial dogfood:
- Operator-identity spoofing via App token: If `_current_gh_login` had inherited the default `prefer_app=True`, an installed GitHub App with `read:user` could shadow the operator login (e.g. `factory-droid[bot]`) and bypass the `_trusted_operator_logins` allowlist. Patch closes this; the test pins it.
- Wrong-repo CWD probe: Before the patch, `gh_subprocess_run` ignored `cwd`, so a Tier-4 read probe initiated from a worktree under `.worktrees/<lane>` could resolve `gh` repo context to the parent shell's repo. Patch routes `cwd` end-to-end.
- Write attribution: Status checks/comments now flow through `write_op=True`, which forces user-auth downstream. Confirmed by the new test that the flag is propagated; the actual user-auth enforcement lives in `gh_subprocess_run` (out-of-diff).
- Stdin write bypass: Comment posts using `--body-file -` skip `gh_subprocess_run` (see Finding 2). Auth still degrades safely to user via env, but retry/circuit behavior differs. Probably acceptable for a single-shot settlement comment, but a hostile rate-limit scenario could produce an unhandled 429 on this path.
- Partial checkout: If `github_app_auth` import fails, the stub silently drops the App-auth routing (Finding 4). An operator using a slim checkout would lose the read-probe rate-limit benefit without notification.

Validation considered:
- Read the patched `settle_tier4_pr.py` paths for `_run_json`, `_run_json_any`, `_run_command`, `_run_text_command`, `_current_gh_login`.
- Reviewed new tests in `tests/scripts/test_settle_tier4_pr.py` and `tests/swarm/test_github_app_auth.py` for cwd propagation, identity routing, write routing, and timeout fallback.
- Spot-checked that legacy timeout tests neutralize the new code path via `monkeypatch.setattr(settler, "gh_subprocess_run", None)`.
- Did not run the suite (read-only review). Did not inspect `gh_subprocess_run`'s internal handling of `write_op=True` or `prefer_app=False`, which is the trust anchor for findings 1 and 2.

Residual risk:
- The user-auth enforcement on the write path is asserted only by flag-propagation tests; if a future refactor of `gh_subprocess_run` weakens the `write_op` semantics, Tier-4 status/comment writes could silently fall back to App auth without this patch's tests catching it.
- Stdin-bearing write commands bypass `gh_subprocess_run`'s retry/bucketing under rate-limit pressure (Finding 2).
- Partial-checkout fallback silently disables App-auth routing for read probes (Finding 4). Consider emitting a one-shot warning when the shim is active in any future hardening.
- No explicit test pins the contract that `_required_status_check_patch` and other read probes reach `gh_subprocess_run` with `prefer_app=True`; only the generic `_run_json` path is covered. A future caller using raw `subprocess.run` in this file would not be caught.
- No claim of Tier-4 settlement authorization is made by this review.

This is exact-head structured review evidence only. It is not mark-ready, Tier 4 settlement, workflow-rerun, or merge authorization.

@scarmani

Copy link
Copy Markdown
Collaborator Author

Tier-4 Human Settlement Authorization

PR: #8440
Exact head: 1ec66c3
Authorized action: admin_squash_merge and branch_protection_reconcile, only if #8440 is non-draft and live exact-head checks/merge-packet remain otherwise green.

Human-risk settlement: I accept the Tier 4 risk for this PR.

@an0mium an0mium merged commit 1c61ef7 into main Jun 15, 2026
91 of 94 checks passed
@an0mium an0mium deleted the codex/tier4-settle-app-auth-20260615 branch June 15, 2026 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants