Skip to content

fix(review-queue): block p0 p1 evidence dissent#8467

Merged
scarmani merged 2 commits into
mainfrom
codex/review-queue-p1-dissent
Jun 16, 2026
Merged

fix(review-queue): block p0 p1 evidence dissent#8467
scarmani merged 2 commits into
mainfrom
codex/review-queue-p1-dissent

Conversation

@scarmani

Copy link
Copy Markdown
Collaborator

Summary

  • Treat standalone [P0] and [P1] findings in structured review/evidence comments as blocking/negative verdicts.
  • Prevent those comments from counting as supportive reviewer signals or focused dogfood evidence.
  • Add evidence-lint and merge-quorum regressions while preserving positive PASS evidence parsing.

Validation

  • /Users/armand/.pyenv/versions/3.11.11/bin/python -m pytest tests/cli/commands/test_review_queue.py -k "quorum or evidence or reviewer or dogfood or dissent or blocking"
  • ruff check aragora/cli/commands/review_queue_comment_verdicts.py tests/cli/commands/test_review_queue.py
  • git diff --check
  • bash scripts/automation_pr_preflight.sh origin/main HEAD
  • review-queue evidence-lint probes: [P0] and [P1] now would_count=false; positive Verdict: PASS remains would_count=true

Scope

No #8461 comments, ready-state changes, settlements, workflow reruns, or merges were performed.

@scarmani

Copy link
Copy Markdown
Collaborator Author

Codex independent semantic review on head add383e

Reviewer harness: codex
Model family: openai
Model id: GPT-5 Codex
Receipt artifact: /private/tmp/aragora-8467-evidence-20260616T034501Z/codex-openai-review.md

Verdict: PASS

Blocking findings: none.

I reviewed only the #8467 exact-head diff against current origin/main. The change is narrowly scoped to the review-queue comment verdict helper plus focused review-queue tests. The implementation reuses the existing decoration stripper before checking high-priority finding markers, so decorated priority findings are rejected consistently with the parser's existing handling for headings, blockquotes, bullets, and numbered lists.

Validation performed:

  • pytest tests/cli/commands/test_review_queue.py -k 'quorum or evidence or reviewer or dogfood or dissent or blocking': 139 passed, 137 deselected.
  • ruff check aragora/cli/commands/review_queue_comment_verdicts.py tests/cli/commands/test_review_queue.py: passed.
  • git diff --check origin/main...HEAD: passed.

Focused adversarial dogfood: I exercised the exact dissent-laundering surface with plain, bullet, numbered, blockquote, and heading high-priority marker forms plus an explicit changes-requested verdict. Every dissent probe returned would_count=false, and a clean positive PASS probe still returned would_count=true.

@scarmani

Copy link
Copy Markdown
Collaborator Author

Claude independent semantic review on head add383e

Reviewer harness: claude
Model family: claude
Model id: Claude Code 2.1.178
Receipt artifact: /private/tmp/aragora-8467-evidence-20260616T034501Z/claude-review.md

Verdict: PASS

Blocking findings: none.

Claude Code reviewed only the #8467 exact-head diff against current origin/main and found the implementation minimal, scoped, and correctly routed through the existing decoration-stripping helper before high-priority marker detection. The review confirmed that plain, bullet, numbered, blockquote, heading, and bold marker forms route to blocking/non-counting evidence, while positive evidence without those dissent markers remains countable.

Focused adversarial dogfood: the independent review checked the P0/P1 and changes-requested dissent paths, quorum-level non-counting behavior for dissenting comments, and positive-evidence preservation for clean PASS comments.

@scarmani scarmani marked this pull request as ready for review June 16, 2026 04:07
@scarmani scarmani requested a review from an0mium as a code owner June 16, 2026 04:07
@github-actions

Copy link
Copy Markdown
Contributor

Aragora Code Review

Advisory-only review. No issues found.

@scarmani scarmani merged commit 1c291f7 into main Jun 16, 2026
157 of 161 checks passed
@scarmani scarmani deleted the codex/review-queue-p1-dissent branch June 16, 2026 04:20
scarmani added a commit that referenced this pull request Jun 16, 2026
Forces off_limits=LEAVE on any PR touching merge-authority surfaces
(review_queue*, swarm/quorum_evidence, review/evidence, settle_*/settlement,
.github/workflows/aragora-merge-quorum*) regardless of tier/quorum, so the drain
can never self-modify the gate. Surfaced after the first pass merged an
evidence-parser change (#8467). +2 tests; file-paths fetched for classification.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scarmani added a commit that referenced this pull request Jun 16, 2026
…n default) (#8471)

* feat(swarm): scripts/boss_drain_pass.py — runnable bounded drain pass (dry-run default)

Wires the tested drain core (boss_drain/drain_pass/drain_policy) to real gh I/O.
--dry-run (DEFAULT) prints the plan and executes nothing; --apply acts. MERGE
re-confirms authorization with settle_one_pr at apply time (gate stays sole
authority; tier-4/over-tier never auto-merge); CLOSE only on empty; off-limits
(structex/, claude/fusion-, pinned PRs) LEFT untouched; REPAIR is label-only
(drain-repair) in v1 and bounded by the pass caps. Lets the operator/boss loop
drain the backlog over-cap instead of idling — no boss_loop.py edit required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(swarm): drain merge-authority guard — never auto-merge gate logic

Forces off_limits=LEAVE on any PR touching merge-authority surfaces
(review_queue*, swarm/quorum_evidence, review/evidence, settle_*/settlement,
.github/workflows/aragora-merge-quorum*) regardless of tier/quorum, so the drain
can never self-modify the gate. Surfaced after the first pass merged an
evidence-parser change (#8467). +2 tests; file-paths fetched for classification.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(swarm): REPAIR auto-dispatch v2 (scope-locked, double-flag-gated)

Turns REPAIR from label-only into a bounded worker dispatch that fixes a
red-but-useful PR's failing required checks on its own branch. Safety:
- make_repair_order/make_repair_prompt (pure, tested): scope-locked directive —
  work only on the PR branch, minimal diff, NEVER touch gate-logic surfaces,
  bounded effort (STOP rather than thrash).
- Dispatch requires BOTH --apply AND --enable-repair-dispatch; without the
  explicit enable flag it only prints the plan (autonomous repair can never start
  by accident). Bounded by drain-pass caps (<=2/pass). Isolated worktree on the
  PR branch; one failure never aborts the pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(swarm): wire drain into the boss cycle (over-cap -> drain, not idle)

When the backlog gate reports `shepherd` (open PRs >= cap), the cycle now runs
one bounded drain pass instead of only idling/manufacturing work — merge
fully-green PRs through the settle gate, close empty ones, PLAN repairs. Repair
is never auto-dispatched here (no --enable-repair-dispatch), so it only prints
the bounded plan. Default OFF (ARAGORA_DRAIN_ENABLED=1 to turn on); dry-run
unless ARAGORA_DRAIN_APPLY=1; off-limits structex/ + claude/. Best-effort —
never fails the cycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(swarm): address quorum-review blockers on drain repair-dispatch

Claude's merge-quorum review requested changes; fixed:
- dispatch_repair: push the worktree to the PR branch ONLY if the agent run
  succeeded (run.returncode == 0), and report success only if the push landed.
  A broken / scope-violating / timed-out worker no longer propagates its
  leftovers to the branch.
- build_candidates: restore the cheap pre-filter — off-limits-by-id/prefix,
  owned, and superseded PRs route to LEAVE/CLOSE without a `gh pr view`. Only
  path-authority candidates pay for the detail fetch (was +300 gh calls/pass
  at queue size 60-300 — every structex/* and claude/* branch).
- dispatch_repair: shutil.rmtree backstop in finally so a failed `worktree add`
  doesn't leak the mkdtemp dir.
- tests: off_limits/owned -> LEAVE regardless of has_changes (proves a repair
  worker can never spawn on another fleet's branch); cheap pre-filter skips the
  detail fetch for off-limits-by-prefix PRs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scarmani added a commit that referenced this pull request Jun 18, 2026
…red authority (#8471 follow-ups) (#8473)

* feat(swarm): scripts/boss_drain_pass.py — runnable bounded drain pass (dry-run default)

Wires the tested drain core (boss_drain/drain_pass/drain_policy) to real gh I/O.
--dry-run (DEFAULT) prints the plan and executes nothing; --apply acts. MERGE
re-confirms authorization with settle_one_pr at apply time (gate stays sole
authority; tier-4/over-tier never auto-merge); CLOSE only on empty; off-limits
(structex/, claude/fusion-, pinned PRs) LEFT untouched; REPAIR is label-only
(drain-repair) in v1 and bounded by the pass caps. Lets the operator/boss loop
drain the backlog over-cap instead of idling — no boss_loop.py edit required.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(swarm): drain merge-authority guard — never auto-merge gate logic

Forces off_limits=LEAVE on any PR touching merge-authority surfaces
(review_queue*, swarm/quorum_evidence, review/evidence, settle_*/settlement,
.github/workflows/aragora-merge-quorum*) regardless of tier/quorum, so the drain
can never self-modify the gate. Surfaced after the first pass merged an
evidence-parser change (#8467). +2 tests; file-paths fetched for classification.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(swarm): REPAIR auto-dispatch v2 (scope-locked, double-flag-gated)

Turns REPAIR from label-only into a bounded worker dispatch that fixes a
red-but-useful PR's failing required checks on its own branch. Safety:
- make_repair_order/make_repair_prompt (pure, tested): scope-locked directive —
  work only on the PR branch, minimal diff, NEVER touch gate-logic surfaces,
  bounded effort (STOP rather than thrash).
- Dispatch requires BOTH --apply AND --enable-repair-dispatch; without the
  explicit enable flag it only prints the plan (autonomous repair can never start
  by accident). Bounded by drain-pass caps (<=2/pass). Isolated worktree on the
  PR branch; one failure never aborts the pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(swarm): wire drain into the boss cycle (over-cap -> drain, not idle)

When the backlog gate reports `shepherd` (open PRs >= cap), the cycle now runs
one bounded drain pass instead of only idling/manufacturing work — merge
fully-green PRs through the settle gate, close empty ones, PLAN repairs. Repair
is never auto-dispatched here (no --enable-repair-dispatch), so it only prints
the bounded plan. Default OFF (ARAGORA_DRAIN_ENABLED=1 to turn on); dry-run
unless ARAGORA_DRAIN_APPLY=1; off-limits structex/ + claude/. Best-effort —
never fails the cycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(swarm): harden boss-drain repair-apply, cheap pre-filter, anchored authority (PR #8471 follow-ups)

Quorum-reviewer follow-ups to the drain stack, separate from the landing PR:

P2 — correctness:
- dispatch_repair (boss_drain_pass): the apply path ran the agent then did an
  unconditional `git push` and returned True on the AGENT's returncode, ignoring
  whether anything was committed or the push succeeded. Now require a new commit
  beyond the PR's remote tip, assert push returncode==0, and confirm the remote
  tip actually advanced to the new commit — return False otherwise.
- build_candidates (boss_drain): restore the cheap pre-filter. Off-limits-by-
  id/prefix, owned, and explicitly-superseded PRs now classify from the list view
  alone (no per-PR `gh pr view`); only survivors fetch detail for the merge-
  authority probe + changedFiles. Avoids up to +300 gh calls/pass at --list-limit 300.

P3 — hardening:
- dispatch_repair worktree leak: worktree now lives at a fresh non-existent subpath
  of an mkdtemp base; finally adds a shutil.rmtree(base) backstop so a failed
  `git worktree add` (never registered, so `worktree remove` can't clean it) no
  longer leaks the temp dir.
- _proxy_authorized: collapse-by-last-write over duplicate check names replaced
  with any-SUCCESS semantics, so a stale/later pending re-run row can't mask an
  earlier green (ordering-independent). settle_one_pr still re-confirms before merge.
- MERGE_AUTHORITY_PATTERNS anchored to known path segments; bare "settlement"
  removed (it fenced unrelated modules like debate/settlement.py, markets/
  credit_settlement.py). Real gate surfaces stay fenced.
- The 5 required-check names single-sourced as REQUIRED_CHECK_NAMES in boss_drain;
  the repair prompt and boss_drain_pass._REQUIRED both derive from it (no drift).

TDD: tests/swarm/test_boss_drain.py extended (+11 cases). mypy clean; pre-commit clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: codex[bot] <codex[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant