Add review-loop: automated review/triage/fix cycle on a branch#90
Conversation
A new orchestrator drives a Claude review (diff vs base) → triage (one sub-agent per finding) → fix (apply, commit, push) cycle until one of: N consecutive clean rounds, iteration cap, cost cap, or manual cancel. Skipped/deferred items get summarised in a sticky PR comment so reviewers can see what was knowingly left undone. Surfaced as a default-visible "Review Loop" toolbar button (toggleable per project) and a Review Loop tab in the session workspace that shows phase, per-round triage decisions, cumulative cost, and live log. Workspace defaults and per-project overrides for caps and the toggle live in Settings.
Adds three new screenshots (running loop, completed loop, settings) and a Review Loop section in the README. Other screenshots were re-captured from the existing stories — now showing the new toolbar button and tab that the review-loop feature added — so they no longer drift from the current UI. Mock infrastructure: mockReviewLoopRunning/Completed seeds plus a window override hook so stories can pin which state mockApi.getState returns (the panel calls refreshState on mount, which would otherwise clobber the seeded value).
- Cap claude subprocess stdout/stderr buffers at 5MB to prevent OOM on long sessions (review-loop.service.ts). - Remove unreachable empty-skipped branch in renderStickyComment; writeStickyPRComment already early-returns when skippedIssues is empty (review-loop.service.ts). - Correct mockReviewLoopCompleted.cumulativeCostUsd from 2.07 to 1.64 to match the sum of round costs (mockData.ts). - Guard loadButtons with an in-flight promise so concurrent calls share a single seed-and-save pass, avoiding the read→save race in seedBuiltInButtons (buttonStore.ts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- cancel: spawn claude detached and signal whole process group so sub-agents and gh subprocesses don't get orphaned after cancel/timeout - finalize: drop completed loops from activeLoops to stop the in-memory state (rounds, transcripts, raw issues) leaking for the app's lifetime - review phase: unlink the issues file before invoking claude and assert it was rewritten, so a model that silently skips Write can't be misread as a clean round and trigger a false 'converged' stop - runLoop: re-check the cost cap between phases, not only between rounds, so a $5 cap actually bounds spend instead of letting one extra review→triage→fix overshoot - runClaude: add a 30-minute per-phase wall-clock timeout that kills the subtree and reports a phase error so the loop self-heals on a hung CLI - App.tsx: include all five loaders in the bootstrap effect's deps array - fix prompt: enumerate the exact files flagged for fixing this round and forbid edits outside that allowlist, so a permissions-skipping run can't ship collateral edits upstream - sticky comment dedup: rank 'defer' over 'skip' when collapsing repeats so a later skip can't silently overwrite a prior defer - ReviewLoopSettings: type the project-override delta as Partial<ReviewLoopProjectOverride> instead of casting to any so key typos and value-type mismatches are caught at compile time Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review Loop — issues left openGenerated by the Crucible Code review loop on branch The loop ran for 2 rounds and chose not to fix the following 7 items: ↷ Skipped: Cancel mid-phase leaves loop stuck in 'running' state —
|
…rate trade-offs Previously "skip" meant either a false positive OR an accepted trade-off, and both ended up in the sticky PR comment. Triage was also told to mark unconfirmed issues as "skip", which polluted the comment with non-issues. Tighten the prompt: - skip = real issue / trade-off we are deliberately not fixing (PR-mentioned) - defer = real issue, out of scope for this branch (PR-mentioned) - noop = false positive / unconfirmed / pre-existing (NOT PR-mentioned) Conservative-fallback rule flipped from "skip" to "noop" so unverified findings are excluded from reviewer-facing notes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- buttonStore.test.ts: loadButtons seeds the built-in Review Loop button on
first run, which made the "stores api result" test see 2 buttons instead of
the 1 returned by the API mock. Pre-seed the localStorage sentinel so the
seeding pass is a no-op for this test, isolating the assertion.
- app.spec.ts ("Match System toggle" e2e): the bare role-radio "On" locator
matched 5 elements once Review Loop settings (workspace + per-project)
rendered their On/Off toggles. Scope the click to the Match System
container so it stays unambiguous as more toggles get added.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
My previous .filter({ has: getByText('Match System') }).first() matched the
outermost ancestor div, which still contained all five On/Off toggles on the
settings page. Walk up two parents from the unique "Match System" label to
land on the row that owns the adjacent ToggleGroup, so the strict-mode click
sees exactly one On radio.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…heme-IW1IX # Conflicts: # README.md # mock/mockApi.ts # src/main/ipc/register.ts # src/preload/index.ts # src/renderer/stories/FullApp.stories.tsx # src/renderer/stories/helpers/storeSetup.ts # src/shared/constants.ts
Summary
A new orchestrator that drives a Claude review (diff vs. base) → triage (one sub-agent per finding) → fix (apply, commit, push) cycle on a session's branch. The loop stops on the first of:
All four are user-configurable per workspace, with per-project overrides in Settings → Review Loop (including a toggle that hides the toolbar button and prevents the loop from running for that scope).
After the loop ends, any
skip/defertriage decisions are summarised in a single sticky comment on the open PR (using a hidden marker so subsequent rounds update the same comment instead of re-posting), giving reviewers a record of what was knowingly left undone and why.How it surfaces
review-loop:start,review-loop:cancel,review-loop:toggle-tab.Screenshots
Review Loop tab — mid-loop (round 2 in triage):
Review Loop tab — converged after 2 clean rounds:
Settings — workspace defaults + per-project overrides:
Architecture
src/main/services/review-loop.service.tssrc/main/ipc/review-loop.ipc.tssrc/preload/index.tssrc/renderer/stores/reviewLoopStore.tssrc/renderer/components/review-loop/ReviewLoopPanel.tsxsrc/renderer/components/settings/ReviewLoopSettings.tsxsrc/renderer/stores/buttonStore.tsEach round writes its findings to
<worktree>/.crucible/review-loop/round-N-{issues,triage}.jsonso phase agents can communicate via files. Cost is read from the Claude CLI's--output-format jsonenvelope (total_cost_usd).Caveats worth flagging on review
clauderuns with--dangerously-skip-permissions— necessary for the unattended loop, but a real footgun. We may want a first-time confirmation per project.claudebinary in the build sandbox), so the prompts and JSON parsing are unverified. First real run on macOS may need prompt tweaks if Claude doesn't reliably emit the expected JSON.prNumber. If the loop runs on a branch without an open PR, skipped items aren't surfaced anywhere outside the tab.Test plan
claudesubprocess and finalises the state ascancelledcumulativeCostUsd ≥ costCapUsdmaxIterationsroundsconsecutiveCleanRoundsconsecutive zero-fix roundsnpm run build) and Storybook stories render the running/completed/settings panels correctlyhttps://claude.ai/code/session_013V6ENheXA8v4fVfF6MYwjC
Generated by Claude Code