Add review-loop: automated review/triage/fix cycle on a branch by mattmoran56 · Pull Request #90 · mattmoran56/Crucible-Code

mattmoran56 · 2026-05-02T17:31:35Z

Note on the branch name: the branch is named claude/add-colorful-theme-IW1IX for historical reasons — the contents are the review-loop feature, not a theme. Worth renaming before merge.

Summary

A new orchestrator that drives a Claude review (diff vs. base) → triage (one sub-agent per finding) → fix (apply, commit, push) cycle on a session's branch. The loop stops on the first of:

Convergence — N consecutive rounds with zero actionable triage decisions (default 2)
Iteration cap — hard limit on rounds (default 5)
Cost cap — total USD spent across all sub-sessions (default $5)
Manual cancel from the Review Loop tab

All four are user-configurable per workspace, with per-project overrides in Settings → Review Loop (including a toggle that hides the toolbar button and prevents the loop from running for that scope).

After the loop ends, any skip / defer triage decisions are summarised in a single sticky comment on the open PR (using a hidden marker so subsequent rounds update the same comment instead of re-posting), giving reviewers a record of what was knowingly left undone and why.

How it surfaces

A default-visible Review Loop toolbar button (seeded once via a stable id so re-removal sticks). Hidden when the per-project toggle is off.
A new Review Loop core tab in the session workspace showing status pill, current phase, cumulative cost, per-round triage decisions, and a per-round log.
Three new app actions usable as custom-button commands: review-loop:start, review-loop:cancel, review-loop:toggle-tab.

Screenshots

Review Loop tab — mid-loop (round 2 in triage):

Review Loop tab — converged after 2 clean rounds:

Settings — workspace defaults + per-project overrides:

Architecture

Concern	Where
Orchestrator (phases, stop conditions, cancel, sticky PR comment)	`src/main/services/review-loop.service.ts`
IPC handlers + persistence (electron-store)	`src/main/ipc/review-loop.ipc.ts`
Preload bridge + state-update event subscription	`src/preload/index.ts`
Renderer state (settings, per-session loop state, effective-config helper)	`src/renderer/stores/reviewLoopStore.ts`
Tab UI (status, rounds, triage decisions, log)	`src/renderer/components/review-loop/ReviewLoopPanel.tsx`
Settings section	`src/renderer/components/settings/ReviewLoopSettings.tsx`
Built-in toolbar button (seeded once) + per-project filter	`src/renderer/stores/buttonStore.ts`

Each round writes its findings to <worktree>/.crucible/review-loop/round-N-{issues,triage}.json so phase agents can communicate via files. Cost is read from the Claude CLI's --output-format json envelope (total_cost_usd).

Caveats worth flagging on review

The spawned claude runs with --dangerously-skip-permissions — necessary for the unattended loop, but a real footgun. We may want a first-time confirmation per project.
Couldn't run end-to-end (no claude binary in the build sandbox), so the prompts and JSON parsing are unverified. First real run on macOS may need prompt tweaks if Claude doesn't reliably emit the expected JSON.
The sticky comment requires prNumber. If the loop runs on a branch without an open PR, skipped items aren't surfaced anywhere outside the tab.

Test plan

https://claude.ai/code/session_013V6ENheXA8v4fVfF6MYwjC

Generated by Claude Code

A new orchestrator drives a Claude review (diff vs base) → triage (one sub-agent per finding) → fix (apply, commit, push) cycle until one of: N consecutive clean rounds, iteration cap, cost cap, or manual cancel. Skipped/deferred items get summarised in a sticky PR comment so reviewers can see what was knowingly left undone. Surfaced as a default-visible "Review Loop" toolbar button (toggleable per project) and a Review Loop tab in the session workspace that shows phase, per-round triage decisions, cumulative cost, and live log. Workspace defaults and per-project overrides for caps and the toggle live in Settings.

Adds three new screenshots (running loop, completed loop, settings) and a Review Loop section in the README. Other screenshots were re-captured from the existing stories — now showing the new toolbar button and tab that the review-loop feature added — so they no longer drift from the current UI. Mock infrastructure: mockReviewLoopRunning/Completed seeds plus a window override hook so stories can pin which state mockApi.getState returns (the panel calls refreshState on mount, which would otherwise clobber the seeded value).

- Cap claude subprocess stdout/stderr buffers at 5MB to prevent OOM on long sessions (review-loop.service.ts). - Remove unreachable empty-skipped branch in renderStickyComment; writeStickyPRComment already early-returns when skippedIssues is empty (review-loop.service.ts). - Correct mockReviewLoopCompleted.cumulativeCostUsd from 2.07 to 1.64 to match the sum of round costs (mockData.ts). - Guard loadButtons with an in-flight promise so concurrent calls share a single seed-and-save pass, avoiding the read→save race in seedBuiltInButtons (buttonStore.ts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- cancel: spawn claude detached and signal whole process group so sub-agents and gh subprocesses don't get orphaned after cancel/timeout - finalize: drop completed loops from activeLoops to stop the in-memory state (rounds, transcripts, raw issues) leaking for the app's lifetime - review phase: unlink the issues file before invoking claude and assert it was rewritten, so a model that silently skips Write can't be misread as a clean round and trigger a false 'converged' stop - runLoop: re-check the cost cap between phases, not only between rounds, so a $5 cap actually bounds spend instead of letting one extra review→triage→fix overshoot - runClaude: add a 30-minute per-phase wall-clock timeout that kills the subtree and reports a phase error so the loop self-heals on a hung CLI - App.tsx: include all five loaders in the bootstrap effect's deps array - fix prompt: enumerate the exact files flagged for fixing this round and forbid edits outside that allowlist, so a permissions-skipping run can't ship collateral edits upstream - sticky comment dedup: rank 'defer' over 'skip' when collapsing repeats so a later skip can't silently overwrite a prior defer - ReviewLoopSettings: type the project-override delta as Partial<ReviewLoopProjectOverride> instead of casting to any so key typos and value-type mismatches are caught at compile time Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mattmoran56 · 2026-05-03T11:33:41Z

Review Loop — issues left open

Generated by the Crucible Code review loop on branch claude/add-colorful-theme-IW1IX (base: main).

The loop ran for 2 rounds and chose not to fix the following 7 items:

↷ Skipped: Cancel mid-phase leaves loop stuck in 'running' state — `src/main/services/review-loop.service.ts:205`

Reason: Not a real bug in current code: runLoop already checks loop.cancelled after each phase and calls finalize(loop,'cancelled') before the early return, so the cancel path does finalize the state. The described stuck-running scenario does not occur.

Original finding

When the user cancels while a phase is in flight, cancelReviewLoop sets loop.cancelled=true and SIGTERMs the child. runClaude resolves with {ok:false,error:'cancelled'}. runReviewPhase / runTriagePhase / runFixPhase then check if (loop.cancelled) return false BEFORE the error branch, returning false without calling finalize(). Back in runLoop, if (!reviewOk) return // finalize already called by phase exits with no finalize call — yet finalize was never invoked on the cancel path. The loop coroutine returns and state.status remains 'running' forever; the UI Cancel button keeps showing, future state updates never arrive, and isReviewLoopActive(sessionId) keeps returning true so the user can't restart. Either call finalize(loop,'cancelled') in each phase's cancellation branch or check loop.cancelled in runLoop before the (!reviewOk) early return.

↷ Skipped: runClaude doc comment falsely claims it streams text to the renderer — `src/main/services/review-loop.service.ts:371`

Reason: False positive: the implementation now uses --output-format stream-json with incremental NDJSON parsing, pushTranscript calls, and throttled emitState updates, so the JSDoc accurately describes the live progress streaming. The 'only buffers final JSON' claim no longer matches the code.

Original finding

The JSDoc on runClaude says it 'Streams text to the renderer as log lines so users can watch progress live.' The implementation only buffers stdout in stdoutBuf, parses one final JSON object on exit, and never sends incremental progress to the renderer (no emitState in the stdout/stderr handlers). Misleading documentation that will confuse maintainers — either implement streaming or drop the claim.

⏭ Deferred: No timeout / watchdog on claude subprocess — `src/main/services/review-loop.service.ts:374`

Reason: Real hardening concern but out of scope for this PR. The feature is functional with manual cancel and per-round caps; a per-phase wall-clock watchdog that SIGTERMs and finalizes with 'error' is a follow-up improvement once the core review-loop cycle is merged.

Original finding

The spawned claude subprocess has no timeout. The cost cap and iteration cap are only checked between rounds, and the cancel path is the only way to abort an in-flight phase. A hung claude (network stall, infinite tool loop, etc.) keeps the loop pinned indefinitely, the cost cap never fires, and the user must click Cancel manually — which itself has the bug above. A reasonable per-phase wall-clock timeout that SIGTERMs the child and finalizes with 'error' would bound worst-case cost and cleanly recover.

⏭ Deferred: mainWindow reference is set once and never refreshed — `src/main/services/review-loop.service.ts:33`

Reason: Window recreation does not occur in practice today — the app quits on window-all-closed, so the stale-reference path is unreachable. Pattern matches other services in the codebase. Real concern only if a future refactor introduces window recreation; defer until that lifecycle is on the roadmap.

Original finding

setReviewLoopWindow is called from registerReviewLoopHandlers exactly once at startup. If the BrowserWindow is ever destroyed and recreated (e.g. on macOS reopen-from-dock or any future window-recreate flow), the module-level mainWindow still points at the destroyed window and emitState short-circuits via isDestroyed(). All in-flight review-loop progress events are then silently dropped — the renderer's UI freezes on the last state it saw and only updates on manual refreshState calls. Either re-bind on window creation or use BrowserWindow.getAllWindows() to broadcast.

⏭ Deferred: Review and triage phases inherit --dangerously-skip-permissions despite being read-only — `src/main/services/review-loop.service.ts:364`

Reason: Real defense-in-depth concern, but attack surface is low: the loop runs against the user's own checked-out branch where they presumably trust the diff. Constraining review/triage via --allowedTools is a worthwhile follow-up but not blocking for the initial feature ship.

Original finding

runClaude is shared by all three phases and always passes --dangerously-skip-permissions. Only the fix phase actually needs write/exec permissions; review and triage just need to read the diff and write one structured JSON file each. Granting full skip-permissions to those phases broadens the blast radius for any prompt-injection content that lands in the diff (e.g. a malicious test fixture telling claude to run a command). A least-privilege design would constrain the read phases via --allowedTools (Bash for git diff, Read, Glob, Grep, Write to a single path) instead of skip-permissions.

↷ Skipped: Branch and base-branch names are interpolated into prompts without escaping — `src/main/services/review-loop.service.ts:224`

Reason: Branch names come from local git refs, not user input fields, and git refname rules already forbid most problematic characters in normal flows. Branch names are only spliced into prompt text and markdown — not into shell commands — so this is at most a minor prompt-confusion risk on a deliberately crafted ref. Not worth introducing sanitization in this PR.

Original finding

buildReviewPrompt, buildTriagePrompt, and buildFixPrompt all use template-literal interpolation on o.branch and o.baseBranch. Git refnames don't allow most shell-special chars but they DO allow backticks, dollar signs, and slashes; a branch like foobar or foo${x} produces a malformed prompt to claude (and in the case of the fix prompt, would interleave into the literal git diff instructions). The orchestrator should at minimum reject (or sanitize) ref names containing backticks, ${, and quotes before splicing them into prompts.

⏭ Deferred: Built-in review-loop button uses a localStorage 'seeded' flag that diverges from on-disk state — `src/renderer/stores/buttonStore.ts:79`

Reason: Buttons live in electron-store (main process) while the seeded flag lives in renderer localStorage, creating a divergence risk. Low probability today (no settings-sync feature exists), but the flag should move into the on-disk button store before this becomes harder to migrate. Defer to a follow-up.

Original finding

seedBuiltInButtons writes 'codecrucible.builtin-buttons.seeded'='1' to localStorage as a one-shot 'we already added it' flag, then never re-seeds. But the button itself is persisted in the main-process button store, not in localStorage. If a user clears Application Data or uses Crucible on a different machine with synced settings, localStorage is empty and the button gets re-added even if the user had deliberately removed it. Conversely, if the user clears the on-disk button store but keeps localStorage, the built-in button is gone forever. The 'seeded' bit should live alongside the buttons (in the same persisted store), not in a separate browser-storage location.

…rate trade-offs Previously "skip" meant either a false positive OR an accepted trade-off, and both ended up in the sticky PR comment. Triage was also told to mark unconfirmed issues as "skip", which polluted the comment with non-issues. Tighten the prompt: - skip = real issue / trade-off we are deliberately not fixing (PR-mentioned) - defer = real issue, out of scope for this branch (PR-mentioned) - noop = false positive / unconfirmed / pre-existing (NOT PR-mentioned) Conservative-fallback rule flipped from "skip" to "noop" so unverified findings are excluded from reviewer-facing notes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…heme-IW1IX

- buttonStore.test.ts: loadButtons seeds the built-in Review Loop button on first run, which made the "stores api result" test see 2 buttons instead of the 1 returned by the API mock. Pre-seed the localStorage sentinel so the seeding pass is a no-op for this test, isolating the assertion. - app.spec.ts ("Match System toggle" e2e): the bare role-radio "On" locator matched 5 elements once Review Loop settings (workspace + per-project) rendered their On/Off toggles. Scope the click to the Match System container so it stays unambiguous as more toggles get added. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

My previous .filter({ has: getByText('Match System') }).first() matched the outermost ancestor div, which still contained all five On/Off toggles on the settings page. Walk up two parents from the unique "Match System" label to land on the row that owns the adjacent ToggleGroup, so the strict-mode click sees exactly one On radio. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…heme-IW1IX # Conflicts: # README.md # mock/mockApi.ts # src/main/ipc/register.ts # src/preload/index.ts # src/renderer/stories/FullApp.stories.tsx # src/renderer/stories/helpers/storeSetup.ts # src/shared/constants.ts

claude and others added 4 commits May 2, 2026 16:56

mattmoran56 and others added 5 commits May 3, 2026 12:37

Merge remote-tracking branch 'origin/main' into claude/add-colorful-t…

5c7fb32

…heme-IW1IX

mattmoran56 merged commit 6cf34f0 into main May 3, 2026
3 checks passed

mattmoran56 deleted the claude/add-colorful-theme-IW1IX branch May 3, 2026 17:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add review-loop: automated review/triage/fix cycle on a branch#90

Add review-loop: automated review/triage/fix cycle on a branch#90
mattmoran56 merged 9 commits into
mainfrom
claude/add-colorful-theme-IW1IX

mattmoran56 commented May 2, 2026

Uh oh!

mattmoran56 commented May 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mattmoran56 commented May 2, 2026

Summary

How it surfaces

Screenshots

Architecture

Caveats worth flagging on review

Test plan

Uh oh!

mattmoran56 commented May 3, 2026

Review Loop — issues left open

↷ Skipped: Cancel mid-phase leaves loop stuck in 'running' state — src/main/services/review-loop.service.ts:205

↷ Skipped: runClaude doc comment falsely claims it streams text to the renderer — src/main/services/review-loop.service.ts:371

⏭ Deferred: No timeout / watchdog on claude subprocess — src/main/services/review-loop.service.ts:374

⏭ Deferred: mainWindow reference is set once and never refreshed — src/main/services/review-loop.service.ts:33

⏭ Deferred: Review and triage phases inherit --dangerously-skip-permissions despite being read-only — src/main/services/review-loop.service.ts:364

↷ Skipped: Branch and base-branch names are interpolated into prompts without escaping — src/main/services/review-loop.service.ts:224

⏭ Deferred: Built-in review-loop button uses a localStorage 'seeded' flag that diverges from on-disk state — src/renderer/stores/buttonStore.ts:79

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

↷ Skipped: Cancel mid-phase leaves loop stuck in 'running' state — `src/main/services/review-loop.service.ts:205`

↷ Skipped: runClaude doc comment falsely claims it streams text to the renderer — `src/main/services/review-loop.service.ts:371`

⏭ Deferred: No timeout / watchdog on claude subprocess — `src/main/services/review-loop.service.ts:374`

⏭ Deferred: mainWindow reference is set once and never refreshed — `src/main/services/review-loop.service.ts:33`

⏭ Deferred: Review and triage phases inherit --dangerously-skip-permissions despite being read-only — `src/main/services/review-loop.service.ts:364`

↷ Skipped: Branch and base-branch names are interpolated into prompts without escaping — `src/main/services/review-loop.service.ts:224`

⏭ Deferred: Built-in review-loop button uses a localStorage 'seeded' flag that diverges from on-disk state — `src/renderer/stores/buttonStore.ts:79`