Skip to content

Add review-loop: automated review/triage/fix cycle on a branch#90

Merged
mattmoran56 merged 9 commits into
mainfrom
claude/add-colorful-theme-IW1IX
May 3, 2026
Merged

Add review-loop: automated review/triage/fix cycle on a branch#90
mattmoran56 merged 9 commits into
mainfrom
claude/add-colorful-theme-IW1IX

Conversation

@mattmoran56
Copy link
Copy Markdown
Owner

Note on the branch name: the branch is named claude/add-colorful-theme-IW1IX for historical reasons — the contents are the review-loop feature, not a theme. Worth renaming before merge.

Summary

A new orchestrator that drives a Claude review (diff vs. base) → triage (one sub-agent per finding) → fix (apply, commit, push) cycle on a session's branch. The loop stops on the first of:

  • Convergence — N consecutive rounds with zero actionable triage decisions (default 2)
  • Iteration cap — hard limit on rounds (default 5)
  • Cost cap — total USD spent across all sub-sessions (default $5)
  • Manual cancel from the Review Loop tab

All four are user-configurable per workspace, with per-project overrides in Settings → Review Loop (including a toggle that hides the toolbar button and prevents the loop from running for that scope).

After the loop ends, any skip / defer triage decisions are summarised in a single sticky comment on the open PR (using a hidden marker so subsequent rounds update the same comment instead of re-posting), giving reviewers a record of what was knowingly left undone and why.

How it surfaces

  • A default-visible Review Loop toolbar button (seeded once via a stable id so re-removal sticks). Hidden when the per-project toggle is off.
  • A new Review Loop core tab in the session workspace showing status pill, current phase, cumulative cost, per-round triage decisions, and a per-round log.
  • Three new app actions usable as custom-button commands: review-loop:start, review-loop:cancel, review-loop:toggle-tab.

Screenshots

Review Loop tab — mid-loop (round 2 in triage):

Review loop running

Review Loop tab — converged after 2 clean rounds:

Review loop completed

Settings — workspace defaults + per-project overrides:

Review loop settings

Architecture

Concern Where
Orchestrator (phases, stop conditions, cancel, sticky PR comment) src/main/services/review-loop.service.ts
IPC handlers + persistence (electron-store) src/main/ipc/review-loop.ipc.ts
Preload bridge + state-update event subscription src/preload/index.ts
Renderer state (settings, per-session loop state, effective-config helper) src/renderer/stores/reviewLoopStore.ts
Tab UI (status, rounds, triage decisions, log) src/renderer/components/review-loop/ReviewLoopPanel.tsx
Settings section src/renderer/components/settings/ReviewLoopSettings.tsx
Built-in toolbar button (seeded once) + per-project filter src/renderer/stores/buttonStore.ts

Each round writes its findings to <worktree>/.crucible/review-loop/round-N-{issues,triage}.json so phase agents can communicate via files. Cost is read from the Claude CLI's --output-format json envelope (total_cost_usd).

Caveats worth flagging on review

  • The spawned claude runs with --dangerously-skip-permissions — necessary for the unattended loop, but a real footgun. We may want a first-time confirmation per project.
  • Couldn't run end-to-end (no claude binary in the build sandbox), so the prompts and JSON parsing are unverified. First real run on macOS may need prompt tweaks if Claude doesn't reliably emit the expected JSON.
  • The sticky comment requires prNumber. If the loop runs on a branch without an open PR, skipped items aren't surfaced anywhere outside the tab.

Test plan

  • Toolbar button appears by default; hides when per-project toggle is flipped off
  • Starting the loop switches to the Review Loop tab and shows the running status pill
  • Cancel button stops the in-flight claude subprocess and finalises the state as cancelled
  • Each phase writes the expected JSON file; subsequent phases read it back correctly
  • Cost cap stops the loop once cumulativeCostUsd ≥ costCapUsd
  • Iteration cap stops the loop after maxIterations rounds
  • Convergence stops the loop after consecutiveCleanRounds consecutive zero-fix rounds
  • Sticky PR comment is posted on first stop and updated (not duplicated) on subsequent runs
  • Workspace-default and per-project settings round-trip through electron-store and survive restart
  • Build passes (npm run build) and Storybook stories render the running/completed/settings panels correctly

https://claude.ai/code/session_013V6ENheXA8v4fVfF6MYwjC


Generated by Claude Code

claude and others added 4 commits May 2, 2026 16:56
A new orchestrator drives a Claude review (diff vs base) → triage (one
sub-agent per finding) → fix (apply, commit, push) cycle until one of:
N consecutive clean rounds, iteration cap, cost cap, or manual cancel.
Skipped/deferred items get summarised in a sticky PR comment so
reviewers can see what was knowingly left undone.

Surfaced as a default-visible "Review Loop" toolbar button (toggleable
per project) and a Review Loop tab in the session workspace that shows
phase, per-round triage decisions, cumulative cost, and live log.
Workspace defaults and per-project overrides for caps and the toggle
live in Settings.
Adds three new screenshots (running loop, completed loop, settings) and
a Review Loop section in the README. Other screenshots were re-captured
from the existing stories — now showing the new toolbar button and tab
that the review-loop feature added — so they no longer drift from the
current UI.

Mock infrastructure: mockReviewLoopRunning/Completed seeds plus a
window override hook so stories can pin which state mockApi.getState
returns (the panel calls refreshState on mount, which would otherwise
clobber the seeded value).
- Cap claude subprocess stdout/stderr buffers at 5MB to prevent OOM
  on long sessions (review-loop.service.ts).
- Remove unreachable empty-skipped branch in renderStickyComment;
  writeStickyPRComment already early-returns when skippedIssues is
  empty (review-loop.service.ts).
- Correct mockReviewLoopCompleted.cumulativeCostUsd from 2.07 to 1.64
  to match the sum of round costs (mockData.ts).
- Guard loadButtons with an in-flight promise so concurrent calls
  share a single seed-and-save pass, avoiding the read→save race in
  seedBuiltInButtons (buttonStore.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- cancel: spawn claude detached and signal whole process group so
  sub-agents and gh subprocesses don't get orphaned after cancel/timeout
- finalize: drop completed loops from activeLoops to stop the in-memory
  state (rounds, transcripts, raw issues) leaking for the app's lifetime
- review phase: unlink the issues file before invoking claude and assert
  it was rewritten, so a model that silently skips Write can't be misread
  as a clean round and trigger a false 'converged' stop
- runLoop: re-check the cost cap between phases, not only between rounds,
  so a $5 cap actually bounds spend instead of letting one extra
  review→triage→fix overshoot
- runClaude: add a 30-minute per-phase wall-clock timeout that kills the
  subtree and reports a phase error so the loop self-heals on a hung CLI
- App.tsx: include all five loaders in the bootstrap effect's deps array
- fix prompt: enumerate the exact files flagged for fixing this round
  and forbid edits outside that allowlist, so a permissions-skipping run
  can't ship collateral edits upstream
- sticky comment dedup: rank 'defer' over 'skip' when collapsing repeats
  so a later skip can't silently overwrite a prior defer
- ReviewLoopSettings: type the project-override delta as
  Partial<ReviewLoopProjectOverride> instead of casting to any so key
  typos and value-type mismatches are caught at compile time

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mattmoran56
Copy link
Copy Markdown
Owner Author

Review Loop — issues left open

Generated by the Crucible Code review loop on branch claude/add-colorful-theme-IW1IX (base: main).

The loop ran for 2 rounds and chose not to fix the following 7 items:

↷ Skipped: Cancel mid-phase leaves loop stuck in 'running' state — src/main/services/review-loop.service.ts:205

Reason: Not a real bug in current code: runLoop already checks loop.cancelled after each phase and calls finalize(loop,'cancelled') before the early return, so the cancel path does finalize the state. The described stuck-running scenario does not occur.

Original finding

When the user cancels while a phase is in flight, cancelReviewLoop sets loop.cancelled=true and SIGTERMs the child. runClaude resolves with {ok:false,error:'cancelled'}. runReviewPhase / runTriagePhase / runFixPhase then check if (loop.cancelled) return false BEFORE the error branch, returning false without calling finalize(). Back in runLoop, if (!reviewOk) return // finalize already called by phase exits with no finalize call — yet finalize was never invoked on the cancel path. The loop coroutine returns and state.status remains 'running' forever; the UI Cancel button keeps showing, future state updates never arrive, and isReviewLoopActive(sessionId) keeps returning true so the user can't restart. Either call finalize(loop,'cancelled') in each phase's cancellation branch or check loop.cancelled in runLoop before the (!reviewOk) early return.

↷ Skipped: runClaude doc comment falsely claims it streams text to the renderer — src/main/services/review-loop.service.ts:371

Reason: False positive: the implementation now uses --output-format stream-json with incremental NDJSON parsing, pushTranscript calls, and throttled emitState updates, so the JSDoc accurately describes the live progress streaming. The 'only buffers final JSON' claim no longer matches the code.

Original finding

The JSDoc on runClaude says it 'Streams text to the renderer as log lines so users can watch progress live.' The implementation only buffers stdout in stdoutBuf, parses one final JSON object on exit, and never sends incremental progress to the renderer (no emitState in the stdout/stderr handlers). Misleading documentation that will confuse maintainers — either implement streaming or drop the claim.

⏭ Deferred: No timeout / watchdog on claude subprocess — src/main/services/review-loop.service.ts:374

Reason: Real hardening concern but out of scope for this PR. The feature is functional with manual cancel and per-round caps; a per-phase wall-clock watchdog that SIGTERMs and finalizes with 'error' is a follow-up improvement once the core review-loop cycle is merged.

Original finding

The spawned claude subprocess has no timeout. The cost cap and iteration cap are only checked between rounds, and the cancel path is the only way to abort an in-flight phase. A hung claude (network stall, infinite tool loop, etc.) keeps the loop pinned indefinitely, the cost cap never fires, and the user must click Cancel manually — which itself has the bug above. A reasonable per-phase wall-clock timeout that SIGTERMs the child and finalizes with 'error' would bound worst-case cost and cleanly recover.

⏭ Deferred: mainWindow reference is set once and never refreshed — src/main/services/review-loop.service.ts:33

Reason: Window recreation does not occur in practice today — the app quits on window-all-closed, so the stale-reference path is unreachable. Pattern matches other services in the codebase. Real concern only if a future refactor introduces window recreation; defer until that lifecycle is on the roadmap.

Original finding

setReviewLoopWindow is called from registerReviewLoopHandlers exactly once at startup. If the BrowserWindow is ever destroyed and recreated (e.g. on macOS reopen-from-dock or any future window-recreate flow), the module-level mainWindow still points at the destroyed window and emitState short-circuits via isDestroyed(). All in-flight review-loop progress events are then silently dropped — the renderer's UI freezes on the last state it saw and only updates on manual refreshState calls. Either re-bind on window creation or use BrowserWindow.getAllWindows() to broadcast.

⏭ Deferred: Review and triage phases inherit --dangerously-skip-permissions despite being read-only — src/main/services/review-loop.service.ts:364

Reason: Real defense-in-depth concern, but attack surface is low: the loop runs against the user's own checked-out branch where they presumably trust the diff. Constraining review/triage via --allowedTools is a worthwhile follow-up but not blocking for the initial feature ship.

Original finding

runClaude is shared by all three phases and always passes --dangerously-skip-permissions. Only the fix phase actually needs write/exec permissions; review and triage just need to read the diff and write one structured JSON file each. Granting full skip-permissions to those phases broadens the blast radius for any prompt-injection content that lands in the diff (e.g. a malicious test fixture telling claude to run a command). A least-privilege design would constrain the read phases via --allowedTools (Bash for git diff, Read, Glob, Grep, Write to a single path) instead of skip-permissions.

↷ Skipped: Branch and base-branch names are interpolated into prompts without escaping — src/main/services/review-loop.service.ts:224

Reason: Branch names come from local git refs, not user input fields, and git refname rules already forbid most problematic characters in normal flows. Branch names are only spliced into prompt text and markdown — not into shell commands — so this is at most a minor prompt-confusion risk on a deliberately crafted ref. Not worth introducing sanitization in this PR.

Original finding

buildReviewPrompt, buildTriagePrompt, and buildFixPrompt all use template-literal interpolation on o.branch and o.baseBranch. Git refnames don't allow most shell-special chars but they DO allow backticks, dollar signs, and slashes; a branch like foobar or foo${x} produces a malformed prompt to claude (and in the case of the fix prompt, would interleave into the literal git diff instructions). The orchestrator should at minimum reject (or sanitize) ref names containing backticks, ${, and quotes before splicing them into prompts.

⏭ Deferred: Built-in review-loop button uses a localStorage 'seeded' flag that diverges from on-disk state — src/renderer/stores/buttonStore.ts:79

Reason: Buttons live in electron-store (main process) while the seeded flag lives in renderer localStorage, creating a divergence risk. Low probability today (no settings-sync feature exists), but the flag should move into the on-disk button store before this becomes harder to migrate. Defer to a follow-up.

Original finding

seedBuiltInButtons writes 'codecrucible.builtin-buttons.seeded'='1' to localStorage as a one-shot 'we already added it' flag, then never re-seeds. But the button itself is persisted in the main-process button store, not in localStorage. If a user clears Application Data or uses Crucible on a different machine with synced settings, localStorage is empty and the button gets re-added even if the user had deliberately removed it. Conversely, if the user clears the on-disk button store but keeps localStorage, the built-in button is gone forever. The 'seeded' bit should live alongside the buttons (in the same persisted store), not in a separate browser-storage location.

mattmoran56 and others added 5 commits May 3, 2026 12:37
…rate trade-offs

Previously "skip" meant either a false positive OR an accepted trade-off, and
both ended up in the sticky PR comment. Triage was also told to mark unconfirmed
issues as "skip", which polluted the comment with non-issues. Tighten the prompt:
- skip   = real issue / trade-off we are deliberately not fixing (PR-mentioned)
- defer  = real issue, out of scope for this branch (PR-mentioned)
- noop   = false positive / unconfirmed / pre-existing (NOT PR-mentioned)

Conservative-fallback rule flipped from "skip" to "noop" so unverified findings
are excluded from reviewer-facing notes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- buttonStore.test.ts: loadButtons seeds the built-in Review Loop button on
  first run, which made the "stores api result" test see 2 buttons instead of
  the 1 returned by the API mock. Pre-seed the localStorage sentinel so the
  seeding pass is a no-op for this test, isolating the assertion.

- app.spec.ts ("Match System toggle" e2e): the bare role-radio "On" locator
  matched 5 elements once Review Loop settings (workspace + per-project)
  rendered their On/Off toggles. Scope the click to the Match System
  container so it stays unambiguous as more toggles get added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
My previous .filter({ has: getByText('Match System') }).first() matched the
outermost ancestor div, which still contained all five On/Off toggles on the
settings page. Walk up two parents from the unique "Match System" label to
land on the row that owns the adjacent ToggleGroup, so the strict-mode click
sees exactly one On radio.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…heme-IW1IX

# Conflicts:
#	README.md
#	mock/mockApi.ts
#	src/main/ipc/register.ts
#	src/preload/index.ts
#	src/renderer/stories/FullApp.stories.tsx
#	src/renderer/stories/helpers/storeSetup.ts
#	src/shared/constants.ts
@mattmoran56 mattmoran56 merged commit 6cf34f0 into main May 3, 2026
3 checks passed
@mattmoran56 mattmoran56 deleted the claude/add-colorful-theme-IW1IX branch May 3, 2026 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants