Fully automated TDD pipeline for Claude Code. Approve the plan once — come back to a ready-to-merge PR with green CI.
Three agents, seven reviewer phases, and file-system boundaries that make cheating much harder.
you
└─ reviewer (orchestrates everything)
   ├─ tester (writes tests BEFORE code exists)
   └─ coder (writes code to pass tests it can't modify)
Claude Code writes tests that validate its own bugs. It rewrites existing tests to match broken code. It hallucinates API responses. It skips its own review step when you let it orchestrate. And halfway through a cycle, it forgets the rules you gave it at the start.
All tests green. Code broken. Agent insisting it's your fault.
These roles make cheating structurally much harder through file-system boundaries, phase-loaded context, and forced evidence, rather than through instructions the model can ignore.
mkdir my-awesome-project
mkdir my-awesome-project/.claude/
cp -r roles/ my-awesome-project/.claude/roles/
cp -r scripts/ my-awesome-project/scripts/
cd my-awesome-project
# Launch the reviewer — it sets up the project and runs everything else
claude --dangerously-skip-permissions --system-prompt "$(cat .claude/roles/reviewer.md)"
# The reviewer will ask what kind of project you want (React, Next.js, Express, etc.)
# It scaffolds the project, installs deps, verifies lint/build/test pass.
# Then describe your feature. Approve the plan. Walk away.
# Come back to a ready-to-merge PR with green CI.
my-awesome-project/ is a ready-to-go React + TypeScript + Vitest project with the framework already wired up. Roles are in .claude/roles/, empty test/, prompts/, and scripts/ directories are in place, and plan.md has a simple first step.
cd my-awesome-project
Read claude.log to see the framework in action before setting it up in your own project.
1. You describe a feature or bug to the reviewer
2. Reviewer shows you the plan, waits for your OK (functional prompts only)
3. Reviewer reads `reviewer-prompts-tester.md`, writes tester prompt → launches tester
4. Reviewer reads `reviewer-review-tester.md`, reviews tester's commit (tests will fail: no code yet, that's the point)
5. Reviewer reads `reviewer-prompts-coder.md`, writes coder prompt → launches coder
6. Reviewer reads `reviewer-review-coder.md`, reviews coder's commit → runs lint, build, test
7. Tests red → prompt to coder. No exceptions
8. All green → reviewer reads `reviewer-acceptance.md`, runs final checklist
9. Problems at acceptance → repeat from step 3 or 5
10. You come back to a ready-to-merge PR
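The red/green gate that follows the coder's commit can be sketched as a small shell function. This is an illustration only, not the framework's actual script: the name `gate` and the npm script names in the usage line are assumptions, and treating a lint or build failure the same way as a red test is also an assumption.

```shell
# Hypothetical helper, not part of the framework: run each check in order
# and report where the cycle should go next.
# Usage (script names are assumptions):
#   gate "npm run lint" "npm run build" "npm test"
gate() {
  for cmd in "$@"; do
    # Any failing check counts as red and goes back to the coder.
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      echo "red: $cmd -> prompt to coder"
      return 1
    fi
  done
  echo "all green -> acceptance"
}
```

Because the function returns non-zero on the first failure, a caller cannot reach the acceptance step while any check is red.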
The reviewer splits work into functional prompts (func- prefix, which need your approval) and operational prompts (op- prefix, which it handles autonomously during a cycle). You approve once; the agent does the rest.
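A minimal sketch of the prefix rule, assuming prompts are stored as files whose names start with `func-` or `op-`. The function name and the example file names are hypothetical.

```shell
# Hypothetical helper: classify a prompt file by its name prefix.
classify_prompt() {
  case "$(basename "$1")" in
    func-*) echo "functional" ;;    # needs your approval
    op-*)   echo "operational" ;;   # handled autonomously during a cycle
    *)      echo "unclassified"; return 1 ;;
  esac
}
```

For example, `classify_prompt prompts/func-login.md` prints `functional`, while a name with neither prefix prints `unclassified` and returns a non-zero status.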
The reviewer role is split into seven files. Each one is loaded at exactly the moment it's needed — fresh context at the end of the window, not rules from 200 lines ago that the model has already forgotten.
| Cycle step | File loaded | Lines |
|---|---|---|
| Write tester prompt | `reviewer-prompts-tester.md` | ~140 |
| Review tester commit | `reviewer-review-tester.md` | ~55 |
| Write coder prompt | `reviewer-prompts-coder.md` | ~70 |
| Review coder commit | `reviewer-review-coder.md` | ~61 |
| Final acceptance | `reviewer-acceptance.md` | ~35 |
The core reviewer.md (~90 lines) is always present as the system prompt with short anchors for critical rules — so even if a phase file is skipped, the basics are visible.
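The step-to-file mapping can be pictured as a simple lookup. In this sketch the step keys are made up for illustration, and the `.claude/roles` path assumes the roles were copied there as in the setup commands. Only the file names come from the table.

```shell
# Hypothetical lookup: cycle step -> role file to read at that step.
# Step keys are invented for this sketch; file names come from the table.
phase_file() {
  roles_dir=".claude/roles"
  case "$1" in
    tester-prompt) echo "$roles_dir/reviewer-prompts-tester.md" ;;
    tester-review) echo "$roles_dir/reviewer-review-tester.md" ;;
    coder-prompt)  echo "$roles_dir/reviewer-prompts-coder.md" ;;
    coder-review)  echo "$roles_dir/reviewer-review-coder.md" ;;
    acceptance)    echo "$roles_dir/reviewer-acceptance.md" ;;
    *)             echo "unknown step: $1" >&2; return 1 ;;
  esac
}
```

Something like `cat "$(phase_file coder-review)"` would then print only the rules relevant to the current step.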
| Rule | Why |
|---|---|
| Tester writes only in `test/` | Can't adjust code to make bad tests pass |
| Coder writes everywhere except `test/` and roles | Can't "fix" tests instead of fixing code |
| Existing tests are read-only | Prevents silent regression rewrites |
| Mocks only at HTTP boundary | Internal logic runs for real in tests |
| API types pasted in prompt | "Verify" means nothing — "show proof" works |
| `git show --name-only` on every commit | Boundary violations = auto-reject |
| Tests red → prompt to coder, no exceptions | Coder can't reframe failure as "tests are wrong" |
| `func-`/`op-` prompt prefixes | Forces classification before writing, not after |
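The boundary rules lend themselves to a mechanical check on the file list of each commit. The sketch below is one possible way to do it, not the framework's own script. The function name is hypothetical, and it assumes the role files live under `.claude/roles/` or `roles/`.

```shell
# Hypothetical check: read changed paths on stdin, one per line, and
# report any path the given role is not allowed to touch.
# Usage: git show --name-only --pretty=format: HEAD | check_boundaries tester
check_boundaries() {
  role="$1"
  violations=0
  while IFS= read -r path || [ -n "$path" ]; do
    if [ -z "$path" ]; then
      continue  # skip blank lines in the git output
    fi
    case "$role:$path" in
      tester:test/*) ;;  # tester may write only under test/
      tester:*) echo "violation: $path"; violations=1 ;;
      coder:test/*|coder:.claude/roles/*|coder:roles/*)
        echo "violation: $path"; violations=1 ;;
      coder:*) ;;        # coder may write anywhere else
      *) echo "unknown role: $role" >&2; return 2 ;;
    esac
  done
  return "$violations"
}
```

A non-zero exit status can be treated as the auto-reject signal, so the decision does not depend on the model's own account of what it changed.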
roles/
  reviewer.md                  # core identity, cycle, boundaries (~90 lines)
  reviewer-init.md             # project initialization (first run, no package.json)
  reviewer-prompts-tester.md   # scenario matrix, test levels, mock rules
  reviewer-prompts-coder.md    # code prompt format, regression rules
  reviewer-review-tester.md    # how to review tester's commit
  reviewer-review-coder.md     # how to review coder's commit
  reviewer-acceptance.md       # final acceptance checklist
  tester.md                    # test agent role
  coder.md                     # code agent role
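Since each phase depends on its file being present, a quick preflight after copying the roles can catch a missing one. This is a sketch under assumptions: the function name is hypothetical, and the default directory assumes the roles were copied to `.claude/roles/`.

```shell
# Hypothetical preflight: print every expected role file that is missing
# from the given directory (defaults to .claude/roles).
missing_role_files() {
  dir="${1:-.claude/roles}"
  for f in reviewer.md reviewer-init.md \
           reviewer-prompts-tester.md reviewer-prompts-coder.md \
           reviewer-review-tester.md reviewer-review-coder.md \
           reviewer-acceptance.md tester.md coder.md; do
    if [ ! -f "$dir/$f" ]; then
      echo "$f"
    fi
  done
}
```

Empty output means all nine files listed above are in place.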
Claude Code as a full dev team: autonomous TDD cycle from feature request to merged PR
MIT