A production-tested system of specialized agents, workflow skills, and enforcement hooks for Claude Code. Battle-tested on a real B2B SaaS product with 18 agents and 16 workflows.
Based on Anthropic's agent-building guidance:
- Start simple — Use /quick-fix for most things. Only escalate to agent pipelines when complexity demands it.
- Specialized agents > one generalist — Each agent optimizes for its domain with fresh context.
- Context documents > context agents — Rich markdown files are faster and cheaper than read-only agents.
- Two-tier tracking — High-level stories for stakeholders, code-level tasks for Claude sessions.
- Enforce with hooks — Workflows without enforcement are just suggestions.
.claude/
agents/ # 10 specialized agents (architect, backend, frontend, etc.)
skills/ # 16 workflow skills (see Workflow Ladder below)
contexts/ # Context document pattern (README + template)
CLAUDE.md # Template project instructions
quick-reference.md # Cheat sheet for workflow selection
- Copy the .claude/ directory into your project root
- Copy CLAUDE.md to your project root and customize it
- Adapt agent definitions to your stack (the defaults assume Python + TypeScript)
- Create context documents for your bounded contexts / modules
- Set up MCP servers for the tools you want to use (see MCP Servers below)
From lightest to heaviest:
| Workflow | Scope | Direct Edits | Agents | When to Use |
|---|---|---|---|---|
| /quick-fix | 1-2 files | Yes | Optional | "Fix this bug", small refactors |
| /docs | .md only | .md only | No | Documentation changes |
| /explore | Read-only | No | No | Research, investigation |
| /preflight-check | Read-only | No | No | Check if feature exists before building |
| /code-review | Read-only | No | Optional | Quick code review without tracking |
| /panel-review | Read-only | No | Optional | Code review with Panel Todo ticket creation |
| /panel-fix | Multi-file | Yes | Optional | Fix Panel Todo tickets from reviews |
| /feature-design | Read-only | No | Required | Feature vetting + architecture design |
| /code-sweep | Any | Structural only | Required | Review + fix quality in one pass |
| /refactor | Any | No | Required | Code cleanup, tech debt |
| /fix | Any | No | Required | Complex bugs needing investigation |
| /dev | Any | No | Required | Features, stories |
| /batch-dev | Any | No | Required | Parallel multi-session development |
| /sprint-planning | Read-only | No | Design only | Sprint prep (stories + execution tasks) |
| /architect-review | Read-only | No | Analysis only | Architecture analysis via Axon |
| /auto-dev | Any | Yes | No | Autonomous cron loop — picks tasks, implements, commits |
| /pre-push | Any | Yes | No | Pre-push validation |
The key insight: Most work is /quick-fix. Reserve agent pipelines for genuinely complex work. Use /auto-dev to grind through a backlog of well-specified tasks overnight.
This is the pattern that changed how we work with Claude Code.
Ticket trackers (Jira, Linear, GitHub Issues) are designed for humans and stakeholders. They track what the project needs: "Build tour sharing feature", "Fix checkout bug". But Claude sessions need something different — they need to know what the code needs: "Add sharing_token field", "Create POST /share endpoint", "Write ShareButton component".
Mixing these creates noise in both directions. Stakeholders don't want to see 15 implementation subtasks. Claude sessions don't want to parse acceptance criteria meant for humans.
TIER 1: Your Ticket Tracker (Jira, Linear, GitHub Issues)
═══════════════════════════════════════════════════════════
Stakeholder-visible
Epics, stories, acceptance criteria
"Build tour sharing feature" (PROJ-100)
↓ /sprint-planning bridges the gap ↓
TIER 2: Panel Todo (MCP server for Claude Code)
═══════════════════════════════════════════════════════════
Developer/Claude-visible
Code-level tasks with dependencies
"Add sharing_token field" (PT-1)
"Create POST /share endpoint" (PT-2, blocked_by: PT-1)
"Write ShareButton component" (PT-3, blocked_by: PT-2)
- /sprint-planning — Create stories in your ticket tracker, then bridge to Panel Todo:
  - One story → 2-5 code-level tasks
  - Include ticket key in task description for traceability
  - Set blocked_by for dependency ordering
- /batch-dev — Multiple Claude sessions work the Panel Todo sprint:
  - Each session queries for unassigned, unblocked tasks
  - Claims a task (sets assignee)
  - Invokes the appropriate agent
  - Completes the task → unblocks downstream tasks
- /panel-review → /panel-fix — Code review loop:
  - Review creates Panel Todo sprint with fix tickets
  - Each /panel-fix loads a ticket, implements the fix, marks done
| Field | Purpose |
|---|---|
| title | Code-level task description |
| description | Scope, affected files, ticket tracker reference |
| priority | critical / high / medium / low |
| tags | Track grouping (track-backend, track-frontend) |
| blocked_by | Dependency on other tasks |
| assignee | Which Claude session claimed this |
| isBlocked | Computed: true if any blocked_by is incomplete |
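To make the computed isBlocked field and the /batch-dev query concrete, here is a minimal sketch in Python. It is not the Panel Todo API: the `Task` class, the `done` flag, and the `is_blocked` and `claimable` helpers are hypothetical names used only for illustration, with task data taken from the PT-1 to PT-3 example above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """Hypothetical in-memory mirror of a Panel Todo task (not the real API)."""
    key: str
    title: str
    done: bool = False
    assignee: Optional[str] = None
    blocked_by: list[str] = field(default_factory=list)


def is_blocked(task: Task, tasks: dict[str, Task]) -> bool:
    # Mirrors the computed isBlocked field: true if any dependency is incomplete.
    return any(not tasks[dep].done for dep in task.blocked_by)


def claimable(tasks: dict[str, Task]) -> list[Task]:
    # What a /batch-dev session would look for: open, unassigned, unblocked.
    return [
        t for t in tasks.values()
        if not t.done and t.assignee is None and not is_blocked(t, tasks)
    ]


tasks = {
    "PT-1": Task("PT-1", "Add sharing_token field"),
    "PT-2": Task("PT-2", "Create POST /share endpoint", blocked_by=["PT-1"]),
    "PT-3": Task("PT-3", "Write ShareButton component", blocked_by=["PT-2"]),
}

first = [t.key for t in claimable(tasks)]   # only PT-1 is workable at the start
tasks["PT-1"].assignee = "session-a"        # a session claims it...
tasks["PT-1"].done = True                   # ...and completes it
second = [t.key for t in claimable(tasks)]  # PT-2 is now unblocked
```

Completing PT-1 is what releases PT-2, while PT-3 stays blocked until PT-2 is done, which is the dependency ordering that blocked_by is meant to enforce.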
Panel Todo is a local MCP server — github.com/ingimareyfjord/panel-todo. Works with any Claude Code project.
For projects with complex module boundaries, the /architect-review workflow uses the Axon codebase knowledge graph to answer questions that grep can't:
- Blast radius: "If I change this model, what breaks?"
- Cross-module dependencies: "Which modules import from each other?"
- Dead code detection: "What's unused after this refactor?"
- Fan-in hotspots: "What are the most depended-on symbols?"
| Mode | When | What It Does |
|---|---|---|
| Design | Before dev | Blast radius, dependency mapping, story ordering, risk assessment |
| Review | After dev | Side effect detection, dead code delta, boundary violation check |
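The blast radius and fan-in questions are graph traversals at heart. The sketch below is not Axon's implementation or API; it is a small Python illustration, assuming a hand-written dependency map (`DEPENDS_ON`, with a hypothetical `test_tour.py` and `Invoice`), of how such answers can be derived once symbol relationships are known.

```python
from collections import deque

# Hypothetical edges meaning "key depends on each item in the list".
DEPENDS_ON = {
    "TourSerializer": ["Tour"],
    "tour_list.tsx": ["TourSerializer"],
    "test_tour.py": ["Tour"],
    "Invoice": [],
    "Tour": [],
}


def reverse_graph(depends_on):
    # Invert the edges so we can walk from a symbol to everything that uses it.
    dependents = {name: [] for name in depends_on}
    for user, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(user)
    return dependents


def blast_radius(symbol, depends_on):
    # Breadth-first walk over dependents: direct and transitive users of `symbol`.
    dependents = reverse_graph(depends_on)
    seen, queue = set(), deque([symbol])
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, []):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


def fan_in(depends_on):
    # Number of direct dependents per symbol; high counts are hotspots.
    return {name: len(users) for name, users in reverse_graph(depends_on).items()}


affected = blast_radius("Tour", DEPENDS_ON)
hotspots = fan_in(DEPENDS_ON)
```

Here `affected` contains the serializer, the frontend file, and the test file, while `Invoice` is untouched. A real knowledge graph would extract these edges from the code rather than rely on a hand-written map.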
/architect-review design <- "If we add sharing to tours, what's affected?"
| Axon finds: Tour, TourSerializer, tour_list.tsx, 3 test files
| Risk: Medium (2 cross-module imports)
| Recommended order: model -> API -> frontend -> tests
|
/sprint-planning <- organize findings into sprint
|
/dev (implement stories)
|
/architect-review review <- "Did we break anything? New dead code?"
| Verdict: PASS WITH NOTES (1 unused import)
|
/pre-push <- build, lint, test, push
/feature-design "tour sharing" <- should we build? how?
|
/sprint-planning <- stories + Panel Todo tasks
|
/batch-dev <- parallel sessions implement
|
/code-sweep "sharing module" <- fix structural quality
|
/pre-push <- validate and push
/architect-review design <- blast radius + story ordering
|
/dev or /batch-dev <- implement
|
/architect-review review <- verify no side effects
|
/pre-push
/panel-review "payment module" <- creates fix tickets
|
/panel-fix PT-1 <- fix each ticket
/panel-fix PT-2
/panel-fix PT-3
|
/pre-push
/dev -> /pre-push -> push
/quick-fix (clear, 1-2 files) -> /pre-push -> push
/fix (complex, needs investigation) -> /pre-push -> push
/loop 20m /auto-dev <- start cron loop
|
|-- Every 20 minutes:
| 1. Query ticket tracker for tasks labeled "auto-ok"
| 2. Claim first available (atomic transition to "In Progress")
| 3. Safety check (scope, sensitivity, clear criteria)
| 4. Implement directly (no plan mode, no agents)
| 5. Run tests
| 6. Commit (never push)
| 7. Update ticket with results
| 8. STOP — wait for next cycle
|
|-- On failure:
Comment on ticket, remove label, move on
The most opinionated workflow in the system. Claude picks up tasks from your ticket tracker and implements them without human approval gates.
You label tickets with auto-ok when the spec is clear enough that a developer wouldn't need to ask questions. Claude runs on a timer, picks the next one, implements it, runs tests, commits (never pushes), and comments on the ticket with results. You review the commits when you're ready.
┌─────────────────────────────────────────────────────┐
│ /auto-dev (one cycle) │
│ │
│ 1. PICK & CLAIM │
│ Query: status="To Do" AND labels="auto-ok" │
│ Fetch 5 candidates (batch for contention) │
│ Claim via status transition (atomic lock) │
│ If claimed by another session → try next │
│ │
│ 2. SAFETY CHECK │
│ ✓ Clear acceptance criteria? │
│ ✓ Not billing/finance/GDPR/auth? │
│ ✓ No data-destructive migrations? │
│ ✓ No breaking API changes? │
│ ✗ Any fail → comment, remove label, STOP │
│ │
│ 3. PREFLIGHT │
│ Read existing code, check for conflicts │
│ Verify clean working tree │
│ │
│ 4. IMPLEMENT │
│ Direct edits (no plan mode, no agents) │
│ Follow existing patterns │
│ │
│ 5. VERIFY │
│ Run tests (max 2 retries on failure) │
│ │
│ 6. COMMIT (never push) │
│ Conventional commit message + ticket reference │
│ │
│ 7. COMPLETE │
│ Comment on ticket: files, tests, commit hash │
│ Transition to Done │
│ STOP — wait for next cron fire │
└─────────────────────────────────────────────────────┘
The loop is designed to fail safely. When anything unexpected happens, it stops and comments:
| Situation | Action |
|---|---|
| No tasks available | Stop, do nothing |
| Task is bigger than expected | Stop, comment, remove auto-ok label |
| Needs design decision | Stop, comment, add needs-input label |
| Tests fail after 2 retries | Commit partial, comment, leave In Progress |
| Touches sensitive code | Stop, comment, remove auto-ok label |
| Build/test infra down | Stop, retry next cycle |
| All tasks claimed by other sessions | Stop (normal in multi-session) |
Multiple /auto-dev loops can run concurrently. The ticket tracker status transition acts as an atomic claim — only one session can move a ticket from "To Do" to "In Progress". If another session claimed it between your query and your transition attempt, you skip to the next candidate.
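The claim step can be sketched as a compare-and-set over ticket status. The Python below is an assumption-laden illustration, not a real tracker client: `InMemoryTracker`, `transition`, and `claim_from` are hypothetical names, and a real tracker would enforce the transition on the server rather than with a local lock.

```python
import threading


class InMemoryTracker:
    """Stand-in for a ticket tracker; a real one enforces transitions server-side."""

    def __init__(self, tickets):
        self._status = dict(tickets)
        self._lock = threading.Lock()

    def candidates(self, limit=5):
        # Batch-fetch a few "To Do" tickets so there are fallbacks under contention.
        with self._lock:
            todo = [key for key, status in self._status.items() if status == "To Do"]
        return todo[:limit]

    def transition(self, key, expected, new):
        # Compare-and-set: succeeds only if the ticket is still in `expected`.
        with self._lock:
            if self._status.get(key) != expected:
                return False
            self._status[key] = new
            return True


def claim_from(tracker, candidates):
    # Try each candidate in order; the first successful transition is our claim.
    for key in candidates:
        if tracker.transition(key, "To Do", "In Progress"):
            return key
    return None  # everything was claimed by other sessions: stop this cycle


tracker = InMemoryTracker({"PROJ-101": "To Do", "PROJ-102": "To Do"})
stale = tracker.candidates()                             # our query result
tracker.transition("PROJ-101", "To Do", "In Progress")   # another session wins the race
claimed = claim_from(tracker, stale)                     # we fall through to PROJ-102
nothing_left = claim_from(tracker, stale)                # both are taken now
```

The important property is that the status check and the status change happen as one step, so two sessions can never both believe they own the same ticket.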
A task gets auto-ok when it meets ALL of these:
- Clear acceptance criteria (testable)
- The "what" is fully specified (not just the "why")
- No design decisions needed
- Not in a sensitive domain (billing, auth, GDPR)
- No data-destructive migrations
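The checklist above can also be expressed as a small predicate, which is handy if you want a script to sanity-check labels. This is a sketch only: `TicketSpec` and its fields are hypothetical, and in practice these judgments come from a human (or from Claude's safety check) reading the ticket.

```python
from dataclasses import dataclass

# Sensitive domains named in the criteria above.
SENSITIVE_DOMAINS = {"billing", "auth", "gdpr"}


@dataclass
class TicketSpec:
    """Hypothetical summary of a ticket, filled in by whoever triages it."""
    has_testable_criteria: bool
    what_fully_specified: bool
    needs_design_decision: bool
    domains: frozenset
    destructive_migration: bool


def auto_ok_blockers(spec: TicketSpec) -> list[str]:
    # Collect every reason the ticket should NOT get the auto-ok label.
    blockers = []
    if not spec.has_testable_criteria:
        blockers.append("acceptance criteria are not testable")
    if not spec.what_fully_specified:
        blockers.append("the 'what' is not fully specified")
    if spec.needs_design_decision:
        blockers.append("a design decision is still open")
    if spec.domains & SENSITIVE_DOMAINS:
        blockers.append("touches a sensitive domain")
    if spec.destructive_migration:
        blockers.append("includes a data-destructive migration")
    return blockers


good = TicketSpec(True, True, False, frozenset({"tours"}), False)
risky = TicketSpec(True, True, False, frozenset({"billing"}), True)
good_blockers = auto_ok_blockers(good)    # empty list: eligible for auto-ok
risky_blockers = auto_ok_blockers(risky)  # two reasons: not eligible
```

Returning the list of reasons rather than a bare boolean gives you ready-made text for the ticket comment when a task is rejected.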
Use Claude Code's /loop command to schedule:
/loop 20m /auto-dev # Every 20 minutes
/loop 30m /auto-dev # Every 30 minutes
/loop 1h /auto-dev # Every hour
The cron job only fires while the session is idle. If you're talking to Claude, the cycle waits.
Why no plan mode? Tasks are pre-planned by their ticket description. The auto-ok label IS the plan approval.
Why no agents? Speed and simplicity. Agent chains add overhead that's unnecessary for well-specified tasks.
Why never push? Human always reviews before code reaches the remote. This is the safety net — Claude can implement all day, but nothing ships without your approval.
Why remove the label on escalation? Prevents the loop from retrying the same stuck task every cycle.
Why batch-fetch 5 candidates? In multi-session setups, the first candidate might be claimed by another session between your query and your claim attempt. Fetching 5 gives you fallbacks without additional API calls.
| Agent | Model | Role |
|---|---|---|
| arch-planner | opus | Architecture design, ADRs, API contracts |
| backend-engineer | opus | Backend implementation (models, services, APIs) |
| frontend-engineer | opus | Frontend implementation (components, pages, state) |
| test-engineer | opus | Test automation (unit, integration, E2E) |
| code-reviewer | sonnet | Code review with severity classification |
| code-refactorer | sonnet | Code cleanup, complexity reduction |
| api-contract-guardian | haiku | API contract verification (backend <-> frontend types) |
| security-reviewer | opus | Security audit, OWASP, data protection |
| infra-ops | opus | Docker, CI/CD, deployment |
| design-reviewer | sonnet | UI/UX review, accessibility, responsiveness |
- opus — Complex decision-making: architecture, implementation, testing
- sonnet — Balanced quality/speed: reviews, refactoring
- haiku — Fast/cheap: contract checks, lightweight validation
| Work Type | Chain |
|---|---|
| Backend feature | arch-planner -> backend-engineer -> test-engineer -> api-contract-guardian |
| Frontend feature | frontend-engineer -> test-engineer -> api-contract-guardian -> design-reviewer |
| Full-stack | arch-planner -> backend-engineer -> frontend-engineer -> test-engineer |
| Bug fix | backend-engineer or frontend-engineer -> test-engineer |
| Feature Design | product-strategy-advisor -> arch-planner |
| Code Sweep | code-reviewer -> code-refactorer |
| Code cleanup | code-reviewer -> code-refactorer -> test-engineer |
| Architecture | arch-planner + architecture-tester + product-strategy-advisor |
| Infrastructure | infra-ops -> arch-planner |
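Read together, the agent table and the chain table amount to a lookup from work type to an ordered list of (agent, model) steps. The Python sketch below encodes a subset of both tables; `AGENT_MODELS`, `AGENT_CHAINS`, and `plan` are hypothetical names, and the skills in this repository express the same information in markdown rather than code.

```python
# Model assignments copied from the agent table above (subset).
AGENT_MODELS = {
    "arch-planner": "opus",
    "backend-engineer": "opus",
    "frontend-engineer": "opus",
    "test-engineer": "opus",
    "code-reviewer": "sonnet",
    "code-refactorer": "sonnet",
    "api-contract-guardian": "haiku",
    "design-reviewer": "sonnet",
}

# Chains copied from the work-type table above (subset).
AGENT_CHAINS = {
    "backend feature": ["arch-planner", "backend-engineer", "test-engineer", "api-contract-guardian"],
    "frontend feature": ["frontend-engineer", "test-engineer", "api-contract-guardian", "design-reviewer"],
    "full-stack": ["arch-planner", "backend-engineer", "frontend-engineer", "test-engineer"],
    "code cleanup": ["code-reviewer", "code-refactorer", "test-engineer"],
}


def plan(work_type: str) -> list[tuple[str, str]]:
    # Resolve a work type to ordered (agent, model) steps; unknown types fail loudly.
    chain = AGENT_CHAINS[work_type.lower()]
    return [(agent, AGENT_MODELS[agent]) for agent in chain]


backend_plan = plan("Backend feature")
```

For a backend feature this yields three opus steps followed by one haiku step, which shows where the cheaper model sits in the pipeline.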
The workflow system integrates with these MCP servers:
| Server | Purpose | Used By |
|---|---|---|
| Panel Todo | Code-level task tracking with sprints, dependencies, and multi-session coordination | /panel-review, /panel-fix, /batch-dev, /sprint-planning, /code-sweep |
| Axon | Codebase knowledge graph (KuzuDB) for blast radius, dead code, cross-module analysis | /architect-review |
| mcp-atlassian | Jira + Confluence integration for stories, sprints, documentation | /sprint-planning, /dev, /fix |
| shadcn | UI component discovery and installation | frontend-engineer agent |
| Playwright | Browser automation for visual design review | design-reviewer agent |
| Context7 | Library documentation lookup | /explore |
All are optional — the workflow system degrades gracefully. Without Panel Todo, use /code-review instead of /panel-review. Without Axon, skip /architect-review and use /feature-design for pre-dev analysis. Without Jira, use Panel Todo alone for all tracking.
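The fallback rules in the previous paragraph can be written down as a tiny resolver. This is only an illustration of the stated rules: `resolve_workflows` and its keys are hypothetical, and the tracking entry assumes Panel Todo is installed, since the text does not say what to do when both trackers are missing.

```python
def resolve_workflows(available: set[str]) -> dict[str, str]:
    """Pick workflows based on which optional MCP servers are configured."""
    return {
        # Without Panel Todo, fall back to the untracked review.
        "review": "/panel-review" if "Panel Todo" in available else "/code-review",
        # Without Axon, use /feature-design for pre-dev analysis.
        "pre_dev_analysis": "/architect-review" if "Axon" in available else "/feature-design",
        # Without Jira, Panel Todo carries all tracking (assumes Panel Todo exists).
        "tracking": (
            "Jira stories + Panel Todo tasks"
            if "mcp-atlassian" in available
            else "Panel Todo only"
        ),
    }


full = resolve_workflows({"Panel Todo", "Axon", "mcp-atlassian"})
minimal = resolve_workflows({"Panel Todo"})
```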
Instead of read-only "expert" agents, use rich markdown files:
.claude/contexts/
payments-context.md # Models, services, API endpoints, gotchas
auth-context.md # Auth flow, token lifecycle, providers
orders-context.md # Order models, state machine, webhooks
Each context document contains:
- Key models with fields and relationships
- Service/selector file locations
- API endpoints
- Cross-module dependencies
- Common gotchas and traps
Agents read these documents before implementing. This is faster and cheaper than invoking a separate agent for domain knowledge.
The default agents assume Python (Django/FastAPI) + TypeScript (React/Next.js). To adapt:
- backend-engineer.md — Change framework references (Django -> Rails, FastAPI, Express, etc.)
- frontend-engineer.md — Change framework references (Next.js -> Vue, Svelte, etc.)
- Skills — Update file patterns in enforcement rules (.py -> .rb, .go, etc.)
- Context documents — Create for your bounded contexts / modules
For complex domains (billing, compliance, ML), create specialized agents:
---
name: billing-expert
description: Billing domain specialist. Invoke for subscription, payment, or invoicing work.
model: sonnet
tools: Read, Write, Edit, Glob, Grep, Bash
---
You are the billing domain specialist...
The hooks/ directory contains a workflow enforcement script. Configure it in .claude/settings.json:
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "bash .claude/hooks/workflow-guard.sh"
          }
        ]
      }
    ]
  }
}
When Claude orchestrates agents, it writes the plan and the agents implement. If Claude also implemented, it would be both architect and builder, losing the benefit of fresh context and specialization.
For small, clear fixes, the overhead of spinning up an agent is worse than the benefit. A senior dev fixing a bug doesn't need a code review committee.
Jira is great for stakeholders but terrible for Claude session coordination. Panel Todo is great for code-level task claiming but has no stakeholder visibility. The bridge pattern (/sprint-planning) connects them: one Jira story becomes 2-5 Panel Todo tasks. Stakeholders see progress at the story level. Claude sessions see granular, dependency-ordered tasks at the code level.
Most teams jump straight from "feature idea" to "implement it". /feature-design adds a strategic gate: does this feature make sense? What's the architecture? Two agents (product-strategy-advisor for strategy, arch-planner for architecture) answer these questions before any code gets written. If the recommendation is KILL, you save days of wasted work.
The /code-review -> /quick-fix loop works but is tedious for quality passes: review finds 8 issues, you fix them one by one. /code-sweep collapses this by distinguishing structural issues (HOW code is organized — fixable by code-refactorer in bulk) from behavioral issues (WHAT code does — ticketed for careful individual fixes). One pass, two outcomes.
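The "one pass, two outcomes" idea is a partition of review findings. The Python sketch below assumes each finding has already been classified by the reviewer; `Finding`, `kind`, and `sweep` are hypothetical names rather than anything the skill defines.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical review finding; `kind` is assigned by the reviewer."""
    file: str
    summary: str
    kind: str  # "structural" (how code is organized) or "behavioral" (what it does)


def sweep(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    # Structural findings go to a bulk refactor; behavioral ones become tickets.
    to_refactor = [f for f in findings if f.kind == "structural"]
    to_ticket = [f for f in findings if f.kind == "behavioral"]
    return to_refactor, to_ticket


findings = [
    Finding("sharing/services.py", "function too long", "structural"),
    Finding("sharing/views.py", "duplicated validation helper", "structural"),
    Finding("sharing/services.py", "token never expires", "behavioral"),
]
to_refactor, to_ticket = sweep(findings)
```

In this example two findings would be handed to code-refactorer in one batch, and the expiry bug would be ticketed for an individual fix.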
grep can find text. Axon can answer "if I change this model, what services, views, serializers, and tests need updating?" It builds a graph of your codebase's symbols and their relationships, enabling blast radius analysis that would take dozens of grep commands to approximate.
Read-only agents that can only answer questions are expensive documentation lookups. A markdown file:
- Loads instantly (no agent startup)
- Adds no agent-invocation token overhead
- Can be enriched with real dependency data (from tools like Axon)
- Is version-controlled and diffable
Not every agent needs opus. A contract check (comparing serializer fields to TypeScript types) is mechanical — haiku handles it perfectly at 1/10th the cost. Save opus for agents that need to make complex architectural decisions.
Developed by Ingimar Eyfjord while building Guide Connect, refined through months of real production use with Claude Code.
Architecture patterns aligned with Anthropic's Building Effective Agents guidance.