MaxsimCLI — Project Specification

This document is the single source of truth for what MaxsimCLI is, how it works, and what it should become. Every architectural decision, feature, and constraint is defined here.

1. Identity

Field	Value
Name	MaxsimCLI
Meaning	MAXimale SIMplicity
npm package	`maxsimcli`
Command prefix	`/maxsim:`
Repository	`https://github.com/maystudios/maxsimcli`
Website	`maxsimcli.dev` (Landing Page + Documentation)
License	MIT

2. What MaxsimCLI Is

MaxsimCLI is a meta-prompting and project orchestration system for Claude Code. It installs into any project via npx maxsimcli@latest and transforms Claude Code from an ad-hoc coding assistant into a structured, self-improving project management engine.

MaxsimCLI solves three problems simultaneously:

Context Loss — Without MaxsimCLI, Claude forgets project goals, decisions, and progress across sessions. MaxsimCLI persists everything on GitHub as the single source of truth.
Lack of Structure — Without MaxsimCLI, large projects devolve into unstructured, untracked work. MaxsimCLI enforces a Plan → Execute → Verify cycle with phases, milestones, and roadmaps.
Quality Control — Without MaxsimCLI, code is produced without systematic verification. MaxsimCLI enforces strict quality gates with automated testing, linting, spec compliance, and code review.

Genealogy

MaxsimCLI is an independent project inspired by two predecessors:

Get-Shit-Done (GSD) — Provided the project planning model (phases, milestones, roadmaps, verification).
Superpowers — Provided the feedback loop and self-improvement philosophy.

MaxsimCLI is not a fork of either. It combines the best of both, extends them with GitHub-native orchestration and massive parallelism, and follows Anthropic's own conventions exactly.

3. Target Audience

All Claude Code users — from beginners to power users. The system is simple to install (one command) and progressive in complexity: beginners use /maxsim:go and let the system handle everything; power users configure profiles, skills, and parallel execution strategies.

4. Core Principles

GitHub is the Single Source of Truth — All project state, plans, tasks, progress, decisions, and learnings live on GitHub (Issues, Projects, Milestones, Wiki, Discussions). Local files are only for MaxsimCLI's own installation (.claude/).
Maximum Parallelism — Two-tier hybrid: Subagents (Tier 1, default) for independent tasks, Agent Teams (Tier 2, opt-in) for workflows requiring inter-agent communication. Scaled by model profile (budget: 5–10, balanced: 10–20, quality: 20–40 agents; see §7.4) and project size. Competitive implementation available as an optional strategy. Graceful degradation to Tier 1 when Agent Teams are unavailable.
Full Automation — Commits, merges, pushes, branch management, verification, and error recovery happen automatically. The user is only involved at plan approval gates and when unrecoverable errors occur.
Self-Improvement — MaxsimCLI learns from every session. Skills, prompts, configurations, and workflows improve over time through a structured feedback loop.
Anthropic Conformity — Every skill, command, hook, and agent follows Anthropic's documented conventions exactly. Correct tool names (Agent, not Task), correct frontmatter format, correct skill structure.
Plan Before Execute — Every action that modifies code, GitHub state, or project configuration goes through Claude Code's Plan Mode first. The user always sees and approves what will happen before any code is written. Read-only commands (help, progress, security) are exempt.

5. Architecture

5.1 Runtime

Only Claude Code — No multi-runtime support. MaxsimCLI is 100% Claude Code focused.
Node.js >=22 — Required runtime for the CLI binary.
GitHub CLI (gh) — Required for all GitHub operations. If not authenticated, MaxsimCLI refuses to start.

5.2 Installation

npx maxsimcli@latest

One command. Installs project-locally into .claude/. No global installation.

What gets installed:

.claude/
├── settings.json          # Claude Code settings (hooks, permissions, env)
├── commands/maxsim/       # 14 slash commands (13 primary + 1 alias)
├── agents/                # 4 agent definitions + AGENTS.md registry
├── skills/                # 16 skill modules
├── rules/                 # Conventions + verification protocol
├── maxsim/
│   ├── bin/maxsim-tools.cjs  # Internal CLI helper
│   ├── hooks/             # Hook scripts (statusline, update-check, sounds)
│   ├── workflows/         # Workflow definitions
│   ├── references/        # Reference documents
│   └── templates/         # Output templates
└── agent-memory/          # Per-agent persistent memory (auto-created)

What does NOT exist:

No .planning/ directory — all planning lives on GitHub
No local STATE.md, ROADMAP.md, PLAN.md files — GitHub is the source of truth
No global ~/.claude/maxsim/ installation — everything is project-local

5.3 GitHub Integration (Mandatory)

GitHub is not optional. MaxsimCLI requires:

GitHub Feature	Purpose
Repository	Code storage. If none exists, MaxsimCLI offers to create a private repo.
GitHub Projects (v2)	Visual project board. Kanban: Backlog → To Do → In Progress → In Review → Done
GitHub Issues	Source of truth for phases, tasks, plans, and context
Sub-Issues	Tasks within a phase (sub-issues of the phase issue)
GitHub Milestones	Group phases into deliverable milestones
Labels	Categorize issues — 6 labels in 2 namespaces: `type:` (phase, task, bug, quick) and `maxsim:` (auto, user)
Issue Relations	Native GitHub "blocked by" / "blocking" for dependency tracking
Issue Comments	Store plans, research, context, summaries as structured comments
GitHub Wiki	Project specifications, requirements, architectural decisions, conventions — long-lived reference documents (vs. Issues for active tasks)
GitHub Discussions	Architecture decisions, design proposals

User-created Issues: Users can write GitHub Issues directly. MaxsimCLI recognizes them and integrates them into the planning/execution pipeline.

5.4 Local Files

MaxsimCLI's own files live in .claude/. Outside that directory, install touches exactly two files:

CLAUDE.md in project root — Auto-generated during install. Contains a full command reference table with Quick Start pointing to /maxsim:go. Claude Code reads this automatically at session start.
.gitignore in project root — Install appends two entries (.claude/agent-memory/ and autoresearch-results.tsv) to keep per-machine agent memory and metric data out of version control. If .gitignore does not exist, it is created.
No other MaxsimCLI files in the project root or anywhere outside .claude/.

5.5 State Tracking

The project state IS the GitHub Project Board:

Which column an issue is in = its status (Backlog → To Do → In Progress → In Review → Done)
Open/closed issues = progress
Milestone completion percentage = roadmap progress
Issue comments = plans, research, context, summaries
Issue labels = type categorization (type:phase, type:task, type:bug, type:quick) and origin (maxsim:auto, maxsim:user)
Issue relations = dependency tracking (native GitHub "blocked by" / "blocking")

No local state file. No sync mechanism needed. No project-state cache — GitHub is always authoritative. A lightweight update-check cache (os.tmpdir()/.maxsimcli-update-cache.json, 1-hour TTL) avoids redundant npm registry calls; this is ephemeral utility data in the OS temp directory, not project state.
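The freshness check behind the update-check cache can be sketched as below. This is a minimal illustration, not MaxsimCLI's actual code: the `UpdateCache` shape and the `isCacheFresh` helper are hypothetical names, and only the 1-hour TTL is taken from this spec.

```typescript
// Hypothetical shape of a cache entry (not taken from MaxsimCLI's source).
interface UpdateCache {
  checkedAt: number;     // epoch milliseconds of the last npm registry check
  latestVersion: string; // version the registry reported at that time
}

const UPDATE_CACHE_TTL_MS = 60 * 60 * 1000; // 1-hour TTL, as stated in the spec

// Returns true when the cached entry is still within the TTL and can be reused.
function isCacheFresh(cache: UpdateCache | null, now: number): boolean {
  if (cache === null) return false;
  const age = now - cache.checkedAt;
  // A negative age means the clock moved backwards; treat the entry as stale.
  return age >= 0 && age < UPDATE_CACHE_TTL_MS;
}
```

Because the entry holds no project state, treating any unreadable or expired cache as stale simply costs one extra registry call.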

5.6 Multi-Project Isolation

Each project is completely isolated:

Own .claude/ directory
Own GitHub Project Board
Own agent memory (.claude/agent-memory/)
No cross-project interference
No shared global state

6. Commands

MaxsimCLI provides 14 slash commands (13 primary + 1 alias). /maxsim:go is the primary interface.

6.1 Command List

Command	Purpose	Category
`/maxsim:go`	Auto-dispatch — Detects project state and does the right thing	Primary
`/maxsim:init`	Initialize MaxsimCLI in a project	Setup
`/maxsim:plan [N]`	Plan a specific phase	Phase
`/maxsim:execute [N]`	Execute a specific phase	Phase
`/maxsim:debug [desc]`	Debug a specific issue	Explicit
`/maxsim:quick [desc]`	Quick task (simplified flow)	Shortcut
`/maxsim:progress`	Show project status + recommendation	Info
`/maxsim:settings`	Configure MaxsimCLI	Config
`/maxsim:help`	Show available commands	Info
`/maxsim:improve`	Autonomous optimization loop — modify→verify→keep/discard cycle against any metric	Optimization
`/maxsim:fix-loop`	Autonomous error repair — Iteratively fix until zero errors remain	Optimization
`/maxsim:debug-loop`	Autonomous bug hunting — Scientific method with hypothesis testing	Optimization
`/maxsim:security`	Security audit — STRIDE + OWASP + red-team analysis (read-only)	Audit
`/maxsim:execute-phase [N]`	Alias for `/maxsim:execute`	Phase

6.2 `/maxsim:go` — The Main Command

Auto-dispatch is the primary way users interact with MaxsimCLI. It:

Reads the GitHub Project Board
Determines the current state (what's planned, what's in progress, what's blocked)
Proposes the next action
Enters Plan Mode for user approval
Executes the approved action
Reports results

6.3 `/maxsim:init` — Project Initialization

Interactive process:

Scan — Analyze existing repo (if any): README, package.json, tech stack, file structure. Use parallel Research agents (count scaled by model profile and project size — see §7.4).
Interview — Deep questioning: project name, description, goals, tech stack, conventions, testing strategy, deployment, acceptance criteria, no-gos, risks.
GitHub Setup — Create/configure: GitHub repo (if none, offer to create private), GitHub Project Board (Kanban), Labels, Milestones.
CLAUDE.md — Generate project-root CLAUDE.md with brief context.
Roadmap (optional) — Ask user if they want an initial roadmap created as GitHub Milestones + Phase Issues.

For brownfield projects (existing code): Use parallel agent scanning (count determined by model profile and codebase size — see §7.4) to map the codebase, identify goals/patterns, then confirm with user before creating the GitHub structure.

6.4 `/maxsim:plan [N]`

Plans a specific phase:

Enter Plan Mode
Read phase issue from GitHub
Discussion stage — gather context
Research stage — parallel research agents investigate
Planning stage — create task breakdown as sub-issues
User approves plan via ExitPlanMode

6.5 `/maxsim:execute [N]`

Executes a planned phase:

Enter Plan Mode — show all plans for review
User approves via ExitPlanMode
Spawn executor agents in adaptive waves
Each executor works in its own git worktree
Competitive implementation (optional): if enabled or user-approved, same task solved multiple ways, best selected
Automatic verification after each task
Max 3 retries on failure
Merge verified worktrees sequentially, auto-resolve conflicts, verify merged result
Push to remote
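One way to picture the "adaptive waves" step is as a layered ordering over the "blocked by" relations, capped by the profile's agent limit. The sketch below is an assumption about how waves could be derived, not MaxsimCLI's scheduler; `Task`, `planWaves`, and `maxAgents` are hypothetical names.

```typescript
// Hypothetical task shape: an id plus the ids of the tasks that block it.
interface Task {
  id: string;
  blockedBy: string[];
}

// Groups tasks into waves: every task in a wave has all of its blockers in
// earlier waves, and no wave contains more than maxAgents tasks.
function planWaves(tasks: Task[], maxAgents: number): string[][] {
  if (maxAgents < 1) throw new Error("maxAgents must be at least 1");
  const done = new Set<string>();
  let pending = [...tasks];
  const waves: string[][] = [];

  while (pending.length > 0) {
    // A task is ready once every blocker has been scheduled in an earlier wave.
    const ready = pending.filter((t) => t.blockedBy.every((b) => done.has(b)));
    if (ready.length === 0) {
      throw new Error("dependency cycle or unknown blocker among remaining tasks");
    }
    const wave = ready.slice(0, maxAgents).map((t) => t.id);
    waves.push(wave);
    for (const id of wave) done.add(id);
    pending = pending.filter((t) => !done.has(t.id));
  }
  return waves;
}
```

With four tasks where `c` is blocked by `a` and `d` is blocked by `b` and `c`, a limit of two agents yields three waves: `a` and `b` together, then `c`, then `d`.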

6.6 `/maxsim:debug [desc]`

Dedicated debugging:

Auto-detected by /maxsim:go when issues exist
Also callable directly
Uses the systematic-debugging skill (reproduce → hypothesize → isolate → verify → fix → confirm)

6.7 `/maxsim:quick [desc]`

Simplified flow for small tasks:

Creates a single GitHub Issue
Plans and executes in one flow
No multi-phase overhead

6.8 `/maxsim:progress`

Shows:

GitHub Project Board status table (phases, tasks, columns)
Gap detection (blocked, overdue, or missing tasks)
Next-action recommendation with the exact command to run

6.9 Behavior Without a Command

When a user opens Claude Code and describes a task without using /maxsim:, Claude sees the auto-generated CLAUDE.md which contains a full command reference table with a Quick Start note pointing to /maxsim:go. Claude works normally but is aware of all available MaxsimCLI commands.

7. Agent System

7.1 Agent Types

Agent	Role	Tools	Preloaded Skills	Available Skills
Executor	Implements code changes	Read, Write, Edit, Bash, Grep, Glob	handoff-contract, commit-conventions	github-operations (trigger: GitHub Issues), tdd (trigger: test-first)
Planner	Creates plans and task breakdowns	Read, Write, Bash, Grep, Glob (permissionMode: plan)	handoff-contract, roadmap-writing	github-operations (trigger: GitHub Issues), brainstorming (trigger: exploring approaches)
Researcher	Investigates codebase and external sources	Read, Bash, Grep, Glob, WebFetch, WebSearch	handoff-contract, research	github-operations (trigger: GitHub Issues)
Verifier	Reviews and verifies completed work	Read, Bash, Grep, Glob	handoff-contract, verification, code-review	systematic-debugging (trigger: test failures), github-operations (trigger: posting results)

Available Skills + Trigger Pattern: Each agent has a set of available_skills that Claude Code loads on-demand via semantic matching when trigger conditions are met. Unlike preloaded skills (always present in context), available skills are only injected when the agent's task context matches the trigger — keeping the context window lean while ensuring specialized capabilities are accessible when needed.

7.2 Parallelism Strategy

Two-tier hybrid: Subagents (default) + Agent Teams (opt-in)

Research completed 2026-03-24. Full findings: docs/spec/agent-teams-research.md
Official docs: https://code.claude.com/docs/en/agent-teams

Tier 1 — Subagents (Default)

For parallel execution of independent tasks. This is MaxsimCLI's primary execution mechanism.

Uses the Agent tool with isolation: "worktree" and run_in_background: true
Follows Anthropic's batch pattern: all agents spawned in a single message block
Each subagent gets a self-contained prompt with full context (no shared state)
Results return to the coordinator; subagents cannot communicate with each other
Cost: ~2x a single session for 3 workers — token-efficient
Works on all platforms, all plans, all terminals

Used for:

/maxsim:execute — parallel phase execution (independent tasks)
/maxsim:init — parallel codebase scanning (read-only, report back)
/maxsim:plan — parallel research gathering (when no cross-checking needed)
Any workflow where tasks are independent and only the result matters

Tier 2 — Agent Teams (Opt-in)

For workflows that genuinely require inter-agent communication, shared task lists, and peer-to-peer messaging.

Experimental feature (since Feb 2026, Claude Code v2.1.32+)
Requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 (set by MaxsimCLI installer)
Each teammate is a fully independent Claude Code session with its own context window
Teammates share a task list (~/.claude/tasks/{team-name}/) with auto-dependency-unblocking
Peer-to-peer messaging via SendMessage — teammates can challenge each other's findings
Lead creates team, spawns teammates, synthesizes results
Teammates do NOT inherit the lead's conversation history — spawn prompt must contain full context
Cost: ~4-7x a single session — significantly more expensive
Display: in-process mode (any terminal) or split-pane mode (tmux/iTerm2 only, not Windows Terminal)

Used for:

Competitive implementation with debate — 2-3 agents solve the same problem, actively disprove each other
Multi-reviewer code review — security + performance + test-coverage reviewers share findings
Competing hypothesis debugging — agents investigate different root causes, debate like scientists
Cross-layer feature work — frontend + backend + tests, each owned by a different teammate
Architecture decisions — UX + architecture + devil's advocate explore a design

Not used for: Sequential tasks, same-file edits, simple focused work, budget-constrained workflows.

Tier Selection Logic

MaxsimCLI chooses the tier automatically based on the workflow:

Workflow	Tier	Reason
Phase execution (independent tasks)	Tier 1 (Subagents)	Tasks don't need to communicate
Codebase scanning	Tier 1 (Subagents)	Read-only, report back
Research gathering	Tier 1 (Subagents)	Collect and report
Competitive implementation	Tier 2 (Agent Teams)	Agents need to debate
Multi-dimensional code review	Tier 2 (Agent Teams)	Findings need cross-checking
Collaborative debugging	Tier 2 (Agent Teams)	Hypotheses need adversarial testing
Architecture exploration	Tier 2 (Agent Teams)	Requires discussion

Graceful degradation: If Agent Teams are unavailable (env var not set, unsupported plan, or feature not yet stable), MaxsimCLI falls back to Tier 1 subagents for all workflows. The user is informed but not blocked.
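The selection table and the fallback rule can be expressed as a small lookup. This is a sketch under stated assumptions: the workflow identifiers, `selectTier`, and `agentTeamsEnabled` are hypothetical names, and availability is reduced to the environment variable alone (the plan and stability checks mentioned above are not modelled).

```typescript
type Workflow =
  | "phase-execution"
  | "codebase-scanning"
  | "research-gathering"
  | "competitive-implementation"
  | "multi-review"
  | "collaborative-debugging"
  | "architecture-exploration";
type Tier = 1 | 2;

// Preferred tier per workflow, mirroring the table above.
const PREFERRED_TIER: Record<Workflow, Tier> = {
  "phase-execution": 1,
  "codebase-scanning": 1,
  "research-gathering": 1,
  "competitive-implementation": 2,
  "multi-review": 2,
  "collaborative-debugging": 2,
  "architecture-exploration": 2,
};

// Only the env var is checked here; plan or stability checks are out of scope.
function agentTeamsEnabled(env: Record<string, string | undefined>): boolean {
  return env["CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"] === "1";
}

interface TierDecision {
  tier: Tier;
  degraded: boolean; // true when Tier 2 was preferred but Tier 1 is used instead
}

function selectTier(workflow: Workflow, teamsAvailable: boolean): TierDecision {
  const preferred = PREFERRED_TIER[workflow];
  if (preferred === 2 && !teamsAvailable) {
    return { tier: 1, degraded: true };
  }
  return { tier: preferred, degraded: false };
}
```

The `degraded` flag gives the caller what it needs to inform the user without blocking the workflow.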

Agent Teams Architecture (Reference)

Component	Role	Storage
Team lead	Creates team, spawns teammates, coordinates	Main session
Teammates	Independent Claude Code instances	`~/.claude/teams/{team-name}/config.json`
Task list	Shared work items with dependency tracking	`~/.claude/tasks/{team-name}/{id}.json`
Mailbox	Per-agent message queues	`~/.claude/teams/{team-name}/inboxes/{name}.json`

Key Agent Teams constraints:

One team per session, no nested teams
Lead is fixed (no promotion/transfer)
Teammates load CLAUDE.md + MCP + skills at spawn, but NOT lead's conversation history
3-5 teammates recommended, 5-6 tasks per teammate
File locking prevents race conditions on task claiming
Avoid two teammates editing the same file (causes overwrites)

Agent Teams Quality Gates

Two hooks enable automatic quality enforcement:

Hook	Fires When	Exit Code 2 Effect
`TeammateIdle`	Teammate about to go idle	Keeps teammate working; stderr becomes feedback
`TaskCompleted`	Task being marked complete	Blocks completion; stderr becomes feedback

Example: A TaskCompleted hook that runs npm test before allowing task completion — if tests fail, the teammate receives the failure output and continues fixing.
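The decision inside such a hook can be isolated as a pure function. In this sketch, `TestRun`, `HookResult`, and `taskCompletedGate` are hypothetical names; a real hook script would additionally run the test command as a child process, write the message to stderr, and exit with the returned code.

```typescript
interface TestRun {
  exitCode: number; // exit code of the test command (e.g. npm test)
  output: string;   // combined test output
}

interface HookResult {
  exitCode: 0 | 2; // 2 blocks completion, 0 lets the task complete
  stderr: string;  // feedback delivered to the teammate when blocking
}

// Maps a finished test run to the hook's exit behaviour.
function taskCompletedGate(run: TestRun): HookResult {
  if (run.exitCode === 0) {
    return { exitCode: 0, stderr: "" };
  }
  return {
    exitCode: 2,
    stderr: `Tests failed (exit ${run.exitCode}). Fix before completing:\n${run.output}`,
  };
}
```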

Competitive Implementation (Optional)

The same task can be assigned to 2–3 agents simultaneously. Each works independently. The verifier picks the best implementation. Not enabled by default. Activated either by user approval during planning or automatically for tasks marked as critical.

In Tier 2 mode, competitive implementation uses the Agent Teams debate pattern: agents actively try to disprove each other's approaches, and the theory/implementation that survives adversarial cross-examination wins. This fights LLM anchoring bias (first plausible answer wins).

Rationale: a single high-quality result from competitive implementation can save more tokens than multiple retry cycles.

7.3 Worktrees

Every executor agent works in its own git worktree. Always. No exceptions.

Uses Claude Code's native worktree mechanism: .claude/worktrees/agent-{id}/
Own branch per worktree
Merged back after verification
Sequential merge order to minimize conflicts
Auto-resolve where possible, verifier checks merged result

7.4 Model Configuration

Profiles define default models per agent type:

Profile	Planner	Executor	Researcher	Verifier
quality	opus	opus	sonnet	opus
balanced (default)	opus	sonnet	sonnet	sonnet
budget	sonnet	sonnet	haiku	sonnet

Profiles are configurable via /maxsim:settings
Individual agent overrides possible
Claude can autonomously choose a different model when justified (e.g., Haiku for simple file listing, Opus with extended thinking for complex architecture)

Parallelism limits per profile (scaled dynamically by project size):

Profile	Max Agents	Typical Range
quality	40	20–40
balanced (default)	20	10–20
budget	10	5–10

Small projects (< 10 files) use fewer agents regardless of profile. The exact count is determined dynamically based on codebase size, task complexity, and profile limits.
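The two tables above translate directly into a lookup plus a clamp. The sketch below is illustrative only: `resolveModel` and `agentCount` are hypothetical helpers, and the small-project cap of 3 agents is an assumption, since this spec only says small projects use "fewer agents".

```typescript
type Profile = "quality" | "balanced" | "budget";
type AgentType = "planner" | "executor" | "researcher" | "verifier";
type Model = "opus" | "sonnet" | "haiku";

interface ProfileConfig {
  models: Record<AgentType, Model>;
  maxAgents: number;
}

// Default models and agent caps, copied from the tables in this section.
const PROFILES: Record<Profile, ProfileConfig> = {
  quality: {
    models: { planner: "opus", executor: "opus", researcher: "sonnet", verifier: "opus" },
    maxAgents: 40,
  },
  balanced: {
    models: { planner: "opus", executor: "sonnet", researcher: "sonnet", verifier: "sonnet" },
    maxAgents: 20,
  },
  budget: {
    models: { planner: "sonnet", executor: "sonnet", researcher: "haiku", verifier: "sonnet" },
    maxAgents: 10,
  },
};

// An individual agent override wins over the profile default.
function resolveModel(
  profile: Profile,
  agent: AgentType,
  overrides: Partial<Record<AgentType, Model>> = {},
): Model {
  return overrides[agent] ?? PROFILES[profile].models[agent];
}

const SMALL_PROJECT_FILE_LIMIT = 10; // from the spec: "< 10 files"
const SMALL_PROJECT_AGENT_CAP = 3;   // assumption: the spec gives no number

// One agent per independent task, capped by profile and by project size.
function agentCount(profile: Profile, fileCount: number, independentTasks: number): number {
  const profileCap = PROFILES[profile].maxAgents;
  const cap =
    fileCount < SMALL_PROJECT_FILE_LIMIT
      ? Math.min(SMALL_PROJECT_AGENT_CAP, profileCap)
      : profileCap;
  return Math.max(1, Math.min(independentTasks, cap));
}
```

Task complexity, which the spec also names as an input, is left out of this sketch.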

8. Plan Mode Integration

Every MaxsimCLI command that modifies code, GitHub state, or project configuration starts in Plan Mode. This ensures the user always sees and approves what will happen before any changes.

Plan Mode per command:

Command	Plan Mode	Reason
`/maxsim:go`	Yes	Proposes modifying actions
`/maxsim:init`	Yes	Creates GitHub resources
`/maxsim:plan [N]`	Yes	Creates sub-issues
`/maxsim:execute [N]`	Yes	Writes code
`/maxsim:quick [desc]`	Yes	Creates issue + code
`/maxsim:improve`	Yes	Modifies code autonomously
`/maxsim:fix-loop`	Yes	Repairs code autonomously
`/maxsim:debug-loop`	Yes	May modify code
`/maxsim:debug [desc]`	Yes	Shows debugging plan + fix approach for approval before executing fix
`/maxsim:settings`	Yes	Shows current config for review before writing changes
`/maxsim:security`	No	Read-only audit
`/maxsim:progress`	No	Read-only status display
`/maxsim:help`	No	Read-only text display

8.1 Plan → Approve → Execute Flow

Follows the same pattern as Claude Code's /batch skill:

Command invoked (e.g., /maxsim:execute)
MaxsimCLI enters Plan Mode (EnterPlanMode) — read-only research begins
Explore/Research agents analyze codebase (read-only tools only)
Plan written to plan file, presented to user via ExitPlanMode
User reviews plan — can edit via Ctrl+G before approving
On approval: Plan Mode exits, execution begins with full permissions
On rejection: stays in Plan Mode, agent revises based on feedback
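The flow above behaves like a small state machine. The following sketch models it under assumed names (`PlanState`, `PlanEvent`, `nextState`, `mayModifyCode`); it illustrates the transitions only and is not how Claude Code implements Plan Mode.

```typescript
type PlanState = "idle" | "planning" | "awaiting-approval" | "executing";
type PlanEvent = "command-invoked" | "plan-written" | "approved" | "rejected";

// Returns the next state, or throws when the event is not valid in this state.
function nextState(state: PlanState, event: PlanEvent): PlanState {
  switch (state) {
    case "idle":
      if (event === "command-invoked") return "planning";
      break;
    case "planning":
      if (event === "plan-written") return "awaiting-approval";
      break;
    case "awaiting-approval":
      if (event === "approved") return "executing";
      // Rejection keeps the session in Plan Mode so the plan can be revised.
      if (event === "rejected") return "planning";
      break;
    case "executing":
      break;
  }
  throw new Error(`invalid transition: ${event} in state ${state}`);
}

// Code changes are only permitted once the plan has been approved.
function mayModifyCode(state: PlanState): boolean {
  return state === "executing";
}
```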

8.2 How Plan Mode Works Internally

Plan Mode is prompt-based, not tool-enforcement-based. A <system-reminder> is injected that instructs Claude not to use write/execute tools. The restricted tools (Write, Edit, Bash) remain technically callable — enforcement relies on the LLM following instructions.

Only ExitPlanMode has real UI enforcement — it requires an actual user approval dialog before returning.

Tools available in Plan Mode:

Full read access: Read, Glob, Grep, LS, WebSearch, WebFetch
Task management: TodoRead, TodoWrite
User interaction: AskUserQuestion (for clarifying requirements, NOT for plan approval)
Subagent spawning: Explore agents (read-only)
Plan file: Write/Edit allowed ONLY for the plan file

8.3 Two Plan Mode Mechanisms

Mechanism	`permissionMode: plan`	`EnterPlanMode` tool
Set by	Frontmatter / CLI flag / SDK	The agent itself, mid-session
Scope	Entire session from start	From the point the tool is called
User consent	No — imposed by configuration	Yes — requires user approval
Use case	Planner agent definition	MaxsimCLI workflow commands

The Planner agent has permissionMode: plan in its frontmatter, which enforces read-only operation for the entire agent session. This is used when MaxsimCLI spawns a dedicated Planner subagent.

Workflow commands use EnterPlanMode / ExitPlanMode dynamically — the main session enters plan mode, researches, presents the plan, gets approval, then exits plan mode and executes.

9. Skills

MaxsimCLI ships with 16 skills, following Anthropic's skill conventions exactly.

9.1 Skill Format

Every skill follows this structure:

---
name: skill-name          # kebab-case, matches folder name
description: What it does. Use when [trigger conditions].
---

# Skill Title

[Body: max 500 lines, structured instructions]

YAML frontmatter with name and description (required)
Third-person descriptions
No @ imports (use plain path references)
Heavy content in references/ subdirectory
Loaded on-demand by Claude Code's semantic matching
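The structural rules above lend themselves to a simple lint. The sketch below is a hypothetical `validateSkill` helper, not part of MaxsimCLI: it parses the frontmatter naively (a real check would use a YAML parser) and detects @ imports with a heuristic regex.

```typescript
interface SkillCheck {
  ok: boolean;
  problems: string[];
}

const MAX_BODY_LINES = 500; // body limit from the skill format above

function validateSkill(folderName: string, content: string): SkillCheck {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(content);
  if (match === null) {
    return { ok: false, problems: ["missing YAML frontmatter"] };
  }
  const frontmatter = match[1] ?? "";
  const body = match[2] ?? "";

  // Naive "key: value" parsing that also strips trailing "# comment" text.
  const fields = new Map<string, string>();
  for (const line of frontmatter.split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) {
      const value = line.slice(idx + 1).replace(/\s+#.*$/, "").trim();
      fields.set(line.slice(0, idx).trim(), value);
    }
  }

  const problems: string[] = [];
  const name = fields.get("name") ?? "";
  const description = fields.get("description") ?? "";
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(name)) problems.push("name must be kebab-case");
  if (name !== folderName) problems.push("name must match the folder name");
  if (description === "") problems.push("description is required");
  if (body.split("\n").length > MAX_BODY_LINES) problems.push("body exceeds 500 lines");
  // Heuristic: an "@" directly followed by a path-like token counts as an import.
  if (/(^|\s)@[\w.\/~-]+/m.test(body)) problems.push("body must not use @ imports");

  return { ok: problems.length === 0, problems };
}
```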

9.2 Skill Inventory

#	Skill	Type	Purpose
1	`tdd`	Technique	Test-Driven Development (red-green-refactor cycle)
2	`systematic-debugging`	Technique	Reproduce → Hypothesize → Isolate → Verify → Fix → Confirm
3	`brainstorming`	Technique	Multi-approach design exploration before implementation
4	`roadmap-writing`	Technique	Phase planning with dependencies and success criteria
5	`handoff-contract`	Infrastructure	Standard output format for all agent results
6	`commit-conventions`	Infrastructure	Conventional commits, atomic changes, co-author attribution
7	`maxsim-batch`	Technique	Parallel execution orchestration — Tier 1 (subagent batch) + Tier 2 (Agent Teams) selection
8	`code-review`	Technique	Security, quality, spec-compliance review
9	`verification`	Infrastructure	MERGED from: verification-before-completion + evidence-collection + verification-gates. Single authoritative verification skill with gate framework, evidence blocks, anti-rationalization enforcement.
10	`github-operations`	Infrastructure	MERGED from: github-artifact-protocol + github-tools-guide. Unified GitHub interaction: artifact types, comment conventions, CLI commands, lifecycle state machine.
11	`research`	Technique	MERGED from: research-methodology + tool-priority-guide. Systematic investigation with source hierarchy and Claude Code tool priority.
12	`project-memory`	Infrastructure	NEW — GitHub-native persistence for project learnings, decisions, and patterns.
13	`using-maxsim`	User-facing	Command reference and routing table. Updated for v6 commands.
14	`maxsim-simplify`	Technique	Code simplification, dead code removal, reuse improvement.
15	`autoresearch`	Technique	Autonomous optimization loop with reference workflows (loop-protocol, debug, fix, security, results-logging, core-principles). Powers `/maxsim:improve`, `/maxsim:fix-loop`, `/maxsim:debug-loop`, `/maxsim:security`.
16	`agent-teams`	Infrastructure	Tier 2 Agent Teams coordination: TeamCreate, SendMessage, competitive implementation, multi-reviewer, collaborative debugging patterns.

9.3 Skill Loading

Skills are auto-loaded by Claude Code based on semantic description matching
Agent prompts mention recommended skills (e.g., "prefer using the tdd skill")
Users can request specific skills during init (e.g., "use the UX-Pro skill")
Skills can invoke other skills via the Skill tool

10. Verification System

10.1 Philosophy

Verification is automatic, strict, and evidence-based. No completion claims without fresh verification evidence.

10.2 What Gets Checked

After every task execution:

Check	Tool	Required
Tests pass	Test runner (jest, vitest, pytest, etc.)	Yes
Build succeeds	Build tool (tsc, vite, etc.)	Yes
Lint clean	Linter (biome, eslint, etc.)	Yes
Spec compliance	Verify planned tasks were implemented	Yes
Code review	Parallel review agents (security, quality, efficiency)	Yes
Evidence block	Structured CLAIM/EVIDENCE/OUTPUT/VERDICT	Yes

10.3 Retry Logic

Max 3 automatic retries on verification failure
Each retry spawns a fresh executor agent (no accumulated context rot)
After 3 failures: escalate to user with diagnostic GitHub Issue
autoresearch-style: atomic change → verify → keep/discard

Implementation status: Currently instruction-based (enforced via skill/rule prompts). Code-level enforcement of fresh agent spawning per retry is planned.
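Since code-level enforcement is still planned, the following is only a sketch of what a retry driver might look like. `executeWithRetries` and the `runFreshExecutor` callback are hypothetical; the sketch treats the first run as attempt 0 followed by up to three retries, which is one possible reading of the rules above, and it uses a synchronous callback to stay short where a real driver would be asynchronous.

```typescript
interface Outcome {
  status: "verified" | "escalated";
  retriesUsed: number;
}

const MAX_RETRIES = 3; // from the spec: max 3 automatic retries

// runFreshExecutor is a hypothetical callback: it spawns a brand-new executor
// (no context carried over), runs verification, and reports whether it passed.
function executeWithRetries(
  runFreshExecutor: (attempt: number) => boolean,
  maxRetries: number = MAX_RETRIES,
): Outcome {
  // Attempt 0 is the initial run; attempts 1..maxRetries are the retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    if (runFreshExecutor(attempt)) {
      return { status: "verified", retriesUsed: attempt };
    }
  }
  // All retries exhausted: the caller escalates to the user with a diagnostic issue.
  return { status: "escalated", retriesUsed: maxRetries };
}
```

If the intended reading is three runs in total, the same function can be called with `maxRetries` set to 2.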

10.4 Guard Pattern

Borrowed from autoresearch:

Verify command — "Did this task accomplish its goal?" (primary metric)
Guard command — "Did this task break what was already working?" (regression check)
If guard fails after verify passes: 2 rework attempts before discarding

Implementation status: Currently instruction-based (enforced via verification skill). Code-level enforcement of the VERIFY+GUARD dual-command pattern is planned.

11. Self-Improvement

Research completed 2026-03-24. Full findings: docs/spec/self-improvement-research.md
Sources analyzed: autoresearch (1,900 stars, v1.8.2), Superpowers (v4.3.1), 40+ academic/community sources

11.1 Philosophy

MaxsimCLI improves locally per project with every session through three layers: Session Memory (automatic), Metric Tracking (per task/phase), and an optional Optimization Loop (on-demand). Inspired by autoresearch's "constraint + mechanical metric + autonomous iteration = compounding gains" and Superpowers' anti-rationalization enforcement.

Core principles (adapted from autoresearch's 7 universal principles):

Mechanical verification only — no subjective "looks good"; every keep/discard uses a number
One atomic change per iteration — precise causality; if it breaks, the cause is unambiguous
Git as memory — git revert (not git reset --hard) preserves failed experiments for learning
Automatic rollback — failure has no permanent cost; every change reverts instantly
External enforcement — the system guarantees termination, not the agent's self-awareness
Evidence before claims — no completion without fresh verification (Superpowers Iron Law)

Pending: Deep Research. The autoresearch (github.com/uditgoenka/autoresearch) and superpowers (github.com/obra/superpowers) repositories will be cloned into docs/ for comprehensive analysis. Their Memory/Learning systems will be adopted as closely as possible. The autoresearch skill will be rewritten from scratch based on these findings. The TSV metric format in the execute workflow will be unified with autoresearch's real metrics (replacing the current binary 1/0).

11.2 Three-Layer Architecture

Layer	Mechanism	Frequency	What It Does
Session Memory	Stop + SessionStart hooks → MEMORY.md	Every session	Captures learnings, injects context at start
Metric Tracking	TSV logging after each task/phase	Every task execution	Tracks what worked/failed with numbers
Optimization Loop	`/maxsim:improve` command	On-demand	Runs autoresearch-style iteration loop

11.3 Layer 1 — Session Memory

Stop hook (maxsim-capture-learnings): Fires at session end. Implementation:

Tracks per-session commits via session_start_commit..HEAD with fallback to git log -5 when the start commit is unavailable
Extracts patterns from last_assistant_message using keyword prefix matching (e.g., lines starting with "learned:", "pattern:", "convention:")
Prunes MEMORY.md to 180 lines (hard 200-line limit in Claude Code)
Writes structured entries: date, session_id, commit_count, patterns, stop_reason
Checks stop_hook_active to prevent infinite loops (skips processing if already active)
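Two of the steps above, pattern extraction and pruning, can be sketched as pure functions. The names `extractPatterns` and `pruneMemory` are hypothetical, and two behaviours are assumptions rather than documented facts: prefixes are matched case-insensitively, and pruning drops the oldest lines from the top of the file.

```typescript
// Example prefixes named in the spec; the real list may be longer.
const PATTERN_PREFIXES = ["learned:", "pattern:", "convention:"];
const MEMORY_PRUNE_TARGET = 180; // stays below Claude Code's 200-line limit

// Keeps only the lines that start with one of the known prefixes.
function extractPatterns(lastAssistantMessage: string): string[] {
  return lastAssistantMessage
    .split("\n")
    .map((line) => line.trim())
    .filter((line) =>
      PATTERN_PREFIXES.some((prefix) => line.toLowerCase().startsWith(prefix)),
    );
}

// Assumption: the newest entries are at the bottom, so the tail is kept.
function pruneMemory(lines: string[], maxLines: number = MEMORY_PRUNE_TARGET): string[] {
  return lines.length <= maxLines ? lines : lines.slice(lines.length - maxLines);
}
```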

SessionStart hook (maxsim-session-start): Fires at session start, resume, and compact. It warns the user if hook registration is incomplete, then injects context:

Read git log --oneline -20 (instant orientation)
Read first 200 lines of MEMORY.md (learned patterns)
Read last 10 TSV entries (metric trends, if file exists)
Output via hookSpecificOutput.additionalContext for injection into Claude's context

Storage: .claude/agent-memory/maxsim-learner/MEMORY.md (gitignored, machine-local)

11.4 Layer 2 — Metric Tracking

TSV format (adopted from autoresearch, 7 columns):

# metric_direction: lower_is_better
iteration	commit	metric	delta	guard	status	description
0	abc1234	847	0	-	baseline	Initial measurement
1	def5678	831	-16	pass	keep	Reduce verification timeout
2	-	852	+21	-	discard	Add parallel workers (reverted)
3	ghi9012	-	-	-	crash	Refactor config (syntax error, fixed)

Column	Description
`iteration`	Sequential counter (0 = baseline)
`commit`	Git hash or `-` if reverted
`metric`	Measured numeric value
`delta`	Change from previous best
`guard`	`pass` / `fail` / `-` (no guard)
`status`	`baseline` / `keep` / `discard` / `crash` / `hook-blocked`
`description`	One-sentence experiment description

Path: .claude/agent-memory/maxsim-learner/autoresearch-results.tsv (gitignored)

When written: After each task in /maxsim:execute, after each phase verification, after each /maxsim:improve iteration.
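Serializing one log entry into the 7-column layout might look like the sketch below. `ResultRow`, `formatDelta`, and `formatRow` are hypothetical names; the column order, the `-` placeholder, and the status values follow the table above.

```typescript
interface ResultRow {
  iteration: number;
  commit: string | null;          // null when the change was reverted
  metric: number | null;          // null when no measurement exists (crash)
  delta: number | null;           // change from previous best, null if unknown
  guard: "pass" | "fail" | null;  // null when no guard was run
  status: "baseline" | "keep" | "discard" | "crash" | "hook-blocked";
  description: string;
}

// Positive deltas get an explicit "+" sign; zero and negatives print as-is.
function formatDelta(delta: number | null): string {
  if (delta === null) return "-";
  return delta > 0 ? `+${delta}` : String(delta);
}

function formatRow(row: ResultRow): string {
  return [
    String(row.iteration),
    row.commit ?? "-",
    row.metric === null ? "-" : String(row.metric),
    formatDelta(row.delta),
    row.guard ?? "-",
    row.status,
    // Tabs and newlines would break the column layout, so collapse them.
    row.description.replace(/[\t\r\n]+/g, " "),
  ].join("\t");
}
```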

11.5 Layer 3 — Optimization Loop

/maxsim:improve runs the autoresearch 8-phase loop. The full autoresearch skill is included in templates/skills/autoresearch/ with all reference workflows.

Review — read git log + TSV + diff
Ideate — exploit successes, avoid repeated failures, try untried approaches
Modify — make ONE atomic change
Commit — commit before verify (experiment(<scope>): prefix)
Verify — run metric command, extract number
Guard — run regression check (e.g., npm test)
Decide — improved + guard pass → keep; otherwise → git revert
Log — append to TSV, check stuck condition, repeat

Verify + Guard dual-command pattern:

Verify: "Did the metric improve?" (primary goal)
Guard: "Did anything else break?" (regression safety net)
Guard failure + verify pass → rework (max 2 attempts), then discard
Guard/test files are NEVER modified by the loop
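The Decide step combined with the Verify + Guard rules reduces to a small decision function. This is a sketch with hypothetical names (`improved`, `decide`); the `higher_is_better` direction is an assumption, since the TSV example in this spec only shows `lower_is_better`.

```typescript
type Direction = "lower_is_better" | "higher_is_better";
type Decision = "keep" | "rework" | "discard";

const MAX_REWORK_ATTEMPTS = 2; // from the spec: rework max 2 attempts, then discard

// Verify: did the metric move in the desired direction?
function improved(previousBest: number, current: number, direction: Direction): boolean {
  return direction === "lower_is_better" ? current < previousBest : current > previousBest;
}

// Decide: combine the verify and guard results into one outcome.
function decide(metricImproved: boolean, guardPassed: boolean, reworkAttemptsUsed: number): Decision {
  if (!metricImproved) return "discard"; // no gain, nothing worth keeping
  if (guardPassed) return "keep";        // gain without regression
  // Gain with regression: rework until the attempts are used up.
  return reworkAttemptsUsed < MAX_REWORK_ATTEMPTS ? "rework" : "discard";
}
```

A "discard" outcome corresponds to the `git revert` branch of the loop, which keeps the failed experiment visible in history.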

Stuck detection: After 5 consecutive discards/crashes:

Re-read ALL in-scope files (full context reload)
Re-read original goal
Review entire TSV log for patterns
Try combining 2-3 successful past changes
Try the OPPOSITE approach
Try a radical architectural change
If still stuck → create diagnostic GitHub Issue + escalate to user
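Detecting the trigger condition only requires looking at the tail of the TSV log. In the sketch below, `consecutiveFailures` and `isStuck` are hypothetical helpers, and treating `hook-blocked` as a status that ends the streak is an assumption.

```typescript
type Status = "baseline" | "keep" | "discard" | "crash" | "hook-blocked";

const STUCK_THRESHOLD = 5; // from the spec: 5 consecutive discards/crashes

// Counts discards and crashes at the end of the log, stopping at the first
// entry with any other status.
function consecutiveFailures(statuses: Status[]): number {
  let count = 0;
  for (let i = statuses.length - 1; i >= 0; i--) {
    const status = statuses[i];
    if (status !== "discard" && status !== "crash") break;
    count++;
  }
  return count;
}

function isStuck(statuses: Status[]): boolean {
  return consecutiveFailures(statuses) >= STUCK_THRESHOLD;
}
```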

Noise handling for volatile metrics: 3-run median for 1-5% variance, 5-run median for >5%, minimum-delta threshold to filter noise.
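The noise rules can be sketched as three helpers. `runsForVariance`, `median`, and `exceedsNoise` are hypothetical names; taking a single run when variance is below 1% is an assumption, because the spec only defines the 1-5% and >5% cases.

```typescript
// Number of measurement runs to take the median over, by variance in percent.
function runsForVariance(variancePct: number): number {
  if (variancePct > 5) return 5;
  if (variancePct >= 1) return 3;
  return 1; // assumption: below 1% a single run is taken as-is
}

// Median of a non-empty list; the input array is not modified.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of an empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// A change counts as real only when it reaches the minimum-delta threshold.
function exceedsNoise(previousBest: number, current: number, minDelta: number): boolean {
  return Math.abs(current - previousBest) >= minDelta;
}
```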

Note: Claude Code's built-in /loop command exists for scheduled recurring prompts but is not used by MaxsimCLI — it has no memory between cycles and is session-scoped (max 3 days). /maxsim:improve uses its own internal loop with git-based memory and TSV tracking.

11.6 Quality Enforcement (from Superpowers)

Adopted from Superpowers' anti-rationalization philosophy:

Evidence Blocks required for all completion claims: CLAIM / EVIDENCE / OUTPUT / VERDICT
10 forbidden phrases in verification: "should work", "I already checked", "tests were passing before", etc.
<HARD-GATE> tags in agent prompts for non-negotiable rules
Two-stage review (optional): Spec Compliance → Code Quality, each by a fresh subagent
Iron Law: No completion claims without fresh verification evidence from this session
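
As an illustration of how these rules could be checked mechanically, the sketch below validates an evidence block and scans verification text for forbidden phrases. The names are hypothetical, and the phrase list contains only the three phrases quoted above; the remaining entries of the full list are deliberately not guessed.

```typescript
// Hypothetical shape for an evidence block: CLAIM / EVIDENCE / OUTPUT / VERDICT.
interface EvidenceBlock {
  claim: string;
  evidence: string;
  output: string;
  verdict: string;
}

// Only the phrases quoted in the spec; the full list has more entries.
const FORBIDDEN_PHRASES = ["should work", "i already checked", "tests were passing before"];

// Returns every forbidden phrase found in the text (case-insensitive).
function findForbiddenPhrases(text: string): string[] {
  const lower = text.toLowerCase();
  return FORBIDDEN_PHRASES.filter((phrase) => lower.includes(phrase));
}

// A completion claim is acceptable only if all four fields are filled in
// and none of them leans on a forbidden phrase.
function isAcceptableEvidence(block: EvidenceBlock): boolean {
  const fields = [block.claim, block.evidence, block.output, block.verdict];
  const allPresent = fields.every((field) => field.trim().length > 0);
  return allPresent && findForbiddenPhrases(fields.join("\n")).length === 0;
}
```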

11.7 Code Enforcement via Hooks

Hook	Event	Purpose	Exit Code 2 Effect
`maxsim-capture-learnings`	Stop	Write session learnings to MEMORY.md	N/A (always exit 0)
`maxsim-session-start`	SessionStart	Inject MEMORY.md + TSV + git log context	N/A (context injection)
`maxsim-task-completed`	TaskCompleted	Run tests before allowing task completion	Blocks completion, feeds back failure
`maxsim-teammate-idle`	TeammateIdle	Check for pending tasks before allowing idle	Keeps teammate working

11.8 Isolation

All improvements are project-local. Two projects using MaxsimCLI never interfere:

Separate .claude/agent-memory/maxsim-learner/ per project
Separate autoresearch-results.tsv per project
Separate Claude Code auto-memory (keyed by git repo root)
MEMORY.md hard-limited to 200 lines (Claude Code constraint)

12. Hooks

12.1 Hook List

Hook	Event	Purpose
`maxsim-statusline`	statusLine	Show current MaxsimCLI status in terminal
`maxsim-check-update`	SessionStart	Check for new MaxsimCLI version (1h cache)
`maxsim-session-start`	SessionStart	Inject MEMORY.md + TSV + git log context
`maxsim-notification-sound`	Notification	Play sound when Claude asks a question
`maxsim-stop-sound`	Stop	Play sound when Claude finishes
`maxsim-capture-learnings`	Stop	Capture session learnings to agent memory
`maxsim-teammate-idle`	TeammateIdle	Keep teammates working if pending tasks exist
`maxsim-task-completed`	TaskCompleted	Run verification gates before task completion

12.2 Agent Team Hook Details

Research completed 2026-03-24 (official docs: https://code.claude.com/docs/en/hooks). See §12.1 for the consolidated hook list, which includes both hooks.

These hooks fire only when Agent Teams are active (Tier 2 workflows). Neither hook supports matchers — they fire for every occurrence.

Hook	Event	Fires When	Payload Fields
`maxsim-teammate-idle`	`TeammateIdle`	Teammate about to go idle	`teammate_name`, `team_name`, `session_id`, `cwd`
`maxsim-task-completed`	`TaskCompleted`	Task being marked complete	`task_id`, `task_subject`, `task_description`, `teammate_name`, `team_name`

Exit code behavior (both hooks):

exit 0 — allow the action (teammate goes idle / task marked complete)
exit 2 — block the action; stderr is fed back to the teammate as instruction
JSON {"continue": false, "stopReason": "..."} — stop the teammate entirely

MaxsimCLI implementation:

maxsim-teammate-idle: Checks if pending tasks remain on the shared task list. If yes, exits 2 with "Pick up the next available task."
maxsim-task-completed: Runs verification (tests, build, lint). If any gate fails, exits 2 with the failure output. The teammate continues fixing until gates pass.
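
The decision logic of both hooks can be sketched as pure functions that return an exit code and the stderr text to feed back. This is a simplified illustration: `HookResult`, `GateResult`, and both function names are hypothetical, and reading the hook payload from stdin and querying the task list are left out.

```typescript
interface HookResult {
  exitCode: 0 | 2; // 0 allows the action, 2 blocks it
  stderr: string;  // fed back to the teammate when exitCode is 2
}

// TeammateIdle: keep the teammate working while tasks are pending.
function teammateIdleDecision(pendingTaskCount: number): HookResult {
  return pendingTaskCount > 0
    ? { exitCode: 2, stderr: "Pick up the next available task." }
    : { exitCode: 0, stderr: "" };
}

// Hypothetical result of one verification gate (tests, build, lint).
interface GateResult {
  name: string;
  passed: boolean;
  output: string;
}

// TaskCompleted: block completion and report the failure output of every failed gate.
function taskCompletedDecision(gates: GateResult[]): HookResult {
  const failed = gates.filter((gate) => !gate.passed);
  if (failed.length === 0) return { exitCode: 0, stderr: "" };
  const report = failed.map((gate) => `[${gate.name}] ${gate.output}`).join("\n");
  return { exitCode: 2, stderr: report };
}
```

The hook script itself would write `stderr` to standard error and exit with `exitCode`.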

13. Error Handling

Three-tier recovery:

Debug — MaxsimCLI automatically enters debug mode and attempts to diagnose/fix the issue
Rollback — If debugging fails, revert to the last verified state (git revert)
Escalate — Create a diagnostic GitHub Issue with full context and notify the user

14. Git Strategy

14.1 Branching

MaxsimCLI decides the branching strategy:

Each executor agent gets a worktree branch: maxsim/phase-{N}-task-{id}
After verification, branches are merged into the main branch
Sequential merge order to minimize conflicts
Auto-resolve where possible
Verifier checks the merged result
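
The branch naming scheme can be illustrated with a small builder and parser. The helper names are hypothetical, and the sketch assumes the phase is a number and the task id is an opaque string.

```typescript
interface WorktreeRef {
  phase: number;
  taskId: string; // assumption: task ids are treated as opaque strings
}

// Builds a branch name of the form maxsim/phase-{N}-task-{id}.
function worktreeBranch(ref: WorktreeRef): string {
  return `maxsim/phase-${ref.phase}-task-${ref.taskId}`;
}

const BRANCH_PATTERN = /^maxsim\/phase-(\d+)-task-(.+)$/;

// Returns the phase and task id for a MaxsimCLI worktree branch, or null for any other branch.
function parseWorktreeBranch(branch: string): WorktreeRef | null {
  const match = BRANCH_PATTERN.exec(branch);
  if (!match) return null;
  return { phase: Number(match[1]), taskId: String(match[2]) };
}
```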

14.2 Commits

Fully automatic:

Conventional commit format: type(scope): description
Co-author attribution: configurable via automation.co_author config key (default: Co-Authored-By: Claude <noreply@anthropic.com>)
Atomic commits (one logical change per commit)
Automatic push after successful verification
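
A commit message in this format could be assembled as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the scope is treated as optional (the phase commits in this spec omit it), and the co-author line is appended as a trailer after a blank line.

```typescript
// Default taken from the automation.co_author config key described above.
const DEFAULT_CO_AUTHOR = "Co-Authored-By: Claude <noreply@anthropic.com>";

// type, optional (scope), optional "!" for breaking changes, then ": description".
const HEADER_PATTERN = /^([a-z]+)(?:\(([^)]+)\))?!?: (.+)$/;

function isConventionalHeader(header: string): boolean {
  return HEADER_PATTERN.test(header);
}

function formatCommit(
  type: string,
  description: string,
  scope?: string,
  coAuthor: string = DEFAULT_CO_AUTHOR,
): string {
  const header = scope ? `${type}(${scope}): ${description}` : `${type}: ${description}`;
  if (!isConventionalHeader(header)) {
    throw new Error(`Not a conventional commit header: ${header}`);
  }
  // Trailers are separated from the header by a blank line.
  return `${header}\n\n${coAuthor}`;
}
```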

15. Technical Stack

15.1 Monorepo Structure

maxsimcli/
├── packages/
│   ├── cli/              # Main CLI package (TypeScript)
│   │   ├── src/
│   │   │   ├── core/     # Core logic (config, types, utilities)
│   │   │   ├── github/   # GitHub API integration (Projects v2, Issues, etc.)
│   │   │   ├── hooks/    # Hook scripts
│   │   │   └── install/  # Install/uninstall logic
│   │   └── tests/        # Unit + E2E tests (TDD)
│   └── website/          # Landing page + documentation (React + Vite)
├── templates/            # Source templates (copied to .claude/ during install)
│   ├── agents/           # 4 agent definitions + AGENTS.md registry
│   ├── commands/maxsim/  # 14 slash commands (13 primary + 1 alias)
│   ├── skills/           # 16 skill modules
│   ├── workflows/        # Workflow definitions
│   ├── references/       # Reference documents
│   ├── rules/            # Conventions + verification
│   └── templates/        # Output templates
├── docs/                 # Reference documentation (Anthropic courses, GSD reference, etc.)
└── scripts/              # Build/test scripts

15.2 Technology

Component	Technology
Language	TypeScript
Bundler	tsdown (rolldown)
Testing	Vitest (TDD for everything)
Linting	Biome
CI/CD	GitHub Actions
Releases	semantic-release (single source of truth for versioning — version injected into code at build time)
Website	React + Vite + Tailwind CSS + Motion
Documentation	Markdoc

15.3 Testing Strategy

TDD for everything. Tests before code.

Level	Coverage
Unit tests	Core logic, GitHub API, config, state, phases
Integration tests	Install/uninstall flow, hook registration
E2E tests	Full user flow: install → init → plan → execute. Runs against real GitHub API with a dedicated test account/token in CI secrets.

16. Website

maxsimcli.dev serves two purposes:

Landing Page — Marketing: features, benefits, installation instructions, tech stack showcase
Full Documentation — All commands, workflows, skills, configuration, and guides

Note: The current 33 documentation articles are outdated: they reference v5 concepts such as the .planning/ directory, /maxsim:milestone, and /maxsim:todos. All documentation must be completely rewritten to reflect the v6 spec. Only features that exist in this spec should be documented.

17. What MaxsimCLI is NOT

Not a fork of GSD or Superpowers — it is an independent project inspired by both
Not multi-runtime — it only works with Claude Code
Not global — it installs per-project into .claude/, not globally. Any global ~/.claude/maxsim/ installation is a developer's personal setup, not part of the product.
Not local-first — GitHub is always the source of truth
Not an MCP server — commands are slash commands, not MCP tools
Not optional — GitHub integration is mandatory, not a plugin

18. Success Criteria

MaxsimCLI is successful when:

A user can run npx maxsimcli@latest in any project and within minutes have a fully orchestrated development environment
/maxsim:go correctly detects project state and proposes the right action every time
Phases are planned, executed, and verified without manual intervention
The GitHub Project Board accurately reflects the project's real state at all times
Quality gates prevent broken code from being merged
The system measurably improves with each session (fewer errors, better plans, faster execution)
All components follow Anthropic's conventions exactly

19. Implementation Roadmap (Clean Rewrite v6)

Strategy: Clean rewrite on main, phase by phase, with each phase ending in a tagged commit. Approach: TDD (tests first, implementation second), with parallel agents for execution. Spec documents: docs/spec/ contains all technical details for each phase.

Phase 0: Foundation

Goal: Clean slate with correct build tooling. Spec: N/A (infrastructure only)

1. git tag v5-archive (preserve current state)
2. Clear packages/cli/src/ completely
3. Set up fresh TypeScript project:
   - tsconfig.json (strict mode)
   - tsdown.config.ts (correct entry points)
   - vitest.config.ts (TDD setup)
   - biome.json (with rules ENABLED)
4. Create package.json with correct:
   - dependencies (only runtime needs)
   - devDependencies (build/test tools)
   - bin entry point
   - engines: >=22
5. Verify: npm run build && npm test both pass (empty project)

Commit: chore: clean rewrite foundation v6

Phase 1: Core Types & Config

Goal: Type-safe foundation for the entire system. Spec: PROJECT.md §5, §7, §14

1. src/core/types.ts — All TypeScript interfaces (single source)
2. src/core/config.ts — Config loading (from .claude/maxsim/config.json)
3. src/cli.ts — CLI entry point (maxsim-tools.cjs) — lives at src/ root, not src/core/
4. src/core/utils.ts — Shared utilities (path construction, frontmatter parsing)
5. src/core/version.ts — Version detection utilities
6. Tests: unit tests for every exported function

Commit: feat: core types and config module

Phase 2: GitHub Module (THE critical module)

Goal: Correct GitHub Projects v2 integration from scratch. Spec: docs/spec/github-projects-v2-api.md, docs/spec/github-structure-design.md

1. src/github/client.ts — Octokit setup, auth, error handling ✅
2. src/github/projects.ts — Projects v2 (GraphQL + REST, CORRECT APIs) ✅
3. src/github/issues.ts — Issues + Sub-Issues (correct ID types) ✅
4. src/github/milestones.ts — Milestones (with pagination) ✅
5. src/github/labels.ts — Label taxonomy (6 labels in 2 namespaces: type + maxsim) ✅ [UPDATE CODE: reduce from 19 to 6]
6. src/github/comments.ts — Structured comments (HTML markers) ✅
7. src/github/types.ts — GitHub-specific types ✅
8. src/github/discussions.ts — Discussions CRUD (GraphQL, pagination)
9. src/github/wiki.ts — Wiki page management (git clone strategy)
10. Tests: unit tests with mocked Octokit, E2E with real API
REMOVED: mapping.ts (local cache contradicts GitHub-only principle)
REMOVED: sync.ts (no sync needed — GitHub is always authoritative)
REMOVED: commands.ts (functionality covered by client.ts + individual modules)

Commit: feat: GitHub Projects v2 integration (correct API)

Phase 3: Install System

Goal: npx maxsimcli@latest works correctly. Spec: PROJECT.md §5.2, docs/spec/claude-md-guide.md

1. src/install/index.ts — Main installer orchestrator ✅
2. src/install/copy.ts — Template file copying (with path replacement) ✅
3. src/install/hooks.ts — Hook registration in settings.json ✅
4. src/install/uninstall.ts — Clean uninstall (complete!) ✅
5. src/install/claudemd.ts — CLAUDE.md generation ✅ (added, not in original spec)
6. src/install/manifest.ts — Track all installed files ✅
7. scripts/copy-assets.cjs — Build step: copy templates to dist ✅
8. Tests: E2E install/uninstall cycle

Commit: feat: install system with complete uninstall

Phase 4: Commands + Workflows

Goal: 14 slash commands (13 primary + 1 alias) with correct tool names and GitHub-first workflows. Spec: PROJECT.md §6, docs/spec/init-process-design.md, docs/spec/wave-execution-design.md

1. templates/commands/maxsim/ — All 14 commands (correct frontmatter)
   - Use 'Agent' tool (NOT 'Task')
   - Use correct allowed-tools
   - Correct argument-hint on all commands
2. templates/workflows/ — All workflows (GitHub-first)
   - No local .planning/ references
   - GitHub Issues as source of truth
   - Plan Mode integration (EnterPlanMode before execute)
   - Correct Agent tool spawn syntax
3. Tests: frontmatter parsing, workflow references

Loop commands (improve, fix-loop, debug-loop, security) will be extracted into separate workflow files for consistency with other commands. execute.md will be split into sub-workflows (wave execution, competitive mode, retry loop).

Commit: feat: commands and workflows (GitHub-first, correct tool names)

Phase 5: Skills (16 total)

Goal: 16 skills following Anthropic conventions exactly. Spec: docs/spec/skills-specification.md, docs/spec/skills-writing-guide.md

1. Keep 8: tdd, systematic-debugging, brainstorming, roadmap-writing,
   handoff-contract, commit-conventions, maxsim-batch, code-review
2. Merge 3: verification, github-operations, research
3. New 2: project-memory, using-maxsim (updated)
4. Keep 1: maxsim-simplify
5. All with correct YAML frontmatter (name, description)
6. New: agent-teams (Tier 2 coordination patterns, extracted from maxsim-batch)
7. All under 500 lines
8. No @ imports
9. Third-person descriptions

Commit: feat: 16 skills (Anthropic-compliant)

Phase 6: Agents (4 definitions)

Goal: 4 agent definitions with valid YAML frontmatter. Spec: PROJECT.md §7

1. templates/agents/executor.md — Valid YAML, correct tools
2. templates/agents/planner.md — permissionMode: plan
3. templates/agents/researcher.md — WebSearch + WebFetch
4. templates/agents/verifier.md — Verification skills
5. templates/agents/AGENTS.md — Registry (no debugger row)
6. No pipe-table YAML! Use proper YAML lists.

Commit: feat: 4 agent definitions (valid YAML)

Phase 7: Hooks

Goal: Working hooks for statusline, updates, sounds, learnings. Spec: docs/spec/hooks-reference.md

1. src/hooks/maxsim-statusline.ts — Status in terminal
2. src/hooks/maxsim-check-update.ts — Version check on SessionStart
3. src/hooks/maxsim-notification-sound.ts — Sound on Notification (correct event!)
4. src/hooks/maxsim-stop-sound.ts — Sound on Stop
5. src/hooks/maxsim-capture-learnings.ts — NEW: Save learnings on Stop
6. Correct registration in settings.json (right events, right matchers)
7. Platform-safe paths (quoted for Windows spaces)
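
To illustrate item 7, the sketch below builds a hook command string with the script path wrapped in double quotes so that spaces in Windows paths survive. Everything here is an assumption for illustration: the helper names, launching the hook with `node`, and the example path are not taken from the spec.

```typescript
// Wraps a path in double quotes. Paths that already contain a double quote
// are rejected rather than escaped, since shell escaping rules differ by platform.
function quotePath(path: string): string {
  if (path.includes('"')) {
    throw new Error(`Path contains a double quote: ${path}`);
  }
  return `"${path}"`;
}

// Builds the command string to register for a hook (assumes the hook runs via node).
function hookCommand(scriptPath: string): string {
  return `node ${quotePath(scriptPath)}`;
}
```

For a hypothetical install at `C:\Users\Jane Doe\app\.claude\hooks\maxsim-stop-sound.cjs`, the unquoted form would be split at the space in `Jane Doe`; the quoted form is passed as one argument.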

Commit: feat: hooks (correct events, learnings capture)

Phase 8: Self-Improvement

Goal: Three-layer self-improvement system (Session Memory + Metric Tracking + Optimization Loop). Spec: docs/spec/self-improvement-guide.md, docs/spec/memory-system-guide.md. Research: completed 2026-03-24; findings in docs/spec/self-improvement-research.md.

P0 — Session Memory:
1. Rewrite maxsim-capture-learnings Stop hook (per-session commits, pattern extraction, pruning)
2. New maxsim-session-start SessionStart hook (MEMORY.md + TSV + git log injection)
3. Stop hook already captures learnings to MEMORY.md ✅ (needs improvement)

P1 — Metric Tracking:
4. TSV logging in execute workflow (7-column autoresearch format)
5. TaskCompleted hook for test-gate enforcement
6. Verify + Guard dual-command pattern in verification workflow

P2 — Quality & Detection:
7. Stuck detection (5 consecutive failures → 6-step escalation)
8. Iron Laws + Anti-Rationalization tables in agent prompts (from Superpowers)
9. <HARD-GATE> tags for non-negotiable verification rules

P3 — Optimization Loop:
10. /maxsim:improve command (optional autoresearch-style loop)
11. Plan wizard for /maxsim:improve setup
12. Noise handling for volatile metrics (median, min-delta)

Commit: feat: self-improvement system (autoresearch + superpowers adapted)

Phase 9: Documentation & Website

Goal: All docs match the new v6 implementation. Spec: All docs/spec/ documents

1. Rewrite USER-GUIDE.md for v6
2. Rewrite INTERNALS.md for v6
3. Update README.md
4. COMPLETELY REWRITE all 33 website documentation articles for v6
   - Remove references to .planning/, /maxsim:milestone, /maxsim:todos, dashboard
   - Only document features that exist in this spec
5. Fix CONTRIBUTING.md (correct lint command, etc.)
6. Update GitHub issue templates
7. Update global CLAUDE.md template
8. Verify all docs match actual code

Commit: docs: complete documentation for v6

Release

1. semantic-release handles versioning (6.0.0 via breaking change commit)
2. CHANGELOG.md auto-updated by semantic-release
3. npm publish (automated via CI)
4. Deploy website (automated via GitHub Pages workflow)
5. Announce

Version strategy: semantic-release is the single source of truth for versioning. The version in core/types.ts, core/version.ts, and templates/templates/config.json must be injected at build time from packages/cli/package.json. No hardcoded version strings.

20. Deep-Dive Specifications

Each section above has a corresponding deep-dive document in docs/spec/ with full technical details, API references, and implementation guidance.

#	Topic	Document	Lines	Key Content
1	GitHub Projects v2 API	`github-projects-v2-api.md`	2,374	Complete REST + GraphQL + gh CLI reference, Sub-Issues API, authentication, pagination
2	GitHub Issue Structure	`github-structure-design.md`	1,855	Board design, issue hierarchy, 6 labels in 2 namespaces (update from 16/4), 9 comment types, IssueOps, GitHub Actions
3	Agent Teams Guide	`agent-teams-guide.md`	1,283	TeamCreate, SendMessage, TeammateIdle/TaskCompleted hooks, 6 coordination patterns
3b	Agent Teams Research	`agent-teams-research.md`	~400	NEW — Consolidated research (20 parallel agents, 2026-03-24): hybrid architecture decision, cost analysis, patterns catalog, community findings
4	Plan Mode Guide	`plan-mode-guide.md`	1,090	EnterPlanMode/ExitPlanMode mechanics, permissionMode:plan, tool restrictions
5	Skills Writing Guide	`skills-writing-guide.md`	1,480	Anthropic skill conventions, frontmatter spec, CSO rules, 12 anti-patterns
6	Skills Specification	`skills-specification.md`	985	All 14 target skills: name, description, structure, agent preloads
7	Memory System Guide	`memory-system-guide.md`	1,340	CLAUDE.md, auto memory, MEMORY.md, subagent memory, feedback loops
8	CLAUDE.md Guide	`claude-md-guide.md`	961	Best practices, template, 200-line limit, path-scoped rules
9	Self-Improvement Guide	`self-improvement-guide.md`	1,151	autoresearch adaptation, 8-phase loop, Verify+Guard, Git-as-Memory
9b	Self-Improvement Research	`self-improvement-research.md`	~350	NEW — Consolidated research (21 agents, 2026-03-24): 3-layer architecture, autoresearch + Superpowers analysis
10	Parallel Execution Guide	`parallel-execution-guide.md`	1,043	Agent tool parameters, batch pattern, worktree isolation, token costs
11	Wave Execution Design	`wave-execution-design.md`	848	Dependency analysis, Kahn's algorithm, adaptive waves, error recovery
12	Competitive Implementation	`competitive-implementation-design.md`	1,136	Best-of-N sampling, 7 scoring criteria, prompt variation, hybrid strategy. Now optional, not default.
13	Verification System	`verification-system-design.md`	1,432	Gate framework, evidence blocks, anti-rationalization, Guard pattern
14	Init Process Design	`init-process-design.md`	1,301	5-phase init, profile-based agent count (not fixed 30+), adaptive interview, GitHub setup
15	Hooks Reference	`hooks-reference.md`	2,319	All 22 hook events, settings.json format, 4 handler types, 12 gotchas
16	Git Worktree Strategy	`git-worktree-strategy.md`	2,007	Worktree lifecycle, merge strategies, conflict resolution, cleanup
17	Claude Code SDK Guide	`claude-code-sdk-guide.md`	1,331	Claude Agent SDK, headless mode, programmatic sessions, @maxsim/sdk

Total specification volume: ~23,936 lines across the 17 numbered documents (the research documents 3b and 9b add roughly 750 more).

This document is the authoritative specification for MaxsimCLI. All code, templates, documentation, and workflows must conform to what is defined here. The deep-dive documents in docs/spec/ provide the technical details for implementation. When in doubt, this document wins.
