Research date: 2026-03-24

Method: 21 parallel research agents analyzing the autoresearch repo, Superpowers docs, existing MaxsimCLI specs, and web sources

Purpose: Redesign MaxsimCLI's Self-Improvement system (PROJECT.md §11)
MaxsimCLI's self-improvement system should adopt autoresearch's 8-phase loop as the core iteration engine and Superpowers' anti-rationalization tables plus two-stage review as the quality enforcement layer. The current `maxsim-capture-learnings` hook is a good start, but it records all-time rather than per-session commits, extracts no patterns, and never prunes MEMORY.md.
Key design decisions:
- Adopt autoresearch's loop as an optional `/maxsim:improve` command: not the default execution mode, but a dedicated optimization workflow
- Fix the capture-learnings hook: track per-session commits (not all-time), extract patterns, enforce the MEMORY.md 200-line limit
- Implement TSV logging: use autoresearch's 7-column format for metric tracking
- Add a SessionStart hook: read git log + MEMORY.md + TSV at session start for context injection
- Adopt Verify + Guard as standard: already instruction-based, needs code enforcement via the TaskCompleted hook
- Add stuck detection: 5 consecutive failures trigger the 6-step escalation (already specified, not implemented)
What autoresearch is: A Claude Code plugin (v1.8.2) that implements Karpathy's autonomous improvement loop for any domain with a measurable metric.
Core loop (8 phases):
- Review state (git log + TSV + diff)
- Ideate (exploit successes, avoid repeated failures)
- Modify (ONE atomic change)
- Commit (before verify — change is recorded regardless of outcome)
- Verify (run metric command, extract number)
- Guard (regression check — optional but recommended)
- Decide (keep/revert/rework/crash-recover)
- Log (append to TSV, check stuck condition)
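The Verify, Guard, and Decide phases can be sketched as a small shell function. This is a minimal sketch, not autoresearch's actual implementation: the function name `decide`, its argument order, and the `keep`/`discard` outputs are assumptions made for illustration, and it covers only the keep-or-discard part of the decision (not rework or crash recovery).

```bash
#!/bin/bash
# Hypothetical helper: decide whether to keep an iteration.
# Usage: decide <direction> <baseline> <metric> <guard_status>
#   direction    : lower_is_better | higher_is_better
#   guard_status : pass | fail (result of the optional regression check)
decide() {
  local direction=$1 baseline=$2 metric=$3 guard=$4
  # A failed guard always discards, even if the metric improved.
  if [ "$guard" != "pass" ]; then
    echo discard
    return
  fi
  # awk handles the floating-point comparison that [ ] cannot.
  awk -v d="$direction" -v b="$baseline" -v m="$metric" 'BEGIN {
    improved = (d == "lower_is_better") ? (m < b) : (m > b)
    print (improved ? "keep" : "discard")
  }'
}
```

A `discard` result would then be followed by a `git revert` of the experiment commit, so the failed attempt stays in history.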
Key patterns:
- `git revert` over `git reset --hard`: preserves failed experiments for learning
- All commits use the `experiment(<scope>):` prefix
- TSV is gitignored: local only
- Progress summary every 10 iterations
- Noise handling: median of 3-5 runs for volatile metrics, plus a min-delta threshold
- Stuck detection: 5 consecutive discards → 6-step escalation
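The noise-handling pattern can be illustrated with two small helpers. This is a sketch under assumptions: the names `median` and `exceeds_min_delta` are hypothetical, the metric is taken to be lower-is-better, and the run count and threshold values are whatever the user chooses.

```bash
#!/bin/bash
# Hypothetical helper: print the median of the numbers given as arguments.
median() {
  [ $# -gt 0 ] || return 1
  printf '%s\n' "$@" | LC_ALL=C sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# Hypothetical helper: succeed only if the metric beats the baseline
# by at least min_delta (lower is better).
# Usage: exceeds_min_delta <baseline> <metric> <min_delta>
exceeds_min_delta() {
  awk -v b="$1" -v m="$2" -v d="$3" 'BEGIN { exit !((b - m) >= d) }'
}
```

For a volatile metric, the verify command would be run several times, the results passed to `median`, and the change kept only if `exceeds_min_delta` succeeds.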
9 commands: /autoresearch (core loop), :plan, :debug, :fix, :security, :ship, :scenario, :predict, :learn
Architecture: Pure markdown/instruction-based. No hooks, no external processes. Plugin manifest + commands + skills with lazy-loaded references.
What Superpowers is: A plugin-based meta-prompting system for Claude Code with 13 skills focused on development quality.
Key patterns for self-improvement:
- Anti-rationalization tables: 10 forbidden phrases + pre-empted excuses in every skill
- Two-stage review per task: Spec Compliance Review → Code Quality Review (never trust the implementer!)
- Evidence-based verification: CLAIM/EVIDENCE/OUTPUT/VERDICT blocks — no "should work" allowed
- Fresh subagent per task: Zero context pollution between tasks
- 1% skill invocation rule: If even 1% chance a skill applies, MUST invoke it
- Iron Laws: `<HARD-GATE>` tags for non-negotiable rules
Architecture: SessionStart hook injects using-superpowers skill. All other skills loaded on-demand. No persistent memory system — Git history + committed docs only.
Critical difference from autoresearch: Superpowers is quality-enforcement (making sure work is correct). autoresearch is optimization (making a metric improve). MaxsimCLI needs both.
What works:
- `maxsim-capture-learnings` Stop hook writes to MEMORY.md ✅
- Verify + Guard pattern as instructions in skills/workflows ✅
- 4-gate verification (Input → Pre-Action → Completion → Quality) ✅
What's broken or missing:
- Hook only records the last 5 commits (all-time, not per-session), which is misleading
- No TSV logging: no metric tracking
- No SessionStart hook: no context injection
- No stuck detection: no escalation
- No `/maxsim:improve` command: no dedicated optimization workflow
- MEMORY.md grows endlessly: no 200-line pruning
- `stop_reason` is ignored by the hook
- TSV path inconsistency (guide says root, PROJECT.md says agent-memory/)
- TSV format inconsistency (guide has 7 columns, reference has 5)
| Layer | Mechanism | Frequency | Source |
|---|---|---|---|
| Session Memory | MEMORY.md via Stop hook | Every session | Superpowers philosophy |
| Metric Tracking | autoresearch-results.tsv | Every task/phase | autoresearch |
| Optimization Loop | /maxsim:improve command | On-demand | autoresearch core loop |
The maxsim-capture-learnings hook needs these improvements:
Current: Records last 5 commits (all-time), no pattern extraction
Target: Records THIS session's commits only, extracts patterns, prunes MEMORY.md
Implementation:
- Record `session_start_commit` at SessionStart (save the HEAD sha)
- At Stop, run `git log {session_start_commit}..HEAD --oneline` to get only THIS session's commits
- Extract patterns: count of changes by type (feat/fix/refactor), files most modified
- Prune MEMORY.md to stay under 180 lines (leave a 20-line buffer before the 200-line hard limit)
- Use `stop_reason` to differentiate clean exits from crashes
- Write a structured entry with: date, session_id, commit_count, commit_list, patterns, stop_reason
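Two of the steps above, pattern extraction and pruning, can be sketched in shell. This is only an illustration, not the hook's actual code (the real hook is TypeScript): the helper names are hypothetical, commit subjects are assumed to follow the `type(scope): subject` convention, and MEMORY.md is assumed to be append-only so that the newest entries are at the end.

```bash
#!/bin/bash
# Hypothetical helper: count commits by conventional-commit type.
# Reads `git log --oneline` output on stdin: "<sha> <type>(<scope>): <subject>".
# Subjects that do not follow the convention are counted under their first word.
count_commit_types() {
  awk '{
    type = $2
    sub(/[!(:].*/, "", type)   # strip "(scope)", "!" and ":" suffixes
    count[type]++
  }
  END { for (t in count) printf "%s=%d\n", t, count[t] }' | LC_ALL=C sort
}

# Hypothetical helper: keep only the newest max_lines lines of a memory file.
# Usage: prune_memory <file> [max_lines]   (default 180)
prune_memory() {
  local file=$1 max_lines=${2:-180}
  [ -f "$file" ] || return 0
  if [ "$(wc -l < "$file")" -gt "$max_lines" ]; then
    tail -n "$max_lines" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  fi
}
```

At Stop, the session's commits could be piped in with `git log "$session_start_commit"..HEAD --oneline | count_commit_types`, followed by `prune_memory MEMORY.md` after the new entry is written.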
Single authoritative format (adopted from autoresearch, tab-separated columns):

    # metric_direction: lower_is_better
    iteration commit metric delta guard status description

Single authoritative path: `.claude/agent-memory/maxsim-learner/autoresearch-results.tsv` (gitignored)
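Appending a row in this format could look like the following. It is a sketch under assumptions: `log_result` is a hypothetical name, the direction comment is hard-coded to `lower_is_better` for brevity, and the status values (`keep`, `discard`, `crash`) are illustrative rather than taken from the autoresearch reference.

```bash
#!/bin/bash
# Hypothetical helper: append one result row, creating the file with its header first.
# Usage: log_result <file> <iteration> <commit> <metric> <delta> <guard> <status> <description>
log_result() {
  local file=$1; shift
  if [ $# -ne 7 ]; then
    echo "log_result: expected 7 fields, got $#" >&2
    return 1
  fi
  if [ ! -s "$file" ]; then
    mkdir -p "$(dirname "$file")"
    printf '# metric_direction: lower_is_better\n' > "$file"
    printf 'iteration\tcommit\tmetric\tdelta\tguard\tstatus\tdescription\n' >> "$file"
  fi
  # Join the seven fields with tabs; a tab inside the description would break the format.
  local IFS=$'\t'
  printf '%s\n' "$*" >> "$file"
}
```

For example, `log_result "$tsv" 1 abc1234 12 -3 pass keep "remove unused import"` writes the header on first use and then one seven-column row.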
When the TSV is written:
- After each task execution in `/maxsim:execute`: metric = test pass rate or build success
- After each iteration in `/maxsim:improve`: metric = user-defined
- After each phase verification: metric = verification gate results
A new maxsim-session-start hook that fires on SessionStart:
1. Read git log --oneline -20
2. Read MEMORY.md (first 200 lines)
3. Read last 10 TSV entries (if file exists)
4. Output as additional_context for Claude
This gives every session instant context about recent work, learned patterns, and metric trends.
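The four steps can be sketched as a single shell function. This is a rough illustration rather than the planned hook: `build_session_context` and the section titles are hypothetical, and it assumes that whatever the hook prints to standard output is what gets injected as context, which should be checked against the Claude Code hooks documentation.

```bash
#!/bin/bash
# Hypothetical maxsim-session-start sketch: print context for the new session.
# Usage: build_session_context <repo_dir> <memory_file> <tsv_file>
build_session_context() {
  local repo=$1 memory=$2 tsv=$3
  echo "## Recent commits"
  # Outside a git repository this section is simply left empty.
  git -C "$repo" log --oneline -20 2>/dev/null || true
  echo "## MEMORY.md"
  [ -f "$memory" ] && head -n 200 "$memory"
  echo "## Recent results"
  # Skip the comment line and the header row, then keep the last 10 entries.
  [ -f "$tsv" ] && grep -v -e '^#' -e '^iteration' "$tsv" | tail -n 10
  return 0
}
```

Missing files are skipped silently, so a fresh project without MEMORY.md or a TSV still gets its recent commits.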
Verify + Guard is already instruction-based. Add code enforcement via the TaskCompleted hook:

    #!/bin/bash
    # maxsim-task-completed hook: block task completion while tests fail
    if ! npm test 2>&1; then
      echo "Tests not passing. Cannot complete task." >&2
      exit 2  # blocks completion, feeds stderr back to the agent
    fi
    exit 0

For phase-level enforcement, the TeammateIdle hook enforces the Guard:

    #!/bin/bash
    # maxsim-teammate-idle hook: keep the teammate working while the build is broken
    if ! npm run build 2>&1; then
      echo "Build broken. Fix before going idle." >&2
      exit 2
    fi
    exit 0

Implement stuck detection in the orchestrator workflow:
After each task completion:
Read last 5 TSV entries
If 5 consecutive discards/crashes:
1. Re-read ALL in-scope files (full context reload)
2. Re-read original goal/phase description
3. Review entire TSV log for patterns
4. Try combining 2-3 successful past changes
5. Try the OPPOSITE approach
6. Try a radical architectural change
7. If still stuck → create diagnostic GitHub Issue + escalate to user
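The trigger condition for this escalation can be checked mechanically. The sketch below assumes the 7-column TSV with the status in column 6 and the status values `discard` and `crash`; the function name `is_stuck` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the last <threshold> rows are all discards or crashes.
# Usage: is_stuck <tsv_file> [threshold]   (default 5)
is_stuck() {
  local tsv=$1 threshold=${2:-5}
  [ -f "$tsv" ] || return 1
  # Drop the comment line and header row, then inspect only the newest rows.
  grep -v -e '^#' -e '^iteration' "$tsv" | tail -n "$threshold" |
    awk -F'\t' -v n="$threshold" '
      $6 == "discard" || $6 == "crash" { bad++ }
      END { exit !(NR == n && bad == n) }'
}
```

The orchestrator could call `is_stuck "$tsv"` after each task and start the escalation steps only when it succeeds; fewer than five logged rows never count as stuck.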
A dedicated optimization command that runs the autoresearch loop:
/maxsim:improve
Goal: Reduce TypeScript errors to zero
Scope: src/**/*.ts
Verify: npx tsc --noEmit 2>&1 | grep error | wc -l
Guard: npx vitest run
Iterations: 20
This is NOT part of the standard /maxsim:execute flow. It's a separate, optional tool for when the user wants autonomous metric optimization.
Patterns from autoresearch:

| Pattern | Adopt? | How |
|---|---|---|
| 8-phase loop | Yes | As /maxsim:improve command |
| TSV logging (7-column format) | Yes | Standard format for all metric tracking |
| git revert over git reset | Yes | Already in verification skill |
| Stuck detection (5 fails → escalation) | Yes | In orchestrator workflow |
| Noise handling (median, min-delta) | Yes | For volatile metrics |
| Plan wizard | Partially | Integrate into /maxsim:improve setup |
| experiment() commit prefix | Yes | For autoresearch iterations |
| Progress summary every 10 iterations | Yes | In loop output |
Patterns from Superpowers:

| Pattern | Adopt? | How |
|---|---|---|
| Anti-rationalization tables | Yes | Already in verification skill |
| Two-stage review (Spec + Quality) | Yes | As opt-in during execute |
| Evidence blocks (CLAIM/EVIDENCE/OUTPUT/VERDICT) | Yes | Already in verification skill |
| Fresh subagent per task | Yes | Already in executor workflow |
| 1% skill invocation rule | No | Too aggressive for MaxsimCLI's use case |
| Iron Laws / HARD-GATE tags | Yes | For critical verification rules |
| SessionStart skill injection | Partially | Use SessionStart hook for context, not full skill injection |
| Pattern | Why Not |
|---|---|
| autoresearch's 9 subcommands | Too many — MaxsimCLI has its own command set |
| Superpowers' zero-memory philosophy | MaxsimCLI needs persistent project memory (GitHub) |
| autoresearch's Plugin manifest system | MaxsimCLI uses its own install system |
| Karpathy's GPU-specific patterns | Not applicable to code quality metrics |
| Issue | Resolution |
|---|---|
| TSV path (root vs agent-memory/) | Use .claude/agent-memory/maxsim-learner/autoresearch-results.tsv — consistent with MEMORY.md location |
| TSV format (7 cols vs 5 cols) | Use 7-column autoresearch format — it's more detailed and proven |
| Retry count (3 vs 4 total attempts) | Use 3 total attempts (max 2 retries) — matches executor workflow |
| /maxsim:improve not in 9 commands | Add as 10th command OR implement as a workflow within /maxsim:quick |
| Capture-learnings hook too simple | Rewrite with per-session tracking, pattern extraction, pruning |
| self-improvement-guide.md discrepancies | Rewrite §11 of PROJECT.md to match this research, then update the guide |
| Priority | Item | Effort |
|---|---|---|
| P0 | Fix capture-learnings hook (per-session commits, pruning) | Small |
| P0 | Add SessionStart hook (git log + MEMORY.md injection) | Small |
| P1 | Implement TSV logging in execute workflow | Medium |
| P1 | Add TaskCompleted hook for test-gate enforcement | Small |
| P2 | Implement stuck detection in orchestrator | Medium |
| P2 | Add /maxsim:improve command (autoresearch loop) | Large |
| P3 | Noise handling for volatile metrics | Small |
| P3 | Plan wizard for /maxsim:improve | Medium |
In PROJECT.md §11, replace the current `[NEEDS RESEARCH]` section with the architecture described in §3 above. Key changes:
- Remove the `[NEEDS RESEARCH]` tag
- Describe the three-layer architecture (Session Memory + Metric Tracking + Optimization Loop)
- Document TSV format (7-column, single authoritative path)
- Document SessionStart hook
- Document improved capture-learnings hook
- Document stuck detection
- Reference `/maxsim:improve` as an optional 10th command
- Link to this research document and the autoresearch guide
| Source | URL/Path |
|---|---|
| autoresearch repo | github.com/uditgoenka/autoresearch (cloned to /tmp/autoresearch/) |
| autoresearch index.md | claude-plugin/skills/autoresearch/index.md |
| autoresearch loop protocol | claude-plugin/skills/autoresearch/references/autonomous-loop-protocol.md |
| autoresearch results logging | claude-plugin/skills/autoresearch/references/results-logging.md |
| autoresearch core principles | claude-plugin/skills/autoresearch/references/core-principles.md |
| Superpowers reference | docs/superpowers-reference/ |
| Superpowers research | docs/superpowers-research.md |
| MaxsimCLI self-improvement guide | docs/spec/self-improvement-guide.md |
| MaxsimCLI memory system guide | docs/spec/memory-system-guide.md |
| MaxsimCLI verification skill | templates/skills/verification/index.md |
| MaxsimCLI project-memory skill | templates/skills/project-memory/index.md |
| MaxsimCLI capture-learnings hook | packages/cli/src/hooks/maxsim-capture-learnings.ts |
| MaxsimCLI self-improvement ref | templates/references/self-improvement.md |
| MaxsimCLI verification patterns | templates/references/verification-patterns.md |
| Claude Code hooks docs | https://code.claude.com/docs/en/hooks |
| Claude Code memory docs | Described in docs/spec/memory-system-guide.md |