A 6-phase AI pipeline for shipping features into existing codebases — without the "the agent confidently broke prod" failure mode.
Each phase has a hard-constraint block the AI reads before any creative work, every verification finding quotes file:line evidence from your code, and a multi-model consensus (Claude + GPT-4o + Gemini) runs before the first design decision. Ships as a Claude Code plugin.
What you write: a 2–3 line feature description. What it produces: consensus synthesis → architecture plan with blast-radius analysis → code → tests (security-first) → verification report with confidence-scored findings → executed smoke tests → deployment plan. Each artifact is a file in your repo, so the full decision trail is committable.
See it before installing: docs/example-task/ walks through one feature end-to-end — every artifact from requirements through delivery, including the consensus synthesis, an architecture decomposition, a verification finding with file:line evidence, and executed smoke tests with side-effect checks.
motspilot is built for engineers integrating into production codebases — not for greenfield prototyping. If your codebase is younger than its first incident, a single-model agent is probably fine.
Two components:
-
motspilot.sh— Shell utility. Manages named tasks, requirements, state, and artifacts. Does not invoke AI directly. -
Claude Code — The AI orchestrator. Reads the work order and runs each phase as a Task subagent. You approve each phase before it proceeds. Tasks auto-archive on completion.
Claude Code is the engine. motspilot.sh is the filing system.
- Claude Code — the AI orchestrator that runs the pipeline phases
- Bash 4.0+ — the shell script uses associative arrays
- Linux: you almost certainly have bash 4+ already
- macOS: ships with bash 3.2 — install a newer version with
brew install bash - Windows: use Git Bash, WSL, or any bash 4+ environment
yqv4.52+ — parses YAML frontmatter from phase prompts- Install:
wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O ~/.local/bin/yq && chmod +x ~/.local/bin/yq - macOS:
brew install yq
- Install:
From any Claude Code session:
/plugin marketplace add motsmanish/motspilot
/plugin install motsThen in your target project:
/mots:init # One-time setup
/mots:pilot add login throttling # Run the pipeline
/mots:status # Check progress
/mots:view verify # Read verification report
/mots:archive --task=add-login-throttling # Archive when doneRequires: WSL, macOS, or Linux (bash). Consensus phase optionally requires PHP 8+ and API keys for Claude, GPT-4o, and Gemini.
git clone https://github.com/motsmanish/motspilot.gitSee CONTRIBUTING.md for development setup. Alternative integration methods (symlink, submodule) are documented there.
# 1. Initialize (first time only)
./motspilot/motspilot.sh init
# Edit .motspilot/config with your project's language, framework, and test command
# 2. Create a task and prepare the pipeline
./motspilot/motspilot.sh go --task=csv-export "Add CSV export to the reports page"
# 3. In Claude Code chat:
run motspilot pipelineClaude Code orchestrates all 6 phases (consensus → architecture → development → testing → verification → delivery) and archives the task automatically when delivery is approved. By default, phases run without pausing (AUTO_APPROVE="all"). Set AUTO_APPROVE="none" in .motspilot/config to approve each phase before it proceeds.
motspilot works with any framework. Framework-specific knowledge lives in prompts/frameworks/:
| Guide | Framework | Status |
|---|---|---|
cakephp.md |
CakePHP 4.x | Shipped |
plain-php.md |
Plain PHP (no framework) | Shipped |
laravel.md |
Laravel 11.x+ | Shipped |
All shipped guides include side-effect-asserting smoke-test templates (status-code-only curl is no longer accepted by the delivery phase), <framework_tool_affinity> blocks that route the AI toward correct tool usage, and point at prompts/delivery.md section 3.2 for the smoke-test execution gate. The plain-PHP guide is structured around two callouts — Shape A (page-file) and Shape B (PDS-skeleton) — and includes webserver isolation rules. The Laravel guide covers Eloquent, Form Requests, Policies, Events/Queues, Pest/PHPUnit, and artisan commands.
Without a guide for your framework, the pipeline runs on framework-agnostic reasoning and discovers patterns from your codebase as it goes. Adding a guide is a one-file contribution — see Adding a Framework Guide below.
Community contributions welcome. See CONTRIBUTING.md for how to write a framework guide. Guides for Django, Rails, Next.js, Express, and others would be valuable additions.
To add support for a new framework, create prompts/frameworks/<name>.md and set FRAMEWORK="<name>" in .motspilot/config.
./motspilot.sh go --task=billing-fix "Fix incorrect subtotal on invoices"
# Then in Claude Code: run motspilot pipeline# Current task is paused automatically when you start a new one.
# Artifacts are preserved on disk.
./motspilot.sh go --task=high-priority "Fix critical login bug"
# Then in Claude Code: run motspilot pipeline./motspilot.sh go --task=csv-export --from=development
# Then in Claude Code: run motspilot pipeline./motspilot.sh reactivate csv-export
./motspilot.sh go --task=csv-export --from=development
# Then in Claude Code: run motspilot pipeline./motspilot.sh tasks # active tasks
./motspilot.sh tasks --all # active + archivedgo --task=name → pending
pipeline starts → in_progress
delivery approved → archived (automatic)
reactivate → in_progress (again)
Tasks auto-archive when the delivery phase is approved. You never lose the artifacts — they move from tasks/ to archive/ and are retrievable anytime.
| Command | Description |
|---|---|
./motspilot.sh init |
First-time setup |
./motspilot.sh go --task=<name> "description" |
Create task + prepare pipeline |
./motspilot.sh go --task=<name> |
Re-prepare existing task |
./motspilot.sh go --task=<name> --from=<phase> |
Re-run pipeline from a specific phase |
./motspilot.sh tasks |
List all active tasks with phase progress |
./motspilot.sh tasks --all |
List active + archived tasks |
./motspilot.sh status [--task=<name>] |
Detailed phase status for a task |
./motspilot.sh archive --task=<name> |
Manually archive a task |
./motspilot.sh reactivate <name> |
Restore task from archive |
./motspilot.sh reset --task=<name> |
Delete phase artifacts (keeps requirements) |
./motspilot.sh view <phase> [--task=<name>] |
View a phase artifact |
./motspilot.sh mem-check |
Check memory index health (line/byte caps, staleness) |
Phase names: architecture · development · testing · verification · delivery
View shortcuts: req · arch · dev · test · verify · wo (workorder)
You write the requirements. motspilot runs 6 AI phases:
Requirements (input) → Consensus → Architecture → Development → Testing → Verification → Delivery
| # | Phase | What it does | Writes code? |
|---|---|---|---|
| 1 | Consensus | Fans out the requirements to Claude + GPT-4o + Gemini, synthesizes a starting point | No |
| 2 | Architecture | Explores codebase, designs feature, traces blast radius | No |
| 3 | Development | Implements the feature — schema, models, controllers, views | Yes |
| 4 | Testing | Writes comprehensive tests (security-first) | Yes |
| 5 | Verification | Senior code review — security, correctness, framework patterns | No |
| 6 | Delivery | Executes smoke tests, deployment steps, rollback plan, git commit message | No |
Each phase uses a thinking framework (in prompts/) — not a checklist. Every phase prompt includes YAML frontmatter (parsed by yq), a <hard_constraints> block of non-negotiable rules read before any creative work, and a structured 12-item <completion_checklist> the phase must self-report against. Downstream phases consume tight <summary> blocks, not full reasoning history. Verification quotes file:line evidence; Delivery executes smoke tests with both entry-point and side-effect checks before marking a task complete.
For the full catalogue of patterns applied across phases — structure, reasoning gates, output discipline, severity / confidence / consistency tiers, and cross-phase contracts — see docs/prompt-engineering.md.
Each phase also receives a framework guide (in prompts/frameworks/) if one exists for your framework.
| Situation | Action |
|---|---|
| New unrelated feature | New task: go --task=new-name "description" |
| Bug in a recently completed feature | Reactivate: reactivate <name>, re-run from development |
| Bug that reveals a design flaw | New task (architecture needs rethinking) |
| Bug in old unrelated code | New task |
motspilot/ # The tool (symlink, submodule, or clone)
├── motspilot.sh # Shell utility
├── PIPELINE_ORCHESTRATOR.md # Claude Code orchestration instructions
├── README.md # This file
├── CLAUDE.md # Quick reference for Claude Code
├── prompts/
│ ├── _xml_tags.md # Canonical reference for all XML tag names
│ ├── architecture.md # Architecture thinking framework (YAML frontmatter)
│ ├── development.md # Development thinking framework (YAML frontmatter)
│ ├── testing.md # Testing thinking framework (YAML frontmatter)
│ ├── verification.md # Verification thinking framework (YAML frontmatter)
│ ├── delivery.md # Delivery thinking framework (YAML frontmatter)
│ └── frameworks/ # Framework-specific guides
│ ├── cakephp.md # CakePHP 4.x guide
│ ├── laravel.md # Laravel 11.x+ guide
│ └── plain-php.md # Plain PHP (no framework) guide
└── .motspilot/
├── config # Project settings (edit this)
├── current_task # Name of currently active task
└── logs/
└── motspilot.log
Task data lives in the workspace — either inside .motspilot/workspace/ (default) or in your project repo via WORKSPACE_DIR:
<workspace>/ # .motspilot/workspace/ OR <WORKSPACE_DIR>/
├── tasks/
│ └── <task-name>/
│ ├── meta # STATUS, DESCRIPTION, CREATED
│ ├── 01_requirements.md
│ ├── 02_architecture.md
│ ├── 03_development.md
│ ├── 04_testing.md
│ ├── 05_verification.md
│ ├── 06_delivery.md
│ ├── checkpoint # Current phase state
│ └── pipeline_workorder.md
└── archive/
└── <task-name>/ # Same structure, auto-moved on completion
Edit motspilot/.motspilot/config:
LANGUAGE="php" # Language: php, python, javascript, typescript, go, ruby, java, etc.
LANGUAGE_VERSION="8.2" # Language version
FRAMEWORK="cakephp" # Framework: cakephp, laravel, django, nextjs, express, rails, etc.
PROJECT_ROOT=".." # Path to your project root (relative to motspilot/)
AUTO_APPROVE="all" # "all" = fully automatic, "none" = approve each phase
TEST_CMD="./vendor/bin/phpunit" # How to run tests
APP_URL="http://localhost:8080" # For smoke testing (verification phase)
WORKSPACE_DIR="" # Store task data in your project repo (see below)By default, task artifacts (requirements, architecture docs, development summaries, etc.) are stored inside motspilot/.motspilot/workspace/. This directory is gitignored in the motspilot repo, so task data is not version-controlled.
If you want to preserve task history in your project's git repository, set WORKSPACE_DIR to a directory inside your project:
# In .motspilot/config
WORKSPACE_DIR="motspilot-data"This tells motspilot to store tasks/ and archive/ at <PROJECT_ROOT>/motspilot-data/ instead of inside the motspilot tool directory. Then commit motspilot-data/ to your project's git repo:
your-project/
├── motspilot/ # symlink/submodule (gitignored or submodule)
├── motspilot-data/ # committed to YOUR repo
│ ├── tasks/
│ │ └── csv-export/
│ │ ├── 01_requirements.md
│ │ ├── 02_architecture.md
│ │ └── ...
│ └── archive/
│ └── completed-feature/
└── src/
Why this matters:
- Task data survives even if motspilot is uninstalled or the symlink breaks
- Architecture decisions and development summaries are part of your project's history
- Other team members can read past task artifacts without running motspilot
git log motspilot-data/shows the evolution of feature development decisions
To add support for a new framework:
- Create
motspilot/prompts/frameworks/<framework-name>.md - Include these sections:
- Version reference — correct vs incorrect API for the framework version
- Naming conventions — how files, classes, and routes are named
- Files to explore — key landmark files for the architecture phase
- Migration/schema patterns — how to create and rollback schema changes
- Model/entity patterns — access control, validation, relationships
- Service/business logic patterns — where business logic lives
- Controller/handler patterns — request handling conventions
- Template/view patterns — output escaping, form helpers
- Test patterns — test setup, fixtures, security test examples
- Verification checks — grep patterns to catch common mistakes
- Deployment commands — deploy, rollback, cache clear
- Set
FRAMEWORK="<framework-name>"in.motspilot/config
See prompts/frameworks/cakephp.md as a reference.
Before the pipeline phases begin, motspilot runs a Multi-Model Consensus step. It fans out the task requirements to 3 LLMs (Claude, GPT-4o, Gemini) in parallel, collects their responses, and synthesizes a single authoritative starting point via a Claude judge using a 3-pass process (Extract → Reconcile → Synthesize) with a completeness contract ensuring no unique insight is lost. The synthesis uses a 9-section structured format with Agreed/Split/Risks/Scope sections. Split decisions auto-resolve by majority when AUTO_APPROVE=all. The synthesized output is then fed as additional context into every pipeline phase.
The consensus script is a standalone PHP CLI — no framework dependencies, just PHP 8.0+ with curl:
# Direct usage (the pipeline runs this automatically):
php motspilot/bin/consensus.php --prompt-file=prompt.txt --phase=architecture > output.md
# Or via stdin:
echo "Design a caching strategy" | php motspilot/bin/consensus.php --phase=generalSetup: Set ANTHROPIC_API_KEY, OPENAI_API_KEY, and GEMINI_API_KEY in motspilot/.env.
Fault-tolerant by design: if 1–2 APIs fail, synthesis proceeds with the remaining responses. If all 3 fail, the pipeline continues without consensus and logs the gap for review. Consensus strengthens the starting point but is not a hard dependency for the rest of the pipeline to run.
Dynamic timeouts: API timeouts scale automatically based on prompt length (90s floor, +5s per 1K chars, 300s ceiling). Judge calls get 1.5x. No hardcoded limits — large requirements documents get proportionally more time. Human-readable error messages for timeouts, DNS failures, and connection issues.
- Start with the user — think about who uses the feature before touching code
- Explore before building — read existing code, do not assume
- Trace blast radius — know what breaks before you change it
- Layered build order — Foundation → Logic → Interface, verifying each against the architecture
- Security mindset — think like an attacker at every step
- Never greenfield — always integrating into existing apps, matching existing patterns
- One action per migration — never combine unrelated changes in a single migration
motspilot was built for production work I do day-to-day and shared because it may help others doing similar work. It's maintained alongside that primary work — issue and PR response times may be longer than a full-time-staffed project, and community contributions (especially framework guides and bug fixes) are very welcome. See CONTRIBUTING.md.
- motspilot requires your own Claude Code subscription or API access. It does not include, bundle, or redistribute any Anthropic software.
- motspilot stores all data locally on your machine. It does not collect or transmit any data. When the pipeline runs, task data is sent to Anthropic's servers by Claude Code under Anthropic's privacy policy.
- All AI-generated code must be reviewed by qualified developers before deployment. The authors are not responsible for code produced by AI models.
- Security testing instructions are intended for testing your own applications only.