The Autonomous Delivery Platform
From labeled GitHub issue to merged PR — with 18 new autonomous agents orchestrating every step.
- Shipwright Builds Itself
- Code Factory Pattern
- What's New in v3.2.4
- How It Works
- Install
- Quick Start
- Features
- Commands
- Pipeline Templates for Teams
- Configuration
- Prerequisites
- Architecture
- Contributing
- License
This repo uses Shipwright to process its own issues. Label a GitHub issue with shipwright and the autonomous pipeline takes over: semantic triage, plan, design, build, test, review, quality gates, PR. No human in the loop.
See it live | Create an issue and watch it build.
Shipwright implements the complete Code Factory control-plane pattern — where agents write 100% of the code and the repo enforces deterministic, risk-aware checks before every merge. Every decision is traceable to policy. Every merge is backed by machine-verifiable evidence.
Agent writes code → Risk policy gate → Tier-appropriate CI → Code review agent
→ Findings auto-remediated → SHA-validated evidence → Bot threads cleaned → Merge
→ Incidents feed back into harness coverage
| Code Factory Layer | Shipwright Implementation |
|---|---|
| Single contract | config/policy.json — risk tiers, merge policy, docs drift, evidence specs, harness SLAs in one file |
| Preflight gate | risk-policy-gate.yml classifies risk from changed files before expensive CI runs |
| SHA discipline | All checks, reviews, and approvals validated against current PR head — stale evidence is never trusted |
| Rerun writer | sw-review-rerun.sh — SHA-deduped, single canonical writer, no duplicate bot comments |
| Remediation loop | review-remediation.yml — agent reads findings, patches code, validates, pushes fix to same branch |
| Bot thread cleanup | auto-resolve-threads.yml — resolves bot-only threads after clean rerun, never touches human threads |
| Evidence framework | sw-evidence.sh — browser, API, database, CLI, webhook, and custom evidence with freshness enforcement |
| Harness-gap loop | shipwright incident gap — every regression creates a test case with SLA tracking |
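The preflight gate in the table classifies risk from the list of changed files. A minimal sketch of that idea follows; the function name `classify_risk_tier`, the tier names, and the path patterns are all illustrative assumptions and are not taken from `config/policy.json` or `risk-policy-gate.yml`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive a risk tier from changed file paths.
# Tier names and path patterns are assumptions for illustration only.

classify_risk_tier() {
  local tier="low" path
  for path in "$@"; do
    case "$path" in
      .github/workflows/*|config/policy.json)
        # A single critical path forces the highest tier immediately.
        echo "high"
        return 0
        ;;
      scripts/*|dashboard/*)
        # Remember the elevated tier but keep scanning for critical paths.
        tier="medium"
        ;;
    esac
  done
  echo "$tier"
}

# Example (paths containing spaces would need safer handling):
# classify_risk_tier $(git diff --name-only origin/main...HEAD)
```

Because the tier is computed from file paths alone, a gate like this can run before any expensive CI job starts.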
Shipwright extends the Code Factory pattern with capabilities most implementations don't have:
- 12-stage pipeline with self-healing builds, adversarial review, and compound quality gates
- Predictive risk scoring using GitHub signals (security alerts, contributor expertise, file churn)
- Persistent memory — failure patterns, fix effectiveness, and prediction accuracy compound over time
- Auto-learning — self-optimize runs automatically after every pipeline completion, including context efficiency tuning
- Decision engine — tiered autonomous decisions with outcome learning and deduplication
- Unified model routing — single source of truth for model selection across all components
- Evidence-gated merges — SHA discipline ensures all evidence is validated against the current PR head
- Semantic quality audits — Claude-powered audits with grep fallback when Claude is unavailable
- 18 autonomous agents with specialized roles (PM, reviewer, security auditor, test generator, etc.)
- Cross-platform compatibility — portable date helpers, file_mtime, and compat layer for macOS/Linux
- Fleet operations — the Code Factory pattern applied across every repo in your org
- Cost intelligence — per-pipeline cost tracking, budget enforcement, adaptive model routing
- Self-optimization — DORA metrics analysis auto-tunes daemon config and template weights
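SHA discipline, as described above, means a merge is allowed only when every piece of evidence was recorded against the current PR head. The sketch below shows one way to express that check; `evidence_is_current` and `all_evidence_current` are hypothetical helper names, and how the SHAs are stored is left open.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of SHA discipline. Function names are assumptions.

# evidence_is_current HEAD_SHA EVIDENCE_SHA
# Succeeds only when the evidence SHA matches the PR head SHA.
evidence_is_current() {
  local head_sha="$1" evidence_sha="$2"
  # Empty values count as stale: missing evidence is never trusted.
  [ -n "$head_sha" ] && [ -n "$evidence_sha" ] && [ "$head_sha" = "$evidence_sha" ]
}

# all_evidence_current HEAD_SHA EVIDENCE_SHA...
# Succeeds only when at least one SHA is given and all of them match.
all_evidence_current() {
  local head_sha="$1" sha
  shift
  [ "$#" -gt 0 ] || return 1
  for sha in "$@"; do
    evidence_is_current "$head_sha" "$sha" || return 1
  done
  return 0
}

# Example:
# all_evidence_current "$(git rev-parse HEAD)" "$ci_sha" "$review_sha"
```

A new push changes the head SHA, so every earlier check, review, and approval fails this test until it is rerun.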
# Evidence framework — capture and verify all types
npm run harness:evidence:capture # All collectors (browser, API, DB, CLI)
npm run harness:evidence:capture:api # API endpoints only
npm run harness:evidence:capture:cli # CLI commands only
npm run harness:evidence:capture:database # Database checks only
npm run harness:evidence:verify # Verify manifest + freshness
npm run harness:evidence:pre-pr # Capture + verify in one step
# Risk and policy
npm run harness:risk-tier
# Incident-to-harness loop
shipwright incident gap list
shipwright incident gap sla
Full Code Factory documentation
Code Factory pattern — deterministic, risk-aware agent delivery with machine-verifiable evidence:
- Risk policy gate — PR-level preflight classifies risk tier from changed files; blocks before expensive CI
- SHA discipline — All evidence validated against current PR head SHA; stale evidence never trusted
- Evidence framework — 6 collector types (browser, API, database, CLI, webhook, custom) with freshness enforcement
- Review remediation — Agent reads review findings, patches code, validates, pushes fix commit in-branch
- Auto-resolve bot threads — Bot-only PR threads cleaned up after clean rerun; human threads untouched
- Harness-gap loop — Every incident creates a test case requirement with SLA tracking (P0: 24h, P1: 72h)
- Policy contract v2 — Risk tiers, merge policy, docs drift rules, evidence specs, harness SLAs in one file
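The harness-gap SLAs above (P0: 24h, P1: 72h) reduce to simple deadline arithmetic. The following sketch assumes timestamps are Unix epoch seconds; the function names are hypothetical, and priorities other than P0 and P1 are rejected because the text does not specify them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of harness-gap SLA tracking. Names are assumptions.

# sla_hours PRIORITY -> prints the SLA window in hours.
sla_hours() {
  case "$1" in
    P0) echo 24 ;;
    P1) echo 72 ;;
    *)  return 1 ;;   # other priorities are not specified in the source
  esac
}

# sla_deadline PRIORITY OPENED_EPOCH -> prints the deadline in epoch seconds.
sla_deadline() {
  local hours
  hours=$(sla_hours "$1") || return 1
  echo $(( $2 + hours * 3600 ))
}

# sla_breached PRIORITY OPENED_EPOCH NOW_EPOCH
# Succeeds when NOW is strictly past the deadline.
sla_breached() {
  local deadline
  deadline=$(sla_deadline "$1" "$2") || return 2
  [ "$3" -gt "$deadline" ]
}

# Example: sla_breached P0 "$opened_at" "$(date +%s)" && echo "SLA breached"
```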
v2.3.1: Autonomous feedback loops, testing foundation, chaos resilience
v2.3.0: Fleet Command completeness overhaul + autonomous team oversight
v2.0.0: 18 autonomous agents, 100+ CLI commands, intelligence layer, multi-repo fleet, local mode
graph LR
A[GitHub Issue] -->|labeled 'shipwright'| B[Daemon]
B --> C[Triage & Score]
C --> D[Select Template]
D --> E[Pipeline]
subgraph Pipeline ["12-Stage Pipeline"]
direction LR
E1[intake] --> E2[plan] --> E3[design] --> E4[build]
E4 --> E5[test] --> E6[review] --> E7[quality]
E7 --> E8[PR] --> E9[merge] --> E10[deploy]
E10 --> E11[validate] --> E12[monitor]
end
E --> E1
E12 --> F[Merged PR]
subgraph Intelligence ["Intelligence Layer"]
I1[Predictive Risk]
I2[Model Routing]
I3[Adversarial Review]
I4[Self-Optimization]
end
Intelligence -.->|enriches| Pipeline
style A fill:#00d4ff,color:#000
style F fill:#4ade80,color:#000
When tests fail, the pipeline re-enters the build loop with error context — self-healing like a developer reading failures and fixing them. Convergence detection stops infinite loops. Error classification routes retries intelligently.
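The self-healing behaviour described above can be sketched as a bounded retry loop that feeds failure output back to a fixer and stops when nothing changes. This is an illustration under stated assumptions, not Shipwright's implementation: `heal_loop` is a hypothetical name, the fixer stands in for the build agent, and "convergence" is simplified to two consecutive failures with identical output.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a self-healing build loop with convergence detection.

# heal_loop MAX_ITERATIONS TEST_CMD FIX_CMD
# Returns 0 on pass, 2 on convergence without passing, 1 when out of attempts.
heal_loop() {
  local max="$1" test_cmd="$2" fix_cmd="$3"
  local i=1 output prev=""
  while [ "$i" -le "$max" ]; do
    # eval lets callers pass a full command line; only use trusted input.
    if output=$(eval "$test_cmd" 2>&1); then
      echo "passed on iteration $i"
      return 0
    fi
    # Identical output twice in a row suggests the fixes changed nothing.
    if [ "$i" -gt 1 ] && [ "$output" = "$prev" ]; then
      echo "converged without passing on iteration $i"
      return 2
    fi
    prev="$output"
    # Hand the failure output to the fixer as error context on stdin.
    printf '%s\n' "$output" | eval "$fix_cmd"
    i=$((i + 1))
  done
  echo "gave up after $max iterations"
  return 1
}
```

The iteration cap and the repeated-output check both guard against infinite loops: the first bounds total cost, the second exits early once retries stop making progress.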
One-command install (recommended):
git clone https://github.com/sethdford/shipwright.git && cd shipwright && ./install.sh
Other methods:
curl:
curl -fsSL https://raw.githubusercontent.com/sethdford/shipwright/main/scripts/install-remote.sh | bash
npm (global):
npm install -g shipwright-cli
Verify:
shipwright doctor
# One-command setup
shipwright init
# See what's running
shipwright status
# Process a GitHub issue end-to-end
shipwright pipeline start --issue 42
# Run daemon 24/7 with agent orchestration
shipwright daemon start --detach
# See live agent activity
shipwright activity
# Spin up agent team for manual work
shipwright session my-feature -t feature-dev
# View DORA metrics and pipeline vitals
shipwright dora
# Continuous build loop with test validation
shipwright loop "Build auth module" --test-cmd "npm test"
# Multi-repo operations
shipwright fleet start
shipwright fix "upgrade deps" --repos ~/a,~/b,~/c
# Release automation
shipwright version bump 2.4.0
shipwright changelog generate
Wave 1 (Organizational):
- Swarm Manager — Orchestrates dynamic agent teams with specialization roles
- Autonomous PM — Team leadership, task scheduling, roadmap execution
- Knowledge Guild — Cross-team learning, pattern capture, mentorship
- Recruitment System — Talent acquisition and team composition
- Standup Automaton — Daily standups, progress tracking, blocker detection
Wave 2 (Operational Backbone):
- Quality Oversight — Intelligent audits, zero-defect gates, completeness verification
- Strategic Agent — Long-term planning, goal decomposition, roadmap intelligence
- Code Reviewer — Architecture analysis, clean code standards, best practices
- Security Auditor — Vulnerability detection, threat modeling, compliance
- Test Generator — Coverage analysis, scenario discovery, regression prevention
- Incident Commander — Autonomous triage, root cause analysis, resolution
- Dependency Manager — Semantic versioning, update orchestration, compatibility checking
- Release Manager — Release planning, changelog generation, deployment orchestration
- Adaptive Tuner — DORA metrics analysis, self-optimization, performance tuning
- Strategic Intelligence — Predictive analysis, trend detection, proactive recommendations
Plus 10+ specialized agents for observability, UX, documentation, and more.
intake → plan → design → build → test → review → compound_quality → pr → merge → deploy → validate → monitor
Each stage is configurable with quality gates that auto-proceed or pause for approval. 8 pipeline templates:
| Template | Stages | Use Case |
|---|---|---|
| fast | intake → build → test → PR | Quick fixes, score >= 70 |
| standard | + plan, design, review | Normal feature work |
| full | All 12 stages | Production deployment |
| hotfix | Minimal, all auto | Urgent production fixes |
| autonomous | All stages, all auto | Daemon-driven delivery |
| enterprise | All stages, all gated | Maximum safety + rollback |
| cost-aware | All stages + budget checks | Budget-limited delivery |
| deployed | All + deploy + validate + monitor | Full deploy pipeline |
7 modules that make the pipeline smarter over time. Enabled by default: intelligence is on when Claude CLI is available, with optimization and prediction active out of the box. Set intelligence.enabled=false to disable. All modules degrade gracefully.
| Module | What It Does |
|---|---|
| Semantic Triage | AI-powered issue analysis, complexity scoring, template selection |
| Pipeline Composer | Generates custom pipeline configs from codebase analysis (file churn, test coverage, dependencies) |
| Predictive Risk | Scores issues for risk using GitHub signals (security alerts, similar past issues, contributor expertise) |
| Adversarial Review | Red-team code review — finds security flaws, edge cases, failure modes. Cross-checks against CodeQL/Dependabot alerts |
| Self-Optimization | Reads DORA metrics and auto-tunes daemon config. Includes context efficiency closed loop for token budget tuning |
| Developer Simulation | 3-persona review (security, performance, maintainability) before PR creation |
| Architecture Enforcement | Living architectural model with violation detection and dependency direction rules |
Adaptive everything: thresholds learn from history, model routing uses SPRT evidence-based switching, poll intervals adjust to queue depth, memory timescales tune based on fix effectiveness.
Native GitHub API integration enriches every intelligence module:
| API | Integration |
|---|---|
| GraphQL | File change frequency, blame data, contributor expertise, similar issues, commit history |
| Checks API | Native check runs per pipeline stage — visible in PR timeline, blocks merges on failure |
| Deployments API | Tracks deployments per environment (staging/prod), rollback support, deployment history |
| Security | CodeQL + Dependabot alerts feed into risk scoring and adversarial review |
| Contributors | CODEOWNERS-based reviewer routing, top-contributor fallback, auto-approve as last resort |
| Branch Protection | Checks required reviews and status checks before attempting auto-merge |
The autonomous decision engine (config/policy.json → decision section) handles routine operational decisions with outcome learning. Decisions are tiered by risk, with low-risk actions auto-approved and higher tiers escalated. The engine learns from outcomes to improve future decisions.
Intelligent context window management for pipeline agents:
- Budget-aware trimming — Configurable character budgets for prompt composition (context_budget_chars)
- Section-level trimming — Independent limits for memory, git history, hotspot files, and test output
- Context efficiency metrics — Tracks budget utilization and trim ratios per iteration
- Self-tuning — The self-optimization loop analyzes context efficiency events and recommends budget adjustments
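As a rough illustration of budget-aware trimming and the trim-ratio metric mentioned above, the sketch below truncates a section to a character budget and reports how much was removed. The function names are hypothetical, and keeping the start of the text rather than the end is an assumption made for simplicity.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of section-level trimming. Names are assumptions.

# trim_section TEXT BUDGET -> prints at most BUDGET characters of TEXT.
trim_section() {
  local text="$1" budget="$2"
  # %.Ns prints at most N characters of the string (bytes in some shells).
  printf "%.${budget}s" "$text"
}

# trim_ratio ORIGINAL_LENGTH TRIMMED_LENGTH -> percent of characters removed.
trim_ratio() {
  local original="$1" trimmed="$2"
  [ "$original" -gt 0 ] || { echo 0; return 0; }
  echo $(( (original - trimmed) * 100 / original ))
}

# Example: trim test output to a 2000-character budget and record the ratio.
# trimmed=$(trim_section "$test_output" 2000)
# trim_ratio "${#test_output}" "${#trimmed}"
```

A consistently high trim ratio for one section is the kind of signal a tuning loop could use to recommend a larger budget for it.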
shipwright daemon start --detach
Watches GitHub for labeled issues and processes them 24/7:
- Auto-scaling: Adjusts worker count based on CPU, memory, budget, and queue depth
- Priority lanes: Reserve a worker slot for urgent/hotfix issues
- Retry with escalation: Failed builds retry with template escalation (fast → standard → full)
- Patrol mode: Proactively scans for security issues, stale deps, dead code, coverage gaps
- Self-optimization: Tunes its own config based on DORA metrics over time
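Retry with escalation, as listed above, follows the chain fast → standard → full. A minimal sketch of that chain is shown below; `next_template` and `run_with_escalation` are hypothetical names, and the runner argument stands in for whatever actually executes a pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of template escalation on failure.

# next_template TEMPLATE -> prints the next template in the chain.
next_template() {
  case "$1" in
    fast)     echo "standard" ;;
    standard) echo "full" ;;
    *)        return 1 ;;   # "full" or an unknown name has nowhere to go
  esac
}

# run_with_escalation START_TEMPLATE RUN_CMD
# RUN_CMD is invoked with the template name as its only argument.
run_with_escalation() {
  local template="$1" run_cmd="$2"
  while :; do
    if "$run_cmd" "$template"; then
      echo "succeeded with $template"
      return 0
    fi
    # Stop once the chain is exhausted.
    template=$(next_template "$template") || break
  done
  echo "failed after escalating through all templates"
  return 1
}
```

Starting with the cheapest template and escalating only on failure keeps routine fixes fast while still giving hard issues the full stage set.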
shipwright fleet start
Orchestrate daemons across multiple repositories with a shared worker pool. Workers rebalance based on queue depth, issue complexity, and repo priority.
The pipeline learns from every run:
- Failure patterns: Captured and injected into future builds so agents don't repeat mistakes
- Fix effectiveness: Tracks which fixes actually resolved issues
- Prediction validation: Compares predicted risk against actual outcomes, auto-adjusts thresholds
- False-alarm tracking: Reduces noise by learning which anomalies are real
shipwright cost show
Per-pipeline cost tracking with model pricing, budget enforcement, and ROI analysis. Adaptive model routing picks the cheapest model that meets quality targets.
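Picking the cheapest model that meets a quality target can be sketched as a filter followed by a minimum. The example below is an assumption-laden illustration: `pick_model` is a hypothetical name, the input format (one `name cost quality` row per line) is invented, and how quality scores are produced is outside its scope.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of cost-aware model selection.

# pick_model MIN_QUALITY
# Reads "name cost quality" rows on stdin and prints the cheapest model whose
# quality is at least MIN_QUALITY. Exits non-zero when no model qualifies.
pick_model() {
  awk -v min="$1" '
    $3 + 0 >= min + 0 && (best == "" || $2 + 0 < cost) { best = $1; cost = $2 + 0 }
    END { if (best == "") exit 1; print best }
  '
}

# Example with made-up rows:
# printf 'small 1 60\nmedium 5 80\nlarge 20 95\n' | pick_model 75
```

A non-zero exit gives the caller a clear signal to fall back to a default model or to pause for approval.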
shipwright dashboard start
Web dashboard with live pipeline progress, GitHub context (security alerts, contributors, deployments), DORA metrics, cost tracking, and context efficiency metrics. WebSocket-powered, updates in real time.
shipwright webhook listen
Instant issue processing via GitHub webhooks instead of polling. Register the webhook with shipwright webhook register, receive events in real time, and process issues without polling delay.
shipwright pr review <pr#>
shipwright pr merge <pr#>
shipwright pr cleanup
Fully automated PR management: review based on predictive risk and coverage, intelligent auto-merge when gates pass, and cleanup of stale branches. Reduces manual PR overhead by 90%.
shipwright fleet discover --org myorg
Scan a GitHub organization and auto-populate fleet config with all repos matching criteria (language, archived status, team ownership). One command instead of manual registry building.
ACID-safe state management replaces JSON files: the volatile .claude/pipeline-artifacts/ directory gives way to a reliable database schema. Atomic transactions prevent partial states, and crash recovery is automatic.
shipwright decompose analyze 42
shipwright decompose decompose 42
AI-powered issue analysis: analyze scores complexity; decompose creates child issues with inherited labels/assignees and a dependency graph.
Cross-platform process supervision. Use launchd on macOS or systemd on Linux instead of tmux, with the same daemon commands:
shipwright launchd install # macOS launchd
# systemd service auto-generated on Linux
shipwright context gather
Rich context injection for pipeline stages. Pulls together: contributor history, file hotspots, architecture rules, related issues, failure patterns. Injected automatically at each stage for smarter decisions.
Over 100 commands. Key workflows:
# Autonomous delivery
shipwright pipeline start --issue 42
shipwright daemon start --detach
# Agent teams
shipwright swarm status
shipwright recruit --roles builder,tester
shipwright standup
shipwright guild list
# Quality gates
shipwright code-review
shipwright security-audit
shipwright testgen
shipwright quality validate
# Observability
shipwright vitals
shipwright dora
shipwright stream
shipwright activity
# Multi-repo operations
shipwright fleet start
shipwright fix "feat: add auth" --repos ~/a,~/b,~/c
shipwright fleet-viz
# Release automation
shipwright version bump 2.4.0
shipwright changelog generate
shipwright deploys list
# Setup & maintenance
shipwright init
shipwright prep
shipwright doctor
shipwright upgrade --apply
# See all commands
shipwright --help
See .claude/CLAUDE.md for the complete 100+ command reference organized by workflow. Full documentation: https://sethdford.github.io/shipwright.
24 team templates covering the full SDLC:
shipwright templates list
| File | Purpose |
|---|---|
| config/policy.json | Central contract — risk tiers, merge policy, docs drift, browser evidence, harness SLAs |
| config/policy.schema.json | JSON Schema validation for the policy contract |
| .claude/daemon-config.json | Daemon settings, intelligence flags, patrol config |
| .claude/pipeline-state.md | Current pipeline state |
| templates/pipelines/*.json | 8 pipeline template definitions |
| tmux/templates/*.json | 24 team composition templates |
| ~/.shipwright/events.jsonl | Event log for metrics |
| ~/.shipwright/costs.json | Cost tracking data |
| ~/.shipwright/budget.json | Budget limits |
| ~/.shipwright/github-cache/ | Cached GitHub API responses |
| Requirement | Version | Install |
|---|---|---|
| tmux | 3.2+ | brew install tmux |
| jq | any | brew install jq |
| Claude Code CLI | latest | npm i -g @anthropic-ai/claude-code |
| Node.js | 20+ | For hooks and dashboard |
| Git | any | For installation |
| gh CLI | any | brew install gh (GitHub integration) |
100+ bash scripts (~100K lines), 125 shell test suites + 16 dashboard test files (141 total), plus E2E system test proving full daemon→pipeline→loop→PR flow. Dashboard at 98% coverage. Bash 3.2 compatible — runs on macOS and Linux out of the box.
Core Layers:
Pipeline Layer
  sw-pipeline.sh # 12-stage delivery orchestration
  sw-daemon.sh # Autonomous GitHub issue watcher
  sw-loop.sh # Continuous multi-iteration build loop

Agent Layer (18 agents)
  sw-swarm.sh # Dynamic agent team orchestration
  sw-pm.sh # Autonomous PM coordination
  sw-recruit.sh # Agent recruitment system
  sw-standup.sh # Daily team standups
  sw-guild.sh # Knowledge guilds
  sw-oversight.sh # Quality oversight board
  sw-strategic.sh # Strategic intelligence
  sw-scale.sh # Dynamic team scaling
  ... 10 more agent scripts

Intelligence Layer
  sw-intelligence.sh # AI analysis engine
  sw-predictive.sh # Risk scoring + anomaly detection
  sw-adaptive.sh # Data-driven pipeline tuning
  sw-security-audit.sh # Security analysis
  sw-code-review.sh # Code quality analysis
  sw-testgen.sh # Test generation
  sw-architecture.sh # Architecture enforcement

Operational Layer
  sw-fleet.sh # Multi-repo orchestration
  sw-ci.sh # CI/CD orchestration
  sw-webhook.sh # GitHub webhooks
  sw-incident.sh # Incident response
  sw-release-manager.sh # Release automation
  ... 20+ operational scripts

Observability Layer
  sw-vitals.sh # Pipeline health scoring
  sw-dora.sh # DORA metrics dashboard
  sw-activity.sh # Live activity streams
  sw-replay.sh # Pipeline playback
  sw-trace.sh # E2E traceability
  sw-otel.sh # OpenTelemetry integration
  ... observability services

Infrastructure
  sw-github-graphql.sh # GitHub GraphQL API client
  sw-github-checks.sh # Native GitHub check runs
  sw-github-deploy.sh # Deployment tracking
  sw-memory.sh # Persistent learning system
  sw-cost.sh # Cost intelligence
  sw-db.sh # SQLite persistence
  sw-eventbus.sh # Async event bus

Tools & UX
  dashboard/server.ts # Real-time dashboard
  sw-session.sh # tmux agent sessions
  sw-status.sh # Team dashboard
  sw-docs.sh # Documentation sync
  sw-tmux.sh # tmux health management
Let Shipwright build it: Create an issue using the Shipwright template and label it shipwright. The autonomous pipeline will triage, plan, build, test, review, and create a PR.
Manual development: Fork, branch, then:
npm test # 125 shell suites + 16 dashboard test files (141 total), E2E system test
MIT — Seth Ford, 2026.