sethdford/shipwright

Shipwright

The Autonomous Delivery Platform
From labeled GitHub issue to merged PR — with 18 new autonomous agents orchestrating every step.

Tests | Pipeline | 141 suites | v3.2.4 | MIT License | Bash 3.2+



Shipwright Builds Itself

This repo uses Shipwright to process its own issues. Label a GitHub issue with shipwright and the autonomous pipeline takes over: semantic triage, plan, design, build, test, review, quality gates, PR. No human in the loop.

See it live | Create an issue and watch it build.


Code Factory Pattern

Shipwright implements the complete Code Factory control-plane pattern — where agents write 100% of the code and the repo enforces deterministic, risk-aware checks before every merge. Every decision is traceable to policy. Every merge is backed by machine-verifiable evidence.

Agent writes code → Risk policy gate → Tier-appropriate CI → Code review agent
→ Findings auto-remediated → SHA-validated evidence → Bot threads cleaned → Merge
→ Incidents feed back into harness coverage
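
The "Risk policy gate → Tier-appropriate CI" step can be pictured as a small classifier over the list of changed files. The sketch below is illustrative only: the tier names (`low`, `medium`, `high`) and the path patterns are assumptions made for this example, and the function name `classify_risk_tier` is hypothetical. The real rules live in `config/policy.json` and are not reproduced here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive a risk tier from changed file paths.
# Tier names and glob patterns are illustrative assumptions, not Shipwright's policy.

classify_risk_tier() {
  # Reads newline-separated paths on stdin and prints the highest tier seen.
  local tier="low" path
  while IFS= read -r path; do
    case "$path" in
      config/policy.json|.github/workflows/*)
        tier="high"
        break   # nothing outranks "high", so stop early
        ;;
      scripts/*.sh)
        if [ "$tier" = "low" ]; then tier="medium"; fi
        ;;
    esac
  done
  printf '%s\n' "$tier"
}
```

One possible use is `git diff --name-only origin/main...HEAD | classify_risk_tier`, with the printed tier deciding which CI jobs to run.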

What makes Shipwright best-in-class

| Code Factory Layer | Shipwright Implementation |
| --- | --- |
| Single contract | config/policy.json: risk tiers, merge policy, docs drift, evidence specs, harness SLAs in one file |
| Preflight gate | risk-policy-gate.yml classifies risk from changed files before expensive CI runs |
| SHA discipline | All checks, reviews, and approvals validated against current PR head; stale evidence is never trusted |
| Rerun writer | sw-review-rerun.sh: SHA-deduped, single canonical writer, no duplicate bot comments |
| Remediation loop | review-remediation.yml: agent reads findings, patches code, validates, pushes fix to same branch |
| Bot thread cleanup | auto-resolve-threads.yml: resolves bot-only threads after clean rerun, never touches human threads |
| Evidence framework | sw-evidence.sh: browser, API, database, CLI, webhook, and custom evidence with freshness enforcement |
| Harness-gap loop | shipwright incident gap: every regression creates a test case with SLA tracking |
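
SHA discipline and freshness enforcement, as described above, come down to two checks: the evidence must have been recorded against the current PR head, and it must not be too old. A minimal sketch, assuming both SHAs and timestamps are already available as plain values. The function names and the maximum-age parameter are hypothetical, not part of `sw-evidence.sh`.

```shell
# evidence_is_current EVIDENCE_SHA HEAD_SHA
# Succeeds only when the evidence was captured for the current head commit.
evidence_is_current() {
  local evidence_sha="$1" head_sha="$2"
  # Missing values are never trusted.
  if [ -z "$evidence_sha" ] || [ -z "$head_sha" ]; then
    return 1
  fi
  [ "$evidence_sha" = "$head_sha" ]
}

# evidence_is_fresh CAPTURED_EPOCH NOW_EPOCH MAX_AGE_SECONDS
# Succeeds when the evidence is no older than the allowed age.
evidence_is_fresh() {
  local captured="$1" now="$2" max_age="$3"
  [ $(( now - captured )) -le "$max_age" ]
}
```

In practice the head SHA might come from `git rev-parse HEAD`, and a merge gate would require both checks to pass.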

Beyond the baseline

Shipwright extends the Code Factory pattern with capabilities most implementations don't have:

  • 12-stage pipeline with self-healing builds, adversarial review, and compound quality gates
  • Predictive risk scoring using GitHub signals (security alerts, contributor expertise, file churn)
  • Persistent memory — failure patterns, fix effectiveness, and prediction accuracy compound over time
  • Auto-learning — self-optimize runs automatically after every pipeline completion, including context efficiency tuning
  • Decision engine — tiered autonomous decisions with outcome learning and deduplication
  • Unified model routing — single source of truth for model selection across all components
  • Evidence-gated merges — SHA discipline ensures all evidence validated against current PR head
  • Semantic quality audits — Claude-powered audits with grep fallback when Claude unavailable
  • 18 autonomous agents with specialized roles (PM, reviewer, security auditor, test generator, etc.)
  • Cross-platform compatibility — portable date helpers, file_mtime, and compat layer for macOS/Linux
  • Fleet operations — the Code Factory pattern applied across every repo in your org
  • Cost intelligence — per-pipeline cost tracking, budget enforcement, adaptive model routing
  • Self-optimization — DORA metrics analysis auto-tunes daemon config and template weights

Harness commands:

# Evidence framework: capture and verify all types
npm run harness:evidence:capture          # All collectors (browser, API, DB, CLI)
npm run harness:evidence:capture:api      # API endpoints only
npm run harness:evidence:capture:cli      # CLI commands only
npm run harness:evidence:capture:database # Database checks only
npm run harness:evidence:verify           # Verify manifest + freshness
npm run harness:evidence:pre-pr           # Capture + verify in one step

# Risk and policy
npm run harness:risk-tier

# Incident-to-harness loop
shipwright incident gap list
shipwright incident gap sla

Full Code Factory documentation


What's New in v3.2.4

Code Factory pattern — deterministic, risk-aware agent delivery with machine-verifiable evidence:

  • Risk policy gate — PR-level preflight classifies risk tier from changed files; blocks before expensive CI
  • SHA discipline — All evidence validated against current PR head SHA; stale evidence never trusted
  • Evidence framework — 6 collector types (browser, API, database, CLI, webhook, custom) with freshness enforcement
  • Review remediation — Agent reads review findings, patches code, validates, pushes fix commit in-branch
  • Auto-resolve bot threads — Bot-only PR threads cleaned up after clean rerun; human threads untouched
  • Harness-gap loop — Every incident creates a test case requirement with SLA tracking (P0: 24h, P1: 72h)
  • Policy contract v2 — Risk tiers, merge policy, docs drift rules, evidence specs, harness SLAs in one file
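
The SLA windows quoted for the harness-gap loop (P0: 24h, P1: 72h) translate into a simple deadline calculation. The sketch below assumes incident timestamps are Unix epoch seconds; the function names are hypothetical, and priorities other than P0 and P1 are rejected because no SLA is stated for them here.

```shell
# sla_hours PRIORITY -> prints the SLA window in hours
sla_hours() {
  case "$1" in
    P0) echo 24 ;;
    P1) echo 72 ;;
    *)  return 1 ;;   # no SLA stated for other priorities
  esac
}

# sla_deadline PRIORITY OPENED_EPOCH -> prints the due time in epoch seconds
sla_deadline() {
  local hours
  hours="$(sla_hours "$1")" || return 1
  echo $(( $2 + hours * 3600 ))
}

# sla_breached PRIORITY OPENED_EPOCH NOW_EPOCH -> succeeds if the deadline has passed
sla_breached() {
  local due
  due="$(sla_deadline "$1" "$2")" || return 2
  [ "$3" -gt "$due" ]
}
```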

v2.3.1: Autonomous feedback loops, testing foundation, chaos resilience

v2.3.0: Fleet Command completeness overhaul + autonomous team oversight

v2.0.0: 18 autonomous agents, 100+ CLI commands, intelligence layer, multi-repo fleet, local mode


How It Works

graph LR
    A[GitHub Issue] -->|labeled 'shipwright'| B[Daemon]
    B --> C[Triage & Score]
    C --> D[Select Template]
    D --> E[Pipeline]

    subgraph Pipeline ["12-Stage Pipeline"]
        direction LR
        E1[intake] --> E2[plan] --> E3[design] --> E4[build]
        E4 --> E5[test] --> E6[review] --> E7[quality]
        E7 --> E8[PR] --> E9[merge] --> E10[deploy]
        E10 --> E11[validate] --> E12[monitor]
    end

    E --> E1
    E12 --> F[Merged PR]

    subgraph Intelligence ["Intelligence Layer"]
        I1[Predictive Risk]
        I2[Model Routing]
        I3[Adversarial Review]
        I4[Self-Optimization]
    end

    Intelligence -.->|enriches| Pipeline

    style A fill:#00d4ff,color:#000
    style F fill:#4ade80,color:#000
When tests fail, the pipeline re-enters the build loop with error context — self-healing like a developer reading failures and fixing them. Convergence detection stops infinite loops. Error classification routes retries intelligently.
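
The self-healing behaviour can be sketched as a loop that re-runs the tests, hands the failure output to a fix step, and stops when no progress is being made. This is an illustration only: `run_tests` and `apply_fix` are hypothetical callbacks the caller must define, and treating an identical failure output twice in a row as "converged" is an assumption, not Shipwright's documented algorithm.

```shell
# self_heal MAX_ITERATIONS
# Assumes the caller defines two functions:
#   run_tests  prints failure output and returns non-zero on failure
#   apply_fix  receives the failure output as $1 and attempts a fix
self_heal() {
  local max="$1" i=1 output last=""
  while [ "$i" -le "$max" ]; do
    if output="$(run_tests 2>&1)"; then
      echo "passed on iteration $i"
      return 0
    fi
    # Convergence detection: the same failure twice in a row means no progress.
    if [ -n "$last" ] && [ "$output" = "$last" ]; then
      echo "stalled on iteration $i"
      return 2
    fi
    last="$output"
    apply_fix "$output"
    i=$((i + 1))
  done
  echo "gave up after $max iterations"
  return 1
}
```

The distinct return codes (0 passed, 1 budget exhausted, 2 stalled) let a caller decide whether to retry or stop.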


Install

One-command install (recommended):

git clone https://github.com/sethdford/shipwright.git && cd shipwright && ./install.sh

Other methods

curl

curl -fsSL https://raw.githubusercontent.com/sethdford/shipwright/main/scripts/install-remote.sh | bash

npm (global)

npm install -g shipwright-cli

Verify

shipwright doctor

Quick Start

# One-command setup
shipwright init

# See what's running
shipwright status

# Process a GitHub issue end-to-end
shipwright pipeline start --issue 42

# Run daemon 24/7 with agent orchestration
shipwright daemon start --detach

# See live agent activity
shipwright activity

# Spin up agent team for manual work
shipwright session my-feature -t feature-dev

# View DORA metrics and pipeline vitals
shipwright dora

# Continuous build loop with test validation
shipwright loop "Build auth module" --test-cmd "npm test"

# Multi-repo operations
shipwright fleet start
shipwright fix "upgrade deps" --repos ~/a,~/b,~/c

# Release automation
shipwright version bump 2.4.0
shipwright changelog generate

Features

18 Autonomous Agents

Wave 1 (Organizational):

  • Swarm Manager — Orchestrates dynamic agent teams with specialization roles
  • Autonomous PM — Team leadership, task scheduling, roadmap execution
  • Knowledge Guild — Cross-team learning, pattern capture, mentorship
  • Recruitment System — Talent acquisition and team composition
  • Standup Automaton — Daily standups, progress tracking, blocker detection

Wave 2 (Operational Backbone):

  • Quality Oversight — Intelligent audits, zero-defect gates, completeness verification
  • Strategic Agent — Long-term planning, goal decomposition, roadmap intelligence
  • Code Reviewer — Architecture analysis, clean code standards, best practices
  • Security Auditor — Vulnerability detection, threat modeling, compliance
  • Test Generator — Coverage analysis, scenario discovery, regression prevention
  • Incident Commander — Autonomous triage, root cause analysis, resolution
  • Dependency Manager — Semantic versioning, update orchestration, compatibility checking
  • Release Manager — Release planning, changelog generation, deployment orchestration
  • Adaptive Tuner — DORA metrics analysis, self-optimization, performance tuning
  • Strategic Intelligence — Predictive analysis, trend detection, proactive recommendations

Plus 10+ specialized agents for observability, UX, documentation, and more.

12-Stage Delivery Pipeline

intake → plan → design → build → test → review → compound_quality → pr → merge → deploy → validate → monitor

Each stage is configurable with quality gates that auto-proceed or pause for approval. 8 pipeline templates:

| Template | Stages | Use Case |
| --- | --- | --- |
| fast | intake → build → test → PR | Quick fixes, score >= 70 |
| standard | + plan, design, review | Normal feature work |
| full | All 12 stages | Production deployment |
| hotfix | Minimal, all auto | Urgent production fixes |
| autonomous | All stages, all auto | Daemon-driven delivery |
| enterprise | All stages, all gated | Maximum safety + rollback |
| cost-aware | All stages + budget checks | Budget-limited delivery |
| deployed | All + deploy + validate + monitor | Full deploy pipeline |

Intelligence Layer

7 modules that make the pipeline smarter over time. Enabled by default: intelligence is on when Claude CLI is available, with optimization and prediction active out of the box. Set intelligence.enabled=false to disable. All modules degrade gracefully.

| Module | What It Does |
| --- | --- |
| Semantic Triage | AI-powered issue analysis, complexity scoring, template selection |
| Pipeline Composer | Generates custom pipeline configs from codebase analysis (file churn, test coverage, dependencies) |
| Predictive Risk | Scores issues for risk using GitHub signals (security alerts, similar past issues, contributor expertise) |
| Adversarial Review | Red-team code review: finds security flaws, edge cases, failure modes. Cross-checks against CodeQL/Dependabot alerts |
| Self-Optimization | Reads DORA metrics and auto-tunes daemon config. Includes context efficiency closed loop for token budget tuning |
| Developer Simulation | 3-persona review (security, performance, maintainability) before PR creation |
| Architecture Enforcement | Living architectural model with violation detection and dependency direction rules |

Adaptive everything: thresholds learn from history, model routing uses SPRT (sequential probability ratio test) evidence-based switching, poll intervals adjust to queue depth, and memory timescales tune based on fix effectiveness.

GitHub Deep Integration

Native GitHub API integration enriches every intelligence module:

| API | Integration |
| --- | --- |
| GraphQL | File change frequency, blame data, contributor expertise, similar issues, commit history |
| Checks API | Native check runs per pipeline stage, visible in PR timeline; blocks merges on failure |
| Deployments API | Tracks deployments per environment (staging/prod), rollback support, deployment history |
| Security | CodeQL + Dependabot alerts feed into risk scoring and adversarial review |
| Contributors | CODEOWNERS-based reviewer routing, top-contributor fallback, auto-approve as last resort |
| Branch Protection | Checks required reviews and status checks before attempting auto-merge |

Decision Engine

The autonomous decision engine (the decision section of config/policy.json) handles routine operational decisions with outcome learning. Decisions are tiered by risk, with low-risk actions auto-approved and higher tiers escalated. The engine learns from outcomes to improve future decisions.

Context Engineering

Intelligent context window management for pipeline agents:

  • Budget-aware trimming — Configurable character budgets for prompt composition (context_budget_chars)
  • Section-level trimming — Independent limits for memory, git history, hotspot files, and test output
  • Context efficiency metrics — Tracks budget utilization and trim ratios per iteration
  • Self-tuning — The self-optimization loop analyzes context efficiency events and recommends budget adjustments
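
To make budget-aware and section-level trimming concrete, here is a minimal sketch that caps each prompt section at its own character limit, caps the combined prompt at an overall budget, and reports budget utilization. Only the idea of a character budget (`context_budget_chars`) comes from the list above. The function names, the `|` separator, and the per-section limit of 20 characters are assumptions chosen for the example.

```shell
# trim_section TEXT MAX_CHARS -> prints TEXT cut to at most MAX_CHARS characters
trim_section() {
  local text="$1" max="$2"
  if [ "${#text}" -gt "$max" ]; then
    printf '%s' "${text:0:$max}"
  else
    printf '%s' "$text"
  fi
}

# compose_prompt BUDGET MEMORY GIT_HISTORY TEST_OUTPUT
# Trims each section independently, then enforces the overall budget.
compose_prompt() {
  local budget="$1" section_max=20 prompt   # section_max is an illustrative limit
  prompt="$(trim_section "$2" "$section_max")|$(trim_section "$3" "$section_max")|$(trim_section "$4" "$section_max")"
  trim_section "$prompt" "$budget"
}

# budget_utilization USED_CHARS BUDGET_CHARS -> integer percent of the budget used
budget_utilization() {
  echo $(( $1 * 100 / $2 ))
}
```

Cutting at a fixed character offset is the simplest possible policy; a real implementation would more likely trim at line or section boundaries.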

Autonomous Daemon

shipwright daemon start --detach

Watches GitHub for labeled issues and processes them 24/7:

  • Auto-scaling: Adjusts worker count based on CPU, memory, budget, and queue depth
  • Priority lanes: Reserve a worker slot for urgent/hotfix issues
  • Retry with escalation: Failed builds retry with template escalation (fast → standard → full)
  • Patrol mode: Proactively scans for security issues, stale deps, dead code, coverage gaps
  • Self-optimization: Tunes its own config based on DORA metrics over time
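
Retry with escalation can be sketched as a walk along the fast → standard → full chain named above. In this sketch `run_pipeline` is a hypothetical callback standing in for a real pipeline run, and the function names are invented for illustration.

```shell
# next_template CURRENT -> prints the next template in the escalation chain
next_template() {
  case "$1" in
    fast)     echo standard ;;
    standard) echo full ;;
    *)        return 1 ;;   # "full" or unknown: nothing left to escalate to
  esac
}

# run_with_escalation START_TEMPLATE
# Assumes the caller defines run_pipeline TEMPLATE, returning 0 on success.
run_with_escalation() {
  local template="$1"
  while :; do
    if run_pipeline "$template"; then
      echo "succeeded with $template"
      return 0
    fi
    template="$(next_template "$template")" || {
      echo "escalation exhausted"
      return 1
    }
  done
}
```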

Fleet Operations

shipwright fleet start

Orchestrate daemons across multiple repositories with a shared worker pool. Workers rebalance based on queue depth, issue complexity, and repo priority.

Persistent Memory

The pipeline learns from every run:

  • Failure patterns: Captured and injected into future builds so agents don't repeat mistakes
  • Fix effectiveness: Tracks which fixes actually resolved issues
  • Prediction validation: Compares predicted risk against actual outcomes, auto-adjusts thresholds
  • False-alarm tracking: Reduces noise by learning which anomalies are real

Cost Intelligence

shipwright cost show

Per-pipeline cost tracking with model pricing, budget enforcement, and ROI analysis. Adaptive model routing picks the cheapest model that meets quality targets.
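
"Cheapest model that meets quality targets" can be expressed as a filter followed by a minimum. The sketch below reads hypothetical `name cost quality` records from stdin. The model names, costs, and quality scores in the usage line are made-up placeholders, and `pick_model` is not a Shipwright function.

```shell
# pick_model MIN_QUALITY
# Reads "name cost quality" lines on stdin and prints the cheapest model
# whose quality is at least MIN_QUALITY. Exits non-zero if none qualifies.
pick_model() {
  awk -v min="$1" '
    $3 + 0 >= min + 0 && (best == "" || $2 + 0 < cost) {
      best = $1
      cost = $2 + 0
    }
    END {
      if (best == "") exit 1
      print best
    }
  '
}
```

For example, `printf 'small 1 0.70\nmedium 3 0.85\nlarge 15 0.95\n' | pick_model 0.80` skips `small` for failing the quality bar and then prefers `medium` over `large` on cost.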

Real-Time Dashboard

shipwright dashboard start

Web dashboard with live pipeline progress, GitHub context (security alerts, contributors, deployments), DORA metrics, cost tracking, and context efficiency metrics. It is WebSocket-powered and updates in real time.

Webhook Receiver

shipwright webhook listen

Instant issue processing via GitHub webhooks instead of polling. Register the webhook with shipwright webhook register, then receive events in real time and process issues without polling lag.

PR Lifecycle Automation

shipwright pr review <pr#>
shipwright pr merge <pr#>
shipwright pr cleanup

Fully automated PR management: review based on predictive risk and coverage, intelligent auto-merge when gates pass, and cleanup of stale branches. Reduces manual PR overhead by 90%.

Fleet Auto-Discovery

shipwright fleet discover --org myorg

Scan a GitHub organization and auto-populate fleet config with all repos matching criteria (language, archived status, team ownership). One command instead of manual registry building.

SQLite Persistence

ACID-safe state management backed by SQLite replaces JSON files and the volatile .claude/pipeline-artifacts/ directory with a reliable database schema. Atomic transactions prevent partial states, and crash recovery is automatic.

Issue Decomposition

shipwright decompose analyze 42
shipwright decompose decompose 42

AI-powered issue analysis: analyze scores complexity; decompose creates child issues with inherited labels/assignees and a dependency graph.

Linux systemd Support

Cross-platform process supervision: on Linux, systemd supervises the daemon instead of tmux, using the same daemon commands:

shipwright launchd install  # macOS launchd
# systemd service auto-generated on Linux

Context Engine

shipwright context gather

Rich context injection for pipeline stages. Pulls together: contributor history, file hotspots, architecture rules, related issues, failure patterns. Injected automatically at each stage for smarter decisions.


Commands

Over 100 commands. Key workflows:

# Autonomous delivery
shipwright pipeline start --issue 42
shipwright daemon start --detach

# Agent teams
shipwright swarm status
shipwright recruit --roles builder,tester
shipwright standup
shipwright guild list

# Quality gates
shipwright code-review
shipwright security-audit
shipwright testgen
shipwright quality validate

# Observability
shipwright vitals
shipwright dora
shipwright stream
shipwright activity

# Multi-repo operations
shipwright fleet start
shipwright fix "feat: add auth" --repos ~/a,~/b,~/c
shipwright fleet-viz

# Release automation
shipwright version bump 2.4.0
shipwright changelog generate
shipwright deploys list

# Setup & maintenance
shipwright init
shipwright prep
shipwright doctor
shipwright upgrade --apply

# See all commands
shipwright --help

See .claude/CLAUDE.md for the complete 100+ command reference organized by workflow. Full documentation: https://sethdford.github.io/shipwright.

Pipeline Templates for Teams

24 team templates covering the full SDLC:

shipwright templates list

Configuration

| File | Purpose |
| --- | --- |
| config/policy.json | Central contract: risk tiers, merge policy, docs drift, browser evidence, harness SLAs |
| config/policy.schema.json | JSON Schema validation for the policy contract |
| .claude/daemon-config.json | Daemon settings, intelligence flags, patrol config |
| .claude/pipeline-state.md | Current pipeline state |
| templates/pipelines/*.json | 8 pipeline template definitions |
| tmux/templates/*.json | 24 team composition templates |
| ~/.shipwright/events.jsonl | Event log for metrics |
| ~/.shipwright/costs.json | Cost tracking data |
| ~/.shipwright/budget.json | Budget limits |
| ~/.shipwright/github-cache/ | Cached GitHub API responses |

Prerequisites

| Requirement | Version | Install / Notes |
| --- | --- | --- |
| tmux | 3.2+ | brew install tmux |
| jq | any | brew install jq |
| Claude Code CLI | latest | npm i -g @anthropic-ai/claude-code |
| Node.js | 20+ | For hooks and dashboard |
| Git | any | For installation |
| gh CLI | any | brew install gh (GitHub integration) |
Architecture

100+ bash scripts (~100K lines), 125 shell test suites + 16 dashboard test files (141 total), plus E2E system test proving full daemon→pipeline→loop→PR flow. Dashboard at 98% coverage. Bash 3.2 compatible — runs on macOS and Linux out of the box.

Core Layers:

Pipeline Layer
  sw-pipeline.sh              # 12-stage delivery orchestration
  sw-daemon.sh                # Autonomous GitHub issue watcher
  sw-loop.sh                  # Continuous multi-iteration build loop

Agent Layer (18 agents)
  sw-swarm.sh                 # Dynamic agent team orchestration
  sw-pm.sh                    # Autonomous PM coordination
  sw-recruit.sh               # Agent recruitment system
  sw-standup.sh               # Daily team standups
  sw-guild.sh                 # Knowledge guilds
  sw-oversight.sh             # Quality oversight board
  sw-strategic.sh             # Strategic intelligence
  sw-scale.sh                 # Dynamic team scaling
  ... 10 more agent scripts

Intelligence Layer
  sw-intelligence.sh          # AI analysis engine
  sw-predictive.sh            # Risk scoring + anomaly detection
  sw-adaptive.sh              # Data-driven pipeline tuning
  sw-security-audit.sh        # Security analysis
  sw-code-review.sh           # Code quality analysis
  sw-testgen.sh               # Test generation
  sw-architecture.sh          # Architecture enforcement

Operational Layer
  sw-fleet.sh                 # Multi-repo orchestration
  sw-ci.sh                    # CI/CD orchestration
  sw-webhook.sh               # GitHub webhooks
  sw-incident.sh              # Incident response
  sw-release-manager.sh       # Release automation
  ... 20+ operational scripts

Observability Layer
  sw-vitals.sh                # Pipeline health scoring
  sw-dora.sh                  # DORA metrics dashboard
  sw-activity.sh              # Live activity streams
  sw-replay.sh                # Pipeline playback
  sw-trace.sh                 # E2E traceability
  sw-otel.sh                  # OpenTelemetry integration
  ... observability services

Infrastructure
  sw-github-graphql.sh        # GitHub GraphQL API client
  sw-github-checks.sh         # Native GitHub check runs
  sw-github-deploy.sh         # Deployment tracking
  sw-memory.sh                # Persistent learning system
  sw-cost.sh                  # Cost intelligence
  sw-db.sh                    # SQLite persistence
  sw-eventbus.sh              # Async event bus

Tools & UX
  dashboard/server.ts         # Real-time dashboard
  sw-session.sh               # tmux agent sessions
  sw-status.sh                # Team dashboard
  sw-docs.sh                  # Documentation sync
  sw-tmux.sh                  # tmux health management

Contributing

Let Shipwright build it: Create an issue using the Shipwright template and label it shipwright. The autonomous pipeline will triage, plan, build, test, review, and create a PR.

Manual development: Fork, branch, then:

npm test    # 125 shell suites + 16 dashboard test files (141 total), E2E system test

License

MIT — Seth Ford, 2026.

About

Orchestrate fully autonomous Claude Code agent teams. Delivery pipelines, fleet operations, DORA metrics, and auto-scaling workers. From GitHub issue to deployed PR — zero human intervention.
