set-core

An experiment in structured AI development — specs, quality gates, and parallel agents applied to Claude Code.

I use Claude Code every day. It's great for writing code, but shipping software (coordinating parallel agents, testing, merging, recovering from failures) needs more structure. set-core is what I built around it: give it a markdown spec, and it decomposes the spec into independent changes, dispatches parallel agents in git worktrees, runs quality gates on each change, and merges the results.

Every change goes through OpenSpec — a structured workflow (proposal → design → spec → tasks → code → verify) that gives agents contracts to work against instead of prompts to interpret. Quality gates are deterministic: exit codes, not LLM judgment.
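To make "exit codes, not LLM judgment" concrete, here is a minimal sketch of a gate runner. It is an illustration, not set-core's actual implementation: the `GateResult` type, the `run_gates` function, and the stop-at-first-failure behavior are assumptions, and the stand-in commands only exist so the example runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    exit_code: int


def run_gates(gates: list[tuple[str, list[str]]]) -> list[GateResult]:
    """Run gate commands in order; a gate passes only if its exit code is 0."""
    results = []
    for name, command in gates:
        completed = subprocess.run(command, capture_output=True, text=True)
        passed = completed.returncode == 0
        results.append(GateResult(name, passed, completed.returncode))
        if not passed:
            break  # assumption: later gates are skipped once one fails
    return results


# Stand-in commands (hypothetical). A real pipeline would call the project's
# test runner, build, and E2E commands here.
demo_gates = [
    ("test", [sys.executable, "-c", "raise SystemExit(0)"]),
    ("build", [sys.executable, "-c", "raise SystemExit(2)"]),
    ("e2e", [sys.executable, "-c", "raise SystemExit(0)"]),
]
demo_results = run_gates(demo_gates)
for result in demo_results:
    status = "pass" if result.passed else "fail"
    print(f"{result.name}: {status} (exit {result.exit_code})")
```

The point of the design is that the pass/fail decision is a plain integer comparison, so the same inputs always produce the same verdict.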

Built with set-core: this project was developed using its own orchestration pipeline.

This is an early alpha — a working experiment, not a finished product. It works well enough that I use it daily, but expect rough edges. The sentinel auto-recovers from most of them. See docs/release/alpha-release.md for the current state, known issues, and what's not yet implemented.

Open source (MIT) — in case it's useful to someone else too.

Read the FAQ — how SET compares to Cursor, Devin, Kiro, Copilot, Augment Intent, and others. Honest comparison — what SET does well, where others are ahead.

Repositories

Repository	Description
set-core	Core orchestration engine, web module, dashboard, CLI tools
set-spec-capture	Capture specs from any source (web, PDF, conversation)
set-voice-agent-delivery	Voice agent project type — Soniox STT, Google TTS, spec-driven customer interaction

See It In Action

An agent working on a change — debugging, testing, fixing:

Input: A markdown spec + a v0.app design export (or any Next.js-shaped design source)

Orchestration: Phased execution, dependency DAG, quality gates on every change

Visibility: See where every minute goes — agent sessions, LLM calls, tool executions, sub-agents, gates — all placed on a real-time axis. Click any implementing span to drill into its per-tool breakdown.

Activity timeline (top) — every change as a row, every gate and LLM call placed in time. Drilldown (below) — one implementing span expanded into per-tool, per-LLM-call, per-sub-agent breakdown with the longest operations called out.

Output: Running application — built entirely from the spec, faithful to the design source

Spec + v0 design → parallel agents → quality gates (including design-fidelity) → working app. Zero intervention.

The Pipeline

spec.md ──► digest ──► decompose ──► parallel agents ──► verify ──► merge ──► done

What's actually happening under the hood

spec.md + design source (v0.app export → manifest, or design-snapshot.md)
  │
  ▼
┌───────────────────────────────────────────────────────────┐
│ Sentinel (autonomous supervisor)                          │
│  ├─ digests spec into requirements + domain summaries     │
│  ├─ decomposes into independent changes (DAG)             │
│  ├─ dispatches each to its own git worktree               │
│  ├─ monitors progress, restarts on crash                  │
│  ├─ merges verified results back to main                  │
│  └─ auto-replans until full spec coverage                 │
│                                                           │
│  Per change:                                              │
│  ┌──────────────────────────────────────────────────┐     │
│  │ Ralph Loop                                       │     │
│  │  ├─ OpenSpec artifacts (proposal → design → code) │     │
│  │  ├─ iterative implementation with tests           │     │
│  │  ├─ progress-based trend detection                │     │
│  │  └─ auto-pause on stall or budget limit           │     │
│  └──────────────────────────────────────────────────┘     │
│                                                           │
│  Quality gates (per change, before merge):                │
│  ┌──────────────────────────────────────────────────┐     │
│  │ Jest/Vitest → Build → Playwright E2E             │     │
│  │ → Code Review → Spec Coverage → Smoke Test       │     │
│  │ (gate profiles: per-change-type configuration)    │     │
│  └──────────────────────────────────────────────────┘     │
│                                                           │
│  Across all agents:                                       │
│  ┌──────────────────────────────────────────────────┐     │
│  │ Memory Layer                                     │     │
│  │  ├─ 5-layer hooks inject context per tool         │     │
│  │  ├─ agents learn from each other's work           │     │
│  │  └─ conventions survive across sessions           │     │
│  └──────────────────────────────────────────────────┘     │
└───────────────────────────────────────────────────────────┘
  │
  ▼
merged, tested, done
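The "decomposes into independent changes (DAG)" step in the diagram above can be pictured as grouping changes into phases, where every change in a phase depends only on changes from earlier phases. The sketch below uses Python's standard `graphlib` module. The change names and the `plan_phases` helper are hypothetical, and this is one plausible way to derive phases, not necessarily how the sentinel does it.

```python
from graphlib import TopologicalSorter


def plan_phases(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group changes into phases; each phase can be dispatched in parallel.

    `dependencies` maps a change name to the changes it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        phases.append(ready)
        sorter.done(*ready)
    return phases


# Hypothetical changes for a small web app.
example_changes = {
    "data-model": set(),
    "auth": {"data-model"},
    "product-pages": {"data-model"},
    "cart": {"auth", "product-pages"},
}
phases = plan_phases(example_changes)
for number, names in enumerate(phases, start=1):
    print(f"phase {number}: {', '.join(names)}")
```

Here `auth` and `product-pages` land in the same phase, so they could each get their own worktree and run side by side, while `cart` waits for both.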

Key Features

	Feature	Description
⚙️	Full Pipeline	Spec to merged code — digest, decompose, dispatch, verify, merge — hands-off. Guide
🛡️	Quality Gates	Test, build, E2E, code review, spec coverage, and smoke — deterministic, not LLM-judged. Guide
🧠	Persistent Memory	Hook-driven cross-session recall: agents learn from each other. Saves are triggered by infrastructure hooks, not left to the agent's discretion. Guide
📊	Web Dashboard	Real-time monitoring — orchestration state, agents, tokens, issues, learnings. Guide
📋	OpenSpec Workflow	Structured artifact flow (proposal → design → spec → tasks → code) minimizes hallucination. Guide
🔧	Self-Healing	Issue pipeline: detect → investigate → fix → verify. The sentinel diagnoses before it acts. Guide
🧩	Plugin System	Project-type plugins add domain rules, gates, templates, and conventions. Docs
🎨	Design Bridge	v0.app export → manifest → per-change design slice → Tailwind tokens + shell components injected into every agent's context. Guide
📈	Cross-Run Learnings	Review findings and gate failures become rules for the next run. The system gets better with use. Dashboard
🔁	Account Manager	Manage multiple Claude Code accounts — register, monitor usage, manually switch. Docs

Consumer Feedback Loop

set-core is battle-tested through consumer projects — E2E runs that exercise the full pipeline from spec to merged app. During these runs, agents discover and fix real problems: build gate failures, middleware bugs, test flakiness. These fixes are framework-level insights that belong in set-core, not just in the consumer project.

set-core (framework)                    consumer project (E2E run)
   │                                        │
   ├── set-project init ───────────────────►│  deploy rules, templates, gates
   │                                        │
   │                                        ├── agents build features
   │                                        ├── gates catch failures
   │                                        ├── ISS pipeline creates fixes  ◄── valuable!
   │                                        │
   │◄── set-harvest ───────────────────────┤  review + adopt framework fixes
   │                                        │
   ├── update planning rules, templates     │
   ├── set-project init ───────────────────►│  redeploy with improvements

After every E2E run, harvest the fixes:

set-harvest                          # scan all registered projects
set-harvest --project craftbrew-run-20260320-1445 # scan single project
set-harvest --dry-run                # preview without updating state

The harvest tool scans ISS (issue pipeline) fix commits chronologically, classifies each one as framework-relevant or project-specific, and suggests where to adopt it (planning rules, templates, or core code). Each fix is reviewed interactively; nothing is adopted automatically.
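As a rough illustration of the scan-then-classify idea, the sketch below orders fix commits by time and suggests an adoption target from the paths each commit touched. Everything here is an assumption made for the example: the `FixCommit` type, the path prefixes in `ADOPTION_TARGETS`, and the matching rule are invented, and the real `set-harvest` may classify fixes very differently. It only produces suggestions; the review itself stays with a human.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical path prefixes mapped to adoption targets (invented for this sketch).
ADOPTION_TARGETS = [
    ("rules/", "planning rules"),
    ("templates/", "templates"),
    ("lib/", "core code"),
]


@dataclass(frozen=True)
class FixCommit:
    sha: str
    committed_at: datetime
    paths: tuple[str, ...]


def suggest_target(commit: FixCommit) -> Optional[str]:
    """Return an adoption target, or None when the fix looks project-specific."""
    for prefix, target in ADOPTION_TARGETS:
        if any(path.startswith(prefix) for path in commit.paths):
            return target
    return None


def harvest(commits: list[FixCommit]) -> list[tuple[str, str]]:
    """List (sha, suggestion) pairs, oldest commit first."""
    ordered = sorted(commits, key=lambda commit: commit.committed_at)
    return [(c.sha, suggest_target(c) or "project-specific") for c in ordered]


commits = [
    FixCommit("b2c3", datetime(2026, 3, 20, 15, 10), ("app/cart/page.tsx",)),
    FixCommit("a1b2", datetime(2026, 3, 20, 14, 50), ("templates/next.config.js",)),
]
suggestions = harvest(commits)
for sha, suggestion in suggestions:
    print(sha, "->", suggestion)
```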

Custom projects: The harvest command runs from the set-core repo, not the consumer project. After an orchestration run completes on any registered project, switch to the set-core directory and run set-harvest. This is a manual step — the system cannot auto-adopt fixes without human review.

cd /path/to/set-core          # switch to set-core repo
set-harvest                    # review all registered projects
set-harvest --project my-app   # review single project

Where We're Heading

Active development priorities:

Direction	Goal	Status
Divergence reduction	Eliminate remaining nondeterminism through template optimization, scaffold testing, and configuration distribution across core → module → scaffold → project layers	Measurably reduced for simple projects; complex projects still improving — tracked across paired E2E runs
Build time optimization	Reduce gate pipeline wall clock time — parallel gate execution, incremental builds, cached test results between changes	Currently sequential (Jest → Build → E2E → Review); exploring parallel gates where safe
Session context reuse	Reuse conversation context across Ralph Loop iterations and between related changes — reduce cold-start token overhead	Currently each iteration starts fresh; investigating warm-start from previous iteration's state
Memory optimization	Smarter recall — relevance scoring, dedup, consolidation. Lite-mode hooks for low-context sessions	Dedup, consolidation, and `SET_MEMORY_HOOKS=lite` mode operational; learning-to-rule conversion deferred
Gate intelligence	Per-change-type gate profiles that adapt based on historical pass rates and failure patterns	Gate profiles operational; adaptive thresholds planned
Merge conflict prevention	Proactive detection of cross-cutting file conflicts before they happen — schedule conflicting changes sequentially	Phase ordering works; file-level conflict prediction in research

See docs/learn/journey.md for the full development history and docs/learn/lessons-learned.md for production insights driving these priorities.
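The merge conflict prevention row describes scheduling conflicting changes sequentially. The table lists file-level prediction as still in research, so the following is only a sketch of the idea under a strong assumption: that each change can declare up front which files it expects to touch. The `schedule_batches` helper and the change names are hypothetical.

```python
def schedule_batches(changes: dict[str, set[str]]) -> list[list[str]]:
    """Greedily pack changes into batches with no shared files.

    Changes in the same batch touch disjoint files and can run in parallel;
    batches run one after another.
    """
    batches: list[tuple[list[str], set[str]]] = []
    for name in sorted(changes):  # sorted for a stable, repeatable result
        files = changes[name]
        for members, claimed in batches:
            if claimed.isdisjoint(files):
                members.append(name)
                claimed |= files  # in-place update of the batch's file set
                break
        else:
            # Every existing batch overlaps with this change: start a new one.
            batches.append(([name], set(files)))
    return [members for members, _ in batches]


# Hypothetical changes and the files they expect to modify.
planned = {
    "add-auth": {"app/layout.tsx", "middleware.ts"},
    "add-blog": {"app/blog/page.tsx"},
    "add-cart": {"app/layout.tsx", "app/cart/page.tsx"},
}
batches = schedule_batches(planned)
print(batches)
```

In this example `add-auth` and `add-cart` both edit `app/layout.tsx`, so they end up in different batches, while `add-blog` can run alongside `add-auth`.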

What We're Measuring

AI agents are nondeterministic — run the same prompt twice, get different results. The experiment here: does adding structure (specs, gates, templates) make the output converge?

Challenge	Our Approach	Result
Output divergence	3-layer template system — templates lock structure, agents focus on logic	83–87% structural convergence across paired runs (report)
Hallucination	OpenSpec workflow — structured artifacts with requirements + acceptance criteria	Agents implement against spec, not imagination
Quality roulette	Programmatic gates — exit codes, not LLM judgment. 7 gate types	Deterministic pass/fail
Spec drift	Coverage tracking — verifies "does it satisfy the spec?" not just "do tests pass?"	Auto-replan when coverage < 100%
Failure recovery	Issue pipeline — detailed investigation before any fix. No guessing.	30-second recovery, not hours
Agent amnesia	Hook-driven memory — shared across worktrees, survives sessions	Zero voluntary saves → 100% capture via hooks
Framework reliability	E2E scaffold testing — the orchestrator tests itself	30+ runs across 4 project scaffolds

Structural convergence across paired runs: 83% minishop, 87% micro-web. Schema equivalence at 100%, convention compliance at 100%. The remaining divergence is stylistic, not structural. Not perfect, but converging. Full data →
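The exact metric behind these percentages is defined in the linked report, not here. As a rough mental model only, one common way to score how similar two runs are is the Jaccard similarity of their structural features, for example the set of file paths each run produced. The function name and the sample paths below are assumptions for illustration.

```python
def structural_convergence(run_a: set[str], run_b: set[str]) -> float:
    """Jaccard similarity: shared items divided by all items seen in either run."""
    union = run_a | run_b
    if not union:
        return 1.0  # two empty runs are trivially identical
    return len(run_a & run_b) / len(union)


# Hypothetical file sets from two runs of the same spec.
run_a = {
    "app/page.tsx",
    "app/about/page.tsx",
    "app/blog/page.tsx",
    "prisma/schema.prisma",
    "components/Header.tsx",
}
run_b = {
    "app/page.tsx",
    "app/about/page.tsx",
    "app/blog/page.tsx",
    "prisma/schema.prisma",
    "components/SiteHeader.tsx",
}
score = structural_convergence(run_a, run_b)
print(f"{score:.0%}")
```

Four of the six distinct paths are shared, so a single renamed component is enough to pull the score well below 100%.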

The Spec Is the Bottleneck

Writing a good spec takes effort. That's intentional — the quality of the output depends on the quality of the input. The trade-off SET makes: upfront structure for reliable results.

A good spec for SET includes:

Data model — entities, fields, relationships, enums (becomes the Prisma schema)
Page layouts — sections, column counts, component names (not vague descriptions)
Design tokens — exact hex colors, font families, spacing values (or use v0.app + set-design-import to pull a working Tailwind/shadcn export)
Auth & roles — protected routes, user roles, registration flow
Seed data — realistic names and content, not "Product 1"
i18n — locales, framework, URL structure (if multilingual)
Business requirements — what the user should be able to do, with acceptance criteria
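The checklist above lends itself to a quick pre-flight check before starting a run. The sketch below assumes the spec is a markdown file whose headings match the checklist names; that naming convention, the `REQUIRED_SECTIONS` list, and the `missing_sections` helper are all hypothetical, and i18n is left out because it only applies to multilingual projects.

```python
# Section names follow the checklist above; matching them against markdown
# headings is an assumption made for this sketch.
REQUIRED_SECTIONS = [
    "Data model",
    "Page layouts",
    "Design tokens",
    "Auth & roles",
    "Seed data",
    "Business requirements",
]


def missing_sections(spec_text: str) -> list[str]:
    """Return required sections that have no matching heading in the spec."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in spec_text.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


draft_spec = """\
# MiniShop
## Data model
## Page layouts
## Auth & roles
## Business requirements
"""
gaps = missing_sections(draft_spec)
print(gaps)
```

A check like this only confirms that a section exists. Whether the section is detailed enough is still a judgment call.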

Use /set:write-spec in Claude Code to generate a structured spec interactively — it detects your project type, asks targeted questions per section, and integrates with your design source (v0.app export or Figma). Works for web apps, APIs, CLI tools, and any project type.

The project type templates handle the rest — framework boilerplate, build config, test setup, linting rules. You focus on what to build. The templates ensure how it gets built is consistent and deterministic.

The better the spec, the better the result. Agents working from a detailed spec produce dramatically better output than agents working from a conversation.

See the Writing Specs guide for the full methodology, the CraftBrew scaffold for a production-quality example with Figma design integration, and the MiniShop spec for a minimal but complete example.

Quick Start

Step 1: Install

git clone ssh://git@git.setcode.dev:2222/root/set-core.git
cd set-core && ./install.sh

After install, the web dashboard starts automatically as a background service (launchd on macOS, systemd on Linux). Open http://localhost:7400 — you should see the manager page.
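If you prefer to confirm from a script that the dashboard is reachable, a small standard-library check is enough. This is a sketch: the `dashboard_is_up` helper is hypothetical, and it only tests that something answers at the URL with a successful HTTP status, not that the manager page rendered correctly.

```python
import urllib.error
import urllib.request


def dashboard_is_up(url: str = "http://localhost:7400", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("dashboard reachable" if dashboard_is_up() else "dashboard not reachable")
```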

Step 2: Try an E2E test first

Before setting up your own project, see the full pipeline in action. Open a Claude Code session from the set-core directory and type:

run a micro-web E2E test

Claude will scaffold a project, register it with the manager, and validate the gate pipeline. Then tell Claude to start the sentinel — the orchestration runs through the manager API at http://localhost:7400.

Watch the dashboard as it progresses: you'll see phases, gate results, token usage, and the final application. The micro-web test builds a simple 5-page site (including home, about, blog, and contact pages) in about 20 minutes.

When the orchestration completes, tell Claude:

start the application that was just built

Claude will install dependencies, start the dev server, and open the app in your browser.

For a more complex test, try the MiniShop — a full e-commerce app (products, cart, admin panel, auth) built from a detailed spec with Figma design:

run a minishop E2E test

Step 3: Set up your own project

Now that you've seen how it works, set up orchestration for your own project:

cd ~/my-project
set-project init --project-type web --template nextjs

Write your spec — use /set:write-spec in Claude Code for an interactive guide that detects your project and asks the right questions:

/set:write-spec

Sync your design (optional but recommended) — pull a v0.app export and let set-core scan it for shell components, tokens, and hygiene issues:

set-design-import --git <v0-repo-url> --ref main --scaffold .
set-design-hygiene                          # report mock arrays, hardcoded strings, broken routes

The design-fidelity gate runs on every UI change and blocks merge if the agent's output diverges from the v0 export's component structure or tokens.

Start the orchestration — use /set:start in Claude Code, or start from the dashboard:

/set:start docs/spec.md

See docs/guide/writing-specs.md for the complete spec-writing methodology, docs/guide/design-integration.md for the design source → agent pipeline, and docs/guide/quick-start.md for the full setup walkthrough.

Technology

Core orchestration:

Component	Technology
Agent runtime	Claude Code (Anthropic)
Workflow	OpenSpec — spec-driven artifact pipeline
Isolation	Git worktrees — real branches, real merges
Engine	Python, FastAPI, uvicorn
Dashboard	React, TypeScript, Tailwind CSS
Memory	shodh-memory (RocksDB + vector embeddings)
Design bridge	v0.app export → `set-design-import` → manifest + per-change design slice → agent context, with design-fidelity gate at merge

Tooling ecosystem:

Tool	Purpose
set-spec-capture	Capture specs from any source (web, PDF, conversation)
set-design-import	Pull a v0.app export, generate the design manifest, optionally run hygiene scan
set-design-hygiene	Scan a design export for 9 antipatterns (mock arrays, hardcoded strings, broken routes, …)
set-run-logs	Forensic CLI for completed orchestration runs — events, gate timing, agent decisions
set-e2e-report	Generate benchmark reports from orchestration runs
set-router	Manage multiple Claude Code accounts — register, switch, monitor usage (docs)

Built-in modules add domain-specific technology (Next.js, Prisma, Playwright for web; Soniox STT, Google TTS for voice). See Plugins.

Built & Battle-Tested

set-core is a framework with a plugin system. The core orchestration engine is open source. Project types — domain-specific rules, templates, and conventions — can be public or private.

The web project type (Next.js, Prisma, Playwright) ships built-in and is validated through synthetic E2E orchestration runs that simulate real development environments — reproducible, measurable, tracked.

Custom project types in development include voice agent delivery (Soniox STT and Google TTS with spec-driven customer interaction) and others not yet public. The plugin architecture lets anyone create their own domain-specific type with custom gates, templates, and conventions.

Metric	Value
Commits	1,870 (across 88 days)
Capability specs	429
Active changes	5 in flight, 424 archived
Codebase	109K LOC (89K Python, 20K TypeScript) + Shell, specs, docs, templates
Built-in modules	`web` (Next.js + Prisma + Playwright), `example` (reference plugin)
MiniShop benchmark	6/6 merged, 0 interventions, 1h 45m
Latest milestones	v0.app design pipeline, design-fidelity gate, fix-iss circuit-breaker, forensics CLI, USD cost metrics

Worth every hour. Full journey, benchmarks, and lessons: docs/learn/journey.md

Why This Matters

Single-agent was the start. Orchestration is the present. Enterprise is preparing.

Systems like SET can do the work of a full development team — given the right specification and properly developed project types. Period.

This is not the future. This is the present. The sooner we move in this direction, the sooner we'll see what software development actually becomes — instead of clinging to the assumption that manual development or even manual code review should remain the default.

Don't blame the model

Claude Code is extraordinarily capable. But we can't expect it to guess what we haven't specified. When the model fills in gaps we left empty, we call it "hallucination" — but most of the time, it's underspecification on our side. The frustration of repeated errors, unexpected behavior, inconsistent output — 90% of this comes from insufficient context, not model limitation.

And even with detailed, multi-hundred-page specifications, we cannot guarantee that the model treats every character with equal priority during implementation. This is precisely why systems like SET exist: to enforce, verify, and check that the model understood and executed what we intended. OpenSpec structures the work. Quality gates verify the output. The sentinel investigates before it fixes. No guessing.

Enterprise is next

Banks, government, defense, regulated industries: many of them can't use cloud-hosted models today. But this will change. On-premise models, secure multi-tenant systems, and hybrid architectures are on the way, and some organizations are already preparing.

The responsible thing for every enterprise is to prepare now — before the models arrive on their infrastructure. Learn orchestration patterns. Build project types for your domain. Develop the muscle memory for spec-driven development. Don't wait for the infrastructure to be ready; be ready when it arrives.

This needs a community

We built set-core to solve our own problems. The web project type is public and battle-tested. But the real power comes from domain-specific project types — fintech with IDOR rules, healthcare with HIPAA compliance, e-commerce with payment flow gates.

Model providers (Anthropic included) will build orchestration into their platforms — we welcome that. These middleware layers are destined to become first-party features. But that doesn't mean we shouldn't build them ourselves: this is how we shape the tools to our needs and accumulate knowledge that vendor-generic solutions can't provide.

Start now. There will be bugs. But this is a self-healing system — the sentinel detects, investigates, and fixes. The more people use it, the faster it improves.

Join us → · Email

Documentation

Section	Contents
Guide	Quick start, writing specs, design integration, orchestration, sentinel, worktrees, OpenSpec, memory, dashboard
Reference	CLI tools, configuration, architecture, plugins
Learn	How it works, development journey, benchmarks, lessons learned
Research	Dated deep-dives: token optimization, cache tier analysis, divergence studies, framework comparisons
Examples	MiniShop walkthrough, first project setup
Deep Dive	18-chapter technical reference covering every pipeline stage
Contributing	Dev setup, testing, plugin development, code style

License

MIT — See LICENSE for details.

Website: setcode.dev · Source: git.setcode.dev/root/set-core
