Building The AI Coding Tool That Actually Works

A complete architecture for a multi-agent vibe coding platform — designed to solve every problem the current generation of tools got wrong.

"The best tools don't just do the job. They understand the job, remember the job, and protect you while doing it."

Introduction — What This Is & What It Contains
The Problems Every Existing Tool Has
How Each Problem Is Solved
How The System Works — The Full Pipeline
The Core Innovations — Multi-Agent Team, LLM Routing & Living Feature Tree
All 12 Agents — Roles, Facilities & Rules
Why This Beats The Competition
Full Theoretical Example — Building An E-Commerce App From Zero
Project Status & What's Next

1. Introduction — What This Is & What It Contains

In early 2026, the AI coding tool market is loud, crowded, and deeply broken.

Bolt runs out of tokens mid-project and charges you for the failure. Lovable breaks its own code and bills you to fix what it broke. Cursor is powerful but only if you already know how to code. Emergent costs $200 a month and still doesn't remember what you built last Tuesday. Every single one of them hands you a shiny frontend, forgets your entire project the moment you close the tab, and leaves you alone when the real errors start.

This isn't bad luck. It's bad architecture.

This Gist documents a different approach — a multi-agent AI coding system built from the ground up to solve the problems every existing tool has failed to address. Not incrementally better. Fundamentally different in how it thinks about code, memory, testing, security, and the relationship between the user and the machine.

What Makes This Different

Most AI coding tools are a single model with a chat box in front of it. You type, it generates, you hope. When something breaks, you type again. When it forgets the context, you paste everything back in. When it gets stuck, you start over.

This system works like a real engineering team.

There is a Leader that understands your intent and manages the work. A Builder that handles all backend logic. A UI/UX agent that handles everything the user sees. A Tester that checks the work after every task — not just at the end. A Security agent that scans for vulnerabilities in real time. A Memory Keeper that knows everything about your project and never forgets. And more — twelve specialized agents in total, each with a defined role, defined rules, and a defined way of working with the others.

On top of that, there is a Living Feature Tree — a visual diagram of your entire project that builds itself as you talk. Every feature you add becomes a branch. Every new requirement becomes a new branch off that branch. The whole project is always visible, always up to date, and always honest about what exists and what is still being built.

And underneath all of it, there is a smart LLM routing system that sends small tasks to cheap, fast models and complex architectural decisions to the most capable ones. The result is a system that is significantly cheaper to run than competitors while producing better output.

Who This Is For

Non-technical builders who want to describe an app in plain language and watch it get built — without hitting a wall every time something goes wrong.

Technical developers who want the speed of AI generation without losing control of architecture, code quality, or security.

Founders and indie hackers who need to ship a real, production-grade product fast without paying $200 a month for a tool that forgets everything.

Anyone who has been burned by Bolt's token explosions, Lovable's credit loops, Cursor's lack of visual preview, or any other tool that got them halfway there and stopped.

Figure 1 — Current single-model tools compared to the 12-agent architecture of this system.

What This Gist Contains

This is a complete, honest document. It does not skip the hard parts.

Section 2 maps every real problem found across all major AI coding tools — both the common failures that every tool shares and the unique failures specific to individual platforms. This is the problem statement.

Section 3 maps each of those problems directly to the part of this architecture that solves it. Problem → Solution, one by one.

Section 4 explains how the full system works as a pipeline — what happens from the moment a user types a message to the moment output is produced. This is the engine room.

Section 5 goes deep on the three core innovations that make this system different from anything currently on the market: the multi-agent team structure, the LLM routing system, and the Living Feature Tree.

Section 6 documents all twelve agents in full — what each one does, its special capabilities, and the rules it operates by. This is the team manual.

Section 7 puts this system directly against the competition — Bolt, Lovable, Cursor, Windsurf, Replit, GitHub Copilot, and Emergent — and shows specifically where each one fails and how this architecture handles the same situation.

Section 8 is the most important section for understanding how this works in practice. It walks through a complete theoretical example — building a real e-commerce application from the first message to deployed product — showing exactly which agent does what, at what point, and what the system protects against at every step.

Section 9 covers where this project currently stands and what is being built next.

A note on honesty: This document describes an architecture, not a finished product. The ideas here are built on real research, real competitive analysis, and real engineering thinking. The goal of documenting it at this stage is to build in public — to put the thinking on record, invite feedback, and share the process of building something genuinely new in a space full of noise.

2. The Problems Every Existing Tool Has

Before building anything new, you have to be honest about what is actually broken. Not what sounds broken. Not what makes good marketing copy. What real builders hit, repeatedly, across real projects, on real tools.

This section documents those problems — drawn from direct competitive analysis of seven major AI coding tools in 2025 and 2026: Bolt.new, Lovable, Cursor, Windsurf, Replit, GitHub Copilot, and Emergent.

The problems fall into two categories. Some are shared by every tool — structural failures baked into how this generation of AI coding tools was designed. Others are unique to specific platforms — failures that emerge from particular architectural or business decisions each tool made.

Both matter. Universal problems tell you what the industry got wrong. Unique problems tell you what each individual team got wrong on top of that.

The Two Categories

Category A — Universal Problems

Found across all or almost all tools. These are not bugs. They are the natural result of building on a single-model, stateless architecture.

Problem 1 — The Tool Forgets Everything

Close the tab. Come back tomorrow. Start a new conversation.

Every AI coding tool on the market treats each session as if your project never existed before. There is no memory of what libraries you chose, what your database schema looks like, what components have already been built, or what decisions were made three sessions ago. You spend a significant portion of every session re-explaining your own project to a tool that helped you build it.

This is not a minor inconvenience. On anything larger than a weekend prototype, context loss is the primary reason projects stall. The tool can no longer hold the whole picture in its head, so it starts making contradictory decisions — adding a library that conflicts with one already installed, regenerating a component that already exists, suggesting a database structure that breaks the one already in place.

Why it happens: Every major tool is built on a single stateless model. There is no persistent layer that tracks project state between sessions. Each conversation starts fresh.

Problem 2 — Generated Code Is Never Production-Ready

Every tool promises to build your app. Every tool delivers something that works in a demo.

The gap between a working demo and a shippable product is where builders lose weeks. AI-generated code consistently skips input validation, ignores edge cases, misses error handling, leaves security holes open, and produces patterns that become unmaintainable the moment the project grows beyond a handful of files.

This is not a criticism of any one tool. It is the current ceiling of single-model generation. A model that generates code has no skin in the game — it does not run the code, face the users, or deal with the consequences of what it produced. The result is code that looks complete and breaks under real conditions.

Why it happens: Generation without execution and verification. The model produces output but has no closed feedback loop to check whether that output actually holds up under real conditions.

Problem 3 — Quality Degrades With Every Iteration

The first three prompts on any tool produce the best results. By prompt fifteen, the codebase is a mess.

Naming conventions drift. Dead code accumulates. Components that were clean and consistent at the start become tangled with patches, workarounds, and contradictions. The more you build with the tool, the worse the code quality becomes — not because the model gets worse, but because the growing, messy codebase becomes the context the model is working from, and it inherits the mess.

Why it happens: AI models condition their output on their input. A messy, inconsistent codebase produces messy, inconsistent suggestions. There is no architectural guardian maintaining coherence over time.

Problem 4 — Debugging Loops Burn Resources and Solve Nothing

Ask the tool to fix a bug. It introduces a new one. Ask it to fix that one. It reintroduces the first. This cycle — documented extensively across Bolt, Lovable, Replit, and Emergent — can run for dozens of turns without resolution.

Every turn costs tokens or credits. The user pays for every failed attempt. The tool never stops to question whether its approach to the problem is fundamentally wrong. It just tries again, slightly differently, with the same broken mental model.

Why it happens: No root cause analysis. The model treats each error message as a new prompt rather than building a coherent diagnosis of the underlying problem. There is no meta-level reasoning about why the fix keeps failing.

Problem 5 — Security Is an Afterthought

A study published in April 2025 found 170 out of 1,645 apps built with one popular AI coding tool had active data exposure vulnerabilities. The apps were publicly accessible. The user data was not protected. The tool that built them raised no flag.

This is not an isolated finding. Exposed API keys, missing authentication on protected routes, absent input sanitization, broken CORS configuration — these appear consistently across AI-generated codebases because no current tool has a dedicated, mandatory security review step in its pipeline.

Why it happens: Security review is not part of the generation loop. The model generates code that works — in the sense that it runs — without checking whether it is safe.

Problem 6 — Multi-File Changes Break Things

Change a database schema. Rename a function. Update an API contract.

These are normal development operations. Every current AI tool handles them badly. It updates some files and misses others. It renames the function in the backend and forgets the frontend still calls it by the old name. It changes the API response shape and leaves the component that parses that response unchanged.

The result is a codebase that compiles but breaks at runtime, in ways that are often non-obvious and time-consuming to trace.

Why it happens: Most tools process files in isolation. They do not maintain a live dependency map showing which parts of the codebase depend on which other parts. Without that map, there is no way to know what a change will affect.

Problem 7 — No Real Collaboration Exists

Every major tool is built for a single user. Team plans exist, but genuine simultaneous co-editing — multiple people prompting the same project at the same time — is either absent or produces conflicts no tool handles well. In 2026, where teams build together, this is a structural gap nobody has closed.

Problem 8 — Speed Is an Illusion

A METR study from July 2025 tested experienced developers using the best AI coding tools available and found they took 19% longer to complete tasks than without the tools — despite feeling 20% faster.

The time saved generating code is spent reviewing it, debugging it, re-explaining context, and correcting errors the tool introduced. The feeling of speed is real. The actual speed is not.

Why it happens: The cognitive cost of managing an AI collaborator — verifying its output, correcting its misunderstandings, re-establishing context — is invisible in the moment but significant in aggregate.

Category B — Unique Problems

Found in specific tools only. These emerge from particular architectural or business decisions.

Problem	Tool	What Happens
Token explosion	Bolt.new	Debugging spirals consume millions of tokens. Users report spending $1,000+ on a single project that never resolves. Billing continues even when the agent is clearly stuck.
Credits charged for AI's own mistakes	Lovable	When Lovable breaks your code during an edit, you pay credits to fix what it broke. The billing model penalizes users for the tool's failures.
Prompt injection exploit	Lovable	Documented in April 2025 — attackers could manipulate the AI into generating backdoors in user code through crafted prompts. A platform-level security failure, not a user error.
Permanent context wall	Bolt.new	Projects that grow large enough hit a hard token ceiling and become completely unrecoverable. Unlike Cursor, there is no escape hatch. The project is dead.
Rollback corruption	Bolt.new	Rolling back to a previous version can corrupt the current state, making recovery impossible.
Preview works, production breaks	Bolt.new	Code runs perfectly in Bolt's sandbox but fails in real deployment due to CORS policies, missing environment variables, or dependency issues the sandbox does not enforce.
Auto-save before approval	Windsurf	Cascade saves AI-generated changes to the filesystem before the user reviews them. A misunderstood task writes unwanted files before the user can stop it.
Agent goes rogue on vague prompts	Windsurf	Cascade makes sweeping, confident changes across many files simultaneously when given ambiguous instructions. A single bad misinterpretation causes extensive damage quickly.
API hallucination	GitHub Copilot	Suggests calls to functions, methods, and APIs that do not exist — either fabricated entirely or from outdated library versions. The user accepts the suggestion, runs the code, and discovers the function never existed.
License contamination	GitHub Copilot	Documented cases of reproducing GPL-licensed code verbatim in user projects. GitHub added a filter, but the legal risk remains the most prominent of any tool on this list.
Runtime-only context	Replit	Replit Agent understands code by running it, not by statically indexing it. This makes it genuinely poor at modifying or refactoring large existing codebases — it cannot understand file relationships without execution.
Multi-agent coordination conflicts	Emergent	When multiple agents working in parallel produce conflicting implementations, the system does not always resolve the conflict cleanly. It can merge both versions, producing code that compiles but behaves unpredictably.
$200/month pricing cliff	Emergent	The jump from free to Pro is a hard wall. No meaningful mid-tier exists. Serious indie builders who cannot justify enterprise pricing are effectively excluded.
No visual preview	Cursor, Copilot	Neither tool has any built-in live preview. Frontend work requires managing a separate dev server and browser window manually — a friction that compounds over a long session.

The Honest Summary

Across all seven tools, research surfaces the same pattern:

Every tool is good at starting. No tool is good at continuing.

The first version of something is where AI coding tools shine. The third, fifth, and tenth iterations — where real products actually live — is where they all fall apart. Memory collapses. Code quality degrades. Security gaps widen. Debugging loops run. Costs accumulate.

The tools were built to impress in demos. Real projects are not demos.

What this means for the architecture in this document: Every problem listed above has a specific solution built into the system described in the sections that follow. None of them are hand-waved away. Section 3 maps each problem directly to its solution, one by one.

Figure 2 — Universal and unique failure points mapped across all seven tools analysed.

3. How Each Problem Is Solved

Figure 3 — Every identified problem maps directly to a dedicated agent or mechanism.

A problem list without solutions is just a complaint. This section closes that gap.

Every problem documented in Section 2 has a specific, named solution in this architecture. Not a workaround. Not a prompt engineering trick. A structural solution — a part of the system designed specifically to prevent that failure from happening in the first place.

This section maps each problem directly to its solution. For each one: what the problem is, what the solution is, and which part of the system delivers it.

How to Read This Section

Each entry follows the same structure:

The problem — a one-line restatement of what breaks
The solution — what this architecture does instead
Who delivers it — which agent or system component is responsible
Why this works — the reasoning behind the solution, not just the claim

Universal Problems — Solved

Problem 1 — The Tool Forgets Everything

Every session starts from zero. Context dies when the tab closes.

Solution: The Memory Keeper Agent

Memory Keeper is a dedicated agent whose entire purpose is to maintain a permanent, structured record of everything about a project — across every session, across every agent, across every change. It stores the tech stack, the database schema, the API contracts, the design system, the component library, the decisions that were made and why, and the full change history going back to day one.

When any agent starts a task, it does not start from a blank slate. Memory Keeper generates a targeted context package — exactly the information that agent needs for that specific task — and injects it before work begins. The agent starts informed, not amnesiac.

When a session ends and a new one begins days later, the project is exactly where it was left. Nothing needs to be re-explained. Nothing was forgotten.

Who delivers it: Memory Keeper Agent — always running, always current, always the authoritative source of project truth.

Why this works: Persistent memory is not a feature bolted onto a stateless model. It is a dedicated agent with a single responsibility. Single-responsibility agents do their job reliably because they have nothing else to do.

Problem 2 — Generated Code Is Never Production-Ready

The demo works. The real app breaks.

Solution: The Tester Agent + Security Agent working in sequence

Every time Builder or UI/UX completes a task — not at the end of the project, after every single task — Tester runs automatically. It executes unit tests, integration tests, and end-to-end tests on the output immediately. If something fails, it identifies which agent's code caused the failure, reports to that agent and to Leader, and blocks the work from progressing until the failure is resolved.

Security Agent runs in parallel in passive mode — scanning every file produced for vulnerabilities, exposed keys, missing authentication, and broken configurations in real time. After Tester completes a full suite run, Security Agent switches to deep audit mode and performs a comprehensive review of the entire codebase.

Nothing reaches deployment that has not been tested and security-reviewed. Not as a checkbox. As a mandatory gate.

Who delivers it: Tester Agent (quality) + Security Agent (safety) operating in continuous and deep audit modes.

Why this works: The gap between demo-quality and production-quality code is almost entirely a testing and security gap. Close the loop — test immediately, scan immediately, block on failure — and the gap closes with it.

Problem 3 — Code Quality Degrades With Every Iteration

The codebase starts clean and becomes a mess.

Solution: Memory Keeper's design system enforcement + Builder's architectural awareness

Memory Keeper stores the project's design system, naming conventions, architectural patterns, and coding standards from the very first session. Every time UI/UX generates a component, it pulls these standards from Memory Keeper before writing a single line. Every time Builder writes a function, it checks Memory Keeper for existing patterns before introducing a new one.

Nothing drifts because the reference point never moves. The design system does not live in the first conversation and get forgotten — it lives in Memory Keeper permanently, is updated when decisions change, and is enforced by every agent that reads from it before acting.

Builder also performs architectural impact analysis before making changes to core systems — predicting which other parts of the project are affected and flagging them before touching them. Quality does not degrade because coherence is actively maintained, not passively hoped for.

Who delivers it: Memory Keeper Agent (standards storage and enforcement) + Builder Agent (architectural awareness) + UI/UX Agent (design system compliance).

Why this works: Code quality degrades when there is no enforced reference point. Memory Keeper is that reference point — permanent, authoritative, and consulted by every agent before every action.

Problem 4 — Debugging Loops Burn Resources and Solve Nothing

Fix one bug, introduce another. Repeat until broke.

Solution: Tester's root cause attribution + Leader's loop detection

When Tester finds a failure, it does not just report the error message. It analyzes the failure, identifies the root cause, traces it to the specific agent and specific file responsible, and reports with a structured diagnosis — not just "it broke" but "this broke, here, because of this, and this agent caused it."

The responsible agent receives a targeted, specific fix instruction rather than a vague "try again." This breaks the cycle at its source — the agent is not guessing at a fix, it is responding to a precise diagnosis.

Leader monitors all agent activity. If any agent loops on the same task more than three times without resolution, Leader flags it as stuck, escalates the situation, re-examines the approach, and either restructures the task or brings the user into the decision. No silent infinite loops. No token explosions.

Who delivers it: Tester Agent (root cause attribution) + Leader Agent (loop detection and escalation) + Audit Agent (logging every attempt so the pattern is visible).

Why this works: Debugging loops happen when the feedback is too vague to act on. Specific diagnosis → specific fix breaks the loop. Loop detection with escalation prevents it from running silently at cost.

Problem 5 — Security Is an Afterthought

The app works. The user data is exposed.

Solution: Security Agent in mandatory dual-mode operation

Security Agent operates in two modes simultaneously and neither is optional.

In passive mode, it scans every file produced by Builder or UI/UX immediately after it is generated. A file with an exposed API key never makes it past this scan. An unauthenticated route never makes it to the next step without being flagged. This is real-time, not post-build.

In deep audit mode, triggered automatically after every full Tester run, it performs a comprehensive review of the entire codebase — authentication flows, authorization logic, data exposure, dependency vulnerabilities against known CVE databases, HTTPS enforcement, rate limiting, and cross-side mismatches between what the backend protects and what the frontend actually sends.

Critical severity findings block deployment. Not as a suggestion. As a hard gate. Deploy Agent cannot run while Security Agent has unresolved critical issues on record.

Who delivers it: Security Agent — passive mode always running, deep audit mode triggered automatically.

Why this works: Security as an afterthought produces insecure products. Security as a mandatory dual-mode gate woven into the pipeline produces products where the vulnerabilities are caught before they ship, not after.

Problem 6 — Multi-File Changes Break Things

Rename a function. Half the codebase still uses the old name.

Solution: Memory Keeper's dependency map + Merge Agent's semantic conflict detection

Memory Keeper maintains a live dependency map of the entire project — which features depend on which other features, which files reference which functions, which components consume which API endpoints. This map updates after every significant change.

When Builder changes an API contract or renames a core function, Memory Keeper's dependency map immediately identifies every other file in the project that references that contract or name. Leader receives a blast-radius report before the change executes — a plain-language list of everything that will be affected. The responsible agents are dispatched to update those files in the same operation, not as a forgotten follow-up.

Merge Agent adds semantic conflict detection on top of this — catching not just line-level conflicts but logical ones. When Builder changes an API to return a different data shape and UI/UX has already built components expecting the old shape, Merge Agent flags the semantic mismatch before it becomes a runtime error.

Who delivers it: Memory Keeper Agent (dependency map) + Merge Agent (semantic conflict detection) + Leader Agent (blast-radius coordination).

Why this works: Multi-file breakage happens because no one tracks dependencies. A live dependency map makes the invisible visible — every change's consequences are known before the change is made.

Problem 7 — No Real Collaboration Exists

Team plans exist. Real simultaneous co-editing does not.

Solution: Agent-level parallelism + Audit Agent's conflict logging

The multi-agent architecture is inherently collaborative. Builder and UI/UX work simultaneously on independent tasks. Info Collector and Memory Keeper support any agent at any time. Multiple workstreams run in parallel, managed by Leader, without the chaos that unmanaged parallel work creates in single-model tools.

Merge Agent handles collisions when parallel agents touch overlapping files. Audit Agent logs every action from every agent with full timestamps, so when conflicts arise there is always a clear record of what happened, in what order, and why — and Leader can make an informed resolution.

Human team collaboration — multiple users working on the same project simultaneously — is an area the architecture is designed to support, with Audit Agent providing the shared activity log that makes multi-user coordination possible without confusion.

Who delivers it: Leader Agent (parallel orchestration) + Merge Agent (collision handling) + Audit Agent (shared activity record).

Problem 8 — Speed Is an Illusion

Feels fast. Takes longer.

Solution: LLM routing by task complexity + parallel agent execution

The LLM routing system — managed by Leader — assigns each task to the appropriate model based on complexity. A trivial task like reformatting a component goes to the fastest, cheapest model available. An architectural decision goes to the most capable one. Tasks that can run in parallel — Builder working on the API while UI/UX builds the component — run simultaneously rather than sequentially.

This reduces actual wall-clock time per session. It also reduces cost dramatically — cheap models for cheap tasks means the expensive model is reserved for work that actually needs it.

The result is a system that is genuinely faster, not just faster-feeling — because the architecture is designed around parallel execution and appropriate resource allocation, not a single model doing everything sequentially.

Who delivers it: Leader Agent (LLM routing and parallel orchestration) + all specialist agents (focused, efficient execution within defined scope).

Why this works: Sequential single-model tools are slow because every task waits for every other task. Parallel specialist agents remove that bottleneck. Appropriate model routing removes the cost overhead that makes rushing feel necessary.

Unique Problems — Solved

Problem	Tool It Came From	How This Architecture Solves It
Token explosion	Bolt.new	Leader's loop detection caps agent iterations at three before escalating. No runaway billing. Cost tracker shows real-time spend per session.
Credits charged for AI's mistakes	Lovable	The Tester → Security → Approval Gate pipeline catches failures before they compound. Users are not billed to fix what the system broke — the system is designed not to break it.
Prompt injection exploit	Lovable	Security Agent scans all generated code for injected backdoors and anomalous patterns. Audit Agent logs every action so nothing is introduced silently.
Permanent context wall	Bolt.new	Memory Keeper stores project context externally, not inside a token window. The project never hits a context ceiling because context lives outside the model entirely.
Rollback corruption	Bolt.new	Version Control Agent takes an immutable snapshot before every significant operation. Snapshots cannot be modified or corrupted after creation. Rollback always returns to a clean state.
Preview works, production breaks	Bolt.new	Deploy Agent runs a mandatory pre-deployment checklist — env vars, CORS, build configuration — before any deployment proceeds. It also stages before production by default.
Auto-save before approval	Windsurf	Leader's confirmation gate pauses all high-impact operations for explicit user approval before any file is written. Nothing is saved before the user says yes.
Agent goes rogue on vague prompts	Windsurf	Leader clarifies ambiguous intent before dispatching any agent. If the instruction is unclear, Leader asks — it does not guess and dispatch.
API hallucination	GitHub Copilot	Info Collector fetches live, current documentation rather than relying on training data. Builder cross-references every API call against Memory Keeper's verified library list before using it.
License contamination	GitHub Copilot	Info Collector cites sources for all referenced code and documentation. Security Agent flags patterns that resemble verbatim reproductions during its code review pass.
Runtime-only context	Replit	Memory Keeper maintains a static index of the entire codebase — file relationships, function signatures, component dependencies — without needing to run the code to understand it.
Multi-agent coordination conflicts	Emergent	Merge Agent detects and resolves conflicts between agents before they reach the codebase. Audit Agent provides Leader with a full record of every agent action so conflicts have evidence, not just assertions.
$200/month pricing cliff	Emergent	LLM routing uses the cheapest appropriate model for every task. The cost of running the system is a fraction of single-top-model tools. That saving passes directly to pricing — a generous free tier and affordable paid tiers become possible.
No visual preview	Cursor, Copilot	Live preview is a core part of the IDE — not an external tool, not a separate browser window. Every frontend change renders immediately in the same interface.

The Complete Problem-to-Solution Map

#	Problem	Solution	Agent Responsible
1	Tool forgets everything	Persistent project memory	Memory Keeper
2	Code never production-ready	Mandatory test + security gates	Tester + Security Agent
3	Quality degrades over time	Standards enforcement + architectural awareness	Memory Keeper + Builder + UI/UX
4	Debugging loops burn resources	Root cause attribution + loop detection	Tester + Leader + Audit
5	Security is an afterthought	Dual-mode mandatory security scanning	Security Agent
6	Multi-file changes break things	Live dependency map + semantic conflict detection	Memory Keeper + Merge Agent
7	No real collaboration	Parallel agent execution + conflict logging	Leader + Merge + Audit
8	Speed is an illusion	LLM routing + parallel execution	Leader + all agents

A note on what solutions cannot promise: This architecture eliminates structural failures — the ones that happen because of how the system is designed. It does not eliminate all bugs forever. Code is complex. Edge cases exist. The difference is that when something goes wrong in this system, there is a clear process for catching it, attributing it, fixing it, and preventing it from recurring. That is not a guarantee of perfection. It is a guarantee of accountability.

4. How The System Works — The Full Pipeline

Knowing what the agents are is one thing. Understanding how they work together — in real time, on a real task — is another.

This section is the engine room. It traces the complete journey of a single user action from the moment they type a message to the moment they see output. Every handoff, every gate, every decision point is documented here in order. No steps are skipped or glossed over.

Figure 4 — The complete pipeline from user message to verified output, across all six phases.

Read this section and you will understand not just what this system does — but why it is designed the way it is, and what would break if any part of the pipeline were removed.

The Two Modes of Operation

The pipeline runs in one of two modes depending on where the user is in their project.

Mode 1 — Project Creation runs once, at the very beginning. It sets the foundation that every subsequent session builds on. It involves the Onboarding Agent.

Mode 2 — Active Development runs for every message after that. It is the main loop — the cycle the system runs through every time the user asks for anything.

Both modes are documented below.

Mode 1 — Project Creation Pipeline

This runs exactly once per project. When a user starts a new project, this is what happens.

Step 1 — User Arrives

The user opens the IDE and starts a new project. No agents have been activated yet. Memory Keeper is empty. The Feature Tree does not exist. The system is a blank slate.

Step 2 — Onboarding Agent Activates

Onboarding Agent is the first and only agent to speak. It asks the user a focused set of questions — the minimum needed to understand the project before anything is built.

It asks about purpose, platform, scale, and tech stack preference. It does not ask forty questions. It infers reasonable defaults from the answers it receives and tells the user what it assumed. The user confirms or corrects.

Key principle: Onboarding Agent never allows building to begin on a foundation of assumptions. Every critical decision is confirmed. Every non-critical decision is documented as an assumed default.

Step 3 — Memory Keeper Receives the Foundation

Onboarding Agent packages everything collected — project name, purpose, platform, tech stack, design preferences, scale — and sends it to Memory Keeper. This is Memory Keeper's first entry. From this point forward, Memory Keeper is the project's permanent record.

Memory Keeper stores:

Project identity and purpose
Target platform and scale
Chosen tech stack with version numbers
Design preferences and initial color/font direction
All assumed defaults with a clear flag marking them as assumptions

Step 4 — Initial Feature Tree Generated

Based on the project type, Onboarding Agent generates the starting Feature Tree — a visual map of the first set of features the project will need. For an e-commerce app, this starts with Cart, Browsing, Orders, Payment, and User Authentication as root branches. All nodes begin as dotted outlines — planned but not yet built.

The user reviews the initial tree, adds any features they already know they want, removes any that do not apply, and confirms. The tree is now the shared roadmap every agent will build toward.

Step 5 — Project Scaffold Generated

Onboarding Agent generates the initial project scaffold — folder structure, base configuration files, empty database schema skeleton, and a starter environment variable template. Builder does not start from absolute zero. The structure exists. The patterns are established. The first line of real code has a home to go to.

Step 6 — Handoff to Leader

Onboarding Agent sends a complete project brief to Leader and goes dormant. It will not activate again for this project.

Leader now holds the full picture. It has the project brief from Onboarding Agent, the complete foundation stored in Memory Keeper, the initial Feature Tree, and the scaffold ready for Builder to begin.

Active Development mode begins.

Mode 2 — Active Development Pipeline

This runs for every user message, every task, every session — for the entire life of the project.

Step 1 — User Sends a Message

The user types something. It could be a new feature request. A bug report. A design change. A question. A one-line tweak or a major new requirement. The message arrives at Leader.

Step 2 — Leader Reads and Interprets

Leader does not immediately dispatch agents. It first reads the message carefully and determines:

What is the user actually asking for? (intent, not just literal words)
Is this a small task, a medium task, or a large architectural change?
Which agents need to be involved?
Can any of those agents work in parallel, or are there dependencies between tasks?
Does this require a confirmation gate before proceeding?

Leader also checks Memory Keeper at this point — pulling the current project state to make sure its interpretation is grounded in what actually exists, not what it assumes exists.

Key principle: Leader clarifies before dispatching. If the instruction is ambiguous, Leader asks one focused question to resolve it. It does not guess and send agents off in the wrong direction.

Step 3 — Confirmation Gate (if required)

If the user's request involves a high-impact change — modifying the database schema, integrating a payment system, deleting a feature branch, changing the core authentication system, deploying to production — Leader stops before dispatching any agent.

It presents the user with a plain-language summary of exactly what is about to happen, what will change, and what cannot be easily undone. The user confirms or cancels.

No high-impact operation proceeds without explicit user approval. Not even if the user previously said "just do whatever you think is best."

Step 4 — Version Control Snapshots

Before any agent writes a single line of code, Version Control Agent takes an immutable snapshot of the current project state.

This happens automatically, in parallel with Step 3 — it does not add waiting time to the pipeline. By the time the user confirms a high-impact change, the snapshot is already done and the safety net is already in place.

Small tasks trigger lightweight diff snapshots. Large changes trigger full compressed snapshots. The distinction is made automatically based on the scope of what Leader is about to dispatch.

Step 5 — Memory Keeper Prepares Context Packages

Leader instructs Memory Keeper to prepare targeted context packages for each agent that will be activated. Memory Keeper does not dump the entire project history into every agent's context. It generates exactly what each agent needs for its specific task:

Builder receives: relevant database schema, existing API contracts, affected file list, library versions in use
UI/UX receives: design system, component map, relevant Frontend Contract from Builder, affected page list
Info Collector receives: the specific research question, the tech stack context needed to frame the answer correctly

Each agent starts its task fully informed and precisely focused. No cold starts. No irrelevant noise.

Step 6 — Agents Dispatched in Optimal Order

Leader dispatches agents. The order depends on the task.

For a new feature that requires both backend and frontend work:

Sequence	Agent	Task	Can run in parallel with
First	Info Collector	Fetch any needed documentation for the feature	Nothing yet — must inform Builder
Second	Builder	Build the backend logic and generate Frontend Contract	—
Third	UI/UX	Build the frontend based on Builder's Frontend Contract	Builder (if working on independent components)
Ongoing	Security Agent (passive)	Scan every file produced in real time	All agents — always running
Ongoing	Audit Agent	Log every action from every agent	All agents — always running

For a pure frontend change with no backend impact, Builder is skipped entirely. For a bug fix, the responsible agent is dispatched directly without going through the full sequence.

Leader decides the optimal order every time. It does not apply a fixed template to every task.

Step 7 — Agents Work With Full Context

Each active agent works within its defined scope, using its targeted context package from Memory Keeper and the instructions from Leader.

Builder writes backend code. UI/UX builds the frontend. Neither touches the other's domain. Info Collector answers research questions and caches results. All three operate simultaneously where tasks allow.

Audit Agent watches every action in real time. Security Agent scans every file produced in real time. Neither interrupts the working agents unless they detect something that cannot wait.

Step 8 — Tester Runs After Every Completion

The moment Builder or UI/UX signals task completion, Tester activates automatically. It does not wait for all agents to finish. It tests the output of each completed task as it arrives.

If a test fails:

Tester identifies the root cause and the responsible agent
Tester reports to Leader and the responsible agent simultaneously — specific file, specific line, specific diagnosis
The responsible agent receives a targeted fix instruction and corrects the issue
Tester reruns the failed test automatically to confirm the fix
Only after the test passes does the pipeline continue

Nothing accumulates. Failures are caught immediately, fixed immediately, and confirmed immediately — not discovered in a pile at the end of the session.

Step 9 — Security Agent Deep Audit

After Tester completes a full suite run and all tests pass, Security Agent switches from passive scan mode to deep audit mode.

It reviews the entire codebase for the session — not just the files changed in this task, but the full project — covering authentication flows, data exposure, dependency vulnerabilities, CORS configuration, and cross-side mismatches between backend protections and frontend behavior.

Critical findings block the pipeline. The responsible agent receives a specific fix instruction. Security Agent reruns its scan after the fix to confirm resolution. Medium and low findings are logged and reported to Leader without blocking.

Step 10 — Memory Keeper Updates

After every agent completes its task and Tester and Security Agent have cleared the output, the completing agent sends a structured update to Memory Keeper.

Memory Keeper processes each update:

New feature built → Feature Tree node changes from dotted to solid
New component created → Component Map updated
Database schema changed → Schema record updated, dependency map recalculated
New library added → Tech stack record updated, dependency vulnerability check triggered in Security Agent
API contract changed → Frontend Contract updated, all dependent components flagged for review

If an incoming update contradicts something already stored, Memory Keeper flags the contradiction to Leader before writing — never silently overwrites.

Step 11 — Feature Tree Updates

Leader updates the Feature Tree in the IDE sidebar after every completed feature. The user can see their project's structure growing in real time — new branches appearing, dotted nodes going solid, the map of what exists becoming more complete with every session.

If the user mentioned a future feature during the session — "I want to add loyalty points later" — Leader adds a dotted node for it to the tree immediately, without building it. It exists as a planned branch, visible and ready for when the user is ready to build it.

Step 12 — Leader Reports to User

Leader delivers a plain-language progress update to the user. Not raw code output. Not a wall of technical logs. A clear, human summary:

"The payment API is built and tested. The checkout page is now connected to it. One medium security finding was resolved — a missing rate limit on the payment endpoint, now fixed. Your Feature Tree has been updated — Payment is now solid. Ready for the next step."

The user always knows exactly where the project stands. Nothing is hidden. Nothing requires technical knowledge to interpret.

What Happens When Something Goes Wrong

The pipeline above describes the happy path. Real development is not always a happy path. Here is what the system does when things go wrong.

If an agent gets stuck or loops: Leader detects it within three iterations via Audit Agent's log. Leader stops the agent, re-examines the task, restructures the instruction, and re-dispatches. If it loops again, Leader brings the situation to the user with a clear description of what it tried and why it is not working.

If two agents produce conflicting output: Merge Agent steps in. It reviews both outputs and Audit Agent's record of what each agent did. It attempts automatic resolution where possible. If the conflict cannot be resolved automatically, it escalates to Leader with full evidence. Leader makes the final call.

If a security issue is critical: The entire pipeline halts at Step 9. Deploy Agent cannot activate. The responsible agent receives a specific fix. Security Agent reruns after the fix. Only after the critical issue is cleared does the pipeline resume.

If the user wants to undo something: Version Control Agent rolls back to the most recent snapshot before that change. For feature-level rollbacks, it restores just that feature's branch without touching the rest of the project. The Feature Tree updates to reflect the rollback — the reverted node returns to dotted.

If a session ends mid-task: Memory Keeper holds the current state. When the user returns, Leader reads the project brief from Memory Keeper, understands exactly where things were left, and resumes. Nothing needs to be re-explained.

The Pipeline at a Glance

USER MESSAGE
     ↓
LEADER — reads intent, checks Memory Keeper, classifies task
     ↓
CONFIRMATION GATE — high-impact changes pause for user approval
     ↓
VERSION CONTROL — snapshot taken before any code is written
     ↓
MEMORY KEEPER — prepares targeted context packages per agent
     ↓
AGENTS DISPATCHED IN OPTIMAL ORDER
  ├── Info Collector  (research, if needed)
  ├── Builder         (backend)
  └── UI/UX           (frontend, after Builder's contract)
       ↓ (all producing output simultaneously where possible)
SECURITY AGENT — passive scan on every file produced (always running)
AUDIT AGENT    — logs every action from every agent  (always running)
     ↓
TESTER — runs after every agent completion, blocks on failure
     ↓
SECURITY AGENT — deep audit after full test suite passes
     ↓
MEMORY KEEPER — updates all records, flags contradictions
     ↓
FEATURE TREE — updates in IDE sidebar
     ↓
LEADER — plain-language summary delivered to user
     ↓
USER SEES RESULT

Note on what this pipeline does not do: It does not make every task instantaneous. Specialist agents doing real work take time — less time than a single generalist model doing the same work sequentially and getting stuck, but time nonetheless. The pipeline is designed for correctness and reliability first, speed second. In practice, parallel execution means most tasks complete faster than sequential single-model tools. But the promise here is not speed for its own sake. It is work that actually finishes correctly.

5. The Core Innovations — Multi-Agent Team, LLM Routing & Living Feature Tree

Three ideas sit at the center of this architecture. Everything else in the system — the pipeline, the agents, the rules — exists to support these three ideas working together.

Each one addresses a different dimension of why current tools fail. The multi-agent team addresses who does the work. LLM routing addresses how the work is powered. The Living Feature Tree addresses how the work is understood — by the system and by the user.

None of the three is revolutionary on its own. Multi-agent systems exist. Model routing exists. Visual project diagrams exist. What is new here is how deliberately they are designed to work as one unified system — each innovation reinforcing the other two, each one meaningless without the others.

This section goes deep on all three.

Innovation 1 — The Multi-Agent Team

The Problem With One Model Doing Everything

Every current AI coding tool is, at its core, a single model with a task. The model generates code for the backend. The same model generates code for the frontend. The same model runs tests, checks security, manages context, answers questions, and tries to hold the entire project in its head simultaneously.

This is not how good software gets built. It is not how any other complex technical work gets done. A surgeon does not also manage hospital finances. A structural engineer does not also design the interior. Specialization exists because depth matters — and depth requires focus.

A single model doing everything cannot be deeply focused on any single thing. It distributes its attention across the entire task surface and does each part at the level of a competent generalist, not a specialist. The result is code that works broadly and fails specifically — the exact kind of code that looks fine in a demo and breaks in production.

The Team Structure

This architecture replaces the single model with twelve agents, each with a defined identity, defined scope, and defined rules they cannot violate.

Agent	Identity	Scope
Leader	Engineering manager	Orchestration, intent interpretation, task assignment, conflict resolution
Builder	Backend engineer	All server-side code — APIs, database, business logic, authentication
UI/UX	Frontend engineer and designer	All client-side code — components, layouts, routing, animations
Info Collector	Research librarian	Live documentation, API references, best practices — never writes code
Tester	QA engineer	Tests after every task completion — unit, integration, end-to-end
Security Agent	Security engineer	Passive scan always running, deep audit after every test suite
Memory Keeper	Project brain	Permanent structured storage of everything about the project
Deploy Agent	DevOps engineer	Getting the built app live — web, mobile, any platform
Audit Agent	CCTV observer	Logs every agent action, every communication, every change
Merge Agent	Conflict resolver	Resolves code conflicts when agents edit overlapping files
Version Control	Snapshot system	Immutable snapshots before every significant change
Onboarding Agent	Project setup	Active only at creation — lays the foundation, then goes dormant

Why Specialization Produces Better Results

Builder knows only backend. It does not think about color palettes or component layout. It thinks about data integrity, API design, query performance, and security at the server level. When Builder writes an authentication system, it brings the full attention of a backend specialist — not 20% of the attention of a generalist who also has to think about twelve other things.

UI/UX knows only frontend. It does not think about database indexes or API architecture. It thinks about visual hierarchy, interaction patterns, accessibility, component reusability, and the user's experience of every screen. When UI/UX builds a checkout flow, it brings the full attention of a frontend specialist — not split attention from a model also worrying about whether the payment endpoint is properly rate-limited.

Tester never writes production code. This is as important as what Tester does do. An agent that writes code and tests its own code cannot be trusted as a quality gate. Tester has no stake in the output passing — it only has a mandate to find out whether it does. That separation of interest is what makes the testing meaningful.

Memory Keeper never makes decisions. It stores, retrieves, and flags contradictions. It does not decide whether to use PostgreSQL or MongoDB. It does not decide whether a particular approach is architecturally sound. It holds the facts and serves them to the agents who make the decisions. A memory system that also makes decisions is a memory system that rationalizes what it remembers to support what it already decided.

The Separation of Concerns Principle

The most important rule in the team structure is scope enforcement. Builder never touches frontend files. UI/UX never touches backend files. Audit Agent monitors both and flags violations immediately.

This is not bureaucracy. It is the structural guarantee that makes the system's outputs trustworthy. When Builder produces output, the reader knows it was produced by an agent focused exclusively on backend concerns. When UI/UX produces output, the reader knows it was produced by an agent focused exclusively on frontend concerns. The provenance of every piece of code is clear.

When a bug is found, attribution is straightforward. When a security vulnerability appears, the responsible scope is identifiable. When a conflict arises, the record of what each agent did is unambiguous.

Key principle: Specialization without scope enforcement is just labeling. The agents are specialists because they are only allowed to do specialist work — not because they are told to prefer it.

The Leader Is Not a Bottleneck

A natural concern with any manager-agent architecture is that the manager becomes a bottleneck — every task waiting for Leader's approval before anything moves.

Leader is designed to avoid this. It dispatches agents in parallel wherever tasks are independent. Builder building the payment API and UI/UX building the product browsing page are independent tasks — Leader dispatches both simultaneously. Info Collector fetching Stripe documentation and Memory Keeper preparing context packages are independent — they run simultaneously.

Leader intervenes actively only when tasks are dependent on each other, when agents conflict, when a confirmation gate is triggered, or when something unexpected happens. The rest of the time, Leader monitors — it does not block.

Innovation 2 — LLM Routing by Task Complexity

The Problem With One Model for Every Task

Every current AI coding tool routes every task to the same model. Renaming a CSS variable costs the same as designing a database schema. Fixing a typo costs the same as architecting a new authentication system.

This is expensive and unnecessary. The most capable, most expensive models available are extraordinary at reasoning through complex architectural decisions. They are profound overkill for renaming a variable. Routing every task to the top model is like using a Formula 1 car to drive to the supermarket — technically it works, but the cost is wildly disproportionate to the task.

The inverse problem also exists. Cheap, fast models are excellent for simple, well-defined tasks. But routing a complex multi-file refactor to a cheap model to save cost produces degraded output that costs more to fix than the saving was worth.

How LLM Routing Works

Leader classifies every task by complexity before dispatching it. The classification determines which model powers the agent for that specific task.

Task Complexity	Examples	Model Tier	Why
Trivial	Rename a variable, reformat a component, fix a typo, change a color	Fastest / cheapest model	No reasoning required — pattern matching is sufficient
Simple	Write a UI button component, add a new API route, write a basic unit test	Fast / low-cost model	Defined task with clear output — moderate reasoning sufficient
Medium	Build a user authentication flow, design a product search feature, write integration tests	Mid-tier model	Requires coherent multi-step reasoning within a defined scope
Complex	Design a database schema for a new feature, architect a multi-service integration, resolve a cross-system conflict	Most capable model	Requires deep reasoning, broad project awareness, and architectural judgment
Critical	Full security audit, resolving a multi-agent conflict with significant impact, major architectural refactor	Most capable model + full project context	The highest-stakes decisions in the project deserve the highest-capability resource

What This Means for Cost and Quality

The financial impact is significant. In a typical development session, the majority of tasks by volume are trivial or simple. Formatting, small edits, minor additions, test re-runs. These tasks can run on the cheapest available models without any quality loss. Only a small fraction of tasks require the most capable — and most expensive — model.

By routing correctly, this system runs the vast majority of work on cheap models and reserves expensive models for the work that actually needs them. Estimated cost reduction compared to single-top-model tools: 60–80% per session for a typical project.

That cost reduction is not passed to profit margins. It is passed to pricing — enabling a genuinely useful free tier and a paid tier that is affordable for indie builders, not just enterprise teams.

Key principle: The right model for every task is not the most capable model. It is the least capable model that can complete the task correctly. This is not cost-cutting — it is precision.

What This Does Not Mean

LLM routing does not mean the system makes irrecoverable decisions about task complexity. If a task is classified as simple and the agent produces output that fails Tester's checks, the pipeline escalates — the task is reclassified, a more capable model is dispatched, and the work is redone. The routing system learns from failures within a session.

It also does not mean cheap models handle security-sensitive decisions. Security Agent always runs on a capable model regardless of the apparent simplicity of what it is reviewing. A file with one line of code can contain a critical vulnerability. Security review is never classified as trivial.

Innovation 3 — The Living Feature Tree

The Problem With Invisible Projects

Ask any builder who has used a vibe coding tool for a month what their project actually contains. Most cannot answer precisely. They know the broad strokes — it is an e-commerce app, it has a cart, there is a payment system. But which version of the payment system? Was the wish-list feature ever actually built or just discussed? Does the admin panel exist yet? Is the email notification system wired to the real backend or is it still using mock data?

The project exists entirely in conversation history and in code files. Neither is designed for human comprehension of project state. Conversation history is a linear log of instructions and responses — not a map of what was built. Code files are the implementation — not a summary of what they collectively represent.

The result is that builders lose track of their own projects. They ask the AI to build something that already exists. They forget features that were half-built. They make decisions that contradict decisions made three sessions ago because they cannot see the full picture.

What the Living Feature Tree Is

The Living Feature Tree is a persistent, real-time visual diagram that lives in the IDE sidebar alongside the code. It is not a static diagram created once at the start of the project. It is a living document — updated automatically by the system after every completed action, reflecting the exact current state of the project at all times.

Every feature is a node. Every sub-feature is a branch off its parent. Every dependency between features is a visible connection. The entire project is always visible as a single coherent map.

Figure 5 — The Living Feature Tree as it appears during a real build session, showing node states in real time.

Node States — What the Tree Tells You at a Glance

Every node in the tree carries a visual state that communicates its current status without requiring the user to read anything:

State	Visual	Meaning
Planned	Dotted outline	The feature has been discussed or requested but not yet built
In Progress	Pulsing outline	An agent is actively working on this feature right now
Built	Solid outline	The feature is built and tests are passing
Broken	Red outline	The feature exists but Tester has flagged an unresolved failure
Security Issue	Orange outline	Security Agent has flagged an unresolved issue in this feature
Deprecated	Crossed out	The feature was removed or replaced

A user can look at their Feature Tree at any moment and know the health of their entire project. No reading required. No digging through conversation history. No opening files. The tree is the project.

How the Tree Updates

The tree does not require manual maintenance. It updates itself.

When a feature is built: The node transitions from Planned (dotted) to Built (solid) automatically when Builder and UI/UX complete their tasks for that feature and Tester confirms all tests pass.

When a feature breaks: The node transitions to Broken (red) automatically when Tester flags a failure on that feature. It returns to Built (solid) automatically when the failure is resolved and tests pass again.

When the user mentions a future feature: Leader adds a dotted Planned node to the tree immediately — even if the user said it casually in passing. "I want to add a referral system at some point" creates a Planned node for Referral System. The user does not have to remember they said it. The tree remembers.

When a new feature branches off an existing one: The tree adds a child node automatically, connected to its parent. Cash Payment and Card Payment are child nodes of Payment. They inherit the Payment node's position in the tree. Their relationship to each other and to their parent is visible at a glance.

When a feature is removed: The node is crossed out rather than deleted. The history of what was built and then removed is visible. Nothing disappears silently from the project record.

The Dependency Layer

Below the visual feature map, the tree maintains an invisible dependency layer — a structured record of which features depend on which other features and which code files implement each feature.

This dependency layer is what powers Memory Keeper's blast-radius calculations. When the user asks to change the database schema for the Orders feature, Memory Keeper queries the dependency layer and returns: "Orders is depended on by Payment, Admin Panel, and Email Notifications. These three features will be affected by this schema change." Leader presents this to the user before any change is made.

The dependency layer also powers feature-level rollback in Version Control. Rolling back the Payment feature is possible precisely because the tree knows exactly which files implement Payment and which features depend on it. The rollback is targeted — it removes Payment's implementation without disturbing Cart, Orders, or Browsing, because the dependency layer confirms those features do not depend on Payment's specific implementation files.

Why the Feature Tree Is the Viral Moment

Every other innovation in this architecture is invisible to someone watching a demo. LLM routing happens under the hood. Agent specialization produces better output but that difference takes time to notice. The pipeline runs automatically.

The Feature Tree is what you see. When a user describes an e-commerce app and the tree builds itself in real time — branches appearing, nodes going solid, new sub-branches growing as features are refined — it is immediately, viscerally clear that something fundamentally different is happening. The project is not just being built. It is being understood.

This is the sixty-second demo clip that travels. A builder describing their app while the tree grows alongside it. A new feature request adding a dotted branch instantly. A completed task turning a node from dotted to solid. The whole project visible as a living map rather than buried in chat history.

Key principle: The Feature Tree is not a reporting feature added on top of the system. It is the system's understanding of the project made visible. It exists because Memory Keeper tracks everything and Leader updates the tree after every action. Remove any part of that and the tree goes dark. Its accuracy is the proof that the underlying architecture works.

How the Three Innovations Work Together

These three innovations are designed as a single system, not three independent features.

The multi-agent team produces high-quality, specialized output — but only because Memory Keeper gives each agent the context it needs to work with precision. Memory Keeper is only useful because the Living Feature Tree surfaces its understanding to the user in a form they can see and verify. The Feature Tree only stays accurate because Leader updates it after every action — and Leader only knows what to update because the agents report completions back through the pipeline. And the pipeline only runs efficiently because LLM routing ensures the right computational resource is applied to the right task at every step.

Take any one of the three away and the other two degrade:

Without the multi-agent team, LLM routing is just a cost-saving trick applied to a single generalist model. The routing is smart but the output is not specialized.
Without LLM routing, the multi-agent team runs every task on the same model regardless of complexity. Correct, but expensive — and the pricing advantage disappears.
Without the Living Feature Tree, the system builds correctly and efficiently but the user cannot see what it built. The project remains invisible. The accumulated understanding in Memory Keeper stays buried.

Together, the three innovations produce a system where the right agents do the right work at the right cost, the project's full state is always known and always visible, and the user is never lost in their own codebase.

6. All 12 Agents — Roles, Facilities & Rules

Figure 6 — The full 12-agent team structure. Security Agent and Audit Agent run continuously at all times.

Section 5 explained what the multi-agent team is and why specialization produces better results than a single generalist model. This section documents each agent in full — what it does, what special capabilities it has, and the rules it operates by without exception.

Every agent entry follows the same structure:

Role — one sentence defining the agent's identity and purpose
Core Facilities — the standard capabilities every instance of this agent has
Special Facilities — the capabilities that make this agent's role distinctive
Rules — the non-negotiable constraints that govern this agent's behavior at all times

Rules are not suggestions. They are the boundaries that make the system trustworthy. An agent that breaks its rules is an agent that cannot be relied on — and an unreliable agent in a multi-agent system is more dangerous than no agent at all, because its failures are harder to attribute and harder to correct.

Agent 01 — 👑 Leader

Role: Engineering manager and sole orchestrator. Leader understands user intent, assigns work, monitors every agent, resolves conflicts, and is the only agent the user directly communicates with.

Core Facilities:

Receives and interprets every user message
Breaks user intent into specific, assignable tasks with clear success criteria
Assigns tasks to appropriate agents with full context
Monitors all agent activity and tracks task completion status
Relays plain-language progress updates to the user after every significant step

Special Facilities:

LLM routing — classifies every task by complexity before dispatching, assigning the appropriate model tier to each agent for each specific task
Confirmation gate — pauses before any high-impact operation and presents a plain-language summary to the user for explicit approval before any agent proceeds
Conflict resolution — when two agents produce contradictory outputs, Leader reviews Audit Agent's evidence log, makes a binding decision, and instructs the relevant agent to revise
Parallel orchestration — identifies which tasks are independent and dispatches multiple agents simultaneously rather than sequentially
Health monitoring — detects stuck agents within three iterations and escalates, restructures, or brings the situation to the user

Rules:

Leader never writes production code under any circumstances
Leader never bypasses the confirmation gate for high-impact changes — even if the user asks to skip it, Leader states the risk before proceeding
Leader is the final decision maker when agents conflict — no deadlocks permitted
Every task assignment must include explicit success criteria — an agent must know what done looks like before it starts
Progress updates to the user must be in plain language — no technical jargon unless the user has indicated they prefer it
Leader never assigns a task without first confirming Memory Keeper has prepared the agent's context package
Leader updates the Feature Tree after every completed feature — not at the end of the session, after every feature

Agent 02 — 🏗️ Builder

Role: Backend engineer. Builder owns all server-side logic, APIs, database design, and third-party service integrations — nothing else.

Core Facilities:

Writes all server-side code — APIs, database queries, business logic, authentication flows
Creates and manages database schemas
Writes backend unit tests for its own output
Manages environment configuration and .env file structure
Handles third-party service integrations — payments, emails, file storage, external APIs

Special Facilities:

Frontend Contract generation — after every backend feature, Builder produces a structured document for UI/UX specifying exactly what API endpoints exist, what inputs they accept, what outputs they return, and what data shapes UI/UX will receive
Architectural impact analysis — before modifying any core system, Builder predicts which other parts of the project will be affected and reports the blast radius to Leader before touching anything
API mock generation — produces realistic mock data for UI/UX to use while the real backend is being built, enabling both agents to work simultaneously without waiting on each other

Rules:

Builder never touches frontend files under any circumstances — Audit Agent flags violations immediately
Builder never starts a task without first reading its context package from Memory Keeper
Builder always generates a Frontend Contract before signaling UI/UX to begin work on a related feature
Any task requiring a new library must be cross-checked against Memory Keeper's existing library list before installation
All database schema changes must be flagged to Version Control Agent for a snapshot before execution
Sensitive credentials are never stored in code — environment variables only, always
Builder sends a structured completion report to Leader after every task — what was built, what changed, what UI/UX needs to know

Agent 03 — 🎨 UI/UX

Role: Frontend engineer and visual designer. UI/UX owns everything the user sees and interacts with — nothing behind the API boundary.

Core Facilities:

Builds all frontend components, pages, and layouts
Implements responsive design across all screen sizes and device types
Handles client-side state management and routing
Connects frontend to backend APIs as specified in Builder's Frontend Contract
Implements animations, transitions, and micro-interactions

Special Facilities:

Design system enforcement — reads and enforces the project's design system from Memory Keeper before every task. If no design system exists yet, creates one and stores it in Memory Keeper before building anything else
Component Map maintenance — after every session, generates an updated list of every UI component built, its props, its states, and its dependencies, and sends it to Memory Keeper
Accessibility auditing — checks every component it builds for ARIA labels, keyboard navigation support, and color contrast ratios before marking the task complete
Frontend Contract clarification — if Builder's Frontend Contract is ambiguous, UI/UX requests clarification before writing a single line, rather than guessing

Rules:

UI/UX never touches backend files under any circumstances — Audit Agent flags violations immediately
UI/UX never starts building a feature until Builder's Frontend Contract exists for that feature
UI/UX always follows the design system in Memory Keeper — new visual patterns are not introduced without first updating the design system record
Every component is built mobile-responsive by default — desktop scaling is secondary
API mismatches — what Builder promised versus what the API actually returns — are reported to Leader, not raised directly with Builder
UI/UX sends an updated Component Map to Memory Keeper after every completed feature
Data that should come from an API is never hardcoded — real endpoints or Builder's mock data only

Agent 04 — 🔍 Info Collector

Role: Research librarian. Info Collector searches the internet for live documentation, API references, and best practices on demand. It never writes code and never modifies project files.

Core Facilities:

Searches the internet for technical documentation, tutorials, and API references
Fetches and summarizes content from specific URLs
Finds library changelogs, version compatibility information, and migration guides
Answers technical questions that require up-to-date information beyond model training data

Special Facilities:

Targeted research — receives a structured query from an agent and returns a structured, concise answer built for that agent's specific context — not a raw document dump
Research Cache — stores all fetched documentation in Memory Keeper indexed by topic. Repeated queries return from cache without a new search, saving time and cost
Proactive update monitoring — flags to Leader when a library the project actively uses has released a major update or security patch, without being asked

Rules:

Info Collector never writes or modifies any project code or files
Info Collector only activates when called by another agent or Leader — it never self-initiates
Every piece of information returned must include a source citation — no unsourced claims
When search results contain conflicting information from multiple sources, Info Collector presents both sides with sources and flags the conflict — it never resolves the conflict silently
All research results are stored in Memory Keeper's Research Cache after every fetch
If a piece of information cannot be found, Info Collector says so explicitly — it never fabricates

Agent 05 — 🧪 Tester

Role: QA engineer. Tester runs after every agent task completion — not once at the end of the project. It finds failures, attributes them, and blocks progress until they are resolved.

Core Facilities:

Writes and runs unit tests for individual functions and components
Writes and runs integration tests for API endpoints
Runs end-to-end tests simulating real user flows through the application
Generates a structured test report after every run

Special Facilities:

Continuous mode — triggers automatically after every Builder or UI/UX task completion, without waiting for a manual prompt
Root cause attribution — when a test fails, Tester analyzes the failure, traces it to the specific agent, file, and line responsible, and reports with a precise diagnosis — not just an error message
Regression detection — maintains a record of all previously passing tests and flags immediately if a new change breaks something that was working before
Retest confirmation — after the responsible agent submits a fix, Tester reruns the failed test automatically to confirm resolution before informing Leader

Rules:

Tester never modifies production code — it only writes and executes test files
Tester never reports a pass without actually running the test — no assumed passes
Critical test failures block Deploy Agent from proceeding until resolved or Leader provides an explicit, documented override
Tester reports to both Leader and the responsible agent simultaneously — results are never kept from the agent that caused the failure
Tester runs after every agent task completion — not batched for end-of-session
After every complete suite run, Tester sends a test coverage report to Memory Keeper

Agent 06 — 🔒 Security Agent

Role: Security engineer. Security Agent operates in two mandatory modes simultaneously — passive scan during every build action, deep audit after every full test suite — and never makes code changes itself.

Core Facilities:

Scans for exposed API keys and secrets in any generated file
Checks for missing authentication on protected routes
Validates input sanitization against injection attack vectors
Checks CORS configuration for misconfigurations
Scans for hardcoded credentials anywhere in the codebase

Special Facilities:

Passive mode — scans every file produced by Builder or UI/UX immediately after it is generated, in real time. A file never advances in the pipeline with a critical issue undetected
Deep audit mode — after every full Tester suite run, conducts a comprehensive review covering authentication flows, authorization logic, data exposure risks, dependency CVE checks, HTTPS enforcement, rate limiting, and cross-side frontend/backend mismatches
Dependency CVE scanning — every time a new library is added to the project, immediately checks it against known vulnerability databases
Pattern escalation — if the same vulnerability type appears in more than one place, escalates to Leader for an architectural fix rather than patching the same flaw twice

Rules:

Critical severity findings block Deploy Agent — deployment cannot proceed until resolved or Leader provides an explicit, documented override with stated justification
Security Agent never modifies code directly — it reports findings and suggests fixes only
Passive scan runs on every file output from Builder or UI/UX — no exceptions, no skipped files
Deep audit runs every time Tester completes a full suite — not on a schedule, on a trigger
If the same vulnerability pattern appears twice, it must be escalated to Leader — never patched twice silently
Dependency CVE scan runs every time a new library is added — not at end of session
All security reports are stored permanently in Memory Keeper as part of the project's audit record

Agent 07 — 🧠 Memory Keeper

Role: The project's single source of truth. Every agent reads from Memory Keeper before every task. Every agent writes to Memory Keeper after every significant action. Nothing is assumed — everything is stored.

Core Facilities:

Stores project identity — name, purpose, target platform, scale
Stores the complete tech stack with every library, framework, and version number
Stores all environment variable names — never values
Stores the current database schema, updated after every change
Stores all API contracts and Frontend Contracts generated by Builder

Special Facilities:

Context injection — generates a targeted context package for each agent before each task, containing only what that agent needs for that specific work — never the full project memory, which would overflow context windows
Contradiction detection — when an incoming update from an agent contradicts stored data, flags the contradiction to Leader before writing — never silently overwrites with conflicting information
Dependency map — maintains a live map of which features depend on which other features and which files implement each feature, enabling blast-radius calculations and targeted rollbacks
Research Cache — stores all Info Collector findings indexed by topic, preventing duplicate searches across sessions

Rules:

Memory Keeper never executes code or makes implementation decisions of any kind
Memory Keeper never gives an agent more context than that agent needs for its current task
Environment variable values are stored encrypted and surfaced only to Deploy Agent during active deployment — never to any other agent
Every agent must receive a context package from Memory Keeper before starting any task — cold starts without context are not permitted
Contradiction detection runs on every incoming update — it is never skipped for speed
Memory Keeper's stored data is the authoritative truth of the project — conflicts between agent beliefs and Memory Keeper's records are escalated to Leader, not resolved by the agent unilaterally
The change history log is immutable — no entry is ever deleted or modified after it is written

Agent 08 — 🚀 Deploy Agent

Role: DevOps engineer. Deploy Agent gets the built application live on any target platform — web or mobile — and is the only agent that interacts with external deployment infrastructure.

Core Facilities:

Deploys web applications to Vercel, Netlify, Railway, Render, and other platforms
Manages mobile application deployment pipelines for App Store and Google Play
Configures environment variables on the target deployment platform
Sets up custom domains and SSL certificates
Manages build commands and output configuration per platform

Special Facilities:

Pre-deployment checklist — before any deployment, automatically verifies: all critical tests passing, no critical security issues outstanding, all environment variable names accounted for, build command confirmed, target platform reachable
Deployment rollback — if a deployment fails or the live application immediately errors post-deployment, Deploy Agent rolls back to the last known good deployment automatically
Multi-environment management — maintains separate development, staging, and production environments. Staging always precedes production by default — direct-to-production requires explicit user confirmation
Post-deployment health check — after every deployment, runs a basic availability check on the live URL and reports status to Leader within 60 seconds

Rules:

Deploy Agent cannot proceed if Tester has unresolved critical failures — Leader override requires explicit documentation of the reason
Deploy Agent cannot proceed if Security Agent has unresolved critical findings — same override requirement
Platform credentials are always requested directly from the user — they are never retrieved from Memory Keeper or any other agent
Platform credentials are never stored anywhere in the system after the deployment session ends
Every deployment is logged with timestamp, version identifier, and pre-deployment checklist confirmation
Production deployment always follows staging by default — bypassing staging requires explicit user confirmation, not just Leader's approval

Agent 09 — 👁️ Audit Agent

Role: Observer, logger, and evidence keeper. Audit Agent is the system's always-on black box recorder — it watches every agent, logs every action, and provides Leader with irrefutable evidence when anything goes wrong.

Core Facilities:

Logs every agent action with timestamp, agent name, task, and output
Logs every communication between agents
Maintains a complete, unbroken session history
Makes all logs searchable and retrievable by Leader on request

Special Facilities:

Conflict evidence — when two agents produce conflicting outputs, reconstructs the full timestamped sequence of what each agent did, in what order, so Leader can make an informed resolution decision based on facts rather than agent assertions
Scope violation detection — detects immediately when any agent acts outside its defined domain — Builder touching frontend files, UI/UX touching backend files, any agent modifying audit logs — and alerts Leader before the violation is committed
Anomaly detection — flags to Leader when an agent loops more than three times without progress, produces output radically inconsistent with its previous behavior, or takes actions that do not match its role definition
Session replay — on request from Leader or user, reconstructs a complete replay of any past session in chronological order

Rules:

Audit Agent is always running — it cannot be paused, suspended, or bypassed by any agent, including Leader
Audit Agent never modifies any project files — it is purely observational
Audit logs are write-only during a session — no agent can edit, alter, or delete any log entry after it is written
Scope violations and anomalies are reported to Leader immediately — never batched for end-of-session review
Audit logs are stored permanently for the lifetime of the project — they are never auto-deleted
Audit Agent has read access to all agent communications and project files for logging purposes, but zero write access to any project file

Agent 10 — 🔀 Merge Agent

Role: Code conflict resolver. Merge Agent handles situations where multiple agents have modified overlapping files and their changes must be reconciled into a coherent, working result.

Core Facilities:

Detects when two or more agents have modified the same file simultaneously or in overlapping sequences
Performs automatic merging when changes affect different sections of the same file with no logical overlap
Escalates unresolvable conflicts to Leader with a clear summary of what each agent changed and why

Special Facilities:

Intent-aware merging — reads both agents' task descriptions before merging, understanding what each was trying to accomplish. Merges based on purpose, not just line-by-line diff comparison
Semantic conflict detection — detects logical conflicts that are not syntactic — where two changes do not conflict at the code level but contradict each other in meaning, such as Builder changing an API response shape that UI/UX has already built components to consume
Post-merge validation — after every merge, triggers Tester on the merged output automatically before marking the merge complete

Rules:

Merge Agent never picks one agent's version over another without consulting Leader when the conflict cannot be automatically resolved
Merge Agent always coordinates with Version Control Agent for a pre-merge snapshot before attempting any merge
Merge Agent always triggers Tester on merged output before reporting the merge complete to Leader
Every merge decision is logged to Audit Agent with full reasoning
If merged output fails Tester's checks, Merge Agent rolls back to the pre-merge snapshot and escalates to Leader — it never tries to fix a failed merge itself

Agent 11 — ↩️ Version Control

Role: Snapshot and rollback system. Version Control Agent is the project's safety net — it ensures every state of the project is recoverable, every change is reversible, and nothing is lost permanently.

Core Facilities:

Takes a project snapshot before every significant change
Enables rollback to any previous snapshot on request
Generates a clear diff showing what changed between any two snapshots
Maintains a chronological history of all project states with full metadata

Special Facilities:

Smart snapshot sizing — small changes stored as lightweight diffs to save storage. Large changes — new feature completions, major refactors, schema migrations — stored as full compressed snapshots. The distinction is made automatically based on scope
Automatic pre-operation snapshot — triggered in parallel with Leader's confirmation gate, so the snapshot is ready before the user finishes approving the operation. No waiting time added to the pipeline
Feature-level rollback — can roll back a specific feature branch without touching the rest of the project, using the dependency map from Memory Keeper to identify exactly which files belong to the target feature
Rollback impact preview — before executing any rollback, generates a plain-language report for Leader showing exactly what will be lost or reverted

Rules:

Version Control always takes a snapshot before a change — never after
Snapshots are immutable — nothing modifies a stored snapshot after creation
Retention policy: the last 50 snapshots are kept in full. Older snapshots are kept as diff-only to manage storage without losing history
The pre-operation snapshot cannot be skipped or overridden — not by Leader, not by user instruction
Every snapshot is tagged with timestamp, triggering agent, nature of change, and the Feature Tree state at that moment
Users can download any snapshot as a complete zip file at any time — this action is logged to Audit Agent

Agent 12 — 🌱 Onboarding Agent

Role: Project setup specialist. Onboarding Agent is active exactly once per project — at creation. It lays the entire foundation before a single line of production code is written, then goes permanently dormant.

Core Facilities:

Asks the user for project name, purpose, and target platform
Asks for approximate scale — personal project, startup MVP, or production-grade service
Asks for tech stack preference or recommends a stack based on project type and 2026 best practices
Asks for design preferences — visual style, color mood, reference applications

Special Facilities:

Memory Keeper population — takes all information collected from the user and structures it into Memory Keeper's format. This is the primary output of onboarding — everything else in the project depends on it
Initial Feature Tree generation — based on project type, generates a sensible starting tree with common features pre-populated as planned nodes. The user reviews, adjusts, and confirms before any building agent activates
Initial scaffold generation — creates the base folder structure, configuration files, empty schema skeleton, and environment variable template before handing off to Leader
Onboarding summary — presents the user with a complete project brief at the end of onboarding — purpose, platform, tech stack, initial features, and all assumed defaults clearly flagged. Building begins only after user confirmation

Rules:

Onboarding Agent is active only at project creation — it goes permanently dormant after handoff and is never called again for that project
Critical information — project purpose, target platform, primary user type — is always asked explicitly, never assumed
Non-critical defaults may be assumed but must always be disclosed in the onboarding summary, clearly flagged as assumptions
No building agent activates until the user has reviewed and confirmed the onboarding summary
Onboarding Agent sends all collected information to Memory Keeper before Leader is activated — Leader never starts without a complete foundation
The handoff to Leader is always a complete project brief — partial handoffs are not permitted

The Full Team at a Glance

Agent	Writes Code	Always Active	Reads Memory	Updates Memory	Can Block Deployment
Leader	❌	✅	✅	❌	✅ (via override)
Builder	✅	❌	✅	✅	❌
UI/UX	✅	❌	✅	✅	❌
Info Collector	❌	❌	✅	✅ (cache)	❌
Tester	✅ (tests)	❌	✅	✅	✅
Security Agent	❌	✅ (passive)	✅	✅	✅
Memory Keeper	❌	✅	✅	✅	❌
Deploy Agent	❌	❌	✅	❌	N/A (is deployment)
Audit Agent	❌	✅	✅	✅ (logs)	❌
Merge Agent	✅ (merges)	❌	✅	❌	❌
Version Control	❌	✅ (snapshots)	✅	❌	❌
Onboarding Agent	❌	❌ (once only)	❌	✅	❌

A note on agent rules: Rules exist because trust in a multi-agent system is structural, not behavioral. An agent that follows its rules because it is told to is less reliable than an agent whose architecture makes rule-breaking impossible. As this system is implemented, the goal is to enforce agent scope at the infrastructure level — not just at the prompt level. Scope violations should be technically impossible, not just discouraged.

7. Why This Beats The Competition

Figure 7 — Capability comparison across 8 axes. Each shape represents one tool scored out of 10.

Claiming to be better than existing tools is easy. Proving it — failure by failure, gap by gap, with specific evidence — is harder. This section does the harder thing.

Each competitor is assessed on the same terms. What the tool does well is acknowledged honestly. What it fails at is documented specifically. And for each failure, the exact part of this architecture that addresses it is named directly.

This is not a marketing comparison. It is an architectural one. The goal is not to diminish competitors — several of them are genuinely impressive within their scope. The goal is to show, concretely, where the gaps are and why this architecture fills them in ways the competition structurally cannot, without redesigning from the ground up.

How to Read This Section

Each competitor entry follows the same structure:

What they built — a fair, one-paragraph description of what the tool does and where it genuinely succeeds
Where it breaks — the specific, documented failure modes that matter most to real builders
Why this architecture handles it differently — not "we are better" but the specific structural reason the failure does not occur here

Bolt.new

What they built: Bolt is one of the most accessible entry points into AI-assisted development. It runs entirely in the browser, requires no local setup, generates full-stack applications from natural language descriptions, and produces working results fast. For a first prototype or a demo, it removes almost every barrier between an idea and running code. That is a real achievement.

Where it breaks:

Token explosion with no ceiling. When Bolt enters a debugging loop, it has no mechanism to stop itself. It tries, fails, tries differently, fails again, and charges the user for every attempt. Builders have reported spending hundreds of dollars on a single project that never resolved. There is no loop detection, no cost cap per task, and no escalation to a different approach after repeated failures.

Context death at scale. Past a certain project size, Bolt's context window fills and the tool begins forgetting what already exists. It regenerates components that were already built, introduces library conflicts with packages already installed, and makes decisions that contradict choices from three sessions ago. The project reaches a point where it cannot be recovered — not because the code is bad but because the system lost its understanding of what it built.

Preview works, production breaks. StackBlitz, Bolt's underlying sandbox, does not enforce the same CORS policies, environment variable requirements, and dependency resolution that real deployment environments do. Code that runs perfectly in Bolt's preview can fail silently the moment it reaches Vercel or Netlify. The user has no warning this will happen.

Why this architecture handles it differently:

Loop detection is built into Leader. If any agent fails the same task three times, Leader stops the cycle, re-examines the approach, and escalates — either restructuring the task or asking the user for direction. There is no runaway billing because there is no runaway loop.

Context is not stored inside a token window. Memory Keeper maintains the project's full state as a structured external record, independent of any model's context limit. The project never hits a ceiling because the ceiling does not exist.

The Deploy Agent runs a pre-deployment checklist that specifically validates CORS configuration, environment variable completeness, and platform-specific build requirements before any deployment proceeds. What works in development is verified against production requirements before it ships.

Lovable

What they built: Lovable made non-technical building genuinely accessible. Its visual editing layer, its tight Supabase integration, and its one-click deployment pipeline removed the technical barrier for a large population of builders who had never shipped a web application before. For simple CRUD applications and landing pages, it delivers real results to people who would otherwise need to hire a developer.

Where it breaks:

Billing for the tool's own failures. When Lovable breaks code during an edit — which it does regularly, particularly on anything beyond simple UI changes — the user pays credits to fix what the tool itself damaged. The billing model charges for damage repair at the same rate as productive work. This is not a pricing complaint. It is a trust failure: the tool that harmed the project charges the user for the recovery.

Prompt injection vulnerability. In April 2025, researchers documented that Lovable's AI could be manipulated through crafted prompts into generating backdoors in user code. A malicious actor could inject instructions into a project that caused the AI to silently introduce security vulnerabilities. This is a platform-level failure — not something a careful user can protect against by being thoughtful with their own prompts.

The Supabase lock-in trap. Lovable's backend is Supabase. When Supabase configuration issues arise — RLS policy conflicts, auth loop failures, realtime sync breakdowns — Lovable cannot help resolve them. The user must go directly into Supabase's dashboard with technical knowledge the tool's target audience typically does not have. The tool that removed technical barriers deposits its users directly in front of a technical barrier the moment the integrated backend misbehaves.

Why this architecture handles it differently:

The pipeline structure — Tester running after every task, Security Agent scanning every file, Version Control snapshotting before every change — means failures are caught before they compound. A tool that does not break the project as a default behavior does not need to charge users to fix what it broke.

Passive mode Security Agent scans every file the moment it is produced. A backdoor injected through a prompt manipulation cannot survive the immediate passive scan — it is flagged before it enters the codebase. Audit Agent logs the full context of how the injection occurred, giving Leader the evidence to identify and report the attack vector.

This architecture is not locked to any backend provider. Deploy Agent supports Vercel, Netlify, Railway, Render, Supabase, and others. Memory Keeper stores configuration for any platform. If one integration breaks, the project is not stranded.

Cursor

What they built: Cursor is the most powerful tool in this comparison for experienced developers. Its deep codebase indexing, its Agent mode for multi-file changes, its integration of multiple frontier models, and its VS Code foundation make it the closest thing to a genuine AI pair programmer for professional engineers. Developers who know what they want to build and have the skill to review what gets generated find Cursor significantly faster than working without it.

Where it breaks:

Zero accessibility for non-developers. Cursor requires an existing development environment, understanding of file structure, familiarity with terminal usage, and the ability to review and understand generated code. There is no live preview, no deployment pipeline, no guided onboarding. A non-technical founder cannot use Cursor in any meaningful way. The tool serves roughly 10% of the potential builder market and provides nothing to the other 90%.

Context mismanagement is the user's responsibility. Cursor gives developers full control over what enters the context window. Full control means full responsibility. Developers regularly include irrelevant files, miss critical ones, and get poor results entirely from their own context configuration — not from any model failure. The tool is powerful precisely because it gives the user control. That same power produces consistent failure for users who do not know how to exercise it well.

No visual preview anywhere in the tool. Frontend development in Cursor requires running a separate dev server, switching to a browser, and manually correlating what the browser shows with what the code produces. There is no inline rendering, no component preview, no live reload in the same interface. For pure backend work this is irrelevant. For frontend development it is a consistent friction point across every session.

Why this architecture handles it differently:

The Onboarding Agent, the plain-language interface through Leader, and the specialist agents working invisibly behind the scenes make this system accessible to non-technical builders from the first session. A user who cannot name a file extension can describe an app in plain language and watch it get built. The technical depth is still there — Builder, Security Agent, and Tester produce professional-grade output — but it is invisible to users who do not need to see it.

Memory Keeper manages context automatically. Every agent receives exactly the context it needs for its specific task, generated by Memory Keeper, without any user configuration. The quality of context is not dependent on the user's technical understanding of what to include.

Live preview is a core part of the IDE, not an external tool. Every frontend change rendered by UI/UX appears immediately in the same interface, without leaving the tool or running a manual server command.

Windsurf

What they built: Windsurf, built by Codeium and now part of OpenAI's portfolio, targets professional development teams and enterprise environments. Its Cascade agent is one of the most capable multi-file reasoning systems available, and its codebase understanding for large, existing projects is genuinely strong. For enterprise teams adopting AI assistance across a professional development workflow, Windsurf's security posture and deep IDE integration are serious advantages.

Where it breaks:

Auto-save before approval. Cascade writes changes to the local filesystem before the user reviews them. In most cases this enables faster iteration. In cases where Cascade misunderstands an instruction and makes sweeping changes across dozens of files, the damage is done before the user can intervene. A single misinterpreted prompt can rewrite half the project before the user realizes what is happening.

Acquisition uncertainty. OpenAI acquired Windsurf in 2025. The product roadmap, pricing, and long-term independence of the platform are all unknown. Builders who invest deeply in a Windsurf-based workflow are building on a foundation whose future is controlled by a company whose strategic interests may diverge from the tool's original direction at any point.

Solo builders and non-developers are not the audience. Windsurf is built for professional development teams. Its defaults, its documentation, its UX assumptions, and its pricing are all oriented toward enterprise adoption. A solo indie builder or a non-technical founder is a second-class citizen in Windsurf's product thinking — not by accident but by design.

Why this architecture handles it differently:

Leader's confirmation gate means no agent writes a single line of code to any file until the user has explicitly approved the action for high-impact operations. Auto-save before approval is an architectural choice. The confirmation gate is an architectural choice in the opposite direction. The user is always in control of what gets written before it gets written.

This architecture is an independent tool with a documented architecture. Its design decisions are published here. Its roadmap is not controlled by a company with conflicting strategic interests.

The entire system is designed for the full spectrum of users — non-technical founders, solo indie builders, and professional developers. The Onboarding Agent, the plain-language interface, and the visual Feature Tree serve the non-technical user. Builder, Security Agent, Tester, and the full agent rulebook serve the professional developer. Both are first-class citizens.

Replit

What they built: Replit built the most genuinely all-in-one environment in this comparison. Code, run, host, collaborate, and deploy all in one browser tab. For educational use, for quick experiments, and for builders who want to avoid any local setup whatsoever, Replit removes more friction than any other tool. Its collaborative features are the most developed of any platform here.

Where it breaks:

Runtime-only context understanding. Replit Agent understands code by running it, not by statically analyzing it. This makes it genuinely poor at working with large, complex existing codebases where understanding file relationships requires reading the code, not executing it. Refactoring a large project in Replit means the Agent consistently misses cross-file dependencies it cannot discover without running the affected code paths.

Performance ceiling at scale. Replit runs everything in shared cloud containers. For personal projects and small applications this is fine. For anything with real traffic, real compute requirements, or real latency sensitivity, Replit's container infrastructure becomes a hard ceiling the platform cannot help the user break through. The tool that removed every infrastructure barrier reintroduces it at the point where infrastructure actually matters.

Educational stigma limits serious adoption. Replit's strongest association is with learning to code — it is the platform where developers wrote their first programs, where bootcamps run exercises, where beginners experiment. This association is accurate and well-earned. It also means that when Replit launches serious production capabilities, the developer community does not take them seriously. Reputation is infrastructure too.

Why this architecture handles it differently:

Memory Keeper maintains a static index of the entire codebase — file relationships, function signatures, component dependencies, API contracts — without executing any code. Understanding the project does not require running it. A refactor across fifty files is possible on day one of the project because the dependency map exists from the moment those files were created.

Deploy Agent supports deployment to real production infrastructure — Vercel, Railway, Render — with proper environment configuration, staging pipelines, and post-deployment health checks. The project is not constrained to container-based hosting when it grows beyond what containers can support.

This architecture carries no educational association. It is built for production from the first line of code.

GitHub Copilot

What they built: Copilot is the most widely distributed AI coding tool in history. Its integration into VS Code, JetBrains, and virtually every major IDE, its Microsoft backing, and its position as the default AI coding tool for GitHub's 100+ million developers give it a distribution advantage no other tool can match. For developers who want AI assistance without changing their existing workflow at all, Copilot works immediately with zero setup.

Where it breaks:

API hallucination as a baseline behavior. Copilot suggests calls to functions, methods, and APIs that do not exist — hallucinated from outdated training data or entirely fabricated. Because Copilot works as inline autocomplete, the user accepts a suggestion in the flow of typing and only discovers it references a non-existent method when the code fails to run. The passive acceptance pattern of autocomplete makes hallucination more dangerous here than in any chat-based tool.

No architectural awareness whatsoever. Copilot sees what is in the current open file and a small surrounding context. It has no understanding of the project's overall architecture, its established patterns, its naming conventions, or its dependency structure. Every suggestion is made in isolation. The result is code that is locally coherent and architecturally inconsistent — a function that works perfectly and violates every pattern established in the rest of the codebase.

Purely reactive — it never initiates. Copilot responds to what the developer types. It does not identify problems the developer has not noticed. It does not suggest architectural improvements. It does not detect security issues in the file being edited. It does not ask whether the approach being taken is the right one. It is a very capable autocomplete system. It is not an engineering collaborator.

Why this architecture handles it differently:

Info Collector fetches live, current documentation for every third-party API before Builder uses it. Builder cross-references every API call against Memory Keeper's verified library list before writing code that depends on it. There is no hallucination pathway because the source of truth is live documentation, not training data.

Memory Keeper maintains the full architectural record of the project. Every agent reads this record before every task. Builder does not suggest a new pattern when an established one exists. UI/UX does not introduce a new component style when the design system already defines one. Architectural consistency is enforced structurally, not hoped for.

Leader actively monitors, identifies blockers, surfaces security findings, and asks clarifying questions when instructions are ambiguous. The system initiates when something needs attention. It does not wait to be prompted.

Emergent

What they built: Emergent is the most direct architectural predecessor to the system described in this document. It uses multiple agents, it targets both web and mobile output, it has raised significant funding, and it represents the clearest validation that the multi-agent approach is the right direction for this category. Its existence confirms the market thesis. Its failures define the execution gap this architecture is designed to close.

Where it breaks:

Multi-agent coordination without conflict resolution. Emergent runs agents in parallel but does not have a documented, structural mechanism for resolving the conflicts that parallel agents inevitably produce. When two agents produce contradictory implementations of the same feature, the system can merge both versions — producing code that compiles but behaves unpredictably at runtime. The multi-agent architecture introduces a new class of bugs that single-model tools do not have, and Emergent does not fully solve it.

$200 per month with no meaningful middle ground. Emergent's pricing creates a hard wall between its free tier and its Pro tier. Serious indie builders — the people most likely to build the next interesting thing, the people most likely to become advocates and evangelists — cannot justify $200 per month for a tool they are still evaluating. The pricing structure optimizes for enterprise contracts and excludes the builder community that drives organic growth.

Mobile output that does not match its marketing. Emergent markets mobile app generation as a core capability. Real-world testing reveals the mobile output is significantly less mature than its web output — more bugs, more incomplete implementations, more gaps between what was described and what was built. The gap between the marketing promise and the delivered result on mobile is the largest of any capability across any tool in this comparison.

Why this architecture handles it differently:

Merge Agent exists specifically to resolve the conflict that Emergent leaves unresolved. It detects both syntactic and semantic conflicts between agent outputs. It uses intent-aware merging — understanding what each agent was trying to accomplish before deciding how to reconcile their outputs. When it cannot resolve a conflict automatically, it escalates to Leader with full evidence from Audit Agent. There is a named agent with a defined job for the exact problem Emergent leaves open.

LLM routing — using cheap models for simple tasks and reserving capable models for complex ones — dramatically reduces the cost of running the system. A tool that costs a fraction of Emergent to operate can offer a generous free tier and an affordable paid tier that does not wall off the indie builder community. Pricing follows architecture.

The agent system described in this document does not distinguish between web and mobile as separate capabilities with separate quality levels. Builder writes backend logic that is platform-agnostic. Deploy Agent handles platform-specific deployment for web, iOS, and Android. The same architecture, the same quality gates, the same testing and security pipeline applies to all output targets equally.

The Honest Summary

No existing tool has assembled all the pieces. Some are fast but forget everything. Some remember but cannot test. Some test but have no security layer. Some have multi-agent systems but no conflict resolution. Some are cheap but only for non-technical users. Some are powerful but only for professional developers.

The table below shows where each tool stands against the capabilities that matter most for building real products.

Capability	Bolt	Lovable	Cursor	Windsurf	Replit	Copilot	Emergent	This
Persistent project memory	❌	❌	✅	✅	❌	❌	Partial	✅
Continuous automated testing	❌	❌	❌	❌	❌	❌	❌	✅
Mandatory security scanning	❌	❌	❌	❌	❌	❌	❌	✅
Multi-agent specialization	❌	❌	❌	❌	❌	❌	Partial	✅
Agent conflict resolution	❌	❌	❌	❌	❌	❌	❌	✅
LLM cost routing	❌	❌	❌	❌	❌	❌	❌	✅
Living Feature Tree	❌	❌	❌	❌	❌	❌	❌	✅
Accessible to non-developers	✅	✅	❌	❌	✅	❌	✅	✅
Accessible to developers	Partial	❌	✅	✅	Partial	✅	✅	✅
Confirmation gate for big changes	❌	❌	❌	❌	❌	❌	❌	✅
Feature-level rollback	❌	❌	❌	❌	❌	❌	❌	✅
Affordable for indie builders	Partial	Partial	✅	Partial	✅	✅	❌	✅

A note on fairness: Every competitor in this section is building something real and shipping it to real users. The failures documented here are structural — they emerge from the architectural decisions each tool made, not from lack of effort or capability. Several of these tools will continue to improve. The architectural gaps described here may narrow. The purpose of this comparison is not to declare permanent victory. It is to document, specifically and honestly, why the architecture in this document was designed the way it was — and what it was designed to do that nothing currently available does.

8. Full Theoretical Example — Building An E-Commerce App From Zero

Seven sections of architecture, research, agent definitions, and competitive analysis. This section makes all of it concrete.

The best way to understand a system is to watch it work on a real problem. Not a simplified version of a problem. Not a cherry-picked scenario where everything goes smoothly. A real project, with real feature additions mid-build, real bugs caught by the pipeline, and real decisions made by the agents along the way.

The project is an e-commerce store. The builder is non-technical. The system handles everything in between.

Before You Read This Section

This example assumes you have read Sections 4, 5, and 6. If you have not, a brief orientation:

Section 4 documented the full pipeline — the sequence every user message travels through from input to output
Section 5 explained the three core innovations — multi-agent specialization, LLM routing, and the Living Feature Tree
Section 6 documented all twelve agents with their roles, facilities, and rules

Everything that happens in this example is a direct application of those sections. No new mechanisms are introduced here. The purpose of this section is to show the mechanisms working together, in sequence, on a complete project from first message to deployed product.

The Project and the Builder

Priya runs a small business selling handmade ceramic tableware. She has been taking orders through Instagram DMs for two years. She wants a proper online store — product listings, a shopping cart, payment processing, order tracking, and an admin panel to manage everything.

Priya has never written a line of code. She does not know what a database schema is. She does not have a development environment set up on her laptop. What she has is a clear picture of what she needs and the patience to describe it.

This is the target user for this system. Not the user it can handle. The user it was designed for.

Phase 1 — Project Creation: Onboarding

Priya opens the IDE for the first time and starts a new project.

The screen shows a blank canvas and an empty Feature Tree panel on the right. Memory Keeper contains nothing. No agents are active except Onboarding Agent, which wakes immediately.

Onboarding Agent speaks first and only.

It asks Priya six questions — the minimum needed to build a complete foundation:

"What are you building?" — An online shop for handmade ceramic tableware.

"Who will use it — just customers, or will you need to manage it yourself too?" — Both. Customers buy, I manage products and orders.

"Website, mobile app, or both?" — Website for now.

"How big do you expect the shop to get in the first year — a few dozen customers, a few hundred, or more?" — Maybe a few hundred. I'm not a big operation.

"Do you have a visual style in mind?" — Clean and minimal. The ceramics should be the focus. Neutral colors, lots of white space.

"Any services you know you need from the start — for example, a specific payment provider?" — I've heard Stripe is reliable. And I'll need email confirmations when someone orders.

Onboarding Agent notes everything and tells Priya what it has assumed: Next.js for the frontend, Node.js with Express for the backend, PostgreSQL for the database, Stripe for payments, Resend for transactional email, Vercel for deployment. It asks her to confirm or change anything. She confirms.

Onboarding Agent sends the complete foundation to Memory Keeper.

Memory Keeper's first entry:

Project:   Priya's Ceramics — online store for handmade ceramic tableware
Platform:  Web
Scale:     Small — up to a few hundred concurrent users
Stack:     Next.js, Node.js + Express, PostgreSQL, Stripe, Resend
Design:    Clean and minimal, neutral palette, white space, product-forward
Deploy:    Vercel, Supabase (database hosting)
Integrations required: Stripe (payments), Resend (transactional email)

Onboarding Agent generates the initial Feature Tree.

Based on project type — e-commerce — it creates the first set of planned nodes:

Priya's Ceramics
├── ○ Product Browsing
├── ○ Shopping Cart
├── ○ User Accounts
│   └── ○ Order History
├── ○ Payment
│   ├── ○ Stripe Integration
│   └── ○ Order Confirmation Email
└── ○ Admin Panel
    ├── ○ Product Management
    └── ○ Order Management

All nodes are dotted — planned but not built. Priya reviews the tree and confirms it matches what she described.

Onboarding Agent generates the initial project scaffold.

Folder structure, base configuration files, empty PostgreSQL schema skeleton, and a .env.example file listing every environment variable that will be needed — STRIPE_SECRET_KEY, RESEND_API_KEY, DATABASE_URL, NEXTAUTH_SECRET — without values.

Onboarding Agent sends a complete project brief to Leader and goes permanently dormant.

It will not activate again for this project. Leader now holds everything.

Phase 2 — First Session: Building Product Browsing

Priya types her first message.

"Let's start. I want customers to be able to browse all my products with photos and prices."

Leader reads, checks Memory Keeper, and classifies the task.

Medium complexity. Backend API for products and frontend product listing page. Two agents needed — Builder first, then UI/UX once the Frontend Contract exists. Info Collector needed for image optimization best practices in Next.js.

Leader identifies the dependency: UI/UX cannot start until Builder's Frontend Contract exists. Builder and Info Collector can start simultaneously.

Version Control takes snapshot 001.

The project scaffold state is captured before a single line of code is written. Immutable.

Memory Keeper prepares targeted context packages.

Builder receives: database schema skeleton, tech stack details, design direction note (product images are the focal point — relevant for API response shape), instruction to generate a Frontend Contract.

Info Collector receives: a research query for Next.js image optimization best practices for product photography.

Leader dispatches Builder and Info Collector simultaneously.

Info Collector returns first.

It finds Next.js documentation on the Image component, recommended settings for product photography optimization, lazy loading guidance, and aspect ratio standards for e-commerce product grids. It stores all findings in Memory Keeper's Research Cache tagged as next-image-optimization and sends a structured summary to Builder.

Builder creates the product backend.

It designs the products table:

products
  id              UUID, primary key
  name            varchar, not null
  description     text
  price           integer (stored in pence — avoids floating point errors)
  images          text[] (array of image URLs)
  category        varchar
  stock_quantity  integer
  is_active       boolean, default true
  created_at      timestamp
  updated_at      timestamp

It writes the product API: GET /api/products with filtering by category and pagination, and GET /api/products/:id for single product detail. Both include full input validation, proper error handling, and response shapes that do not leak internal database structure.

Version Control flags the schema change — snapshot 002 taken automatically.

Builder generates the Frontend Contract:

Frontend Contract — Product Browsing
─────────────────────────────────────
GET /api/products
  Query params: category?, page (default 1), limit (default 12)
  Returns: { products: Product[], total, page, totalPages }

GET /api/products/:id
  Returns: { product: Product }

Product shape:
  id, name, description,
  price (integer in pence — divide by 100 for display),
  images (string[] — images[0] is always the primary image),
  category, stockQuantity, isActive

Notes for UI/UX:
  - Price stored in pence. Display as: (price / 100).toFixed(2)
  - Use Next.js Image component (see Research Cache: next-image-optimization)
  - Default grid: 12 products per page, 3 columns desktop

Builder sends its completion report to Leader and marks the Frontend Contract ready.

Security Agent passive scan runs immediately on Builder's output.

Every file scanned:

✅ Input validation on all query parameters
✅ No exposed credentials anywhere
✅ Error responses do not leak stack traces
✅ All SQL queries parameterized — no injection risk

Zero findings. Builder's output advances.

UI/UX activates with the Frontend Contract.

No design system exists yet — UI/UX creates one first and stores it in Memory Keeper before writing a single component:

Design System — Priya's Ceramics
  Primary background: #FAFAF8 (warm off-white)
  Text primary:       #1A1A1A
  Text secondary:     #6B6B6B
  Accent:             #C4A882 (warm sand)
  Headings:           Playfair Display (serif — premium, artisanal feel)
  Body:               Inter (clean, readable)
  Spacing:            8px base grid
  Border radius:      4px (minimal, clean)

It then builds:

Product grid — 3 columns desktop, 2 tablet, 1 mobile. Each card shows the primary image, product name, and price. Generous white space between cards.
Product detail page — large primary image with thumbnail gallery, full description, price clearly displayed, stock indicator, and an Add to Cart button (non-functional until cart is built)
Category filter bar — horizontal scroll on mobile, full row on desktop
Pagination controls — previous/next with page count indicator

UI/UX sends the updated Component Map and design system to Memory Keeper.

Security Agent passive scan runs on UI/UX output.

✅ No hardcoded API endpoints — all using environment variables
✅ Image sources properly restricted in Next.js config
✅ All rendered user content properly escaped — no XSS vectors

Zero findings. UI/UX output advances.

Tester activates automatically.

It writes and runs:

Unit tests for the product API routes — valid requests, invalid query parameters, missing IDs, database error handling
Integration test for the full listing flow — request hits API, response matches shape defined in Frontend Contract, pagination works correctly
End-to-end test simulating a user loading the product page, scrolling through products, clicking one, and viewing its detail page

All 14 tests pass. Tester sends a coverage report to Memory Keeper: 81% coverage on the product module.

Security Agent deep audit runs.

One medium finding: the images array accepts any string URL — a future enhancement should validate that image URLs come from trusted domains only to prevent open redirect risks. Severity: medium. Does not block progress. Logged permanently in Memory Keeper's security audit record with a recommendation to address during the Admin Panel build when image upload is implemented.

Memory Keeper updates.

Products schema added. Frontend Contract stored. Design system record created. Component Map updated. Research Cache entry confirmed as used.

Feature Tree updates.

Priya's Ceramics
├── ✅ Product Browsing     ← dotted → solid
├── ○ Shopping Cart
├── ○ User Accounts
│   └── ○ Order History
├── ○ Payment
│   ├── ○ Stripe Integration
│   └── ○ Order Confirmation Email
└── ○ Admin Panel
    ├── ○ Product Management
    └── ○ Order Management

Leader reports to Priya.

"Product browsing is built and tested. Your shop now has a full product listing page with category filtering and pagination, and individual product pages. The visual style — clean, minimal, warm neutrals, product-forward — is set as the design foundation for everything that follows. One note from the security review: image URLs should eventually be restricted to your own hosting domain. This is not urgent and will be addressed naturally when we build the admin product management feature. Ready for the next step."

Priya reads plain language. She sees no code. She types: "Perfect. Let's do the cart."

Phase 3 — Sessions 2 and 3: Cart, User Accounts, Order History

(The Shopping Cart, User Accounts, and Order History features are built across the next two sessions using the identical pipeline: Leader classifies → Version Control snapshots → Memory Keeper packages context → Builder builds backend + Frontend Contract → UI/UX builds frontend based on contract → Security passive scan → Tester runs → Security deep audit → Memory Keeper updates → Feature Tree node goes solid → Leader reports in plain language.)

After session 3, the Feature Tree looks like this:

Priya's Ceramics
├── ✅ Product Browsing
├── ✅ Shopping Cart
├── ✅ User Accounts
│   └── ✅ Order History
├── ○ Payment
│   ├── ○ Stripe Integration
│   └── ○ Order Confirmation Email
└── ○ Admin Panel
    ├── ○ Product Management
    └── ○ Order Management

Phase 4 — Session 4: A New Feature Added Mid-Build

Priya starts session 4 intending to build the payment flow. Before she types her main request, she adds something new.

Priya types:

"Before we do payment — I want to add a gift wrapping option. Customers can tick a box at checkout and pay £3 extra for gift wrapping. Also let's do the Stripe payment now."

Two things in one message. One was in the plan. One was not.

Leader reads and identifies two distinct tasks.

Task A — Gift Wrapping: A new feature not in the original tree. Medium complexity — affects cart logic, order schema, and checkout UI. Not high-impact enough to require a confirmation gate on its own.

Task B — Stripe Payment Integration: This was already in the Feature Tree as a planned node. High impact — this is the payment system. Confirmation gate required.

Leader adds the Gift Wrapping node to the Feature Tree immediately as a planned dotted node — before any building starts — because Priya stated it as a definite requirement, not a vague future possibility:

├── ○ Payment
│   ├── ○ Stripe Integration       ← confirmation gate triggered
│   ├── ○ Order Confirmation Email
│   └── ○ Gift Wrapping            ← new planned node added instantly

Leader presents the confirmation gate for Task B only:

"Integrating Stripe will modify the checkout flow, create the orders table, and set up webhook handling for payment confirmation. Here is what will change: the checkout page gains a payment step, the database gains an orders table, and your server will receive Stripe webhook events to confirm successful payments. This affects the core transaction flow of your shop. Shall I proceed?"

Priya confirms.

Version Control takes two snapshots simultaneously.

Pre-Stripe-integration snapshot and pre-Gift-Wrapping snapshot. Both immutable. Both logged to Audit Agent.

Both nodes transition to pulsing (in progress) on the Feature Tree.

Priya can see both features being worked on simultaneously.

Leader dispatches Builder for both tasks in parallel.

Info Collector is dispatched simultaneously to fetch current Stripe Node.js SDK documentation — not training data, live documentation — to ensure no deprecated API patterns are used.

Info Collector returns with current Stripe documentation.

SDK version confirmed. Webhook signature verification approach confirmed. Payment Intent creation pattern confirmed. Stored in Research Cache tagged stripe-node-sdk-current.

Builder works on Gift Wrapping (Task A).

It adds a gift_wrapping boolean and gift_wrapping_fee integer to the orders table schema. It updates the cart total calculation API to include the gift wrapping fee when selected. It generates an updated Frontend Contract for the checkout flow that includes the gift wrapping selection step.

Builder works on Stripe Integration (Task B).

It creates the orders table:

orders
  id                UUID, primary key
  user_id           UUID, references users
  items             JSONB (snapshot of cart at time of order)
  subtotal          integer (pence)
  gift_wrapping     boolean, default false
  gift_wrapping_fee integer, default 0
  total             integer (pence)
  status            enum: pending, paid, processing, shipped, delivered
  stripe_payment_intent_id  varchar
  created_at        timestamp

It implements the Stripe Payment Intent creation endpoint, the webhook receiver with signature verification, and the order status update logic triggered by successful payment events. Frontend Contract generated for the checkout payment step.

Security Agent passive scan runs on both outputs simultaneously.

Task A: ✅ Clean.

Task B findings:

✅ Stripe webhook signature verification implemented correctly — webhook endpoint validates every incoming event against Stripe's signature header before processing
✅ No Stripe secret key referenced in any file — environment variable only
✅ Payment amounts calculated server-side — client cannot manipulate the total
⚠️ Critical finding: The webhook endpoint is missing rate limiting. A malicious actor could flood the endpoint with requests. This must be resolved before deployment.

Security Agent flags the critical finding to Leader and Builder simultaneously. The pipeline does not stop — the finding is critical for deployment but does not prevent continued building on other features. It is logged with a mandatory resolution flag.

Builder receives the targeted fix instruction.

It adds rate limiting middleware to the webhook endpoint — maximum 100 requests per minute per IP, with a 429 response for excess requests. No guessing at a fix. One precise instruction, one precise implementation.

Security Agent rescans the webhook endpoint. ✅ Rate limiting confirmed. Critical finding resolved.

UI/UX builds the checkout flow.

Following both Frontend Contracts — the gift wrapping selection and the Stripe payment step — it builds:

A gift wrapping toggle in the cart summary with a clear price indicator ("+£3.00")
A payment step in the checkout flow using Stripe Elements for card input — never touching raw card data, which goes directly to Stripe
An order confirmation page shown after successful payment
A payment pending state with a loading indicator while the webhook confirms

Tester runs on all new output.

28 tests across the new features. 27 pass immediately.

One failure: when a user adds gift wrapping, removes an item from the cart, then proceeds to checkout, the gift wrapping fee is included in the total even though the cart was modified. The cart recalculation API is not triggering a gift wrapping fee recalculation on item removal.

Tester attributes the failure to Builder. Reports to Leader and Builder simultaneously: specific endpoint, specific scenario, specific expected versus actual behavior.

Builder receives the precise fix instruction. It updates the cart recalculation endpoint to revalidate the gift wrapping fee whenever cart contents change. Tester reruns the failed test automatically. It passes.

Security Agent deep audit runs.

All previous medium findings from earlier sessions reviewed — image URL validation still logged as a pending recommendation. No new critical findings. The Stripe integration is clean.

Feature Tree updates.

Figure 8 — The Builder writes code while the Feature Tree updates in real time alongside it.

Priya's Ceramics
├── ✅ Product Browsing
├── ✅ Shopping Cart
├── ✅ User Accounts
│   └── ✅ Order History
├── ✅ Payment
│   ├── ✅ Stripe Integration       ← solid
│   ├── ○ Order Confirmation Email
│   └── ✅ Gift Wrapping            ← planned → solid in the same session
└── ○ Admin Panel
    ├── ○ Product Management
    └── ○ Order Management

Leader reports to Priya.

"Stripe payment and gift wrapping are both built and tested. Customers can now complete purchases — card payments go directly through Stripe, and gift wrapping adds £3 at checkout with a clear indicator. One issue was caught and fixed during testing: removing a cart item was not recalculating the gift wrapping fee correctly — that is now resolved. One security issue was also caught and fixed: the payment webhook needed rate limiting to prevent abuse — that is now in place. Order confirmation email is the last remaining step before the admin panel."

Phase 5 — Final Session: Admin Panel and Deployment

(Order Confirmation Email and Admin Panel built across the next session by the same pipeline. After completion the full Feature Tree is solid.)

Priya's Ceramics
├── ✅ Product Browsing
├── ✅ Shopping Cart
├── ✅ User Accounts
│   └── ✅ Order History
├── ✅ Payment
│   ├── ✅ Stripe Integration
│   ├── ✅ Order Confirmation Email
│   └── ✅ Gift Wrapping
└── ✅ Admin Panel
    ├── ✅ Product Management
    └── ✅ Order Management

Every node solid. Every test passing. Security audit clean.

Priya types: "Everything looks right. Let's go live."

Leader triggers the confirmation gate.

"Deploying to production will make your shop publicly accessible. All tests are passing and the security audit is clear. Before I proceed, I will run the pre-deployment checklist. Shall I continue?"

Priya confirms.

Version Control takes snapshot 047 — the pre-deployment snapshot.

This snapshot is tagged pre-production-v1.0. It is Priya's permanent record of exactly what shipped on launch day. It can be downloaded as a zip at any time. It can be rolled back to instantly if anything goes wrong after launch.

Deploy Agent activates and runs the pre-deployment checklist.

Check	Status
All critical tests passing	✅ Confirmed with Tester — 94 tests, all passing
No critical security findings outstanding	✅ Confirmed with Security Agent
All environment variable names accounted for	✅ All 6 in `.env.example` documented
Build command confirmed	✅ `next build`
Stripe webhook endpoint registered in Stripe dashboard	⚠️ Not yet — flagged
Resend domain DNS records verified	⚠️ Not yet — flagged

Deploy Agent pauses on two items. It reports to Leader.

Leader presents both gaps to Priya in plain language.

"Two things need to be done before your shop can go live. First, your Stripe webhook URL needs to be registered in your Stripe dashboard — this is the address Stripe will send payment confirmations to. Second, your email domain needs a DNS record added to verify it with Resend before order confirmation emails will send. Both take about five minutes each. I can walk you through both steps now."

Priya completes both steps with Leader guiding her through each one. Deploy Agent confirms both are active.

Checklist is now fully green.

Deploy Agent deploys to staging first.

The app is live on a preview URL. Deploy Agent runs a post-deployment health check: homepage loads, product listing returns data, checkout flow renders, Stripe Elements load correctly. All pass.

Deploy Agent promotes to production.

Priya's Ceramics is live. The domain resolves. Products are visible. The cart works. Stripe is active. Emails are sending.

Leader gives Priya her final report.

"Your shop is live. Customers can browse your products, add items to their cart, choose gift wrapping, pay through Stripe, and receive an order confirmation email. You can manage products and orders through your admin panel. Everything that was built has been tested — 94 tests, all passing. The deployment is logged and a full snapshot of your shop at launch is saved. If you ever need to roll back, the system can restore exactly what launched today. Congratulations on shipping."

What the System Protected Against

Across the full build, the pipeline caught or prevented the following — none of which required any action from Priya:

What Could Have Gone Wrong	What the System Did
Context loss between sessions	Memory Keeper held the full project state — Priya never re-explained the project
Security vulnerability in the payment webhook	Security Agent caught the missing rate limiting in passive scan — fixed before it was ever deployed
Cart total bug after item removal	Tester caught the gift wrapping fee miscalculation — fixed before Priya ever saw the checkout
Deploying before Stripe webhook was registered	Deploy Agent's pre-deployment checklist flagged both missing steps — neither was skipped
Gift Wrapping being mentioned and forgotten	Leader added the planned node to the Feature Tree the moment Priya mentioned it — it was tracked from that sentence forward
Expensive top-model usage on trivial tasks	LLM routing used appropriate models throughout — the checkout flow did not cost the same as renaming a CSS variable
Snapshots not existing for rollback	Version Control took 47 snapshots across the full build — every significant state recoverable
Multi-agent conflict between Builder and UI/UX output	Frontend Contract system ensured UI/UX always had a precise specification — no guessing at what the API returned

A note on what this example compresses: A real project this size involves more sessions, more iterations, more back-and-forth on design decisions, and more edge cases than this example shows. The example is structured to demonstrate every mechanism in the system — every agent, every gate, every update — clearly and in order. The compression is in the number of turns, not in the mechanisms themselves. Every agent behavior shown here applies identically to a project ten times this size. The pipeline does not work differently at scale. It applies the same logic to every task regardless of how many tasks came before it.

9. Project Status & What's Next

Figure 9 — The four-stage build sequence from Core Loop to Full Platform.

Eight sections. The problems. The solutions. The twelve agents. The full pipeline. Three core innovations. Seven competitors examined honestly. A complete project built from first message to deployed product.

This section is the shortest. It does not need to be long. It needs to be true.

What Exists Right Now

This Gist is an architecture specification. Not a waitlist. Not a beta. Not a product announcement with a launch date attached.

Everything described in the previous eight sections has been designed, reasoned through, and written down. The agent roles are defined. The rules are written. The pipeline is mapped. The competitive gaps are named precisely.

None of it is running yet.

Publishing the architecture before the code exists is a deliberate choice, not a gap. It means the design is on the public record before a single line of implementation locks it in. It means anyone reading this can challenge the reasoning, point out what is missing, or identify where the architecture will break — while there is still time to change it without rewriting half a codebase.

The architecture is the commitment. The build follows it.

The Build Sequence

Twelve agents and the full pipeline will not be built simultaneously. The sequence below prioritizes two things: getting working software into real builders' hands as early as possible, and validating the most important architectural claims before building everything on top of them.

Stage 1 — The Core Loop

The goal: prove that persistent memory and agent specialization produce better output than a single stateless model.

The Stage 1 system is small. Five components. One central claim to test.

Component	What it delivers
Leader Agent	Reads user intent, dispatches tasks, reports results in plain language
Builder Agent	Generates backend code as a focused specialist
UI/UX Agent	Generates frontend from Builder's precise specification — not from guesswork
Memory Keeper	Holds the full project context across sessions, indefinitely
Basic Feature Tree	Shows the project as a growing map — visible, not buried in chat history

Stage 1 is validated when a builder who has hit the context death spiral on Bolt or Lovable uses this system across multiple sessions on the same project and does not have to re-explain anything. If that does not happen reliably, the architecture needs to change before Stage 2 begins.

Stage 2 — Quality and Safety

The goal: make the system trustworthy, not just capable.

Code that works is not the same as code that can be relied on. Stage 2 closes that gap.

Component	What it delivers
Tester Agent	Catches failures after every task — not in a pile at the end
Security Agent	Scans every file produced, blocks critical findings from shipping
Version Control Agent	Immutable snapshots before every significant change — nothing lost permanently
Confirmation Gate	User approval required before high-impact operations — no silent rewrites

Stage 2 is validated when a security vulnerability is caught before it reaches a deployed application, and when a test failure is caught before the builder ever sees broken output. Both on the same project. Within the first week.

Stage 3 — Intelligence and Efficiency

The goal: make the system smart about cost and thorough about accountability.

Component	What it delivers
LLM Routing System	Cheap models for trivial tasks, capable models for complex ones — 60–80% cost reduction per session
Info Collector Agent	Live documentation fetched and cached — no hallucinated API calls
Merge Agent	Agent conflicts resolved structurally, with evidence, not hoped away
Audit Agent	Every action logged, every scope violation flagged, every session replayable

Stage 3 is validated when the measurable cost per session is low enough to support a free tier that a non-technical founder can use for a full project without hitting a paywall.

Stage 4 — Full Platform

The goal: every mechanism documented in this Gist, running end to end.

Component	What it delivers
Deploy Agent	One-click deployment to web and mobile with pre-deployment checklist
Onboarding Agent	Full project creation from first question to first scaffold
Full Feature Tree	All node states, the dependency layer, feature-level rollback
Real-time collaboration	Multiple builders on the same project simultaneously

Stage 4 is validated when two different people build the same application using this system — a non-technical founder who has never opened a terminal, and a professional developer who reviews every line of output — and both finish with a working, production-ready application without significant friction. Same tool. Both succeed.

How This Document Will Change

This Gist is a living document.

The architecture described here is the current best understanding of the right design. Some of it will turn out to be wrong. A decision that looks correct on paper will reveal a flaw the moment it meets real builders on real projects. When that happens, this document will be updated — visibly, with a version number, a date, and a plain-language explanation of what changed and why.

Current version: 1.0 — Initial architecture specification, March 2026

Nothing will be silently edited. Corrections are made in the open, noted clearly, and dated. The reasoning behind every change will be recorded alongside the change itself. If an agent's role is revised, the document will say what the old role was, what the new one is, and why the change was made. If a claimed advantage turns out not to hold in practice, the document will say so rather than quietly removing the claim.

This is the standard the guide this Gist was written in the spirit of applies to itself. It is the standard this document applies to itself too.

A Final Note on the Tools That Came Before

The platforms critiqued in Section 7 were built by capable teams solving a real problem. Bolt made full-stack prototyping accessible to people who had never deployed an application. Lovable gave non-technical founders a path to a working product without hiring a developer. Cursor made professional developers meaningfully faster. These are real achievements that affected real people's ability to build things.

The gaps documented here are structural, not personal. They are the gaps of a first generation of tools that had to move fast enough to learn what the problem actually required. Each gap was invisible until enough real users hit it enough times to make it undeniable.

This architecture is built on that learning. It would not be possible without the first generation having shipped, gathered users, and revealed — through months of real-world use — exactly where the design needed to go further.

The next generation of tools after this one will be built on whatever this generation gets wrong. That is not a warning. It is how the field advances. Every architecture documented this carefully becomes a public artifact that the next builder can learn from, challenge, and push further.

The only goal here is to close the gap that matters most right now. The gap between what someone can describe and what they can actually build. For every builder, regardless of technical background, without the failures that have made existing tools genuinely unreliable for anything past a prototype.

That gap is closeable. This document is the plan for closing it.

Document Version: 1.0 — Published March 2026

Status: Architecture specification — pre-build

This document is open. Share it. Challenge it. Point out what is wrong. The architecture will be better for every honest critique it receives.

— End of Document —

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
Assets/Images		Assets/Images
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Building The AI Coding Tool That Actually Works

A complete architecture for a multi-agent vibe coding platform — designed to solve every problem the current generation of tools got wrong.

Table of Contents

1. Introduction — What This Is & What It Contains

What Makes This Different

Who This Is For

What This Gist Contains

2. The Problems Every Existing Tool Has

The Two Categories

Category A — Universal Problems

Problem 1 — The Tool Forgets Everything

Problem 2 — Generated Code Is Never Production-Ready

Problem 3 — Quality Degrades With Every Iteration

Problem 4 — Debugging Loops Burn Resources and Solve Nothing

Problem 5 — Security Is an Afterthought

Problem 6 — Multi-File Changes Break Things

Problem 7 — No Real Collaboration Exists

Problem 8 — Speed Is an Illusion

Category B — Unique Problems

The Honest Summary

3. How Each Problem Is Solved

How to Read This Section

Universal Problems — Solved

Problem 1 — The Tool Forgets Everything

Problem 2 — Generated Code Is Never Production-Ready

Problem 3 — Code Quality Degrades With Every Iteration

Problem 4 — Debugging Loops Burn Resources and Solve Nothing

Problem 5 — Security Is an Afterthought

Problem 6 — Multi-File Changes Break Things

Problem 7 — No Real Collaboration Exists

Problem 8 — Speed Is an Illusion

Unique Problems — Solved

The Complete Problem-to-Solution Map

4. How The System Works — The Full Pipeline

The Two Modes of Operation

Mode 1 — Project Creation Pipeline

Mode 2 — Active Development Pipeline

What Happens When Something Goes Wrong

The Pipeline at a Glance

5. The Core Innovations — Multi-Agent Team, LLM Routing & Living Feature Tree

Innovation 1 — The Multi-Agent Team

The Problem With One Model Doing Everything

The Team Structure

Why Specialization Produces Better Results

The Separation of Concerns Principle

The Leader Is Not a Bottleneck

Innovation 2 — LLM Routing by Task Complexity

The Problem With One Model for Every Task

How LLM Routing Works

What This Means for Cost and Quality

What This Does Not Mean

Innovation 3 — The Living Feature Tree

The Problem With Invisible Projects

What the Living Feature Tree Is

Node States — What the Tree Tells You at a Glance

How the Tree Updates

The Dependency Layer

Why the Feature Tree Is the Viral Moment

How the Three Innovations Work Together

6. All 12 Agents — Roles, Facilities & Rules

Agent 01 — 👑 Leader

Agent 02 — 🏗️ Builder

Agent 03 — 🎨 UI/UX

Agent 04 — 🔍 Info Collector

Agent 05 — 🧪 Tester

Agent 06 — 🔒 Security Agent

Agent 07 — 🧠 Memory Keeper

Agent 08 — 🚀 Deploy Agent

Agent 09 — 👁️ Audit Agent

Agent 10 — 🔀 Merge Agent

Agent 11 — ↩️ Version Control

Agent 12 — 🌱 Onboarding Agent

The Full Team at a Glance

7. Why This Beats The Competition

How to Read This Section

Bolt.new

Packages