From 3f90fb53aad09d54d5a2b2119edc48605c016c50 Mon Sep 17 00:00:00 2001
From: Tobias Grosse-Puppendahl <tobias@grosse-puppendahl.de>
Date: Thu, 11 Jun 2026 10:06:05 +0200
Subject: [PATCH 1/5] docs(openspec): propose semantic process layer
 restructure

Proposal restructure-agent-platform: Connections/Builder/Agent navigation,
single project agent with Settings-based LLM config (builder env fallback),
Data Sources header tools as overlay dialogs, Improvements & Testing panel
with failing tests, and a new agent-scaffold capability (plugin filesystem,
seeded .mcp.json, zip export).

Co-authored-by: Cursor <cursoragent@cursor.com>
---
 .../restructure-agent-platform/design.md      |  79 +++
 .../restructure-agent-platform/proposal.md    |  60 +++
 .../specs/agent-scaffold/spec.md              | 110 ++++
 .../specs/connection-management-ui/spec.md    |  68 +++
 .../specs/data-browser/spec.md                |  36 ++
 .../specs/duckdb-console/spec.md              |  38 ++
 .../specs/frontend-shell/spec.md              |  98 ++++
 .../specs/home-dashboard/spec.md              |  29 ++
 .../specs/project-management/spec.md          | 181 +++++++
 .../specs/semantic-model-agent/spec.md        | 166 ++++++
 .../specs/semantic-models/spec.md             |  52 ++
 .../specs/testing-suite/spec.md               | 475 ++++++++++++++++++
 .../restructure-agent-platform/tasks.md       |  70 +++
 13 files changed, 1462 insertions(+)
 create mode 100644 openspec/changes/restructure-agent-platform/design.md
 create mode 100644 openspec/changes/restructure-agent-platform/proposal.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/connection-management-ui/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/data-browser/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/duckdb-console/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/frontend-shell/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/home-dashboard/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/project-management/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/semantic-model-agent/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/testing-suite/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/tasks.md
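
Note on D2 in design.md: the per-field builder resolution (project value first, then `AGENT_API_BASE_URL` / `AGENT_API_KEY` / `AGENT_MODEL`) could look roughly like the sketch below. This is an illustration only, not code from the repository: `resolveBuilderLlm`, `BuilderLlmOverrides`, and `pick` are hypothetical names, the API key is assumed to be decrypted before this step, and treating blank strings as "not set" is an assumption the proposal does not state.

```typescript
// Hypothetical shapes; only the field and env var names come from the proposal.
interface BuilderLlmOverrides {
  baseUrl?: string;
  apiKey?: string; // assumed to be decrypted from `encryptedApiKey` beforehand
  model?: string;
}

interface ResolvedBuilderLlm {
  baseUrl?: string;
  apiKey?: string;
  model?: string;
}

// Passed in explicitly (e.g. process.env) so the function stays pure and testable.
type Env = Record<string, string | undefined>;

// Returns the project value when it is non-blank, else the env value, else undefined.
// Treating whitespace-only strings as unset is an assumption of this sketch.
function pick(
  projectValue: string | undefined,
  envValue: string | undefined,
): string | undefined {
  const fromProject = projectValue?.trim();
  if (fromProject) return fromProject;
  const fromEnv = envValue?.trim();
  return fromEnv ? fromEnv : undefined;
}

// Resolves each field independently: a project override for one field does not
// prevent the other fields from falling back to the environment.
function resolveBuilderLlm(
  overrides: BuilderLlmOverrides | undefined,
  env: Env,
): ResolvedBuilderLlm {
  return {
    baseUrl: pick(overrides?.baseUrl, env.AGENT_API_BASE_URL),
    apiKey: pick(overrides?.apiKey, env.AGENT_API_KEY),
    model: pick(overrides?.model, env.AGENT_MODEL),
  };
}
```

With only `model` overridden on the project, the base URL and key would still come from the environment. The project agent would not go through this path, since D2 gives it no env fallback.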

diff --git a/openspec/changes/restructure-agent-platform/design.md b/openspec/changes/restructure-agent-platform/design.md
new file mode 100644
index 0000000..17494ee
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/design.md
@@ -0,0 +1,79 @@
+# Design — restructure-agent-platform
+
+## Context
+
+archmax currently presents itself as a database-semantics manager: connections are federated through DuckDB, semantic models are authored by a Deep Agents builder, and an arbitrary number of "test agents" (each with its own LLM credentials) can be exercised in a playground and batch test runs. The product vision is a **semantic process layer**: a project's output is an **agent scaffold** — a plugin-style filesystem consumed by an agent harness — and the project has exactly one agent whose quality is measured by the test suite. This change is mostly an information-architecture and configuration-model restructure; the existing LangChain Deep Agents playground/test-runner already provides the required test harness.
+
+Constraints:
+
+- Single-user system, MongoDB + YAML-files-on-disk storage, typed Hono RPC client (no raw fetch in frontend).
+- The active change `add-llm-prompt-caching` touches `agent.ts` / `playground-agent.ts`; implementation must be sequenced against it.
+- Spec conventions: settings pages use inline label+input grids; popups use `--popover` (page-grey) backgrounds; filters use ghost styling.
+
+## Goals / Non-Goals
+
+- **Goals**
+  - Navigation that mirrors the process: Connections → Builder → Agent → Testing → MCP Access → Settings.
+  - One agent per project, configured in Settings, used by playground and test runs.
+  - Per-project LLM configuration for the builder with env fallback.
+  - Formalize the project directory as an exportable, agent-authored plugin scaffold.
+  - Surface failing tests where improvement work happens (Builder panel).
+- **Non-Goals**
+  - API connections and API Models (both ship as visible-but-disabled "soon" placeholders only).
+  - A scaffold *generation pipeline* — the builder agent authors scaffold files directly with its existing filesystem tools.
+  - Hosting or executing external harnesses; the export is a downloadable artifact.
+  - Multi-agent support of any kind.
+
+## Decisions
+
+### D1 — Agent and builder LLM config live on the `Project` document
+
+Two optional subdocuments: `builderLlm { baseUrl?, encryptedApiKey?, model? }` and `agentLlm { baseUrl, encryptedApiKey, model, systemPrompt }`. API keys reuse the AES-256-GCM encryption already used by `TestAgent`/`github.encryptedToken`, with the same SSRF validation rules for base URLs. A dedicated `llm-settings` route family handles GET (masked) / PUT (re-encrypt on new key) / test-connection, rather than overloading `PUT /api/projects/:id`, so key masking and partial updates stay isolated.
+
+- *Alternative considered:* a singleton `ProjectAgent` collection — rejected; it resurrects the TestAgent shape and adds a join for no benefit in a single-user system.
+
+### D2 — Builder resolution is per-field project → env; the agent requires explicit config
+
+The builder keeps working out of the box via `AGENT_*` env vars; project values override field-by-field. The **agent** has no env fallback: it is the project's deliverable and its credentials are an explicit choice. Playground input and run-creation are blocked with a pointer to Settings → Agent until configured. This also makes the migration story honest (see D6).
+
+### D3 — Scaffold lives at the project-directory root
+
+The existing project dir (`<ARCHMAX_DATA_DIR>/projects/<projectId>/`) *is* the agent filesystem: model YAMLs and `AGENTS.md` already live there; `commands/`, `agents/`, `skills/`, `hooks/`, `scripts/`, `.mcp.json` join them. The Deep Agents `FilesystemBackend` already roots there, so the builder can author scaffold files with no new tooling. Export excludes internal entries (`.git/`, `large_tool_results/`, `uploads/`, `duckdb.db*`, temp files).
+
+- *Alternative considered:* a `scaffold/` subdirectory; rejected because it splits the agent filesystem in two (YAML models would sit outside the scaffold) and complicates the Git story, which already versions the whole project dir.
+
+### D4 — `.mcp.json` is platform-seeded with a token placeholder
+
+Seeded on project creation and refreshed on slug change: an `archmax` MCP server entry pointing at the project's MCP endpoint, with `Authorization: Bearer ${ARCHMAX_MCP_TOKEN}`. Real tokens never reach the file (it is Git-versioned and exported). The builder may edit the file to add further servers; JSON validation on write protects `hooks/hooks.json` and `.mcp.json` the same way YAML validation protects models.
+
+### D5 — Browser and Console become full-size overlay dialogs with `?tool=` deep links
+
+The sidebar loses both entries; the Data Sources header hosts Browser (icon+text), Console (icon-only), Re-initialize schemas (icon-only). The dialogs are near-viewport-size with shadow, reusing the existing page components. A `tool=browser|console` search param on `/connections` opens the corresponding dialog so old routes can redirect losslessly and the dialogs stay deep-linkable.
+
+### D6 — Migration drops all TestAgents (user decision: manual reconfiguration)
+
+A schema migration soft-deletes every `TestAgent` document and unsets `TestCase.testAgent`. `TestRun.testAgent` becomes optional; historical runs remain readable (legacy agent name shown when present). New runs snapshot the project agent's `llmModel` per run instead of referencing an agent document. Conversations: playground conversations are identified by a `playground: true` flag going forward; legacy `testAgent` references remain readable.
+
+### D7 — "Failing tests" = latest result per test case
+
+A test case is *failing* when the most recent `TestRun` embedded result for it has status `failed` or `error`. A new endpoint aggregates this (`latest-results`), powering the **Improvements & Testing** panel. No new persistent state is introduced — the registry is derived from existing `TestRun` data, so it can never drift.
+
+## Risks / Trade-offs
+
+- **Large rename surface** (routes, labels, docs, e2e selectors) → mitigated by keeping all backend route prefixes except `test-agents` stable, and adding redirects for moved frontend routes.
+- **Removing TestAgent breaks the builder's `list_test_agents` tool and prompt flow** → tool removed and prompt updated in the same change; `create_test_case` is simplified rather than left referencing dead concepts.
+- **Concurrent edit conflict with `add-llm-prompt-caching`** → no overlapping spec requirements; sequence implementation (rebase whichever lands second).
+- **Export could leak secrets** → export reuses an explicit denylist and `.mcp.json` is placeholder-only by construction; a test asserts no `encryptedApiKey`/token material is present in the bundle.
+- **Blocking playground/test-runs until the agent is configured adds first-run friction** → mitigated by prominent empty states deep-linking to Settings → Agent.
+
+## Migration Plan
+
+1. Ship schema migration `00X-drop-test-agents`: soft-delete all `TestAgent` docs, `$unset` `TestCase.testAgent`, leave `TestRun` documents untouched.
+2. Seed `.mcp.json` for existing projects lazily (on first builder-agent start or settings save) and for new projects on creation.
+3. Frontend redirects: `/testing/playground → /agent`, `/testing/agents → /settings/agent`, `/connections/data → /connections?tool=browser`, `/connections/console → /connections?tool=console`, `/data → /connections?tool=browser`.
+4. Rollback: the migration is destructive for TestAgents by user decision; rollback restores routes/UI only.
+
+## Open Questions
+
+- Should the scaffold export embed a generated `README.md` describing harness setup? (Lean yes, deferred to implementation detail of the export task.)
+- Should `uploads/` be optionally includable in the export for harnesses that want source documents? (Excluded for now.)
diff --git a/openspec/changes/restructure-agent-platform/proposal.md b/openspec/changes/restructure-agent-platform/proposal.md
new file mode 100644
index 0000000..1de437e
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/proposal.md
@@ -0,0 +1,60 @@
+# Change: Restructure archmax as a Semantic Process Layer (Agent Platform)
+
+## Why
+
+archmax is repositioning from "a tool that manages semantic descriptions of databases" to a **semantic process layer**: each project produces an **agent scaffold** — a plugin-style filesystem (skills, subagents, commands, hooks, MCP definition) plus data models — that is consumed by an agent harness. The current UI taxonomy (Data Federation / Semantic Models / Testing with multiple ad-hoc test agents) does not communicate this. The product has *one* builder (authors the scaffold) and *one* agent per project (the deliverable, exercised in the playground and by the test harness). Navigation, settings, and the testing area must be restructured around that mental model.
+
+## What Changes
+
+### Navigation & shell
+
+- Rename the **Data Federation** sidebar group to **Connections**, containing **Data Sources** and a greyed-out, inactive **APIs** entry with a "soon" tag. Browser and Console leave the sidebar.
+- Rename the **Semantic Models** main-menu item to **Builder** (route `/$projectId/models` unchanged).
+- Add a new top-level **Agent** menu item at `/$projectId/agent` hosting the playground (moved from `/$projectId/testing/playground`).
+- **Testing** group shrinks to **Test Cases** and **Test Runs** (Test Agents page removed, Playground moved).
+- **Settings** becomes a collapsible group: **General** (`/settings`), **Builder** (`/settings/builder`), **Agent** (`/settings/agent`).
+
+### Data Sources page (header tools)
+
+- The Data Sources page header gains: a **Browser** button (icon + text), a **Console** icon-only button, and an icon-only **Re-initialize schemas** button. Browser and Console open as full-width/full-height overlay dialogs with shadow; reinit executes directly (current behavior, icon-only). Old routes (`/connections/data`, `/connections/console`, legacy `/data`) redirect to `/connections?tool=browser|console`.
+
+### Builder side panel
+
+- The left panel's "Semantic Models" section becomes **Agent Scaffold** with sub-entries **Data Models** (the current model list) and **API Models** (greyed out, "soon" tag).
+- The "Chat" section is renamed **Build**.
+- "Improvement Requests" is renamed **Improvements & Testing** and additionally lists the project's currently failing test cases (latest run result `failed`/`error`), each linking to the failing run and offering a Refine-style prefill into the Build chat.
+
+### Single project agent (**BREAKING**)
+
+- Multiple test agents are no longer supported; each project has exactly **one** agent. The `TestAgent` model, its CRUD API (`/api/projects/:projectId/test-agents`), and the Test Agents page are removed.
+- The agent's configuration (OpenAI-compatible base URL, API key, model name, system prompt) moves to **Settings → Agent** with a test-connection button. **Migration: all existing TestAgent documents are dropped; the user reconfigures the agent manually in Settings.** `TestCase.testAgent` is removed; historical `TestRun` documents stay readable.
+- Playground and test runs always execute with the project agent; both are blocked with a clear pointer to Settings → Agent while it is unconfigured.
+
+### Per-project builder LLM settings
+
+- **Settings → Builder** stores per-project overrides for the builder LLM (base URL, API key, model) with a test-connection button. Resolution is per-field: project value → env (`AGENT_API_BASE_URL` / `AGENT_API_KEY` / `AGENT_MODEL`). The "agent not configured" banner becomes project-aware and points at the settings pages.
+
+### Agent scaffold (new capability)
+
+- The project directory is formalized as a plugin-style **agent filesystem** authored directly by the builder agent (no generation pipeline): `commands/`, `agents/`, `skills/<name>/SKILL.md`, `hooks/hooks.json`, `scripts/`, and `.mcp.json`, alongside the existing data-model YAML files and `AGENTS.md`.
+- `.mcp.json` is seeded and maintained by the platform, pointing at the project's MCP endpoint with an env-var token placeholder (never a real token).
+- The builder's file backend gains JSON syntax validation on write (mirroring the existing YAML validation).
+- A scaffold export endpoint (`GET /api/projects/:projectId/scaffold/export`) downloads the scaffold as a zip for use in external Deep-Agents-compatible harnesses; an Export action is available in the Agent Scaffold panel. The existing LangChain Deep Agents playground/test-runner remains the built-in test harness.
+
+### Builder agent tool changes (**BREAKING**)
+
+- `list_test_agents` tool removed; `create_test_case` loses `testAgentId`; `list_test_cases` no longer reports agent assignment.
+
+## Impact
+
+- Affected specs: `frontend-shell`, `connection-management-ui`, `data-browser`, `duckdb-console`, `testing-suite`, `semantic-model-agent`, `semantic-models`, `project-management`, `home-dashboard`, and new capability `agent-scaffold`.
+- Affected code:
+  - `apps/frontend/src/components/layout/app-sidebar.tsx` (nav restructure)
+  - `apps/frontend/src/routes/_auth/$projectId/` — `connections/*`, `models.tsx`, `testing/*`, new `agent.tsx`, `settings*` (route moves, overlay dialogs, panel restructure)
+  - `packages/core/src/models/` — remove `TestAgent.ts`, edit `Project.ts`, `TestCase.ts`, `TestRun.ts`, `Conversation.ts`
+  - `apps/api/src/routes/` — remove `test-agents.ts`; add `llm-settings.ts`, `scaffold.ts`; edit `test-cases.ts`, `test-runs.ts`, `playground.ts`, `config`
+  - `packages/core/src/services/` — `agent.ts`, `playground-agent.ts`, `test-runner.ts`, `agent-tools.ts`, `git.ts` (scaffold ignore rules), filesystem backend validation
+  - `apps/worker/src/processor.ts` (playground branching without testAgentId)
+  - Schema migration (drop TestAgents, unset `TestCase.testAgent`)
+  - `apps/docs` (navigation, testing, settings, new agent-scaffold guide)
+- Coordination: the active change `add-llm-prompt-caching` also edits `packages/core/src/services/agent.ts` and `playground-agent.ts`. No spec-requirement overlap, but implementation should be sequenced (caching first or rebase this change on it).
diff --git a/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md b/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md
new file mode 100644
index 0000000..c67e20b
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md
@@ -0,0 +1,110 @@
+## ADDED Requirements
+
+### Requirement: Agent Scaffold Filesystem Layout
+
+Each project's data directory (`<ARCHMAX_DATA_DIR>/projects/<projectId>/`) SHALL constitute the project's **agent scaffold**: a plugin-style filesystem intended for consumption by an agent harness. In addition to the existing data-model YAML files and the optional `AGENTS.md`, the scaffold SHALL support the following conventional entries:
+
+```
+<project-dir>/
+├── *.yaml               # data models (existing semantic model files)
+├── AGENTS.md            # agent instructions / memory (existing)
+├── commands/            # slash commands (.md) — legacy, prefer skills/
+├── agents/              # subagent definitions (.md)
+├── skills/
+│   └── <skill-name>/
+│       └── SKILL.md
+├── hooks/
+│   └── hooks.json       # event handlers
+├── .mcp.json            # MCP server definitions
+└── scripts/             # helper scripts
+```
+
+Scaffold files SHALL be authored **directly by the builder agent** through its existing Deep Agents filesystem tools (no separate generation pipeline). The builder's system prompt SHALL document the scaffold layout and conventions, including that `skills/` is preferred over `commands/` for new capabilities. Scaffold directories and files SHALL be included in the project's Git versioning (they are source, not build output).
+
+#### Scenario: Builder authors a skill
+
+- **WHEN** the user asks the builder to add a "monthly revenue report" capability
+- **THEN** the builder uses `write_file` to create `skills/monthly-revenue-report/SKILL.md` inside the project directory
+- **AND** the file participates in publish/Git versioning like any other project file
+
+#### Scenario: Scaffold coexists with data models
+
+- **WHEN** a project contains scaffold directories alongside `*.yaml` model files
+- **THEN** semantic-model listing and MCP tools continue to operate on the YAML files unchanged
+- **AND** the scaffold entries do not interfere with model parsing
+
+#### Scenario: System prompt documents the layout
+
+- **WHEN** the builder agent's system prompt is composed
+- **THEN** it describes the scaffold layout (`commands/`, `agents/`, `skills/<name>/SKILL.md`, `hooks/hooks.json`, `.mcp.json`, `scripts/`) and the skills-over-commands preference
+
+### Requirement: Seeded MCP Server Definition
+
+The platform SHALL seed and maintain a `.mcp.json` file at the project root containing an `archmax` MCP server entry pointing at the project's MCP endpoint (derived from the configured application base URL and the project slug). The entry SHALL reference the bearer token via an environment-variable placeholder (e.g. `${ARCHMAX_MCP_TOKEN}`); real token values MUST NOT be written to the file. The file SHALL be created on project creation, recreated if missing when the builder agent starts, and updated when the project slug changes. The builder agent MAY extend the file with additional servers; the platform SHALL preserve unknown entries when updating its own.
+
+#### Scenario: New project gets a seeded .mcp.json
+
+- **WHEN** a project with slug `ecommerce` is created
+- **THEN** `.mcp.json` exists at the project root with an `archmax` server entry whose URL targets the project's MCP endpoint for `ecommerce`
+- **AND** authorization references `${ARCHMAX_MCP_TOKEN}` rather than a literal token
+
+#### Scenario: Slug change updates the endpoint
+
+- **WHEN** the project slug is changed in settings
+- **THEN** the `archmax` entry's URL in `.mcp.json` is updated to the new slug
+- **AND** any additional user/agent-added server entries are preserved
+
+#### Scenario: No secrets in the file
+
+- **WHEN** `.mcp.json` is written or updated by the platform
+- **THEN** the file contains no literal bearer tokens, API keys, or other secret material
+
+### Requirement: JSON Syntax Validation on Write
+
+The builder agent's filesystem backend SHALL validate JSON syntax before persisting any file whose path ends in `.json` (including `.mcp.json` and `hooks/hooks.json`). When the content is not valid JSON, the `write_file` tool MUST return an error describing the syntax issue instead of writing the file. When an `edit_file` operation on a JSON file produces syntactically invalid content, the tool MUST return an error so the agent can self-correct. This mirrors the existing YAML validation for `.yaml`/`.yml` files.
+
+#### Scenario: Invalid JSON rejected
+
+- **WHEN** the builder invokes `write_file` for `hooks/hooks.json` with malformed JSON
+- **THEN** the tool returns an error describing the syntax problem
+- **AND** the file is not written to disk
+
+#### Scenario: Valid JSON written
+
+- **WHEN** the builder writes syntactically valid JSON to `.mcp.json`
+- **THEN** the file is persisted normally
+
+### Requirement: Scaffold Export API
+
+The API SHALL expose an authenticated `GET /api/projects/:projectId/scaffold/export` endpoint that streams a zip archive of the project's agent scaffold, named `<project-slug>-scaffold.zip`. The archive SHALL contain the project directory contents **excluding** internal entries: `.git/`, `large_tool_results/`, `uploads/`, `duckdb.db` and its side files (`*.wal`, `*.tmp`), and any dotfile temp artifacts. The archive MUST NOT contain secret material; `.mcp.json` is included as seeded (placeholder token only). The endpoint SHALL require admin session auth and return 404 for unknown projects.
+
+#### Scenario: Export a scaffold
+
+- **WHEN** an authenticated GET request is made to `/api/projects/:projectId/scaffold/export` for a project with models, `AGENTS.md`, and a skill
+- **THEN** the response is a zip download named `<slug>-scaffold.zip`
+- **AND** it contains the YAML models, `AGENTS.md`, `skills/`, and `.mcp.json`
+- **AND** it contains no `.git/`, `large_tool_results/`, `uploads/`, or DuckDB files
+
+#### Scenario: No secrets in the export
+
+- **WHEN** an exported archive is inspected
+- **THEN** it contains no bearer tokens, API keys, or encrypted credential material
+
+#### Scenario: Unauthenticated export rejected
+
+- **WHEN** the request lacks a valid admin session
+- **THEN** a 401 error is returned
+
+### Requirement: Scaffold Export UI
+
+The Builder side panel's **Agent Scaffold** section header SHALL provide an icon-only Export action (download icon with accessible name "Export scaffold"). Activating it SHALL download the scaffold archive via the export endpoint. While the export is being prepared the control SHALL be disabled; on failure an error toast with the server message SHALL be shown.
+
+#### Scenario: Export from the panel
+
+- **WHEN** the user clicks the Export action in the Agent Scaffold section header
+- **THEN** the browser downloads `<slug>-scaffold.zip` from the export endpoint
+
+#### Scenario: Export failure surfaces an error
+
+- **WHEN** the export endpoint responds with an error
+- **THEN** an error toast displays the server-provided message
diff --git a/openspec/changes/restructure-agent-platform/specs/connection-management-ui/spec.md b/openspec/changes/restructure-agent-platform/specs/connection-management-ui/spec.md
new file mode 100644
index 0000000..89ab178
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/connection-management-ui/spec.md
@@ -0,0 +1,68 @@
+## RENAMED Requirements
+
+- FROM: `### Requirement: Re-explore Schemas Control`
+- TO: `### Requirement: Re-initialize Schemas Control`
+
+## MODIFIED Requirements
+
+### Requirement: Re-initialize Schemas Control
+
+The Data Sources page header SHALL display an icon-only "Re-initialize schemas" button (refresh icon with a tooltip/`title` of "Re-initialize schemas") alongside the other header tools and the "New Connection" button. Activating the button SHALL trigger a project-wide refresh that invalidates the cached DuckDB instance and re-attaches every active connection so the data browser, semantic-model agent, and MCP tools observe the current upstream schema. While the refresh is in flight, the button SHALL be disabled and display a loading spinner. On success, the page SHALL show a success toast including the number of tables visible after the refresh (e.g. `Schemas refreshed — 42 tables visible`) and SHALL invalidate the cached connection list query. On failure, the page SHALL show an error toast containing the server-provided error message. When the project has no connections, the button SHALL be disabled.
+
+#### Scenario: Refresh schemas with attached connections
+
+- **WHEN** the user clicks the icon-only "Re-initialize schemas" button on a project that has at least one active connection
+- **THEN** a `POST` request is sent to `/api/projects/:projectId/connections/reinit`
+- **AND** while the request is pending the button is disabled and shows a spinner
+- **AND** on a successful response `{ ok: true, tableCount: N }` a success toast displays `Schemas refreshed — N tables visible`
+- **AND** the `["connections", projectId]` query cache is invalidated
+
+#### Scenario: Refresh fails when a connection is unreachable
+
+- **WHEN** the user clicks "Re-initialize schemas" and the server returns `{ ok: false, error: "..." }` with HTTP 400
+- **THEN** an error toast displays the server-provided error message
+- **AND** the button returns to its idle state
+
+#### Scenario: Button disabled when there are no connections
+
+- **WHEN** the project has no active connections
+- **THEN** the "Re-initialize schemas" button is rendered in a disabled state so it cannot be activated
+
+#### Scenario: Icon-only button exposes its label accessibly
+
+- **WHEN** the "Re-initialize schemas" button is rendered
+- **THEN** it shows only an icon (no text label)
+- **AND** its accessible name ("Re-initialize schemas") is available via tooltip/`title`/`aria-label`
+
+## ADDED Requirements
+
+### Requirement: Data Sources Header Tools
+
+The Data Sources page header SHALL display, alongside the "New Connection" button and the "Re-initialize schemas" control, two tool buttons: **Browser** (icon plus the text label "Browser") and **Console** (icon-only, with a tooltip/accessible name of "Console").
+
+Clicking Browser SHALL open the data browser, and clicking Console SHALL open the DuckDB federation console, each in a **full-width/full-height overlay dialog**: a modal surface sized to (near) the full viewport, with a drop shadow and a visible close control. The dialogs SHALL use the standard overlay background (`bg-popover` page-grey per the UI surface hierarchy).
+
+The page SHALL support a `tool` search parameter on `/$projectId/connections`: `?tool=browser` opens the Browser dialog and `?tool=console` opens the Console dialog on page load. Opening or closing a dialog SHALL update/clear the `tool` search param so the dialogs are deep-linkable. Closing a dialog SHALL return the user to the Data Sources page state.
+
+#### Scenario: Open the data browser from the header
+
+- **WHEN** the user clicks the "Browser" button in the Data Sources page header
+- **THEN** a full-width/full-height overlay dialog with shadow opens containing the data browser
+- **AND** the URL search params include `tool=browser`
+
+#### Scenario: Open the console from the header
+
+- **WHEN** the user clicks the icon-only Console button
+- **THEN** a full-width/full-height overlay dialog with shadow opens containing the DuckDB console
+- **AND** the URL search params include `tool=console`
+
+#### Scenario: Close a tool dialog
+
+- **WHEN** the user closes the Browser or Console dialog
+- **THEN** the dialog is dismissed and the Data Sources page is visible again
+- **AND** the `tool` search param is cleared
+
+#### Scenario: Deep link opens the dialog
+
+- **WHEN** the user navigates directly to `/$projectId/connections?tool=console`
+- **THEN** the Data Sources page renders with the Console dialog already open
diff --git a/openspec/changes/restructure-agent-platform/specs/data-browser/spec.md b/openspec/changes/restructure-agent-platform/specs/data-browser/spec.md
new file mode 100644
index 0000000..af08218
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/data-browser/spec.md
@@ -0,0 +1,36 @@
+## MODIFIED Requirements
+
+### Requirement: Data Browser Frontend Page
+
+The frontend SHALL render the data browser inside a full-width/full-height overlay dialog opened from the Data Sources page header (Browser button, or deep link `/$projectId/connections?tool=browser`). The browser SHALL display the project's attached databases and their tables in a navigable layout: the left panel SHALL list databases as expandable sections, each showing its tables; selecting a table SHALL display its data in a paginated table on the right.
+
+The previous standalone routes (`/$projectId/data` and `/$projectId/connections/data`) SHALL no longer render their own pages and SHALL redirect to `/$projectId/connections?tool=browser`.
+
+#### Scenario: View databases and tables
+
+- **WHEN** the user opens the Browser dialog from the Data Sources page
+- **THEN** the left panel lists all attached databases
+- **AND** expanding a database reveals its tables
+
+#### Scenario: Select table and view data
+
+- **WHEN** the user clicks on a table name
+- **THEN** the right panel displays the table's data in a paginated table
+- **AND** column headers show column names and types
+- **AND** pagination controls are visible at the bottom
+
+#### Scenario: Navigate between pages
+
+- **WHEN** the user clicks a pagination control (next, previous, or page number)
+- **THEN** the table data updates to show the corresponding page
+
+#### Scenario: No connections
+
+- **WHEN** the project has no active connections
+- **THEN** an empty state message is displayed indicating no databases are available
+
+#### Scenario: Legacy browser routes redirect
+
+- **WHEN** the user navigates to `/$projectId/data` or `/$projectId/connections/data`
+- **THEN** they are redirected to `/$projectId/connections?tool=browser`
+- **AND** the Browser dialog opens automatically
diff --git a/openspec/changes/restructure-agent-platform/specs/duckdb-console/spec.md b/openspec/changes/restructure-agent-platform/specs/duckdb-console/spec.md
new file mode 100644
index 0000000..c2faa46
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/duckdb-console/spec.md
@@ -0,0 +1,38 @@
+## MODIFIED Requirements
+
+### Requirement: DuckDB Console Page
+
+The frontend SHALL render the federation console inside a full-width/full-height overlay dialog opened from the Data Sources page header (icon-only Console button, or deep link `/$projectId/connections?tool=console`). The previous standalone route `/$projectId/connections/console` SHALL redirect to `/$projectId/connections?tool=console`. The Console SHALL NOT appear as a sidebar navigation item.
+
+The console SHALL present a **single SQL editor** (textarea is sufficient) and a **Run** control in the dialog header. Run SHALL route the submitted statement based on its leading keyword:
+
+- Statements beginning with `INSTALL` or `LOAD` SHALL be submitted to `POST .../duckdb-console/extensions`; on success a `toast.success` SHALL confirm that the extension was installed or loaded.
+- All other statements SHALL be submitted to `POST .../duckdb-console/query` and the results rendered in a table with column headers.
+
+The console SHALL NOT render a separate setup-commands panel or a separate extension-install control; the one editor serves both purposes. The console MAY call `GET .../duckdb-console/setup` to determine whether the project has active connections.
+
+When the project has no active connections, the console SHALL show an empty state directing the user to add connections on the Data Sources page, and the **Run** control SHALL be disabled.
+
+#### Scenario: Run query from console
+
+- **WHEN** the user enters `SELECT 1` and clicks **Run**
+- **THEN** the results table shows one row
+- **AND** no success toast is shown (the results table is sufficient feedback); errors are reported via `toast.error` with the server message
+
+#### Scenario: Install extension from the same editor
+
+- **WHEN** the user enters `INSTALL spatial FROM community` and clicks **Run**
+- **THEN** the statement is sent to the extensions endpoint
+- **AND** a `toast.success` confirms the extension was installed
+
+#### Scenario: Open console from the Data Sources header
+
+- **WHEN** the user clicks the icon-only Console button on the Data Sources page
+- **THEN** the console opens in a full-width/full-height overlay dialog with shadow
+- **AND** the URL search params include `tool=console`
+
+#### Scenario: Legacy console route redirects
+
+- **WHEN** the user navigates to `/<projectId>/connections/console`
+- **THEN** they are redirected to `/<projectId>/connections?tool=console`
+- **AND** the Console dialog opens automatically
diff --git a/openspec/changes/restructure-agent-platform/specs/frontend-shell/spec.md b/openspec/changes/restructure-agent-platform/specs/frontend-shell/spec.md
new file mode 100644
index 0000000..d0e3869
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/frontend-shell/spec.md
@@ -0,0 +1,98 @@
+## MODIFIED Requirements
+
+### Requirement: Sidebar Navigation
+
+The sidebar SHALL display navigation items below the project selector. Each item has an icon and a label. The top-level items are, in order: Home, Connections, Builder, Agent, Testing, MCP Access, and Settings. The active route is visually highlighted.
+
+The Home item SHALL be a leaf link pointing to `/$projectId` (the project dashboard). It SHALL use exact-match active detection so it is only highlighted when the user is on the dashboard, not on any sub-route.
+
+The Connections item SHALL be a collapsible group with two sub-items: **Data Sources** (`/$projectId/connections`) and **APIs**. The APIs sub-item SHALL be rendered greyed out (muted/disabled styling), SHALL display a small "soon" tag, and SHALL NOT be clickable or navigate anywhere. Browser and Console SHALL NOT appear in the sidebar (they are reachable from the Data Sources page header).
+
+The Builder item SHALL be a leaf link pointing to `/$projectId/models` (the former "Semantic Models" entry, renamed).
+
+The Agent item SHALL be a leaf link pointing to `/$projectId/agent` (the agent playground).
+
+The Testing item SHALL be a collapsible group with two sub-items: Test Cases (`/$projectId/testing/cases`) and Test Runs (`/$projectId/testing/runs`). The group expands automatically when the active route is within the testing section. Clicking the Testing label toggles the group open/closed. Test Agents and Playground SHALL NOT appear under Testing.
+
+The MCP Access item SHALL be a collapsible group with two sub-items: Tokens (`/$projectId/mcp-access`) and Log (`/$projectId/monitoring`).
+
+The Settings item SHALL be a collapsible group with three sub-items: General (`/$projectId/settings`), Builder (`/$projectId/settings/builder`), and Agent (`/$projectId/settings/agent`). The group expands automatically when the active route is within the settings section.
+
+Removed or moved routes SHALL redirect: `/$projectId/testing/playground` → `/$projectId/agent`, `/$projectId/testing/agents` → `/$projectId/settings/agent`, `/$projectId/connections/data` → `/$projectId/connections?tool=browser`, `/$projectId/connections/console` → `/$projectId/connections?tool=console`, and the legacy `/$projectId/data` → `/$projectId/connections?tool=browser`.
+
+#### Scenario: Navigate to Home
+
+- **WHEN** the user clicks the Home nav item
+- **THEN** the URL changes to `/<projectId>`
+- **AND** the Home item is highlighted as active
+
+#### Scenario: Home not active on sub-routes
+
+- **WHEN** the user is on `/<projectId>/connections`
+- **THEN** the Home item is not highlighted
+- **AND** the Connections group is highlighted instead
+
+#### Scenario: Navigate to Data Sources
+
+- **WHEN** the user clicks the Data Sources sub-item under Connections
+- **THEN** the URL changes to `/<projectId>/connections`
+- **AND** the Data Sources item is highlighted as active
+
+#### Scenario: APIs entry is inactive with soon tag
+
+- **WHEN** the user views the expanded Connections group
+- **THEN** an APIs entry is shown in greyed-out styling with a "soon" tag
+- **AND** clicking it does not navigate or change the URL
+
+#### Scenario: Navigate to Builder
+
+- **WHEN** the user clicks the Builder nav item
+- **THEN** the URL changes to `/<projectId>/models`
+- **AND** the Builder item is highlighted as active
+
+#### Scenario: Navigate to Agent
+
+- **WHEN** the user clicks the Agent nav item
+- **THEN** the URL changes to `/<projectId>/agent`
+- **AND** the Agent item is highlighted as active
+
+#### Scenario: Navigate to MCP Access
+
+- **WHEN** the user clicks the MCP Access nav item
+- **THEN** the URL changes to `/<projectId>/mcp-access`
+- **AND** the MCP Access item is highlighted as active
+
+#### Scenario: Navigate to Testing sub-item
+
+- **WHEN** the user clicks a Testing sub-item (Test Cases or Test Runs)
+- **THEN** the URL changes to the corresponding route (e.g. `/<projectId>/testing/runs`)
+- **AND** the sub-item is highlighted as active
+- **AND** the Testing group is expanded
+
+#### Scenario: Testing group auto-expands on active route
+
+- **WHEN** the user navigates to any `/<projectId>/testing/*` route
+- **THEN** the Testing group is automatically expanded
+- **AND** the matching sub-item is highlighted
+
+#### Scenario: Collapse Testing group
+
+- **WHEN** the user clicks the Testing group label while it is expanded
+- **THEN** the sub-items are hidden
+- **AND** clicking again re-expands the group
+
+#### Scenario: Navigate to Settings sub-item
+
+- **WHEN** the user clicks the Agent sub-item under Settings
+- **THEN** the URL changes to `/<projectId>/settings/agent`
+- **AND** the sub-item is highlighted and the Settings group is expanded
+
+#### Scenario: Legacy playground route redirects
+
+- **WHEN** the user navigates to `/<projectId>/testing/playground`
+- **THEN** they are redirected to `/<projectId>/agent`
+
+#### Scenario: Legacy test agents route redirects
+
+- **WHEN** the user navigates to `/<projectId>/testing/agents`
+- **THEN** they are redirected to `/<projectId>/settings/agent`
diff --git a/openspec/changes/restructure-agent-platform/specs/home-dashboard/spec.md b/openspec/changes/restructure-agent-platform/specs/home-dashboard/spec.md
new file mode 100644
index 0000000..fc967bb
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/home-dashboard/spec.md
@@ -0,0 +1,29 @@
+## MODIFIED Requirements
+
+### Requirement: Dashboard Page
+
+The system SHALL render a project-scoped dashboard at `/$projectId` showing metric cards for each major feature area. Each card SHALL display the primary count prominently and link to the corresponding detail page.
+
+The dashboard SHALL display three metric cards followed by an MCP calls chart:
+1. **Data Connections** — executed query count (14d) as the primary value, with total connection count as a sub-stat; links to `/$projectId/connections`
+2. **Data Models** — total model count as the primary value, with sub-stats for total datasets and open improvement requests; links to `/$projectId/models` (card formerly labeled "Semantic Models")
+3. **MCP Access** — total MCP call count (14d) as the primary value, with sub-stats for token count and error calls; links to `/$projectId/mcp-access`
+4. **MCP Calls Chart** — a full-width area chart showing calls and errors per day over the last 14 days, using CI colors (sage for calls, purple for errors). When there is no data, a centered empty-state message is shown instead.
+
+All three metric cards SHALL use the CI color palette for their icon styling: sage (`#8c987f`) for the icon stroke and blue (`#c2d0e4`) at reduced opacity for the icon background.
+
+#### Scenario: Dashboard with populated data
+
+- **WHEN** an authenticated user navigates to `/$projectId`
+- **THEN** three metric cards are displayed with current counts
+- **AND** each card is clickable, navigating to its detail page
+
+#### Scenario: Data Models card label
+
+- **WHEN** the dashboard renders the model metrics card
+- **THEN** the card is labeled "Data Models" and links to `/$projectId/models`
+
+#### Scenario: Dashboard loading state
+
+- **WHEN** the dashboard stats are being fetched
+- **THEN** skeleton placeholders are shown in place of the metric cards
diff --git a/openspec/changes/restructure-agent-platform/specs/project-management/spec.md b/openspec/changes/restructure-agent-platform/specs/project-management/spec.md
new file mode 100644
index 0000000..c01f68d
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/project-management/spec.md
@@ -0,0 +1,181 @@
+## MODIFIED Requirements
+
+### Requirement: Project Model
+
+The system SHALL store projects in MongoDB with the following fields: `title` (string, required), `slug` (string, required, matching `/^[a-z0-9][a-z0-9-]*[a-z0-9]$/`), `description` (string, default empty), `mcpPageSize` (number, default 50, min 10, max 200), `github` (optional subdocument: `url` (string, required — full HTTPS URL of the GitHub repository, e.g. `https://github.com/owner/repo.git`), `branch` (string, default `"main"`), `encryptedToken` (string, required — AES-256-GCM encrypted GitHub PAT using `ENCRYPTION_KEY`)), `builderLlm` (optional subdocument: `baseUrl` (string, optional), `encryptedApiKey` (string, optional — AES-256-GCM encrypted when `ENCRYPTION_KEY` is set, plaintext otherwise), `model` (string, optional)), `agentLlm` (optional subdocument: `baseUrl` (string, required), `encryptedApiKey` (string, required — same encryption rules), `model` (string, required), `systemPrompt` (string, required)), `_schemaVersion` (number, default 0), `createdAt` (Date, auto), `updatedAt` (Date, auto). Slugs SHALL be unique among non-deleted projects and auto-generated from the title on creation.
+
+The `baseUrl` fields of `builderLlm` and `agentLlm` SHALL be validated with the same SSRF rules previously applied to test agents: the scheme SHALL be `https://`, and the host SHALL NOT target a private, loopback, or link-local IP address (RFC 1918, `127.0.0.0/8`, `169.254.0.0/16`, `::1`, `fe80::/10`). As the only exception, `http://` SHALL be accepted for `localhost`/`127.0.0.1`.
+
+#### Scenario: Create a project
+
+- **WHEN** a project is created with title "Sales Analytics"
+- **THEN** a slug is generated as "sales-analytics"
+- **AND** the project is stored with default `mcpPageSize: 50`, no `github`, no `builderLlm`, and no `agentLlm`
+
+#### Scenario: Project with GitHub configured
+
+- **WHEN** a project has `github` set with `url: "https://github.com/myorg/semlayer-models.git"`, `branch: "main"`, and an encrypted PAT
+- **THEN** publish operations push to that repository
+- **AND** sync operations pull from that repository
+
+#### Scenario: Project with agent configured
+
+- **WHEN** a project's `agentLlm` is set with `baseUrl: "https://api.openai.com/v1"`, an encrypted API key, `model: "gpt-4o"`, and a system prompt
+- **THEN** the playground and test runs execute with this configuration
+
+#### Scenario: SSRF-unsafe base URL rejected
+
+- **WHEN** `builderLlm.baseUrl` or `agentLlm.baseUrl` is set to a private address (e.g. `http://169.254.169.254/`, `http://10.0.0.1/v1`)
+- **THEN** a 400 error is returned indicating the URL targets a restricted address
+
+### Requirement: Project Settings UI
+
+Project settings SHALL be presented as a settings group with three pages reachable from the sidebar Settings group: **General** (`/$projectId/settings`), **Builder** (`/$projectId/settings/builder`), and **Agent** (`/$projectId/settings/agent`).
+
+The **General** page SHALL display: a "Project Identity" card with title and slug fields, an "MCP Page Size" input (10–200, no spinner arrows), a "GitHub" card for upstream configuration, a "Publish History" card showing recent commits, and a "Danger Zone" card for project deletion.
+
+The "GitHub" card SHALL contain: a text input for the repository URL (placeholder: `https://github.com/owner/repo.git`), a password input for the Personal Access Token (masked, placeholder: `ghp_...`), a text input for the branch name (default: `main`), a "Save" button to persist the configuration, a "Remove" button to clear the GitHub configuration (shown only when configured), and a "Sync Now" button (shown only when configured) that triggers a pull/merge from the remote. The PAT SHALL be encrypted using `ENCRYPTION_KEY` before storage. The PAT input SHALL show a masked placeholder when a token is already stored (never expose the actual token).
+
+When the project does not yet have Git initialized (determined by `GET /api/projects/:projectId/git/status` returning `initialized: false`), the General page SHALL display a "Version Control" card with an informational message explaining that this project has not been migrated to Git versioning yet, and an "Initialize Git" button. Clicking the button SHALL call `POST /api/projects/:projectId/git/init`, show a success toast on completion, and replace the migration card with the normal GitHub and Publish History cards. While Git is not initialized, the GitHub card and Publish History card SHALL be hidden (they require a Git repo to function).
+
+The "Publish History" card SHALL display a list of recent commits from the local Git repository (fetched from `GET /api/projects/:projectId/git/log`). Each entry SHALL show the commit message and a human-readable relative timestamp (e.g. "2 hours ago"). The list SHALL show up to the 10 most recent commits, newest first. If the project has no commits yet, the card SHALL display a placeholder message such as "No publish history yet."
+
+#### Scenario: Configure GitHub
+
+- **WHEN** the user enters a repository URL, PAT, and branch in the GitHub card and clicks "Save"
+- **THEN** the PAT is encrypted and stored in `github.encryptedToken`
+- **AND** the URL and branch are stored in `github.url` and `github.branch`
+- **AND** a success toast is shown
+
+#### Scenario: Sync from settings
+
+- **WHEN** the user clicks "Sync Now" in the GitHub card
+- **THEN** the system pulls and merges upstream changes
+- **AND** on success, a toast shows the sync result
+- **AND** on conflict, a toast shows the conflicted file paths
+
+#### Scenario: Remove GitHub configuration
+
+- **WHEN** the user clicks "Remove" in the GitHub card
+- **THEN** the `github` subdocument is removed from the project
+- **AND** subsequent publishes only create local commits without pushing
+
+#### Scenario: Sync button disabled during operation
+
+- **WHEN** a sync operation is in progress
+- **THEN** the "Sync Now" button shows a loading state and is not clickable
+
+#### Scenario: View publish history
+
+- **WHEN** the user views the General settings page for a project with 5 commits
+- **THEN** the "Publish History" card lists all 5 commits with messages and relative timestamps
+- **AND** the most recent commit appears first
+
+#### Scenario: Empty publish history
+
+- **WHEN** the user views General settings for a project with no commits
+- **THEN** the "Publish History" card shows "No publish history yet."
+
+#### Scenario: Existing project without Git shows migration prompt
+
+- **WHEN** the user views General settings for a project that has not been migrated to Git
+- **THEN** a "Version Control" card is shown with an explanation and an "Initialize Git" button
+- **AND** the GitHub card and Publish History card are hidden
+
+#### Scenario: User migrates project to Git
+
+- **WHEN** the user clicks "Initialize Git" on the migration card
+- **THEN** the system initializes a Git repository with all existing files
+- **AND** a success toast is shown ("Git repository initialized")
+- **AND** the migration card is replaced by the GitHub and Publish History cards
+
+#### Scenario: Migration button shows loading state
+
+- **WHEN** the Git initialization is in progress
+- **THEN** the "Initialize Git" button shows a loading state and is not clickable
+
+#### Scenario: Navigate between settings pages
+
+- **WHEN** the user clicks the Builder or Agent sub-item in the sidebar Settings group
+- **THEN** the corresponding settings page (`/$projectId/settings/builder` or `/$projectId/settings/agent`) is rendered
+
+## ADDED Requirements
+
+### Requirement: LLM Settings API
+
+The API SHALL expose project-scoped LLM settings endpoints under `/api/projects/:projectId/llm-settings`:
+
+- `GET /builder` — returns the builder LLM settings with the API key masked (e.g. `sk-...****`) plus an `apiKeySet` boolean and the resolved effective configuration source per field (`project` or `env`)
+- `PUT /builder` — updates `builderLlm` (accepts `baseUrl`, `apiKey`, `model`, each optional; when `apiKey` is provided it is encrypted and replaces the stored key; clearing a field removes the project override)
+- `POST /builder/test-connection` — verifies connectivity against the **effective** builder configuration (project override merged with env fallback) by issuing a lightweight request to the configured endpoint
+- `GET /agent` — returns the agent settings (`baseUrl`, `model`, `systemPrompt`, masked API key, `apiKeySet`) or an unconfigured indicator
+- `PUT /agent` — creates/updates `agentLlm` (requires `baseUrl`, `model`, `systemPrompt`; `apiKey` required on first save, optional afterwards — omitting it preserves the stored key)
+- `POST /agent/test-connection` — verifies connectivity against the agent configuration
+
+API keys SHALL never be returned in plaintext. Base URLs SHALL be re-validated against the SSRF rules on every PUT and before every outbound test-connection request. All endpoints SHALL require admin session auth. The `/api/config` surface SHALL report per-project configuration state (`builderConfigured`, `agentConfigured`) so the frontend can gate chat inputs and run buttons.
+
+#### Scenario: Save agent settings
+
+- **WHEN** a PUT request to `/agent` provides `baseUrl`, `apiKey`, `model`, and `systemPrompt`
+- **THEN** the API key is encrypted and stored in `agentLlm.encryptedApiKey`
+- **AND** subsequent GETs return the key masked with `apiKeySet: true`
+
+#### Scenario: Update agent settings without changing the key
+
+- **WHEN** a PUT request to `/agent` updates `systemPrompt` without providing `apiKey`
+- **THEN** the existing encrypted API key is preserved
+
+#### Scenario: Builder test connection uses effective config
+
+- **WHEN** a POST request is made to `/builder/test-connection` for a project whose `builderLlm` sets only the model
+- **THEN** the connectivity test runs against the env base URL and API key with the project's model
+- **AND** a success or error result is returned
+
+#### Scenario: Test connection blocked for SSRF-unsafe URL
+
+- **WHEN** a test-connection request resolves a base URL to a private IP address
+- **THEN** a 400 error is returned without making the outbound HTTP request
+
+### Requirement: Builder LLM Settings Page
+
+The frontend SHALL provide a Builder settings page at `/$projectId/settings/builder` with a card containing inline label–input rows for: OpenAI-compatible base URL, API key (password input, masked placeholder when a key is stored), and model name. Each field SHALL indicate its env-default value as placeholder text when no project override is set. The page SHALL provide a "Save" button and a "Test Connection" button. Test Connection SHALL first persist unsaved changes, then call `POST /api/projects/:projectId/llm-settings/builder/test-connection`, showing a success or error toast. Buttons SHALL be disabled with a loading indicator while operations are in flight. A "Reset to defaults" action SHALL clear the project overrides so the env configuration applies again.
+
+#### Scenario: Override the builder model
+
+- **WHEN** the user enters a model name and clicks "Save"
+- **THEN** the project's `builderLlm.model` is persisted
+- **AND** subsequent builder conversations in this project use the overridden model
+
+#### Scenario: Test the builder connection
+
+- **WHEN** the user clicks "Test Connection"
+- **THEN** unsaved changes are persisted first
+- **AND** the connectivity test runs against the effective configuration
+- **AND** a success toast ("Connection is healthy") or error toast with the server message is shown
+
+#### Scenario: Reset to env defaults
+
+- **WHEN** the user activates "Reset to defaults"
+- **THEN** the `builderLlm` overrides are cleared
+- **AND** the fields show the env defaults as placeholders again
+
+### Requirement: Agent Settings Page
+
+The frontend SHALL provide an Agent settings page at `/$projectId/settings/agent` configuring the project's single agent. The page SHALL contain a card with inline label–input rows for: OpenAI-compatible base URL, API key (password input, masked placeholder when a key is stored), and model name; and a card with the agent's system prompt (textarea). The page SHALL provide "Save" and "Test Connection" buttons with the same save-then-test behavior, disabled states, and loading indicators as the Builder settings page. While the agent is unconfigured, the page SHALL display an informational note that the Agent playground and test runs are unavailable until configuration is saved.
+
+#### Scenario: Configure the agent for the first time
+
+- **WHEN** the user fills in base URL, API key, model, and system prompt and clicks "Save"
+- **THEN** the agent configuration is persisted with the key encrypted
+- **AND** the Agent playground and test-run controls become available
+
+#### Scenario: Test the agent connection
+
+- **WHEN** the user clicks "Test Connection" with valid saved or unsaved settings
+- **THEN** the settings are saved first and the connectivity test executes
+- **AND** a success or error indicator is shown
+
+#### Scenario: Editing preserves the stored key
+
+- **WHEN** the user edits the system prompt and saves without entering a new API key
+- **THEN** the existing encrypted key is preserved
diff --git a/openspec/changes/restructure-agent-platform/specs/semantic-model-agent/spec.md b/openspec/changes/restructure-agent-platform/specs/semantic-model-agent/spec.md
new file mode 100644
index 0000000..4927e6e
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/semantic-model-agent/spec.md
@@ -0,0 +1,166 @@
+## REMOVED Requirements
+
+### Requirement: List Test Agents Tool
+
+**Reason**: Test agents no longer exist; the project has a single agent configured in Settings. The builder no longer needs to enumerate or assign agents.
+**Migration**: The tool is removed from the builder's tool map and the system prompt. `create_test_case` no longer accepts an agent reference.
+
+## MODIFIED Requirements
+
+### Requirement: LLM Provider Configuration
+
+The deep agent SHALL use an OpenAI-compatible API endpoint resolved per project. For each of the three settings — base URL, API key, and model — the effective value SHALL be the project's `builderLlm` value when set, otherwise the corresponding environment variable: `AGENT_API_BASE_URL` (defaults to `https://openrouter.ai/api/v1`), `AGENT_API_KEY`, and `AGENT_MODEL`. This allows using OpenRouter, direct OpenAI, Azure OpenAI, local Ollama, or any OpenAI-compatible provider, globally or per project.
+
+#### Scenario: Agent uses OpenRouter via env defaults
+
+- **WHEN** the project has no `builderLlm` settings and `AGENT_API_BASE_URL` is set to `https://openrouter.ai/api/v1` with a valid `AGENT_API_KEY`
+- **THEN** the agent sends LLM requests through OpenRouter
+- **AND** the model specified in `AGENT_MODEL` is used
+
+#### Scenario: Project settings override env
+
+- **WHEN** the project's `builderLlm` has `model: "gpt-5"` and an API key, while env vars configure a different model and key
+- **THEN** the builder agent for that project uses the project's model and key
+- **AND** other projects without `builderLlm` continue using the env configuration
+
+#### Scenario: Per-field fallback
+
+- **WHEN** the project's `builderLlm` sets only `model` and env vars provide the base URL and API key
+- **THEN** the agent uses the project's model with the env base URL and API key
+
+### Requirement: Agent Configuration Missing Banner
+
+The agent chat empty state (shown before any messages are sent) SHALL display a warning banner when the backend reports that no LLM configuration is resolvable for the surface. Configuration state SHALL be reported per project: the builder is configured when an API key resolves from `Project.builderLlm` or `AGENT_API_KEY`; the project agent is configured when `Project.agentLlm` is set.
+
+The banner SHALL:
+
+- Appear in place of the default empty state description on the Builder chat (Build section) when the builder is unconfigured, and on the Agent playground page when the project agent is unconfigured
+- Use a warning/caution visual treatment (icon + tinted background) consistent with the application's design system
+- Explain that an AI provider configuration is required for the agent to work
+- Link to the relevant settings page: `/$projectId/settings/builder` for the builder, `/$projectId/settings/agent` for the agent — and mention the env-var fallback (`AGENT_API_KEY`, `AGENT_API_BASE_URL`, `AGENT_MODEL`) for the builder
+- Disable the chat input (prevent sending) while the surface is not configured
+
+The banner SHALL NOT appear once the corresponding configuration resolves.
+
+#### Scenario: Builder chat without any builder configuration
+
+- **WHEN** the user navigates to the Build chat
+- **AND** neither `Project.builderLlm` nor `AGENT_API_KEY` provides an API key
+- **THEN** the empty state displays a warning banner explaining that an API key is required
+- **AND** the banner links to `/$projectId/settings/builder` and mentions the env-var fallback
+- **AND** the chat input is disabled
+
+#### Scenario: Builder chat with project-level configuration only
+
+- **WHEN** `AGENT_API_KEY` is not set but the project's `builderLlm` contains an API key
+- **THEN** the default empty state is shown and the chat input is enabled
+
+#### Scenario: Agent playground without agent configuration
+
+- **WHEN** the user navigates to the Agent page
+- **AND** the project has no `agentLlm` configuration
+- **THEN** the empty state displays a warning banner linking to `/$projectId/settings/agent`
+- **AND** the chat input is disabled
+
+### Requirement: Create Test Case Tool
+
+The deep agent SHALL have access to a `create_test_case` tool that creates a test case document in MongoDB for the current project. The tool accepts `title` (string, required), `semanticModel` (string, required), `inputMessage` (string, required), and `expectedFacts` (array of strings, min 1). The tool SHALL NOT accept any test-agent reference — test cases always execute with the single project agent.
+
+The tool SHALL automatically add "auto-generated" to the test case's `tags` array so that auto-generated cases are distinguishable from manually created ones.
+
+The agent's system prompt SHALL document the tool and instruct the agent to only create test cases when the user explicitly provides ground-truth facts or expected answers. The agent SHALL NOT invent expected facts from its own data exploration or query results.
+
+#### Scenario: Agent creates a test case
+
+- **WHEN** the user provides ground-truth facts
+- **AND** the agent invokes `create_test_case` with `{ "title": "Total revenue 2024", "semanticModel": "ecommerce", "inputMessage": "What is the total revenue for 2024?", "expectedFacts": ["Total revenue for 2024 is 1.65 MEUR"] }`
+- **THEN** a TestCase document is created for the project
+- **AND** the `tags` array contains "auto-generated"
+
+#### Scenario: Agent does not create test cases without user-provided facts
+
+- **WHEN** the agent has finished writing a semantic model
+- **AND** the user has not provided any ground-truth facts or expected answers
+- **THEN** the agent SHALL NOT invoke `create_test_case` on its own
+- **AND** the agent MAY suggest creating test cases and ask the user to supply expected answers
+
+#### Scenario: Invalid input rejected
+
+- **WHEN** the agent invokes `create_test_case` with an empty `expectedFacts` array
+- **THEN** the tool returns an error indicating at least one expected fact is required
+- **AND** no TestCase document is created
+
+#### Scenario: Auto-generated tag always present
+
+- **WHEN** the agent invokes `create_test_case` for any test case
+- **THEN** the resulting TestCase always includes "auto-generated" in its `tags` array regardless of any other tags provided
+
+### Requirement: List Test Cases Tool
+
+The deep agent SHALL have access to a `list_test_cases` tool that returns existing test cases for the current project. The tool accepts an optional `semanticModel` parameter to filter results by model name. It SHALL return a JSON array of objects, each containing `id` (string), `title` (string), `semanticModel` (string), `inputMessage` (string), `expectedFactsCount` (number), and `tags` (array of strings). The agent's system prompt SHALL instruct the agent to call `list_test_cases` before creating new test cases to review existing coverage and avoid duplicates.
+
+#### Scenario: Agent lists test cases for a semantic model
+
+- **WHEN** the agent invokes `list_test_cases` with `{ "semanticModel": "ecommerce" }`
+- **AND** the project has three test cases for "ecommerce" and two for "hr"
+- **THEN** the tool returns a JSON array with only the three "ecommerce" test cases
+
+#### Scenario: Agent lists all test cases
+
+- **WHEN** the agent invokes `list_test_cases` without a `semanticModel` filter
+- **THEN** the tool returns all non-deleted test cases for the project
+
+#### Scenario: No test cases exist
+
+- **WHEN** the agent invokes `list_test_cases`
+- **AND** the project has no test cases
+- **THEN** the tool returns an empty JSON array
+
+### Requirement: Sidebar Model List
+
+The Builder side panel's **Data Models** entry (nested under the **Agent Scaffold** section) SHALL display semantic models as a flat, non-expandable list. Each entry shows the model name. Clicking a model name selects it and opens the visualization view in the main content area. The currently selected model SHALL be visually highlighted. Clicking the selected model again deselects it and returns to the chat view.
+
+#### Scenario: User clicks a model in the panel
+
+- **WHEN** the user clicks a model name under Agent Scaffold → Data Models
+- **THEN** the model is highlighted as selected
+- **AND** the main content area shows the visualization for that model
+
+#### Scenario: User deselects the active model
+
+- **WHEN** the user clicks the currently selected model name
+- **THEN** the model is deselected
+- **AND** the main content area returns to the chat message view
+
+#### Scenario: Models listed without expansion
+
+- **WHEN** the Data Models list is rendered
+- **THEN** each model is displayed as a single row with the model name and a database icon
+- **AND** no expand/collapse chevron or subtree is shown
+
+## ADDED Requirements
+
+### Requirement: Builder Side Panel Structure
+
+The Builder page (`/$projectId/models`) SHALL render its left side panel with three sections, in order:
+
+1. **Agent Scaffold** — containing two sub-entries: **Data Models** (the semantic model list, including the existing Publish control) and **API Models**, which SHALL be rendered greyed out with a "soon" tag and SHALL NOT be interactive.
+2. **Build** — the builder chat conversation history with the new-chat ("+") control (the section formerly labeled "Chat"; routes under `/$projectId/models/chat/*` are unchanged).
+3. **Improvements & Testing** — the improvements-and-failing-tests panel (see the `semantic-models` capability for its content requirements).
+
+#### Scenario: Panel sections rendered
+
+- **WHEN** the user opens the Builder page
+- **THEN** the side panel shows the sections Agent Scaffold, Build, and Improvements & Testing in that order
+- **AND** Agent Scaffold contains the Data Models list and a disabled API Models entry tagged "soon"
+
+#### Scenario: Build section behaves like the former Chat section
+
+- **WHEN** the user clicks "+" in the Build section or selects a past conversation
+- **THEN** the chat opens at `/$projectId/models/chat/new` or `/$projectId/models/chat/:conversationId` exactly as before the rename
+
+#### Scenario: API Models entry is inert
+
+- **WHEN** the user clicks the API Models entry
+- **THEN** nothing happens (no navigation, no selection)
+- **AND** the entry is visually muted with a "soon" tag
diff --git a/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md b/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
new file mode 100644
index 0000000..2574dcb
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
@@ -0,0 +1,52 @@
+## RENAMED Requirements
+
+- FROM: `### Requirement: Improvements UI in Semantic Models Sidebar`
+- TO: `### Requirement: Improvements & Testing Panel`
+
+## MODIFIED Requirements
+
+### Requirement: Improvements & Testing Panel
+
+The Builder page side panel SHALL include an **Improvements & Testing** accordion section (formerly "Improvement Requests") below the Build section. The section SHALL display two kinds of entries:
+
+1. **Improvement requests** — all improvement suggestions for the project. Each item SHALL show a lightbulb icon, the truncated title, and a checkmark overlay if the improvement has been implemented. Clicking an improvement SHALL navigate to its detail view in the main content area. Each improvement row SHALL show a trash icon on hover that soft-deletes the improvement when clicked, matching the conversation row delete pattern.
+2. **Failing tests** — the project's currently failing test cases, sourced from `GET /api/projects/:projectId/test-cases/latest-results` (entries with `latestStatus` of `failed` or `error`). Each item SHALL show a distinct test/alert icon and the truncated test case title. Clicking a failing-test entry SHALL navigate to the latest run's detail page (`/$projectId/testing/runs/:runId`). Each failing-test row SHALL additionally offer a refine affordance (wand icon on hover) that opens `/$projectId/models/chat/new` with a `prefill` prompt referencing the failing test case and its unmet facts so the builder can improve the model.
+
+The section header SHALL display a pending-count badge equal to the number of pending improvements plus the number of failing tests.
+
+#### Scenario: Panel shows pending improvements
+
+- **WHEN** the user views the Builder page and there are 3 pending improvements
+- **THEN** the "Improvements & Testing" section shows 3 improvement items with lightbulb icons and no checkmarks
+
+#### Scenario: Panel shows implemented improvements
+
+- **WHEN** an improvement has status `implemented`
+- **THEN** it appears in the panel with a checkmark icon overlay
+
+#### Scenario: Panel shows failing tests
+
+- **WHEN** two test cases have a latest run result of `failed` or `error`
+- **THEN** the section lists both as failing-test entries with a test/alert icon
+- **AND** the section header badge counts them together with pending improvements
+
+#### Scenario: Failing test navigates to run detail
+
+- **WHEN** the user clicks a failing-test entry
+- **THEN** the browser navigates to the test run detail page of the latest run containing that case
+
+#### Scenario: Refine a failing test from the panel
+
+- **WHEN** the user activates the refine affordance on a failing-test entry
+- **THEN** the Build chat opens at `/$projectId/models/chat/new` with a `prefill` prompt describing the failing test case and its unmet expected facts
+
+#### Scenario: Empty state
+
+- **WHEN** there are no improvements and no failing tests for the project
+- **THEN** the section shows a message indicating that improvement requests are submitted by MCP clients and failing tests appear after test runs
+
+#### Scenario: Delete improvement from panel
+
+- **WHEN** the user hovers over an improvement row and clicks the trash icon
+- **THEN** the improvement is soft-deleted via the API and removed from the list
+- **AND** if the deleted improvement was the active detail view, the user is navigated away
diff --git a/openspec/changes/restructure-agent-platform/specs/testing-suite/spec.md b/openspec/changes/restructure-agent-platform/specs/testing-suite/spec.md
new file mode 100644
index 0000000..209dcfa
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/testing-suite/spec.md
@@ -0,0 +1,475 @@
+## REMOVED Requirements
+
+### Requirement: Test Agent Model
+
+**Reason**: The platform no longer supports multiple test agents — each project has exactly one agent, configured as the `agentLlm` subdocument on the Project (see `project-management` deltas).
+**Migration**: A schema migration soft-deletes all existing `TestAgent` documents. The user reconfigures the single project agent manually under Settings → Agent. The SSRF validation and AES-256-GCM encryption rules carry over to the `agentLlm`/`builderLlm` subdocuments.
+
+### Requirement: Test Agent CRUD API
+
+**Reason**: The `/api/projects/:projectId/test-agents` endpoints are removed along with the model. Agent configuration is managed via the LLM Settings API (see `project-management` deltas).
+**Migration**: Frontend callers move to `GET/PUT /api/projects/:projectId/llm-settings/agent` and its test-connection endpoint.
+
+### Requirement: Testing UI — Test Agents Page
+
+**Reason**: With a single project agent there is no agent list to manage. The page is removed from the Testing group.
+**Migration**: `/$projectId/testing/agents` redirects to `/$projectId/settings/agent`, where the agent's LLM credentials, model, and system prompt are configured.
+
+## RENAMED Requirements
+
+- FROM: `### Requirement: Testing UI — Playground Page`
+- TO: `### Requirement: Agent Playground Page`
+
+## MODIFIED Requirements
+
+### Requirement: Agent Playground Page
+
+The frontend SHALL provide the Agent page at `/$projectId/agent` with a chat interface for conversing with the project agent. There SHALL be no agent selector — the page always uses the single project agent. The chat interface SHALL reuse the existing chat components (`AgentChat`, `ToolCallCard`, `ChatInput`, `MarkdownContent`) adapted to work with playground conversations. A history panel SHALL show past playground conversations for the project. Tool calls (`list_semantic_models`, `get_semantic_model_overview`, `get_dataset_fields`, `execute_query`) SHALL be rendered with the same card-based visualization as the semantic model builder. The playground conversation list API response SHALL include an `isStreaming` boolean per item. The history panel SHALL display an animated spinner icon instead of the static message icon for conversations that are actively streaming, matching the behavior of the Builder chat sidebar.
+
+When the project agent is not configured, the page SHALL display an empty state explaining that the agent must be configured first, with a link/button to `/$projectId/settings/agent`, and the chat input SHALL be disabled.
+
+The former route `/$projectId/testing/playground` SHALL redirect to `/$projectId/agent`.
+
+#### Scenario: Chat with the project agent
+
+- **WHEN** the user opens `/$projectId/agent` with a configured agent
+- **THEN** past playground conversations are shown in the history panel
+- **AND** the user can start a new conversation or resume an existing one
+
+#### Scenario: Tool calls displayed in playground
+
+- **WHEN** the playground agent invokes `execute_query`
+- **THEN** the tool call card shows the SQL query with syntax highlighting and the result table (same as the semantic model builder)
+
+#### Scenario: Unconfigured agent empty state
+
+- **WHEN** the user opens `/$projectId/agent` and the project agent is not configured
+- **THEN** an empty state explains that the agent needs LLM credentials
+- **AND** a link navigates to `/$projectId/settings/agent`
+- **AND** the chat input is disabled
+
+#### Scenario: Active streaming conversation shown in history panel
+
+- **WHEN** a playground conversation has an active streaming session
+- **AND** the user views the history panel
+- **THEN** the entry for that conversation displays an animated spinning icon instead of the static message icon
+- **AND** the icon reverts to the static message icon once streaming completes and the next poll cycle refreshes the list
+
+### Requirement: Playground Chat
+
+The system SHALL provide an interactive playground chat where the user converses with **the project agent**. The playground agent SHALL be configured from the project's `agentLlm` settings (decrypted API key, base URL, model) and the configured system prompt, and SHALL have access to MCP-style tools scoped to **all of the project's semantic models**:
+
+- `list_semantic_models` — list available semantic models
+- `get_semantic_model_overview` — get model overview (datasets, relationships, metrics)
+- `get_dataset_fields` — get dataset fields with types, examples, and AI context
+- `execute_query` — run read-only SQL queries via scoped DuckDB VIEWs
+
+The tools SHALL read from the current development state of semantic models (YAML files on disk), not from any published snapshot. Playground conversations SHALL be persisted in the existing `Conversation` model and identified by a `playground: true` flag; legacy conversations referencing a deleted `testAgent` SHALL remain readable. Playground interactions SHALL NOT be logged to `McpCallLog`.
+
+When the project agent is not configured (no `agentLlm` settings), the playground chat endpoint SHALL reject messages with a 400 error indicating that the agent must be configured under Settings → Agent.
+
+#### Scenario: Start a playground conversation
+
+- **WHEN** the user sends a message in the playground and the project agent is configured
+- **THEN** a new Conversation is created with `playground: true`
+- **AND** the agent is initialized with the project agent's LLM config and MCP-style tools
+- **AND** the response streams via SSE using the same protocol as the semantic model builder
+
+#### Scenario: Playground agent queries a semantic model
+
+- **WHEN** the playground agent invokes `execute_query` with a model name and SQL
+- **THEN** scoped VIEWs are created for the model's datasets (same pattern as the MCP server)
+- **AND** the query executes against the project's DuckDB instance
+- **AND** results are returned to the agent
+
+#### Scenario: Playground conversations excluded from access log
+
+- **WHEN** the playground agent executes MCP-style tools
+- **THEN** no entries are created in `McpCallLog`
+
+#### Scenario: Resume a playground conversation
+
+- **WHEN** the user selects a past playground conversation from the history
+- **THEN** the conversation is loaded with full message and tool call history
+- **AND** subsequent messages continue in the same conversation context
+
+#### Scenario: Playground reuses chat UI components
+
+- **WHEN** the playground renders a conversation
+- **THEN** messages, tool call cards, markdown rendering, and streaming indicators use the same components as the semantic model builder chat (`agent-chat`, `tool-call-card`, `chat-input`, `markdown-components`)
+
+#### Scenario: Playground blocked while agent unconfigured
+
+- **WHEN** the user sends a playground message and the project has no `agentLlm` configuration
+- **THEN** the API responds with a 400 error stating the agent must be configured under Settings → Agent
+
+### Requirement: Test Case Model
+
+The system SHALL provide a `TestCase` Mongoose model with the following fields: `title` (string, required), `project` (ObjectId ref to Project, required, indexed), `semanticModel` (string, required), `inputMessage` (string, required), `expectedFacts` (array of strings, required, min 1), `tags` (array of strings, default empty, normalized to lowercase, trimmed), `maxToolCalls` (number, optional), `deleted` (boolean, default false), `deletedAt` (Date, optional), `createdAt` (Date), `updatedAt` (Date). The model SHALL use the shared soft-delete plugin. The model SHALL NOT have a `testAgent` reference — test cases always execute with the single project agent.
+
+#### Scenario: Create a test case
+
+- **WHEN** a TestCase is created with `title: "Revenue 2025"`, `semanticModel: "ecommerce"`, `inputMessage: "What's the revenue for 2025?"`, `expectedFacts: ["Revenue is 1.65 MEUR"]`
+- **THEN** the test case is persisted in MongoDB
+
+#### Scenario: Create a test case with multiple expected facts
+
+- **WHEN** a TestCase is created with `expectedFacts: ["Revenue is 1.65 MEUR", "Growth rate is 12%", "Top market is Germany"]`
+- **THEN** all three facts are stored and each will be individually evaluated during a test run
+
+#### Scenario: Legacy testAgent reference removed by migration
+
+- **WHEN** the drop-test-agents schema migration runs against a database containing test cases with a `testAgent` reference
+- **THEN** the `testAgent` field is unset on all test cases
+- **AND** the test cases remain otherwise unchanged and eligible for runs with the project agent
+
+### Requirement: Test Case CRUD API
+
+The API SHALL expose CRUD endpoints for test cases at `/api/projects/:projectId/test-cases`:
+
+- `GET /` — List all non-deleted test cases for the project (supports filtering by `semanticModel` and `tags` query parameters)
+- `POST /` — Create a new test case (accepts title, semanticModel, inputMessage, expectedFacts, tags, maxToolCalls)
+- `PUT /:caseId` — Update an existing test case
+- `DELETE /:caseId` — Soft-delete a test case
+
+The endpoints SHALL NOT accept or filter by a test agent reference. All endpoints SHALL require admin session auth.
+
+#### Scenario: List test cases for a project
+
+- **WHEN** a GET request is made to `/api/projects/:projectId/test-cases`
+- **THEN** all non-deleted test cases are returned with title, semanticModel, inputMessage, expectedFacts, tags, and timestamps
+
+#### Scenario: Create a test case
+
+- **WHEN** a POST request creates a test case with valid fields
+- **THEN** the test case is created and the response includes all test case fields
+
+#### Scenario: Delete a test case
+
+- **WHEN** a DELETE request is made for a test case
+- **THEN** the test case is soft-deleted and no longer appears in list queries
+
+### Requirement: Test Run Model
+
+The system SHALL provide a `TestRun` Mongoose model representing a batch execution of test cases. Fields: `project` (ObjectId ref to Project, required), `llmModel` (string — snapshot of the project agent's model identifier at run start), `testAgent` (ObjectId, optional — legacy field retained so historical runs remain readable; not set on new runs), `status` (enum: `pending`, `running`, `completed`, `failed`, `cancelled`, required), `cases` (array of embedded results), `startedAt` (Date), `completedAt` (Date), `createdAt` (Date), `updatedAt` (Date). Each embedded case result SHALL contain: `testCase` (ObjectId ref to TestCase), `title` (string — snapshot of test case title), `semanticModel` (string), `inputMessage` (string), `expectedFacts` (array of strings), `maxToolCalls` (number, optional — snapshot of the limit at execution time), `status` (enum: `pending`, `running`, `passed`, `failed`, `error`, `cancelled`), `agentResponse` (string — the agent's final text response), `toolCalls` (array of tool call records), `factResults` (array of `{ fact: string, passed: boolean, reasoning: string }`), `durationMs` (number), `errorMessage` (string, optional).
+
+#### Scenario: Create a test run
+
+- **WHEN** a batch run is initiated with a set of test cases and the project agent is configured
+- **THEN** a TestRun document is created with `status: "pending"`, `llmModel` snapshotted from the project agent config, and each case embedded with `status: "pending"`
+- **AND** `maxToolCalls` is snapshotted from each test case into the embedded result
+
+#### Scenario: Test run completes successfully
+
+- **WHEN** all test cases in a run finish processing
+- **THEN** the TestRun status is set to `completed`
+- **AND** `completedAt` is set
+
+#### Scenario: Individual test case passes
+
+- **WHEN** the judge evaluates the agent's response against the expected facts
+- **AND** all facts are satisfied
+- **THEN** the case result status is `passed` and each factResult has `passed: true`
+
+#### Scenario: Individual test case fails
+
+- **WHEN** the judge evaluates the agent's response and one or more expected facts are not satisfied
+- **THEN** the case result status is `failed`
+- **AND** the `factResults` array shows which facts passed and which did not, with reasoning
+
+#### Scenario: Test run is cancelled
+
+- **WHEN** the user cancels a running or pending test run
+- **THEN** the TestRun status is set to `cancelled` and `completedAt` is set
+- **AND** all cases still in `pending` status are marked as `cancelled`
+- **AND** any in-flight cases (`running` status) are aborted and marked as `cancelled` with partial results preserved
+
+#### Scenario: Historical run with legacy agent reference remains readable
+
+- **WHEN** a TestRun created before the single-agent migration is fetched
+- **THEN** the run and its embedded case results are returned unchanged
+- **AND** the legacy `testAgent` reference does not block reading or deleting the run
+
+### Requirement: Test Run Batch Execution
+
+The system SHALL execute test runs via a dedicated `test-runs` BullMQ queue processed by the worker (`apps/worker/`). When a batch run is initiated:
+
+1. The API verifies the project agent is configured (`agentLlm` present); otherwise it rejects with 400
+2. The API creates a `TestRun` document and enqueues one job per test case on the `test-runs` queue
+3. Each job creates a playground-style agent with the project agent's LLM config, scoped to the test case's semantic model
+4. The agent processes the input message and produces a response with tool calls
+5. If `maxToolCalls` is set and the agent exceeds the limit, the runner aborts the agent loop and marks the case as `error` with message "Exceeded max tool calls (N)"
+6. A judge LLM call evaluates each expected fact against the agent's response
+7. The case result (status, response, tool calls, fact results, duration) is written to the `TestRun` document
+8. When all cases complete, the `TestRun` status is updated to `completed`
+
+The batch execution SHALL integrate with the existing worker infrastructure: same Redis connection, same graceful shutdown handling, same stalled job detection. When Redis is not configured, the API SHALL fall back to in-process execution (sequential).
+
+When a test run is cancelled, the system SHALL cooperatively abort in-flight test cases. In the Redis/worker path, the worker SHALL subscribe to a per-test-run cancel channel and abort the agent stream via `AbortController`. Queued BullMQ jobs that have not started SHALL be removed from the queue. In the in-process (no Redis) path, the sequential loop SHALL check a cancellation flag between cases and skip remaining ones.
+
+#### Scenario: Batch run enqueues jobs via worker
+
+- **GIVEN** Redis is configured, the worker is running, and the project agent is configured
+- **WHEN** a batch run is initiated with 5 test cases
+- **THEN** 5 jobs are enqueued on the `test-runs` queue
+- **AND** the worker processes them concurrently (up to `WORKER_CONCURRENCY`)
+
+#### Scenario: Batch run rejected while agent unconfigured
+
+- **WHEN** a batch run is initiated for a project without `agentLlm` configuration
+- **THEN** the API responds with 400 and an error directing the user to configure the agent under Settings → Agent
+- **AND** no TestRun document is created
+
+#### Scenario: Batch run without Redis falls back to in-process
+
+- **GIVEN** `REDIS_URL` is not set
+- **WHEN** a batch run is initiated
+- **THEN** test cases are executed sequentially in the API process
+
+#### Scenario: Test case execution error
+
+- **WHEN** the agent pipeline fails for a test case (LLM error, timeout, etc.)
+- **THEN** the case result status is set to `error` with the error message
+- **AND** remaining test cases continue processing
+
+#### Scenario: Fact evaluation via LLM judge
+
+- **GIVEN** a test case with `expectedFacts: ["Revenue is 1.65 MEUR", "Growth rate is 12%"]`
+- **WHEN** the agent responds with "The total revenue for 2025 was approximately 1.65 million EUR, representing a year-over-year growth of 12%."
+- **THEN** the judge returns `[{ fact: "Revenue is 1.65 MEUR", passed: true, reasoning: "..." }, { fact: "Growth rate is 12%", passed: true, reasoning: "..." }]`
+
+#### Scenario: Max tool calls exceeded
+
+- **GIVEN** a test case with `maxToolCalls: 3`
+- **WHEN** the agent invokes a 4th tool call during execution
+- **THEN** the runner aborts the agent loop
+- **AND** the case result status is set to `error` with errorMessage "Exceeded max tool calls (3)"
+- **AND** the partial agent response and tool calls up to that point are preserved
+
+#### Scenario: Cancel aborts in-flight worker jobs
+
+- **GIVEN** a test run with 10 cases, 3 currently running in the worker, 5 still queued
+- **WHEN** the cancel endpoint is called
+- **THEN** the 5 queued BullMQ jobs are removed from the queue
+- **AND** the 3 running cases receive an abort signal and terminate their LLM streams
+- **AND** all 8 non-completed cases are marked as `cancelled`
+- **AND** the 2 already-completed cases retain their original status
+
+#### Scenario: Cancel stops in-process sequential execution
+
+- **GIVEN** `REDIS_URL` is not set and a test run is executing in-process
+- **WHEN** the cancel endpoint is called while case 3 of 10 is running
+- **THEN** cases 4 through 10 are skipped
+- **AND** case 3 completes (or is marked `cancelled` if still in the agent stream)
+- **AND** the run status is set to `cancelled`
+
+### Requirement: Test Run API
+
+The API SHALL expose endpoints for managing test runs at `/api/projects/:projectId/test-runs`:
+
+- `GET /` -- List all test runs for the project with server-side pagination (`page`, `limit` query params returning `{ items, total, page, limit }`); each item is a summary: id, `llmModel` snapshot (or legacy agent name when present), case count, passed/failed/error counts, status, timestamps
+- `GET /:runId` -- Get a single test run with full case results (the full embedded cases array)
+- `POST /` -- Initiate a batch run (accepts `testCaseIds` array); rejects with 400 when the project agent is not configured; returns the new TestRun ID
+- `POST /:runId/cancel` -- Cancel a running or pending test run; marks remaining cases as `cancelled`, aborts in-flight executions, and sets the run status to `cancelled`
+- `DELETE /:runId` -- Delete a test run
+
+All endpoints SHALL require admin session auth.
+
+#### Scenario: List test runs with pagination
+
+- **WHEN** a GET request is made to `/api/projects/:projectId/test-runs?page=1&limit=25`
+- **THEN** up to 25 test run summaries are returned along with `total`, `page`, and `limit` fields
+
+#### Scenario: Initiate a batch run
+
+- **WHEN** a POST request is made with an array of `testCaseIds` and the project agent is configured
+- **THEN** a TestRun is created and jobs are enqueued
+- **AND** the response returns the TestRun ID and status `running`
+
+#### Scenario: Initiate rejected while agent unconfigured
+
+- **WHEN** a POST request is made with `testCaseIds` for a project without agent configuration
+- **THEN** a 400 error is returned directing the user to Settings → Agent
+
+#### Scenario: Poll test run progress
+
+- **WHEN** a GET request is made for a running test run
+- **THEN** the response includes the current status of each case (pending, running, passed, failed, error)
+- **AND** completed cases include their full results
+
+#### Scenario: Cancel a running test run
+
+- **WHEN** a POST request is made to `/:runId/cancel` for a run with status `running`
+- **THEN** the run status is set to `cancelled` and `completedAt` is set
+- **AND** all `pending` cases are marked as `cancelled`
+- **AND** in-flight cases are signaled for abort
+- **AND** the response returns `{ ok: true }`
+
+#### Scenario: Cancel a non-active test run
+
+- **WHEN** a POST request is made to `/:runId/cancel` for a run with status `completed`, `failed`, or `cancelled`
+- **THEN** a 400 error is returned indicating the run is not active
+
+### Requirement: Testing UI — Test Cases Page
+
+The frontend SHALL provide a Test Cases page at `/$projectId/testing/cases` displaying a paginated table of all test cases with columns: title, model (badge), input message (truncated), tags (badges), expected facts count, and actions (edit, delete). A "Create Test Case" button SHALL open a form dialog with fields: title, semantic model, input message, expected facts (dynamic list), tags (chip input), and max tool calls (optional number). A "Run Batch" button SHALL open a dialog with model/tag filter controls and a live count of matching cases; on confirm, a batch run is initiated and the user is navigated to the test run detail page.
+
+There SHALL be no test-agent column, selector, or filter anywhere on the page — all executions use the project agent. Filter controls above the table SHALL allow filtering by semantic model and tags. Filtering is server-side via query params.
+
+The test case form dialog SHALL include a "Run Test" button in both create and edit mode. When clicked, the dialog SHALL first save the test case (create via POST or update via PUT), then initiate a single-case test run via the existing test-runs API (`POST /api/projects/:projectId/test-runs` with the saved case's ID). On successful run creation, the dialog SHALL close and the user SHALL be navigated to the test run detail page (`/$projectId/testing/runs/:runId`) to view live results. The "Run Test" and "Run Batch" buttons SHALL be disabled when the project agent is not configured (with a cue pointing to Settings → Agent) and while a save or run operation is in progress.
+
+#### Scenario: Create a test case via UI
+
+- **WHEN** the user fills in the create form with title, model, input, at least one expected fact, optional tags, and optional max tool calls
+- **THEN** the test case is created and appears in the table
+
+#### Scenario: Filter test cases by tag
+
+- **WHEN** the user selects a tag in the filter bar
+- **THEN** only test cases with that tag are shown
+- **AND** the page resets to page 1
+
+#### Scenario: Run a batch from the test cases page
+
+- **WHEN** the user clicks "Run Batch", optionally adjusts model/tag filters in the dialog, and confirms
+- **THEN** a test run is initiated via the API with the matching case IDs
+- **AND** the user is navigated to `/$projectId/testing/runs/:runId` showing live progress
+
+#### Scenario: Run test during test case creation
+
+- **WHEN** the user clicks "Run Test" while creating a new test case with all required fields and the project agent is configured
+- **THEN** the test case is saved via POST first
+- **AND** a single-case test run is initiated via the test-runs API
+- **AND** the dialog closes and the user is navigated to `/$projectId/testing/runs/:runId`
+
+#### Scenario: Run test during test case editing
+
+- **WHEN** the user clicks "Run Test" while editing an existing test case
+- **THEN** the test case is updated via PUT first
+- **AND** a single-case test run is initiated via the test-runs API
+- **AND** the dialog closes and the user is navigated to `/$projectId/testing/runs/:runId`
+
+#### Scenario: Run controls disabled while agent unconfigured
+
+- **WHEN** the project agent is not configured
+- **THEN** the "Run Test" and "Run Batch" buttons are disabled
+- **AND** a tooltip or visual cue directs the user to Settings → Agent
+
+#### Scenario: Run test button disabled during operation
+
+- **WHEN** a save or run operation is in progress
+- **THEN** the "Run Test" button is disabled and shows a loading spinner
+
+### Requirement: Testing UI — Test Runs List Page
+
+The frontend SHALL provide a Test Runs page at `/$projectId/testing/runs` displaying a server-side-paginated table of all test runs for the project. Columns: status icon, model (the run's `llmModel` snapshot, or the legacy agent name for pre-migration runs), case count, result summary (passed/failed/errors as badges), date. Each row links to the run detail page at `/$projectId/testing/runs/:runId`. The page auto-refreshes while any run is in `pending` or `running` status.
+
+The `cancelled` status SHALL be displayed with a neutral grey icon (ban/slash) consistent with the detail page styling.
+
+#### Scenario: View test runs list
+
+- **WHEN** the user navigates to `/$projectId/testing/runs`
+- **THEN** a paginated table of test runs is displayed with status, model, case count, pass/fail/error counts, and timestamps
+
+#### Scenario: Navigate to run detail
+
+- **WHEN** the user clicks a test run row
+- **THEN** the browser navigates to `/$projectId/testing/runs/:runId`
+
+#### Scenario: Empty state
+
+- **WHEN** no test runs exist for the project
+- **THEN** an empty state message is shown prompting the user to run a test batch from the Test Cases page
+
+#### Scenario: Cancelled run displayed in list
+
+- **WHEN** a cancelled test run exists
+- **THEN** it is displayed with a neutral grey ban icon and the status text "Cancelled"
+
+### Requirement: Testing UI — Test Run Detail Page
+
+The frontend SHALL provide a Test Run Detail page at `/$projectId/testing/runs/:runId` showing run metadata (model snapshot — or legacy agent name for pre-migration runs — status, started/completed timestamps, overall pass/fail counts) and a client-side-paginated list of case results. Each case result shows: status icon, title, semantic model badge, input message, agent response (expandable), tool calls (expandable), fact results with pass/fail icons and reasoning, duration, and error message if applicable. The page auto-refreshes (polls every 3 seconds) while the run status is `pending` or `running`. A back link returns to the Test Runs list.
+
+A "Cancel Run" button SHALL be displayed in the page header next to the status badge when the run status is `pending` or `running`. Clicking the button SHALL call `POST /api/projects/:projectId/test-runs/:runId/cancel`. On success, the button SHALL disappear and the status badge SHALL update to `cancelled`. The button SHALL be disabled while the cancel request is in progress and SHALL display a loading indicator.
+
+The `cancelled` run status SHALL be rendered as a grey/neutral badge with text "Cancelled". Individual cases with `cancelled` status SHALL display a ban/slash icon in neutral grey.
+
+A "Refine" button SHALL appear for all completed test cases (passed, failed, or error). The "Refine" button SHALL navigate to `/$projectId/models/chat/new` with a `prefill` search parameter containing a prompt focused on model efficiency: improving `ai_context` descriptions, simplifying naming, adding missing relationships, or reorganizing structure so the agent can answer with fewer tool calls. The "Refine" button SHALL use a wand icon and an outline visual style. The button SHALL appear within the expanded case card, below the tabs section.
+
+#### Scenario: View completed run detail
+
+- **WHEN** the user navigates to a completed test run detail page
+- **THEN** run metadata is shown at the top (model, status, timestamps, pass/fail summary)
+- **AND** case results are listed below with client-side pagination
+- **AND** each case shows its full status, response, tool calls, and fact evaluations
+
+#### Scenario: View in-progress run detail
+
+- **WHEN** the user navigates to a running test run detail page
+- **THEN** the page polls for updates every 3 seconds
+- **AND** case results update on each poll as they complete (pending -> running -> passed/failed/error)
+- **AND** the overall status badge updates when the run completes
+
+#### Scenario: Cancel a running test run from the UI
+
+- **WHEN** the user clicks the "Cancel Run" button on the detail page while the run is `running`
+- **THEN** a POST request is sent to `/api/projects/:projectId/test-runs/:runId/cancel`
+- **AND** the button shows a loading state during the request
+- **AND** on success, the status badge updates to "Cancelled" and the cancel button disappears
+- **AND** the page stops polling
+
+#### Scenario: Cancel button not shown for terminal states
+
+- **WHEN** the user views a test run with status `completed`, `failed`, or `cancelled`
+- **THEN** the "Cancel Run" button is not displayed
+
+#### Scenario: Cancelled cases displayed correctly
+
+- **WHEN** a cancelled test run is viewed
+- **THEN** cases that were completed before cancellation show their original status (passed/failed/error)
+- **AND** cases that were cancelled show a neutral ban icon and "cancelled" status
+- **AND** the summary counts reflect the actual distribution
+
+#### Scenario: Paginate case results
+
+- **WHEN** the run has more than 25 case results
+- **THEN** pagination controls are shown below the case list
+- **AND** the user can navigate between pages of results
+
+#### Scenario: Navigate back to runs list
+
+- **WHEN** the user clicks the back link on the detail page
+- **THEN** the browser navigates to `/$projectId/testing/runs`
+
+#### Scenario: Refine model via chat for any completed case
+
+- **WHEN** a test case has status `passed`, `failed`, or `error`
+- **AND** the user expands the case result card
+- **THEN** a "Refine" button is displayed below the tabs section
+- **AND** clicking the button navigates to `/$projectId/models/chat/new?prefill=<prompt>`
+- **AND** the prefill prompt focuses on improving the semantic model's navigability: ai_context, naming, relationships, and structure to reduce tool calls
+
+#### Scenario: No action buttons for pending, running, or cancelled cases
+
+- **WHEN** a test case has status `pending`, `running`, or `cancelled`
+- **THEN** no "Refine" button is displayed in the expanded detail view
+
+## ADDED Requirements
+
+### Requirement: Latest Test Case Results API
+
+The API SHALL expose `GET /api/projects/:projectId/test-cases/latest-results` returning, for every non-deleted test case of the project, the most recent embedded run result (if any). Each item SHALL include: `testCaseId`, `title`, `semanticModel`, `latestStatus` (`passed` | `failed` | `error` | `cancelled` | `running` | `pending` | `never_run`), `runId` (the TestRun containing the latest result, when present), and `finishedAt` (when present). The latest result SHALL be determined by the most recent TestRun (by `createdAt`) that contains the test case. The endpoint SHALL require admin session auth.
+
+A test case SHALL be considered **failing** when its `latestStatus` is `failed` or `error`. This endpoint powers the failing-tests section of the Builder's "Improvements & Testing" panel.
+
+#### Scenario: Latest results across runs
+
+- **GIVEN** test case A passed in run 1 and failed in run 2 (run 2 is newer), and test case B has never been run
+- **WHEN** a GET request is made to `/api/projects/:projectId/test-cases/latest-results`
+- **THEN** the response lists A with `latestStatus: "failed"` and the `runId` of run 2
+- **AND** B with `latestStatus: "never_run"` and no `runId`
+
+#### Scenario: Unauthenticated request
+
+- **WHEN** the request lacks a valid admin session
+- **THEN** a 401 error is returned
diff --git a/openspec/changes/restructure-agent-platform/tasks.md b/openspec/changes/restructure-agent-platform/tasks.md
new file mode 100644
index 0000000..8c5f433
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/tasks.md
@@ -0,0 +1,70 @@
+# Tasks — restructure-agent-platform
+
+> Sequencing note: coordinate with the active `add-llm-prompt-caching` change — both edit `packages/core/src/services/agent.ts` and `playground-agent.ts`. Land one before starting the other's section 3/4 work.
+
+## 1. Data model & migration
+
+- [ ] 1.1 Extend `Project` model with `builderLlm` and `agentLlm` subdocuments (encryption via `ENCRYPTION_KEY`, SSRF base-URL validation lifted from `TestAgent`); unit tests for validation and encryption
+- [ ] 1.2 Remove `testAgent` from `TestCase`; make `testAgent` optional/legacy on `TestRun` and add `llmModel` snapshot field; add `playground` flag to `Conversation`
+- [ ] 1.3 Schema migration `00X-drop-test-agents`: soft-delete all `TestAgent` docs, `$unset TestCase.testAgent`, set `playground: true` on conversations that have a `testAgent` reference; migration test
+- [ ] 1.4 Delete `packages/core/src/models/TestAgent.ts` and its exports/usages
+
+## 2. LLM settings API & resolution
+
+- [ ] 2.1 New `apps/api/src/routes/llm-settings.ts`: GET/PUT `/builder`, GET/PUT `/agent`, POST `/{builder,agent}/test-connection` (masked keys, re-encrypt on new key, SSRF re-validation); integration tests
+- [ ] 2.2 Builder LLM resolution helper in core: per-field `Project.builderLlm` → env (`AGENT_API_BASE_URL`/`AGENT_API_KEY`/`AGENT_MODEL`); wire into `createSemlayerAgent(projectId)` and title agent base config
+- [ ] 2.3 Update `/api/config` (or project payload) to expose per-project `builderConfigured` / `agentConfigured`
+- [ ] 2.4 Remove `apps/api/src/routes/test-agents.ts` and its app mount
+
+## 3. Single project agent (playground & test harness)
+
+- [ ] 3.1 Rework `createPlaygroundAgent` to take `projectId` and read `Project.agentLlm` (config, system prompt, all-models scope); 400 path when unconfigured
+- [ ] 3.2 Update playground routes (`chat`, `cancel`, `subscribe`, conversation list) to drop `testAgentId`; persist conversations with `playground: true`
+- [ ] 3.3 Update `apps/worker/src/processor.ts` branching (playground flag instead of `testAgentId`) and `test-runner.ts` to use the project agent; snapshot `llmModel` onto new runs
+- [ ] 3.4 Test-runs API: reject `POST /` with 400 when `agentLlm` is unconfigured; list/detail payloads expose `llmModel` (legacy agent name fallback); update tests
+- [ ] 3.5 Builder agent tools: remove `list_test_agents`, drop `testAgentId` from `create_test_case`, drop `testAgent` from `list_test_cases` output; update `semantic-model-agent.md` prompt accordingly
+
+## 4. Agent scaffold
+
+- [ ] 4.1 `.mcp.json` seeding service: create on project creation, recreate-if-missing on builder start, update on slug change, preserve foreign entries; placeholder token only; unit tests
+- [ ] 4.2 JSON syntax validation on `write_file`/`edit_file` for `.json` paths in the builder filesystem backend (mirroring YAML validation); tests
+- [ ] 4.3 Extend the builder system prompt with the scaffold layout and skills-over-commands guidance
+- [ ] 4.4 `GET /api/projects/:projectId/scaffold/export` zip endpoint with exclusion denylist (`.git/`, `large_tool_results/`, `uploads/`, `duckdb.db*`, temp files); integration test asserting exclusions and absence of secret material
+- [ ] 4.5 Ensure scaffold directories are covered by Git versioning (review `git.ts` ignore rules) and publish flow
+
+## 5. Navigation & settings UI
+
+- [ ] 5.1 Restructure `app-sidebar.tsx`: Connections group (Data Sources + disabled APIs w/ "soon" tag), Builder leaf, Agent leaf, Testing group (Cases, Runs), Settings group (General, Builder, Agent)
+- [ ] 5.2 New routes `settings/builder.tsx` and `settings/agent.tsx` (inline label–input grids, masked key fields, env-default placeholders for builder, save-then-test connection buttons, reset-to-defaults for builder); move existing settings page to General
+- [ ] 5.3 Route moves & redirects: `testing/playground → /agent`, `testing/agents → /settings/agent`, `connections/data` + legacy `/data` → `/connections?tool=browser`, `connections/console → /connections?tool=console`; delete the Test Agents page and `testing/agents.tsx`
+
+## 6. Data Sources header tools
+
+- [ ] 6.1 Header controls on `connections/index.tsx`: Browser (icon+text), Console (icon-only), Re-initialize schemas (icon-only with tooltip); `tool` search param handling
+- [ ] 6.2 Full-width/full-height overlay dialogs (shadow, `bg-popover`) hosting the existing data-browser and console components; deep-link open on `?tool=`, param cleared on close
+
+## 7. Builder page restructure
+
+- [ ] 7.1 Side panel sections: Agent Scaffold (Data Models list + Publish + Export action; disabled API Models w/ "soon" tag), Build (renamed Chat), Improvements & Testing
+- [ ] 7.2 `GET /api/projects/:projectId/test-cases/latest-results` endpoint with tests
+- [ ] 7.3 Improvements & Testing panel: failing-test entries (icon, link to latest run, refine prefill affordance), combined pending-count badge, updated empty state
+- [ ] 7.4 Agent page (`agent.tsx`): playground chat without agent selector, history panel, unconfigured empty state linking to Settings → Agent
+- [ ] 7.5 Update the configuration-missing banner to be project-aware and link to the settings pages
+
+## 8. Testing pages cleanup
+
+- [ ] 8.1 Test Cases page: remove agent column/filter/selector; gate Run Test / Run Batch on `agentConfigured` with pointer to Settings → Agent
+- [ ] 8.2 Test Runs list/detail: show model snapshot (legacy agent name fallback); verify Refine flow against renamed Build chat
+- [ ] 8.3 Dashboard: rename model card to "Data Models"
+
+## 9. Verification
+
+- [ ] 9.1 Update/extend unit & integration tests across touched routes and services; `npx vitest run` green
+- [ ] 9.2 Update e2e tests for new navigation, dialogs, settings pages, and single-agent flows
+- [ ] 9.3 `pnpm typecheck` and `pnpm lint` exit 0; `pnpm --filter @archmax/api build` passes
+
+## 10. Documentation
+
+- [ ] 10.1 Update `apps/docs`: navigation/screens in getting-started and guides, testing guide (single agent, Settings → Agent), configuration reference (per-project LLM settings, env fallback), data-federation guide (console/browser as dialogs)
+- [ ] 10.2 New docs guide: "Agent Scaffold" (layout, `.mcp.json`, export, harness usage with LangChain Deep Agents)
+- [ ] 10.3 Update `.env.example` comments to mention per-project overrides

From 1114814f3a4429b9a9bbfe1429225d5008fe9f91 Mon Sep 17 00:00:00 2001
From: Tobias Grosse-Puppendahl <tobias@grosse-puppendahl.de>
Date: Thu, 11 Jun 2026 10:24:28 +0200
Subject: [PATCH 2/5] docs(openspec): store data models under data_models/
 directory

Rename the on-disk semantic-model storage from src/ to data_models/ to
match the new "Data Models" product vocabulary and reserve room for a
future api_models/ sibling. Updates the agent-scaffold layout, the
SemanticModelFileService/Dataset path requirements, the Deep Agent
backend write path, the startup migration (src/ or root -> data_models/),
and design/proposal/tasks accordingly.

Co-authored-by: Cursor <cursoragent@cursor.com>
---
 .../restructure-agent-platform/design.md      |  15 +-
 .../restructure-agent-platform/proposal.md    |   6 +-
 .../specs/agent-scaffold/spec.md              |  16 +-
 .../specs/semantic-model-agent/spec.md        |  25 +++
 .../specs/semantic-models/spec.md             | 144 ++++++++++++++++++
 .../restructure-agent-platform/tasks.md       |   1 +
 6 files changed, 192 insertions(+), 15 deletions(-)

diff --git a/openspec/changes/restructure-agent-platform/design.md b/openspec/changes/restructure-agent-platform/design.md
index 17494ee..5ce2a28 100644
--- a/openspec/changes/restructure-agent-platform/design.md
+++ b/openspec/changes/restructure-agent-platform/design.md
@@ -36,11 +36,13 @@ Two optional subdocuments: `builderLlm { baseUrl?, encryptedApiKey?, model? }` a
 
 The builder keeps working out of the box via `AGENT_*` env vars; project values override field-by-field. The **agent** has no env fallback: it is the project's deliverable and its credentials are an explicit choice. Playground input and run-creation are blocked with a pointer to Settings → Agent until configured. This also makes the migration story honest (see D6).
 
-### D3 — Scaffold lives at the project-directory root
+### D3 — Scaffold lives at the project-directory root; data models move to `data_models/`
 
-The existing project dir (`<ARCHMAX_DATA_DIR>/projects/<projectId>/`) *is* the agent filesystem: model YAMLs and `AGENTS.md` already live there; `commands/`, `agents/`, `skills/`, `hooks/`, `scripts/`, `.mcp.json` join them. The Deep Agents `FilesystemBackend` already roots there, so the builder can author scaffold files with no new tooling. Export excludes internal entries (`.git/`, `large_tool_results/`, `uploads/`, `duckdb.db*`, temp files).
+The existing project dir (`<ARCHMAX_DATA_DIR>/projects/<projectId>/`) *is* the agent filesystem and remains the `FilesystemBackend` root, so the builder can author scaffold files with no new tooling. The scaffold entries (`commands/`, `agents/`, `skills/`, `hooks/`, `scripts/`, `.mcp.json`) and `AGENTS.md` live at the root. The semantic model YAML files move out of the current `src/` directory into a dedicated **`data_models/`** subdirectory, matching the new "Data Models" product vocabulary and reserving room for a future `api_models/` sibling. Export excludes internal entries (`.git/`, `large_tool_results/`, `uploads/`, `duckdb.db*`, temp files) and includes `data_models/` plus the scaffold dirs.
 
-- *Alternative considered:* a `scaffold/` subdirectory — rejected; it splits the agent filesystem in two ( YAML models would sit outside the scaffold) and complicates the Git story which already versions the whole project dir.
+- *Why rename `src/` → `data_models/`:* "src" is opaque and conflicts with the scaffold's plugin vocabulary; "Data Models" is the user-facing label in both the sidebar (Agent Scaffold → Data Models) and the dashboard card, so the on-disk directory should match. A sibling `api_models/` slot can later hold the "API Models (soon)" content without overloading `src/`.
+- *Alternative considered:* keeping `src/` on disk and only relabeling it "Data Models" in the UI — rejected; the path leaks into the agent prompt, exports, and docs, so a mismatch between the on-disk name and the product term is a lasting source of confusion.
+- *Alternative considered:* a single `scaffold/` subdirectory holding everything — rejected; it splits the agent filesystem from the project root and complicates the Git story which already versions the whole project dir.
 
 ### D4 — `.mcp.json` is platform-seeded with a token placeholder
 
@@ -69,9 +71,10 @@ A test case is *failing* when the most recent `TestRun` embedded result for it h
 ## Migration Plan
 
 1. Ship schema migration `00X-drop-test-agents`: soft-delete all `TestAgent` docs, `$unset` `TestCase.testAgent`, leave `TestRun` documents untouched.
-2. Seed `.mcp.json` for existing projects lazily (on first builder-agent start or settings save) and for new projects on creation.
-3. Frontend redirects: `/testing/playground → /agent`, `/testing/agents → /settings/agent`, `/connections/data → /connections?tool=browser`, `/connections/console → /connections?tool=console`, `/data → /connections?tool=browser`.
-4. Rollback: the migration is destructive for TestAgents by user decision; rollback restores routes/UI only.
+2. Ship filesystem migration `migrate-data-models-layout.ts` (replaces `migrate-src-layout.ts`): on startup, for any project dir lacking `data_models/`, move model YAMLs into `data_models/` from the legacy `src/` directory (or, for very old projects, from the project root). `uploads/` and scaffold dirs are left in place. Idempotent and safe to re-run.
+3. Seed `.mcp.json` for existing projects lazily (on first builder-agent start or settings save) and for new projects on creation.
+4. Frontend redirects: `/testing/playground → /agent`, `/testing/agents → /settings/agent`, `/connections/data → /connections?tool=browser`, `/connections/console → /connections?tool=console`, `/data → /connections?tool=browser`.
+5. Rollback: the TestAgent migration is destructive by user decision; the `data_models/` move is reversible by moving files back to `src/`. Rollback otherwise restores routes/UI only.
 
 ## Open Questions
 
diff --git a/openspec/changes/restructure-agent-platform/proposal.md b/openspec/changes/restructure-agent-platform/proposal.md
index 1de437e..3f34142 100644
--- a/openspec/changes/restructure-agent-platform/proposal.md
+++ b/openspec/changes/restructure-agent-platform/proposal.md
@@ -36,7 +36,8 @@ archmax is repositioning from "a tool that manages semantic descriptions of data
 
 ### Agent scaffold (new capability)
 
-- The project directory is formalized as a plugin-style **agent filesystem** authored directly by the builder agent (no generation pipeline): `commands/`, `agents/`, `skills/<name>/SKILL.md`, `hooks/hooks.json`, `scripts/`, and `.mcp.json`, alongside the existing data-model YAML files and `AGENTS.md`.
+- The project directory is formalized as a plugin-style **agent filesystem** authored directly by the builder agent (no generation pipeline): `commands/`, `agents/`, `skills/<name>/SKILL.md`, `hooks/hooks.json`, `scripts/`, and `.mcp.json`, alongside `AGENTS.md` and a dedicated **`data_models/`** directory for the semantic model YAML files.
+- Semantic model files move from the current `src/` directory into `data_models/` (file service, agent backend, and publish assembly updated; a startup migration relocates existing `src/` content). This matches the "Data Models" product label and reserves a future `api_models/` sibling.
 - `.mcp.json` is seeded and maintained by the platform, pointing at the project's MCP endpoint with an env-var token placeholder (never a real token).
 - The builder's file backend gains JSON syntax validation on write (mirroring the existing YAML validation).
 - A scaffold export endpoint (`GET /api/projects/:projectId/scaffold/export`) downloads the scaffold as a zip for use in external Deep-Agents-compatible harnesses; an Export action is available in the Agent Scaffold panel. The existing LangChain Deep Agents playground/test-runner remains the built-in test harness.
@@ -53,7 +54,8 @@ archmax is repositioning from "a tool that manages semantic descriptions of data
   - `apps/frontend/src/routes/_auth/$projectId/` — `connections/*`, `models.tsx`, `testing/*`, new `agent.tsx`, `settings*` (route moves, overlay dialogs, panel restructure)
   - `packages/core/src/models/` — remove `TestAgent.ts`, edit `Project.ts`, `TestCase.ts`, `TestRun.ts`, `Conversation.ts`
   - `apps/api/src/routes/` — remove `test-agents.ts`; add `llm-settings.ts`, `scaffold.ts`; edit `test-cases.ts`, `test-runs.ts`, `playground.ts`, `config`
-  - `packages/core/src/services/` — `agent.ts`, `playground-agent.ts`, `test-runner.ts`, `agent-tools.ts`, `git.ts` (scaffold ignore rules), filesystem backend validation
+  - `packages/core/src/services/` — `agent.ts`, `playground-agent.ts`, `test-runner.ts`, `agent-tools.ts`, `git.ts` (scaffold ignore rules), `SemanticModelFileService` (`src/` → `data_models/`), filesystem backend validation
+  - `apps/api/src/scripts/` — replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (`src/` → `data_models/`)
   - `apps/worker/src/processor.ts` (playground branching without testAgentId)
   - Schema migration (drop TestAgents, unset `TestCase.testAgent`)
   - `apps/docs` (navigation, testing, settings, new agent-scaffold guide)
diff --git a/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md b/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md
index c67e20b..b58be50 100644
--- a/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md
@@ -2,11 +2,13 @@
 
 ### Requirement: Agent Scaffold Filesystem Layout
 
-Each project's data directory (`<ARCHMAX_DATA_DIR>/projects/<projectId>/`) SHALL constitute the project's **agent scaffold**: a plugin-style filesystem intended for consumption by an agent harness. In addition to the existing data-model YAML files and the optional `AGENTS.md`, the scaffold SHALL support the following conventional entries:
+Each project's data directory (`<ARCHMAX_DATA_DIR>/projects/<projectId>/`) SHALL constitute the project's **agent scaffold**: a plugin-style filesystem intended for consumption by an agent harness. The semantic model YAML files SHALL live under a dedicated `data_models/` subdirectory (see the `semantic-models` capability for the exact file layout). Alongside `data_models/` and the optional `AGENTS.md`, the scaffold SHALL support the following conventional entries:
 
 ```
 <project-dir>/
-├── *.yaml               # data models (existing semantic model files)
+├── data_models/         # data models (semantic model YAML files)
+│   ├── <model>.yaml
+│   └── <model>/<dataset>.yaml
 ├── AGENTS.md            # agent instructions / memory (existing)
 ├── commands/            # slash commands (.md) — legacy, prefer skills/
 ├── agents/              # subagent definitions (.md)
@@ -19,7 +21,7 @@ Each project's data directory (`<ARCHMAX_DATA_DIR>/projects/<projectId>/`) SHALL
 └── scripts/             # helper scripts
 ```
 
-Scaffold files SHALL be authored **directly by the builder agent** through its existing Deep Agents filesystem tools (no separate generation pipeline). The builder's system prompt SHALL document the scaffold layout and conventions, including that `skills/` is preferred over `commands/` for new capabilities. Scaffold directories and files SHALL be included in the project's Git versioning (they are source, not build output).
+Scaffold files SHALL be authored **directly by the builder agent** through its existing Deep Agents filesystem tools (no separate generation pipeline). The builder's system prompt SHALL document the scaffold layout and conventions, including that semantic models live under `data_models/` and that `skills/` is preferred over `commands/` for new capabilities. Scaffold directories and files SHALL be included in the project's Git versioning (they are source, not build output).
 
 #### Scenario: Builder authors a skill
 
@@ -29,14 +31,14 @@ Scaffold files SHALL be authored **directly by the builder agent** through its e
 
 #### Scenario: Scaffold coexists with data models
 
-- **WHEN** a project contains scaffold directories alongside `*.yaml` model files
-- **THEN** semantic-model listing and MCP tools continue to operate on the YAML files unchanged
+- **WHEN** a project contains scaffold directories (`skills/`, `agents/`, `hooks/`) alongside the `data_models/` directory
+- **THEN** semantic-model listing and MCP tools continue to operate on the YAML files under `data_models/` unchanged
 - **AND** the scaffold entries do not interfere with model parsing
 
 #### Scenario: System prompt documents the layout
 
 - **WHEN** the builder agent's system prompt is composed
-- **THEN** it describes the scaffold layout (`commands/`, `agents/`, `skills/<name>/SKILL.md`, `hooks/hooks.json`, `.mcp.json`, `scripts/`) and the skills-over-commands preference
+- **THEN** it describes the scaffold layout (`data_models/`, `commands/`, `agents/`, `skills/<name>/SKILL.md`, `hooks/hooks.json`, `.mcp.json`, `scripts/`), that semantic models live under `data_models/`, and the skills-over-commands preference
 
 ### Requirement: Seeded MCP Server Definition
 
@@ -82,7 +84,7 @@ The API SHALL expose an authenticated `GET /api/projects/:projectId/scaffold/exp
 
 - **WHEN** an authenticated GET request is made to `/api/projects/:projectId/scaffold/export` for a project with models, `AGENTS.md`, and a skill
 - **THEN** the response is a zip download named `<slug>-scaffold.zip`
-- **AND** it contains the YAML models, `AGENTS.md`, `skills/`, and `.mcp.json`
+- **AND** it contains the `data_models/` directory, `AGENTS.md`, `skills/`, and `.mcp.json`
 - **AND** it contains no `.git/`, `large_tool_results/`, `uploads/`, or DuckDB files
 
 #### Scenario: No secrets in the export
diff --git a/openspec/changes/restructure-agent-platform/specs/semantic-model-agent/spec.md b/openspec/changes/restructure-agent-platform/specs/semantic-model-agent/spec.md
index 4927e6e..023bc64 100644
--- a/openspec/changes/restructure-agent-platform/specs/semantic-model-agent/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/semantic-model-agent/spec.md
@@ -138,6 +138,31 @@ The Builder side panel's **Data Models** entry (nested under the **Agent Scaffol
 - **THEN** each model is displayed as a single row with the model name and a database icon
 - **AND** no expand/collapse chevron or subtree is shown
 
+### Requirement: Deep Agent Backend
+
+The API SHALL expose a streaming endpoint for the semantic model agent. The agent uses LangChain Deep Agents with `FilesystemBackend({ rootDir: "<ARCHMAX_DATA_DIR>/projects/<projectId>", virtualMode: true })`, giving it sandboxed filesystem access to the project's agent scaffold (the `data_models/` directory plus scaffold entries such as `skills/`, `agents/`, `hooks/`, `.mcp.json`, and `scripts/`). Semantic model YAML files SHALL be authored under the `data_models/` subdirectory. The agent system prompt SHALL document the OSI-compliant YAML schema including: snake_case keys (`ai_context`, `primary_key`, `unique_keys`, `from_columns`, `to_columns`), the OSI Expression object format (`{ dialects: [{ dialect: ANSI_SQL, expression: "..." }] }`), `custom_extensions` for project-specific field metadata (`data_type`, `example_data`, `distinct_values` under `vendor_name: COMMON`), and the `dimension` property with `is_time` for temporal fields. The system prompt SHALL instruct the agent that semantic model files live under `data_models/`. The agent SHALL also have access to a `read_document` tool that reads uploaded documents from the project's `uploads/` directory and returns their content as markdown, enabling the agent to reference data dictionaries, ERDs, business glossaries, and other supplementary documentation when building semantic models.
+
+#### Scenario: Agent lists semantic models
+- **WHEN** the user asks "What semantic models exist?"
+- **THEN** the agent uses the `ls` filesystem tool to list YAML files under the `data_models/` directory
+- **AND** returns a summary to the user
+
+#### Scenario: Agent creates a new semantic model
+- **WHEN** the user asks "Create a model for the orders schema"
+- **THEN** the agent uses `write_file` to create a new YAML file conforming to the OSI schema with snake_case keys and Expression objects
+- **AND** the file is written to `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/<model-name>.yaml`
+
+#### Scenario: Agent writes fields with extensions
+- **WHEN** the agent creates a dataset with fields that have data types and example data
+- **THEN** the field's `data_type`, `example_data`, and `distinct_values` are placed in `custom_extensions` with `vendor_name: COMMON`
+- **AND** timestamp/date fields include `dimension: { is_time: true }`
+
+#### Scenario: Agent reads an uploaded document
+- **WHEN** the user says "Use the data dictionary PDF to create the model"
+- **THEN** the agent invokes `read_document` with the PDF filename
+- **AND** receives the document content as markdown
+- **AND** uses the extracted information to inform semantic model creation
+
 ## ADDED Requirements
 
 ### Requirement: Builder Side Panel Structure
diff --git a/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md b/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
index 2574dcb..50edbab 100644
--- a/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
@@ -3,6 +3,9 @@
 - FROM: `### Requirement: Improvements UI in Semantic Models Sidebar`
 - TO: `### Requirement: Improvements & Testing Panel`
 
+- FROM: `### Requirement: Source Directory Layout Migration`
+- TO: `### Requirement: Data Models Directory Layout Migration`
+
 ## MODIFIED Requirements
 
 ### Requirement: Improvements & Testing Panel
@@ -50,3 +53,144 @@ The section header SHALL display a pending-count badge equal to the number of pe
 - **WHEN** the user hovers over an improvement row and clicks the trash icon
 - **THEN** the improvement is soft-deleted via the API and removed from the list
 - **AND** if the deleted improvement was the active detail view, the user is navigated away
+
+### Requirement: Semantic Model YAML Structure
+
+A semantic model SHALL be stored as a YAML root file at `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/<modelName>.yaml` containing: `name` (string, required), `description` (string), `ai_context` (string or object with optional `instructions`, `synonyms`, `examples`), `relationships` (array), `metrics` (array), and `custom_extensions` (optional array of `{ vendor_name, data }` objects). Datasets SHALL NOT be stored in the root file when a per-dataset directory exists.
+
+#### Scenario: Root file contains model-level data
+
+- **WHEN** a semantic model is written to disk
+- **THEN** the root YAML file is stored at `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/<modelName>.yaml`
+- **AND** the file contains name, description, ai_context, relationships, metrics, and custom_extensions
+- **AND** datasets are stored as individual files in a `data_models/<modelName>/` subdirectory
+
+#### Scenario: AI context as structured object
+
+- **WHEN** a model is saved with `ai_context: { instructions: "Use for retail analytics", synonyms: ["sales model"] }`
+- **THEN** the AI context is persisted in the YAML file as a structured object
+
+### Requirement: Dataset Files
+
+Each dataset SHALL be stored as a separate YAML file at `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/<modelName>/<datasetName>.yaml` containing: `name` (string, required), `source` (string, e.g. `<connection>.<schema>.<table>`), `primary_key` (string array), `unique_keys` (array of string arrays), `description`, `ai_context`, `fields` (array of inline field objects), and `custom_extensions` (optional array of `{ vendor_name, data }` objects).
+
+#### Scenario: Dataset with composite primary key
+
+- **WHEN** a dataset file is written with `primary_key: ["item_sk", "ticket_number"]`
+- **THEN** the composite primary key is stored in the dataset YAML at `data_models/<modelName>/<datasetName>.yaml`
+
+#### Scenario: Dataset source reference
+
+- **WHEN** a dataset is saved with `source: "tpcds.public.store_sales"`
+- **THEN** the fully-qualified `<connection>.<schema>.<table>` reference is stored under `data_models/`
+
+#### Scenario: Dataset with custom extensions
+
+- **WHEN** a dataset is saved with `custom_extensions: [{ vendor_name: COMMON, data: '{"graph_x": 100}' }]`
+- **THEN** the custom extensions are stored alongside the other dataset properties in the YAML file under `data_models/`
+
+### Requirement: SemanticModelFileService
+
+The system SHALL provide a `SemanticModelFileService` class that manages all YAML file I/O for semantic models. Source files live under `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/`. It SHALL expose: `list(projectId)` — read all models in a project, `get(projectId, name)` — assemble a full model from root + dataset files, `getDataset(projectId, modelName, datasetName)` — read a single dataset file, `write(projectId, model)` — split and write root + dataset files with atomic writes (temp file + rename), `delete(projectId, name)` — remove root file and dataset directory, `exists(projectId, name)`. The service SHALL check for the `data_models/` subdirectory first and fall back to the legacy `src/` subdirectory and then the legacy root-level layout for backward compatibility during migration.
+
+#### Scenario: List models reads YAML files from data_models directory
+
+- **WHEN** `list("proj1")` is called
+- **THEN** all `.yaml` files in `<ARCHMAX_DATA_DIR>/projects/proj1/data_models/` are read, parsed, and returned as assembled models
+
+#### Scenario: Get assembles from split files
+
+- **WHEN** `get("proj1", "sales")` is called and a `data_models/sales/` subdirectory exists
+- **THEN** the root file `data_models/sales.yaml` is read for model-level data
+- **AND** each `.yaml` in `data_models/sales/` is read as a dataset
+- **AND** the full assembled model is returned
+
+#### Scenario: Get falls back to single-file format
+
+- **WHEN** `get("proj1", "legacy")` is called and no `data_models/legacy/` subdirectory exists
+- **THEN** the root file `data_models/legacy.yaml` is parsed as a complete model including inline datasets
+
+#### Scenario: Legacy layout fallback
+
+- **WHEN** `list("proj1")` is called and `<ARCHMAX_DATA_DIR>/projects/proj1/data_models/` does not exist
+- **AND** YAML files exist under a legacy `src/` subdirectory or directly under `<ARCHMAX_DATA_DIR>/projects/proj1/`
+- **THEN** the service reads from the legacy location
+
+#### Scenario: Write splits model into files under data_models
+
+- **WHEN** `write("proj1", model)` is called
+- **THEN** the root file is written to `data_models/` without datasets
+- **AND** each dataset is written as a separate file in `data_models/<modelName>/`
+- **AND** stale dataset files no longer in the model are deleted
+
+#### Scenario: Delete removes root and dataset directory
+
+- **WHEN** `delete("proj1", "sales")` is called
+- **THEN** `data_models/sales.yaml` is deleted
+- **AND** the `data_models/sales/` directory is recursively removed
+
+### Requirement: Data Models Directory Layout Migration
+
+The system SHALL provide a migration script at `apps/api/src/scripts/migrate-data-models-layout.ts` that moves semantic model files into the `data_models/` subdirectory (`<projectId>/data_models/<model>.yaml`) from either the legacy `src/` subdirectory (`<projectId>/src/<model>.yaml`) or the legacy root-level layout (`<projectId>/<model>.yaml`). The migration SHALL preserve the `uploads/` directory and any agent-scaffold directories (`commands/`, `agents/`, `skills/`, `hooks/`, `scripts/`) and `.mcp.json` if they exist. The migration SHALL run automatically on app startup for any project directory that lacks a `data_models/` subdirectory but contains YAML model files under `src/` or at the root level.
+
+#### Scenario: Migration moves files from src to data_models
+
+- **WHEN** the migration detects a `<ARCHMAX_DATA_DIR>/projects/<projectId>/src/` directory with model files
+- **AND** no `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/` directory exists
+- **THEN** `src/model.yaml` is moved to `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/model.yaml`
+- **AND** the `src/model/` dataset directory (if present) is moved to `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/model/`
+- **AND** `AGENTS.md` remains at `<ARCHMAX_DATA_DIR>/projects/<projectId>/AGENTS.md` (project root)
+
+#### Scenario: Migration moves files from legacy root layout
+
+- **WHEN** the migration detects YAML files at `<ARCHMAX_DATA_DIR>/projects/<projectId>/model.yaml`
+- **AND** neither a `data_models/` nor a `src/` subdirectory exists
+- **THEN** `model.yaml` is moved to `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/model.yaml`
+- **AND** the `model/` dataset directory (if present) is moved to `<ARCHMAX_DATA_DIR>/projects/<projectId>/data_models/model/`
+
+#### Scenario: Migration preserves uploads and scaffold directories
+
+- **WHEN** the migration runs on a project with an existing `uploads/` directory and scaffold directories (`skills/`, `hooks/`)
+- **THEN** the `uploads/` directory remains at `<ARCHMAX_DATA_DIR>/projects/<projectId>/uploads/` (not moved)
+- **AND** the scaffold directories remain at the project root (not moved)
+
+#### Scenario: Migration is idempotent
+
+- **WHEN** the migration runs on a project that already has a `data_models/` subdirectory
+- **THEN** no files are moved and the project layout is unchanged
+
+### Requirement: Git Directory Exclusion
+
+All file listing operations SHALL exclude entries whose names start with `.` (dotfiles and dotdirs). This applies to: `SemanticModelFileService.list()` and `get()` directory traversals, `DocumentFileService.list()`, the agent filesystem `listFiles` operation, and the publish `collectFiles` helper. Specifically, the `.git/` directory and its contents SHALL never appear in model listings, document listings, agent file operations, or published content.
+
+#### Scenario: Model listing excludes .git
+
+- **WHEN** a project directory contains `data_models/sales.yaml`, `data_models/.git/`, and `data_models/.hidden.yaml`
+- **THEN** `SemanticModelFileService.list()` returns only the `sales` model
+- **AND** `.git` directory contents are not traversed
+
+#### Scenario: Agent filesystem excludes dotfiles
+
+- **WHEN** the agent lists files in the project directory
+- **THEN** `.git/` and other dotfiles/dotdirs are not included in the listing
+
+#### Scenario: Publish assembly excludes dotfiles
+
+- **WHEN** the publish build assembly processes the project directory
+- **THEN** `.git/` contents are not included in the build output
+
+### Requirement: Merge Conflict Detection in YAML
+
+The `SemanticModelFileService` SHALL detect Git merge conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) in YAML files. When listing models, files with conflict markers SHALL still appear in the model list with a `hasConflicts: true` flag, but their parsed content SHALL be marked as invalid. The `get()` method SHALL return the raw file content alongside the conflict flag so the frontend can display it.
+
+#### Scenario: List models with a conflicted file
+
+- **WHEN** `list("proj1")` is called and `data_models/sales.yaml` contains Git conflict markers
+- **THEN** the model `sales` appears in the list with `hasConflicts: true`
+- **AND** other valid models are returned normally
+
+#### Scenario: Get a conflicted model
+
+- **WHEN** `get("proj1", "sales")` is called and the file contains conflict markers
+- **THEN** the response includes `hasConflicts: true` and the raw YAML content
+- **AND** parsed fields (datasets, relationships, metrics) are empty or absent
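
Illustrative sketch for the "Git Directory Exclusion" and "Merge Conflict Detection in YAML" requirements above; it is not part of the spec. It assumes the service has already read each entry in `data_models/` into memory. The names `RawModelFile`, `isHiddenEntry`, `hasConflictMarkers`, and `listModelEntries` are hypothetical, and the real `SemanticModelFileService` may be structured differently.

```typescript
// Hypothetical helpers; names and shapes are illustrative, not taken from the codebase.

interface RawModelFile {
  fileName: string; // entry name inside data_models/, e.g. "sales.yaml"
  content: string; // raw file text
}

interface ModelListEntry {
  name: string;
  hasConflicts: boolean;
}

/** True for dotfiles and dotdirs such as ".git" or ".hidden.yaml". */
function isHiddenEntry(name: string): boolean {
  return name.charAt(0) === ".";
}

/** True when any line starts with a Git conflict marker. */
function hasConflictMarkers(content: string): boolean {
  const marker = /^(<{7}( |$)|={7}$|>{7}( |$))/;
  return content.split(/\r?\n/).some((line) => marker.test(line));
}

/** Lists visible YAML models and flags conflicted ones instead of dropping them. */
function listModelEntries(files: RawModelFile[]): ModelListEntry[] {
  return files
    .filter((f) => !isHiddenEntry(f.fileName) && /\.yaml$/.test(f.fileName))
    .map((f) => ({
      name: f.fileName.replace(/\.yaml$/, ""),
      hasConflicts: hasConflictMarkers(f.content),
    }));
}
```

Matching markers only at the start of a line is a deliberate choice in this sketch: it keeps a description that merely mentions `=======` mid-line from being flagged.
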
diff --git a/openspec/changes/restructure-agent-platform/tasks.md b/openspec/changes/restructure-agent-platform/tasks.md
index 8c5f433..2f6d7f5 100644
--- a/openspec/changes/restructure-agent-platform/tasks.md
+++ b/openspec/changes/restructure-agent-platform/tasks.md
@@ -26,6 +26,7 @@
 
 ## 4. Agent scaffold
 
+- [ ] 4.0 Rename model storage `src/` → `data_models/`: update `SemanticModelFileService` (path constant + legacy `src/`/root fallbacks), the agent `FilesystemBackend` prompt guidance, and publish assembly; replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (idempotent startup move from `src/` or root, preserving `uploads/` and scaffold dirs); update/extend tests
 - [ ] 4.1 `.mcp.json` seeding service: create on project creation, recreate-if-missing on builder start, update on slug change, preserve foreign entries; placeholder token only; unit tests
 - [ ] 4.2 JSON syntax validation on `write_file`/`edit_file` for `.json` paths in the builder filesystem backend (mirroring YAML validation); tests
 - [ ] 4.3 Extend the builder system prompt with the scaffold layout and skills-over-commands guidance
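
Illustrative sketch for task 4.0; it is not part of the task list. It shows one way the startup move could be planned as a pure function before any file is touched, assuming the caller passes the entry names of the project root and of `src/`. `MovePlan`, `KEEP_AT_ROOT`, and `planDataModelsMigration` are hypothetical names; the actual `migrate-data-models-layout.ts` may work differently.

```typescript
// Hypothetical planning helper; names and shapes are illustrative assumptions.

interface MovePlan {
  from: string; // path relative to the project directory
  to: string; // path relative to the project directory
}

// Entries that stay at the project root (per the migration requirement).
const KEEP_AT_ROOT = ["uploads", "commands", "agents", "skills", "hooks", "scripts"];

/** Plans the moves into data_models/; returns [] when nothing needs to move. */
function planDataModelsMigration(rootEntries: string[], srcEntries: string[]): MovePlan[] {
  // Idempotent: a project that already has data_models/ is left unchanged.
  if (rootEntries.indexOf("data_models") !== -1) return [];

  // Prefer the legacy src/ layout; otherwise fall back to root-level files.
  const fromSrc = rootEntries.indexOf("src") !== -1;
  const entries = fromSrc ? srcEntries : rootEntries;
  const prefix = fromSrc ? "src/" : "";

  const plan: MovePlan[] = [];
  for (const entry of entries) {
    if (entry.charAt(0) === "." || !/\.yaml$/.test(entry)) continue;
    plan.push({ from: prefix + entry, to: "data_models/" + entry });

    // Move the sibling dataset directory when present, but never a scaffold dir.
    const datasetDir = entry.replace(/\.yaml$/, "");
    const preserved = !fromSrc && KEEP_AT_ROOT.indexOf(datasetDir) !== -1;
    if (entries.indexOf(datasetDir) !== -1 && !preserved) {
      plan.push({ from: prefix + datasetDir, to: "data_models/" + datasetDir });
    }
  }
  return plan;
}
```

Separating planning from execution would let the idempotency and preservation rules be unit-tested without a filesystem.
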

From d400443a29ebb15d669558b7084bc353b5e9c91a Mon Sep 17 00:00:00 2001
From: Tobias Grosse-Puppendahl <tobias@grosse-puppendahl.de>
Date: Thu, 11 Jun 2026 10:34:47 +0200
Subject: [PATCH 3/5] docs(openspec): address Bugbot findings on platform
 restructure

- Partition Builder/Agent conversation histories by the playground flag so
  playground chats can't leak into Build history (replaces testAgent:null).
- latest-results returns inputMessage + unmetFacts so the refine prefill is
  buildable from the mandated data source alone.
- Move per-project builder/agentConfigured off the global unauthenticated
  /api/config onto project-scoped authenticated surfaces.
- Backfill TestRun.testAgentName before soft-deleting TestAgents so legacy
  run lists keep names without populating deleted refs.
- LLM settings GETs return non-secret metadata only (apiKeySet/apiKeySource);
  never return key material, masked or otherwise, in responses or placeholders.
- .mcp.json writes reject literal credentials (env-var placeholders only).
- Scaffold export excludes secret files and fails closed on a content scan.

Co-authored-by: Cursor <cursoragent@cursor.com>
---
 .../restructure-agent-platform/design.md      |  8 +++--
 .../restructure-agent-platform/proposal.md    |  4 +--
 .../specs/agent-scaffold/spec.md              | 26 +++++++++++++---
 .../specs/project-management/spec.md          | 20 ++++++++----
 .../specs/semantic-models/spec.md             |  4 +--
 .../specs/testing-suite/spec.md               | 31 +++++++++++++------
 .../restructure-agent-platform/tasks.md       | 22 ++++++-------
 7 files changed, 77 insertions(+), 38 deletions(-)

diff --git a/openspec/changes/restructure-agent-platform/design.md b/openspec/changes/restructure-agent-platform/design.md
index 5ce2a28..647f3ac 100644
--- a/openspec/changes/restructure-agent-platform/design.md
+++ b/openspec/changes/restructure-agent-platform/design.md
@@ -54,7 +54,9 @@ The sidebar loses both entries; the Data Sources header hosts Browser (icon+text
 
 ### D6 — Migration drops all TestAgents (user decision: manual reconfiguration)
 
-A schema migration soft-deletes every `TestAgent` document and unsets `TestCase.testAgent`. `TestRun.testAgent` becomes optional; historical runs remain readable (legacy agent name shown when present). New runs snapshot the project agent's `llmModel` per run instead of referencing an agent document. Conversations: playground conversations are identified by a `playground: true` flag going forward; legacy `testAgent` references remain readable.
+A schema migration **first backfills `TestRun.testAgentName`** from each run's `testAgent.name`, then soft-deletes every `TestAgent` document and unsets `TestCase.testAgent`. `TestRun.testAgent` becomes optional and is never populated afterwards (the soft-deleted ref would return nothing); run lists/detail render the `llmModel` snapshot for new runs and the `testAgentName` snapshot for legacy runs, falling back to a neutral "Legacy agent" label. Conversations: playground conversations are identified by a `playground: true` flag going forward; legacy `testAgent` references remain readable. Builder and Agent conversation histories are partitioned by the `playground` flag (Builder filters `playground: { $ne: true }`, replacing the old `testAgent: null` filter, so playground chats cannot leak into Build history).
+
+Per-project agent/builder *configured* state is exposed on project-scoped, authenticated endpoints (the llm-settings GETs and the project payload), **not** on the global unauthenticated `/api/config` route, which has no `projectId` and would be wrong under multiple projects. API key material (including env secrets like `AGENT_API_KEY`) is never returned in any form — masked or otherwise; the UI shows only presence/source via `apiKeySet`/`apiKeySource`.
 
 ### D7 — "Failing tests" = latest result per test case
 
@@ -65,12 +67,12 @@ A test case is *failing* when the most recent `TestRun` embedded result for it h
 - **Large rename surface** (routes, labels, docs, e2e selectors) → mitigated by keeping all backend route prefixes except `test-agents` stable, and adding redirects for moved frontend routes.
 - **Removing TestAgent breaks the builder's `list_test_agents` tool and prompt flow** → tool removed and prompt updated in the same change; `create_test_case` is simplified rather than left referencing dead concepts.
 - **Concurrent edit conflict with `add-llm-prompt-caching`** → no overlapping spec requirements; sequence implementation (rebase whichever lands second).
-- **Export could leak secrets** → export reuses an explicit denylist and `.mcp.json` is placeholder-only by construction; a test asserts no `encryptedApiKey`/token material is present in the bundle.
+- **Export could leak secrets** → export uses a runtime denylist *and* secret-file exclusions (`.env*`, key/credential files) *and* a content scan that fails the export closed on secret patterns; `.mcp.json` is placeholder-only by construction, and `.mcp.json` writes reject literal credential values (only `${VAR}` placeholders allowed). Tests assert the fail-closed path and absence of `encryptedApiKey`/token material.
 - **Blocking playground/test-runs until the agent is configured adds first-run friction** → mitigated by prominent empty states deep-linking to Settings → Agent.
 
 ## Migration Plan
 
-1. Ship schema migration `00X-drop-test-agents`: soft-delete all `TestAgent` docs, `$unset` `TestCase.testAgent`, leave `TestRun` documents untouched.
+1. Ship schema migration `00X-drop-test-agents`: backfill `TestRun.testAgentName` from each run's `testAgent.name`, then soft-delete all `TestAgent` docs and `$unset` `TestCase.testAgent`; otherwise leave `TestRun` documents intact.
 2. Ship filesystem migration `migrate-data-models-layout.ts` (replaces `migrate-src-layout.ts`): on startup, for any project dir lacking `data_models/`, move model YAMLs into `data_models/` from the legacy `src/` directory (or, for very old projects, from the project root). `uploads/` and scaffold dirs are left in place. Idempotent and safe to re-run.
 3. Seed `.mcp.json` for existing projects lazily (on first builder-agent start or settings save) and for new projects on creation.
 4. Frontend redirects: `/testing/playground → /agent`, `/testing/agents → /settings/agent`, `/connections/data → /connections?tool=browser`, `/connections/console → /connections?tool=console`, `/data → /connections?tool=browser`.
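
Illustrative sketch for D6 above; it is not part of the design. It shows the label fallback for run lists and the Builder history partition, assuming runs and conversations are plain objects with the fields named in the text. `runAgentLabel`, `isBuilderConversation`, and `BUILDER_HISTORY_FILTER` are hypothetical names.

```typescript
// Hypothetical shapes; only the field names come from the design text.

interface TestRunSummary {
  llmModel?: string; // snapshot written by new runs
  testAgentName?: string; // backfilled snapshot for legacy runs
}

interface ConversationSummary {
  playground?: boolean;
}

// Query filter the Builder history would use in place of `testAgent: null`.
const BUILDER_HISTORY_FILTER = { playground: { $ne: true } };

/** Picks the label shown for a run: new snapshot, legacy snapshot, or neutral fallback. */
function runAgentLabel(run: TestRunSummary): string {
  return run.llmModel || run.testAgentName || "Legacy agent";
}

/** In-memory equivalent of BUILDER_HISTORY_FILTER: a missing flag counts as a Builder chat. */
function isBuilderConversation(conversation: ConversationSummary): boolean {
  return conversation.playground !== true;
}
```

Treating a missing `playground` flag as "not playground" keeps legacy conversations, which predate the flag, in Build history.
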
diff --git a/openspec/changes/restructure-agent-platform/proposal.md b/openspec/changes/restructure-agent-platform/proposal.md
index 3f34142..9033453 100644
--- a/openspec/changes/restructure-agent-platform/proposal.md
+++ b/openspec/changes/restructure-agent-platform/proposal.md
@@ -53,10 +53,10 @@ archmax is repositioning from "a tool that manages semantic descriptions of data
   - `apps/frontend/src/components/layout/app-sidebar.tsx` (nav restructure)
   - `apps/frontend/src/routes/_auth/$projectId/` — `connections/*`, `models.tsx`, `testing/*`, new `agent.tsx`, `settings*` (route moves, overlay dialogs, panel restructure)
   - `packages/core/src/models/` — remove `TestAgent.ts`, edit `Project.ts`, `TestCase.ts`, `TestRun.ts`, `Conversation.ts`
-  - `apps/api/src/routes/` — remove `test-agents.ts`; add `llm-settings.ts`, `scaffold.ts`; edit `test-cases.ts`, `test-runs.ts`, `playground.ts`, `config`
+  - `apps/api/src/routes/` — remove `test-agents.ts`; add `llm-settings.ts`, `scaffold.ts`; edit `test-cases.ts` (latest-results), `test-runs.ts`, `playground.ts`, `projects.ts` (per-project `builder/agentConfigured`; not the global `config` route)
   - `packages/core/src/services/` — `agent.ts`, `playground-agent.ts`, `test-runner.ts`, `agent-tools.ts`, `git.ts` (scaffold ignore rules), `SemanticModelFileService` (`src/` → `data_models/`), filesystem backend validation
   - `apps/api/src/scripts/` — replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (`src/` → `data_models/`)
   - `apps/worker/src/processor.ts` (playground branching without testAgentId)
-  - Schema migration (drop TestAgents, unset `TestCase.testAgent`)
+  - Schema migration (backfill `TestRun.testAgentName`, then drop TestAgents, unset `TestCase.testAgent`)
   - `apps/docs` (navigation, testing, settings, new agent-scaffold guide)
 - Coordination: the active change `add-llm-prompt-caching` also edits `packages/core/src/services/agent.ts` and `playground-agent.ts`. No spec-requirement overlap, but implementation should be sequenced (caching first or rebase this change on it).
diff --git a/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md b/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md
index b58be50..a79d88a 100644
--- a/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/agent-scaffold/spec.md
@@ -42,7 +42,7 @@ Scaffold files SHALL be authored **directly by the builder agent** through its e
 
 ### Requirement: Seeded MCP Server Definition
 
-The platform SHALL seed and maintain a `.mcp.json` file at the project root containing an `archmax` MCP server entry pointing at the project's MCP endpoint (derived from the configured application base URL and the project slug). The entry SHALL reference the bearer token via an environment-variable placeholder (e.g. `${ARCHMAX_MCP_TOKEN}`); real token values MUST NOT be written to the file. The file SHALL be created on project creation, recreated if missing when the builder agent starts, and updated when the project slug changes. The builder agent MAY extend the file with additional servers; the platform SHALL preserve unknown entries when updating its own.
+The platform SHALL seed and maintain a `.mcp.json` file at the project root containing an `archmax` MCP server entry pointing at the project's MCP endpoint (derived from the configured application base URL and the project slug). The entry SHALL reference the bearer token via an environment-variable placeholder (e.g. `${ARCHMAX_MCP_TOKEN}`); real token values MUST NOT be written to the file. The file SHALL be created on project creation, recreated if missing when the builder agent starts, and updated when the project slug changes. The builder agent MAY extend the file with additional servers; when updating its own `archmax` entry the platform SHALL preserve foreign entries **only if** they contain no literal secret material (see "Credential-Safe JSON Validation"). If a foreign entry contains a literal credential, the platform SHALL refuse to silently re-persist it and SHALL surface a warning identifying the offending entry rather than writing secret material back to the Git-versioned, exportable file.
 
 #### Scenario: New project gets a seeded .mcp.json
 
@@ -61,24 +61,34 @@ The platform SHALL seed and maintain a `.mcp.json` file at the project root cont
 - **WHEN** `.mcp.json` is written or updated by the platform
 - **THEN** the file contains no literal bearer tokens, API keys, or other secret material
 
-### Requirement: JSON Syntax Validation on Write
+### Requirement: Credential-Safe JSON Validation on Write
 
 The builder agent's filesystem backend SHALL validate JSON syntax before persisting any file whose path ends in `.json` (including `.mcp.json` and `hooks/hooks.json`). When the content is not valid JSON, the `write_file` tool MUST return an error describing the syntax issue instead of writing the file. When an `edit_file` operation on a JSON file produces syntactically invalid content, the tool MUST return an error so the agent can self-correct. This mirrors the existing YAML validation for `.yaml`/`.yml` files.
 
+Because `.mcp.json` is Git-versioned and included in scaffold exports, syntax validation alone is insufficient: writes to `.mcp.json` SHALL additionally reject **literal credential values**. Server entries MAY reference secrets only through environment-variable placeholders (`${VAR}` form). A write SHALL be rejected (with an actionable error) when any `headers`, `env`, or URL field contains a literal value matching a credential pattern — e.g. `Authorization: Bearer <token>` with a non-placeholder token, URL userinfo (`https://user:pass@host`), or an env/header value that is not a `${VAR}` placeholder for keys named like `*_TOKEN`, `*_KEY`, `*_SECRET`, `PASSWORD`, or `AUTHORIZATION`. The same credential-pattern guard SHALL apply to the platform's own seeding/preservation path so secret-looking foreign entries are never written or preserved.
+
 #### Scenario: Invalid JSON rejected
 
 - **WHEN** the builder invokes `write_file` for `hooks/hooks.json` with malformed JSON
 - **THEN** the tool returns an error describing the syntax problem
 - **AND** the file is not written to disk
 
-#### Scenario: Valid JSON written
+#### Scenario: Literal credential in .mcp.json rejected
+
+- **WHEN** the builder invokes `write_file` for `.mcp.json` with a server whose header is `"Authorization": "Bearer sk-live-abc123"`
+- **THEN** the tool returns an error instructing the agent to use a `${VAR}` placeholder
+- **AND** the file is not written to disk
+
+#### Scenario: Valid JSON with placeholders written
 
-- **WHEN** the builder writes syntactically valid JSON to `.mcp.json`
+- **WHEN** the builder writes valid JSON to `.mcp.json` whose credential fields use only `${VAR}` placeholders
 - **THEN** the file is persisted normally
 
 ### Requirement: Scaffold Export API
 
-The API SHALL expose an authenticated `GET /api/projects/:projectId/scaffold/export` endpoint that streams a zip archive of the project's agent scaffold, named `<project-slug>-scaffold.zip`. The archive SHALL contain the project directory contents **excluding** internal entries: `.git/`, `large_tool_results/`, `uploads/`, `duckdb.db` and its side files (`*.wal`, `*.tmp`), and any dotfile temp artifacts. The archive MUST NOT contain secret material; `.mcp.json` is included as seeded (placeholder token only). The endpoint SHALL require admin session auth and return 404 for unknown projects.
+The API SHALL expose an authenticated `GET /api/projects/:projectId/scaffold/export` endpoint that streams a zip archive of the project's agent scaffold, named `<project-slug>-scaffold.zip`. The archive SHALL contain the project directory contents **excluding** internal runtime entries: `.git/`, `large_tool_results/`, `uploads/`, `duckdb.db` and its side files (`*.wal`, `*.tmp`), and any dotfile temp artifacts.
+
+The export MUST fail closed with respect to secrets. In addition to the runtime denylist, the endpoint SHALL exclude well-known secret-bearing files anywhere in the tree — `.env`, `.env.*`, `.npmrc`, `.pypirc`, `.netrc`, `.git-credentials`, and private-key/credential files (`id_*`, `*.pem`, `*.key`, `*.p12`, `*.pfx`, `credentials*`). It SHALL also scan the textual content of each candidate file for secret patterns (e.g. `BETTER_AUTH_SECRET`, `AGENT_API_KEY`, `ENCRYPTION_KEY`, `Authorization: Bearer <token>`, `encryptedApiKey`, and `://user:password@` database URLs). If a candidate file matches a secret pattern, the endpoint SHALL **abort the export with an error** (rather than silently emitting an archive that may contain secrets), naming the offending path; only `${VAR}`-style placeholders are exempt. `.mcp.json` is included as seeded (placeholder token only). The endpoint SHALL require admin session auth and return 404 for unknown projects.
 
 #### Scenario: Export a scaffold
 
@@ -92,6 +102,12 @@ The API SHALL expose an authenticated `GET /api/projects/:projectId/scaffold/exp
 - **WHEN** an exported archive is inspected
 - **THEN** it contains no bearer tokens, API keys, or encrypted credential material
 
+#### Scenario: Secret-bearing file aborts the export
+
+- **WHEN** an agent-created `.env.local` (or a `scripts/` file containing `AGENT_API_KEY=...`) exists in the project directory
+- **THEN** the export aborts with an error naming the offending path
+- **AND** no archive is streamed
+
 #### Scenario: Unauthenticated export rejected
 
 - **WHEN** the request lacks a valid admin session
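
Illustrative sketch for the "Credential-Safe JSON Validation on Write" requirement above; it is not part of the spec. It assumes `.mcp.json` keeps its servers under a top-level `mcpServers` object with optional `url`, `headers`, and `env` fields, which the spec does not state. `findLiteralCredentials` and the regular expressions are hypothetical and cover only the patterns named in the requirement.

```typescript
// Hypothetical validator; the mcpServers layout is an assumption, not taken from the spec.

interface McpServerEntry {
  url?: string;
  headers?: { [key: string]: string };
  env?: { [key: string]: string };
}

const PLACEHOLDER = /^\$\{[A-Za-z_][A-Za-z0-9_]*\}$/;
const SENSITIVE_KEY = /(_TOKEN|_KEY|_SECRET)$|^PASSWORD$|^AUTHORIZATION$/i;
const URL_USERINFO = /^[a-z][a-z0-9+.-]*:\/\/[^\/\s@:]*:([^\/\s@]*)@/i;

/** True when a sensitive key holds anything other than a ${VAR} placeholder. */
function isLiteralCredential(key: string, value: string): boolean {
  if (!SENSITIVE_KEY.test(key)) return false;
  const token = value.replace(/^(Bearer|Basic)\s+/i, "");
  return !PLACEHOLDER.test(token);
}

/** Returns the paths of offending fields; JSON.parse throws on invalid syntax. */
function findLiteralCredentials(mcpJsonText: string): string[] {
  const parsed = JSON.parse(mcpJsonText) as { mcpServers?: { [name: string]: McpServerEntry } };
  const servers = parsed.mcpServers || {};
  const problems: string[] = [];

  for (const name of Object.keys(servers)) {
    const entry = servers[name];

    // URL userinfo is rejected unless the password part is a placeholder.
    const userinfo = entry.url ? URL_USERINFO.exec(entry.url) : null;
    if (userinfo && !PLACEHOLDER.test(userinfo[1])) problems.push(name + ".url");

    const headers = entry.headers || {};
    for (const key of Object.keys(headers)) {
      if (isLiteralCredential(key, headers[key])) problems.push(name + ".headers." + key);
    }
    const env = entry.env || {};
    for (const key of Object.keys(env)) {
      if (isLiteralCredential(key, env[key])) problems.push(name + ".env." + key);
    }
  }
  return problems;
}
```

A `write_file` guard could reject the write when the returned list is non-empty and echo the paths back so the agent can switch to `${VAR}` placeholders. The same check could back the seeding path's warning about foreign entries.
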
diff --git a/openspec/changes/restructure-agent-platform/specs/project-management/spec.md b/openspec/changes/restructure-agent-platform/specs/project-management/spec.md
index c01f68d..f277717 100644
--- a/openspec/changes/restructure-agent-platform/specs/project-management/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/project-management/spec.md
@@ -105,20 +105,28 @@ The "Publish History" card SHALL display a list of recent commits from the local
 
 The API SHALL expose project-scoped LLM settings endpoints under `/api/projects/:projectId/llm-settings`:
 
-- `GET /builder` — returns the builder LLM settings with the API key masked (e.g. `sk-...****`) plus an `apiKeySet` boolean and the resolved effective configuration source per field (`project` or `env`)
+- `GET /builder` — returns the builder LLM settings as **non-secret metadata only**: `baseUrl`, `model`, a `configured` boolean (the effective config is usable), and, for the API key, `apiKeySet` (boolean) plus `apiKeySource` (`project` | `env` | `unset`). It SHALL NOT return any API key string, masked or otherwise.
 - `PUT /builder` — updates `builderLlm` (accepts `baseUrl`, `apiKey`, `model`, each optional; when `apiKey` is provided it is encrypted and replaces the stored key; clearing a field removes the project override)
 - `POST /builder/test-connection` — verifies connectivity against the **effective** builder configuration (project override merged with env fallback) by issuing a lightweight request to the configured endpoint
-- `GET /agent` — returns the agent settings (`baseUrl`, `model`, `systemPrompt`, masked API key, `apiKeySet`) or an unconfigured indicator
+- `GET /agent` — returns the agent settings as **non-secret metadata only**: `baseUrl`, `model`, `systemPrompt`, a `configured` boolean, `apiKeySet`, and `apiKeySource`, or an unconfigured indicator. It SHALL NOT return any API key string, masked or otherwise.
 - `PUT /agent` — creates/updates `agentLlm` (requires `baseUrl`, `model`, `systemPrompt`; `apiKey` required on first save, optional afterwards — omitting it preserves the stored key)
 - `POST /agent/test-connection` — verifies connectivity against the agent configuration
 
-API keys SHALL never be returned in plaintext. Base URLs SHALL be re-validated against the SSRF rules on every PUT and before every outbound test-connection request. All endpoints SHALL require admin session auth. The `/api/config` surface SHALL report per-project configuration state (`builderConfigured`, `agentConfigured`) so the frontend can gate chat inputs and run buttons.
+API keys (whether sourced from a project override or from an environment secret such as `AGENT_API_KEY`) SHALL NEVER be returned in any form — not in plaintext, not masked, and not as a placeholder that reuses any characters of the key. Responses, UI placeholders, server logs, and test-connection error messages SHALL NOT contain key-derived material. Key presence and origin SHALL be communicated only via `apiKeySet`/`apiKeySource`, and any UI hint SHALL be a fixed, non-derived string (e.g. "Using AGENT_API_KEY" for env, "Project key stored" for an override). Base URLs SHALL be re-validated against the SSRF rules on every PUT and before every outbound test-connection request. All endpoints SHALL require admin session auth.
+
+Per-project configuration state SHALL be exposed only on project-scoped, authenticated surfaces: the `GET /builder` and `GET /agent` responses carry the `configured` boolean, and `GET /api/projects/:projectId` MAY also include `builderConfigured` and `agentConfigured` for convenience. The global, unauthenticated `/api/config` route SHALL NOT carry per-project gating flags (it has no `projectId` and would be incorrect in a multi-project deployment). The frontend SHALL gate chat inputs and run buttons from the project-scoped flags.
 
 #### Scenario: Save agent settings
 
 - **WHEN** a PUT request to `/agent` provides `baseUrl`, `apiKey`, `model`, and `systemPrompt`
 - **THEN** the API key is encrypted and stored in `agentLlm.encryptedApiKey`
-- **AND** subsequent GETs return the key masked with `apiKeySet: true`
+- **AND** subsequent GETs return `apiKeySet: true` and `apiKeySource: "project"` with no key characters in the response
+
+#### Scenario: Env-sourced key is never echoed
+
+- **WHEN** a `GET /builder` request is made for a project with no key override but with `AGENT_API_KEY` set in the environment
+- **THEN** the response includes `apiKeySet: true` and `apiKeySource: "env"`
+- **AND** the response body, and any UI placeholder derived from it, contain no characters of `AGENT_API_KEY`
 
 #### Scenario: Update agent settings without changing the key
 
@@ -138,7 +146,7 @@ API keys SHALL never be returned in plaintext. Base URLs SHALL be re-validated a
 
 ### Requirement: Builder LLM Settings Page
 
-The frontend SHALL provide a Builder settings page at `/$projectId/settings/builder` with a card containing inline label–input rows for: OpenAI-compatible base URL, API key (password input, masked placeholder when a key is stored), and model name. Each field SHALL indicate its env-default value as placeholder text when no project override is set. The page SHALL provide a "Save" button and a "Test Connection" button. Test Connection SHALL first persist unsaved changes, then call `POST /api/projects/:projectId/llm-settings/builder/test-connection`, showing a success or error toast. Buttons SHALL be disabled with a loading indicator while operations are in flight. A "Reset to defaults" action SHALL clear the project overrides so the env configuration applies again.
+The frontend SHALL provide a Builder settings page at `/$projectId/settings/builder` with a card containing inline label–input rows for: OpenAI-compatible base URL, API key (password input), and model name. The base URL and model fields SHALL show their env-default value as placeholder text when no project override is set. The API key field SHALL NOT display any env or stored key material; instead it SHALL show a fixed, non-derived hint based on `apiKeySet`/`apiKeySource` (e.g. "Using AGENT_API_KEY" or "Project key stored"). The page SHALL provide a "Save" button and a "Test Connection" button. Test Connection SHALL first persist unsaved changes, then call `POST /api/projects/:projectId/llm-settings/builder/test-connection`, showing a success or error toast. Buttons SHALL be disabled with a loading indicator while operations are in flight. A "Reset to defaults" action SHALL clear the project overrides so the env configuration applies again.
 
 #### Scenario: Override the builder model
 
@@ -161,7 +169,7 @@ The frontend SHALL provide a Builder settings page at `/$projectId/settings/buil
 
 ### Requirement: Agent Settings Page
 
-The frontend SHALL provide an Agent settings page at `/$projectId/settings/agent` configuring the project's single agent. The page SHALL contain a card with inline label–input rows for: OpenAI-compatible base URL, API key (password input, masked placeholder when a key is stored), and model name; and a card with the agent's system prompt (textarea). The page SHALL provide "Save" and "Test Connection" buttons with the same save-then-test behavior, disabled states, and loading indicators as the Builder settings page. While the agent is unconfigured, the page SHALL display an informational note that the Agent playground and test runs are unavailable until configuration is saved.
+The frontend SHALL provide an Agent settings page at `/$projectId/settings/agent` configuring the project's single agent. The page SHALL contain a card with inline label–input rows for: OpenAI-compatible base URL, API key (password input, showing a fixed non-derived hint based on `apiKeySet`/`apiKeySource` — never any key characters), and model name; and a card with the agent's system prompt (textarea). The page SHALL provide "Save" and "Test Connection" buttons with the same save-then-test behavior, disabled states, and loading indicators as the Builder settings page. While the agent is unconfigured, the page SHALL display an informational note that the Agent playground and test runs are unavailable until configuration is saved.
 
 #### Scenario: Configure the agent for the first time
 
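
Illustrative sketch for the "LLM settings endpoints" requirement above; it is not part of the spec. It shows how a GET handler could derive non-secret metadata from a project override and env defaults without copying any key characters into the response. The function and type names are hypothetical, the rule for `configured` (base URL, model, and a key all resolvable) is an assumption, and so is the hint text for the `unset` case.

```typescript
// Hypothetical shapes; only the response field names come from the spec text.

type ApiKeySource = "project" | "env" | "unset";

interface StoredLlmOverride {
  baseUrl?: string;
  model?: string;
  encryptedApiKey?: string;
}

interface EnvLlmDefaults {
  baseUrl?: string;
  model?: string;
  apiKey?: string;
}

interface LlmSettingsMetadata {
  baseUrl: string | null;
  model: string | null;
  configured: boolean;
  apiKeySet: boolean;
  apiKeySource: ApiKeySource;
}

/** Builds the GET response; key material is inspected for presence only and never copied. */
function toLlmSettingsMetadata(override: StoredLlmOverride, env: EnvLlmDefaults): LlmSettingsMetadata {
  const apiKeySource: ApiKeySource = override.encryptedApiKey ? "project" : env.apiKey ? "env" : "unset";
  const baseUrl = override.baseUrl || env.baseUrl || null;
  const model = override.model || env.model || null;
  return {
    baseUrl: baseUrl,
    model: model,
    // Assumption: "configured" means every required part resolves to a value.
    configured: baseUrl !== null && model !== null && apiKeySource !== "unset",
    apiKeySet: apiKeySource !== "unset",
    apiKeySource: apiKeySource,
  };
}

/** Fixed, non-derived UI hint for the API key field. */
function apiKeyHint(source: ApiKeySource): string {
  if (source === "env") return "Using AGENT_API_KEY";
  if (source === "project") return "Project key stored";
  return "No API key configured"; // assumed wording; the spec gives no text for this case
}
```

Because the hint is selected from constants keyed by `apiKeySource`, no code path exists by which key characters could reach the UI.
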
diff --git a/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md b/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
index 50edbab..3be0350 100644
--- a/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
@@ -13,7 +13,7 @@
 The Builder page side panel SHALL include an **Improvements & Testing** accordion section (formerly "Improvement Requests") below the Build section. The section SHALL display two kinds of entries:
 
 1. **Improvement requests** — all improvement suggestions for the project. Each item SHALL show a lightbulb icon, the truncated title, and a checkmark overlay if the improvement has been implemented. Clicking an improvement SHALL navigate to its detail view in the main content area. Each improvement row SHALL show a trash icon on hover that soft-deletes the improvement when clicked, matching the conversation row delete pattern.
-2. **Failing tests** — the project's currently failing test cases, sourced from `GET /api/projects/:projectId/test-cases/latest-results` (entries with `latestStatus` of `failed` or `error`). Each item SHALL show a distinct test/alert icon and the truncated test case title. Clicking a failing-test entry SHALL navigate to the latest run's detail page (`/$projectId/testing/runs/:runId`). Each failing-test row SHALL additionally offer a refine affordance (wand icon on hover) that opens `/$projectId/models/chat/new` with a `prefill` prompt referencing the failing test case and its unmet facts so the builder can improve the model.
+2. **Failing tests** — the project's currently failing test cases, sourced from `GET /api/projects/:projectId/test-cases/latest-results` (entries with `latestStatus` of `failed` or `error`). Each item SHALL show a distinct test/alert icon and the truncated test case title. Clicking a failing-test entry SHALL navigate to the latest run's detail page (`/$projectId/testing/runs/:runId`). Each failing-test row SHALL additionally offer a refine affordance (wand icon on hover) that opens `/$projectId/models/chat/new` with a `prefill` prompt built from the same `latest-results` payload — the case `inputMessage` and its `unmetFacts` — so the builder can improve the model without a second request.
 
 The section header SHALL display a pending-count badge equal to the number of pending improvements plus the number of failing tests.
 
@@ -41,7 +41,7 @@ The section header SHALL display a pending-count badge equal to the number of pe
 #### Scenario: Refine a failing test from the panel
 
 - **WHEN** the user activates the refine affordance on a failing-test entry
-- **THEN** the Build chat opens at `/$projectId/models/chat/new` with a `prefill` prompt describing the failing test case and its unmet expected facts
+- **THEN** the Build chat opens at `/$projectId/models/chat/new` with a `prefill` prompt built from the entry's `inputMessage` and `unmetFacts` (falling back to the error message when an `error`-status case has no `unmetFacts`)
 
 #### Scenario: Empty state
 
diff --git a/openspec/changes/restructure-agent-platform/specs/testing-suite/spec.md b/openspec/changes/restructure-agent-platform/specs/testing-suite/spec.md
index 209dcfa..0dac1b3 100644
--- a/openspec/changes/restructure-agent-platform/specs/testing-suite/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/testing-suite/spec.md
@@ -66,8 +66,21 @@ The system SHALL provide an interactive playground chat where the user converses
 
 The tools SHALL read from the current development state of semantic models (YAML files on disk), not from any published snapshot. Playground conversations SHALL be persisted in the existing `Conversation` model and identified by a `playground: true` flag; legacy conversations referencing a deleted `testAgent` SHALL remain readable. Playground interactions SHALL NOT be logged to `McpCallLog`.
 
+Conversation histories SHALL be partitioned by the `playground` flag so that playground chats never leak into the Builder (Build) history and vice versa. The Agent page history list and load endpoints SHALL return only conversations where `playground: true`. The Builder (Build) conversation list and load endpoints SHALL return only conversations where `playground` is absent or `false` (i.e. `playground: { $ne: true }`); the prior `testAgent: null` filter is insufficient because newly created playground conversations also have no `testAgent`. A request to load a conversation through the wrong surface (e.g. a playground conversation id requested by the Builder load endpoint) SHALL return 404.
+
 When the project agent is not configured (no `agentLlm` settings), the playground chat endpoint SHALL reject messages with a 400 error indicating that the agent must be configured under Settings → Agent.
 
+#### Scenario: Playground conversations excluded from Builder history
+
+- **WHEN** a project has both Builder conversations and playground conversations (`playground: true`)
+- **THEN** the Builder conversation list returns only the non-playground conversations
+- **AND** the Agent page history returns only the `playground: true` conversations
+
+#### Scenario: Cross-surface load is rejected
+
+- **WHEN** the Builder load endpoint is called with the id of a `playground: true` conversation
+- **THEN** the endpoint responds with 404 (and the same applies to the Agent endpoint loading a non-playground conversation)
+
 #### Scenario: Start a playground conversation
 
 - **WHEN** the user sends a message in the playground and the project agent is configured
@@ -151,7 +164,7 @@ The endpoints SHALL NOT accept or filter by a test agent reference. All endpoint
 
 ### Requirement: Test Run Model
 
-The system SHALL provide a `TestRun` Mongoose model representing a batch execution of test cases. Fields: `project` (ObjectId ref to Project, required), `llmModel` (string — snapshot of the project agent's model identifier at run start), `testAgent` (ObjectId, optional — legacy field retained so historical runs remain readable; not set on new runs), `status` (enum: `pending`, `running`, `completed`, `failed`, `cancelled`, required), `cases` (array of embedded results), `startedAt` (Date), `completedAt` (Date), `createdAt` (Date), `updatedAt` (Date). Each embedded case result SHALL contain: `testCase` (ObjectId ref to TestCase), `title` (string — snapshot of test case title), `semanticModel` (string), `inputMessage` (string), `expectedFacts` (array of strings), `maxToolCalls` (number, optional — snapshot of the limit at execution time), `status` (enum: `pending`, `running`, `passed`, `failed`, `error`, `cancelled`), `agentResponse` (string — the agent's final text response), `toolCalls` (array of tool call records), `factResults` (array of `{ fact: string, passed: boolean, reasoning: string }`), `durationMs` (number), `errorMessage` (string, optional).
+The system SHALL provide a `TestRun` Mongoose model representing a batch execution of test cases. Fields: `project` (ObjectId ref to Project, required), `llmModel` (string — snapshot of the project agent's model identifier at run start), `testAgent` (ObjectId, optional — legacy field retained so historical runs remain readable; not set on new runs), `testAgentName` (string, optional — denormalized snapshot of the legacy agent's name, backfilled by the single-agent migration before the `TestAgent` documents are soft-deleted so run lists never depend on populating a deleted reference; not set on new runs), `status` (enum: `pending`, `running`, `completed`, `failed`, `cancelled`, required), `cases` (array of embedded results), `startedAt` (Date), `completedAt` (Date), `createdAt` (Date), `updatedAt` (Date). Each embedded case result SHALL contain: `testCase` (ObjectId ref to TestCase), `title` (string — snapshot of test case title), `semanticModel` (string), `inputMessage` (string), `expectedFacts` (array of strings), `maxToolCalls` (number, optional — snapshot of the limit at execution time), `status` (enum: `pending`, `running`, `passed`, `failed`, `error`, `cancelled`), `agentResponse` (string — the agent's final text response), `toolCalls` (array of tool call records), `factResults` (array of `{ fact: string, passed: boolean, reasoning: string }`), `durationMs` (number), `errorMessage` (string, optional).
 
 #### Scenario: Create a test run
 
@@ -267,7 +280,7 @@ When a test run is cancelled, the system SHALL cooperatively abort in-flight tes
 
 The API SHALL expose endpoints for managing test runs at `/api/projects/:projectId/test-runs`:
 
-- `GET /` -- List all test runs for the project with server-side pagination (`page`, `limit` query params returning `{ items, total, page, limit }`); each item is a summary: id, llmModel snapshot (or legacy agent name when present), case count, passed/failed/error counts, status, timestamps
+- `GET /` -- List all test runs for the project with server-side pagination (`page`, `limit` query params returning `{ items, total, page, limit }`); each item is a summary: id, `llmModel` snapshot for new runs, case count, passed/failed/error counts, status, timestamps. For legacy runs lacking `llmModel`, the summary SHALL surface the denormalized `testAgentName` snapshot, falling back to a neutral `"Legacy agent"` label when neither is present. The endpoint SHALL NOT populate the soft-deleted `testAgent` reference.
 - `GET /:runId` -- Get a single test run with full case results (the full embedded cases array)
 - `POST /` -- Initiate a batch run (accepts `testCaseIds` array); rejects with 400 when the project agent is not configured; returns the new TestRun ID
 - `POST /:runId/cancel` -- Cancel a running or pending test run; marks remaining cases as `cancelled`, aborts in-flight executions, and sets the run status to `cancelled`
@@ -362,7 +375,7 @@ The test case form dialog SHALL include a "Run Test" button in both create and e
 
 ### Requirement: Testing UI — Test Runs List Page
 
-The frontend SHALL provide a Test Runs page at `/$projectId/testing/runs` displaying a server-side-paginated table of all test runs for the project. Columns: status icon, model (the run's `llmModel` snapshot, or the legacy agent name for pre-migration runs), case count, result summary (passed/failed/errors as badges), date. Each row links to the run detail page at `/$projectId/testing/runs/:runId`. The page auto-refreshes while any run is in `pending` or `running` status.
+The frontend SHALL provide a Test Runs page at `/$projectId/testing/runs` displaying a server-side-paginated table of all test runs for the project. Columns: status icon, model (the run's `llmModel` snapshot, or the `testAgentName` snapshot for pre-migration runs, falling back to a `"Legacy agent"` label), case count, result summary (passed/failed/errors as badges), date. Each row links to the run detail page at `/$projectId/testing/runs/:runId`. The page auto-refreshes while any run is in `pending` or `running` status.
 
 The `cancelled` status SHALL be displayed with a neutral grey icon (ban/slash) consistent with the detail page styling.
 
@@ -388,7 +401,7 @@ The `cancelled` status SHALL be displayed with a neutral grey icon (ban/slash) c
 
 ### Requirement: Testing UI — Test Run Detail Page
 
-The frontend SHALL provide a Test Run Detail page at `/$projectId/testing/runs/:runId` showing run metadata (model snapshot — or legacy agent name for pre-migration runs — status, started/completed timestamps, overall pass/fail counts) and a client-side-paginated list of case results. Each case result shows: status icon, title, semantic model badge, input message, agent response (expandable), tool calls (expandable), fact results with pass/fail icons and reasoning, duration, and error message if applicable. The page auto-refreshes (polls every 3 seconds) while the run status is `pending` or `running`. A back link returns to the Test Runs list.
+The frontend SHALL provide a Test Run Detail page at `/$projectId/testing/runs/:runId` showing run metadata (model — the `llmModel` snapshot, or the `testAgentName` snapshot for pre-migration runs, falling back to a `"Legacy agent"` label — status, started/completed timestamps, overall pass/fail counts) and a client-side-paginated list of case results. Each case result shows: status icon, title, semantic model badge, input message, agent response (expandable), tool calls (expandable), fact results with pass/fail icons and reasoning, duration, and error message if applicable. The page auto-refreshes (polls every 3 seconds) while the run status is `pending` or `running`. A back link returns to the Test Runs list.
 
 A "Cancel Run" button SHALL be displayed in the page header next to the status badge when the run status is `pending` or `running`. Clicking the button SHALL call `POST /api/projects/:projectId/test-runs/:runId/cancel`. On success, the button SHALL disappear and the status badge SHALL update to `cancelled`. The button SHALL be disabled while the cancel request is in progress and SHALL display a loading indicator.
 
@@ -458,16 +471,16 @@ A "Refine" button SHALL appear for all completed test cases (passed, failed, or
 
 ### Requirement: Latest Test Case Results API
 
-The API SHALL expose `GET /api/projects/:projectId/test-cases/latest-results` returning, for every non-deleted test case of the project, the most recent embedded run result (if any). Each item SHALL include: `testCaseId`, `title`, `semanticModel`, `latestStatus` (`passed` | `failed` | `error` | `cancelled` | `running` | `pending` | `never_run`), `runId` (the TestRun containing the latest result, when present), and `finishedAt` (when present). The latest result SHALL be determined by the most recent TestRun (by `createdAt`) that contains the test case. The endpoint SHALL require admin session auth.
+The API SHALL expose `GET /api/projects/:projectId/test-cases/latest-results` returning, for every non-deleted test case of the project, the most recent embedded run result (if any). Each item SHALL include: `testCaseId`, `title`, `semanticModel`, `inputMessage` (snapshot of the case input), `latestStatus` (`passed` | `failed` | `error` | `cancelled` | `running` | `pending` | `never_run`), `runId` (the TestRun containing the latest result, when present), `finishedAt` (when present), and `unmetFacts` (array of strings — the expected facts whose latest `factResult.passed` was `false`; empty for non-failing cases). The latest result SHALL be determined by the most recent TestRun (by `createdAt`) that contains the test case. The endpoint SHALL require admin session auth.
 
-A test case SHALL be considered **failing** when its `latestStatus` is `failed` or `error`. This endpoint powers the failing-tests section of the Builder's "Improvements & Testing" panel.
+A test case SHALL be considered **failing** when its `latestStatus` is `failed` or `error`. This endpoint powers the failing-tests section of the Builder's "Improvements & Testing" panel, including the refine flow: `unmetFacts` and `inputMessage` provide everything the panel needs to build the prefill prompt without a second request. (For `error`-status cases with no recorded `factResults`, `unmetFacts` SHALL be empty and the prefill SHALL fall back to the case `inputMessage` plus the error message.)
 
 #### Scenario: Latest results across runs
 
-- **GIVEN** test case A passed in run 1 and failed in run 2 (run 2 is newer), and test case B has never been run
+- **GIVEN** test case A passed in run 1 and failed in run 2 (run 2 is newer) with one unmet expected fact, and test case B has never been run
 - **WHEN** a GET request is made to `/api/projects/:projectId/test-cases/latest-results`
-- **THEN** the response lists A with `latestStatus: "failed"` and the `runId` of run 2
-- **AND** B with `latestStatus: "never_run"` and no `runId`
+- **THEN** the response lists A with `latestStatus: "failed"`, the `runId` of run 2, and `unmetFacts` containing the unmet expected fact
+- **AND** B with `latestStatus: "never_run"`, no `runId`, and empty `unmetFacts`
 
 #### Scenario: Unauthenticated request
 
diff --git a/openspec/changes/restructure-agent-platform/tasks.md b/openspec/changes/restructure-agent-platform/tasks.md
index 2f6d7f5..9b63570 100644
--- a/openspec/changes/restructure-agent-platform/tasks.md
+++ b/openspec/changes/restructure-agent-platform/tasks.md
@@ -5,32 +5,32 @@
 ## 1. Data model & migration
 
 - [ ] 1.1 Extend `Project` model with `builderLlm` and `agentLlm` subdocuments (encryption via `ENCRYPTION_KEY`, SSRF base-URL validation lifted from `TestAgent`); unit tests for validation and encryption
-- [ ] 1.2 Remove `testAgent` from `TestCase`; make `testAgent` optional/legacy on `TestRun` and add `llmModel` snapshot field; add `playground` flag to `Conversation`
-- [ ] 1.3 Schema migration `00X-drop-test-agents`: soft-delete all `TestAgent` docs, `$unset TestCase.testAgent`, set `playground: true` on conversations that have a `testAgent` reference; migration test
+- [ ] 1.2 Remove `testAgent` from `TestCase`; make `testAgent` optional/legacy on `TestRun`, add `llmModel` snapshot field and optional `testAgentName` snapshot; add `playground` flag to `Conversation`
+- [ ] 1.3 Schema migration `00X-drop-test-agents`: **backfill `TestRun.testAgentName` from each run's `testAgent.name` first**, then soft-delete all `TestAgent` docs, `$unset TestCase.testAgent`, set `playground: true` on conversations that have a `testAgent` reference; migration test asserting legacy run names survive
 - [ ] 1.4 Delete `packages/core/src/models/TestAgent.ts` and its exports/usages
 
 ## 2. LLM settings API & resolution
 
-- [ ] 2.1 New `apps/api/src/routes/llm-settings.ts`: GET/PUT `/builder`, GET/PUT `/agent`, POST `/{builder,agent}/test-connection` (masked keys, re-encrypt on new key, SSRF re-validation); integration tests
+- [ ] 2.1 New `apps/api/src/routes/llm-settings.ts`: GET/PUT `/builder`, GET/PUT `/agent`, POST `/{builder,agent}/test-connection`. GET responses return **non-secret metadata only** (`apiKeySet`/`apiKeySource`, never a key string — masked or otherwise); re-encrypt on new key; SSRF re-validation. Add `zValidator` gates: `projectId` ObjectId param, bounded `baseUrl`/`model`/`systemPrompt`/`apiKey` body fields, clear-field semantics for builder overrides, no-body validation for test-connection. Integration tests including one asserting `AGENT_API_KEY` never appears (even partially) in any response/placeholder/log/error
 - [ ] 2.2 Builder LLM resolution helper in core: per-field `Project.builderLlm` → env (`AGENT_API_BASE_URL`/`AGENT_API_KEY`/`AGENT_MODEL`); wire into `createSemlayerAgent(projectId)` and title agent base config
-- [ ] 2.3 Update `/api/config` (or project payload) to expose per-project `builderConfigured` / `agentConfigured`
+- [ ] 2.3 Expose per-project `builderConfigured` / `agentConfigured` on **project-scoped, authenticated** surfaces (the llm-settings GETs' `configured` flag and/or `GET /api/projects/:projectId`); do **not** add per-project flags to the global `/api/config` route
 - [ ] 2.4 Remove `apps/api/src/routes/test-agents.ts` and its app mount
 
 ## 3. Single project agent (playground & test harness)
 
 - [ ] 3.1 Rework `createPlaygroundAgent` to take `projectId` and read `Project.agentLlm` (config, system prompt, all-models scope); 400 path when unconfigured
-- [ ] 3.2 Update playground routes (`chat`, `cancel`, `subscribe`, conversation list) to drop `testAgentId`; persist conversations with `playground: true`
+- [ ] 3.2 Update playground routes (`chat`, `cancel`, `subscribe`, conversation list) to drop `testAgentId`; persist conversations with `playground: true`. Partition histories: Agent endpoints filter `playground: true`, Builder endpoints filter `playground: { $ne: true }` (replace the old `testAgent: null` filter), and cross-surface loads return 404; add tests asserting playground chats do not leak into Builder history
 - [ ] 3.3 Update `apps/worker/src/processor.ts` branching (playground flag instead of `testAgentId`) and `test-runner.ts` to use the project agent; snapshot `llmModel` onto new runs
-- [ ] 3.4 Test-runs API: reject `POST /` with 400 when `agentLlm` is unconfigured; list/detail payloads expose `llmModel` (legacy agent name fallback); update tests
+- [ ] 3.4 Test-runs API: reject `POST /` with 400 when `agentLlm` is unconfigured; list/detail payloads expose `llmModel` for new runs and the `testAgentName` snapshot (falling back to "Legacy agent") for pre-migration runs **without populating the soft-deleted `testAgent`**; update tests
 - [ ] 3.5 Builder agent tools: remove `list_test_agents`, drop `testAgentId` from `create_test_case`, drop `testAgent` from `list_test_cases` output; update `semantic-model-agent.md` prompt accordingly
 
 ## 4. Agent scaffold
 
 - [ ] 4.0 Rename model storage `src/` → `data_models/`: update `SemanticModelFileService` (path constant + legacy `src/`/root fallbacks), the agent `FilesystemBackend` prompt guidance, and publish assembly; replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (idempotent startup move from `src/` or root, preserving `uploads/` and scaffold dirs); update/extend tests
-- [ ] 4.1 `.mcp.json` seeding service: create on project creation, recreate-if-missing on builder start, update on slug change, preserve foreign entries; placeholder token only; unit tests
-- [ ] 4.2 JSON syntax validation on `write_file`/`edit_file` for `.json` paths in the builder filesystem backend (mirroring YAML validation); tests
+- [ ] 4.1 `.mcp.json` seeding service: create on project creation, recreate-if-missing on builder start, update on slug change; preserve foreign entries **only when credential-safe** (warn + refuse to re-persist secret-looking entries); placeholder token only; unit tests
+- [ ] 4.2 Credential-safe JSON validation on `write_file`/`edit_file` for `.json` paths in the builder filesystem backend: syntax check (mirroring YAML validation) **plus** rejection of literal credential values in `.mcp.json` headers/env/URLs (allow only `${VAR}` placeholders); tests including the literal-Bearer-token rejection case
 - [ ] 4.3 Extend the builder system prompt with the scaffold layout and skills-over-commands guidance
-- [ ] 4.4 `GET /api/projects/:projectId/scaffold/export` zip endpoint with exclusion denylist (`.git/`, `large_tool_results/`, `uploads/`, `duckdb.db*`, temp files); integration test asserting exclusions and absence of secret material
+- [ ] 4.4 `GET /api/projects/:projectId/scaffold/export` zip endpoint with `projectId` param validation; runtime denylist (`.git/`, `large_tool_results/`, `uploads/`, `duckdb.db*`, temp files) **plus** secret-file exclusions (`.env*`, `.npmrc`, `.pypirc`, `.netrc`, `.git-credentials`, `id_*`, `*.pem`/`*.key`/`*.p12`/`*.pfx`, `credentials*`) and a content scan that **fails the export closed** on secret patterns; integration tests asserting exclusions, the fail-closed path, and absence of secret material
 - [ ] 4.5 Ensure scaffold directories are covered by Git versioning (review `git.ts` ignore rules) and publish flow
 
 ## 5. Navigation & settings UI
@@ -47,8 +47,8 @@
 ## 7. Builder page restructure
 
 - [ ] 7.1 Side panel sections: Agent Scaffold (Data Models list + Publish + Export action; disabled API Models w/ "soon" tag), Build (renamed Chat), Improvements & Testing
-- [ ] 7.2 `GET /api/projects/:projectId/test-cases/latest-results` endpoint with tests
-- [ ] 7.3 Improvements & Testing panel: failing-test entries (icon, link to latest run, refine prefill affordance), combined pending-count badge, updated empty state
+- [ ] 7.2 `GET /api/projects/:projectId/test-cases/latest-results` endpoint (`projectId` param validation) returning per-case `inputMessage` and `unmetFacts` (unmet expected facts from the latest result) so the panel can build the refine prefill without a second request; tests
+- [ ] 7.3 Improvements & Testing panel: failing-test entries (icon, link to latest run, refine prefill affordance built from `inputMessage` + `unmetFacts`), combined pending-count badge, updated empty state
 - [ ] 7.4 Agent page (`agent.tsx`): playground chat without agent selector, history panel, unconfigured empty state linking to Settings → Agent
 - [ ] 7.5 Update the configuration-missing banner to be project-aware and link to the settings pages
 

From d16b290449bfbfcb08dbeaf63a596dd0398f1dae Mon Sep 17 00:00:00 2001
From: Tobias Grosse-Puppendahl <tobias@grosse-puppendahl.de>
Date: Thu, 11 Jun 2026 10:46:48 +0200
Subject: [PATCH 4/5] docs(openspec): remove the build step; serve MCP from
 live data_models/

The disk build step (PublishService.assemble -> build/) existed only as the
publish boundary for production MCP. Since MCP tools surface models as
markdown (SemanticModelDigest) and assemble in memory on demand, the full
inlined YAML artifact is unnecessary.

- Remove the Build Assembly requirement; production and testing MCP both
  read the live data_models/ via in-memory assembly (no build/, no gate).
- Publish becomes pure Git versioning (commit + PublishEvent + push);
  reframe publish status/dialog away from "make available via MCP".
- Add a startup migration that deletes stale build/ directories; drop
  build/ from the .gitignore template.
- Add semantic-model-publishing, mcp-server, and project-git-versioning
  deltas; update proposal/design/tasks.

Co-authored-by: Cursor <cursoragent@cursor.com>
---
 .../restructure-agent-platform/design.md      |  18 ++-
 .../restructure-agent-platform/proposal.md    |  14 ++-
 .../specs/mcp-server/spec.md                  |  39 +++++++
 .../specs/project-git-versioning/spec.md      |  18 +++
 .../specs/semantic-model-publishing/spec.md   | 107 ++++++++++++++++++
 .../specs/semantic-models/spec.md             |  30 ++++-
 .../restructure-agent-platform/tasks.md       |   7 +-
 7 files changed, 222 insertions(+), 11 deletions(-)
 create mode 100644 openspec/changes/restructure-agent-platform/specs/mcp-server/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/project-git-versioning/spec.md
 create mode 100644 openspec/changes/restructure-agent-platform/specs/semantic-model-publishing/spec.md

diff --git a/openspec/changes/restructure-agent-platform/design.md b/openspec/changes/restructure-agent-platform/design.md
index 647f3ac..fa9c216 100644
--- a/openspec/changes/restructure-agent-platform/design.md
+++ b/openspec/changes/restructure-agent-platform/design.md
@@ -62,6 +62,17 @@ Per-project agent/builder *configured* state is exposed on project-scoped, authe
 
 A test case is *failing* when the most recent `TestRun` embedded result for it has status `failed` or `error`. A new endpoint aggregates this (`latest-results`), powering the **Improvements & Testing** panel. No new persistent state is introduced — the registry is derived from existing `TestRun` data, so it can never drift.
 
+### D8 — Remove the build step; serve MCP from live `data_models/`
+
+The disk **build step** (`PublishService.assemble()` writing fully-inlined single-file YAMLs to `build/`) is removed entirely. It existed only as the publish boundary so production MCP served a last-published snapshot; everything else (builder agent, playground, test-runner, REST API, model overviews) already reads source directly. MCP model tools surface compact **markdown** via `SemanticModelDigest`, and `execute_query`/`get` assemble a model in memory on demand — none of them need a materialized full-YAML artifact. So:
+
+- Production and testing MCP both read the current `data_models/` via in-memory assembly (the path the test route already uses). There is no `build/` directory and no published-snapshot gate; saved models are immediately visible to MCP.
+- **Publish becomes pure Git versioning**: ensure repo → pull → stage `data_models/`+scaffold → commit → record `PublishEvent` → optional push. `hasUnpublishedChanges` now means "uncommitted changes vs the last `PublishEvent`", a versioning signal rather than an MCP-availability gate. The publish UI copy is reframed accordingly (version/share the scaffold, not "make available via MCP").
+- `PublishService.assemble()`/`cleanStaleFiles()` and the `build/` read in the MCP route are deleted; `computeSourceHash()` is retained but hashes `data_models/` (+ scaffold) source only.
+- A **startup migration** removes any existing `build/` directory from project dirs (mirrors the existing startup `AGENTS.md` cleanup). `.gitignore` no longer needs to exclude `build/`.
+
+- *Alternative considered:* keep a publish gate by reading models from the last committed Git tree — rejected; it reintroduces a snapshot indirection for no benefit now that tools consume markdown/in-memory assembly, and live source matches the "semantic process layer" model where the project dir *is* the deliverable.
+
 ## Risks / Trade-offs
 
 - **Large rename surface** (routes, labels, docs, e2e selectors) → mitigated by keeping all backend route prefixes except `test-agents` stable, and adding redirects for moved frontend routes.
@@ -74,9 +85,10 @@ A test case is *failing* when the most recent `TestRun` embedded result for it h
 
 1. Ship schema migration `00X-drop-test-agents`: backfill `TestRun.testAgentName` from each run's `testAgent.name`, then soft-delete all `TestAgent` docs and `$unset` `TestCase.testAgent`; otherwise leave `TestRun` documents intact.
 2. Ship filesystem migration `migrate-data-models-layout.ts` (replaces `migrate-src-layout.ts`): on startup, for any project dir lacking `data_models/`, move model YAMLs into `data_models/` from the legacy `src/` directory (or, for very old projects, from the project root). `uploads/` and scaffold dirs are left in place. Idempotent and safe to re-run.
-3. Seed `.mcp.json` for existing projects lazily (on first builder-agent start or settings save) and for new projects on creation.
-4. Frontend redirects: `/testing/playground → /agent`, `/testing/agents → /settings/agent`, `/connections/data → /connections?tool=browser`, `/connections/console → /connections?tool=console`, `/data → /connections?tool=browser`.
-5. Rollback: the TestAgent migration is destructive by user decision; the `data_models/` move is reversible by moving files back to `src/`. Rollback otherwise restores routes/UI only.
+3. Ship the `build/` cleanup migration: on startup, recursively remove any `build/` directory under each project dir (idempotent; mirrors the existing startup `AGENTS.md` cleanup). Drop `build/` from the `.gitignore` template.
+4. Seed `.mcp.json` for existing projects lazily (on first builder-agent start or settings save) and for new projects on creation.
+5. Frontend redirects: `/testing/playground → /agent`, `/testing/agents → /settings/agent`, `/connections/data → /connections?tool=browser`, `/connections/console → /connections?tool=console`, `/data → /connections?tool=browser`.
+6. Rollback: the TestAgent migration is destructive by user decision; the `data_models/` move is reversible by moving files back to `src/`; the `build/` cleanup is non-destructive to source (build was a derived artifact). Rollback otherwise restores routes/UI only.
 
 ## Open Questions
 
diff --git a/openspec/changes/restructure-agent-platform/proposal.md b/openspec/changes/restructure-agent-platform/proposal.md
index 9033453..68d950d 100644
--- a/openspec/changes/restructure-agent-platform/proposal.md
+++ b/openspec/changes/restructure-agent-platform/proposal.md
@@ -38,6 +38,13 @@ archmax is repositioning from "a tool that manages semantic descriptions of data
 
 - The project directory is formalized as a plugin-style **agent filesystem** authored directly by the builder agent (no generation pipeline): `commands/`, `agents/`, `skills/<name>/SKILL.md`, `hooks/hooks.json`, `scripts/`, and `.mcp.json`, alongside `AGENTS.md` and a dedicated **`data_models/`** directory for the semantic model YAML files.
 - Semantic model files move from the current `src/` directory into `data_models/` (file service, agent backend, and publish assembly updated; a startup migration relocates existing `src/` content). This matches the "Data Models" product label and reserves a future `api_models/` sibling.
+
+### Build step removal (**BREAKING**)
+
+- The disk **build step** (`PublishService.assemble()` → `build/` single-file YAMLs) is removed entirely. MCP tools surface models as **markdown** (`SemanticModelDigest`) and assemble in memory on demand, so the materialized full-YAML artifact is unnecessary.
+- Production MCP now reads the **live `data_models/`** (in-memory assembly, the same path the test endpoint already uses) instead of a published `build/` snapshot — models are available via MCP as soon as they are saved.
+- **Publish becomes pure Git versioning**: commit `data_models/`+scaffold, record a `PublishEvent`, optionally push. `hasUnpublishedChanges` becomes a "pending version-control changes" signal; publish UI copy is reframed away from "make available via MCP".
+- A **startup migration** removes any existing `build/` directory; `.gitignore` no longer excludes `build/`.
 - `.mcp.json` is seeded and maintained by the platform, pointing at the project's MCP endpoint with an env-var token placeholder (never a real token).
 - The builder's file backend gains JSON syntax validation on write (mirroring the existing YAML validation).
 - A scaffold export endpoint (`GET /api/projects/:projectId/scaffold/export`) downloads the scaffold as a zip for use in external Deep-Agents-compatible harnesses; an Export action is available in the Agent Scaffold panel. The existing LangChain Deep Agents playground/test-runner remains the built-in test harness.
@@ -48,14 +55,15 @@ archmax is repositioning from "a tool that manages semantic descriptions of data
 
 ## Impact
 
-- Affected specs: `frontend-shell`, `connection-management-ui`, `data-browser`, `duckdb-console`, `testing-suite`, `semantic-model-agent`, `semantic-models`, `project-management`, `home-dashboard`, and new capability `agent-scaffold`.
+- Affected specs: `frontend-shell`, `connection-management-ui`, `data-browser`, `duckdb-console`, `testing-suite`, `semantic-model-agent`, `semantic-models`, `semantic-model-publishing`, `mcp-server`, `project-git-versioning`, `project-management`, `home-dashboard`, and new capability `agent-scaffold`.
 - Affected code:
   - `apps/frontend/src/components/layout/app-sidebar.tsx` (nav restructure)
   - `apps/frontend/src/routes/_auth/$projectId/` — `connections/*`, `models.tsx`, `testing/*`, new `agent.tsx`, `settings*` (route moves, overlay dialogs, panel restructure)
   - `packages/core/src/models/` — remove `TestAgent.ts`, edit `Project.ts`, `TestCase.ts`, `TestRun.ts`, `Conversation.ts`
   - `apps/api/src/routes/` — remove `test-agents.ts`; add `llm-settings.ts`, `scaffold.ts`; edit `test-cases.ts` (latest-results), `test-runs.ts`, `playground.ts`, `projects.ts` (per-project `builder/agentConfigured`; not the global `config` route)
-  - `packages/core/src/services/` — `agent.ts`, `playground-agent.ts`, `test-runner.ts`, `agent-tools.ts`, `git.ts` (scaffold ignore rules), `SemanticModelFileService` (`src/` → `data_models/`), filesystem backend validation
-  - `apps/api/src/scripts/` — replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (`src/` → `data_models/`)
+  - `packages/core/src/services/` — `agent.ts`, `playground-agent.ts`, `test-runner.ts`, `agent-tools.ts`, `git.ts` (drop `build/` from `.gitignore`, scaffold ignore rules), `SemanticModelFileService` (`src/` → `data_models/`), `publish.ts` (remove `assemble()`/`cleanStaleFiles()`, keep `computeSourceHash()` over source), filesystem backend validation
+  - `apps/api/src/mcp/archmax-route.ts` (read live `data_models/` in both prod and test routes; drop `build/` read and temp-assembly), `apps/api/src/utils/publish-flow.ts` (no assemble before commit)
+  - `apps/api/src/scripts/` — replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (`src/` → `data_models/`); startup `build/` cleanup
   - `apps/worker/src/processor.ts` (playground branching without testAgentId)
   - Schema migration (backfill `TestRun.testAgentName`, then drop TestAgents, unset `TestCase.testAgent`)
   - `apps/docs` (navigation, testing, settings, new agent-scaffold guide)
diff --git a/openspec/changes/restructure-agent-platform/specs/mcp-server/spec.md b/openspec/changes/restructure-agent-platform/specs/mcp-server/spec.md
new file mode 100644
index 0000000..5d6b553
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/mcp-server/spec.md
@@ -0,0 +1,39 @@
+## MODIFIED Requirements
+
+### Requirement: Semantic Layer Tools
+
+The MCP server SHALL expose the following tools for AI agent consumption. The MCP server SHALL identify itself as `"archmax"` in the server name field. All tools operate within the scope of the project identified by the URL slug — `projectId` is no longer a tool parameter. Tools that return semantic model data SHALL filter results based on the authenticated token's `scopes` array. Tools SHALL read semantic model data from the project's current `data_models/` source via in-memory assembly (`SemanticModelFileService.get()`), surfacing compact **markdown** through `SemanticModelDigest`; there is no `build/` artifact and no published-snapshot gate. Both the production and testing MCP endpoints SHALL read the same live `data_models/` source — the MCP tool registration, digest generation, and scope filtering code SHALL be shared between both modes with no conditional branches. If a project has no semantic models (the `data_models/` directory is empty or missing), model-related tools SHALL return an informational message indicating that the project has no semantic models yet.
+
+- `list_connections` — List all active connections for the project
+- `list_semantic_models` — List semantic models the token has access to (filtered by scopes, read from `data_models/`)
+- `get_semantic_model_overview` — Get a compact overview of an accessible semantic model (markdown digest of `data_models/` source)
+- `get_dataset_fields` — Get fields for a dataset within an accessible semantic model (read from `data_models/`)
+
+#### Scenario: List semantic models filtered by token scope
+
+- **WHEN** `list_semantic_models` is called with a token scoped to `["shopify"]`
+- **AND** the project has models `shopify`, `datev`, and `hrworks` in `data_models/`
+- **THEN** only the `shopify` model summary is returned
+
+#### Scenario: Access denied for out-of-scope model
+
+- **WHEN** `get_semantic_model_overview` is called for model `datev`
+- **AND** the token's scopes are `["shopify"]`
+- **THEN** an error content response with `isError: true` is returned indicating access denied
+
+#### Scenario: Get dataset fields respects token scope
+
+- **WHEN** `get_dataset_fields` is called for a dataset in an accessible model
+- **THEN** the fields are returned normally from the assembled `data_models/` data
+
+#### Scenario: No semantic models exist
+
+- **WHEN** `list_semantic_models` is called and the project's `data_models/` directory is empty or missing
+- **THEN** a text response is returned indicating that the project has no semantic models yet
+
+#### Scenario: Production and testing endpoints serve live source
+
+- **WHEN** a tool is called through either the production or the testing MCP endpoint
+- **THEN** the current `data_models/` files are assembled on-the-fly in memory
+- **AND** the same tool code, digest logic, and scope filtering is used for both endpoints
+- **AND** the result reflects the latest saved source state
diff --git a/openspec/changes/restructure-agent-platform/specs/project-git-versioning/spec.md b/openspec/changes/restructure-agent-platform/specs/project-git-versioning/spec.md
new file mode 100644
index 0000000..12a326e
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/project-git-versioning/spec.md
@@ -0,0 +1,18 @@
+## MODIFIED Requirements
+
+### Requirement: Automatic Repository Initialization
+
+Each project's data directory (`<ARCHMAX_DATA_DIR>/projects/<projectId>/`) SHALL be a Git repository. If the `.git` directory does not exist when a Git operation is attempted, the system SHALL initialize it with `git init`, create a `.gitignore` (excluding internal/runtime entries — `large_tool_results/`, `duckdb.db` and its side files, and temp artifacts), and create an initial commit with all existing files. The `.gitignore` SHALL NOT exclude `build/` (the build step is removed and no such directory is produced); `data_models/` and the agent-scaffold directories are versioned as source.
+
+#### Scenario: New project gets a Git repo
+
+- **WHEN** a new project is created
+- **THEN** the project directory is initialized as a Git repository
+- **AND** a `.gitignore` file is created excluding `large_tool_results/`, `duckdb.db*`, and `.*tmp` patterns (and not `build/`)
+- **AND** an initial commit is created if any files exist
+
+#### Scenario: Existing project without Git repo (migration)
+
+- **WHEN** a publish or sync is attempted on a project that lacks a `.git` directory
+- **THEN** the system initializes the repository with all existing files as an initial commit
+- **AND** subsequent operations proceed normally
diff --git a/openspec/changes/restructure-agent-platform/specs/semantic-model-publishing/spec.md b/openspec/changes/restructure-agent-platform/specs/semantic-model-publishing/spec.md
new file mode 100644
index 0000000..79eb3ba
--- /dev/null
+++ b/openspec/changes/restructure-agent-platform/specs/semantic-model-publishing/spec.md
@@ -0,0 +1,107 @@
+## MODIFIED Requirements
+
+### Requirement: Publish Event Model
+
+The system SHALL provide a `PublishEvent` Mongoose model with: `project` (ObjectId ref to Project, required, indexed), `message` (string, required — user-provided publish message), `modelNames` (string array — names of models included in the publish), `contentHash` (string — SHA-256 hash of the project's source content at time of publish, computed over `data_models/` and the agent-scaffold files, excluding derived/internal entries), `createdAt` (Date, auto), `updatedAt` (Date, auto).
+
+#### Scenario: Publish event is recorded
+
+- **WHEN** a user publishes semantic models with message "Added customer lifetime value metric"
+- **THEN** a `PublishEvent` document is created with the message, the list of model names, and the content hash
+
+#### Scenario: Query publish history
+
+- **WHEN** the publish history for a project is queried
+- **THEN** events are returned sorted by `createdAt` descending
+
+### Requirement: Publish API
+
+The API SHALL expose a POST endpoint at `/api/projects/:projectId/publish` that accepts `{ message: string }`. Publishing is **Git versioning only** — it does not assemble or write any build artifact. The endpoint SHALL: validate the message is non-empty, ensure the project directory is a Git repository (lazy init if needed), pull/merge upstream changes if a remote is configured (abort on conflicts), stage all changes and create a Git commit with the user-provided message, create a `PublishEvent`, and push to the remote if configured. The endpoint SHALL return the created `PublishEvent` along with any push warnings. Conflict errors SHALL return a 409 status with the list of conflicted file paths.
+
+#### Scenario: Successful publish (local only)
+
+- **WHEN** a POST request is made with `{ message: "Release v2 with new metrics" }` on a project without a remote
+- **THEN** the current `data_models/` and scaffold files are staged and committed via `isomorphic-git` (no `build/` is produced)
+- **AND** a `PublishEvent` is created
+- **AND** the response includes the event with status 201
+
+#### Scenario: Publish with upstream sync and push
+
+- **WHEN** a publish is triggered on a project with a configured remote
+- **THEN** upstream changes are pulled and merged first
+- **AND** local changes are committed
+- **AND** the commit is pushed to the remote
+- **AND** a `PublishEvent` is created
+
+#### Scenario: Publish aborted due to merge conflicts
+
+- **WHEN** a publish is triggered but the upstream pull results in merge conflicts
+- **THEN** a 409 error is returned with `{ conflicts: true, files: [...] }`
+- **AND** no commit or `PublishEvent` is created
+- **AND** the conflicted files remain on disk with conflict markers for manual resolution
+
+#### Scenario: Publish with empty message
+
+- **WHEN** a POST request is made with an empty or missing message
+- **THEN** a 400 error is returned
+
+#### Scenario: Push failure does not block publish
+
+- **WHEN** a publish is triggered and the commit succeeds but the push to remote fails
+- **THEN** the local commit and `PublishEvent` creation still succeed
+- **AND** the response includes a warning about the push failure with the specific error message
+
+### Requirement: Publish Status API
+
+The API SHALL expose a GET endpoint at `/api/projects/:projectId/publish/status` that returns `{ hasUnpublishedChanges: boolean, lastPublishedAt: string | null, lastMessage: string | null, hasConflicts: boolean }`. Unpublished (uncommitted) changes are detected by comparing a SHA-256 hash of the current source content (`data_models/` + scaffold files, excluding derived/internal entries) against the `contentHash` of the most recent `PublishEvent`. The `hasConflicts` field SHALL be `true` if any YAML file in `data_models/` contains Git conflict markers. Because MCP serves the live `data_models/`, `hasUnpublishedChanges` reflects pending **version-control** changes, not MCP availability.
+
+#### Scenario: No previous publish
+
+- **WHEN** status is requested for a project that has never been published
+- **AND** source models exist
+- **THEN** `hasUnpublishedChanges` is `true`, `lastPublishedAt` is `null`, `lastMessage` is `null`
+
+#### Scenario: No changes since last publish
+
+- **WHEN** status is requested and the current source hash matches the last publish event's `contentHash`
+- **THEN** `hasUnpublishedChanges` is `false`
+
+#### Scenario: Changes exist since last publish
+
+- **WHEN** status is requested and the current source hash differs from the last publish event's `contentHash`
+- **THEN** `hasUnpublishedChanges` is `true`
+
+#### Scenario: No source models exist
+
+- **WHEN** status is requested for a project with no source models
+- **THEN** `hasUnpublishedChanges` is `false`
+
+#### Scenario: Conflict markers detected
+
+- **WHEN** status is requested and a YAML file in `data_models/` contains `<<<<<<<` conflict markers
+- **THEN** `hasConflicts` is `true`
+
+### Requirement: Publish Overlay Dialog
+
+Clicking the publish button SHALL open a modal overlay with: a textarea for the publish message, a "Publish" confirmation button, and a "Cancel" button. The dialog copy SHALL frame publishing as committing a versioned snapshot of the project (and pushing it to the connected repository when configured), not as a gate for MCP availability. The confirmation button SHALL be disabled until the message is non-empty. On confirmation, the publish API is called and the dialog closes on success.
+
+#### Scenario: Open publish dialog
+
+- **WHEN** the user clicks the enabled publish button
+- **THEN** a modal overlay appears with a message textarea, a "Publish" button, and a "Cancel" button
+
+#### Scenario: Submit publish
+
+- **WHEN** the user enters a message and clicks "Publish"
+- **THEN** the publish API is called with the entered message
+- **AND** on success, the dialog closes, a success toast is shown, and the publish button becomes disabled
+
+#### Scenario: Cancel publish
+
+- **WHEN** the user clicks "Cancel" in the publish dialog
+- **THEN** the dialog closes without making any API call
+
+#### Scenario: Publish button disabled during submission
+
+- **WHEN** the publish API call is in progress
+- **THEN** the "Publish" button in the dialog shows a loading state and is not clickable
diff --git a/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md b/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
index 3be0350..660e452 100644
--- a/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/semantic-models/spec.md
@@ -1,3 +1,10 @@
+## REMOVED Requirements
+
+### Requirement: Build Assembly
+
+**Reason**: The disk build step is removed. `PublishService.assemble()` materialized fully-inlined single-file YAMLs under `build/` solely so production MCP could serve a last-published snapshot. MCP now reads the current `data_models/` directly (in-memory assembly + markdown digests), so no `build/` artifact is needed. See `mcp-server` and `semantic-model-publishing` deltas.
+**Migration**: A startup cleanup (see "Build Directory Cleanup" below) removes any existing `build/` directory. `PublishService.assemble()`/`cleanStaleFiles()` are deleted; in-memory assembly continues via `SemanticModelFileService.get()`/`getRawYaml()`.
+
 ## RENAMED Requirements
 
 - FROM: `### Requirement: Improvements UI in Semantic Models Sidebar`
@@ -174,10 +181,10 @@ All file listing operations SHALL exclude entries whose names start with `.` (do
 - **WHEN** the agent lists files in the project directory
 - **THEN** `.git/` and other dotfiles/dotdirs are not included in the listing
 
-#### Scenario: Publish assembly excludes dotfiles
+#### Scenario: Publish file collection excludes dotfiles
 
-- **WHEN** the publish build assembly processes the project directory
-- **THEN** `.git/` contents are not included in the build output
+- **WHEN** the publish file collection (`collectFiles`) processes the project directory
+- **THEN** `.git/` contents are not included in the committed snapshot
 
 ### Requirement: Merge Conflict Detection in YAML
 
@@ -194,3 +201,20 @@ The `SemanticModelFileService` SHALL detect Git merge conflict markers (`<<<<<<<
 - **WHEN** `get("sales")` is called and the file contains conflict markers
 - **THEN** the response includes `hasConflicts: true` and the raw YAML content
 - **AND** parsed fields (datasets, relationships, metrics) are empty or absent
+
+## ADDED Requirements
+
+### Requirement: Build Directory Cleanup
+
+On application startup, the system SHALL remove any `build/` directory found at the root of a project's data directory (`<ARCHMAX_DATA_DIR>/projects/<projectId>/build/`). The former build step that produced these directories is removed (see the REMOVED "Build Assembly" requirement); the directory contained only derived single-file YAMLs and holds no source of record. The cleanup SHALL be idempotent and SHALL run per existing project directory. Source files under `data_models/`, `uploads/`, and the agent-scaffold directories SHALL NOT be affected.
+
+#### Scenario: Stale build directory removed on startup
+
+- **WHEN** the app starts and a project directory contains a `build/` directory
+- **THEN** the `build/` directory is recursively deleted
+- **AND** `data_models/`, `uploads/`, and scaffold directories remain untouched
+
+#### Scenario: Cleanup is idempotent
+
+- **WHEN** the app starts and a project directory has no `build/` directory
+- **THEN** no action is taken and no error is raised
diff --git a/openspec/changes/restructure-agent-platform/tasks.md b/openspec/changes/restructure-agent-platform/tasks.md
index 9b63570..a5e3a37 100644
--- a/openspec/changes/restructure-agent-platform/tasks.md
+++ b/openspec/changes/restructure-agent-platform/tasks.md
@@ -26,7 +26,10 @@
 
 ## 4. Agent scaffold
 
-- [ ] 4.0 Rename model storage `src/` → `data_models/`: update `SemanticModelFileService` (path constant + legacy `src/`/root fallbacks), the agent `FilesystemBackend` prompt guidance, and publish assembly; replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (idempotent startup move from `src/` or root, preserving `uploads/` and scaffold dirs); update/extend tests
+- [ ] 4.0 Rename model storage `src/` → `data_models/`: update `SemanticModelFileService` (path constant + legacy `src/`/root fallbacks) and the agent `FilesystemBackend` prompt guidance; replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (idempotent startup move from `src/` or root, preserving `uploads/` and scaffold dirs); update/extend tests
+- [ ] 4.0a Remove the build step: delete `PublishService.assemble()`/`cleanStaleFiles()`; point both production and testing MCP routes (`archmax-route.ts`) at the live `data_models/` via in-memory `SemanticModelFileService.get()` (drop the `build/` read and temp-assembly); update `archmax-route`/`mcp-tools` empty-state message; remove the `build/` read path; update MCP tests
+- [ ] 4.0b Publish = Git versioning only: remove `assemble()` from `finalizePublish` (`publish-flow.ts`) and the revert path (`git.ts`); keep/retarget `computeSourceHash()` to hash `data_models/` + scaffold source (exclude derived/internal); drop `build/` from `DEFAULT_GITIGNORE`; reframe publish dialog copy (version/share, not "make available via MCP"); update publish + git tests
+- [ ] 4.0c Startup `build/` cleanup: idempotently remove any `build/` directory under each project dir on startup (mirror the existing `AGENTS.md` cleanup); test
 - [ ] 4.1 `.mcp.json` seeding service: create on project creation, recreate-if-missing on builder start, update on slug change; preserve foreign entries **only when credential-safe** (warn + refuse to re-persist secret-looking entries); placeholder token only; unit tests
 - [ ] 4.2 Credential-safe JSON validation on `write_file`/`edit_file` for `.json` paths in the builder filesystem backend: syntax check (mirroring YAML validation) **plus** rejection of literal credential values in `.mcp.json` headers/env/URLs (allow only `${VAR}` placeholders); tests including the literal-Bearer-token rejection case
 - [ ] 4.3 Extend the builder system prompt with the scaffold layout and skills-over-commands guidance
@@ -36,7 +39,7 @@
 ## 5. Navigation & settings UI
 
 - [ ] 5.1 Restructure `app-sidebar.tsx`: Connections group (Data Sources + disabled APIs w/ "soon" tag), Builder leaf, Agent leaf, Testing group (Cases, Runs), Settings group (General, Builder, Agent)
-- [ ] 5.2 New routes `settings/builder.tsx` and `settings/agent.tsx` (inline label–input grids, masked key fields, env-default placeholders for builder, save-then-test connection buttons, reset-to-defaults for builder); move existing settings page to General
+- [ ] 5.2 New routes `settings/builder.tsx` and `settings/agent.tsx` (inline label–input grids; API-key fields show a fixed non-derived hint from `apiKeySet`/`apiKeySource`, never key material; env-default placeholders for builder baseUrl/model only; save-then-test connection buttons; reset-to-defaults for builder); move existing settings page to General
 - [ ] 5.3 Route moves & redirects: `testing/playground → /agent`, `testing/agents → /settings/agent`, `connections/data` + legacy `/data` → `/connections?tool=browser`, `connections/console → /connections?tool=console`; delete the Test Agents page and `testing/agents.tsx`
 
 ## 6. Data Sources header tools

From 2b88e5fe2f9c158b3d2c04d76efed7569a5b7fb4 Mon Sep 17 00:00:00 2001
From: Tobias Grosse-Puppendahl <tobias@grosse-puppendahl.de>
Date: Thu, 11 Jun 2026 11:03:51 +0200
Subject: [PATCH 5/5] docs(openspec): keep production MCP gated by publishing
 (Git HEAD)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reverse the "serve live data_models/" decision: production MCP must only
expose explicitly published models. The build artifact stays removed, but
the publish gate is now the Git commit rather than a build/ directory —
production reads the committed data_models/ from Git HEAD in memory, while
the testing endpoint reads the live working directory. Restores the
"publish from admin UI" empty state and the "available via MCP" publish copy.

Co-authored-by: Cursor <cursoragent@cursor.com>
---
 .../restructure-agent-platform/design.md      | 14 ++++----
 .../restructure-agent-platform/proposal.md    |  6 ++--
 .../specs/mcp-server/spec.md                  | 35 +++++++++++--------
 .../specs/semantic-model-publishing/spec.md   |  6 ++--
 .../restructure-agent-platform/tasks.md       |  4 +--
 5 files changed, 36 insertions(+), 29 deletions(-)
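
Note (not part of the commit message): the source selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `archmax-route.ts` code. The names `ModelFiles`, `ProjectState`, `selectSource`, `listSemanticModels`, and `publish` are hypothetical, the two in-memory maps stand in for the Git HEAD tree (which the spec reads via `isomorphic-git`) and the working directory, and the one-`<name>.yaml`-per-model layout is assumed for brevity.

```typescript
// Hypothetical sketch: maps stand in for the committed tree and the working dir.
type ModelFiles = Map<string, string>; // file name under data_models/ -> YAML text
type Mode = "production" | "testing";

interface ProjectState {
  head: ModelFiles; // committed data_models/ at Git HEAD (last published state)
  workdir: ModelFiles; // live working-directory data_models/
}

// The only mode-dependent step: which snapshot of data_models/ is read.
function selectSource(mode: Mode, state: ProjectState): ModelFiles {
  return mode === "production" ? state.head : state.workdir;
}

// Shared tool code: scope filtering is identical for both endpoints.
// Assumption: a token scope entry equals the model name.
function listSemanticModels(
  mode: Mode,
  state: ProjectState,
  scopes: string[],
): string[] {
  const files = selectSource(mode, state);
  return Array.from(files.keys())
    .map((path) => path.replace(/\.yaml$/, ""))
    .filter((name) => scopes.includes(name))
    .sort();
}

// Publishing modelled as "commit the working dir": HEAD becomes a copy of it.
function publish(state: ProjectState): void {
  state.head = new Map(state.workdir);
}
```

In this sketch a model saved only in the working directory shows up through the testing mode immediately, and through the production mode only after `publish` has run, which is the gate this patch restores.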

diff --git a/openspec/changes/restructure-agent-platform/design.md b/openspec/changes/restructure-agent-platform/design.md
index fa9c216..e130835 100644
--- a/openspec/changes/restructure-agent-platform/design.md
+++ b/openspec/changes/restructure-agent-platform/design.md
@@ -62,16 +62,18 @@ Per-project agent/builder *configured* state is exposed on project-scoped, authe
 
 A test case is *failing* when the most recent `TestRun` embedded result for it has status `failed` or `error`. A new endpoint aggregates this (`latest-results`), powering the **Improvements & Testing** panel. No new persistent state is introduced — the registry is derived from existing `TestRun` data, so it can never drift.
 
-### D8 — Remove the build step; serve MCP from live `data_models/`
+### D8 — Remove the build artifact, but keep the publish gate via Git HEAD
 
-The disk **build step** (`PublishService.assemble()` writing fully-inlined single-file YAMLs to `build/`) is removed entirely. It existed only as the publish boundary so production MCP served a last-published snapshot; everything else (builder agent, playground, test-runner, REST API, model overviews) already reads source directly. MCP model tools surface compact **markdown** via `SemanticModelDigest`, and `execute_query`/`get` assemble a model in memory on demand — none of them need a materialized full-YAML artifact. So:
+The disk **build step** (`PublishService.assemble()` writing fully-inlined single-file YAMLs to `build/`) is removed entirely. It only ever materialized a derived snapshot; MCP model tools surface compact **markdown** via `SemanticModelDigest`, and `execute_query`/`get` assemble a model in memory on demand — none of them need a materialized full-YAML artifact. **Production MCP remains gated by publishing**, but the gate is the Git commit rather than a `build/` directory:
 
-- Production and testing MCP both read the current `data_models/` via in-memory assembly (the path the test route already uses). There is no `build/` directory and no published-snapshot gate; saved models are immediately visible to MCP.
-- **Publish becomes pure Git versioning**: ensure repo → pull → stage `data_models/`+scaffold → commit → record `PublishEvent` → optional push. `hasUnpublishedChanges` now means "uncommitted changes vs the last `PublishEvent`", a versioning signal rather than an MCP-availability gate. The publish UI copy is reframed accordingly (version/share the scaffold, not "make available via MCP").
-- `PublishService.assemble()`/`cleanStaleFiles()` and the `build/` read in the MCP route are deleted; `computeSourceHash()` is retained but hashes `data_models/` (+ scaffold) source only.
+- **Production MCP serves the last published state** = the project repo's latest commit (HEAD). Tools assemble models in memory from the committed `data_models/` tree (read via `isomorphic-git`), so uncommitted working-directory edits are NOT exposed. There is no `build/` artifact.
+- **Testing MCP serves the live working directory** `data_models/` (in-memory assembly), so it reflects the latest unpublished edits. The tool registration, digest generation, and scope filtering code stay shared between both modes; only the source (committed tree vs working dir) differs.
+- **Publish = Git commit (the gate)**: ensure repo → pull → stage `data_models/`+scaffold → commit → record `PublishEvent` → optional push. Committing is what makes models visible to production MCP. `hasUnpublishedChanges` means "working-directory `data_models/` differs from HEAD/last `PublishEvent`" — i.e. there are models not yet published to MCP. Publish UI copy keeps the "make available via MCP" meaning.
+- `PublishService.assemble()`/`cleanStaleFiles()` and the `build/` read in the MCP route are deleted; production MCP instead reads the committed tree. `computeSourceHash()` is retained but hashes `data_models/` (+ scaffold) source only.
 - A **startup migration** removes any existing `build/` directory from project dirs (mirrors the existing startup `AGENTS.md` cleanup). `.gitignore` no longer needs to exclude `build/`.
 
-- *Alternative considered:* keep a publish gate by reading models from the last committed Git tree — rejected; it reintroduces a snapshot indirection for no benefit now that tools consume markdown/in-memory assembly, and live source matches the "semantic process layer" model where the project dir *is* the deliverable.
+- *Alternative considered:* serve production MCP from the live working dir (no gate) — rejected by product requirement; production must only expose explicitly published (committed) models. Reading the committed tree preserves that gate while still eliminating the derived `build/` artifact.
+- *Alternative considered:* keep writing a `build/` snapshot on publish — rejected; the materialized full-YAML artifact is redundant now that tools consume markdown/in-memory assembly, and the Git commit is already the publish boundary.
 
 ## Risks / Trade-offs
 
diff --git a/openspec/changes/restructure-agent-platform/proposal.md b/openspec/changes/restructure-agent-platform/proposal.md
index 68d950d..9eaad2c 100644
--- a/openspec/changes/restructure-agent-platform/proposal.md
+++ b/openspec/changes/restructure-agent-platform/proposal.md
@@ -42,8 +42,8 @@ archmax is repositioning from "a tool that manages semantic descriptions of data
 ### Build step removal (**BREAKING**)
 
 - The disk **build step** (`PublishService.assemble()` → `build/` single-file YAMLs) is removed entirely. MCP tools surface models as **markdown** (`SemanticModelDigest`) and assemble in memory on demand, so the materialized full-YAML artifact is unnecessary.
-- Production MCP now reads the **live `data_models/`** (in-memory assembly, the same path the test endpoint already uses) instead of a published `build/` snapshot — models are available via MCP as soon as they are saved.
-- **Publish becomes pure Git versioning**: commit `data_models/`+scaffold, record a `PublishEvent`, optionally push. `hasUnpublishedChanges` becomes a "pending version-control changes" signal; publish UI copy is reframed away from "make available via MCP".
+- **Production MCP stays gated by publishing**, but the gate is the Git commit, not a `build/` snapshot: production reads the published `data_models/` from the repo's latest commit (Git HEAD) in memory, so uncommitted edits are not exposed. The testing endpoint reads the live working-directory `data_models/`.
+- **Publish = Git commit** (the gate): commit `data_models/`+scaffold, record a `PublishEvent`, optionally push. `hasUnpublishedChanges` means there are saved changes that are not yet committed and therefore not yet available via production MCP; the publish UI keeps the "make available via MCP" meaning.
 - A **startup migration** removes any existing `build/` directory; `.gitignore` no longer excludes `build/`.
 - `.mcp.json` is seeded and maintained by the platform, pointing at the project's MCP endpoint with an env-var token placeholder (never a real token).
 - The builder's file backend gains JSON syntax validation on write (mirroring the existing YAML validation).
@@ -62,7 +62,7 @@ archmax is repositioning from "a tool that manages semantic descriptions of data
   - `packages/core/src/models/` — remove `TestAgent.ts`, edit `Project.ts`, `TestCase.ts`, `TestRun.ts`, `Conversation.ts`
   - `apps/api/src/routes/` — remove `test-agents.ts`; add `llm-settings.ts`, `scaffold.ts`; edit `test-cases.ts` (latest-results), `test-runs.ts`, `playground.ts`, `projects.ts` (per-project `builder/agentConfigured`; not the global `config` route)
   - `packages/core/src/services/` — `agent.ts`, `playground-agent.ts`, `test-runner.ts`, `agent-tools.ts`, `git.ts` (drop `build/` from `.gitignore`, scaffold ignore rules), `SemanticModelFileService` (`src/` → `data_models/`), `publish.ts` (remove `assemble()`/`cleanStaleFiles()`, keep `computeSourceHash()` over source), filesystem backend validation
-  - `apps/api/src/mcp/archmax-route.ts` (read live `data_models/` in both prod and test routes; drop `build/` read and temp-assembly), `apps/api/src/utils/publish-flow.ts` (no assemble before commit)
+  - `apps/api/src/mcp/archmax-route.ts` (production reads committed `data_models/` from Git HEAD; testing reads working-dir `data_models/`; drop `build/` read and temp-assembly), `apps/api/src/utils/publish-flow.ts` (no assemble before commit)
   - `apps/api/src/scripts/` — replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (`src/` → `data_models/`); startup `build/` cleanup
   - `apps/worker/src/processor.ts` (playground branching without testAgentId)
   - Schema migration (backfill `TestRun.testAgentName`, then drop TestAgents, unset `TestCase.testAgent`)
diff --git a/openspec/changes/restructure-agent-platform/specs/mcp-server/spec.md b/openspec/changes/restructure-agent-platform/specs/mcp-server/spec.md
index 5d6b553..a9df15d 100644
--- a/openspec/changes/restructure-agent-platform/specs/mcp-server/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/mcp-server/spec.md
@@ -2,17 +2,17 @@
 
 ### Requirement: Semantic Layer Tools
 
-The MCP server SHALL expose the following tools for AI agent consumption. The MCP server SHALL identify itself as `"archmax"` in the server name field. All tools operate within the scope of the project identified by the URL slug — `projectId` is no longer a tool parameter. Tools that return semantic model data SHALL filter results based on the authenticated token's `scopes` array. Tools SHALL read semantic model data from the project's current `data_models/` source via in-memory assembly (`SemanticModelFileService.get()`), surfacing compact **markdown** through `SemanticModelDigest`; there is no `build/` artifact and no published-snapshot gate. Both the production and testing MCP endpoints SHALL read the same live `data_models/` source — the MCP tool registration, digest generation, and scope filtering code SHALL be shared between both modes with no conditional branches. If a project has no semantic models (the `data_models/` directory is empty or missing), model-related tools SHALL return an informational message indicating that the project has no semantic models yet.
+The MCP server SHALL expose the following tools for AI agent consumption. The MCP server SHALL identify itself as `"archmax"` in the server name field. All tools operate within the scope of the project identified by the URL slug — `projectId` is no longer a tool parameter. Tools that return semantic model data SHALL filter results based on the authenticated token's `scopes` array. Tools SHALL assemble semantic models in memory and surface compact **markdown** through `SemanticModelDigest`; there is no `build/` artifact. **Production remains gated by publishing**: the production endpoint SHALL read the published `data_models/` from the project repository's latest commit (Git HEAD, via `isomorphic-git`), so uncommitted working-directory edits are not exposed. The testing endpoint SHALL read the live working-directory `data_models/`, reflecting the latest unpublished edits. The MCP tool registration, digest generation, and scope filtering code SHALL be shared between both modes with no conditional branches; only the source (committed tree vs working directory) differs. If a project has no published models (no commit exists, or the committed `data_models/` is empty), production model-related tools SHALL return an informational message indicating that the project has no published models and instructing the user to publish from the admin UI.
 
 - `list_connections` — List all active connections for the project
-- `list_semantic_models` — List semantic models the token has access to (filtered by scopes, read from `data_models/`)
-- `get_semantic_model_overview` — Get a compact overview of an accessible semantic model (markdown digest of `data_models/` source)
-- `get_dataset_fields` — Get fields for a dataset within an accessible semantic model (read from `data_models/`)
+- `list_semantic_models` — List semantic models the token has access to (filtered by scopes; production reads the committed `data_models/`)
+- `get_semantic_model_overview` — Get a compact overview of an accessible semantic model (markdown digest of the published `data_models/`)
+- `get_dataset_fields` — Get fields for a dataset within an accessible semantic model (read from the published `data_models/`)
 
 #### Scenario: List semantic models filtered by token scope
 
 - **WHEN** `list_semantic_models` is called with a token scoped to `["shopify"]`
-- **AND** the project has models `shopify`, `datev`, and `hrworks` in `data_models/`
+- **AND** the project has published models `shopify`, `datev`, and `hrworks`
 - **THEN** only the `shopify` model summary is returned
 
 #### Scenario: Access denied for out-of-scope model
@@ -23,17 +23,22 @@ The MCP server SHALL expose the following tools for AI agent consumption. The MC
 
 #### Scenario: Get dataset fields respects token scope
 
-- **WHEN** `get_dataset_fields` is called for a dataset in an accessible model
-- **THEN** the fields are returned normally from the assembled `data_models/` data
+- **WHEN** `get_dataset_fields` is called for a dataset in an accessible published model
+- **THEN** the fields are returned normally from the assembled published data
 
-#### Scenario: No semantic models exist
+#### Scenario: No published models exist
 
-- **WHEN** `list_semantic_models` is called and the project's `data_models/` directory is empty or missing
-- **THEN** a text response is returned indicating that the project has no semantic models yet
+- **WHEN** `list_semantic_models` is called on the production endpoint and the project has no commit (or the committed `data_models/` is empty)
+- **THEN** a text response is returned indicating "No published models. Publish your semantic models from the admin UI to make them available here."
 
-#### Scenario: Production and testing endpoints serve live source
+#### Scenario: Production serves the last published state, not live edits
 
-- **WHEN** a tool is called through either the production or the testing MCP endpoint
-- **THEN** the current `data_models/` files are assembled on-the-fly in memory
-- **AND** the same tool code, digest logic, and scope filtering is used for both endpoints
-- **AND** the result reflects the latest saved source state
+- **WHEN** a model is edited and saved but not yet published, and a production tool is called
+- **THEN** the tool reflects the committed (last-published) `data_models/`, not the uncommitted edit
+
+#### Scenario: Testing endpoint serves the live working directory
+
+- **WHEN** a tool is called through the testing MCP endpoint
+- **THEN** the current working-directory `data_models/` files are assembled on-the-fly in memory
+- **AND** the same tool code, digest logic, and scope filtering are used as in production
+- **AND** the result reflects the latest saved source state, including unpublished edits
diff --git a/openspec/changes/restructure-agent-platform/specs/semantic-model-publishing/spec.md b/openspec/changes/restructure-agent-platform/specs/semantic-model-publishing/spec.md
index 79eb3ba..e2ec201 100644
--- a/openspec/changes/restructure-agent-platform/specs/semantic-model-publishing/spec.md
+++ b/openspec/changes/restructure-agent-platform/specs/semantic-model-publishing/spec.md
@@ -16,7 +16,7 @@ The system SHALL provide a `PublishEvent` Mongoose model with: `project` (Object
 
 ### Requirement: Publish API
 
-The API SHALL expose a POST endpoint at `/api/projects/:projectId/publish` that accepts `{ message: string }`. Publishing is **Git versioning only** — it does not assemble or write any build artifact. The endpoint SHALL: validate the message is non-empty, ensure the project directory is a Git repository (lazy init if needed), pull/merge upstream changes if a remote is configured (abort on conflicts), stage all changes and create a Git commit with the user-provided message, create a `PublishEvent`, and push to the remote if configured. The endpoint SHALL return the created `PublishEvent` along with any push warnings. Conflict errors SHALL return a 409 status with the list of conflicted file paths.
+The API SHALL expose a POST endpoint at `/api/projects/:projectId/publish` that accepts `{ message: string }`. Publishing creates a **Git commit** that becomes the state served by production MCP — it does NOT assemble or write any `build/` artifact. The endpoint SHALL: validate the message is non-empty, ensure the project directory is a Git repository (lazy init if needed), pull/merge upstream changes if a remote is configured (abort on conflicts), stage all changes and create a Git commit with the user-provided message, create a `PublishEvent`, and push to the remote if configured. The endpoint SHALL return the created `PublishEvent` along with any push warnings. Conflict errors SHALL return a 409 status with the list of conflicted file paths.
 
 #### Scenario: Successful publish (local only)
 
@@ -53,7 +53,7 @@ The API SHALL expose a POST endpoint at `/api/projects/:projectId/publish` that
 
 ### Requirement: Publish Status API
 
-The API SHALL expose a GET endpoint at `/api/projects/:projectId/publish/status` that returns `{ hasUnpublishedChanges: boolean, lastPublishedAt: string | null, lastMessage: string | null, hasConflicts: boolean }`. Unpublished (uncommitted) changes are detected by comparing a SHA-256 hash of the current source content (`data_models/` + scaffold files, excluding derived/internal entries) against the `contentHash` of the most recent `PublishEvent`. The `hasConflicts` field SHALL be `true` if any YAML file in `data_models/` contains Git conflict markers. Because MCP serves the live `data_models/`, `hasUnpublishedChanges` reflects pending **version-control** changes, not MCP availability.
+The API SHALL expose a GET endpoint at `/api/projects/:projectId/publish/status` that returns `{ hasUnpublishedChanges: boolean, lastPublishedAt: string | null, lastMessage: string | null, hasConflicts: boolean }`. Unpublished changes are detected by comparing a SHA-256 hash of the current working-directory source content (`data_models/` + scaffold files, excluding derived/internal entries) against the `contentHash` of the most recent `PublishEvent`. The `hasConflicts` field SHALL be `true` if any YAML file in `data_models/` contains Git conflict markers. Because production MCP serves the last published (committed) state, `hasUnpublishedChanges: true` means there are saved models not yet available via production MCP.
 
 #### Scenario: No previous publish
 
@@ -83,7 +83,7 @@ The API SHALL expose a GET endpoint at `/api/projects/:projectId/publish/status`
 
 ### Requirement: Publish Overlay Dialog
 
-Clicking the publish button SHALL open a modal overlay with: a textarea for the publish message, a "Publish" confirmation button, and a "Cancel" button. The dialog copy SHALL frame publishing as committing a versioned snapshot of the project (and pushing it to the connected repository when configured), not as a gate for MCP availability. The confirmation button SHALL be disabled until the message is non-empty. On confirmation, the publish API is called and the dialog closes on success.
+Clicking the publish button SHALL open a modal overlay with: a textarea for the publish message, a "Publish" confirmation button, and a "Cancel" button. The dialog copy SHALL frame publishing as committing the current models so they become available via production MCP (and pushing to the connected repository when configured). The confirmation button SHALL be disabled until the message is non-empty. On confirmation, the publish API is called and the dialog closes on success.
 
 #### Scenario: Open publish dialog
 
diff --git a/openspec/changes/restructure-agent-platform/tasks.md b/openspec/changes/restructure-agent-platform/tasks.md
index a5e3a37..de48dee 100644
--- a/openspec/changes/restructure-agent-platform/tasks.md
+++ b/openspec/changes/restructure-agent-platform/tasks.md
@@ -27,8 +27,8 @@
 ## 4. Agent scaffold
 
 - [ ] 4.0 Rename model storage `src/` → `data_models/`: update `SemanticModelFileService` (path constant + legacy `src/`/root fallbacks) and the agent `FilesystemBackend` prompt guidance; replace `migrate-src-layout.ts` with `migrate-data-models-layout.ts` (idempotent startup move from `src/` or root, preserving `uploads/` and scaffold dirs); update/extend tests
-- [ ] 4.0a Remove the build step: delete `PublishService.assemble()`/`cleanStaleFiles()`; point both production and testing MCP routes (`archmax-route.ts`) at the live `data_models/` via in-memory `SemanticModelFileService.get()` (drop the `build/` read and temp-assembly); update `archmax-route`/`mcp-tools` empty-state message; remove the `build/` read path; update MCP tests
-- [ ] 4.0b Publish = Git versioning only: remove `assemble()` from `finalizePublish` (`publish-flow.ts`) and the revert path (`git.ts`); keep/retarget `computeSourceHash()` to hash `data_models/` + scaffold source (exclude derived/internal); drop `build/` from `DEFAULT_GITIGNORE`; reframe publish dialog copy (version/share, not "make available via MCP"); update publish + git tests
+- [ ] 4.0a Remove the build artifact, keep the publish gate: delete `PublishService.assemble()`/`cleanStaleFiles()` and the `build/` read; production MCP route reads the committed `data_models/` from Git HEAD (via `isomorphic-git`) and assembles in memory; testing MCP route reads the working-dir `data_models/` via `SemanticModelFileService.get()`; share tool/digest/scope code across both, differing only in source; restore the "No published models — publish from the admin UI" empty-state on production; update MCP tests (incl. one asserting unpublished edits are not visible in production but are in testing)
+- [ ] 4.0b Publish = Git commit (the gate): remove `assemble()` from `finalizePublish` (`publish-flow.ts`) and the revert path (`git.ts`) — commit only; keep/retarget `computeSourceHash()` to hash working-dir `data_models/` + scaffold source (exclude derived/internal) for `hasUnpublishedChanges`; drop `build/` from `DEFAULT_GITIGNORE`; keep publish dialog copy as "publish to make available via MCP"; update publish + git tests
 - [ ] 4.0c Startup `build/` cleanup: idempotently remove any `build/` directory under each project dir on startup (mirror the existing `AGENTS.md` cleanup); test
 - [ ] 4.1 `.mcp.json` seeding service: create on project creation, recreate-if-missing on builder start, update on slug change; preserve foreign entries **only when credential-safe** (warn + refuse to re-persist secret-looking entries); placeholder token only; unit tests
 - [ ] 4.2 Credential-safe JSON validation on `write_file`/`edit_file` for `.json` paths in the builder filesystem backend: syntax check (mirroring YAML validation) **plus** rejection of literal credential values in `.mcp.json` headers/env/URLs (allow only `${VAR}` placeholders); tests including the literal-Bearer-token rejection case