Merged
Commits
58 commits
0008a3b
feat(wizard): generalize into a reusable primitive (text steps, optio…
agjs Jul 3, 2026
1a08927
refactor(setup,scaffold): call the generalized runWizard via its opti…
agjs Jul 3, 2026
d5c9bc7
test(wizard): update actionFor decode tests for the text-input contract
agjs Jul 3, 2026
f373077
test(wizard): real-pty e2e (single-select → text edit → apply) in the…
agjs Jul 3, 2026
f6b8b10
docs: generic-wizard + config-ux design specs and the wizard implemen…
agjs Jul 3, 2026
8808f30
fix(wizard): type spaces in text fields; honest text-step hints (PR #…
agjs Jul 3, 2026
0248358
feat(cli): /config settings menu (switch / add a model) on the generi…
agjs Jul 3, 2026
5f5ea77
feat(cli): comprehensive /config settings hub + fix wizard quit-on-ca…
agjs Jul 3, 2026
ec16222
fix(config): one-line menu values + web tools on by default (interact…
agjs Jul 3, 2026
c5fc64e
feat(config): make /config comprehensive; delete ENV cruft
agjs Jul 3, 2026
20ec814
fix(config): stop double-typed text in /config; trim to human settings
agjs Jul 3, 2026
44859ea
docs: fix staleness found in a full page-by-page source cross-reference
agjs Jul 3, 2026
a9c6059
docs(spec): in-harness capability browser (feature discoverability)
agjs Jul 3, 2026
5e1e41f
docs(plan): capability browser implementation plan (7 tasks, TDD)
agjs Jul 3, 2026
9d8c05c
feat(cli): capability registry + anti-drift test
agjs Jul 3, 2026
eb76fff
refactor(render): extract generic owned-stdin menu driver; /config us…
agjs Jul 3, 2026
72ba263
feat(cli): capability browser menu (runCapabilityMenu)
agjs Jul 3, 2026
7080d01
fix(cli): reuse owned-menu driver in /help browser (close() affordanc…
agjs Jul 3, 2026
4bc5ca4
feat(cli): in-REPL scaffold launcher (boringstack/astro/vite)
agjs Jul 3, 2026
bda01c7
feat(cli): in-REPL recipe picker
agjs Jul 3, 2026
7e7da4a
feat(cli): /help opens the capability browser (TTY); text fallback of…
agjs Jul 3, 2026
fab0134
feat(render): inline menu overlay for config + foundation for future …
agjs Jul 3, 2026
ce77614
feat(/help): migrate to inline menu + remove passive capabilities
agjs Jul 4, 2026
96833ad
fix(/help): drop unused suspend/resume from capability-menu deps
agjs Jul 4, 2026
f46842d
refactor(cli): recipe picker on inline menu; delete dead owned-menu
agjs Jul 4, 2026
d9127b0
fix(render): inline menu — stop stacking, style only the selected row…
agjs Jul 4, 2026
890a920
feat(cli): / palette renders inline (like @/help); fix lingering slas…
agjs Jul 4, 2026
7cb278b
chore(config): delete dead deprecated renderMenu (no-deprecated lint)…
agjs Jul 4, 2026
ffd489a
fix(cli): /help command selection double-slashed the name (//sessions)
agjs Jul 4, 2026
acfe52f
fix(editor): preserve trailing bytes after a bracketed paste (P1)
agjs Jul 4, 2026
94e09e0
fix(rules): write generated rule-docs where the reader imports it (P1)
agjs Jul 4, 2026
d15bb01
fix(scaffold): scaffold into a named dir under cwd, not a throwaway t…
agjs Jul 4, 2026
579c0fa
test(wizard): lock the manageInput raw-mode ownership rule (P2)
agjs Jul 4, 2026
ebb57ed
docs(harness): sync subsystem manifest with current code (P3)
agjs Jul 4, 2026
6d70768
feat(cli): gradient TSFORGE banner, clean startup, persistent › prompt
agjs Jul 4, 2026
41ec0fc
feat(cli): chat-style message bubbles + kill the agent-response gap
agjs Jul 4, 2026
494605e
fix(cli): stray › prompt in scrollback + agent text spilling past the…
agjs Jul 4, 2026
902ad25
fix(cli): agent rail wrap must match true display width + leave a margin
agjs Jul 4, 2026
5050621
refactor(cli): extract agent-card rail wrapper + lock it with tests
agjs Jul 4, 2026
bf0cac0
fix(cli): seal the agent card before post-turn hints (rail no longer …
agjs Jul 4, 2026
9adcb55
feat(cli): styled plan-mode footer instead of a plain parenthetical
agjs Jul 4, 2026
49e9208
refactor(lib): env-gated trace() for silent degrade paths (review ite…
agjs Jul 4, 2026
7e73bc7
refactor(gate): split the 1049-line detect-gate.ts god file (review i…
agjs Jul 4, 2026
586ecc5
feat(gate): structured per-stage web gate output (review item 3)
agjs Jul 4, 2026
e82f61c
feat(gate): drop the I-prefix requirement for web code (review item 2)
agjs Jul 4, 2026
631b7d6
refactor(loop): split settleGate into composable, testable steps (rev…
agjs Jul 4, 2026
38c0135
refactor(loop): group ILoopCtx into ctx.tool + ctx.gate (review item 6)
agjs Jul 4, 2026
c7317e7
fix(cli): --version and --help print-and-exit (bug found during refac…
agjs Jul 4, 2026
3383754
refactor(e2e): shared ptyharness.py for the PTY suite
agjs Jul 4, 2026
5d7457e
test(e2e): real-PTY coverage for the editor (typing, paste, @ picker,…
agjs Jul 4, 2026
78b2f65
test(e2e): iTerm2 suite on shared helpers + paste/@-picker interactio…
agjs Jul 4, 2026
70bd98b
test: make staged-gate + settle-steps robust to cwd and machine load
agjs Jul 5, 2026
99e84b9
refactor(cli): split cli.ts (2938 lines) into 7 focused modules
agjs Jul 5, 2026
1b99a88
refactor(loop): extract staged build + askModel decisions from Session
agjs Jul 5, 2026
53a39d5
refactor(editor): extract the @-mention completion state machine
agjs Jul 5, 2026
4be4fec
refactor: trace() the remaining silent degrade paths (CLI, editor, fi…
agjs Jul 5, 2026
af656e1
docs: sync the Astro docs with the shipped surface
agjs Jul 5, 2026
1f7cf2e
fix(render): wrap the --continue replay body inside the agent rail
agjs Jul 5, 2026
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -27,3 +27,4 @@ models.json
/vite.config.ts
/components.json
/src/
__pycache__/
1 change: 1 addition & 0 deletions apps/docs/astro.config.mjs
Expand Up @@ -205,6 +205,7 @@ export default defineConfig({
label: "Reference",
items: [
{ label: "Commands", link: "/reference/commands/" },
{ label: "Input editor", link: "/reference/input-editor/" },
{ label: "Rule catalog", link: "/reference/rules-catalog/" },
{ label: "Roadmap", link: "/reference/roadmap/" },
],
8 changes: 4 additions & 4 deletions apps/docs/src/content/docs/agent/model-agent.mdx
Expand Up @@ -20,12 +20,12 @@ One approved task can involve many agent cycles until the gate passes or tsforge
| Group | Tools | When |
| --- | --- | --- |
| Core | `read`, `run`, `edit`, `create` | always |
| Line edits | `edit_lines` | when hashline is enabled |
| Line edits | `edit_lines` | always (line-number edits with hash verification) |
| Script | `script` | always (programmatic tool calling — batch multi-step tool use in one program); withhold with `TSFORGE_NO_SCRIPT=1` for eval |
| Navigation | `search`, `symbol_search`, `find_references`, `type_at`, `diagnostics`, `rename_symbol`, `move_file`, `organize_imports` | existing-code repos |
| Git context | `git_context` | existing-code repos (read-only: diff/log/blame/show to scope a change); `TSFORGE_NO_GIT_TOOL=1` to withhold |
| Git context | `git_context` | existing-code repos (read-only: diff/log/blame/show to scope a change) |
| Web | `scaffold_web`, `scaffold_ui`, `scaffold_routes`, `add_dependency` | web builds |
| Web research | `package_info`, `package_docs`, `web_fetch`, `web_search`, `web_browse` | when `TSFORGE_WEB=1` (no required API keys or paid browser/search service) |
| Control | `yield_status` | end turn with a summary |
| Web research | `package_info`, `package_docs`, `web_fetch`, `web_search`, `web_browse` | when **Web tools** is on in `/config` (no required API keys or paid browser/search service) |

On greenfield specs, navigation tools are often withheld so the model focuses on creating files instead of exploring an empty tree. See [TypeScript language server](/lsp/typescript-server/).

7 changes: 5 additions & 2 deletions apps/docs/src/content/docs/cli/interactive.mdx
Expand Up @@ -28,7 +28,7 @@ Most users run `tsforge` and stay in the interactive session.
| `--no-gate` | skip auto gate detection |
| `--web` | pre-scaffold web stack + web gate on first build message |
| `--browser <html>` | append headless render check to gate |
| `--plan` | force plan mode on (already the default for interactive sessions) |
| `--plan` | force plan mode on for an interactive session — plan is the default anyway, so this only matters to override a repo that configured an autonomous `policy.mode`; ignored by one-shot/headless |
| `--continue` / `-c` | resume latest saved session for this dir |
| `--resume <id>` | resume a specific session |
| `--log` | append JSONL event stream to `~/.tsforge/logs/` |
Expand All @@ -41,18 +41,21 @@ Model endpoint overrides: `TSFORGE_BASE_URL`, `TSFORGE_MODEL` — see [Environme
| --- | --- |
| `/help` | list commands |
| `/plan` | toggle plan mode (on by default) |
| `/config` | settings hub — model (switch/add), mode, gate, editable scope, and tools (web, TDD); each with a description + live value |
| `/gate <cmd>` | set gate command (`/gate` alone clears) |
| `/files <globs>` | set editable scope |
| `/review [base]` | review your current change (logic, regressions, edge cases) |
| `/map [status\|forget]` | build a structural map of the repo to prime the agent |
| `/trace [logfile]` | summarize a `--log` run (calls, policy decisions, gate verdicts, turns-to-green) |
| `/setup` | infer + write project conventions (the setup wizard) |
| `/model [name]` | list models or switch active model |
| `/sessions` | list saved sessions |
| `/compact` | summarize conversation to free context |
| `/clear` | reset conversation (keeps workspace + gate) |
| `/cost` | rough token estimate |
| `/metrics` | token totals + generation rate (tok/s) this session |
| `/memory` | show learned failure→fix lessons (`/memory forget` clears them) |
| `/exit` | quit |
| `/exit` | quit (`/quit` is an alias) |

Anything else is sent to the agent. While it runs, type to **steer** the next turn. Ctrl-C interrupts the current run.

2 changes: 1 addition & 1 deletion apps/docs/src/content/docs/cli/plan-mode.mdx
Expand Up @@ -13,7 +13,7 @@ Plan mode is a safety rail for ambiguous work. The model can **read** your repo
- When the plan looks right, reply **`approve`**, **`go`**, or **`lgtm`** — the model implements it
- Web builds also accept **`yes`** / **`ok`** at the design checkpoint

There is no disable *flag*: it's a mode you cycle with Shift+Tab. (`tsforge --plan` still forces it on for a one-off launch.)
There is no disable *flag*: it's a mode you cycle with Shift+Tab. (`tsforge --plan` forces plan mode on for an interactive session even in a repo that configured an autonomous `policy.mode` — one-shot and headless runs are autonomous regardless.)

## What the model can do in plan mode

63 changes: 30 additions & 33 deletions apps/docs/src/content/docs/eval/ab-testing.mdx
Expand Up @@ -3,17 +3,17 @@ title: A/B testing
description: Run feature sweeps, compare edit mechanisms, and land defaults from measured wins.
---

Compare [stream rules (TTSR)](/uplift/ttsr/), [hashline](/uplift/hashline/), and [write diagnostics](/uplift/write-diagnostics/) settings across benchmark runs before changing feature defaults. See [Big picture](/big-picture/) for what each feature does.
Compare feature settings across benchmark runs before changing a default. The sweep harness A/Bs any **tool-availability dimension** — whether the model is offered a given tool — by toggling the env var behind it per run. See [Big picture](/big-picture/) for what each feature does.

## Feature flags
## Sweepable dimensions

| Variable | Default | Disable |
| Dimension | On → | Off → |
| --- | --- | --- |
| `TSFORGE_TTSR` | ON | `=0` |
| `TSFORGE_HASHLINE` | ON | `=0` |
| `TSFORGE_LSP_WRITE_FEEDBACK` | ON | `=0` |
| `git` | `git_context` available | `TSFORGE_NO_GIT_TOOL=1` |
| `script` | `script` tool available | `TSFORGE_NO_SCRIPT=1` |
| `web` | web research tools available (`TSFORGE_WEB=1`) | off |

Full flag reference: [Environment variables](/reference/flags/).
Core uplifts ([TTSR](/uplift/ttsr/), [hashline](/uplift/hashline/), [write diagnostics](/uplift/write-diagnostics/)) are always on and no longer sweepable — they landed as defaults from earlier sweeps. Full flag reference: [Environment variables](/reference/flags/).

:::note
Running a sweep drives a real model, so you need an OpenAI-compatible endpoint (the default is local qwen at `http://localhost:8000/v1`; override with `TSFORGE_BASE_URL`/`TSFORGE_MODEL`/`TSFORGE_API_KEY`). The corpus, analysis, and report tooling below ship with the repo and are exercised by the test suite, but the runs themselves need a model.
Expand Down Expand Up @@ -42,29 +42,29 @@ A greenfield seed is regenerated from scratch (the sweep deletes the task's file

`bun run eval:sweep` accepts `TSFORGE_FEATURE_VARIANTS` — a comma-separated list of dimensions to sweep (cartesian product).
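
A minimal sketch of what the cartesian product means here, assuming every dimension is a plain on/off toggle. `expandVariants` is a hypothetical helper written for illustration, not the sweep harness's own code; the label format copies the run IDs shown on this page.

```typescript
// Hypothetical helper: expand sweep dimensions into every on/off combination.
// Assumes each dimension is a binary toggle, as the run IDs on this page suggest.
function expandVariants(dimensions: string[]): string[] {
  let variants: string[][] = [[]];
  for (const dim of dimensions) {
    const next: string[][] = [];
    for (const partial of variants) {
      next.push([...partial, `${dim}=on`]);
      next.push([...partial, `${dim}=off`]);
    }
    variants = next;
  }
  // Join each combination the way variant labels appear in run IDs.
  return variants.map((parts) => parts.join(","));
}

// Same shape as TSFORGE_FEATURE_VARIANTS=git,script
const labels = expandVariants("git,script".split(","));
console.log(labels);
```

Two dimensions give four variants, so the number of runs grows as 2^n in the number of swept dimensions before repeats and temperatures are multiplied in.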

### Hashline on/off
### script on/off

```bash
TSFORGE_SEED=math \
TSFORGE_SEED=checkout \
TSFORGE_TEMPS=0 \
TSFORGE_REPEATS=2 \
TSFORGE_FEATURE_VARIANTS=hashline \
TSFORGE_FEATURE_VARIANTS=script \
bun run eval:sweep
```

Creates four runs: `math-hashline=on-t0-...` and `math-hashline=off-t0-...` (two repeats each).
Creates four runs: `checkout-script=on-t0-...` and `checkout-script=off-t0-...` (two repeats each).

### TTSR × hashline
### git × script

```bash
TSFORGE_SEED=orders \
TSFORGE_SEED=fix-regression \
TSFORGE_TEMPS=0.5 \
TSFORGE_REPEATS=3 \
TSFORGE_FEATURE_VARIANTS=ttsr,hashline \
TSFORGE_FEATURE_VARIANTS=git,script \
bun run eval:sweep
```

Runs `3 repeats × 2 temps × 4 variants = 24` runs with IDs like `orders-ttsr=on,hashline=off-t0.5-...`.
Runs `3 repeats × 2 temps × 4 variants = 24` runs with IDs like `fix-regression-git=on,script=off-t0.5-...`.

### git_context on/off

Expand All @@ -86,7 +86,7 @@ Each run directory contains `run.log` (human transcript) and `result.json` (stru

```bash
# newest sweep under evals/runs, comparing every variant to the all-off baseline
TSFORGE_BASELINE="ttsr=off,hashline=off temp=0" bun run eval:report
TSFORGE_BASELINE="git=off,script=off temp=0" bun run eval:report

# or point at a specific sweep file
bun run eval:report evals/runs/sweep-math-20260613-120000.json
Expand All @@ -97,8 +97,8 @@ It prints the table and writes it next to the sweep JSON as `…​.report.md`:
```
| Variant | Runs | Pass | 95% CI | Cycles | Ms | Quality | vs baseline |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ttsr=off,hashline=off temp=0 | 10 | 60% | 31%–83% | 6.1 | 41000 | 3.8 | baseline |
| ttsr=on,hashline=on temp=0 | 10 | 90% | 60%–98% | 4.7 | 33000 | 4.2 | +30% (z=2.13) * |
| git=off,script=off temp=0 | 10 | 60% | 31%–83% | 6.1 | 41000 | 3.8 | baseline |
| git=on,script=on temp=0 | 10 | 90% | 60%–98% | 4.7 | 33000 | 4.2 | +30% (z=2.13) * |
```

Wilson intervals (not naive ±) keep the bounds sane at small N, and the z-test tells you whether a pass-rate gap is signal or noise — the bar for "measured wins" before flipping a default.
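
For readers who want to check the bounds by hand, here is a sketch of the standard Wilson score interval. The function name `wilsonInterval` is an assumption for illustration, and this is not necessarily how `eval:report` computes it internally, though 6 and 9 passes out of 10 do give the 95% CI values in the example table.

```typescript
// Standard Wilson score interval for a pass rate.
// z = 1.96 corresponds to a two-sided 95% confidence level.
function wilsonInterval(passes: number, runs: number, z = 1.96): [number, number] {
  if (runs <= 0) throw new Error("runs must be positive");
  const p = passes / runs;
  const z2 = z * z;
  const denom = 1 + z2 / runs;
  // The center is pulled toward 50%, which keeps bounds inside [0, 1] at small N.
  const center = (p + z2 / (2 * runs)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / runs + z2 / (4 * runs * runs))) / denom;
  return [center - half, center + half];
}

const [lo, hi] = wilsonInterval(6, 10);
console.log(`${Math.round(lo * 100)}%–${Math.round(hi * 100)}%`); // 31%–83%
```

A naive normal-approximation interval for 9/10 would extend past 100%; the Wilson form cannot, which is why it suits ten-run sweeps.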
Expand All @@ -109,23 +109,21 @@ Pass rate tells you *how often* a variant failed; the **failure breakdown** tell

```
### Failure breakdown
- ttsr=off,hashline=off temp=0: type-error×3, no-progress×1
- ttsr=on,hashline=on temp=0: type-error×1
- git=off,script=off temp=0: type-error×3, no-progress×1
- git=on,script=on temp=0: type-error×1
```

Each failed run is classified from its event stream into one of: `type-error`, `lint-rule`, `hallucinated-import`, `tool-malformed`, `edit-reject`, `degeneration`, `no-progress`, `build-fail`, `browser-fail`, `route-phantom`, or `timeout`. This turns a sweep from "feature X passes more" into "feature X eliminates the `type-error` failures" — pointing at the next rule, prompt, or fixer to build. The same classifier powers the `failure class` line in [`cli-metrics`](/observability/metrics/) for a single `--log` run.

## Compare edit mechanisms

After a sweep, use `bun run eval:benchmark` to compare edit tool performance:
`bun run eval:benchmark` reports edit-tool performance across a set of run directories — useful for spotting how `edit` vs `edit_lines` behave, stale-anchor recovery rates, and token cost across models or seeds:

```bash
bun run eval:benchmark \
evals/money-hashline=on-t0-* \
evals/money-hashline=off-t0-*
bun run eval:benchmark evals/checkout-*
```

Output table compares variants on:
Output table compares runs on:

| Metric | Meaning |
| --- | --- |
Expand All @@ -141,8 +139,7 @@ Output table compares variants on:
```bash
bun run eval:benchmark \
--json evals/comparison.json \
evals/money-hashline=on-t0-* \
evals/money-hashline=off-t0-*
evals/checkout-*
```

## Run artifacts
Expand All @@ -154,10 +151,10 @@ Each run directory contains:

```json
{
"seed": "money",
"runId": "money-hashline=on-t0-20260612-120000-1",
"seed": "checkout",
"runId": "checkout-script=on-t0-20260612-120000-1",
"temperature": 0,
"features": { "TSFORGE_HASHLINE": "1" },
"features": { "TSFORGE_NO_SCRIPT": "0" },
"status": "done",
"cycles": 5,
"ms": 42000,
Expand All @@ -183,11 +180,11 @@ Each run directory contains:

## How to read results

**Edit success** — if `hashline=on` has higher `edit_lines` success than `hashline=off` `edit` success, hashline is reducing rejections.
**Edit success** — higher `edit_lines` success rate (vs `edit` rejections) means the hashline mechanism is reducing stale-anchor failures.

**Stale recovery** — non-zero recovery counts on hashline-on runs show 3-way merge is active; correlate with pass rate.
**Stale recovery** — non-zero recovery counts show the 3-way merge is active; correlate with pass rate.

**Turns to green** — lower on feature-on variants means less loop churn.
**Turns to green** — lower on a variant means less loop churn.

**Token efficiency** — smaller `mean args (bytes)` at similar success rate is better.

3 changes: 2 additions & 1 deletion apps/docs/src/content/docs/guardrails/rule-packs.mdx
Expand Up @@ -21,14 +21,15 @@ These load without waiting for a dependency match:

| ID | What it covers |
| --- | --- |
| `generic-ts` | Core TypeScript safety rules for every project (the bundled ESLint safety config) |
| `env-access` | Validated env access, no `process.exit` in libraries |
| `module-boundaries` | Layering, no React in services |
| `code-flow` | Deterministic time/random, early returns |
| `comment-hygiene` | No narration, PR refs, or historical comments |
| `security` | Command injection, ReDoS, DOM XSS, silent catch blocks, no tokens in storage |
| `runtime-boundaries` | Open redirects, SSRF fetches, prototype pollution, webhook verify, upload limits |

`generic-ts` is a detection label only — strict TypeScript comes from `tsc` and the bundled ESLint config.
`generic-ts` runs on every project alongside `tsc`; stack detection layers framework-specific packs (`react`, `elysia`, `nextjs`, …) on top.

## Pack list

8 changes: 4 additions & 4 deletions apps/docs/src/content/docs/integrations/web-tools.mdx
Expand Up @@ -3,7 +3,7 @@ title: Web research (no API keys)
description: "Opt-in web_fetch, web_search, package_info, package_docs, and web_browse tools — no paid search/browser API, no required service key."
---

Set `TSFORGE_WEB=1` to give the agent read-only internet research tools. They're built for **no required API keys and no paid vendor coupling**: npm metadata comes from the configured registry, search defaults to DuckDuckGo's keyless HTML endpoint, pages are extracted locally, and browser rendering uses local Playwright/Chromium when available. Off by default, so a run without the flag has no network reach beyond your model endpoint.
Interactive sessions get read-only internet research tools **on by default** (an assistant that can't look things up is silly); toggle them under **Web tools** in [`/config`](/cli/interactive/). They're built for **no required API keys and no paid vendor coupling**: npm metadata comes from the configured registry, search defaults to DuckDuckGo's keyless HTML endpoint, pages are extracted locally, and browser rendering uses local Playwright/Chromium when available. One-shot and eval runs stay **off** unless you set `TSFORGE_WEB=1`, so headless sweeps have no network reach beyond your model endpoint.

```bash
TSFORGE_WEB=1 tsforge "update the deprecated API call — check the library's current docs"
Expand All @@ -29,13 +29,13 @@ For current TypeScript/library work, ask the agent to search the official host f
Check the current TanStack Query docs before changing this hook. Use domain-scoped web search if needed.
```

## Why opt-in
## When they're active

The tools are read-only and offline-safe, but web access is still more reach than the agent has by default — so it's a deliberate flag, not an always-on capability. Under a policy mode that denies `network` (e.g. `ci`), the tools are unavailable even with the flag set. See [Permissions & policy](/guardrails/policy/).
The tools are read-only and offline-safe. Interactive sessions enable them by default, but one-shot and eval runs stay offline unless you opt in — so headless sweeps are deterministic. Under a policy mode that denies `network` (e.g. `ci`), the tools are unavailable even with the flag set. See [Permissions & policy](/guardrails/policy/).
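
One way to read the precedence described on this page, as a sketch under stated assumptions: a policy that denies `network` wins, then an explicit `TSFORGE_WEB` value, then the per-run-kind default. The type and function names are hypothetical, and the `/config` toggle is deliberately not modeled because the page does not say how it interacts with the env var.

```typescript
// Hypothetical types for illustration; not tsforge's actual API.
type RunKind = "interactive" | "one-shot" | "eval";

interface WebToolInputs {
  runKind: RunKind;
  envValue?: string; // value of TSFORGE_WEB, if set
  policyDeniesNetwork: boolean; // e.g. the `ci` policy mode
}

function webToolsEnabled(input: WebToolInputs): boolean {
  // A policy that denies network access wins even when the flag is set.
  if (input.policyDeniesNetwork) return false;
  // An explicit env value forces the tools on or off.
  if (input.envValue === "1") return true;
  if (input.envValue === "0") return false;
  // Otherwise fall back to the default: on for interactive, off for headless runs.
  return input.runKind === "interactive";
}

console.log(webToolsEnabled({ runKind: "eval", policyDeniesNetwork: false }));
```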

| Env var | Default | Effect |
| --- | --- | --- |
| `TSFORGE_WEB` | off | enable keyless research tools (`=1`) |
| `TSFORGE_WEB` | on interactive, off one-shot/eval | force keyless research tools on (`=1`) or off (`=0`) |
| `TSFORGE_NPM_REGISTRY` | npm registry | registry used by `package_info` / `package_docs` |
| `TSFORGE_SEARXNG_URL` | unset | route `web_search` to a SearXNG instance you already run |
| `TSFORGE_WEB_SEARCH_BACKEND` | auto | `duckduckgo` or `searxng`; `searxng` fails closed if no SearXNG URL is set |
10 changes: 10 additions & 0 deletions apps/docs/src/content/docs/loop/gate-floor.mdx
Expand Up @@ -111,6 +111,16 @@ If the session scaffolds a new browser app (`scaffold_web` or `tsforge --web`),

Details: [Web scaffolding](/scaffold/web/).

### Staged gate progress & failures

The web gate runs as **named stages** rather than one opaque `&&` chain. Each stage prints a `━━ <label> ━━` banner and streams its output live; a passing stage prints `✓ <label>`. On the first failure the runner prints

```
✗ <label> FAILED (exit N)
```

and **stops** — later stages don't run, and the failing stage's exit code is preserved. So when a build goes red, both you and the agent's feedback loop see *which* stage broke (`vite build`, `typecheck`, `lint`, `type-aware lint`, `stub check`, `format`, `tests`, or `browser smoke`) instead of a wall of interleaved output. The core (non-web) gate is short enough that it stays a plain command chain.
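
The stop-on-first-failure behavior can be sketched as below. This is an illustration under assumptions, not the actual runner: the `Stage` and `StagedResult` shapes and `runStages` are hypothetical, and each stage is modeled as a function returning an exit code rather than a spawned process with live output.

```typescript
// Hypothetical shapes for illustration; not tsforge's actual types.
interface Stage {
  label: string;
  run: () => number; // returns the stage's exit code
}

interface StagedResult {
  exitCode: number;
  failedStage: string | null;
  log: string[];
}

function runStages(stages: Stage[]): StagedResult {
  const log: string[] = [];
  for (const stage of stages) {
    log.push(`━━ ${stage.label} ━━`);
    const code = stage.run();
    if (code !== 0) {
      // Stop at the first failure and preserve that stage's exit code.
      log.push(`✗ ${stage.label} FAILED (exit ${code})`);
      return { exitCode: code, failedStage: stage.label, log };
    }
    log.push(`✓ ${stage.label}`);
  }
  return { exitCode: 0, failedStage: null, log };
}

const result = runStages([
  { label: "vite build", run: () => 0 },
  { label: "typecheck", run: () => 2 },
  { label: "lint", run: () => 0 }, // never runs: the chain stops at typecheck
]);
console.log(result.log.join("\n"));
```

Returning the failed stage's label alongside the exit code is what lets a feedback loop name the broken stage instead of parsing interleaved output.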

Add a one-off page render check with `--browser path/to/index.html`.

### Accessibility, screenshots, and a perf budget
4 changes: 0 additions & 4 deletions apps/docs/src/content/docs/loop/greenfield.mdx
Expand Up @@ -54,10 +54,6 @@ Each role can run on its own model (names from your [models.json](/inference/mod
tsforge run kanban "build a kanban board"
```

## Contract negotiation (experimental)

Set `TSFORGE_CONTRACT=1` to make the generator and evaluator agree a **build contract** for each feature *before* building — the generator proposes "I'll build X, verified by Y" and the evaluator pushes back until it's concrete. The agreed contract then anchors the implementation, and the negotiation is saved to `.tsforge/greenfield/contracts/<feature>.md`. Off by default — it's unproven and adds model calls.

## Unattended runs & scheduling

Greenfield runs are long and headless-friendly. There's no built-in scheduler — wire one with your OS:
2 changes: 1 addition & 1 deletion apps/docs/src/content/docs/loop/spec-runner.mdx
Expand Up @@ -29,7 +29,7 @@ Outputs include per-task status (`done`, `stuck`, interrupted) and a final pass/
```bash
bun run eval:spec

TSFORGE_SEED=money TSFORGE_FEATURE_VARIANTS=hashline \
TSFORGE_SEED=money TSFORGE_FEATURE_VARIANTS=script \
bun run eval:sweep
```

2 changes: 1 addition & 1 deletion apps/docs/src/content/docs/loop/validation.mdx
Expand Up @@ -39,7 +39,7 @@ tsforge's primary stop is **lack of progress, not a raw turn count**. Two guards
- **Same-error persistence** — if one specific error (the same `file` + `rule`) survives **5 consecutive** fix cycles, tsforge stops, even if _other_ errors are changing around it. The stop names the blocker: `stuck on no-explicit-any in src/views/Foo/index.tsx after 5 attempts (last: …)`. Interactively, you get that diagnosis and the prompt back — the session stays alive, so you can re-steer.
- **Whole-set stall** — a coarser net: the entire error set unchanged for 6 cycles.
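
The same-error persistence guard can be sketched as a per-error streak counter. This is an illustrative sketch only: `GateError`, `SameErrorGuard`, and the choice to count each observed cycle as one attempt are assumptions, and the coarser whole-set stall check is not modeled.

```typescript
// Hypothetical types; only the threshold of 5 comes from the text above.
interface GateError {
  file: string;
  rule: string;
}

const SAME_ERROR_LIMIT = 5;

class SameErrorGuard {
  private streaks: Record<string, number> = {};

  // Record one fix cycle's errors; returns a stop reason, or null to continue.
  observe(errors: GateError[]): string | null {
    const next: Record<string, number> = {};
    let stop: string | null = null;
    for (const err of errors) {
      const key = `${err.file}::${err.rule}`;
      if (key in next) continue; // count each file+rule pair once per cycle
      const streak = (this.streaks[key] || 0) + 1;
      next[key] = streak;
      if (streak >= SAME_ERROR_LIMIT && stop === null) {
        stop = `stuck on ${err.rule} in ${err.file} after ${streak} attempts`;
      }
    }
    this.streaks = next; // errors absent this cycle lose their streak
    return stop;
  }
}

const guard = new SameErrorGuard();
const sticky: GateError = { file: "src/views/Foo/index.tsx", rule: "no-explicit-any" };
let verdict: string | null = null;
for (let cycle = 1; cycle <= 5; cycle++) {
  // Other errors churn every cycle, but the sticky one survives.
  verdict = guard.observe([sticky, { file: `src/other${cycle}.ts`, rule: "no-unused-vars" }]);
}
console.log(verdict);
```

Keying on `file` plus `rule` is what lets the guard fire even while unrelated errors keep changing around the blocker.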

The **turn cap** is only a runaway backstop now. Interactive sessions ride a high ceiling (≈250 turns) so long, productive back-and-forth is never cut off; headless/eval runs keep a real cap (40, or 180 for web builds) since no human is present to intervene.
The **turn cap** is only a runaway backstop now. Interactive sessions ride a high ceiling (≈250 turns) so long, productive back-and-forth is never cut off; headless/eval runs keep a real cap (40, or 400 for web builds) since no human is present to intervene.

When the gate fails, tsforge sends structured errors (file, line, rule name, message) back to the model, not a generic failure blob. That is what makes repair workable.

4 changes: 2 additions & 2 deletions apps/docs/src/content/docs/lsp/typescript-server.mdx
Expand Up @@ -27,9 +27,9 @@ Offered when tsforge detects real code to explore (existing repo, resumed sessio
| `move_file` | move/rename a file and rewrite every importer |
| `organize_imports` | sort and clean imports |

Disable navigation tools: `TSFORGE_NO_LSP_TOOLS=1`. Disable write feedback: `TSFORGE_LSP_WRITE_FEEDBACK=0`.
Navigation and write feedback (instant per-file type diagnostics after each edit) are always on for real work; navigation can be withheld for eval/headless runs with `TSFORGE_NO_LSP_TOOLS=1`.

On existing repos the model is also offered `git_context` — read-only, structured access to history and diffs (scope a fix to what changed). It is git-backed, not part of the language server, so `TSFORGE_NO_LSP_TOOLS` does not affect it; disable it with `TSFORGE_NO_GIT_TOOL=1`. See [Git context](/reference/flags/#git-context).
On existing repos the model is also offered `git_context` — read-only, structured access to history and diffs (scope a fix to what changed). It is git-backed, not part of the language server, so `TSFORGE_NO_LSP_TOOLS` does not affect it; withhold it for eval/headless runs with `TSFORGE_NO_GIT_TOOL=1`. See [Git context](/reference/flags/#git-context).

## Safe auto-fixes
