diff --git a/README.md b/README.md index 4a14326..6d2e28a 100644 --- a/README.md +++ b/README.md @@ -77,7 +77,7 @@ Full install instructions: [`docs/how-to/install.md`](docs/how-to/install.md). F ## Benchmarks -We use a canonical prompt — an AI-driven roguelike POC — to spot regressions as the system evolves. See [`benchmarks/`](benchmarks/) for the prompt, expected output shape, and a `run.sh` to re-run it. +Canonical prompts for regression-spotting as the system evolves live under [`benchmarks/`](benchmarks/). See that directory for the layout convention. ## Contributing diff --git a/benchmarks/README.md b/benchmarks/README.md index a5416ca..8510048 100644 --- a/benchmarks/README.md +++ b/benchmarks/README.md @@ -2,23 +2,26 @@ Canonical prompts we run against the decision-record planning pipeline to catch regressions as the system evolves. -| Benchmark | Prompt | Effort | Purpose | -|---|---|---|---| -| [roguelike-ai-poc](roguelike-ai-poc/) | AI-driven roguelike where the agent plays the game | `poc` | Exercises all five pipeline phases on a small, well-bounded problem. The original dogfood case. | +_(No public benchmarks committed yet. Add new ones as `benchmarks//` with a `prompt.md`, a `reference/` artifact snapshot, and a `run.sh` runner. 
See the structure described below.)_ -## How to run a benchmark +## Benchmark layout + +Each benchmark lives in its own directory: + +``` +benchmarks// +├── prompt.md # the exact idea, effort level, and what "good output" looks like +├── reference/ # a baseline artifact snapshot from a canonical run +└── run.sh # one-shot runner that fires the CLI against a fresh tmp dir +``` + +## How to run ```bash cd benchmarks/ ./run.sh ``` -Each benchmark has: - -- `prompt.md` — the exact idea, effort level, and what "good output" looks like -- `reference/` — a baseline artifact snapshot from a canonical run -- `run.sh` — one-shot runner that fires the CLI against a fresh tmp dir - ## What we look for when comparing runs Each benchmark's `prompt.md` defines its own success criteria. Generally: diff --git a/benchmarks/roguelike-ai-poc/prompt.md b/benchmarks/roguelike-ai-poc/prompt.md deleted file mode 100644 index 745bdb9..0000000 --- a/benchmarks/roguelike-ai-poc/prompt.md +++ /dev/null @@ -1,63 +0,0 @@ -# Benchmark: roguelike-ai-poc - -This is the canonical benchmark for the decision-record planning pipeline. We re-run it as the system evolves to spot regressions in plan quality, gate behavior, agent prompts, and rendering. - -## The prompt - -**Idea (free-form):** - -> A minimal roguelike where the player primes an AI agent with a strategy, then the agent autonomously navigates a single ASCII-rendered room over a tick system until it wins the objective or dies. Goal: prove the agent-as-player concept with the smallest viable surface area. - -**Effort level:** `poc` - -## Invocation - -```bash -decision-record \ - --title "AI-driven roguelike POC" \ - --description "$(cat <<'EOF' -A minimal roguelike where the player primes an AI agent with a strategy, then the agent autonomously navigates a single ASCII-rendered room over a tick system until it wins the objective or dies. Goal: prove the agent-as-player concept with the smallest viable surface area. 
-EOF -)" \ - --effort poc \ - --cwd ./tmp-roguelike-bench \ - --yes -``` - -Or the one-shot wrapper: `./run.sh` (creates a fresh tmp dir, runs the CLI, prints where the artifacts landed). - -## What "good output" looks like - -A run is healthy if the produced plan: - -- **Pipeline reaches `handed-off`** — every gate passes, sign-offs recorded, project finalized. -- **3-5 significant decisions** are proposed and accepted — language, world representation, agent action contract, tick-loop control. (Not 1; not 12.) -- **5-8 vertical-slice tasks** — bootstrap → world → renderer → agent client → action handlers → game loop → CLI entry. Every leaf ≤ 16h (poc cap). Every task references at least one accepted DR. -- **The seed library is consulted** for at least the language decision (`dr_seed_search` + `dr_seed_load` on `language-choice`). -- **Graph validates clean** — no cycles, no orphan deps, no missing decision refs. -- **Artifacts emitted** — `dr/project.json`, `dr/decisions/*.json`, `dr/tasks/*.json`, rendered `.md` siblings, `dr/index.html`. `.dr/events.jsonl` contains a coherent audit trail. - -## Reference snapshot - -`./reference/` holds the artifacts from the canonical run produced by hand-driving the MCP tools (2026-05-16, the dogfood test that originally produced this benchmark). Treat it as a "this is what good looks like" baseline, not a strict equality target — different agent runs will pick slightly different positions, phrasing, and task decomposition, and that's fine. - -When comparing a new run against `./reference/`: - -- **Same final phase, gate decisions, event mix** → no regression. -- **More/fewer decisions or tasks** → check whether the new run is denser/sparser appropriately or whether the agent over- or under-decomposed. -- **Different selected positions** → fine if defensible; concerning if the argument is weaker. -- **Missing seed usage** → bug or prompt drift; the agent should reach for `language-choice` here. 
-- **Tasks without decision refs** → regression. Every task must link to a DR. -- **Validation failures** → regression. The graph must validate. - -## What this benchmark exercises - -| Surface | Coverage | -|---|---| -| Phase machine | All five transitions: intake → scoping → deciding → decomposing → handing-off → handed-off | -| Seed library | At least one `dr_seed_load` (language-choice) | -| Decision lifecycle | propose → update with position + argument → accept (no review under poc preset) | -| Task graph | Multi-node dependency chain with decision_refs | -| Gates | `min_tasks=3`, `max_task_estimate_hours=16`, `require_human_signoff_phases=['handing-off']` | -| Render | Markdown per record + static HTML index | -| Handoff | Filesystem path (Linear path is exercised by separate live test) | diff --git a/benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.json b/benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.json deleted file mode 100644 index f07d744..0000000 --- a/benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.json +++ /dev/null @@ -1,115 +0,0 @@ -{ - "id": "0001-choose-the-implementation-language", - "number": 1, - "slug": "choose-the-implementation-language", - "title": "Choose the implementation language", - "status": "accepted", - "template_variant": "architecture", - "created_at": "2026-05-17T04:13:38.681Z", - "updated_at": "2026-05-17T04:13:38.685Z", - "summary": "Decide the primary implementation language for the project.", - "issue": "Every other foundational decision (runtime, package manager, framework choices, testing tools) flows from the language choice. Picking this early and explicitly avoids drift.", - "assumptions": [ - "Team has existing language strengths to lean on.", - "Project lifespan is long enough that hiring and onboarding matter.", - "Ecosystem maturity matters for the project's domain." 
- ], - "constraints": [ - "Team's current expertise.", - "Target runtime environments (browser, server, native, embedded).", - "Performance and memory budgets.", - "Licensing or compliance restrictions on language ecosystems." - ], - "positions": [ - { - "title": "TypeScript", - "description": "Strongly typed JavaScript. Best for full-stack web work, ubiquitous tooling.", - "pros": [ - "Ubiquitous in web", - "Strong types catch errors early", - "Massive ecosystem", - "Frontend/backend code sharing" - ], - "cons": [ - "Build step overhead", - "Type system can be over-engineered", - "Slower than native languages for hot paths" - ], - "links": [] - }, - { - "title": "Python", - "description": "Dynamic, batteries-included. Best for data work, scripting, ML, fast prototypes.", - "pros": [ - "Excellent ML/data ecosystem", - "Fast to write", - "Readable", - "Huge stdlib" - ], - "cons": [ - "Slow runtime without C extensions", - "GIL limits concurrency", - "Dynamic typing → runtime errors" - ], - "links": [] - }, - { - "title": "Go", - "description": "Statically typed, compiled, built for concurrent services.", - "pros": [ - "Simple language", - "Single binary deployment", - "Strong concurrency primitives", - "Fast compile times" - ], - "cons": [ - "Generics still maturing", - "Verbose error handling", - "Less rich third-party ecosystem than JS/Python" - ], - "links": [] - }, - { - "title": "Rust", - "description": "Memory-safe systems language. Best for performance-critical or systems work.", - "pros": [ - "No GC, predictable performance", - "Memory safety", - "Excellent tooling (cargo)", - "Strong types" - ], - "cons": [ - "Steep learning curve", - "Slower to ship initial features", - "Compile times can be long" - ], - "links": [] - } - ], - "opinions": [], - "argument": "Python is fastest to write for a single-script game-loop POC. The OpenAI SDK + a tiny terminal renderer fit naturally; no build step or transpile loop slows iteration. 
Team is comfortable with Python and the project never needs to leave a single repo.", - "selected_position": "Python", - "implications": [ - "Use the official openai Python SDK for agent calls.", - "Single-file or small-module layout; no package manager beyond pip/uv.", - "Pin to Python 3.11+ for ergonomic match-statement parsing of agent actions." - ], - "depends_on": [], - "related_decisions": [], - "related_artifacts": [], - "review": [], - "sign_off": { - "by": "human", - "actor": "kj", - "at": "2026-05-17T04:13:38.685Z", - "notes": "poc preset, no review required" - }, - "seed_origin": "language-choice", - "tags": [ - "foundation", - "poc", - "foundation", - "architecture", - "stack" - ] -} diff --git a/benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.md b/benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.md deleted file mode 100644 index 8a3a4b3..0000000 --- a/benchmarks/roguelike-ai-poc/reference/decisions/0001-choose-the-implementation-language.md +++ /dev/null @@ -1,120 +0,0 @@ -# 0001-choose-the-implementation-language — Choose the implementation language - -| Field | Value | -| --- | --- | -| Status | `accepted` | -| Template | `architecture` | -| Updated | 2026-05-17T04:13:38.685Z | -| Selected | **Python** | -| Depends on | _(none)_ | - -## Summary - -Decide the primary implementation language for the project. - -## Issue - -Every other foundational decision (runtime, package manager, framework choices, testing tools) flows from the language choice. Picking this early and explicitly avoids drift. - -## Assumptions - -- Team has existing language strengths to lean on. -- Project lifespan is long enough that hiring and onboarding matter. -- Ecosystem maturity matters for the project's domain. - -## Constraints - -- Team's current expertise. -- Target runtime environments (browser, server, native, embedded). -- Performance and memory budgets. 
-- Licensing or compliance restrictions on language ecosystems. - -## Positions - -### TypeScript - -Strongly typed JavaScript. Best for full-stack web work, ubiquitous tooling. - -**Pros** - -- Ubiquitous in web -- Strong types catch errors early -- Massive ecosystem -- Frontend/backend code sharing - -**Cons** - -- Build step overhead -- Type system can be over-engineered -- Slower than native languages for hot paths - -### Python ✅ - -Dynamic, batteries-included. Best for data work, scripting, ML, fast prototypes. - -**Pros** - -- Excellent ML/data ecosystem -- Fast to write -- Readable -- Huge stdlib - -**Cons** - -- Slow runtime without C extensions -- GIL limits concurrency -- Dynamic typing → runtime errors - -### Go - -Statically typed, compiled, built for concurrent services. - -**Pros** - -- Simple language -- Single binary deployment -- Strong concurrency primitives -- Fast compile times - -**Cons** - -- Generics still maturing -- Verbose error handling -- Less rich third-party ecosystem than JS/Python - -### Rust - -Memory-safe systems language. Best for performance-critical or systems work. - -**Pros** - -- No GC, predictable performance -- Memory safety -- Excellent tooling (cargo) -- Strong types - -**Cons** - -- Steep learning curve -- Slower to ship initial features -- Compile times can be long - -## Argument - -Python is fastest to write for a single-script game-loop POC. The OpenAI SDK + a tiny terminal renderer fit naturally; no build step or transpile loop slows iteration. Team is comfortable with Python and the project never needs to leave a single repo. - -## Implications - -- Use the official openai Python SDK for agent calls. -- Single-file or small-module layout; no package manager beyond pip/uv. -- Pin to Python 3.11+ for ergonomic match-statement parsing of agent actions. 
- -## Sign-off - -- **By:** kj (human) -- **At:** 2026-05-17T04:13:38.685Z -- **Notes:** poc preset, no review required - ---- - -_Instantiated from seed: `language-choice`_ diff --git a/benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.json b/benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.json deleted file mode 100644 index 7afe41a..0000000 --- a/benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.json +++ /dev/null @@ -1,85 +0,0 @@ -{ - "id": "0002-define-the-world-representation-and-renderer", - "number": 2, - "slug": "define-the-world-representation-and-renderer", - "title": "Define the world representation and renderer", - "status": "accepted", - "template_variant": "data-model", - "created_at": "2026-05-17T04:13:38.686Z", - "updated_at": "2026-05-17T04:13:38.688Z", - "summary": "How the room is stored in memory and rendered to the terminal each tick.", - "issue": "The world is small (one 10×10 room) but the representation must support: easy frame rendering, fast collision/hazard checks, and a stable serialization that the agent can read on each tick. Pick a model now so the action handlers and renderer can converge.", - "assumptions": [ - "10×10 fixed grid", - "Single player entity", - "Static tiles set at startup", - "Frame fits in a single terminal redraw" - ], - "constraints": [ - "Frame must be readable both by humans and the LLM", - "No external graphics libraries" - ], - "positions": [ - { - "title": "Nested list of chars", - "description": "world: list[list[str]] indexed by [y][x]. 
Player position stored separately.", - "pros": [ - "Simplest possible", - "Trivial to mutate", - "Renders by row-join" - ], - "cons": [ - "No type safety on tile semantics", - "Have to scan grid for entity positions" - ], - "links": [] - }, - { - "title": "Tile-grid + entity dict", - "description": "static_tiles: list[list[str]] for walls/floor/hazard/exit; entities: dict[id, {pos, hp, glyph}] overlaid at render time.", - "pros": [ - "Separates static map from dynamic state", - "Easy to add entities later if needed", - "Clean serialization to JSON" - ], - "cons": [ - "Two structures to keep consistent", - "Slightly more code" - ], - "links": [] - }, - { - "title": "Single 2D numpy array + glyph table", - "description": "Each cell is an int; render by mapping ints to glyphs.", - "pros": [ - "Compact", - "Fast", - "Numpy is familiar" - ], - "cons": [ - "Numpy is overkill for 10×10", - "Adds a dep we do not otherwise need", - "Less Pythonic for tiny data" - ], - "links": [] - } - ], - "opinions": [], - "argument": "Static map + entity overlay is the simplest model that survives the day-2 question can we add a second entity? without a rewrite. It serializes naturally to JSON for the LLM payload and keeps render code in one row-join.", - "selected_position": "Tile-grid + entity dict", - "implications": [ - "Tile glyphs: # wall, . floor, X hazard, > exit; entities overlay (@ for player).", - "Each tick the renderer composes static_tiles + entity glyphs at their positions.", - "JSON state sent to the agent: { frame: [], hp, tick, exit_pos, player_pos }." 
- ], - "depends_on": [], - "related_decisions": [], - "related_artifacts": [], - "review": [], - "sign_off": { - "by": "human", - "actor": "kj", - "at": "2026-05-17T04:13:38.688Z" - }, - "tags": [] -} diff --git a/benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.md b/benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.md deleted file mode 100644 index dfbf675..0000000 --- a/benchmarks/roguelike-ai-poc/reference/decisions/0002-define-the-world-representation-and-renderer.md +++ /dev/null @@ -1,92 +0,0 @@ -# 0002-define-the-world-representation-and-renderer — Define the world representation and renderer - -| Field | Value | -| --- | --- | -| Status | `accepted` | -| Template | `data-model` | -| Updated | 2026-05-17T04:13:38.688Z | -| Selected | **Tile-grid + entity dict** | -| Depends on | _(none)_ | - -## Summary - -How the room is stored in memory and rendered to the terminal each tick. - -## Issue - -The world is small (one 10×10 room) but the representation must support: easy frame rendering, fast collision/hazard checks, and a stable serialization that the agent can read on each tick. Pick a model now so the action handlers and renderer can converge. - -## Assumptions - -- 10×10 fixed grid -- Single player entity -- Static tiles set at startup -- Frame fits in a single terminal redraw - -## Constraints - -- Frame must be readable both by humans and the LLM -- No external graphics libraries - -## Positions - -### Nested list of chars - -world: list[list[str]] indexed by [y][x]. Player position stored separately. - -**Pros** - -- Simplest possible -- Trivial to mutate -- Renders by row-join - -**Cons** - -- No type safety on tile semantics -- Have to scan grid for entity positions - -### Tile-grid + entity dict ✅ - -static_tiles: list[list[str]] for walls/floor/hazard/exit; entities: dict[id, {pos, hp, glyph}] overlaid at render time. 
- -**Pros** - -- Separates static map from dynamic state -- Easy to add entities later if needed -- Clean serialization to JSON - -**Cons** - -- Two structures to keep consistent -- Slightly more code - -### Single 2D numpy array + glyph table - -Each cell is an int; render by mapping ints to glyphs. - -**Pros** - -- Compact -- Fast -- Numpy is familiar - -**Cons** - -- Numpy is overkill for 10×10 -- Adds a dep we do not otherwise need -- Less Pythonic for tiny data - -## Argument - -Static map + entity overlay is the simplest model that survives the day-2 question can we add a second entity? without a rewrite. It serializes naturally to JSON for the LLM payload and keeps render code in one row-join. - -## Implications - -- Tile glyphs: # wall, . floor, X hazard, > exit; entities overlay (@ for player). -- Each tick the renderer composes static_tiles + entity glyphs at their positions. -- JSON state sent to the agent: { frame: [], hp, tick, exit_pos, player_pos }. - -## Sign-off - -- **By:** kj (human) -- **At:** 2026-05-17T04:13:38.688Z diff --git a/benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.json b/benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.json deleted file mode 100644 index 0e98040..0000000 --- a/benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.json +++ /dev/null @@ -1,83 +0,0 @@ -{ - "id": "0003-define-the-agent-action-contract", - "number": 3, - "slug": "define-the-agent-action-contract", - "title": "Define the agent action contract", - "status": "accepted", - "template_variant": "architecture", - "created_at": "2026-05-17T04:13:38.689Z", - "updated_at": "2026-05-17T04:13:38.690Z", - "summary": "How the LLM receives the world state per tick and how it returns the chosen action.", - "issue": "The agent must produce a structured, validated action every tick. 
We need the protocol pinned so the game loop never has to guess what the agent meant.", - "assumptions": [ - "OpenAI-compatible API is the LLM transport", - "Strategy prompt is supplied once at startup", - "Per-tick latency budget ~2-5s is acceptable" - ], - "constraints": [ - "Action set is small (move N/S/E/W + noop)", - "Agent must not stall the game with malformed output", - "Must be debuggable from logs" - ], - "positions": [ - { - "title": "Plain-text response parsing", - "description": "Agent returns N/S/E/W/noop as plain text; we parse first token.", - "pros": [ - "Lowest token cost", - "Works with any model" - ], - "cons": [ - "Brittle to extra punctuation/prose", - "No reasoning surface", - "Hard to audit why" - ], - "links": [] - }, - { - "title": "Tool-call (function calling) with one tool: do_action(direction)", - "description": "Define a single OpenAI tool; agent invokes it once per tick with a strict enum direction.", - "pros": [ - "Schema-validated", - "Free reasoning text alongside the call", - "Easy to extend with new actions later" - ], - "cons": [ - "Slightly more tokens per call", - "Requires a model that supports function calling" - ], - "links": [] - }, - { - "title": "JSON-only response with output_config", - "description": "Force agent to emit {\"action\":\"N\",\"reason\":\"…\"} via structured outputs.", - "pros": [ - "Schema-validated", - "Reasoning captured in same payload" - ], - "cons": [ - "Some providers do not honor strict mode", - "Slightly more setup than tool-call" - ], - "links": [] - } - ], - "opinions": [], - "argument": "Tool-calling is the cleanest contract: the model gets free-form reasoning in `content` AND a strict-enum action in `tool_calls`. We can log both, and extending to new actions later is just adding enum values. 
Plain-text parsing trades 100 tokens of savings for a constant brittleness tax.", - "selected_position": "Tool-call (function calling) with one tool: do_action(direction)", - "implications": [ - "Define tool `do_action` with input_schema requiring `direction` in {N,S,E,W,noop}.", - "Use tool_choice=\"required\" each tick to force a call.", - "Log the assistant message text (the reasoning) alongside the chosen direction for replay/debug." - ], - "depends_on": [], - "related_decisions": [], - "related_artifacts": [], - "review": [], - "sign_off": { - "by": "human", - "actor": "kj", - "at": "2026-05-17T04:13:38.690Z" - }, - "tags": [] -} diff --git a/benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.md b/benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.md deleted file mode 100644 index 1bd6e3a..0000000 --- a/benchmarks/roguelike-ai-poc/reference/decisions/0003-define-the-agent-action-contract.md +++ /dev/null @@ -1,90 +0,0 @@ -# 0003-define-the-agent-action-contract — Define the agent action contract - -| Field | Value | -| --- | --- | -| Status | `accepted` | -| Template | `architecture` | -| Updated | 2026-05-17T04:13:38.690Z | -| Selected | **Tool-call (function calling) with one tool: do_action(direction)** | -| Depends on | _(none)_ | - -## Summary - -How the LLM receives the world state per tick and how it returns the chosen action. - -## Issue - -The agent must produce a structured, validated action every tick. We need the protocol pinned so the game loop never has to guess what the agent meant. 
- -## Assumptions - -- OpenAI-compatible API is the LLM transport -- Strategy prompt is supplied once at startup -- Per-tick latency budget ~2-5s is acceptable - -## Constraints - -- Action set is small (move N/S/E/W + noop) -- Agent must not stall the game with malformed output -- Must be debuggable from logs - -## Positions - -### Plain-text response parsing - -Agent returns N/S/E/W/noop as plain text; we parse first token. - -**Pros** - -- Lowest token cost -- Works with any model - -**Cons** - -- Brittle to extra punctuation/prose -- No reasoning surface -- Hard to audit why - -### Tool-call (function calling) with one tool: do_action(direction) ✅ - -Define a single OpenAI tool; agent invokes it once per tick with a strict enum direction. - -**Pros** - -- Schema-validated -- Free reasoning text alongside the call -- Easy to extend with new actions later - -**Cons** - -- Slightly more tokens per call -- Requires a model that supports function calling - -### JSON-only response with output_config - -Force agent to emit {"action":"N","reason":"…"} via structured outputs. - -**Pros** - -- Schema-validated -- Reasoning captured in same payload - -**Cons** - -- Some providers do not honor strict mode -- Slightly more setup than tool-call - -## Argument - -Tool-calling is the cleanest contract: the model gets free-form reasoning in `content` AND a strict-enum action in `tool_calls`. We can log both, and extending to new actions later is just adding enum values. Plain-text parsing trades 100 tokens of savings for a constant brittleness tax. - -## Implications - -- Define tool `do_action` with input_schema requiring `direction` in {N,S,E,W,noop}. -- Use tool_choice="required" each tick to force a call. -- Log the assistant message text (the reasoning) alongside the chosen direction for replay/debug. 
- -## Sign-off - -- **By:** kj (human) -- **At:** 2026-05-17T04:13:38.690Z diff --git a/benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.json b/benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.json deleted file mode 100644 index 4f6becd..0000000 --- a/benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.json +++ /dev/null @@ -1,68 +0,0 @@ -{ - "id": "0004-define-the-tick-loop-and-termination-conditions", - "number": 4, - "slug": "define-the-tick-loop-and-termination-conditions", - "title": "Define the tick loop and termination conditions", - "status": "accepted", - "template_variant": "architecture", - "created_at": "2026-05-17T04:13:38.691Z", - "updated_at": "2026-05-17T04:13:38.692Z", - "summary": "How the game advances tick by tick, when it stops, and how the user observes it.", - "issue": "With an LLM in the loop, each tick is slow (~2-5s). We need a predictable loop with hard stops so the POC always terminates and is always watchable.", - "assumptions": [ - "One-player synchronous game", - "User runs the script in a terminal and watches frames", - "LLM calls happen on the same thread" - ], - "constraints": [ - "Must terminate on win, death, or step limit", - "Frame must visibly update each tick", - "Must not deadlock on a stuck agent" - ], - "positions": [ - { - "title": "Synchronous loop with step cap", - "description": "while not terminal: render → ask agent → apply → check win/death. 
Hard cap at N steps (e.g., 50).", - "pros": [ - "Simplest mental model", - "Easy to log", - "Predictable termination" - ], - "cons": [ - "UI freezes during LLM call (acceptable for POC)" - ], - "links": [] - }, - { - "title": "Async loop with timeout per tick", - "description": "Wrap each agent call in a 10s timeout; on timeout, treat as noop.", - "pros": [ - "Robust to slow API", - "Game keeps moving" - ], - "cons": [ - "More complex", - "Asyncio inside a CLI script is heavier than warranted" - ], - "links": [] - } - ], - "opinions": [], - "argument": "For a single-window terminal demo, synchronous is fine. Adding asyncio doubles the code size for no demo-visible benefit. The step cap protects against an agent that wanders forever and ensures every run terminates.", - "selected_position": "Synchronous loop with step cap", - "implications": [ - "Step cap = 50; on cap, exit with status \"timeout\" and final HP.", - "Use time.sleep(0.05) after each render so the user can see the frames advance.", - "Loop logs each tick to stdout: frame, action, reasoning, hp, tick#." 
- ], - "depends_on": [], - "related_decisions": [], - "related_artifacts": [], - "review": [], - "sign_off": { - "by": "human", - "actor": "kj", - "at": "2026-05-17T04:13:38.692Z" - }, - "tags": [] -} diff --git a/benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.md b/benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.md deleted file mode 100644 index 0d83a25..0000000 --- a/benchmarks/roguelike-ai-poc/reference/decisions/0004-define-the-tick-loop-and-termination-conditions.md +++ /dev/null @@ -1,74 +0,0 @@ -# 0004-define-the-tick-loop-and-termination-conditions — Define the tick loop and termination conditions - -| Field | Value | -| --- | --- | -| Status | `accepted` | -| Template | `architecture` | -| Updated | 2026-05-17T04:13:38.692Z | -| Selected | **Synchronous loop with step cap** | -| Depends on | _(none)_ | - -## Summary - -How the game advances tick by tick, when it stops, and how the user observes it. - -## Issue - -With an LLM in the loop, each tick is slow (~2-5s). We need a predictable loop with hard stops so the POC always terminates and is always watchable. - -## Assumptions - -- One-player synchronous game -- User runs the script in a terminal and watches frames -- LLM calls happen on the same thread - -## Constraints - -- Must terminate on win, death, or step limit -- Frame must visibly update each tick -- Must not deadlock on a stuck agent - -## Positions - -### Synchronous loop with step cap ✅ - -while not terminal: render → ask agent → apply → check win/death. Hard cap at N steps (e.g., 50). - -**Pros** - -- Simplest mental model -- Easy to log -- Predictable termination - -**Cons** - -- UI freezes during LLM call (acceptable for POC) - -### Async loop with timeout per tick - -Wrap each agent call in a 10s timeout; on timeout, treat as noop. 
- -**Pros** - -- Robust to slow API -- Game keeps moving - -**Cons** - -- More complex -- Asyncio inside a CLI script is heavier than warranted - -## Argument - -For a single-window terminal demo, synchronous is fine. Adding asyncio doubles the code size for no demo-visible benefit. The step cap protects against an agent that wanders forever and ensures every run terminates. - -## Implications - -- Step cap = 50; on cap, exit with status "timeout" and final HP. -- Use time.sleep(0.05) after each render so the user can see the frames advance. -- Loop logs each tick to stdout: frame, action, reasoning, hp, tick#. - -## Sign-off - -- **By:** kj (human) -- **At:** 2026-05-17T04:13:38.692Z diff --git a/benchmarks/roguelike-ai-poc/reference/events.jsonl b/benchmarks/roguelike-ai-poc/reference/events.jsonl deleted file mode 100644 index 42ab62f..0000000 --- a/benchmarks/roguelike-ai-poc/reference/events.jsonl +++ /dev/null @@ -1,33 +0,0 @@ -{"at":"2026-05-17T04:12:02.030Z","actor":"agent","kind":"project_initialized","entity_kind":"project","entity_id":"ai-driven-roguelike-poc","payload":{"effort_level":"poc"}} -{"at":"2026-05-17T04:12:40.988Z","actor":"agent","kind":"phase_advanced","entity_kind":"phase","entity_id":"scoping","payload":{"from":"intake","to":"scoping"}} -{"at":"2026-05-17T04:12:40.991Z","actor":"agent","kind":"scope_updated","entity_kind":"project","entity_id":"ai-driven-roguelike-poc","payload":{"scope":{"in_scope":["A 10×10 ASCII-rendered single room with walls (#), floor (.), player (@), exit (>), and a hazard tile (X)","Tick-based game loop: each tick prints the frame, then queries the agent for one action","A small action vocabulary: move N/S/E/W and noop","Player has HP; stepping on hazard removes HP; reaching exit = win, HP=0 = death","Strategy prompt provided once at startup, fed to the agent as system prompt for every tick","LLM agent receives current frame + HP + tick number, returns a single action"],"out_of_scope":["Multiple rooms, dungeon 
generation, procedural levels","Combat with enemies, NPCs, monsters","Inventory, items, equipment","Save/load, persistence","Visual UI beyond ASCII to terminal","Multiplayer, networking","Self-improving agent loops or RL training"],"success_criteria":["A user can run a single command, supply a strategy prompt, and watch the agent play until win or death","Win and death paths both observed in manual playtests","Different strategy prompts produce visibly different agent behavior","End-to-end run completes in under 60 seconds wall time on a typical OpenAI API call"],"nice_to_have":["Configurable room layout from a text file","Replay log written to disk for post-hoc inspection","A few preset strategy prompts to demo (cautious, greedy, exploratory)"]}}} -{"at":"2026-05-17T04:12:40.991Z","actor":"agent","kind":"phase_advanced","entity_kind":"phase","entity_id":"deciding","payload":{"from":"scoping","to":"deciding"}} -{"at":"2026-05-17T04:13:38.681Z","actor":"agent","kind":"seed_loaded","entity_kind":"decision","entity_id":"0001-choose-the-implementation-language","payload":{"seed_name":"language-choice"}} -{"at":"2026-05-17T04:13:38.684Z","actor":"agent","kind":"decision_updated","entity_kind":"decision","entity_id":"0001-choose-the-implementation-language","payload":{"changed":["argument","selected_position","implications"]}} -{"at":"2026-05-17T04:13:38.685Z","actor":"human","actor_name":"kj","kind":"decision_accepted","entity_kind":"decision","entity_id":"0001-choose-the-implementation-language"} -{"at":"2026-05-17T04:13:38.686Z","actor":"agent","kind":"decision_proposed","entity_kind":"decision","entity_id":"0002-define-the-world-representation-and-renderer","payload":{"template_variant":"data-model"}} -{"at":"2026-05-17T04:13:38.687Z","actor":"agent","kind":"decision_updated","entity_kind":"decision","entity_id":"0002-define-the-world-representation-and-renderer","payload":{"changed":["argument","selected_position","implications"]}} 
-{"at":"2026-05-17T04:13:38.688Z","actor":"human","actor_name":"kj","kind":"decision_accepted","entity_kind":"decision","entity_id":"0002-define-the-world-representation-and-renderer"} -{"at":"2026-05-17T04:13:38.689Z","actor":"agent","kind":"decision_proposed","entity_kind":"decision","entity_id":"0003-define-the-agent-action-contract","payload":{"template_variant":"architecture"}} -{"at":"2026-05-17T04:13:38.689Z","actor":"agent","kind":"decision_updated","entity_kind":"decision","entity_id":"0003-define-the-agent-action-contract","payload":{"changed":["argument","selected_position","implications"]}} -{"at":"2026-05-17T04:13:38.690Z","actor":"human","actor_name":"kj","kind":"decision_accepted","entity_kind":"decision","entity_id":"0003-define-the-agent-action-contract"} -{"at":"2026-05-17T04:13:38.691Z","actor":"agent","kind":"decision_proposed","entity_kind":"decision","entity_id":"0004-define-the-tick-loop-and-termination-conditions","payload":{"template_variant":"architecture"}} -{"at":"2026-05-17T04:13:38.692Z","actor":"agent","kind":"decision_updated","entity_kind":"decision","entity_id":"0004-define-the-tick-loop-and-termination-conditions","payload":{"changed":["argument","selected_position","implications"]}} -{"at":"2026-05-17T04:13:38.692Z","actor":"human","actor_name":"kj","kind":"decision_accepted","entity_kind":"decision","entity_id":"0004-define-the-tick-loop-and-termination-conditions"} -{"at":"2026-05-17T04:13:38.694Z","actor":"agent","kind":"phase_advanced","entity_kind":"phase","entity_id":"decomposing","payload":{"from":"deciding","to":"decomposing"}} -{"at":"2026-05-17T04:14:22.524Z","actor":"agent","kind":"task_proposed","entity_kind":"task","entity_id":"T0001-bootstrap-repository","payload":{"decision_refs":["0001-choose-the-implementation-language"],"depends_on":[]}} 
-{"at":"2026-05-17T04:14:22.526Z","actor":"agent","kind":"task_proposed","entity_kind":"task","entity_id":"T0002-implement-world-module-tile-grid-entity-dict","payload":{"decision_refs":["0002-define-the-world-representation-and-renderer"],"depends_on":["T0001-bootstrap-repository"]}} -{"at":"2026-05-17T04:14:22.527Z","actor":"agent","kind":"task_proposed","entity_kind":"task","entity_id":"T0003-implement-frame-renderer","payload":{"decision_refs":["0002-define-the-world-representation-and-renderer"],"depends_on":["T0002-implement-world-module-tile-grid-entity-dict"]}} -{"at":"2026-05-17T04:14:22.528Z","actor":"agent","kind":"task_proposed","entity_kind":"task","entity_id":"T0004-implement-openai-agent-client","payload":{"decision_refs":["0003-define-the-agent-action-contract"],"depends_on":["T0001-bootstrap-repository"]}} -{"at":"2026-05-17T04:14:22.529Z","actor":"agent","kind":"task_proposed","entity_kind":"task","entity_id":"T0005-implement-action-handlers-and-termination-checks","payload":{"decision_refs":["0002-define-the-world-representation-and-renderer"],"depends_on":["T0002-implement-world-module-tile-grid-entity-dict"]}} -{"at":"2026-05-17T04:14:22.530Z","actor":"agent","kind":"task_proposed","entity_kind":"task","entity_id":"T0006-implement-the-tick-based-game-loop","payload":{"decision_refs":["0004-define-the-tick-loop-and-termination-conditions","0002-define-the-world-representation-and-renderer"],"depends_on":["T0003-implement-frame-renderer","T0004-implement-openai-agent-client","T0005-implement-action-handlers-and-termination-checks"]}} -{"at":"2026-05-17T04:14:22.532Z","actor":"agent","kind":"task_proposed","entity_kind":"task","entity_id":"T0007-implement-cli-entry-script","payload":{"decision_refs":["0001-choose-the-implementation-language","0004-define-the-tick-loop-and-termination-conditions"],"depends_on":["T0006-implement-the-tick-based-game-loop"]}} 
-{"at":"2026-05-17T04:14:22.534Z","actor":"agent","kind":"graph_validated","payload":{"valid":true,"task_count":7,"error_count":0,"warning_count":0}} -{"at":"2026-05-17T04:14:30.972Z","actor":"agent","kind":"graph_validated","payload":{"valid":true,"task_count":7,"error_count":0,"warning_count":0}} -{"at":"2026-05-17T04:14:37.477Z","actor":"agent","kind":"graph_validated","payload":{"valid":true,"task_count":7,"error_count":0,"warning_count":0}} -{"at":"2026-05-17T04:14:44.523Z","actor":"human","actor_name":"kj","kind":"phase_advanced","entity_kind":"phase","entity_id":"handing-off","payload":{"from":"decomposing","to":"handing-off","notes":"All decisions accepted, graph validates clean."}} -{"at":"2026-05-17T04:14:44.523Z","actor":"human","actor_name":"kj","kind":"sign_off_recorded","entity_kind":"phase","entity_id":"handing-off"} -{"at":"2026-05-17T04:14:44.538Z","actor":"agent","kind":"render_run","payload":{"decisions":4,"tasks":7}} -{"at":"2026-05-17T04:14:44.540Z","actor":"human","actor_name":"kj","kind":"export_started","entity_kind":"project","entity_id":"ai-driven-roguelike-poc","payload":{"target":"filesystem"}} -{"at":"2026-05-17T04:14:44.540Z","actor":"human","actor_name":"kj","kind":"export_completed","entity_kind":"project","entity_id":"ai-driven-roguelike-poc","payload":{"target":"filesystem","issue_count":7,"document_count":4}} -{"at":"2026-05-17T04:14:44.544Z","actor":"agent","kind":"render_run","payload":{"decisions":4,"tasks":7}} diff --git a/benchmarks/roguelike-ai-poc/reference/index.html b/benchmarks/roguelike-ai-poc/reference/index.html deleted file mode 100644 index 75276fc..0000000 --- a/benchmarks/roguelike-ai-poc/reference/index.html +++ /dev/null @@ -1,231 +0,0 @@ - - - - - -AI-driven roguelike POC — Decision Record - - - -
- -
-
ai-driven-roguelike-poc
-

AI-driven roguelike POC

-
- Phase: handed-off - Effort: poc - Updated: 2026-05-17T04:14:44.540Z - Decisions: 4 (4 accepted) - Tasks: 7 (0 done) -
-
- -

A minimal roguelike where the player primes an AI agent with a strategy, then the agent autonomously navigates a single ASCII-rendered room over a tick system until it wins the objective or dies. Goal: prove the agent-as-player concept with the smallest viable surface area.

- -
-

Scope

-
-
-

In scope

-
  • A 10×10 ASCII-rendered single room with walls (#), floor (.), player (@), exit (>), and a hazard tile (X)
  • Tick-based game loop: each tick prints the frame, then queries the agent for one action
  • A small action vocabulary: move N/S/E/W and noop
  • Player has HP; stepping on hazard removes HP; reaching exit = win, HP=0 = death
  • Strategy prompt provided once at startup, fed to the agent as system prompt for every tick
  • LLM agent receives current frame + HP + tick number, returns a single action
-
-

Success criteria

-
  • A user can run a single command, supply a strategy prompt, and watch the agent play until win or death
  • Win and death paths both observed in manual playtests
  • Different strategy prompts produce visibly different agent behavior
  • End-to-end run completes in under 60 seconds wall time on a typical OpenAI API call
-
-

Out of scope

-
  • Multiple rooms, dungeon generation, procedural levels
  • Combat with enemies, NPCs, monsters
  • Inventory, items, equipment
  • Save/load, persistence
  • Visual UI beyond ASCII to terminal
  • Multiplayer, networking
  • Self-improving agent loops or RL training
-
-

Nice to have

-
  • Configurable room layout from a text file
  • Replay log written to disk for post-hoc inspection
  • A few preset strategy prompts to demo (cautious, greedy, exploratory)
-
-
-
-
-

Handed off

-
- Target: filesystem - At: 2026-05-17T04:14:44.540Z - - -
-
- -

Decisions

| ID | Title | Status | Selected | Depends on |
| --- | --- | --- | --- | --- |
| 0001-choose-the-implementation-language | Choose the implementation language [architecture] | accepted | Python | |
| 0002-define-the-world-representation-and-renderer | Define the world representation and renderer [data-model] | accepted | Tile-grid + entity dict | |
| 0003-define-the-agent-action-contract | Define the agent action contract [architecture] | accepted | Tool-call (function calling) with one tool: do_action(direction) | |
| 0004-define-the-tick-loop-and-termination-conditions | Define the tick loop and termination conditions [architecture] | accepted | Synchronous loop with step cap | |
- -

Task graph

| ID | Title | Status | Pri | Estimate | Depends on | Decision refs |
| --- | --- | --- | --- | --- | --- | --- |
| T0001-bootstrap-repository | Bootstrap repository | ready | p0 | 1h | | 0001-choose-the-implementation-language |
| T0002-implement-world-module-tile-grid-entity-dict | Implement world module (tile grid + entity dict) | open | p0 | 2h | T0001-bootstrap-repository | 0002-define-the-world-representation-and-renderer |
| T0003-implement-frame-renderer | Implement frame renderer | open | p0 | 1h | T0002-implement-world-module-tile-grid-entity-dict | 0002-define-the-world-representation-and-renderer |
| T0004-implement-openai-agent-client | Implement OpenAI agent client | open | p0 | 2h | T0001-bootstrap-repository | 0003-define-the-agent-action-contract |
| T0005-implement-action-handlers-and-termination-checks | Implement action handlers and termination checks | open | p0 | 1h | T0002-implement-world-module-tile-grid-entity-dict | 0002-define-the-world-representation-and-renderer |
| T0006-implement-the-tick-based-game-loop | Implement the tick-based game loop | open | p0 | 2h | T0003-implement-frame-renderer T0004-implement-openai-agent-client T0005-implement-action-handlers-and-termination-checks | 0004-define-the-tick-loop-and-termination-conditions 0002-define-the-world-representation-and-renderer |
| T0007-implement-cli-entry-script | Implement CLI entry script | open | p0 | 1h | T0006-implement-the-tick-based-game-loop | 0001-choose-the-implementation-language 0004-define-the-tick-loop-and-termination-conditions |
- -
- Generated by decision-record · - Last render: 2026-05-17T04:14:44.544Z -
- -
- - \ No newline at end of file diff --git a/benchmarks/roguelike-ai-poc/reference/project.json b/benchmarks/roguelike-ai-poc/reference/project.json deleted file mode 100644 index 3b4c9fb..0000000 --- a/benchmarks/roguelike-ai-poc/reference/project.json +++ /dev/null @@ -1,64 +0,0 @@ -{ - "id": "ai-driven-roguelike-poc", - "title": "AI-driven roguelike POC", - "description": "A minimal roguelike where the player primes an AI agent with a strategy, then the agent autonomously navigates a single ASCII-rendered room over a tick system until it wins the objective or dies. Goal: prove the agent-as-player concept with the smallest viable surface area.", - "created_at": "2026-05-17T04:12:02.030Z", - "updated_at": "2026-05-17T04:14:44.540Z", - "effort_level": "poc", - "status": "handed-off", - "scope": { - "in_scope": [ - "A 10×10 ASCII-rendered single room with walls (#), floor (.), player (@), exit (>), and a hazard tile (X)", - "Tick-based game loop: each tick prints the frame, then queries the agent for one action", - "A small action vocabulary: move N/S/E/W and noop", - "Player has HP; stepping on hazard removes HP; reaching exit = win, HP=0 = death", - "Strategy prompt provided once at startup, fed to the agent as system prompt for every tick", - "LLM agent receives current frame + HP + tick number, returns a single action" - ], - "out_of_scope": [ - "Multiple rooms, dungeon generation, procedural levels", - "Combat with enemies, NPCs, monsters", - "Inventory, items, equipment", - "Save/load, persistence", - "Visual UI beyond ASCII to terminal", - "Multiplayer, networking", - "Self-improving agent loops or RL training" - ], - "success_criteria": [ - "A user can run a single command, supply a strategy prompt, and watch the agent play until win or death", - "Win and death paths both observed in manual playtests", - "Different strategy prompts produce visibly different agent behavior", - "End-to-end run completes in under 60 seconds wall time on a typical OpenAI API 
call" - ], - "nice_to_have": [ - "Configurable room layout from a text file", - "Replay log written to disk for post-hoc inspection", - "A few preset strategy prompts to demo (cautious, greedy, exploratory)" - ] - }, - "sign_offs": [ - { - "phase": "handing-off", - "by": "human", - "actor": "kj", - "at": "2026-05-17T04:14:44.523Z", - "notes": "All decisions accepted, graph validates clean." - }, - { - "phase": "handing-off", - "by": "human", - "actor": "kj", - "at": "2026-05-17T04:14:44.540Z" - } - ], - "handoff": { - "target": "filesystem", - "exported_at": "2026-05-17T04:14:44.540Z", - "issue_count": 7, - "document_count": 4 - }, - "gate_config": { - "preset": "poc" - }, - "tags": [] -} diff --git a/benchmarks/roguelike-ai-poc/reference/project.md b/benchmarks/roguelike-ai-poc/reference/project.md deleted file mode 100644 index 538b476..0000000 --- a/benchmarks/roguelike-ai-poc/reference/project.md +++ /dev/null @@ -1,64 +0,0 @@ -# AI-driven roguelike POC - -| Field | Value | -| --- | --- | -| ID | `ai-driven-roguelike-poc` | -| Status | `handed-off` | -| Effort level | `poc` | -| Created | 2026-05-17T04:12:02.030Z | -| Updated | 2026-05-17T04:14:44.540Z | -| Decisions | 4 | -| Tasks | 7 | - -## Description - -A minimal roguelike where the player primes an AI agent with a strategy, then the agent autonomously navigates a single ASCII-rendered room over a tick system until it wins the objective or dies. Goal: prove the agent-as-player concept with the smallest viable surface area. 
- -## Scope - -**In scope** - -- A 10×10 ASCII-rendered single room with walls (#), floor (.), player (@), exit (>), and a hazard tile (X) -- Tick-based game loop: each tick prints the frame, then queries the agent for one action -- A small action vocabulary: move N/S/E/W and noop -- Player has HP; stepping on hazard removes HP; reaching exit = win, HP=0 = death -- Strategy prompt provided once at startup, fed to the agent as system prompt for every tick -- LLM agent receives current frame + HP + tick number, returns a single action - -**Success criteria** - -- A user can run a single command, supply a strategy prompt, and watch the agent play until win or death -- Win and death paths both observed in manual playtests -- Different strategy prompts produce visibly different agent behavior -- End-to-end run completes in under 60 seconds wall time on a typical OpenAI API call - -**Out of scope** - -- Multiple rooms, dungeon generation, procedural levels -- Combat with enemies, NPCs, monsters -- Inventory, items, equipment -- Save/load, persistence -- Visual UI beyond ASCII to terminal -- Multiplayer, networking -- Self-improving agent loops or RL training - -**Nice to have** - -- Configurable room layout from a text file -- Replay log written to disk for post-hoc inspection -- A few preset strategy prompts to demo (cautious, greedy, exploratory) - -## Sign-offs - -- **handing-off** by kj (human) at 2026-05-17T04:14:44.523Z — All decisions accepted, graph validates clean. 
- -- **handing-off** by kj (human) at 2026-05-17T04:14:44.540Z - -## Handoff - -| Field | Value | -| --- | --- | -| Target | `filesystem` | -| Exported at | 2026-05-17T04:14:44.540Z | -| Target ID | — | -| Target URL | — | diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.json b/benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.json deleted file mode 100644 index c433a10..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.json +++ /dev/null @@ -1,30 +0,0 @@ -{ - "id": "T0001-bootstrap-repository", - "number": 1, - "slug": "bootstrap-repository", - "title": "Bootstrap repository", - "description": "Initialize the Python project layout: pyproject.toml or requirements.txt with openai pin, a src/ module path, a README stub, and a .gitignore. Verify a `python -c \"import openai\"` succeeds in a fresh venv.", - "status": "ready", - "estimate": { - "unit": "hours", - "value": 1, - "confidence": "high" - }, - "acceptance_criteria": [ - "pyproject.toml or requirements.txt committed", - "openai SDK installable in a venv", - "README explains 30-second quickstart", - "python -c \"from src import __init__\" runs" - ], - "depends_on": [], - "decision_refs": [ - "0001-choose-the-implementation-language" - ], - "priority": "p0", - "labels": [ - "foundation" - ], - "assignee_hint": "agent", - "created_at": "2026-05-17T04:14:22.524Z", - "updated_at": "2026-05-17T04:14:22.524Z" -} diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.md b/benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.md deleted file mode 100644 index 09effaa..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0001-bootstrap-repository.md +++ /dev/null @@ -1,23 +0,0 @@ -# T0001-bootstrap-repository — Bootstrap repository - -| Field | Value | -| --- | --- | -| Status | `ready` | -| Priority | `p0` | -| Estimate | 1 hours (high confidence) | -| Depends on | _(none)_ | -| 
Decision refs | `0001-choose-the-implementation-language` — Choose the implementation language | -| Assignee hint | agent | -| Labels | `foundation` | -| Updated | 2026-05-17T04:14:22.524Z | - -## Description - -Initialize the Python project layout: pyproject.toml or requirements.txt with openai pin, a src/ module path, a README stub, and a .gitignore. Verify a `python -c "import openai"` succeeds in a fresh venv. - -## Acceptance criteria - -- [ ] pyproject.toml or requirements.txt committed -- [ ] openai SDK installable in a venv -- [ ] README explains 30-second quickstart -- [ ] python -c "from src import __init__" runs diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.json b/benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.json deleted file mode 100644 index c7a6c75..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.json +++ /dev/null @@ -1,32 +0,0 @@ -{ - "id": "T0002-implement-world-module-tile-grid-entity-dict", - "number": 2, - "slug": "implement-world-module-tile-grid-entity-dict", - "title": "Implement world module (tile grid + entity dict)", - "description": "Build src/world.py: World dataclass with static_tiles: list[list[str]] and entities: dict[str, dict]. Provide constructors for a default 10×10 room (walls border, one hazard, one exit). 
Pure data and helpers; no rendering, no game logic.", - "status": "open", - "estimate": { - "unit": "hours", - "value": 2, - "confidence": "med" - }, - "acceptance_criteria": [ - "World.default_room() returns a valid 10x10 with #, ., X, > tiles", - "entities dict contains a player at a known spawn", - "is_walkable(x,y) returns False for walls, True for floor and hazard", - "unit test: default room is fully walkable from spawn to exit" - ], - "depends_on": [ - "T0001-bootstrap-repository" - ], - "decision_refs": [ - "0002-define-the-world-representation-and-renderer" - ], - "priority": "p0", - "labels": [ - "core" - ], - "assignee_hint": "agent", - "created_at": "2026-05-17T04:14:22.526Z", - "updated_at": "2026-05-17T04:14:22.526Z" -} diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.md b/benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.md deleted file mode 100644 index ff06ca3..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0002-implement-world-module-tile-grid-entity-dict.md +++ /dev/null @@ -1,23 +0,0 @@ -# T0002-implement-world-module-tile-grid-entity-dict — Implement world module (tile grid + entity dict) - -| Field | Value | -| --- | --- | -| Status | `open` | -| Priority | `p0` | -| Estimate | 2 hours (med confidence) | -| Depends on | `T0001-bootstrap-repository` | -| Decision refs | `0002-define-the-world-representation-and-renderer` — Define the world representation and renderer | -| Assignee hint | agent | -| Labels | `core` | -| Updated | 2026-05-17T04:14:22.526Z | - -## Description - -Build src/world.py: World dataclass with static_tiles: list[list[str]] and entities: dict[str, dict]. Provide constructors for a default 10×10 room (walls border, one hazard, one exit). Pure data and helpers; no rendering, no game logic. 
- -## Acceptance criteria - -- [ ] World.default_room() returns a valid 10x10 with #, ., X, > tiles -- [ ] entities dict contains a player at a known spawn -- [ ] is_walkable(x,y) returns False for walls, True for floor and hazard -- [ ] unit test: default room is fully walkable from spawn to exit diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.json b/benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.json deleted file mode 100644 index 0caf6b1..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.json +++ /dev/null @@ -1,32 +0,0 @@ -{ - "id": "T0003-implement-frame-renderer", - "number": 3, - "slug": "implement-frame-renderer", - "title": "Implement frame renderer", - "description": "Build src/render.py: render_frame(world) -> list[str]. Compose static_tiles + entity glyphs (entity overrides tile). Provide a small HUD line below the frame showing tick number, HP, and last action. Return as list of strings so the game loop can join + print or send to LLM.", - "status": "open", - "estimate": { - "unit": "hours", - "value": 1, - "confidence": "high" - }, - "acceptance_criteria": [ - "render_frame returns 10 strings of length 10", - "player @ is visible at its current position", - "HUD line includes tick, hp, last_action", - "manual visual check: frame looks like a roguelike room" - ], - "depends_on": [ - "T0002-implement-world-module-tile-grid-entity-dict" - ], - "decision_refs": [ - "0002-define-the-world-representation-and-renderer" - ], - "priority": "p0", - "labels": [ - "core" - ], - "assignee_hint": "agent", - "created_at": "2026-05-17T04:14:22.527Z", - "updated_at": "2026-05-17T04:14:22.527Z" -} diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.md b/benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.md deleted file mode 100644 index 8bfc535..0000000 --- 
a/benchmarks/roguelike-ai-poc/reference/tasks/T0003-implement-frame-renderer.md +++ /dev/null @@ -1,23 +0,0 @@ -# T0003-implement-frame-renderer — Implement frame renderer - -| Field | Value | -| --- | --- | -| Status | `open` | -| Priority | `p0` | -| Estimate | 1 hours (high confidence) | -| Depends on | `T0002-implement-world-module-tile-grid-entity-dict` | -| Decision refs | `0002-define-the-world-representation-and-renderer` — Define the world representation and renderer | -| Assignee hint | agent | -| Labels | `core` | -| Updated | 2026-05-17T04:14:22.527Z | - -## Description - -Build src/render.py: render_frame(world) -> list[str]. Compose static_tiles + entity glyphs (entity overrides tile). Provide a small HUD line below the frame showing tick number, HP, and last action. Return as list of strings so the game loop can join + print or send to LLM. - -## Acceptance criteria - -- [ ] render_frame returns 10 strings of length 10 -- [ ] player @ is visible at its current position -- [ ] HUD line includes tick, hp, last_action -- [ ] manual visual check: frame looks like a roguelike room diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.json b/benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.json deleted file mode 100644 index cdc8821..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.json +++ /dev/null @@ -1,34 +0,0 @@ -{ - "id": "T0004-implement-openai-agent-client", - "number": 4, - "slug": "implement-openai-agent-client", - "title": "Implement OpenAI agent client", - "description": "Build src/agent.py: AgentClient class with constructor(strategy_prompt, model, api_key). Single method choose_action(world_state_json, tick, hp) → (direction, reasoning). Uses tool-calling with one tool do_action(direction in {N,S,E,W,noop}); tool_choice=\"required\". 
Returns the chosen direction and the assistant message content as reasoning.", - "status": "open", - "estimate": { - "unit": "hours", - "value": 2, - "confidence": "med" - }, - "acceptance_criteria": [ - "AgentClient instantiates without making a call", - "choose_action returns a valid direction enum", - "reasoning is captured as a string (may be empty)", - "malformed responses raise a clear error (does not silently noop)", - "strategy_prompt is in the system role on every call" - ], - "depends_on": [ - "T0001-bootstrap-repository" - ], - "decision_refs": [ - "0003-define-the-agent-action-contract" - ], - "priority": "p0", - "labels": [ - "llm", - "core" - ], - "assignee_hint": "agent", - "created_at": "2026-05-17T04:14:22.528Z", - "updated_at": "2026-05-17T04:14:22.528Z" -} diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.md b/benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.md deleted file mode 100644 index 0244119..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0004-implement-openai-agent-client.md +++ /dev/null @@ -1,24 +0,0 @@ -# T0004-implement-openai-agent-client — Implement OpenAI agent client - -| Field | Value | -| --- | --- | -| Status | `open` | -| Priority | `p0` | -| Estimate | 2 hours (med confidence) | -| Depends on | `T0001-bootstrap-repository` | -| Decision refs | `0003-define-the-agent-action-contract` — Define the agent action contract | -| Assignee hint | agent | -| Labels | `llm`, `core` | -| Updated | 2026-05-17T04:14:22.528Z | - -## Description - -Build src/agent.py: AgentClient class with constructor(strategy_prompt, model, api_key). Single method choose_action(world_state_json, tick, hp) → (direction, reasoning). Uses tool-calling with one tool do_action(direction in {N,S,E,W,noop}); tool_choice="required". Returns the chosen direction and the assistant message content as reasoning. 
- -## Acceptance criteria - -- [ ] AgentClient instantiates without making a call -- [ ] choose_action returns a valid direction enum -- [ ] reasoning is captured as a string (may be empty) -- [ ] malformed responses raise a clear error (does not silently noop) -- [ ] strategy_prompt is in the system role on every call diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.json b/benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.json deleted file mode 100644 index 20ad30f..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.json +++ /dev/null @@ -1,33 +0,0 @@ -{ - "id": "T0005-implement-action-handlers-and-termination-checks", - "number": 5, - "slug": "implement-action-handlers-and-termination-checks", - "title": "Implement action handlers and termination checks", - "description": "Build src/actions.py: apply_action(world, direction) -> ActionResult. Moves the player one cell if walkable; otherwise noop. Compute side effects: HP-1 when stepping onto hazard, win flag when player_pos == exit_pos, dead flag when HP <= 0. 
Return ActionResult dataclass with new_world, hp_delta, terminal, terminal_reason.", - "status": "open", - "estimate": { - "unit": "hours", - "value": 1, - "confidence": "high" - }, - "acceptance_criteria": [ - "Moving into a wall is a noop with no HP change", - "Moving onto hazard triggers hp_delta = -1", - "Moving onto exit triggers terminal=\"win\"", - "HP reaching 0 triggers terminal=\"death\"", - "Unit tests for each transition" - ], - "depends_on": [ - "T0002-implement-world-module-tile-grid-entity-dict" - ], - "decision_refs": [ - "0002-define-the-world-representation-and-renderer" - ], - "priority": "p0", - "labels": [ - "core" - ], - "assignee_hint": "agent", - "created_at": "2026-05-17T04:14:22.529Z", - "updated_at": "2026-05-17T04:14:22.529Z" -} diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.md b/benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.md deleted file mode 100644 index 5ad2496..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0005-implement-action-handlers-and-termination-checks.md +++ /dev/null @@ -1,24 +0,0 @@ -# T0005-implement-action-handlers-and-termination-checks — Implement action handlers and termination checks - -| Field | Value | -| --- | --- | -| Status | `open` | -| Priority | `p0` | -| Estimate | 1 hours (high confidence) | -| Depends on | `T0002-implement-world-module-tile-grid-entity-dict` | -| Decision refs | `0002-define-the-world-representation-and-renderer` — Define the world representation and renderer | -| Assignee hint | agent | -| Labels | `core` | -| Updated | 2026-05-17T04:14:22.529Z | - -## Description - -Build src/actions.py: apply_action(world, direction) -> ActionResult. Moves the player one cell if walkable; otherwise noop. Compute side effects: HP-1 when stepping onto hazard, win flag when player_pos == exit_pos, dead flag when HP <= 0. 
Return ActionResult dataclass with new_world, hp_delta, terminal, terminal_reason. - -## Acceptance criteria - -- [ ] Moving into a wall is a noop with no HP change -- [ ] Moving onto hazard triggers hp_delta = -1 -- [ ] Moving onto exit triggers terminal="win" -- [ ] HP reaching 0 triggers terminal="death" -- [ ] Unit tests for each transition diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.json b/benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.json deleted file mode 100644 index 129cd6b..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.json +++ /dev/null @@ -1,35 +0,0 @@ -{ - "id": "T0006-implement-the-tick-based-game-loop", - "number": 6, - "slug": "implement-the-tick-based-game-loop", - "title": "Implement the tick-based game loop", - "description": "Build src/loop.py: run_game(world, agent_client, max_steps=50). Each iteration: render frame, call agent_client.choose_action, apply action, check terminal, sleep 0.05s, repeat. Logs each tick: tick#, frame, action, reasoning excerpt, hp. 
Exits on terminal or step cap; returns final state + reason.", - "status": "open", - "estimate": { - "unit": "hours", - "value": 2, - "confidence": "med" - }, - "acceptance_criteria": [ - "Loop terminates on win, death, or step cap (≤50)", - "Each tick prints the frame and HUD to stdout", - "Final summary line shows reason and step count", - "No exceptions leak from agent timeouts/errors (logged and treated as noop)" - ], - "depends_on": [ - "T0003-implement-frame-renderer", - "T0004-implement-openai-agent-client", - "T0005-implement-action-handlers-and-termination-checks" - ], - "decision_refs": [ - "0004-define-the-tick-loop-and-termination-conditions", - "0002-define-the-world-representation-and-renderer" - ], - "priority": "p0", - "labels": [ - "core" - ], - "assignee_hint": "agent", - "created_at": "2026-05-17T04:14:22.530Z", - "updated_at": "2026-05-17T04:14:22.530Z" -} diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.md b/benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.md deleted file mode 100644 index 3338646..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0006-implement-the-tick-based-game-loop.md +++ /dev/null @@ -1,23 +0,0 @@ -# T0006-implement-the-tick-based-game-loop — Implement the tick-based game loop - -| Field | Value | -| --- | --- | -| Status | `open` | -| Priority | `p0` | -| Estimate | 2 hours (med confidence) | -| Depends on | `T0003-implement-frame-renderer`, `T0004-implement-openai-agent-client`, `T0005-implement-action-handlers-and-termination-checks` | -| Decision refs | `0004-define-the-tick-loop-and-termination-conditions` — Define the tick loop and termination conditions; `0002-define-the-world-representation-and-renderer` — Define the world representation and renderer | -| Assignee hint | agent | -| Labels | `core` | -| Updated | 2026-05-17T04:14:22.530Z | - -## Description - -Build src/loop.py: run_game(world, agent_client, max_steps=50). 
Each iteration: render frame, call agent_client.choose_action, apply action, check terminal, sleep 0.05s, repeat. Logs each tick: tick#, frame, action, reasoning excerpt, hp. Exits on terminal or step cap; returns final state + reason. - -## Acceptance criteria - -- [ ] Loop terminates on win, death, or step cap (≤50) -- [ ] Each tick prints the frame and HUD to stdout -- [ ] Final summary line shows reason and step count -- [ ] No exceptions leak from agent timeouts/errors (logged and treated as noop) diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.json b/benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.json deleted file mode 100644 index 030f430..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.json +++ /dev/null @@ -1,33 +0,0 @@ -{ - "id": "T0007-implement-cli-entry-script", - "number": 7, - "slug": "implement-cli-entry-script", - "title": "Implement CLI entry script", - "description": "Build src/__main__.py: argparse for --strategy (or read from stdin), --model (default gpt-4o), --max-steps (default 50). Construct AgentClient, build default room, call run_game. Print the final outcome. 
Document the env vars (OPENAI_API_KEY) and a sample invocation in README.", - "status": "open", - "estimate": { - "unit": "hours", - "value": 1, - "confidence": "high" - }, - "acceptance_criteria": [ - "python -m src --strategy \"cautious explorer\" runs end-to-end", - "README has a complete example invocation", - "--help prints usage", - "Exit code 0 on win/timeout, 1 on death (so scripts can chain)" - ], - "depends_on": [ - "T0006-implement-the-tick-based-game-loop" - ], - "decision_refs": [ - "0001-choose-the-implementation-language", - "0004-define-the-tick-loop-and-termination-conditions" - ], - "priority": "p0", - "labels": [ - "cli" - ], - "assignee_hint": "agent", - "created_at": "2026-05-17T04:14:22.532Z", - "updated_at": "2026-05-17T04:14:22.532Z" -} diff --git a/benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.md b/benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.md deleted file mode 100644 index ba9f268..0000000 --- a/benchmarks/roguelike-ai-poc/reference/tasks/T0007-implement-cli-entry-script.md +++ /dev/null @@ -1,23 +0,0 @@ -# T0007-implement-cli-entry-script — Implement CLI entry script - -| Field | Value | -| --- | --- | -| Status | `open` | -| Priority | `p0` | -| Estimate | 1 hours (high confidence) | -| Depends on | `T0006-implement-the-tick-based-game-loop` | -| Decision refs | `0001-choose-the-implementation-language` — Choose the implementation language; `0004-define-the-tick-loop-and-termination-conditions` — Define the tick loop and termination conditions | -| Assignee hint | agent | -| Labels | `cli` | -| Updated | 2026-05-17T04:14:22.532Z | - -## Description - -Build src/__main__.py: argparse for --strategy (or read from stdin), --model (default gpt-4o), --max-steps (default 50). Construct AgentClient, build default room, call run_game. Print the final outcome. Document the env vars (OPENAI_API_KEY) and a sample invocation in README. 
- -## Acceptance criteria - -- [ ] python -m src --strategy "cautious explorer" runs end-to-end -- [ ] README has a complete example invocation -- [ ] --help prints usage -- [ ] Exit code 0 on win/timeout, 1 on death (so scripts can chain) diff --git a/benchmarks/roguelike-ai-poc/run.sh b/benchmarks/roguelike-ai-poc/run.sh deleted file mode 100755 index 67915d1..0000000 --- a/benchmarks/roguelike-ai-poc/run.sh +++ /dev/null @@ -1,35 +0,0 @@ -#!/usr/bin/env bash -# Run the roguelike-ai-poc benchmark prompt against a fresh tmp dir. -# Requires OPENAI_API_KEY in the environment. -# Usage: -# ./run.sh # run with defaults -# OUT=./my-output ./run.sh # specify output dir -# MODEL=gpt-4o-mini ./run.sh # override model - -set -euo pipefail - -if [[ -z "${OPENAI_API_KEY:-}" ]]; then - echo "OPENAI_API_KEY not set — refusing to run." >&2 - exit 2 -fi - -HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -REPO_ROOT="$(cd "$HERE/../.." && pwd)" -OUT="${OUT:-$(mktemp -d -t dr-bench-roguelike-XXXX)}" - -DESCRIPTION="A minimal roguelike where the player primes an AI agent with a strategy, then the agent autonomously navigates a single ASCII-rendered room over a tick system until it wins the objective or dies. Goal: prove the agent-as-player concept with the smallest viable surface area." 
- -cd "$REPO_ROOT/server" -[[ -f dist/cli.js ]] || npm run build >&2 - -node dist/cli.js \ - --title "AI-driven roguelike POC" \ - --description "$DESCRIPTION" \ - --effort poc \ - --cwd "$OUT" \ - --yes \ - ${MODEL:+--model "$MODEL"} - -echo "" -echo "── Benchmark artifacts at: $OUT" -echo "Compare with: $HERE/reference/" diff --git a/docs/README.md b/docs/README.md index 2063fb4..8de24b6 100644 --- a/docs/README.md +++ b/docs/README.md @@ -22,7 +22,7 @@ The decision-record docs follow the [Diátaxis](https://diataxis.fr) framework ## Index ### Tutorials -- [Your first plan](tutorials/your-first-plan.md) — run the roguelike benchmark prompt end-to-end +- [Your first plan](tutorials/your-first-plan.md) — run the pipeline end-to-end on a small idea ### How-to guides - [Install the plugin or CLI](how-to/install.md) diff --git a/docs/reference/data-model.md b/docs/reference/data-model.md index 420fc42..1235b96 100644 --- a/docs/reference/data-model.md +++ b/docs/reference/data-model.md @@ -146,7 +146,7 @@ One JSON line per pipeline action. Append-only audit log. | Entity | Format | Example | |---|---|---| | Decision | `<4-digit>-` | `0003-define-the-agent-action-contract` | -| Task | `T<4-digit>-` | `T0006-implement-the-tick-based-game-loop` | -| Project | kebab-slug | `ai-driven-roguelike-poc` | +| Task | `T<4-digit>-` | `T0006-implement-the-rate-limiter` | +| Project | kebab-slug | `contact-list-deduper` | Slugs are 2–64 chars, lower-case alphanumerics + dashes, no leading/trailing dash. diff --git a/docs/tutorials/your-first-plan.md b/docs/tutorials/your-first-plan.md index 7f60435..afe9bd8 100644 --- a/docs/tutorials/your-first-plan.md +++ b/docs/tutorials/your-first-plan.md @@ -2,7 +2,7 @@ By the end of this tutorial you will have used decision-record to turn a one-line idea into a complete, scoped, decision-backed, task-decomposed MVP plan — and you will have looked at every artifact the system produces. This takes about 15 minutes. 
-We will use the **roguelike-ai-poc** benchmark idea — a small but real planning problem — so you can see the system handle something other than `hello world`. +We'll use a small, neutral example idea — a CLI tool that deduplicates contact lists — so you can see the system handle something real without much setup. ## Before you start @@ -24,7 +24,7 @@ You do **not** need the Claude Code plugin installed for this tutorial. We will ## Step 1: Pick a working directory -The system writes artifacts into a target project directory. We will create a fresh one: +The system writes artifacts into a target project directory. Create a fresh one: ```bash mkdir -p ~/dev/my-first-plan @@ -40,7 +40,7 @@ From the `decision-record/server/` directory: export OPENAI_API_KEY=sk-… # if you haven't already node dist/cli.js \ - --idea "a CLI tool that converts QuickBooks CSV exports into a normalized double-entry ledger" \ + --idea "a CLI tool that reads CSVs of contacts and merges fuzzy duplicates" \ --effort poc \ --cwd ~/dev/my-first-plan ``` @@ -56,13 +56,13 @@ The CLI will print colored progress to stderr as each phase runs. You will see s Target: /Users/you/dev/my-first-plan Model: gpt-4o ━━━ Phase: Intake ━━━ -✓ Initialized 'a-cli-tool-that-converts-quickbooks-csv-export…' at effort_level=poc +✓ Initialized 'a-cli-tool-that-reads-csvs-of-contacts…' at effort_level=poc ✓ Advanced: intake → scoping ━━━ Phase: Scoping ━━━ Running scoping agent… ✓ Scoping agent finished (3 tool calls). ──────────────────────────────────────────────────────────── -Scope set. in_scope: read QuickBooks CSV, parse rows… +Scope set. in_scope: read CSV, normalize fields, detect duplicates… … ──────────────────────────────────────────────────────────── ✓ Advanced: scoping → deciding @@ -121,7 +121,7 @@ Pick one. 
For example: cat ~/dev/my-first-plan/dr/decisions/0001-*.md ``` -You will see the full record: issue, positions considered, the selected position, the argument for why it won, the implications, and five lens reviews from the skeptic. +You will see the full record: issue, positions considered, the selected position, the argument for why it won, the implications, and (under `mvp`/`full` presets) lens reviews from the skeptic. ```bash cat ~/dev/my-first-plan/dr/decisions/0001-*.json | jq .