Merged
41 changes: 41 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,41 @@
name: test

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: server
    strategy:
      matrix:
        node-version: [20, 22]
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: npm
          cache-dependency-path: server/package-lock.json

      - name: Install dependencies
        run: npm ci

      - name: Type check
        run: npm run typecheck

      - name: Build
        run: npm run build

      - name: Unit tests
        run: npm run test:unit

      - name: Flow tests
        run: npm run test:flow
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -33,5 +33,5 @@ references:
repository-code: 'https://github.com/joelparkerhenderson/decision-record/'
abstract: >-
The canonical concept, template, and teamwork model for decision
records — preserved in this fork at docs/upstream-canon.md and
templates/canonical.md.
records — preserved in this fork at docs/explanation/why-decision-records.md
and templates/canonical.md.
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -24,7 +24,7 @@ This repo is the planning system itself. We deliberately stop at the handoff —

## Attribution

The conceptual core derives from Joel Parker Henderson's [canonical decision-record repo](https://github.com/joelparkerhenderson/decision-record). Preserve attribution to upstream in any rework of `docs/upstream-canon.md` or `templates/canonical.md`.
The conceptual core derives from Joel Parker Henderson's [canonical decision-record repo](https://github.com/joelparkerhenderson/decision-record). Preserve attribution to upstream in any rework of `docs/explanation/why-decision-records.md` or `templates/canonical.md`.

## License

4 changes: 2 additions & 2 deletions LICENSE
@@ -22,8 +22,8 @@ SOFTWARE.

---

The preserved canonical material in `docs/upstream-canon.md` and the
canonical decision record template at `templates/canonical.md` derive from
The preserved canonical material in `docs/explanation/why-decision-records.md`
and the canonical decision record template at `templates/canonical.md` derive from
the upstream work of Joel Parker Henderson:
<https://github.com/joelparkerhenderson/decision-record>. That material
should be attributed to its original author; see CITATION.cff.
27 changes: 22 additions & 5 deletions README.md
@@ -4,7 +4,7 @@

This repository is a Claude Code plugin + bundled MCP server. It runs inside a fresh or template repo, partners with a human and an AI agent, and produces an executable MVP plan: a scoped manifest, a set of accepted decision records, and a dependency-aware task graph. Output goes to Linear (primary) or stays as filesystem artifacts (fallback).

This project is a derivative of [Joel Parker Henderson's canonical decision-record repo](https://github.com/joelparkerhenderson/decision-record). The canonical explanation of what a DR is and why it matters is preserved at [`docs/upstream-canon.md`](docs/upstream-canon.md). What this fork adds is **enforcement**: workflows, tools, and a state machine that make DRs a non-skippable part of planning with an agentic system.
This project is a derivative of [Joel Parker Henderson's canonical decision-record repo](https://github.com/joelparkerhenderson/decision-record). The canonical explanation of what a DR is and why it matters is preserved at [`docs/explanation/why-decision-records.md`](docs/explanation/why-decision-records.md). What this fork adds is **enforcement**: workflows, tools, and a state machine that make DRs a non-skippable part of planning with an agentic system.

## What you get

@@ -17,7 +17,16 @@ This project is a derivative of [Joel Parker Henderson's canonical decision-reco

## Status

Active development — first usable cut is in. The pipeline is functional end-to-end (intake → scope → decisions → tasks → handoff to filesystem or Linear). See [`docs/quickstart.md`](docs/quickstart.md) for the five-minute walkthrough, [`docs/usage.md`](docs/usage.md) for the full interaction model, and [`docs/architecture.md`](docs/architecture.md) for the data model.
Active development — first usable cut is in. The pipeline is functional end-to-end (intake → scope → decisions → tasks → handoff to filesystem or Linear). A standalone CLI (`decision-record`) ships alongside the Claude Code plugin and MCP server.

## Documentation

Docs follow the [Diátaxis](https://diataxis.fr) framework — start at [`docs/README.md`](docs/README.md) to orient.

- **Brand new?** → [`docs/tutorials/your-first-plan.md`](docs/tutorials/your-first-plan.md) is a 15-minute end-to-end walkthrough.
- **How do I do X?** → [`docs/how-to/`](docs/how-to/) (install, run the CLI, configure providers, hand off to Linear, calibrate gates).
- **What's the exact spec?** → [`docs/reference/`](docs/reference/) (CLI flags, MCP tools, data model, gates).
- **Why is it built this way?** → [`docs/explanation/`](docs/explanation/) (design rationale, the five phases, why decision records).

## How it's structured

@@ -58,18 +67,26 @@ npm install
npm run build
```

Then either link as a Claude Code plugin (symlink the repo into `~/.claude/plugins/decision-record/`) or run the MCP server standalone via `node /path/to/decision-record/server/dist/index.js`. Full instructions: [`docs/quickstart.md`](docs/quickstart.md).
Then either:
- Use the **standalone CLI**: `export OPENAI_API_KEY=… && node dist/cli.js --idea "your idea here"`
- Use the **Claude Code plugin**: symlink the repo into `~/.claude/plugins/decision-record/` and run `/plan` inside Claude Code.

Full install instructions: [`docs/how-to/install.md`](docs/how-to/install.md). First-run walkthrough: [`docs/tutorials/your-first-plan.md`](docs/tutorials/your-first-plan.md).

(A published marketplace release is on the roadmap.)

## Benchmarks

We use a canonical prompt — an AI-driven roguelike POC — to spot regressions as the system evolves. See [`benchmarks/`](benchmarks/) for the prompt, expected output shape, and a `run.sh` to re-run it.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Issues and pull requests welcome.

## Acknowledgments

The conceptual core — what a decision record is, the canonical template structure, the teamwork model around DRs — is the work of [Joel Parker Henderson](https://joelparkerhenderson.com). See [`docs/upstream-canon.md`](docs/upstream-canon.md) for the preserved canonical material, and [CITATION.cff](CITATION.cff) for citation metadata.
The conceptual core — what a decision record is, the canonical template structure, the teamwork model around DRs — is the work of [Joel Parker Henderson](https://joelparkerhenderson.com). See [`docs/explanation/why-decision-records.md`](docs/explanation/why-decision-records.md) for the preserved canonical material, and [CITATION.cff](CITATION.cff) for citation metadata.

## License

[MIT](LICENSE) — for the code, schemas, and tooling in this repository. The preserved canonical content in `docs/upstream-canon.md` and the canonical template at `templates/canonical.md` derive from upstream and should be attributed to Joel Parker Henderson per CITATION.cff.
[MIT](LICENSE) — for the code, schemas, and tooling in this repository. The preserved canonical content in `docs/explanation/why-decision-records.md` and the canonical template at `templates/canonical.md` derive from upstream and should be attributed to Joel Parker Henderson per CITATION.cff.
32 changes: 32 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,32 @@
# Benchmarks

Canonical prompts we run against the decision-record planning pipeline to catch regressions as the system evolves.

| Benchmark | Prompt | Effort | Purpose |
|---|---|---|---|
| [roguelike-ai-poc](roguelike-ai-poc/) | AI-driven roguelike where the agent plays the game | `poc` | Exercises all five pipeline phases on a small, well-bounded problem. The original dogfood case. |

## How to run a benchmark

```bash
cd benchmarks/<name>
./run.sh
```

Each benchmark has:

- `prompt.md` — the exact idea, effort level, and what "good output" looks like
- `reference/` — a baseline artifact snapshot from a canonical run
- `run.sh` — one-shot runner that fires the CLI against a fresh tmp dir

## What we look for when comparing runs

Each benchmark's `prompt.md` defines its own success criteria. Generally:

- Pipeline reaches `handed-off`
- Decision count and shape match expectations for the effort tier
- Tasks are vertical slices, every leaf has a decision ref, graph validates
- Render artifacts are emitted (Markdown + HTML)
- Event log is coherent

These benchmarks are **not unit tests** — they're regression observability. Different runs will produce slightly different plans and that's by design. Treat the reference as "shape we expect," not "bytes we require."
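
One way to read "shape we expect, not bytes we require" is as a comparison of run summaries rather than file contents. The sketch below is illustrative only: `RunShape`, `compare_shape`, the tolerance of two, and the example counts are all assumptions made for this example. None of them are defined by the pipeline or taken from a reference snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunShape:
    """Hypothetical summary of one benchmark run; not a type the pipeline defines."""
    final_phase: str
    decision_count: int
    task_count: int


def compare_shape(reference: RunShape, candidate: RunShape, tolerance: int = 2) -> list[str]:
    """Return differences worth a human look; an empty list means the same shape."""
    findings = []
    if candidate.final_phase != reference.final_phase:
        findings.append(
            f"final phase: expected {reference.final_phase}, got {candidate.final_phase}"
        )
    # Counts may drift between runs, so only flag drift beyond the assumed tolerance.
    for label, expected, actual in (
        ("decisions", reference.decision_count, candidate.decision_count),
        ("tasks", reference.task_count, candidate.task_count),
    ):
        if abs(actual - expected) > tolerance:
            findings.append(f"{label}: expected about {expected}, got {actual}")
    return findings


# Illustrative numbers only.
reference = RunShape(final_phase="handed-off", decision_count=4, task_count=7)
candidate = RunShape(final_phase="handed-off", decision_count=5, task_count=6)
print(compare_shape(reference, candidate))  # prints []
```

A finding here is a prompt for review, not a failure: the plan may legitimately be denser or sparser than the reference.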
63 changes: 63 additions & 0 deletions benchmarks/roguelike-ai-poc/prompt.md
@@ -0,0 +1,63 @@
# Benchmark: roguelike-ai-poc

This is the canonical benchmark for the decision-record planning pipeline. We re-run it as the system evolves to spot regressions in plan quality, gate behavior, agent prompts, and rendering.

## The prompt

**Idea (free-form):**

> A minimal roguelike where the player primes an AI agent with a strategy, then the agent autonomously navigates a single ASCII-rendered room over a tick system until it wins the objective or dies. Goal: prove the agent-as-player concept with the smallest viable surface area.

**Effort level:** `poc`

## Invocation

```bash
decision-record \
--title "AI-driven roguelike POC" \
--description "$(cat <<'EOF'
A minimal roguelike where the player primes an AI agent with a strategy, then the agent autonomously navigates a single ASCII-rendered room over a tick system until it wins the objective or dies. Goal: prove the agent-as-player concept with the smallest viable surface area.
EOF
)" \
--effort poc \
--cwd ./tmp-roguelike-bench \
--yes
```

Or the one-shot wrapper: `./run.sh` (creates a fresh tmp dir, runs the CLI, prints where the artifacts landed).

## What "good output" looks like

A run is healthy if the produced plan:

- **Pipeline reaches `handed-off`** — every gate passes, sign-offs recorded, project finalized.
- **3-5 significant decisions** are proposed and accepted — language, world representation, agent action contract, tick-loop control. (Not 1; not 12.)
- **5-8 vertical-slice tasks** — bootstrap → world → renderer → agent client → action handlers → game loop → CLI entry. Every leaf ≤ 16h (poc cap). Every task references at least one accepted DR.
- **The seed library is consulted** for at least the language decision (`dr_seed_search` + `dr_seed_load` on `language-choice`).
- **Graph validates clean** — no cycles, no orphan deps, no missing decision refs.
- **Artifacts emitted** — `dr/project.json`, `dr/decisions/*.json`, `dr/tasks/*.json`, rendered `.md` siblings, `dr/index.html`. `.dr/events.jsonl` contains a coherent audit trail.
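
The count and cap criteria above can be checked mechanically. The following is a minimal sketch under stated assumptions: `check_plan_health` is a hypothetical helper, decisions and tasks are assumed to be already loaded as dictionaries, `estimate_hours` is an assumed field name, and every task in the list is treated as a leaf. The `id`, `status`, and `decision_refs` names follow the artifacts and coverage table in this benchmark.

```python
def check_plan_health(decisions, tasks, max_hours=16):
    """Return a list of problems; an empty list means the counts and caps look healthy."""
    problems = []
    accepted = {d["id"] for d in decisions if d.get("status") == "accepted"}

    if not 3 <= len(accepted) <= 5:
        problems.append(f"expected 3-5 accepted decisions, found {len(accepted)}")
    if not 5 <= len(tasks) <= 8:
        problems.append(f"expected 5-8 tasks, found {len(tasks)}")

    for task in tasks:
        # "estimate_hours" is an assumed field name for the per-task estimate.
        if task.get("estimate_hours", 0) > max_hours:
            problems.append(f"{task['id']}: estimate exceeds {max_hours}h cap")
        # Every task must reference at least one accepted decision record.
        if not accepted.intersection(task.get("decision_refs", [])):
            problems.append(f"{task['id']}: no accepted decision ref")
    return problems
```

This covers only the countable criteria. Whether the decisions are the right ones, or the slices are truly vertical, still needs a human read.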

## Reference snapshot

`./reference/` holds the artifacts from the canonical run produced by hand-driving the MCP tools (2026-05-16, the dogfood test that originally produced this benchmark). Treat it as a "this is what good looks like" baseline, not a strict equality target — different agent runs will pick slightly different positions, phrasing, and task decomposition, and that's fine.

When comparing a new run against `./reference/`:

- **Same final phase, gate decisions, event mix** → no regression.
- **More/fewer decisions or tasks** → check whether the new run is denser/sparser appropriately or whether the agent over- or under-decomposed.
- **Different selected positions** → fine if defensible; concerning if the argument is weaker.
- **Missing seed usage** → bug or prompt drift; the agent should reach for `language-choice` here.
- **Tasks without decision refs** → regression. Every task must link to a DR.
- **Validation failures** → regression. The graph must validate.
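
To make "the graph must validate" concrete, here is a rough sketch of a check for the two structural failures named earlier: orphan dependencies and cycles. It is not the pipeline's validator. `validate_graph` and the input shape (a mapping from task id to the ids it depends on) are assumptions for illustration. Cycle detection uses Kahn's algorithm, which repeatedly removes tasks with no unmet dependencies.

```python
from collections import deque


def validate_graph(tasks):
    """tasks maps a task id to the list of task ids it depends on (assumed shape)."""
    problems = []
    for task_id, deps in tasks.items():
        for dep in deps:
            if dep not in tasks:
                problems.append(f"{task_id}: orphan dependency {dep}")

    # Kahn's algorithm over the edges that point at known tasks.
    indegree = {task_id: 0 for task_id in tasks}
    dependents = {task_id: [] for task_id in tasks}
    for task_id, deps in tasks.items():
        for dep in deps:
            if dep in tasks:
                indegree[task_id] += 1
                dependents[dep].append(task_id)

    ready = deque(task_id for task_id, count in indegree.items() if count == 0)
    visited = 0
    while ready:
        current = ready.popleft()
        visited += 1
        for dependent in dependents[current]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)

    # Any task never reached sits on, or downstream of, a cycle.
    if visited != len(tasks):
        problems.append("dependency cycle detected")
    return problems
```

For example, `validate_graph({"bootstrap": [], "world": ["bootstrap"]})` returns an empty list, while a pair of tasks that depend on each other reports a cycle.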

## What this benchmark exercises

| Surface | Coverage |
|---|---|
| Phase machine | All five transitions: intake → scoping → deciding → decomposing → handing-off → handed-off |
| Seed library | At least one `dr_seed_load` (language-choice) |
| Decision lifecycle | propose → update with position + argument → accept (no review under poc preset) |
| Task graph | Multi-node dependency chain with decision_refs |
| Gates | `min_tasks=3`, `max_task_estimate_hours=16`, `require_human_signoff_phases=['handing-off']` |
| Render | Markdown per record + static HTML index |
| Handoff | Filesystem path (Linear path is exercised by separate live test) |
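
The phase machine row lists six phases joined by five transitions. A small sketch of that sequence follows. The phase names come from the table, but `next_phase`, `is_valid_history`, and the assumption that phases only ever advance one step forward are illustrative and may not match how the server models transitions.

```python
PHASES = ["intake", "scoping", "deciding", "decomposing", "handing-off", "handed-off"]


def next_phase(current):
    """Return the phase after `current`; raise ValueError if unknown or terminal."""
    index = PHASES.index(current)  # raises ValueError for an unknown phase
    if index == len(PHASES) - 1:
        raise ValueError(f"{current} is the terminal phase")
    return PHASES[index + 1]


def is_valid_history(history):
    """True if the history starts at intake and advances exactly one phase per step."""
    if not history or history[0] != PHASES[0]:
        return False
    # Assumption: no skipping and no moving backwards.
    return all(
        later in PHASES[:-1] + PHASES[-1:] and earlier != PHASES[-1] and next_phase(earlier) == later
        for earlier, later in zip(history, history[1:])
    )


print(is_valid_history(PHASES))  # prints True
```

A full benchmark run should produce the complete `PHASES` sequence, which exercises all five transitions.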
@@ -0,0 +1,115 @@
{
"id": "0001-choose-the-implementation-language",
"number": 1,
"slug": "choose-the-implementation-language",
"title": "Choose the implementation language",
"status": "accepted",
"template_variant": "architecture",
"created_at": "2026-05-17T04:13:38.681Z",
"updated_at": "2026-05-17T04:13:38.685Z",
"summary": "Decide the primary implementation language for the project.",
"issue": "Every other foundational decision (runtime, package manager, framework choices, testing tools) flows from the language choice. Picking this early and explicitly avoids drift.",
"assumptions": [
"Team has existing language strengths to lean on.",
"Project lifespan is long enough that hiring and onboarding matter.",
"Ecosystem maturity matters for the project's domain."
],
"constraints": [
"Team's current expertise.",
"Target runtime environments (browser, server, native, embedded).",
"Performance and memory budgets.",
"Licensing or compliance restrictions on language ecosystems."
],
"positions": [
{
"title": "TypeScript",
"description": "Strongly typed JavaScript. Best for full-stack web work, ubiquitous tooling.",
"pros": [
"Ubiquitous in web",
"Strong types catch errors early",
"Massive ecosystem",
"Frontend/backend code sharing"
],
"cons": [
"Build step overhead",
"Type system can be over-engineered",
"Slower than native languages for hot paths"
],
"links": []
},
{
"title": "Python",
"description": "Dynamic, batteries-included. Best for data work, scripting, ML, fast prototypes.",
"pros": [
"Excellent ML/data ecosystem",
"Fast to write",
"Readable",
"Huge stdlib"
],
"cons": [
"Slow runtime without C extensions",
"GIL limits concurrency",
"Dynamic typing → runtime errors"
],
"links": []
},
{
"title": "Go",
"description": "Statically typed, compiled, built for concurrent services.",
"pros": [
"Simple language",
"Single binary deployment",
"Strong concurrency primitives",
"Fast compile times"
],
"cons": [
"Generics still maturing",
"Verbose error handling",
"Less rich third-party ecosystem than JS/Python"
],
"links": []
},
{
"title": "Rust",
"description": "Memory-safe systems language. Best for performance-critical or systems work.",
"pros": [
"No GC, predictable performance",
"Memory safety",
"Excellent tooling (cargo)",
"Strong types"
],
"cons": [
"Steep learning curve",
"Slower to ship initial features",
"Compile times can be long"
],
"links": []
}
],
"opinions": [],
"argument": "Python is fastest to write for a single-script game-loop POC. The OpenAI SDK + a tiny terminal renderer fit naturally; no build step or transpile loop slows iteration. Team is comfortable with Python and the project never needs to leave a single repo.",
"selected_position": "Python",
"implications": [
"Use the official openai Python SDK for agent calls.",
"Single-file or small-module layout; no package manager beyond pip/uv.",
"Pin to Python 3.11+ for ergonomic match-statement parsing of agent actions."
],
"depends_on": [],
"related_decisions": [],
"related_artifacts": [],
"review": [],
"sign_off": {
"by": "human",
"actor": "kj",
"at": "2026-05-17T04:13:38.685Z",
"notes": "poc preset, no review required"
},
"seed_origin": "language-choice",
"tags": [
"foundation",
"poc",
"foundation",
"architecture",
"stack"
]
}