diff --git a/docs/site/advanced/mods-and-standing-teammates.md b/docs/site/advanced/mods-and-standing-teammates.md index 09572392..a1d5b52e 100644 --- a/docs/site/advanced/mods-and-standing-teammates.md +++ b/docs/site/advanced/mods-and-standing-teammates.md @@ -1,3 +1,79 @@ # Mods & standing teammates -[TODO] — lifecycle hooks (startup/idle/merge) under `_mods/`, and standing teammates like the comm-officer prose-polisher (which defers to the [Voice & tone](../contributing/voice-and-tone.md) guide). +Mods extend a workflow without touching the binary. A mod is a markdown file under `{workflow_dir}/_mods/`. There are two kinds, and one file can be both: a **lifecycle hook** that the first officer runs at a named point in the run, and a **standing teammate** declaration that spawns a long-lived specialist agent into the team. Both live in `_mods/*.md`. The difference is which sections the file carries and which binary reads them. `spacedock status` scans the `## Hook:` headings (the `--boot` MODS section, and the merge-hook guard); `spacedock dispatch` parses the standing-teammate sections. + +## Lifecycle hooks + +A mod hook is a `## Hook: {point}` section that the first officer runs at a fixed point in the run. Three points are supported: + +- `startup`: runs once at boot, before the normal dispatch loop. +- `idle`: runs on the idle re-check pass when no entity is ready to dispatch. +- `merge`: runs at the terminal merge boundary for an entity, before any local merge, archival, or status advancement. + +Hooks are additive and run alphabetically by mod filename. The body of a hook section is prose the first officer executes; it names the commands to run and the conditions to branch on, in plain markdown. Nothing compiles; the first officer reads the section and acts on it. + +A mod can register more than one point. 
The shipped `pr-merge` mod (`docs/dev/_mods/pr-merge.md`) registers all three: its `## Hook: startup` and `## Hook: idle` sections scan for entities with a pending `pr` and advance any whose PR has merged, and its `## Hook: merge` section opens the code-branch PR, records `pr:` on the entity, and blocks until merge. + +### Merge hooks can block, and the mechanism enforces it + +A `merge` hook can wait for captain approval before pushing, or for a remote PR to merge. The first officer signals the wait through the entity `mod-block` field, and `spacedock status` enforces the discipline so a blocked entity cannot slip past the gate: + +- **Set before invoking.** The first officer sets `mod-block=merge:{mod_name}` before running the merge hook: + + ```bash + spacedock status --workflow-dir {workflow_dir} --set {slug} mod-block=merge:{mod_name} + ``` + +- **Guarded.** `spacedock status --set` refuses any terminal transition while `mod-block` is non-empty. `--archive` refuses too. Pass `--force` to override. +- **Required when a merge hook exists.** Independently of `mod-block`, `status --set` and `status --archive` refuse to terminalize or archive an entity when the workflow registers any merge hook (`_mods/*.md` with a `## Hook: merge` section) *and* the entity's `pr` field is empty *and* `mod-block` is empty. This forces the merge ceremony to leave a truthful signal that a merge actually ran. `merge: local` in the workflow README exempts the `pr` requirement; `verdict=rejected` exempts it too (a rejected entity never runs the ceremony). `--force` bypasses everything. +- **Cleared in its own call.** When the blocking action completes, the first officer clears the block: + + ```bash + spacedock status --workflow-dir {workflow_dir} --set {slug} mod-block= + ``` + + This clear MUST be standalone. `status --set` exits 1 if `mod-block=` is combined with a terminal field (`status={terminal}`, `completed`, `verdict`, or `worktree=`) in one call. Use two commits. 
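
The set, run, clear, then advance order described in the bullets above could be sketched as a shell helper. This is an illustration under assumptions, not the first officer's actual procedure: `run_blocking_merge` and its argument order are hypothetical, and the hook command stands in for whatever the mod's prose asks for. Only the `spacedock status --set` calls come from this page.

```shell
# Sketch only: the four steps as separate calls, never combined.
# run_blocking_merge WORKFLOW_DIR SLUG MOD_NAME TERMINAL_STATUS HOOK_COMMAND...
# (function name and argument order are hypothetical)
run_blocking_merge() {
  local wd=$1 slug=$2 mod=$3 terminal=$4
  shift 4
  # 1. Set the block before invoking the merge hook.
  spacedock status --workflow-dir "$wd" --set "$slug" "mod-block=merge:$mod" || return 1
  # 2. Run the blocking action. On failure the block stays set,
  #    so a later session can still see which mod is waiting.
  "$@" || return 1
  # 3. Clear the block in its own standalone call.
  spacedock status --workflow-dir "$wd" --set "$slug" mod-block= || return 1
  # 4. Only then make the terminal transition, in a second call.
  spacedock status --workflow-dir "$wd" --set "$slug" "status=$terminal"
}
```

Keeping steps 3 and 4 apart is the point of the sketch: the guard exits 1 when the clear and a terminal field share one call.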
+ +`mod-block` is read from frontmatter at boot, so a pending merge survives session resume. The first officer picks up which mod is blocking and resumes the wait. + +## Standing teammates + +A standing teammate is a long-lived specialist agent (a prose polisher, a code reviewer, a translator) declared by a mod with `standing: true` in its frontmatter. It lives in the team for the team's lifetime and is addressed by name. Use one when a recurring specialist judgment is worth a persistent agent rather than a fresh dispatch each time. + +### Declaration + +One mod file per teammate under `{workflow_dir}/_mods/{name}.md`. The parse contract (see `internal/dispatch/mods.go`): + +- **Frontmatter** carries `standing: true` and an optional `description`. +- **`## Hook: startup`** declares the spawn config as `- key: value` bullets. `spacedock dispatch spawn-standing` reads `subagent_type`, `name`, and `model` here; `model` must be one of `sonnet`, `opus`, `haiku`. Backtick-wrapped values are unwrapped. +- **`## Routing Usage`** (optional) is the prose each ensign sees telling it when and how to route to the teammate. +- **`## Agent Prompt`** MUST be the last top-level section. Its body, from the line after the heading to end of file, is the verbatim prompt passed to the spawned agent. Any `## ` heading after it is rejected loudly by `spacedock dispatch spawn-standing`. + +### Lifecycle + +The first officer drives three `spacedock dispatch` subcommands, all reading `_mods/` directly. Do not grep frontmatter yourself: + +```bash +spacedock dispatch list-standing --workflow-dir {wd} # abs mod paths, one per line, sorted +spacedock dispatch spawn-standing --mod {abs_path} --team {team_name} +spacedock dispatch show-standing --workflow-dir {wd} # ensign-facing routing block +``` + +- **Discovery** runs at boot via `list-standing`. It prints the absolute path of each `standing: true` mod, one per line, sorted alphabetically; empty output means none. 
+- **Spawn is deferred** to the first team-mode dispatch. `spawn-standing` emits an `Agent()` spec for the host to launch, or `{"status": "already-alive", "name": ...}` when the team config already lists that member. Standing teammates are **first-boot-wins**: when several workflows share one team, the first first officer to find the member absent spawns it, and the rest skip. A mod that fails to parse (missing `## Agent Prompt`, an invalid `model`, a trailing heading) is reported and skipped; it does not block the workflow. +- **Routing is best-effort and non-blocking.** Address the teammate by its declared `name`, with a 2-minute timeout. If no reply lands in time, the sender proceeds without the specialist's output. Round-trips of several minutes are normal on long drafts. +- **Teardown is team-scoped.** The teammate dies when the team is torn down (session end, explicit delete, captain shutdown). There is no cross-team or cross-session persistence; mid-session death is detected on the next routing attempt. + +Bare (single-entity) mode and degraded mode still run discovery (it is cheap) but skip the spawn pass, because there is no team to spawn into. + +### Ensign discovery + +Ensigns find standing teammates without the first officer wiring anything per dispatch. When a workflow declares at least one standing teammate, `spacedock dispatch build` appends a `spacedock dispatch show-standing` fetch line to each ensign dispatch. `show-standing` renders a `### Standing teammates available in your team` block, carrying each teammate's `## Routing Usage` body when present and otherwise a one-line fallback, so every dispatched worker learns who to route to. + +## The comm-officer prose-polisher + +The canonical standing teammate is the **comm-officer**: a standing prose-polisher the first officer routes deliberate drafts through before captain review. By convention it is declared as `_mods/comm-officer.md` with `standing: true` and named `comm-officer`. 
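
Because a mod that fails to parse is skipped rather than blocking the workflow, it may help to lint `_mods/comm-officer.md` before relying on it. The helper below is a rough sketch that mirrors the declaration contract from this page. It is not how `spacedock dispatch` parses mods, and `check_standing_mod` is a hypothetical name.

```shell
# Sketch only: a pre-flight lint for a standing-teammate mod file.
# check_standing_mod MOD_FILE   (hypothetical helper, not part of spacedock)
check_standing_mod() {
  local mod=$1 last
  # Simplification: matches the line anywhere in the file, not only in frontmatter.
  grep -q '^standing: true$' "$mod" ||
    { echo "$mod: missing standing: true" >&2; return 1; }
  grep -q '^## Hook: startup$' "$mod" ||
    { echo "$mod: missing ## Hook: startup" >&2; return 1; }
  # The last top-level heading must be the agent prompt.
  last=$(grep '^## ' "$mod" | tail -n 1)
  [ "$last" = '## Agent Prompt' ] ||
    { echo "$mod: ## Agent Prompt must be the last ## section" >&2; return 1; }
}
```

The lint does not validate `model` or the other spawn keys; `spawn-standing` remains the authority on those.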
+ +The first officer routes through it when composing **deliberate drafts**: PR bodies, gate-review summaries, long narrative entity-body sections, debrief content. It checks team membership first and treats the call as best-effort and non-blocking on the 2-minute timeout; if the comm-officer is absent or silent, the draft ships un-polished. Explicitly **out of scope**: live captain replies, short operational statuses (`pushed`, `tests green`, `PR opened`), tool-call output, commit messages, and transient logs. Polish is a deliberate-draft discipline, not a live-turn reflex. + +The comm-officer's prose discipline is light-touch by default: it applies the `elements-of-style:writing-clearly-and-concisely` skill (Strunk) to cut empty words and tighten sentences while preserving the caller's voice, rhythm, and technical vocabulary. It defers to a project voice guide when one exists. For Spacedock's own docs, that guide is the [Voice & tone](../contributing/voice-and-tone.md) page. The comm-officer and any doc contributor follow it, falling back to plain Strunk only where the guide is silent. diff --git a/docs/site/advanced/split-root-state.md b/docs/site/advanced/split-root-state.md index 9c82fa46..a8d93131 100644 --- a/docs/site/advanced/split-root-state.md +++ b/docs/site/advanced/split-root-state.md @@ -1,3 +1,129 @@ # Multi-workflow & split-root state -[TODO] — separating workflow definition from runtime state: the two roots (`definition_dir` / `state_dir`), declaring `state:` in the README, and concurrency-safe path-scoped state commits. See the [external-tracker bridge](external-tracker.md). +A split-root workflow separates the workflow's definition from its runtime +state. The README and stage declarations stay on your main branch; the mutable +entities (frontmatter updates, stage reports, archive moves) live in a +separate state checkout. 
State transitions stop polluting your code branch's +history, and the same workflow definition can drive shared issues without each +status change landing as a commit on `main`. + +You opt in with a single README field. Without it, Spacedock keeps the +default single-root behavior: entities sit beside the README on the same +branch. + +## The two roots + +A workflow resolves to two directory roles, derived from the README's `state:` +field (`internal/status/roots.go`): + +- **`definition_dir`**: the directory containing `README.md`. It holds the + workflow identity and the stage declarations. This is what you pass as + `--workflow-dir`. +- **`state_dir`**: the `state:` path resolved against `definition_dir` when the README declares a + non-empty `state:` value, otherwise `definition_dir` itself. It holds the + active entities and the `_archive` directory. + +`spacedock status` reads stage declarations from `definition_dir/README.md` and +entities from `state_dir`. It writes frontmatter updates and archive moves only +into `state_dir`. In single-root mode the two roots are the same path, matching +the original same-directory layout, so existing workflows are unaffected. + +## Declare `state:` in the README + +Add a top-level `state:` field to the README frontmatter. The value is a path +relative to the README directory: + +```yaml +state: .spacedock-state +``` + +The path is resolved against the definition dir. The interpreter +(`internal/status/state.go`) rejects two classes of value rather than following +them silently: + +- An **absolute path** fails: `state:` must be relative to the README directory. +- A path that **escapes the definition dir** via `..` fails: the v0 contract is + a child checkout, not an arbitrary location. + +An empty `state:`, an absent field, or the explicit `$inline` sentinel all +resolve to single-root (inline) mode. The shipped `docs/dev` workflow uses +`state: .spacedock-state`; see its README for a live example. 
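
The resolution rules above can be sketched as a shell function. This is a minimal illustration of the documented behavior, not the code in `internal/status/state.go`. `resolve_state_dir` is a hypothetical name, and treating a path such as `a/..` as acceptable is an assumption of the sketch.

```shell
# Sketch only: mirrors the documented rules for the README's state: value.
# resolve_state_dir DEFINITION_DIR STATE_VALUE   (hypothetical helper)
# Prints the state dir, or returns 1 with a message on stderr.
resolve_state_dir() {
  local definition_dir=$1 state=$2
  # Empty, absent, or the $inline sentinel: single-root (inline) mode.
  if [ -z "$state" ] || [ "$state" = '$inline' ]; then
    printf '%s\n' "$definition_dir"
    return 0
  fi
  # Absolute paths are rejected.
  case $state in
    /*) echo "state: must be relative to the README directory" >&2; return 1 ;;
  esac
  # Walk the components; a depth below zero means the path left the definition dir.
  local depth=0 part
  local IFS=/
  for part in $state; do
    case $part in
      ''|.) ;;
      ..) depth=$((depth - 1)) ;;
      *) depth=$((depth + 1)) ;;
    esac
    if [ "$depth" -lt 0 ]; then
      echo "state: must not escape the definition dir" >&2
      return 1
    fi
  done
  printf '%s/%s\n' "$definition_dir" "$state"
}
```

For example, `resolve_state_dir docs/dev .spacedock-state` prints `docs/dev/.spacedock-state`, while `resolve_state_dir docs/dev ../elsewhere` fails.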
+ +Active entities live directly under `state_dir`; there is no `entities/` +subdirectory. Archived entities move to `state_dir/_archive`. Read the state +with the launcher exactly as you would a single-root workflow; the split is +transparent to the command surface: + +```bash +spacedock status --workflow-dir docs/dev +spacedock status --workflow-dir docs/dev --next +``` + +## The state branch + +The state checkout lives on an orphan branch in the same repo (no second repo, +no second remote), and the checkout itself is a linked worktree of the main repo +at the gitignored `state:` path. State commits land on the orphan branch, so the +code branch never sees them. Spacedock derives the branch name by prefixing the workflow +dir's basename with `spacedock-state/`, so `docs/dev` maps to +`spacedock-state/dev`. An explicit `state-branch:` field in the README overrides +the derived name verbatim (`StateBranch` in `internal/status/state.go`). + +Because the branch is shared through `origin`, multiple agents (and multiple +operators) can drive the same workflow concurrently. That makes the commit +discipline below a correctness requirement, not a style preference. + +## Concurrency-safe state commits + +The state checkout is a single, non-branched git index. A bare `git add -A` +followed by a bare `git commit` sweeps up a sibling writer's staged entity, +cross-attributing or clobbering it. **Every writer commits path-scoped**, naming +exactly the entity it touched: + +```bash +git -C {state_checkout} add {entity_path} +git -C {state_checkout} commit -m "..." -- {entity_path} +``` + +Never a bare `git add -A` or a bare `git commit` against the state checkout. +On `index.lock` contention, retry after roughly two seconds. When the status +tool owns the `add`+`commit` under a lock, route through it instead: a +tool-managed atomic commit is preferred over the manual path-scoped fallback. 
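
The manual fallback, path-scoped `add` and `commit` with a retry on `index.lock` contention, might look like the helper below. It is a sketch under assumptions: `commit_entity`, the three-attempt limit, and the `RETRY_DELAY` variable are hypothetical, and it detects contention by looking for `index.lock` in git's error output.

```shell
# Sketch only: path-scoped state commit with a retry on index.lock contention.
# commit_entity STATE_CHECKOUT ENTITY_PATH MESSAGE   (hypothetical helper)
commit_entity() {
  local checkout=$1 entity=$2 message=$3 attempt err
  for attempt in 1 2 3; do
    # Stage and commit only the named entity; never a bare add or commit.
    if err=$( { git -C "$checkout" add -- "$entity" &&
                git -C "$checkout" commit -q -m "$message" -- "$entity"; } 2>&1 ); then
      return 0
    fi
    case $err in
      # Another writer holds the index lock: wait roughly two seconds, then retry.
      *index.lock*) sleep "${RETRY_DELAY:-2}" ;;
      # Any other failure is surfaced immediately.
      *) printf '%s\n' "$err" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$err" >&2
  return 1
}
```

The helper gives up after three contended attempts rather than looping forever, which leaves the decision to the caller.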
+ +### Multi-writer sync + +The path-scoped rule extends to three sync points against `origin`, not a pull +before every dispatch: + +- **After a state commit, push.** `git -C {state_checkout} push origin {state_branch}`. +- **On a non-fast-forward rejection, rebase then re-push.** + `git -C {state_checkout} pull --rebase origin {state_branch}` replays your + single-file commit atop the peer's. Disjoint paths produce no conflict. +- **At first-officer boot, pull once.** Integrate peers' state at boot, not on + every read. + +If `pull --rebase` conflicts (two writers editing the same entity's frontmatter +at once), the first officer halts the dispatch, aborts the rebase, and surfaces +the conflicting entity and peer commit to the captain. It does not +`--force`-push and does not auto-resolve with `-X ours`/`-X theirs`, either of +which silently drops a peer's edit. A full lock model is out of scope; the halt +is the boundary behavior. + +## Worktree stages under split-root + +When a split-root workflow has a worktree stage, the worktree isolates the +deliverable work product only. The entity body and stage reports are still +written and committed to the state checkout at the entity's state-checkout path, +never a worktree copy. The dispatch helper hands the worker that path even under +a worktree stage. "Commits must be on this branch" applies to the deliverable +artifacts; entity state always lands in the state checkout. + +## Bridging an external tracker + +Split-root state is the integration point for external trackers: Linear, GitHub +Issues, kata, or another ticket ledger. The external system can own backlog +intake, discussion, and assignment while Spacedock remains the execution +workflow. The bridge uses flat top-level frontmatter fields (`issue:`, +`source:`) so the current line-oriented parser preserves them. 
See the +[external-tracker bridge](external-tracker.md) for the field contract and the +principles that keep Spacedock's stage semantics out of the tracker. diff --git a/docs/site/advanced/sprints-and-roadmap.md b/docs/site/advanced/sprints-and-roadmap.md index d28e509a..2dbeefe5 100644 --- a/docs/site/advanced/sprints-and-roadmap.md +++ b/docs/site/advanced/sprints-and-roadmap.md @@ -1,3 +1,64 @@ # Sprints & roadmap -[TODO] — the roadmap as the strategy layer above the workflow, the shaping-FO/Commander split, the sprint lifecycle (shape → drive → close), and the convention-only `status --where` sprint grouping. +A roadmap is the strategy layer above the per-entity workflow: it owns outcome, scope, sequencing, and definition-of-done; the workflow owns task state. The two never overlap. Spacedock's own build uses this split. `docs/roadmap/README.md` is the strategy layer, and `docs/dev/.spacedock-state/` holds the executable entities that `spacedock status` queries. The roadmap explicitly "does **not** track task state." + +This page describes that construct as Spacedock practices it on itself. It is a convention (prose, frontmatter, and the native `status --where` query), not new binary behavior. There is no `sprint` recognizer, no `--sprint-validate` gate, and no contract bump. If you want this discipline for your own workflow, you adopt the convention; nothing in the launcher enforces it. + +## The roadmap as strategy layer + +The roadmap file owns four things the workflow does not: the **outcome** each sprint unlocks, the **scope** (which entities are in), the **sequencing** (value-ordered sprint list), and the **definition-of-done** per sprint. Task state, meaning which stage an entity is in and whether its gate passed, stays in the entities and is read with `spacedock status`. Keep the two separate: a roadmap that starts tracking stage transitions has duplicated the workflow and will drift from it. + +A roadmap groups its work into sprints. 
A sprint is a set of entities driven to one deliverable, with its own `index.md` recording the goal, the members-as-query, the DoD, and what is out of scope. + +## The shaping-FO / Commander split + +A sprint is run by two distinct roles, and the boundary between them is the load-bearing rule of the construct. + +- **Shaping FO** owns strategy and shape: the roadmap, each sprint's *definition* (deliverable + DoD), the gating ideation of the sprint's entities, the cross-entity coherence check, the staff readiness review, and packaging the sprint for execution. It stays high-level and does **not** hand-drive stage execution. +- **Commander** takes one packaged sprint and drives it to its deliverable: dispatches each stage, approves execution gates and merges, runs the sprint-wide integration test, and produces the report. The Commander boots `spacedock:first-officer` and creates its own team. + +The handoff between them is a **conn-to-drive dispatch**: a self-contained sprint package at `NNN-/dispatch-sprint-execution.md` that the Commander runs from a cold boot. It is a package, not a context transfer. The Commander does not inherit the shaping FO's session, and escalates back to the shaping FO and captain only on a third feedback cycle, a budget blowout, an irrecoverable block, or a genuine scope fork. + +## Sprint lifecycle: shape, drive, close + +A sprint moves through three phases, owner-tagged. The per-sprint checklist lives in the sprint's `index.md`; the canonical template is the lifecycle checklist in `docs/roadmap/README.md`. + +**Shape (shaping FO).** + +1. **Scope-lock** with the captain: which entities are in, which defer. The captain decides. +2. **Carve.** Stamp `sprint`, `group`, and `sprint-readiness` frontmatter on the members; write `index.md` (goal, members-as-query, DoD, out-of-scope). +3. 
**Ideate** each gated member: problem, approach, acceptance criteria, and test plan, with the **riskiest mechanism exercised first** (a spike, or a recorded "no spike needed"). Check existing ideation state first; never re-ideate a banked design. +4. **Preflight staff review.** Dispatch an *independent* reviewer to refute the designs. This is not optional and never a self-review: the reviewer is neither the FO nor the ideation ensigns, because the value is refuting the FO's own assumptions. Findings land in `staff-review.md` and Material ones are folded before the gates lock. +5. **Present ideation gates**, with checklist accounting and acceptance-criteria cross-check per member. The captain decides; the FO never self-approves. +6. **Package.** Write `dispatch-sprint-execution.md`: the boot recipe, per-member build notes, in-drive gates, and the release-cut recipe. + +**Drive (Commander, a separate cold-booted session).** + +1. Move each member through implementation, validation, and done, with a **detached adversarial audit at validation** for every high-stakes surface (front door, status guards, shipped scaffolding, CI/release machinery). +2. Merge each member to `next` by PR; keep state commits concurrency-safe. +3. **Pre-cut antipattern audit.** With all members merged and the tag not yet fired, dispatch an *independent* reviewer over the assembled sprint to catch cross-cutting antipatterns and integration holes before they ship. Ship-blockers are fixed before the cut; non-blockers are recorded for the next sprint. Running it after the tag means the antipattern has already shipped. +4. **Cut the release.** Confirm `go test ./...` is green from the root, then follow the authoritative cut procedure (see [Releasing](../contributing/releasing.md)). The captain authorizes the cut. + +**Close (shaping FO).** + +1. 
**Seed the next sprint.** Fold the pre-cut audit's deferred and non-blocking findings into the next sprint's backlog, and run a light post-cut release verification, since some release-machinery issues only manifest once the tag actually fires. + +## Membership is a query, not a list + +A sprint groups its entities by frontmatter query, never a hard-coded roster. Members carry `sprint`, `group`, and `sprint-readiness` frontmatter, and the rollup is the native `--where` filter on `spacedock status`. Run it against the workflow that holds the entities (`docs/dev` in Spacedock's own build): + +```bash +# every member of a sprint +spacedock status --workflow-dir docs/dev --where sprint=0200-flip + +# the drivable set: excludes deferred members +spacedock status --workflow-dir docs/dev --where sprint=0200-flip --where 'sprint-readiness != defer' +``` + +Each `--where` clause is a field, an operator, and a value, where the operator is `=` or `!=`. Stacking clauses ANDs them: an entity is listed only when it matches every clause. The operator forms also cover presence and absence: `field !=` matches a non-empty value, `field =` matches an empty one. This is the same filter any reader can run; the sprint is a convention layered on top of it, not a separate command. + +`sprint-readiness: defer` is how a member stays in the sprint's definition but out of the Commander's drivable set. In the `0200-flip` capstone, `pj` is marked `defer` to mean "driven in the shaping session, not by the cold-boot Commander". That defers it from one driver; it is not dropped from the sprint. + +## Adopting it for your own workflow + +You do not need any launcher feature to use this. Add a roadmap file above your workflow, stamp `sprint` / `group` / `sprint-readiness` on the entities you group, and read membership with `status --where`. The construct buys you the strategy/state separation and the shaping-FO/Commander discipline; it costs you a convention you maintain by hand. 
Spacedock runs it on itself precisely to learn whether the convention earns a graduation to first-class support before any code is written for it. diff --git a/docs/site/concepts/gates-and-decisions.md b/docs/site/concepts/gates-and-decisions.md index d4f45610..a17619cb 100644 --- a/docs/site/concepts/gates-and-decisions.md +++ b/docs/site/concepts/gates-and-decisions.md @@ -1,3 +1,74 @@ # Gates & decisions -[TODO] — what a gate carries, the three calls (approve / redo with feedback / reject), feedback cycles and the loop cap, and the detached adversarial audit. +A gate is the decision point at the end of a stage where nothing advances without your vote. When a stage is marked `gate: true` in the workflow README, the first officer stops after the worker completes, renders a gate review, and waits for you. It never self-approves. This page covers what a gate review carries, the three calls you make, how feedback cycles loop and where they cap, and the detached adversarial audit that backs high-stakes validation. + +## What a gate carries + +The first officer presents a gate review only after it has read the worker's `## Stage Report`, checked every dispatched item, and counted the results. The review is the first officer's own prose with a fixed spine. The first three lines and the last line carry the decision; everything between is supporting evidence. If you stop reading after line three, you can still vote. + +A gate review looks like this: + +``` +Gate review: {entity title} — {stage} +Chosen direction: {one-line summary of the worker's approach, or n/a} +Recommend {approve | reject: {one-line reason}}. 
+ +Checklist (from ## Stage Report in {entity_file_path} lines {start}-{end}): +- DONE: {≤10-word gist} +- SKIPPED: {gist} — {reason} +- FAILED: {gist} — {reason} + +Reviewer findings + Material: {fact-corrections, contract violations, missing AC evidence} + Polish: {wording, format drift, non-blocking suggestions} + +Assessment: {N} done, {N} skipped, {N} failed. + +Decision: {what approval/rejection does in concrete terms}. +``` + +Read it as follows: + +- **`Chosen direction` names what the worker picked**, so you do not have to open the entity file to learn it. Ideation picks an approach; validation picks PASS or REJECTED. Stages with no choice to make show `n/a`. +- **`Recommend` is the first officer's verdict, stated exactly once.** It does not reappear restated elsewhere in the review. +- **`Checklist` is a gist roll-up, not the report.** The full `## Stage Report` is cited by file path and line range; open it when you want the detail. +- **Reviewer findings split into `Material` and `Polish`.** Material items (fact-corrections, contract violations, missing acceptance-criterion evidence, claims the codebase contradicts) are the ones that should move your vote. Polish is non-blocking. An empty tier is dropped. +- **`Decision` names what your vote does in concrete terms.** For example, "approve to enter implementation in worktree `.worktrees/...`" or "reject to bounce back to {feedback-to target}". + +At every gate the first officer also runs an acceptance-criteria cross-check: it scans `## Acceptance criteria`, confirms each `**AC-N**` has evidence cited from this or a prior stage report, and names any criterion left without evidence. + +## The three calls + +You answer a gate with one of three calls. + +- **Approve.** The entity advances to the next stage and the first officer dispatches it, reusing the live worker when it can and dispatching fresh otherwise. If the next stage opens or closes a worktree, the `Decision` line told you so. 
Approving the terminal stage runs the merge and cleanup ceremony. +- **Redo with feedback.** You approve the direction but send concrete fixes back. Name the specific asks ("tighten the AC-2 substring assertion, correct the file path claim"), not "address the reviewer's notes". The first officer routes your asks back to the stage that owns the work, the worker re-does it, and the gate is re-presented. +- **Reject.** At a stage with a `feedback-to` target, rejecting bounces the work back to that target stage to be fixed and re-validated; this is the feedback cycle below. At a stage without `feedback-to`, rejection is terminal for that path. + +A redo and a reject at a `feedback-to` stage run the same routing machinery. The difference is whether you are correcting a direction you accept or sending it back. Both name the concrete fix asks so the next worker has something to act on. + +## Feedback cycles and the loop cap + +When a feedback stage recommends `REJECTED`, or you reject at a `feedback-to` stage, the work routes back to the stage named in `feedback-to`: the stage that owns the fix, not the reviewer that flagged it. In the dev workflow, `validation` has `feedback-to: implementation`, so a rejected validation sends the deliverable back to implementation, not back to the validator. + +The first officer tracks each round in a `### Feedback Cycles` section in the entity body, then: + +1. Reads the rejected stage's `feedback-to` target. +2. Routes your concrete findings to that target, reusing the live worker in the same worktree when it is still addressable and reuse conditions pass, dispatching fresh otherwise. The routed message carries the fix work and the stage assignment, not just an acknowledgment. +3. Re-runs the reviewer after the fix. +4. Re-enters the gate flow with the updated result, presenting you a fresh gate review. + +**The loop caps at three.** On cycle 3 the first officer escalates to you instead of bouncing a fourth time. 
The same fix has now failed twice, so the call returns to a human rather than looping. This cap is exercised by the `feedback-3-cycle-escalation` runtime scenario, which asserts the first officer escalates on the third rejected validation rather than auto-bouncing again. + +## The detached adversarial audit + +For high-stakes surfaces, a passing validation is necessary but not sufficient. Before merging, the first officer also runs a read-only adversarial audit. The audit catches the hole that validation cannot see itself: a test that passes today but would also pass on a broken future edit. + +The audit triggers on four surfaces: the front-door launcher (`spacedock claude` / `codex` / `doctor`), the `status` mutation and guard paths, the shipped contract and scaffolding, and the CI and release machinery. Routine, low-blast-radius changes do not need it; a normal validation suffices. + +It runs on a separate throwaway checkout, never the implementation worktree, and never mutates the deliverable. The auditor tries to refute the validation: it constructs an adversarial edit that the deliverable's own tests should catch and confirms they do. A test that stays green under an edit that breaks the claim is a hole. Findings come in two tiers, `Material` and `Polish`; "refuted nothing material" is a valid recorded outcome. + +Results feed the same gate machinery you already know: + +- **Material findings route back through the normal validation-to-implementation feedback flow**, with a `### Feedback Cycles` entry naming the audit and its adversarial edit. The gate is not presented as clean until they are closed. +- **A clean audit is noted in the gate's reviewer-findings block**, or as a one-line "detached audit: no material findings". 
diff --git a/docs/site/concepts/operating-model.md b/docs/site/concepts/operating-model.md index d9d4b9b3..6276ad59 100644 --- a/docs/site/concepts/operating-model.md +++ b/docs/site/concepts/operating-model.md @@ -1,3 +1,41 @@ # The operating model -[TODO] — the three roles (Captain, First Officer, Ensign), the shaping/driving split, and how batched, evidenced decisions serve the captain. +Spacedock runs on three roles and one division of labor: you shape the work and make the calls; the agents drive each item through its stages and bring decisions back to you with evidence. This page names the roles, the line between shaping and driving, and why decisions arrive batched. + +## Three roles + +| Role | Who | What they own | +|------|-----|---------------| +| **Captain** | You. | The mission, and the call at every approval gate unless you delegate it. | +| **First Officer** | The orchestrator agent. | Running the workflow: dispatch, gate presentation, advancing entity state. | +| **Ensign** | The worker agent. | Moving one entity (one work item) forward through one stage. | + +There is one captain and one first officer per session. The number of ensigns tracks the dispatchable work: the first officer dispatches one per entity per stage. + +The first officer reads the workflow README, runs `spacedock status --next` to find entities ready to advance, and dispatches an ensign for each. An ensign reads its assignment, does the stage's work, commits, writes a stage report, and signals done. The first officer reviews that report against the checklist it dispatched. If the stage is gated, it pauses and presents the report to you. If not, it advances the entity and dispatches the next stage itself. A completed non-gated, non-terminal stage is not a stopping point. + +## Shaping versus driving + +The captain shapes; the agents drive. These are different jobs, and the split is what keeps you out of the per-step loop. 
+ +**Shaping is defining what good looks like before the work runs.** You set the mission, the stages, and the bar each stage must clear, all declared in the workflow README rather than negotiated mid-task. You commission a workflow with [`/spacedock:commission`](../running-workflows/commission.md), and you make the calls at gates: approve, redo with feedback, or reject. Some gates you answer yourself; others resolve through a delegated agent review. That is the whole of the captain's standing job. + +**Driving is moving entities through the declared stages.** The first officer schedules and dispatches; the ensign does the stage work and proves it. The first officer is allowed to take obvious reversible steps without asking: a dispatch the workflow already permits, a status read, a routine state transition. It asks you only when requirements are materially ambiguous, a design choice would change the output meaningfully, or scope is too unclear to turn into concrete criteria. Everything else happens without a prompt to you. + +The line holds because of one rule: the maker does not judge its own work. Review runs as a separate stage with fresh context and no access to the ensign's reasoning (see [Gates and decisions](gates-and-decisions.md)). The first officer never self-approves a gated stage. + +## Batched, evidenced decisions + +Decisions reach you batched and backed by evidence, not as a stream of interruptions. This is the point of the model: your attention is the bottleneck, so the agents queue work and surface only the calls that need a human. + +**Batch the work; decide as it flows back.** Queue many entities at once. Ensigns advance each through its stages in parallel. You handle gates as they surface, not one session at a time, and not on the agent's schedule. While one entity waits on a clarification, the first officer keeps dispatching the others. 
+ +**Every gate carries evidence.** When the first officer presents a gate, it does not hand you the transcript. It renders a fixed gate-review format: the chosen direction in one line, a single clear recommendation you can approve with one "yes", a gist roll-up of the stage report's `DONE`/`SKIPPED`/`FAILED` items cited by file path and line range, and any reviewer findings split into `Material` and `Polish` tiers. The target is 15-25 lines. You decide on the evidence and the bar, not on a wall of output. + +**The decision leaves a trail.** Each gate records the verdict and its reason alongside the stage report in the entity file. The record outlives the reviewer, so a bad result traces back to the call that caused it. When you end a session, [`/spacedock:debrief`](../running-workflows/debrief-and-refit.md) captures what happened (commits, state changes, decisions, open issues), and the next session picks up from it. + +## Where to go next + +- [Workflows and entities](workflows-and-entities.md) covers the directory and files the roles operate on. +- [Stage lifecycle](stage-lifecycle.md): how an entity moves backlog → ideation → implementation → validation → done. +- [Gates and decisions](gates-and-decisions.md) lays out what a gate review carries, the three calls you make, and the detached adversarial audit. diff --git a/docs/site/concepts/stage-lifecycle.md b/docs/site/concepts/stage-lifecycle.md index c3ef0d28..6e719044 100644 --- a/docs/site/concepts/stage-lifecycle.md +++ b/docs/site/concepts/stage-lifecycle.md @@ -1,3 +1,67 @@ # The stage lifecycle -[TODO] — the stages (backlog → ideation → implementation → validation → done), fresh context at validation, worktree vs. inline, and what each stage declares. +An entity moves through a fixed chain of stages, one at a time, and each stage declares the work it owns and the proof it must produce. 
The dev workflow's chain is `backlog → ideation → implementation → validation → done`; your own workflow names its own stages, but the mechanics are the same. The first officer advances an entity stage by stage, dispatching one ensign per stage and pausing at the gates you declared. + +The stage order, names, and per-stage properties live in the workflow README's frontmatter under `stages.states`. This page uses the dev workflow (`docs/site/contributing/development-workflow.md`) as the running example; read that page for the full per-stage Inputs/Outputs/Good/Bad detail. + +## What a stage declares + +Each entry under `stages.states` is a stage name plus a set of boolean or string properties. The first officer reads these to decide how to dispatch and when to stop. A `stages.defaults` block sets the baseline; a stage entry overrides it. The properties that change behavior: + +| Property | Effect | +|----------|--------| +| `initial: true` | The stage an entity starts in. The dev workflow marks `backlog`. | +| `terminal: true` | The stage an entity ends in. Reaching it runs the merge and cleanup ceremony, not another dispatch. The dev workflow marks `done`. | +| `gate: true` | The first officer presents a stage report and waits for your decision instead of advancing on its own. | +| `worktree: true` | The stage's work runs in an isolated git worktree. Absent or `false`, it runs inline. | +| `fresh: true` | The stage always gets a freshly dispatched ensign, never a worker reused from the prior stage. | +| `feedback-to: {stage}` | On rejection, work routes back to the named stage rather than failing outright. | +| `concurrency: N` | How many entities may sit in this stage at once. | +| `agent: {name}` | Which worker skill the first officer dispatches. Defaults to `ensign`. | + +Beyond these properties, the prose of each stage's `###` subsection in the README is the stage definition: its Inputs, Outputs, and the Good/Bad bar. 
The first officer copies that subsection verbatim into the ensign's assignment, so what a stage declares in prose is exactly what the worker is told to do. + +## The stages + +Read the chain as a pipeline: each stage takes the prior stage's output as its input, and the bar rises from "is this clear?" to "is this proven?". + +- **`backlog`, the seed.** An entity enters here when first proposed: a title, a source, a brief description, and the test gates future stages must satisfy. No design work yet. `initial: true`, so this is where every new entity starts. The dev workflow also marks it `gate: true`, so the first officer presents a new entity for your go-ahead before it advances. +- **`ideation`, the design.** A worker clarifies the problem, explores approaches, and produces a fleshed-out body: problem statement, proposed approach, acceptance criteria, and a test plan. Each acceptance criterion names how it will be checked. This stage is `gate: true` in the dev workflow, so the first officer presents the design for your approval before any code is written. +- **`implementation`, the deliverable.** A worker produces the change the entity describes: code, fixtures, instruction text, on-disk state. This stage is `worktree: true`, so the work happens in an isolated checkout. Implementation completing is not a stopping point. A completed, non-gated stage routes straight on to validation. +- **`validation`, the independent check.** A worker verifies the deliverable against the acceptance criteria. It checks what was produced; it does not produce the deliverable itself. It reproduces the evidence each `AC-N` cites and returns a `PASSED` or `REJECTED` recommendation. This stage is `worktree: true`, `fresh: true`, `feedback-to: implementation`, and `gate: true`. +- **`done`, the verdict.** Validation is complete and you approve the result. The entity is closed with a `verdict` of `PASSED` or `REJECTED` and a `completed` timestamp. `terminal: true`. 
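
The way these properties steer the first officer can be sketched in Go. This is an illustration under stated assumptions: the `Stage` struct, the `devChain` table, and the `nextStep` helper are hypothetical names, not types from the Spacedock binary. The property values are the ones listed for the dev workflow above.

```go
package main

import "fmt"

// Stage mirrors a subset of the per-stage properties described on this page.
// The struct and its field names are hypothetical.
type Stage struct {
	Name       string
	Initial    bool
	Terminal   bool
	Gate       bool
	Worktree   bool
	Fresh      bool
	FeedbackTo string
}

// devChain is the dev workflow's chain as the stage list above describes it.
var devChain = []Stage{
	{Name: "backlog", Initial: true, Gate: true},
	{Name: "ideation", Gate: true},
	{Name: "implementation", Worktree: true},
	{Name: "validation", Worktree: true, Fresh: true, FeedbackTo: "implementation", Gate: true},
	{Name: "done", Terminal: true},
}

// nextStep reports what follows once the worker for chain[i] has finished.
// A gated stage pauses for a decision; a non-gated stage moves straight on.
func nextStep(chain []Stage, i int) string {
	s := chain[i]
	if s.Terminal || i+1 >= len(chain) {
		return "nothing to dispatch"
	}
	if s.Gate {
		return "present gate"
	}
	next := chain[i+1]
	if next.Terminal {
		return "run merge and cleanup ceremony"
	}
	return "dispatch " + next.Name
}

func main() {
	for i, s := range devChain {
		fmt.Printf("%s: %s\n", s.Name, nextStep(devChain, i))
	}
}
```

The sketch covers only the completion path. Rejection routing through `FeedbackTo` and the feedback-cycle cap are left out to keep it short.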
+ +## Fresh context at validation + +Validation declares `fresh: true` because the reviewer must not be the maker. The first officer normally reuses a live worker across consecutive stages to save context, but a `fresh: true` stage forces a new dispatch every time. The validator arrives without the implementer's reasoning in its context, sees only the entity body and the deliverable, and pushes back on thin evidence. This is the mechanism behind the README's claim that "the agent doesn't get to judge its own work." + +When validation recommends `REJECTED`, `feedback-to: implementation` routes the concrete finding back to the implementation stage for rework rather than closing the entity. The entity re-enters implementation, the finding is addressed, and a fresh validator checks it again. A hard cap on feedback cycles prevents an endless bounce; on the third cycle the first officer escalates to you. + +## Worktree vs. inline + +A stage runs in an isolated git worktree when it declares `worktree: true`, and inline at the repo root otherwise. This is the "isolation when it matters" tradeoff: stages that mutate shared state (implementation, validation) get their own checkout so concurrent entities don't collide; lighter stages that only edit the entity body (backlog, ideation) run inline. + +The mechanics, run by the first officer: + +- **On first dispatch to a worktree stage,** the first officer creates a worktree at `.worktrees/{worker_key}-{slug}` on branch `{worker_key}/{slug}` and records the path in the entity's `worktree` frontmatter field. +- **Inside a worktree-backed stage,** the ensign keeps all reads, writes, and commits under that worktree. The deliverable is isolated there until the entity terminalizes. +- **In a split-root workflow** (the README declares a `state:` checkout, e.g. `state: .spacedock-state`), the entity body and stage report live in the state checkout, not in the worktree; the worktree isolates only the deliverable. 
The first officer's dispatch hands the ensign the correct state-checkout path for the entity; the worker trusts that path rather than writing entity state into the worktree. +- **At the terminal stage,** the first officer merges the worktree branch, clears the `worktree` field, removes the worktree, and deletes the local branch. + +To see where each entity sits and which are ready to advance, read the workflow state: + +```bash +spacedock status --workflow-dir docs/dev +``` + +```bash +spacedock status --workflow-dir docs/dev --next +``` + +`--next` lists the entities ready for dispatch, the query the first officer runs each loop. The `worktree` column shows the isolated checkout path for any entity currently mid-stage in a worktree-backed stage. + +## Where to go next + +- The roles that drive this pipeline (captain, first officer, ensign) are in [the operating model](operating-model.md). +- The decision points at stage boundaries are covered in [gates and decisions](gates-and-decisions.md). +- The entity frontmatter these stages update is in the [frontmatter contract](../reference/frontmatter-contract.md). diff --git a/docs/site/concepts/worked-example.md b/docs/site/concepts/worked-example.md index 111b3021..d0c11692 100644 --- a/docs/site/concepts/worked-example.md +++ b/docs/site/concepts/worked-example.md @@ -1,3 +1,123 @@ # A worked example -[TODO] — a walkthrough of one concrete, in-repo Spacedock workflow from commission to a delivered entity, tracing a single work item through the stages with real artifacts. This is the slot the `readme-real-workflow-example-link` task (e6) fills. +This page traces one real entity, `z9` `codex-plugin-auto-install`, from +backlog through to `done` / PASSED, using artifacts that live in this repo. It is +a concrete read of the abstract stage machine: backlog → ideation → +implementation → validation → done, the gates between them, and what the captain +actually decides at each one. 
+ +The workflow is `docs/dev` (the Spacedock v1 dev workflow); its stages, gates, +and entity schema are defined in +[the development workflow reference](../contributing/development-workflow.md). +Runtime entity state lives in a separate `.spacedock-state` checkout, so the +finished entity itself is not in the main tree, but its full trajectory is on +the record in the `0198-pre-flip-hardening` sprint directory: `index.md`, +`dispatch-sprint-execution.md`, `debrief.md`, and `post-sprint-audit.md`. + +`z9` delivered front-door Codex plugin auto-install: `spacedock codex` now +installs a missing plugin then launches, the Codex analog of the Claude path. It +shipped as [PR #329](https://github.com/spacedock-dev/spacedock/pull/329). + +## See where it sits + +Each first-officer loop starts by asking the workflow what is ready to move. You +run the same query: + +```bash +spacedock status --workflow-dir docs/dev --next +``` + +Lists the entities ready to dispatch. To scope to one sprint, filter with +`--where`: + +```bash +spacedock status --workflow-dir docs/dev \ + --where sprint=0198-pre-flip-hardening --where 'sprint-readiness != defer' +``` + +This is the sprint membership query, the source of truth, not a hand-kept list. +At the start of `0198`, `z9` shows up here in the `binary-ux` group. + +## backlog → ideation: shape the work + +`z9` enters backlog as a seed: a title, a source, and a brief description of the +problem. It carries no design yet. Ideation is where a worker fleshes it out into +a problem statement, a proposed approach, entity-level acceptance criteria, and a +test plan, with each AC naming a check outside the entity body that can fail. + +For `z9` that meant a concrete approach: install through the shared `devBranch` +rather than a hardcoded `"next"`, so the install channel tracks the release +channel (`next` today, `main` after the flip). 
The design also resolved the +riskiest unknown up front (that the install branch was the channel variable, not +a literal) and recorded it before committing to the rest of the plan. + +Ideation ends at a **gate**. Because the dev workflow marks `ideation` with +`gate: true`, the first officer does not advance on its own: it presents the +design to the captain. The captain approved `z9`'s ideation on 2026-06-08. That +approval is the entry condition for implementation, and it is captured in the +sprint package so a later Commander session drives implementation directly +without re-presenting the gate. + +## implementation: produce the deliverable + +Once the design is approved, `z9` moves to implementation. The dev workflow runs +this stage in a `worktree` (`worktree: true`), so the dispatched ensign works in +an isolated checkout, not the shared tree. + +The dispatch package gave the implementer three concrete build notes, all of +which the work honored: + +- **Install on the shared `devBranch`**, via `ops.Install("codex", + marketplaceSource, devBranch)` and `--ref `, not a hardcoded + `next`, so the channel tracks the flip's later `devBranch` retarget. +- **Fix the now-false comments and error strings** it builds around + (`host_exec.go`, `frontdoor.go`), rather than adding new code around stale text. +- **Invert the obsolete test** `TestCodexFrontDoorNoPluginFailsFastWithoutInstalling`, + whose old assertion contradicted the new auto-install behavior. + +Implementation is complete when the deliverable is committed and the stage report +is filed. It is not a parking spot: a completed implementation routes straight to +`validation` dispatch. + +## validation: verify against the criteria + +`z9` moves to validation to be checked, not finished. The dev workflow marks +`validation` with `fresh: true` and `worktree: true`, so a fresh validator (one +that did not write the code) runs in its own worktree. 
It pulls every `AC-N` +from the entity's acceptance-criteria section, reproduces the evidence each one +cites, and produces a PASSED or REJECTED recommendation. + +`z9` is a front-door change, a high-stakes surface, so validation alone was not +sufficient. The dev workflow requires a **detached adversarial audit** for the +launcher front door: a read-only pass on a throwaway checkout of the merge result +that tries to refute the validation. The `z9` audit ran at commit `0b714fac` +and exercised five mandated probes, each reddening the test suite then reverting. +It refuted nothing material. Channel-tracking was confirmed clean. That clean +audit satisfied the sprint's DoD item for the high-stakes surface. + +Validation also has `gate: true` and `feedback-to: implementation`. A REJECTED +recommendation routes the finding back to implementation for another cycle; a +PASSED recommendation goes to the captain for the terminal decision. + +## done: the captain's verdict + +`z9` reaches `done` when the captain reads the validation report and approves. +The terminal stage records the outcome in frontmatter: `status: done`, +`verdict: PASSED`, and a `completed` timestamp. The sprint's `post-sprint-audit.md` +confirms `z9` finished `status: done`, `verdict: PASSED`, archived, with PR +reference #329, alongside its three sprint siblings (`kb` #327, `qa` #328, +`vh` #330). + +That is the whole point of the machine: nothing reached `done` on assertion +alone. `z9` advanced only on an approved design, an isolated implementation, a +fresh validation, a detached audit that failed to break it, and an explicit +captain verdict. Each one a decision, each one on the record. + +## Read the trail yourself + +The full trajectory is reconstructable from the sprint directory: + +- [`index.md`](https://github.com/spacedock-dev/spacedock/blob/next/docs/roadmap/0198-pre-flip-hardening/index.md): goal, members, definition of done, the approved-gate note. 
+- [`dispatch-sprint-execution.md`](https://github.com/spacedock-dev/spacedock/blob/next/docs/roadmap/0198-pre-flip-hardening/dispatch-sprint-execution.md): the per-member drive plan and `z9`'s build notes. +- [`debrief.md`](https://github.com/spacedock-dev/spacedock/blob/next/docs/roadmap/0198-pre-flip-hardening/debrief.md): what shipped, the PR links, and the decisions made along the way. +- [`post-sprint-audit.md`](https://github.com/spacedock-dev/spacedock/blob/next/docs/roadmap/0198-pre-flip-hardening/post-sprint-audit.md): the final-state confirmation and the detached-audit record. diff --git a/docs/site/concepts/workflows-and-entities.md b/docs/site/concepts/workflows-and-entities.md index 01401bbe..f8675364 100644 --- a/docs/site/concepts/workflows-and-entities.md +++ b/docs/site/concepts/workflows-and-entities.md @@ -1,3 +1,95 @@ # Workflows & entities -[TODO] — what a workflow is (a directory plus a README), what an entity/work item is, entity frontmatter, and the state checkout. See the [Frontmatter contract](../reference/frontmatter-contract.md) for the field reference. +**A workflow is a directory plus a README, and an entity is one markdown file inside it.** The README defines the stages, the schema, and the gates; each entity is a work item that moves through those stages. Everything about a work item lives in the file itself: the problem, the design notes, the bar for done, the stage reports. State survives a session, so the next one picks up where you left off. + +This page covers what those two things are, the frontmatter on an entity, and where the entity files actually live at runtime. + +## The workflow: a directory and its README + +The README is the single source of truth. Its frontmatter declares the stages, the entity type, and the ID style; its prose body defines what each stage means, what counts as good, and what a worker must produce. 
You commission a workflow with `/spacedock:commission`, which generates the directory, the README, and a few seed entities for you. + +A minimal README frontmatter looks like this: + +```yaml +--- +commissioned-by: spacedock@0.20.0 +entity-type: task +entity-label: task +entity-label-plural: tasks +id-style: sd-b32 +stages: + defaults: + worktree: false + concurrency: 3 + states: + - name: backlog + initial: true + gate: true + - name: implementation + worktree: true + - name: validation + fresh: true + feedback-to: implementation + gate: true + - name: done + terminal: true +--- +``` + +Each `states` entry is one stage. The per-stage flags decide behavior: + +- **`initial: true`** marks where a new entity starts; **`terminal: true`** marks where it ends. +- **`gate: true`** makes the stage end at a gate: the first officer pauses and presents a decision instead of advancing on its own. +- **`worktree: true`** runs the stage in its own git worktree, so stages that touch shared state stay isolated. Lighter stages run inline. +- **`fresh: true`** dispatches the stage with no access to the prior worker's reasoning. Use it for review stages so the maker doesn't judge its own work. +- **`feedback-to: `** names where rejected work bounces back to for revision. + +The README body documents each stage with `Inputs`, `Outputs`, `Good`, and `Bad`. That prose is the living spec; it is what every dispatched ensign works to. Tighten it to your actual bar before the first dispatch; editing it after agents have run against vague prose costs more. + +## The entity: one work item + +**An entity is a single work item, stored as a markdown file with YAML frontmatter.** Each entity lives as either a flat file `{slug}.md` or a folder `{slug}/index.md`. Use the folder form when reports or artifacts accumulate beside the work item. Slugs are lowercase with hyphens, no spaces (`add-login.md` or `add-login/index.md`). `spacedock status` reads both forms. 
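
The two on-disk forms can be sketched as a small Go lookup. This is a hedged illustration, not the `spacedock status` code: `validSlug` and `candidatePaths` are hypothetical helpers, and the exact slug pattern (in particular, allowing digits) is an assumption read from "lowercase with hyphens, no spaces".

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// slugPattern is an assumed reading of "lowercase with hyphens, no spaces".
// Whether digits are allowed is this sketch's guess, not a documented rule.
var slugPattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

// validSlug reports whether slug fits the assumed pattern.
func validSlug(slug string) bool {
	return slugPattern.MatchString(slug)
}

// candidatePaths returns the two forms an entity may take under dir:
// the flat file first, then the folder form.
func candidatePaths(dir, slug string) []string {
	return []string{
		filepath.Join(dir, slug+".md"),
		filepath.Join(dir, slug, "index.md"),
	}
}

func main() {
	fmt.Println(validSlug("add-login"), candidatePaths("docs/dev", "add-login"))
}
```

A reader that supports both forms would check each candidate in turn. The order used here (flat file first) is arbitrary.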
+ +The body holds the human-readable record: a description, a problem statement, the proposed approach, acceptance criteria, and the stage reports filed as the entity advances. + +## Entity frontmatter + +The frontmatter is the machine-readable state. The full schema lives in the workflow's own README; the [frontmatter contract](../reference/frontmatter-contract.md) is the field reference across workflows. The fields you set and read most often: + +| Field | What it holds | +|-------|---------------| +| `id` | The unique identifier; format set by `id-style` in the README. | +| `title` | Human-readable name. The filename slug is derived from it. | +| `status` | The current stage, one of the stage names declared in the README. | +| `source` | Where the entity came from (e.g. `commission seed`, `linear`). | +| `started` / `completed` | ISO 8601 timestamps for when work began and when the entity reached terminal. | +| `verdict` | `PASSED` or `REJECTED`, set at the final stage. | +| `score` | Optional priority, 0.0–1.0. | +| `worktree` | The worktree path while a dispatched agent is active; empty otherwise. | +| `issue` | Optional external ticket reference, e.g. `ENG-123`, `kata:task-abc123`, or `owner/repo#42`. | + +`status` is the field that drives everything: the first officer reads it to decide which entities are ready to advance. 
To see the queue, run the status viewer against the workflow directory: + +```bash +spacedock status --workflow-dir docs/dev +``` + +To list only the entities ready for dispatch, run the query the first officer runs each loop: + +```bash +spacedock status --workflow-dir docs/dev --next +``` + +## Where entities live: the state checkout + +**A workflow can keep its mutable entity state separate from the README using a state checkout.** The README frontmatter declares it with one field: + +```yaml +state: .spacedock-state +``` + +With this set, the README (the living spec) stays on your code branch, while the entity files, their reports, and the archive live under `.spacedock-state`. Routine stage transitions then never churn the code branch or collide with a feature PR. The path is resolved relative to the README's directory; `spacedock status` reads stages from the README and entities from the state checkout, and writes frontmatter updates and archive moves into the state checkout. + +The state checkout is a linked git worktree on an orphan branch in the same repo, so state commits land on that branch and the code branch never sees them. On a fresh clone the state worktree is absent. Run `spacedock state init` to fetch the orphan branch and re-add the linked worktree before working the workflow. + +Omit `state:` (or set `state: $inline`) for a standalone workflow that isn't embedded in a code repo you ship from. Then the entities live beside the README in the same directory, with no orphan branch and no extra checkout. 
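
The resolution rule for `state:` can be sketched in a few lines of Go. This is a minimal sketch under stated assumptions: `entityRoot` is a hypothetical helper, not a function from the Spacedock binary, and it does not handle absolute paths, which the text above does not cover.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// entityRoot returns the directory that holds the entity files.
// readmeDir is the workflow README's directory; state is the README's
// `state:` value, or "" when the field is omitted.
func entityRoot(readmeDir, state string) string {
	// Omitted or `$inline`: entities live beside the README.
	if state == "" || state == "$inline" {
		return readmeDir
	}
	// Otherwise the value resolves relative to the README's directory.
	return filepath.Join(readmeDir, state)
}

func main() {
	fmt.Println(entityRoot("docs/dev", ".spacedock-state"))
	fmt.Println(entityRoot("docs/dev", "$inline"))
}
```

Stage definitions still come from the README in both cases. Only the location of the entity files changes.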
diff --git a/docs/site/contributing/adding-a-runtime.md b/docs/site/contributing/adding-a-runtime.md index 58211580..27cd5ba3 100644 --- a/docs/site/contributing/adding-a-runtime.md +++ b/docs/site/contributing/adding-a-runtime.md @@ -1,3 +1,58 @@ # Adding a runtime -[TODO] — what "supported" means, adding support in layers (skill adapters → dispatch mode → registries → launch/install UX → live runner), and the "manifesting from void" operating prompt. The full guide is [Multi-host support](../reference/multi-host.md). +A host is supported when a live or fixture-backed run launches it as a first officer, dispatches an ensign through that host's native agent mechanism, and verifies durable workflow state: process exit, entity body, state-checkout git log, and clean status. A host is not supported because its instructions mention Spacedock, and a substring search over code or prose is not proof of behavior. Spacedock ships `spacedock claude` and `spacedock codex` as proven front doors; adding a host means earning the same level of proof. + +This page is the contributor's orientation. The full procedure, the exact checklists, and the worked Pi example live in [Multi-host support](../reference/multi-host.md). Read that before you write code. + +## What "supported" means + +The supported claim has four observable parts. The runtime must be able to: + +- **Launch** the host as the first officer. +- **Delegate** to an ensign through the host's own subagent or team mechanism. +- **Observe** that the ensign completed. +- **Verify** durable state: process exit, entity body, state-checkout git log, and clean status. + +If you cannot demonstrate all four against a real or fixture-backed run, the host is a spike, not a runtime. + +## Add support in layers + +Add support in small layers, each with its own proof at its own abstraction level. The order matters: every later layer assumes the earlier one is proven. + +1. 
**Skill adapters.** Add `skills/first-officer/references/-first-officer-runtime.md` and `skills/ensign/references/-ensign-runtime.md`, and wire both from the matching `SKILL.md` runtime-adapter section. The adapter must name the host's native mechanism. Do not emulate Claude `Agent`, `SendMessage`, `TeamCreate`, or `TeamDelete` unless the host actually provides those tools. +2. **Dispatch host mode.** Teach `spacedock dispatch build` to accept `host: ""` when the assignment shape differs by host. Keep entity and worktree paths explicit, especially for split-root workflows (`state: .spacedock-state`). Test the positive shape and the banned-tool negative case. +3. **Runtime contracts and registries.** For long-lived workers, define the minimum worker record (label, substrate, run/session handle, entity, stage, state, completion epoch) and reject stale completion evidence: a previous completion must never satisfy a later assignment. For a host team API, adapt Spacedock lifecycle intents to the host's native action schema. +4. **Launch/install UX.** Add `spacedock ` only after a manual or live harness proves the runtime path. Add `spacedock install --host ` only when the install path is known and checkable without mutating unrelated global host state. Add `spacedock doctor --host ` when there is a manifest, package, or runtime health check to verify. +5. **Live runner.** Prove the host with a live-gated test when the claim is runtime integration. Use a temp workflow fixture, isolated host config and session directories, and copied credentials rather than global host state. Assert process exit, entity content, git log, and clean status. Never pass on transcript phrasing. + +## Match the proof to the claim + +Use the smallest proof at the same abstraction level as the claim: + +- **Text claim** (an adapter mentions the right tool): parse or inspect the real instruction files. 
+- **Dispatch shape claim:** run `spacedock dispatch build` with a fixture and inspect the emitted JSON or body. +- **Adapter claim:** table-test lifecycle intents to exact host-native payloads. +- **Registry claim:** unit-test persistence and stale-epoch rejection. +- **Runtime claim:** a live-gated host run that mutates a temp workflow and verifies durable state. + +The failure mode to guard against: declaring a host supported because its prose looks right. Substring presence is acceptable proof only when the claim itself is about text being present or absent. + +## Manifesting from void + +When a runtime looks unsupported on first contact, do not read setup friction as proof the product path is impossible. A missing `auth.json`, an extension not auto-discovered in a temp home, or a subagent tool schema that differs from Claude's is harness work, not a blocker. A real blocker is a proven inability to launch, delegate, observe completion, or verify durable state *after* the harness is correct. + +Run the first implementation and validation loop under a deliberate "assume it works" prompt so the loop irons out auth, package paths, and tool-shape mismatches before anyone declares a wall: + +```text +Assume support is supposed to work. Do not treat missing polish, auth setup friction, or tool-shape mismatch as proof the runtime is impossible. In first-officer capacity, iron out the frictions: + +- if auth is missing in an isolated harness, copy/reuse the existing host auth file correctly; +- if the dispatch substrate needs a local package/extension path, wire it explicitly; +- if the host tool shape differs from Claude/Codex, adapt to the host-native contract rather than emulating Claude tools; +- if a live test fails due to harness setup, fix the harness and rerun; +- only stop for a real product/design blocker, not for first-contact setup friction. 
+``` + +The prompt earns its place by changing the default interpretation of a failure: harness work gets fixed in-loop, and only a proven product or design blocker stops the work. + +The worked Pi runtime is in [Multi-host support](../reference/multi-host.md): the live smoke mechanism, the exact parent prompt, the install and doctor surface, and the full acceptance checklist. diff --git a/docs/site/contributing/architecture-notes.md b/docs/site/contributing/architecture-notes.md index 9a5bf3eb..d7c55ff0 100644 --- a/docs/site/contributing/architecture-notes.md +++ b/docs/site/contributing/architecture-notes.md @@ -1,3 +1,44 @@ # Architecture notes -[TODO] — the project shape (`cmd/`, `internal/cli`, `internal/status`, `docs/specs`, `skills/`), the design contracts under `docs/specs/`, and the runtime live CI model. See [Proof policy](proof-policy.md). +Spacedock is a Go binary plus a set of harness skills, and the two halves divide cleanly: the binary owns state and command behavior, the skills own orchestration prose. This page maps the project shape, the design contracts under `docs/specs/`, and the runtime live CI model that proves the orchestration actually behaves. For the proof discipline these all serve, see [Proof policy](proof-policy.md). + +## Project shape + +The binary is small Go packages with narrow boundaries; the orchestration lives in markdown skills the host loads. + +- **`cmd/spacedock/`** holds the process entry point only (`main.go`). A second entry point, `cmd/spacedock-release/`, drives release cutting. +- **`internal/cli/`** owns command routing, usage text, and exit-code behavior. The front-door verbs (`spacedock claude`, `spacedock codex`, `spacedock pi`, `spacedock doctor`) launch a host with the Spacedock plugin loaded; `init`, `new`, and `status` are the workflow-facing verbs. Path resolution and launch shape live here, not in the skills. 
+- **`internal/status/`** is the `status` implementation: frontmatter parse, stage enumeration, the read queries (`--next` dispatch, `--resolve`, `--short-id`, `--validate`), mutation (`--set`, `--archive`), and the guards that refuse an unsafe mutation. Output is held stable by golden fixtures under `internal/status/testdata`. +- **`docs/specs/`** holds the design contracts (see below). +- **`skills/`** holds the host-loaded orchestration skills: `commission/`, `survey/`, `debrief/`, `refit/`, `first-officer/`, `ensign/`, `present-gate/`, and `feedback-rejection-flow/`. Each is a `SKILL.md` (some with `references/` and `bin/`). Skill instructions call `spacedock status`, never a plugin-private script path. The binary owns path resolution and mutation guards, and the skills stay declarative. + +Other `internal/` packages support these: `internal/contract` and `internal/contractlint` (the shipped contract and its structural lints), `internal/ensigncycle` (the runtime live scenario surface), `internal/dispatch`, `internal/safehouse` (the `.safehouse` sandbox profile), and `internal/release`. + +The division is deliberate: a behavior that can be guarded by the binary or a failing test belongs in the binary, not in a sentence in a skill file. + +## Design contracts under `docs/specs/` + +`docs/specs/` holds the contracts downstream code cites instead of re-deriving. Two are current. + +- **`state-behavior-extension.md`** defines the split-root storage profile. A development workflow keeps its README in the main repo and its mutable entities in a per-workflow `.spacedock-state` checkout, so shared issues advance without noisy state commits on the code branch. The README's `state: .spacedock-state` frontmatter field names the checkout, resolved relative to the README directory. 
The spec fixes the v0 layout (entities directly under `.spacedock-state`, no `entities/` directory; `_archive/` and `_debriefs/` siblings) and the mutation rules: reads compose the main README's stages with the checkout's entities, while `--set` and `--archive` write only inside the checkout. +- **`scenario-testing-principles.md`** sets out the semantic model for scenario testing. A *scenario* is a natural-language behavioral spec graded on **durable outcomes** (entity state before → after, archive state, on-disk artifacts, durable user-facing output), never transcript phrasing. An *executor* is a pluggable implementation of that check: a **codified** executor (a deterministic Go fixture/unit test, proving the modeled consumer) or an **LLM** executor (a real Claude/Codex run, proving the real producer). The two check the same scenario at different fidelity, which dissolves the recurring failure mode where an offline proof passes while the live run fails. The four seed scenario IDs declared in this spec must equal the `sharedRuntimeScenarios()` table in `internal/ensigncycle`; a lock test reds on drift in either direction. + +## Runtime live CI model + +The live lanes prove runtime behavior by launching a real headless host, observing its output, and checking the resulting workflow state. A static grep over workflow YAML or skill prose is not a substitute. This is the LLM-executor side of the scenario contract above. + +One host-neutral scenario table drives every supported host. The scenario surface lives in `internal/ensigncycle`: a host-neutral `sharedRuntimeScenarios()` table carries only runtime-neutral facts (scenario ID, old Python provenance, behavior intent) and encodes no launch, auth, plugin, or timeout field. Liveness is the runners' per-stage no-progress quiet budget (the shared streamWatcher's `quietBudgetDefault`, 60s), and a per-scenario basket timeout is banned. 
A per-host runner adapter (Claude and Codex today, with Pi tracked through a live/codified/gap coverage map) turns each scenario into a real launch. A parity meta-test (`TestSharedScenarioRunnerCoverage`) fails if a scenario has a runner for one host but not the other, and `TestSharedRuntimeScenarioDefinitions` reflects over the scenario type, pins the exact field set, and fails if any field names a single host. + +CI runs these in `.github/workflows/runtime-live-e2e.yml`. The offline gate job (`go test ./...`, no secrets) must pass before either live lane spends its environment approval: + +- **`claude-live`** (matrix `sonnet` and `claude-opus-4-8`): secret `ANTHROPIC_API_KEY`. Runs the full-cycle smoke and the shared suite, loading the current checkout via `spacedock claude --plugin-dir "$GITHUB_WORKSPACE"`. +- **`codex-live`**: secret `OPENAI_API_KEY`. Builds a local marketplace under `$RUNNER_TEMP` and fails if the listing names a remote `github.com`/`ref next` install instead of the local path. +- **`pi-live`**: installs `pi-coding-agent` and runs the Pi coverage guard plus the front-door smoke. + +Every live lane tests the current checkout, never a remote `--ref next` install. For the local invocation commands and the full layer-by-layer breakdown of the scenario surface, see [the development workflow](development-workflow.md). + +## See also + +- [Proof policy](proof-policy.md): why behavior is proven by exercising it, the instruction-file-read quarantine, and the detached adversarial audit. +- [The development workflow](development-workflow.md): the authoritative stage-by-stage rules, the entity field reference, and the live-suite commands. +- [Agent development](agent-development.md): the first-officer/ensign write-scope rules and the durable-state evidence discipline. 
diff --git a/docs/site/contributing/proof-policy.md b/docs/site/contributing/proof-policy.md index 075a2bbc..9dddb725 100644 --- a/docs/site/contributing/proof-policy.md +++ b/docs/site/contributing/proof-policy.md @@ -1,3 +1,51 @@ # Proof policy -[TODO] — behavior is proven by exercising it (not prose-grep), acceptance criteria as end-state properties with an independent source of truth, and the detached adversarial audit. The authoritative statement lives in [The development workflow](development-workflow.md). +In the Spacedock development workflow, behavior is proven by exercising it and observing a durable result, not by matching text in the files the model reads. This page summarizes the proof discipline a contributor must satisfy. The authoritative statement, stage by stage, lives in [the development workflow](development-workflow.md); when this page and that one disagree, that one wins. + +## Prove behavior by exercising it + +**A behavioral acceptance criterion is proven only by running the behavior and observing the outcome:** output bytes, exit code, resulting on-disk state, or a test that feeds many inputs and asserts uniform handling. In practice that means Go unit tests for parser and command behavior, golden fixtures for `status` output, behavior fixtures that drive the binary for command-level claims, and live workflow smoke tests when runtime behavior is the claim. + +A string, substring, or regex match over an instruction file the model reads (the first officer or ensign contract, the workflow README, a skill) never satisfies a behavioral acceptance criterion. The matched string was written by the same implementer the check is meant to police, so a passing check only asserts "the file contains the text we put in the file." It has no independent source of truth: a valid paraphrase fails it, and an inverted clause passes it ("It is a myth that the FO MUST advance…" keeps every matched substring while saying the opposite). 
Searching code is the same trap one level down. It asserts spelling, and false-passes on a renamed-but-equivalent branch. + +So "the contract says to run the command" never proves the agent runs it. "The skill renders the gate" is proven by invoking the skill and observing the rendered gate, never by finding the clause that asks for it. + +The one test that settles it: **does the expected value come from somewhere other than the file under test?** If no (the clause is its own expectation), the check is a tautology and is banned as proof. If yes (the file is bound to an independent source that can diverge from it), it may be a legitimate invariant. The legitimate case parses real artifacts in code and tests a relationship between independent values; for example, that the plugin manifest's contract range brackets the binary's contract version. The binary parses that manifest, so it is already outside "files the model reads," and the two versions can disagree. That divergence is exactly what lets the check fail. + +## Acceptance criteria are end-state properties + +An acceptance criterion names a property of the finished entity, not a stage action. If a draft AC reads as an imperative verb phrase, rewrite it as the end-state property it produces; the imperative belongs in the stage report's checklist instead. + +Each AC carries a `Verified by:` clause, and that clause must name something **outside the entity body** that a future reader can reproduce and that can fail: + +- a test name, +- a command's output or exit code, +- a file the change produces, or +- the resulting on-disk state. + +An AC whose only proof is review of the entity's own prose ("verified by reviewing this task's decision section") can never fail, so it is not an acceptance criterion. Every task must produce a real, checkable change: code, a fixture, on-disk state, or instruction text whose effect a separate check confirms. 
If a task's only output is a decision with nothing shipped, it does not belong in the dev queue; record the decision in the roadmap instead. + +Prefer an AC a code gate can enforce (a guard in the binary, a test that fails on violation) over one the agent is merely instructed to follow. Where a behavior can be guarded by the binary or a failing test, the proof is that gate, not a sentence in a skill file. + +When a task has both a text half and a behavioral half (extracting contract prose into a skill, say, or adding a contract clause), the text half is real authoring work but is not an acceptance criterion on its own, and the text check never stands in for the behavioral one. That invoking the skill renders the gate, or that the FO obeys the clause, is proven only by a live drive that runs it and observes the durable result. + +## The instruction-file-read quarantine + +**Tests do not read prompt or instruction files except in `internal/contractlint`,** and that package is limited to structural checks a machine can verify without interpreting prose: reference closure, frontmatter validity, structural absence, deduplication, line/count floors, and portability markers. The package's `doc_test.go` states the policy; `boundary_guard_test.go` and `instruction_read_detector_test.go` enforce it. The boundary guard fails on an instruction-file read outside the quarantine. + +Prose-grep is banned: a test asserting that a skill, contract, agent file, or the workflow README contains its own wording proves only that the wording is present. Code-bound prose checks are banned too: a prose-to-code consistency lint is not a behavior test and must never substitute for running the behavior. If a deleted read exposed an untested behavior that still matters, record the owed behavior test rather than keeping the read. + +## The detached adversarial audit + +For high-stakes surfaces, a passing validation is necessary but not sufficient. 
Before merging such a change, run or dispatch a read-only adversarial audit on a detached, throwaway checkout of the merge result. + +- **When it triggers.** The four high-stakes surfaces: the front-door launcher (`spacedock claude` / `codex` / `doctor`), the `status` mutation and guard paths, the shipped contract and scaffolding, and the CI/release machinery. Routine, low-blast-radius changes do not need it; a normal validation suffices. +- **What it produces.** The auditor works on a separate checkout (never the implementation worktree) and never mutates the deliverable. It tries to **refute** the validation: construct an adversarial edit that the deliverable's own tests should catch, and confirm they do. A test that stays green under an edit that breaks the claim is a hole. Findings come in two tiers: `Material:` for a real correctness or test-strength hole, `Polish:` for non-blocking notes. "Refuted nothing material" is itself a valid, recorded outcome. +- **How it is recorded.** Material findings route back through the normal validation → implementation feedback flow as a `### Feedback Cycles` entry naming the audit and its adversarial edit; the gate is not presented as clean until they are closed. A clean audit is noted in the gate's reviewer-findings block. + +The audit catches the class of hole where the test passes but would also pass on a broken future edit, the kind of thing validation cannot see because it trusts its own green suite. Real catches are on the record in the development workflow, including a `strings.Count(...) > 0` check that skipped on zero mentions and a bare `strings.Contains` satisfied by a negated disclaimer. + +## See also + +- [The development workflow](development-workflow.md): the authoritative stage-by-stage proof rules and the entity field reference. +- [Agent development](agent-development.md): the write-scope rules and the "prove runtime claims with durable state" discipline. 
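The negated-disclaimer catch described under the detached adversarial audit can be reproduced in a few lines of shell. This is a minimal sketch under stated assumptions: both clause texts and the helper name `prose_grep_passes` are hypothetical, invented for illustration, and are not quoted from any real Spacedock contract or test.

```bash
#!/bin/sh
# Hypothetical clause texts, made up for this sketch.
original='The FO MUST advance the entity after approval.'
inverted='It is a myth that the FO MUST advance the entity after approval.'

# The banned style of check: succeed when the text contains the expected substring.
prose_grep_passes() {
  printf '%s\n' "$1" | grep -qF 'FO MUST advance'
}

# The check passes on the original clause.
if prose_grep_passes "$original"; then
  echo "original clause: check passes"
fi

# It also passes on a clause that says the opposite, so it proves nothing about behavior.
if prose_grep_passes "$inverted"; then
  echo "inverted clause: check still passes"
fi
```

Because the expected substring comes from the same text the check reads, the check cannot tell the two clauses apart; only running the behavior and observing the durable result can.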
diff --git a/docs/site/get-started/first-launch.md b/docs/site/get-started/first-launch.md index 2dc5c381..5659cafe 100644 --- a/docs/site/get-started/first-launch.md +++ b/docs/site/get-started/first-launch.md @@ -1,3 +1,42 @@ # Your first launch -[TODO] — launching the first officer with `spacedock claude "/spacedock:survey"`, the front-door command grammar, and what survey reports. +One command starts your first session and orients you in a project you already have: + +```bash +spacedock claude "/spacedock:survey" +``` + +This launches the first officer (the orchestrator agent that runs a Spacedock workflow) inside Claude Code, and hands it the survey task. The first launch sets up the plugin for you, so this single line is enough; you do not need a separate setup step. When a `.safehouse` profile is present in the working directory, the launch runs sandboxed automatically. + +Run it from inside a project that already has some agent history, such as a repo you have been coding in with Claude Code. Survey reads that history; an empty directory has nothing to report on. + +## The command grammar + +The front door is one shape, and every launch uses it: + +```bash +spacedock claude "task" [--safehouse…] [-- host-flags…] +``` + +- **The task comes first.** It is handed to the first officer as the launch prompt. Here the task is `/spacedock:survey`, a skill the first officer runs. It could just as well be a plain sentence describing work. +- **`--safehouse` forces the launch through the sandbox.** A `.safehouse` profile in the working directory does the same automatically, so you only pass the flag when you want to force it. +- **Anything after `--` forwards verbatim to the host** (`claude` itself), including flags like `--resume`, `--model`, and `--plugin-dir`. + +`spacedock codex "task"` and `spacedock pi "task"` take the same shape for the Codex and Pi harnesses. Claude Code is the primary surface; the examples here use it. 
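As a rough illustration of the grammar above, the shell sketch below sorts a launch argv into the three parts the bullets describe: the task, the spacedock flags, and the host passthrough after `--`. It is a hypothetical helper, not the launcher's actual parser. The function name `split_launch_args` and its variables are invented here, it ignores spacedock flags that take a value, and `sonnet` stands in for whatever model name the host accepts.

```bash
#!/bin/sh
# Hypothetical helper (not part of spacedock): classify each launch argument.
split_launch_args() {
  task=""
  spacedock_flags=""
  host_flags=""
  seen_sep=0
  for arg in "$@"; do
    if [ "$seen_sep" -eq 1 ]; then
      # Everything after the first `--` is host passthrough, kept in order.
      host_flags="${host_flags:+$host_flags }$arg"
    elif [ "$arg" = "--" ]; then
      seen_sep=1
    else
      case "$arg" in
        # Before `--`, a double-dash word is treated as a spacedock flag (simplification).
        --*) spacedock_flags="${spacedock_flags:+$spacedock_flags }$arg" ;;
        # Anything else is a task word; words are joined with single spaces.
        *)   task="${task:+$task }$arg" ;;
      esac
    fi
  done
}

split_launch_args "/spacedock:survey" --safehouse -- --model sonnet
printf 'task:            %s\n' "$task"
printf 'spacedock flags: %s\n' "$spacedock_flags"
printf 'host flags:      %s\n' "$host_flags"
```

The point of the sketch is the ordering rule: the first `--` is the only boundary that matters, so a task written after it would be treated as host passthrough rather than as the launch prompt.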
+ +## What survey reports + +Survey reconstructs what the agents in this project have implicitly been doing, then reports it back. The survey is read-only; it never edits your files. It reads your recorded agent session history (through the `agentsview` tool, which it offers to install if it is missing), and it shows you, in one pass: + +- **The inferred workflow**: the loop you have been running without naming it, written out as a stage chain. +- **The workstreams**, the distinct tracks of work, each labeled as mechanical (a routine loop) or exploration (human-driven creative work). +- **The decisions still open and waiting on you.** These are the forks that were raised but never resolved. Survey leads with them, because they are the work that is actually blocked on you. +- **How often you had to step in.** Survey counts the interruptions and decision points across those sessions, so you can see where your attention has been going. + +A typical report opens with a one-line headline naming the project, the number of sessions, the date range, and the decision and interruption counts, then lays out each section below it. If the project has no agent history, survey says so plainly and stops; there is nothing to misreport. + +## What comes next + +Survey ends with an offer, not an action. After the report, it asks whether you want it to commission a real Spacedock workflow built from what it found, turning the open decisions into approval gates and the workstreams into work items. You can say yes and let it scaffold the workflow, or say no and keep the survey as a standalone orientation. Either way, nothing has changed in your project until you choose. + +To go straight to defining a workflow yourself instead, see [your first workflow](first-workflow.md). If `spacedock` is not yet installed, start with [installing Spacedock](install.md). 
diff --git a/docs/site/get-started/first-workflow.md b/docs/site/get-started/first-workflow.md index 9253e286..182022b6 100644 --- a/docs/site/get-started/first-workflow.md +++ b/docs/site/get-started/first-workflow.md @@ -1,3 +1,122 @@ # Your first workflow -[TODO] — commissioning your first workflow with `/spacedock:commission`, the design and review gates, and what happens after. +A workflow is a directory of plain-text work items plus a README that defines +the stages they move through, the schema each item carries, and the gates where +you make a call. You create one by describing what you want, in plain language, +to the `/spacedock:commission` skill. This page walks that first commission end +to end: the questions it asks, the design and review gates it sets up, and what +happens once the workflow starts running. + +A few terms used below, defined on first use: + +- An **entity** is one work item: a single markdown file (the README also calls + it a "work item"). A bug report, a design idea, a feature: whatever the + workflow processes, each one is an entity. +- A **stage** is a bucket an entity sits in as it advances, for example + `ideation` or `implementation`. The first entity starts in the first stage and + moves toward a terminal one. +- A **gate** is a decision point at the end of a stage where the workflow pauses + for your call instead of advancing on its own. + +You are addressed as the captain, the workflow operator who makes the calls at +gates. The first officer is the orchestrator agent that runs the workflow; the +ensign is the worker agent that moves one entity through one stage. + +## Commission a workflow + +Run `/spacedock:commission` inside a Spacedock session and describe the work in +the same line: + +```bash +spacedock claude "/spacedock:commission Track design ideas through review stages" +``` + +If you have not launched a session yet, see +[Install Spacedock](install.md) first. 
You can also start bare +(`/spacedock:commission` with no description) and answer the questions from +scratch. + +The skill greets you and walks three phases: **design** (a few questions), +**generate** (it writes the files), and a **pilot run** (it starts the workflow +on your seed items). The design phase asks you three things, one question at a +time: + +1. **The mission and the entity.** What the workflow is for, and what each work + item represents: "a design idea", "a bug report", "a candidate feature". The + skill derives a short label from your answer (a "design idea" becomes an + `idea`). +2. **The stages.** It proposes an ordered list from your mission (for a design + workflow, something like `backlog → ideation → implementation → validation → + done`), and you confirm, add, remove, or rename them. +3. **Seed entities.** Two or three starting items to run through, each with a + title and a short description (and an optional score). These become the + workflow's first work. + +From your answers the skill then derives the gates: which stages pause for your +approval, and which earlier stage rejected work bounces back to. By default it +gates the stage before the terminal one. + +You do not have to get every answer right. After the questions, the skill +presents the full design as a summary (stages, gates, seed items, where the +files will live) and waits. **Nothing is generated until you accept.** Tell it +what to change and it re-presents. + +## What gets generated + +Once you accept, the skill writes the workflow into a new directory under +`docs/` and confirms each file it created: + +- `README.md`, the workflow's living spec. It holds the mission, the schema each + entity carries, and a section per stage describing its inputs, outputs, and + quality bar (`Good:` / `Bad:`). +- One file per seed entity, named from its title, with YAML frontmatter that + records its `status`, `score`, and other fields. 
+ +The per-stage prose in the README is a best-guess starting point, not a +commitment. The skill flags this directly and offers a `review stages` walk that +steps through each stage's expectations so you can tighten the quality bar before +any work runs. Tightening here is cheap; an agent dispatched against a vague bar +is not. + +## The design and review gates + +A gate is where the workflow stops and hands you a decision instead of advancing +on its own. This is the line Spacedock draws: work flows through the stages, but +**nothing crosses a gate without a recorded decision.** A development workflow +gates the design stage and the review stage among others, so you sign off on +the approach before code is written, and on the result before it ships. + +At each gate the first officer pauses and presents a stage report: the chosen +direction, the evidence behind it, and a single recommendation. You make one of +three calls: + +- **Approve**, and the entity advances to the next stage. +- **Redo with feedback**: it goes back for revision against the notes you give. +- **Reject**, and it bounces to an earlier stage (the one the design named as the + rejection target) to be reworked. + +You decide on the report and its evidence, not on the agent's transcript. The +decision is recorded with its reason, so a result can later be traced back to the +call that produced it. + +## What happens after + +When you accept the design, the commission skill launches a pilot run on your +seed entities. It takes on the first-officer role itself for this first run: +it reads the workflow README, checks which entities are ready to advance, and +dispatches ensigns to move them through their stages. Stages that modify the +repo run in their own git worktree; lighter stages run inline. + +The run proceeds until the workflow goes idle or reaches a gate. 
There the first +officer stops and reports what happened: which entities moved, which stages they +passed through, which gate is waiting on you. From that point you are running the +workflow: approve, send back, or reject, and work continues. + +To resume the workflow in a later session, launch the first officer with no task: + +```bash +spacedock claude +``` + +It reads the saved workflow state, picks up where you left off, and dispatches +ensigns for any entity ready for its next stage. diff --git a/docs/site/index.md b/docs/site/index.md index 8cd4c729..7ea8ad5c 100644 --- a/docs/site/index.md +++ b/docs/site/index.md @@ -1,9 +1,40 @@ # Spacedock -**Spacedock is a multi-agent orchestrator where nothing ships without a decision.** It lives within your existing harness — Claude Code, Codex, or Pi. +**Spacedock is a multi-agent orchestrator where nothing ships without a decision.** It lives within your existing harness: Claude Code, Codex, or Pi. -[TODO] — the Home page: the pitch, what's different, and where to go next (Get started, Concepts, Running workflows, Contributing). +Spacedock breaks work into stages and surfaces the decisions each stage needs, batched for you. Each decision arrives with evidence measured against a predefined bar for what good looks like. You approve, send back, or escalate. You can also delegate the call to an agent. Either way, the decision is recorded with its evidence and reason. + +A few terms you'll meet throughout these docs: + +- A **workflow** is a directory of plain-text work item files plus a README that defines the stages, the schema, and the gates. +- An **entity** is one work item, a markdown file (or folder) that carries everything about the work: the problem, the design notes, the bar for done, and the stage reports. +- A **stage** is one step in the lifecycle; a **gate** is the decision point at its end. + +Three roles run a workflow: + +| Role | Who | +|------|-----| +| **Captain** | You. 
You define the mission and make the calls at gates unless you delegate them. | +| **First Officer** | The orchestrator agent that runs the workflow and reports to you at gates. | +| **Ensign** | The worker agent that moves one entity through one stage. | + +## What's different + +- **The agent doesn't get to judge its own work.** Review runs as a separate stage with fresh context, no access to the maker's reasoning. It pushes back on thin evidence and work that looks busy without proving its claim. +- **Every decision leaves a trail.** Each gate carries a stage report: findings, verdicts, artifacts, anomalies. You decide on evidence, not the transcript, and the record outlives the reviewer. +- **The bar sharpens as you use it.** Each stage declares what good means and the agent works to that line. When a standard turns out fuzzy in practice, the agent proposes an edit to the written criteria for your approval. +- **Batch the work; decide as it flows back.** Queue many entities at once. Agents advance each through its stages, and you handle gates as they surface, not one session at a time. +- **Work survives the context limit.** When an agent runs out of context, a successor carries forward what's in flight. + +## Where to go next + +- **[Get started](get-started/install.md)**: install the `spacedock` launcher and the host plugin, then make your [first launch](get-started/first-launch.md) and build your [first workflow](get-started/first-workflow.md). +- **[Concepts](concepts/operating-model.md)** covers the operating model, workflows and entities, the stage lifecycle, gates and decisions, and a worked example. +- **[Running workflows](running-workflows/commission.md)** walks through commissioning a workflow, surveying an existing project, operating a running workflow, and debriefing and refitting between sessions. +- **[Contributing](contributing/development-workflow.md)** covers the development workflow, agent development, the proof policy, and releasing. 
+ +New here? Start with [Install](get-started/install.md). It walks a fresh install end to end and names the output to expect at each step. ## For agents using Spacedock -Spacedock's docs are read by agents too — a user's first officer parsing these docs is itself an agent. The build emits a curated `llms.txt` index of the docs at the site root for product-using agents. (Repo-development guidance for an agent working ON Spacedock lives under Contributing → [Agent development](contributing/agent-development.md).) +Spacedock's docs are read by agents too. A user's first officer parsing these docs is itself an agent. The build emits a curated `llms.txt` index of the docs at the site root for product-using agents. (Repo-development guidance for an agent working ON Spacedock lives under Contributing → [Agent development](contributing/agent-development.md).) diff --git a/docs/site/reference/command-reference.md b/docs/site/reference/command-reference.md index 765f46aa..29d24470 100644 --- a/docs/site/reference/command-reference.md +++ b/docs/site/reference/command-reference.md @@ -1,3 +1,182 @@ # Command reference -[TODO] — the `spacedock` command surface grouped into Launch (`claude`/`codex`/`pi`), Setup (`install`/`doctor`), and Workflow (`status`/`new`/`state`/`completion`/`dispatch`), plus `--version` and the `status` query forms. +The `spacedock` binary has ten subcommands in three groups (Launch, Setup, and +Workflow), plus a top-level `--version`. Run `spacedock` with no arguments to +print the grouped help; run `spacedock <command> --help` for a command's own +flags. An unknown command or a stray leading flag exits 2 with a diagnostic on +stderr, and an unknown command resolution under cobra also exits 2. The verbs are +registered in `internal/cli/cli.go`. + +## --version + +```bash +spacedock --version +``` + +Prints the binary version and the contract level, e.g. `spacedock 0.20.0 (contract 1)`.
+The `(contract N)` token is load-bearing: the first-officer and ensign skills +read it to check the launcher and the installed plugin agree. The version string +defaults to the `dev` sentinel and is overwritten by the release pipeline's +linker stamp, so an unstamped `go build` reads as a dev build rather than +impersonating a release. + +## Launch: claude, codex, pi + +`spacedock claude`, `spacedock codex`, and `spacedock pi` each start the named +host with the Spacedock first officer loaded. Claude Code is the primary surface; +Codex and Pi are supported but experimental. The grammar is the same for all +three: + +```bash +spacedock claude [task] [spacedock-flags] [-- host-flags] +``` + +- **The task comes first.** Positionals before `--` join with single spaces into + the launch prompt handed to the first officer. With no task, a fixed bootstrap + prompt starts the first officer rather than opening an idle agent. +- **Everything after `--` forwards verbatim to the host** (`claude` / `codex` / + `pi`): `--model`, `--resume`, `--plugin-dir`, and the like. A task placed + after `--` is host passthrough, not the launch prompt; the launcher warns on + stderr when it detects this without altering the assembled argv. +- **The launch is contract-gated.** When no plugin is installed, the launcher + auto-installs it then launches, so the single command yields a working session. + `--no-install` opts out and prints the manual install remedy. A contract + mismatch fails fast (exit 1), since auto-installing would not fix it. + +Spacedock-owned launch flags, declared in `internal/cli/frontdoor.go`: + +- `--safehouse`: force the safehouse sandbox wrap even without a `.safehouse` + profile in the working directory. A `.safehouse` profile triggers the wrap + automatically. +- `--safehouse-enable KEY[,KEY]`, `--safehouse-add-dirs DIR`, + `--safehouse-add-dirs-ro DIR`: repeatable sandbox knobs whose presence also + implies sandbox-on. 
+- `--plugin-dir DIR`: load a local plugin checkout, relaxing the contract gate. + Repeatable, and accepted both before `--` (parsed by spacedock) and after `--` + (forwarded verbatim). +- `--skip-contract-check`: bypass the contract gate and launch without resolving + the installed plugin. +- `--no-install`: refuse to auto-install a missing plugin and print instructions + instead. + +The contract gate is bypassed by `--skip-contract-check` or by any `--plugin-dir` +(the local checkout supersedes the installed plugin). `spacedock pi` loads Pi's +native skills and extension rather than a plugin manifest; its only spacedock +flag is `--plugin-dir`. + +## Setup: install, doctor + +`spacedock install` installs the per-host plugin, then runs the compatibility +check. `spacedock doctor` runs the check alone. + +```bash +spacedock install [--host claude|codex|pi] [--check] +spacedock doctor [--host claude|codex|pi] +``` + +- `--host` defaults to `claude`. For `codex`, install prints the `codex plugin` + commands to run from your shell rather than running them programmatically. +- `install --check` runs the compatibility report without installing. +- `doctor --plugin-manifest PATH` reads a manifest directly instead of resolving + the installed plugin. + +When `doctor` reports the installed plugin is out of date, refresh it with +`spacedock install --host claude`. See [Install Spacedock](../get-started/install.md) +for the full setup path. + +## Workflow: status, new, state, completion, dispatch + +These commands read and mutate workflow state. `status` and `dispatch` forward +their argv verbatim to their runners; the launcher neither parses nor reorders +their flags. 
+ +### status + +```bash +spacedock status [args] +``` + +`spacedock status` resolves the workflow it acts on in this precedence: an +explicit `--workflow-dir DIR` (or the `PIPELINE_DIR` environment variable) is used +verbatim; otherwise it walks up from the working directory to the enclosing +commissioned workflow. With neither, it exits 1 with +`no Spacedock workflow here — pass --workflow-dir or run inside a workflow`. The +exit domain is `{0 success, 1 error}`, so a usage error is exit 1, never 2. + +The query forms, parsed in `internal/status/native_runner.go`: + +- **No flag**: prints the active-entity status table. `--archived` includes + archived entities; `--quiet` and `--json` change the rendering. +- `--next` prints the items ready to dispatch (requires a stages block in the + README). +- `--next-id` computes the next entity id. It accepts `--id-seed` and `--id-actor` + (valid only with `--next-id` or `--new`). +- `--boot` prints the first-officer boot view (queue, stages, team state). It is + incompatible with `--next`, `--next-id`, `--archived`, `--where`, and + `--fields`/`--all-fields`. +- `--validate` validates the workflow. It prints `VALID` and exits 0, or prints the + errors and exits 1. +- `--resolve REF` resolves a reference to one entity and prints its resolve line. + With `--root ROOT` it resolves across every workflow under `ROOT`, accepting a + `workflow::ref` qualifier. +- `--short-id REF` prints an entity's short display id. +- `--set SLUG field=value...` mutates an entity's frontmatter. Bare `completed` + and `started` auto-fill a timestamp; every other field requires a value. + Terminal transitions are gated (mod-block, merge-hook, verdict, and the + require-external-proof guards); `--force` bypasses with a warning. +- `--archive SLUG` archives an entity. +- `--discover [--root ROOT]` prints the commissioned workflows under `ROOT` + (default: the git toplevel of the working directory). It is incompatible with + every other flag. 
+- `--where 'field = value'` filters the table. It supports `=`, `!=`, `field =` + (empty), and `field !=` (non-empty), and is repeatable. +- `--fields a,b,c` / `--all-fields` chooses the columns. The two are mutually + exclusive. + +Most flags refuse to combine; the runner names the conflict and exits 1 (e.g. +`--set cannot be combined with --next`). + +### new + +```bash +echo "entity body" | spacedock new [--folder] SLUG +``` + +A pure alias for `status --new`: it prefixes the argv with `--new` and forwards +it, reading the entity body from stdin and auto-discovering the workflow. +`--folder` creates the entity as a folder rather than a single file. + +### state + +```bash +spacedock state init|new [--workflow-dir DIR] +``` + +Manages a split-root workflow's state checkout. `init` resumes a cloned workflow +by fetching the orphan state branch and checking it out as a linked worktree at +the workflow's `state:` path; a present checkout is a no-op that refreshes from +origin. `new` births that branch and worktree around a present split-root README. +An inline workflow has nothing to init. A missing or unknown subcommand exits 2. + +### completion + +```bash +spacedock completion bash|zsh +``` + +Prints a static shell-completion script for bash or zsh (exit 0). It completes +the top-level verbs and the common `status` flags. A missing or unknown shell +exits 2. + +### dispatch + +```bash +spacedock dispatch build | show-stage-def +``` + +Builds the worker dispatch artifacts the first officer hands an ensign. `build` +assembles the assignment (requires `--workflow-dir` unless `--print-schema` or +validate-only); `show-stage-def` prints a stage's definition (`--workflow-dir` +and `--stage`). A missing or unknown subcommand exits 2. See +[Adding runtime support](multi-host.md) for how `dispatch build` learns a new +host mode. 
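The exit behavior described across this section can be collected into one small lookup, which may help when scripting around the launcher. This is a minimal sketch, not part of Spacedock: `describe_exit` is a hypothetical helper name, and it assumes exit 0 means success for every verb (the page states that only for `status` and `completion`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: describe an exit code for a spacedock verb, using only
# the exit behavior documented on this page.
# Assumption: exit 0 means success for every verb.
describe_exit() {
  local verb="$1" code="$2"
  case "$verb:$code" in
    *:0)                echo "success" ;;
    status:1)           echo "error (status reports usage errors as 1, never 2)" ;;
    state:2|dispatch:2) echo "missing or unknown subcommand" ;;
    completion:2)       echo "missing or unknown shell" ;;
    *)                  echo "not documented here" ;;
  esac
}

# Example use after a real call:
#   spacedock status --validate; describe_exit status "$?"
```

The fall-through branch is deliberate: any verb and code pair this page does not describe is reported as undocumented rather than guessed.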
diff --git a/docs/site/reference/frontmatter-contract.md b/docs/site/reference/frontmatter-contract.md index b416c3ee..8fe3f340 100644 --- a/docs/site/reference/frontmatter-contract.md +++ b/docs/site/reference/frontmatter-contract.md @@ -1,3 +1,79 @@ # Frontmatter contract -[TODO] — the entity frontmatter field reference (surfacing the development workflow's "Schema / Field Reference" table) plus the external-tracker `issue`/`source` fields. The always-current schema is in [The development workflow](../contributing/development-workflow.md); a standalone `docs/specs/frontmatter-contract.md` is a planned follow-up. +Every entity is a markdown file (or a folder with an `index.md`) whose YAML frontmatter carries the fields Spacedock reads to track and move it. The always-current schema lives in the development workflow's [Schema / Field Reference](../contributing/development-workflow.md#field-reference); this page surfaces that table and the external-tracker bridge fields in one place for reference. A standalone `docs/specs/frontmatter-contract.md` is a planned follow-up; until it lands, the development workflow README is the source of truth. + +The frontmatter parser is line-oriented. Keep fields flat and top-level. If richer metadata becomes necessary, add more flat custom fields rather than nested YAML, because the v1 parser preserves lines, not arbitrary structure. + +## Entity fields + +These are the fields the development workflow declares for a `task` entity. Other workflows may rename the entity type and adjust fields, but these are the contract a dev-workflow entity must satisfy. + +| Field | Type | Description | +|-------|------|-------------| +| `id` | string | Unique 24-character Spacedock Base32 ID, because this workflow uses `id-style: sd-b32`. | +| `title` | string | Human-readable entity name. | +| `status` | enum | One of: `backlog`, `ideation`, `implementation`, `validation`, `done`. The current stage. 
| +| `source` | string | Where the entity came from. Also used by the external-tracker bridge (see below). | +| `started` | ISO 8601 | When active work began. | +| `completed` | ISO 8601 | When the entity reached terminal status. | +| `verdict` | enum | `PASSED` or `REJECTED`. Set at the final stage. | +| `score` | number | Priority score, `0.0`–`1.0` (optional). A workflow can upgrade to a multi-dimension rubric in its README. | +| `worktree` | string | Worktree path while a dispatched agent is active; empty otherwise. | +| `issue` | string | Optional external ticket reference, such as `ENG-123`, `kata:task-abc123`, or `owner/repo#42`. | + +The `status` field is the execution state. `spacedock status` reads stage declarations from the workflow README and reports each entity's `status` against them; `--set status=` is the mutation that advances an entity. The status read path does not invent stages. If the README declares no stages block, membership cannot be validated. + +The `verdict` field is guarded on the finalize action, not on reaching a terminal stage. The guard keys on the finalize shape (a `--set` that writes `completed`, or an `--archive` of a terminal entity) and refuses it (exit 1, entity unmutated) when the post-state `verdict` is empty (see `internal/status/verdict_guard_test.go`). A bare dispatch into a terminal stage that does not write `completed` passes without a verdict, because the verdict is the outcome of work that has not happened yet. A finalize on an entity that already carries a verdict also passes, even when that `--set` does not re-name it. `--force` bypasses the guard. The failure mode the guard catches is finalizing or archiving a terminal entity with no verdict on record. 
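The verdict guard's decision can be modeled as a small truth function. The sketch below only restates the rules in the paragraph above; it is not the code in `internal/status`, and the function name and its `yes`/`no` arguments are illustrative assumptions.

```bash
#!/usr/bin/env bash
# verdict_guard FINALIZE POST_VERDICT FORCE   (hypothetical model, not the real runner)
#   FINALIZE     "yes" if the call writes `completed` or archives a terminal entity
#   POST_VERDICT the verdict the entity would carry after the call (may be empty)
#   FORCE        "yes" if --force was passed
# Returns 0 when the call would pass the guard, 1 when it would be refused.
verdict_guard() {
  local finalize="$1" post_verdict="$2" force="$3"
  if [ "$force" = "yes" ]; then
    return 0   # --force bypasses the guard
  fi
  if [ "$finalize" != "yes" ]; then
    return 0   # not a finalize shape, so the guard does not apply
  fi
  if [ -n "$post_verdict" ]; then
    return 0   # finalize with a verdict on record (already set or set in this call)
  fi
  return 1     # finalize with an empty post-state verdict is refused
}
```

Because the check is on the post-state verdict, a finalize that does not name `verdict` still passes when the entity already carries one, which matches the behavior described above.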
+ +## Copy-paste starter + +The development workflow's task template ships these fields blank for a new entity: + +```yaml +--- +id: +title: Task name here +status: backlog +source: +started: +completed: +verdict: +score: +worktree: +issue: +--- +``` + +Fill `title`, `status`, and `source` at creation. `started`, `completed`, `verdict`, and `worktree` are written by the runtime as the entity moves; do not edit them by hand while a dispatched agent is active. + +## External-tracker fields + +The `issue` and `source` fields are the v0 bridge to an external ledger such as kata, Linear, or GitHub Issues. They are flat top-level fields the current parser preserves; the bridge adds no tracker-specific stage rules. See [Tracking work in an external system](../advanced/external-tracker.md) for the full integration model. + +```yaml +issue: ENG-123 +source: linear +``` + +or: + +```yaml +issue: kata:task-abc123 +source: kata +``` + +The contract for these two fields: + +- **`issue` is the human-facing external reference.** It points at the ticket the entity mirrors; Spacedock does not parse its internals. +- **`source` records where the entity came from** when useful: the tracker name, or any origin marker. +- **Spacedock `status` remains the execution status.** The external tracker does not redefine Spacedock stage semantics inside the entity, and ownership stays one-way unless a future bridge explicitly declares bidirectional sync. + +## Validating an entity + +Check an entity against the contract with the status command's `--validate` flag: + +```bash +spacedock status --workflow-dir docs/dev --validate +``` + +It exits 0 when the workflow is valid and 1 when it is not, printing the errors to stderr; with `--json` it also emits a `{"command":"validate","valid":"true"}` (or `"false"`) envelope. 
`--validate` cannot be combined with the other status flags: `--next`, `--next-id`, `--boot`, `--where`, `--fields`/`--all-fields`, `--archived`, `--archive`, or `--set`; the command rejects the combination. Validation reads stages from the workflow README and entities from the state checkout, so it enforces the contract against the same schema the workflow declares, not an assumed one. diff --git a/docs/site/running-workflows/commission.md b/docs/site/running-workflows/commission.md index 9a7b44ed..461b3175 100644 --- a/docs/site/running-workflows/commission.md +++ b/docs/site/running-workflows/commission.md @@ -1,3 +1,61 @@ # Commission a workflow -[TODO] — generating a workflow from a description with `/spacedock:commission`: the four things to name (stages, entity, gated stages, per-stage rules) and what the first officer does next. +`/spacedock:commission` turns a description of the work you want tracked into a runnable workflow: a directory of markdown entities, a README that is the workflow's living spec, and a first officer ready to dispatch an ensign for each seed entity. You answer a short interactive design pass, the skill generates the files, and it launches a pilot run as the first officer. + +Invoke it from a session started with `spacedock claude`. You can pass the mission inline: + +``` +/spacedock:commission product idea to simulated customer interview +``` + +Text after the command name is taken as the workflow mission and presented for confirmation rather than asked from scratch. With no argument, the skill greets you and asks. + +## The four things you name + +The design pass collects four decisions. Everything else (the directory path, the entity identity scheme, the rejection routing) is derived from these and shown back to you for confirmation before any file is written. + +1. **The mission and what each entity is.** The first question asks what the workflow is for and what each work item represents. 
From the entity description the skill derives the entity label used throughout the generated files: "a design idea" becomes label `idea`, plural `ideas`, type `design_idea`. An entity is one work item, a markdown file that moves through the stages. + +2. **The stages.** The skill detects the workflow's shape from your mission (shipping code, testing a hypothesis, or iterating on an artifact) and proposes a stage list for you to confirm, modify, add to, or trim. Stage names describe the bucket an entity is sitting in: activity-flavored (`implementation`, `review`, `validation`) or state-flavored (`proposed`, `published`, `accepted`). The skill pushes back on pleonastic names: `awaiting_validation` reads as "the entity is in awaiting_validation," so it suggests `validation` instead. `done` is the universal terminal and stays as-is. + +3. **The gated stages.** A gate is a stage where the workflow pauses for your decision before an entity advances. By default the skill places one gate before the terminal stage. For each gate it also derives a rejection flow, the earlier stage an entity bounces back to when you reject, defaulting to the stage immediately before the gate. You confirm both in the design summary, stated in plain language ("If you reject at `review`, it goes back to `draft` for revision"). + +4. **The per-stage rules.** Each stage in the generated README carries three bullets that tell a dispatched ensign what "good" means for that stage: **`Outputs`** (what the worker produces), **`Good`** (your quality bar), and **`Bad`** (anti-patterns to avoid). The skill drafts these from the mission, but they are starting prose, not commitments. They are the rules every dispatched agent works from, so tightening them before the first dispatch is the single highest-leverage edit you can make. 
+ +You also choose the entity ID style: `sd-b32` (recommended when multiple people or agents create entities across branches or worktrees), `sequential` (single-writer or numeric-continuity workflows), or `slug` (the filename is the identity). See the [frontmatter contract](../reference/frontmatter-contract.md) for what each style stores. + +## What gets generated + +After you accept the design, the skill writes everything into `docs/{mission-slug}/`: + +- `README.md`: the mission, the schema, a per-stage section for every stage (with the `Outputs` / `Good` / `Bad` bullets), and a copy-paste entity template. +- One file per seed entity at `docs/{mission-slug}/{slug}.md`, each with valid YAML frontmatter and the description you gave. +- `_mods/pr-merge.md`, generated only when a stage modifies the repo (a worktree stage) and you accept the `pr-merge` mod, which tracks PR state on the entity's `pr` field instead of modeling merge as its own stage. + +If any stage writes code or produces artifacts beyond the entity file, that stage is marked `worktree: true` so each entity gets an isolated branch and the main checkout stays clean. + +## Tighten the README before the first dispatch + +The README is the workflow's living spec. Before launching, the skill reminds you that the auto-generated per-stage bullets are best-guesses and prompts you to tighten them. You have two ways to do it: + +- Open `docs/{mission-slug}/README.md` and edit the bullets under each `### {stage}` heading directly. +- Type `review stages` to have the skill walk you through each stage one at a time, flag the bullets that read as generic, and apply your amendments inline. + +Editing here costs minutes; un-editing after agents have been dispatched against vague bullets costs more. + +## What the first officer does next + +Commission does not stop at generating files. It assumes the first-officer role itself and runs the pilot. There is no separate launch step for the first run. + +1. 
It loads the [first officer's operating contract](../concepts/operating-model.md) and reads the workflow README you just generated. +2. It runs `spacedock status --boot` to read the workflow's current state. +3. It probes for the team-mode tools and dispatches an ensign for each seed entity that is ready to advance, processing them through the stages. +4. When the workflow goes idle or reaches a gate, it reports the pilot results: which entities moved, which stages they passed through, and any gate waiting on your decision. + +To run the workflow in any later session, launch the first officer again: + +``` +spacedock claude +``` + +It reads the workflow state, picks up where the last session left off, and dispatches agents for any entity ready for its next stage. Day-to-day operation (seeing what is ready, dispatching, and handling gate decisions) is covered in [Operating a workflow](operating.md). diff --git a/docs/site/running-workflows/debrief-and-refit.md b/docs/site/running-workflows/debrief-and-refit.md index fa6e6b17..88d35d4a 100644 --- a/docs/site/running-workflows/debrief-and-refit.md +++ b/docs/site/running-workflows/debrief-and-refit.md @@ -1,3 +1,67 @@ # Debrief & refit -[TODO] — `/spacedock:debrief` (capture a session into a record the next one resumes from) and `/spacedock:refit` (upgrade workflow scaffolding to the current release while keeping local edits). +Two maintenance commands keep a workflow durable across sessions and releases. `/spacedock:debrief` captures what happened in a session into a record the next session reads to start with context. `/spacedock:refit` brings an existing workflow's scaffolding up to the current Spacedock version while leaving your local edits in place. You run both as the captain; each pauses for your confirmation before it writes anything. 
+ +## Debrief: capture a session + +`/spacedock:debrief` writes a structured record of a session (shipped entities, newly filed backlog seeds, workflow-only commits, gate decisions, issues, and what's next) to `{dir}/_debriefs/{date}-{sequence}.md` and commits it. The next session's first officer reads the most recent debrief instead of starting cold. + +Run it at the end of a working session, or whenever you want a checkpoint: + +```bash +spacedock claude "/spacedock:debrief" +``` + +The skill works in four phases. You make the decisions at the boundaries; everything else is git and local-file reads, with no external services until you ask it to file an issue. + +1. **Discovery.** It finds the workflow with `spacedock status --discover`, then anchors the session start. If a prior debrief exists in `{dir}/_debriefs/`, the new session starts at the commit after that debrief's `last-commit` frontmatter field; if none exists, it falls back to the first commit in the workflow directory or the last 24 hours. It shows you the session boundary (since-commit and commit count) and waits for you to confirm or supply a different starting commit. + +2. **Extract.** It buckets every commit in range: PR squash-merges roll up into a **Shipped** section as a PR link, never enumerated; routine state churn (`dispatch:`, `advance:`, `state:`) is suppressed; only workflow-only commits that never flowed through a PR (`docs:`, `feedback:`, `ideation:`, reverts) are listed. It reads entity frontmatter to find what reached `done`, scans for gate approvals and rejections, and runs `spacedock status --workflow-dir {dir} --next` to populate **What's next**. + +3. **Draft and review.** It presents the draft with **Decisions** and **Observations** left as placeholders for you to fill. Add why a gate was approved or rejected, scope changes, design insights, or confirm as-is. Issues are split into **Workflow** (quirks in your pipeline, kept local) and **Spacedock** (framework bugs). 
For each Spacedock issue it offers to file an **anonymized** GitHub issue: the body carries the bug, repro steps, and scale, but never your mission, entity titles, or domain. You approve, edit, or decline each one before any `gh issue create` runs. + +4. **Write and commit.** It writes the debrief to `{dir}/_debriefs/{date}-{sequence:02d}.md` with `first-commit`, `last-commit`, and an approximate `duration` in frontmatter, commits it with a `debrief:` prefix, and reports the path: + + ``` + Debrief written to {dir}/_debriefs/2026-06-09-01.md and committed. + ``` + +The `last-commit` field is the load-bearing part: it is the anchor the next debrief reads to know where this session ended. + +## Refit: upgrade scaffolding to the current release + +`/spacedock:refit` upgrades a workflow's scaffolding files (the README and any installed mods in `_mods/`) to match the current Spacedock version, and migrates entity frontmatter when a schema change requires it. Agent files and the status viewer ship with the plugin, so they are never refit locally. The skill never auto-replaces a file you may have customized; it shows you a diff and you decide. + +You must give it the workflow directory: + +```bash +spacedock claude "/spacedock:refit path/to/workflow" +``` + +It reads the version stamp from the README frontmatter (`commissioned-by: spacedock@X.Y.Z`) and each mod's `version` field, compares them against the current version from the plugin manifest, and stops with "Workflow is already up to date." if everything matches. Otherwise it presents an upgrade plan and proceeds per file by strategy: + +- **`README.md`: show diff, never auto-replace.** Because you customize stages, schema fields, and quality criteria here, the skill generates what the current template would produce, diffs it against your README, and leaves it to you to apply the changes you want. It does not modify the README itself, only the version stamp at the end. 
+- **`_mods/{name}.md`: version diff.** For each installed mod it compares your `version` against the canonical mod at `mods/{name}.md`. Matching versions are skipped; differing versions get a diff and a y/n. A mod with no canonical match is treated as custom: acknowledged, no action. Canonical mods you don't have installed are offered for install. +- **`status` (legacy): remove.** A workflow-local `status` script predates the launcher. The status viewer is now the `spacedock status` command, so refit removes the local copy with `git rm`. + +### Schema migration and ID style + +After scaffolding, refit compares the old and new README `## Schema` and `### Field Reference` sections for changed types or ranges, renamed fields, removed fields, or new required fields. If a change affects entity data, it lists the affected entities, proposes the migration (for example, "Convert score from /25 to 0.0–1.0 by dividing by 25"), and waits for your y/n. On approval it edits **only** the named frontmatter fields with the Edit tool, never an entity body, never a whole-file rewrite. + +Refit preserves the README's `id-style` (`sequential`, `sd-b32`, or `slug`) and never changes it silently. It recommends `sd-b32` only under collaboration pressure (worktree stages, PR/merge mods, multiple creators, branches, offline work) and requires your explicit approval. Before any approved style change it runs `spacedock status --validate` against the workflow and reports failures; the actual ID rewrite is manual in this release. + +### When there is no version stamp + +If the README has no `commissioned-by` stamp, refit cannot tell what the original scaffolding looked like, so it enters **degraded mode** and offers two choices: **stamp only** (add stamps without changing anything, to establish a baseline) or **full refit with review** (show a full diff for every file and require your approval before replacing each). It never auto-replaces an unstamped file. 
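To check ahead of time whether a workflow would land in degraded mode, you can look for the stamp yourself. The following is a rough sketch rather than refit's own logic: `read_stamp` is a hypothetical helper, and it assumes the stamp sits on a single `commissioned-by: spacedock@X.Y.Z` line inside the README's leading `---` frontmatter block.

```bash
#!/usr/bin/env bash
# read_stamp README   (hypothetical helper)
# Prints the version from a `commissioned-by: spacedock@X.Y.Z` frontmatter line.
# Prints nothing and returns 1 when no stamp is found.
# Assumption: the README opens with a `---` delimited frontmatter block.
read_stamp() {
  local readme="$1" line
  # Count `---` delimiter lines; only match while inside the first block.
  line="$(awk '/^---[[:space:]]*$/ { n++; next }
               n == 1 && /^commissioned-by:/ { print; exit }' "$readme")"
  [ -n "$line" ] || return 1
  # Keep everything after the last "@", which is the version.
  printf '%s\n' "${line##*@}"
}

# Example use:
#   read_stamp path/to/workflow/README.md || echo "no stamp: expect degraded mode"
```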
+ +When the refit finishes it updates the README stamp to the current version, prints a per-file summary, and suggests the commit: + +```bash +git commit -m "refit: upgrade workflow scaffolding to spacedock@{current_version}" +``` + +Git is the safety net throughout: `git diff` and `git checkout` recover anything you didn't mean to keep. + +## Where these fit + +Debrief and refit bracket the working loop described in [Operating a workflow](operating.md): you commission once, operate session by session, debrief at the end of a session, and refit when you upgrade Spacedock. For the commands these skills call, see the [Command reference](../reference/command-reference.md). diff --git a/docs/site/running-workflows/operating.md b/docs/site/running-workflows/operating.md index 6b4aaa27..47dc885c 100644 --- a/docs/site/running-workflows/operating.md +++ b/docs/site/running-workflows/operating.md @@ -1,3 +1,71 @@ # Operating a workflow -[TODO] — the day-to-day loop (see what's ready, dispatch, handle gates), the `spacedock status` queries (`--next`, `--where`), and handling gate decisions. See the [Command reference](../reference/command-reference.md). +Operating a workflow is a loop: see what is ready, dispatch the first officer to move it, and make a decision when work reaches a gate. The captain drives that loop; the first officer does the orchestration and the ensigns do the stage work. This page covers the loop, the `spacedock status` queries that show you the state, and how to handle a gate. + +## The day-to-day loop + +You run the same three steps each session: + +1. **See what is ready.** Query workflow state to find the entities that can move (the dispatchable set) and where everything sits. +2. **Dispatch the first officer.** Launch a session and let it pull a dispatchable entity through its next stage. +3. **Handle gates.** When a stage is gated, the first officer stops and presents the result. You approve, reject, or send it back. 
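The first two steps above can be sketched as a single shell pass, using the `status --next` query and the first-officer launch shown later on this page. `operate_once` is a hypothetical wrapper, and it assumes `--next` prints nothing at all when no entity is ready; gate handling (step 3) happens interactively inside the launched session, so it is not scripted here.

```bash
#!/usr/bin/env bash
# operate_once WORKFLOW_DIR   (hypothetical wrapper)
# One pass of the loop: check what is ready, then launch the first officer.
# Assumption: `status --next` produces empty output when nothing is dispatchable.
operate_once() {
  local dir="$1" ready
  ready="$(spacedock status --workflow-dir "$dir" --next)" || return 1
  if [ -z "$ready" ]; then
    echo "nothing to dispatch"
    return 0
  fi
  # Show the dispatchable set, then hand the workflow to the first officer.
  printf '%s\n' "$ready"
  spacedock claude "/spacedock:first-officer operate the workflow in $dir"
}

# Example use:
#   operate_once docs/dev
```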
+ +The loop ends a session when nothing is dispatchable, or when every dispatchable entity is waiting on a gate decision from you. + +## See what is ready + +`spacedock status` reads the workflow state and prints it. Run it against the workflow directory, the one holding the commissioned `README.md`: + +```bash +spacedock status --workflow-dir docs/dev +``` + +Prints the status table: one row per active entity, with its ID, title, and current stage. This is the full picture. + +To list only the entities ready to dispatch, add `--next`: + +```bash +spacedock status --workflow-dir docs/dev --next +``` + +Prints the dispatchable set: the entities whose next stage can run now, given concurrency limits and what is already in flight. This is the query the first officer runs each loop. When nothing is ready, the result is empty; there is nothing to dispatch. + +To filter the table by a frontmatter field, use `--where "field=value"`. The filter takes `=` (equals) or `!=` (not equals): + +```bash +spacedock status --workflow-dir docs/dev --where "status=ideation" +spacedock status --workflow-dir docs/dev --where "verdict!=" +``` + +The first prints every entity in the `ideation` stage. The second prints entities whose `verdict` field is set (the `!=` against an empty value). Use `--where` to answer targeted questions: what is in a given stage, which entities carry an external `issue`, which already have a `verdict`. + +Two more queries are worth knowing: + +- **`--validate`** checks every entity against the workflow's contract and reports problems (a missing or malformed ID, a duplicate ID, a stage name that breaks the naming rule). Run it when the table looks wrong. +- **`--resolve REF`** looks up one entity by slug, full ID, or ID prefix, so you can name it unambiguously before acting on it. + +All status queries are read-only. They print state, they do not change it. 
For the full flag list and the `--set` and `--archive` mutation forms, see the [Command reference](../reference/command-reference.md). + +## Dispatch + +Hand the first officer the workflow and let it run the dispatch cycle. Launch with your harness subcommand and a task that names the work; the first officer takes the workflow directory from the path you give it in the task, or runs `spacedock status --discover` to find it: + +```bash +spacedock claude "/spacedock:first-officer operate the workflow in docs/dev" +``` + +The first officer reads the workflow `README.md`, runs its own `status --next`, and for each dispatchable entity it dispatches an ensign to move the entity through its next stage. The ensign does the stage work (write the design, produce the deliverable, run the validation), commits, and files a stage report. The first officer reads the report, checks it against the stage's outputs and the entity's acceptance criteria, and advances the entity. A completed non-gated, non-terminal stage is not a stopping point: the first officer advances it and dispatches the next stage on its own, without waiting for you. + +It stops and returns to you only at a gate, at a terminal entity's merge ceremony, on a blocker, or when nothing is left to dispatch. + +## Handle gate decisions + +A gate is the decision point at the end of a stage marked `gate: true` in the workflow. When an entity reaches one, the first officer presents the stage report and the result of its review, then waits. It never self-approves. You decide: + +- **Approve.** The entity advances to its next stage. The first officer dispatches it (or, at a terminal stage, runs the merge-and-cleanup ceremony to close the entity with its verdict). +- **Reject.** On a stage with a `feedback-to` target (`validation` routes back to `implementation` in the `docs/dev` workflow), the rejection routes the concrete findings back to that stage and re-runs the work, then re-validates. 
A repeated rejection escalates back to you rather than bouncing indefinitely. +- **Send it back with direction.** If the result is close but not right, give the first officer the specific change to make. It updates the entity body, acceptance criteria, and test plan together, then re-runs the stage. + +The gate review names the chosen direction, cites the stage report, and ends with a single recommendation, approve or reject. Read the report it cites before deciding; overriding a `REJECTED` recommendation without a reason is exactly the kind of unexamined approval the gate exists to catch. + +When you approve a terminal stage, the entity is closed: the first officer records the merge, sets the `completed` timestamp and `verdict`, clears the worktree, and tears the worker down. At that point the loop returns to the top: run `status --next` and see what moved into reach. diff --git a/docs/site/running-workflows/survey.md b/docs/site/running-workflows/survey.md index b5ff6ce2..3a800a50 100644 --- a/docs/site/running-workflows/survey.md +++ b/docs/site/running-workflows/survey.md @@ -1,3 +1,65 @@ # Survey an existing project -[TODO] — reading a brownfield project's agent history with `/spacedock:survey` (read-only), what it reports, and how it offers to commission a workflow from what it found. +`/spacedock:survey` reads a brownfield project's agent history and reports what the agents have implicitly been doing (read-only), then offers to commission a workflow from what it found. Run it when you arrive at or return to a repo that already has agent sessions and want the lay of the land before doing anything else. It never edits your files; the only stop in the flow is the commission offer at the end. + +Survey is the recommended first launch. Point Spacedock at a project and hand it the survey skill: + +```bash +spacedock claude "/spacedock:survey" +``` + +This starts the first officer in Claude Code and runs the survey. 
The new-user walkthrough is in [Your first launch](../get-started/first-launch.md); this page is the operator's view of what survey does and how to read it. + +## What it reads + +Survey reads recorded agent session history through `agentsview`, a session-history tool. It does not parse raw logs by hand. It drives the `agentsview` binary to sync this project's sessions into a process-readable copy, then runs a fixed set of labeled, read-only SQL queries against that copy. The queries live in `skills/survey/references/queries.sql`, one labeled query per concern, so nothing is a black box. + +Two behaviors matter when you run it: + +- **It scopes to this repo by identity, not by name.** Survey resolves the repo root and scopes every query to that absolute path prefix. Because `agentsview` keys each session's project by the git-root basename, a same-basename sibling repo elsewhere on disk would otherwise fold in; the path-prefix scope keeps it out, and admits every checkout of this repo: the root, a subdir, a worktree. +- **If `agentsview` is missing, it asks before installing.** Survey needs `agentsview` to read the logs. When the binary is absent it tells you so and asks consent; on a yes it installs (`brew install --cask agentsview`, or the install-script fallback). It never installs without an explicit yes. If the sync fails for any reason (network, disk, permissions), it reports the exact failure and stops rather than guessing. + +If the repo has no Claude agent history, survey says so plainly and stops. There is nothing to discover. + +## What it reports + +Survey leads with a one-line headline (the project, the session count, the date range, and the decision and interruption counts), then renders the body in the same turn. The body is the value, so it does not pause for a confirmation before showing you the sections: + +- **Inferred workflow.** The implicit loop reconstructed from the decisions and prompts, as an arrow chain, with one honest line about it. 
+- **Workstreams.** The decisions and prompts clustered into tracks, each tagged with its work mode (see below). +- **Work by area.** Where edits actually landed, by logical area (`src`, `internal`, `docs`, …) regardless of physical location. A worktree edit counts toward its area, so worktree-based work is not hidden. Genuine config paths (`.claude`, `.beads`, `.git`) and external sibling references demote to a footnote. +- **Needs you.** The open decisions, the forks raised but never resolved. **Survey leads the report with these**, because they are the work blocked on you. Exploration threads you are deliberately holding are separated from mechanical questions awaiting an answer. +- **Recent decisions** and **interruptions**: the answered or shipped forks, and how often you had to step in. +- **Scaffold.** If another agent scaffold is in use (superpowers, gsd / get-shit-done, or another `.claude` skill tree), survey states it as a fact: the family, its invocation count, and whether it is checked in on disk. +- **Codex** (only when present). Codex sessions land with no recorded working directory, so survey attributes them to this repo through each command's working directory and reports them as their own section: a session count, the workstream clusters, and an activity tally. Gemini is a deferred follow-up. + +If a section's signal is empty, survey says the run found none of it. It never dresses an empty section up as "no decisions". + +### The open frontier is cross-checked + +The open-decision scan reads transcripts only, which cannot tell a shipped fork from a still-open one. Before presenting the frontier, survey cross-references the repo (`git log`, merged PRs, the working tree) and splits each open fork three ways: + +- **shipped**: a confident match to a merged PR or commit. Dropped from the frontier. +- **decided, not shipped**: moved to a backlog line. +- **never decided**: stays on the `NEEDS YOU` frontier. 
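The three-way split above can be written as a small decision function. This is an illustration of the rule only: `classify_fork` and its `yes`/`no` inputs are hypothetical, and how survey actually decides that a repo match is confident, or that a fork was decided, is not modeled.

```bash
#!/usr/bin/env bash
# classify_fork REPO_MATCH DECIDED   (hypothetical model of the split)
#   REPO_MATCH "yes" if the fork confidently matches a merged PR or commit
#   DECIDED    "yes" if the transcripts show the fork was decided
classify_fork() {
  local repo_match="$1" decided="$2"
  if [ "$repo_match" = "yes" ]; then
    echo "shipped: drop from frontier"
  elif [ "$decided" = "yes" ]; then
    echo "decided, not shipped: backlog"
  else
    echo "never decided: NEEDS YOU"
  fi
}
```

The repo match is checked first, so a confident match removes a fork from the frontier regardless of what the transcripts say about the decision.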
+ +The match is conservative: a fork is dropped only on a confident repo match, because a false "still open" is a cheap nudge while a false "shipped" silently hides real open work. When no repo signal is available, whether because this is not a git repo or because the lookups fail, the frontier degrades to transcript-only and every open fork is flagged `unverified` rather than presented as authoritative. + +## The commission offer + +Survey closes by offering to commission a workflow from what it found, and the offer is keyed to each track's work mode, so it is not one undifferentiated pitch. Survey classifies each track as **mechanical**, **exploration**, or **unlabeled**: + +- **Mechanical tracks** (the routine issue → worktree → PR loop) get an **automation** offer: a workflow that gates the crucial decisions and lets the agent drive the loop between gates. +- **Exploration tracks** (creative, content, or design work where your steering is the point) get a **book-keeping** offer: structure for the parallel threads, tracking each draft or path and its state (in-flight / paused-by-choice / abandoned). There is no automate-the-human-out pitch here. The involvement is the work. +- **Unlabeled tracks** get the generic book-keeping offer, never a guessed automation pitch. + +A project with both modes gets both offers. Each offer cites a real number from the scan: the track names, the gate-pass count, the open forks, the cancelled-path count. + +On a **yes**, survey hands off to [commission](commission.md) in batch mode, assembling the inputs from the scan: + +- **stages** ← the inferred workflow loop; +- **seed entities** ← the workstreams; +- **approval gates** ← the open forks that survived the repo cross-check; +- **mission and entity** ← inferred from the workstreams and the project. + +Survey does not write the workflow files itself; file generation stays commission's job. On a **no**, it stops; the survey stands on its own as orientation.