Merged
18 commits
322e609
roadmap(0203): fo-efficiency shaping — j9 backbone + boot forensics
clkao Jun 13, 2026
972c219
roadmap(0203): fold captain contract audit — drop agents/first-office…
clkao Jun 13, 2026
13afe3f
roadmap(0203): spec moved into the j9 entity; sprint dir is now a thi…
clkao Jun 13, 2026
483c3c6
roadmap(0203): intake github#344 (context-budget spurious warnings) t…
clkao Jun 13, 2026
be3cf39
roadmap(0203): T3 filed as backlog seed fo-contract-prose-audit (bloc…
clkao Jun 13, 2026
0af3652
roadmap(0203): sprint artifacts — index.md + staff-review.md + dispat…
clkao Jun 13, 2026
95e86b8
feat(j9): AC-6 per-turn boot-window measurement parser + offline oracle
clkao Jun 13, 2026
b8a97a2
test(j9): AC-4 boot-resident deferred-load-point reference closure
clkao Jun 13, 2026
aa83d95
test(j9): pin pr_state live merge state + gh-absent degraded branch
clkao Jun 13, 2026
5f4baaf
feat(j9): P2 lazy-TeamCreate + P3 shallow-boot-then-greet
clkao Jun 13, 2026
fa30d17
test(j9): add the shallow-boot shared runtime scenario (AC-1/AC-2/AC-6)
clkao Jun 13, 2026
f35b5b9
feat(j9): P1 contract split + AC-5 offline-gate retarget (one change)
clkao Jun 13, 2026
0bb7e02
test(j9): deterministic offline repro of the AC-3 single-entity revie…
clkao Jun 13, 2026
364ee4f
fix(j9): correct rejection-flow to the contract-correct single-entity…
clkao Jun 13, 2026
fe822a9
fix(j9): host-neutral the rejection-flow prompt (Codex AC-3 regression)
clkao Jun 14, 2026
073e119
fix(j9): merge multi-delta tool_use so AC-2 catches a later-delta Tea…
clkao Jun 14, 2026
8328582
test(j9): pin the AC-2 parser fix to the validator-named real hang ca…
clkao Jun 14, 2026
b2d7d6e
test(j9): scrub CI repo-naming env from live child + loud wrong-root …
clkao Jun 14, 2026
131 changes: 131 additions & 0 deletions docs/roadmap/0203-fo-efficiency/boot-analysis.md

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions docs/roadmap/0203-fo-efficiency/dispatch-sprint-execution.md
@@ -0,0 +1,42 @@
# 0203 (0.20.3) — FO efficiency — Commander dispatch (cold-boot)

## Boot

Sprint = the entities matching `sprint: 0203-fo-efficiency` (query, not a list): **j9** (lazy-teamcreate-shallow-boot), **#344** (context-budget-spurious-warnings), **T3** (fo-contract-prose-audit). Boot the first officer (`spacedock claude`), `status --boot`, and read each entity body for its gate-approved design + ACs. Readiness: `staff-review.md` (verdict READY). Goal/DoD: `index.md`. Evidence: `boot-analysis.md`.

## Deliverable & DoD

**0.20.3** = the FO-efficiency restructure + the context-budget probe fix. Done when the sprint is merged to `next` (then `main` at the cut) and meets the Definition of Done in `index.md`. Headline: a live drive **measures** boot reaching < ~60k with the ~89k team-mode re-cache absent before the greet.

## Drive order — ⚠️ coordination

1. **j9 first** (the backbone; T3 keys off it). Per the operating principle — *begin with the end; do the hardest first, de-risk when it's cheap* — within j9 land the cheap, biggest lever first: **lazy-TeamCreate + shallow-greet + the AC-6 measured-saving drive**, proving the <60k/89k saving *before* the full contract split. Then **Phase-1** (the split + the offline-gate-assertion retarget). If the cheap measure already clears <60k, the split is the contract-cleanup / residual-~16k play, not load-bearing for the saving — decide then whether it stays in 0.20.3.
2. **T3 after j9 Phase-1 lands** — the slimmed refs must exist before the audit. Step-0 survey decides the collapse fork (cut vs recorded decision).
3. **#344** — already validated (`46224f5f`); merge with the batch. No ordering constraint (zero overlap).

## Per-member build notes

### j9 — lazy-teamcreate-shallow-boot · shipped-scaffolding surface · ⚠️ HIGH-STAKES
The 3-phase restructure (full spec in the entity body, AC-1..AC-6). It rewrites the very contract the FO + ensigns run under — test the new contract in isolation (the live `shallow-boot` scenario + the `contractlint` closure test) before merge. **Retarget the two offline-gate assertions** (`TestNoUnexpectedModHookOrPRMergeIntroduced` allowlist; `TestGradeMarkerMatchesContract` source) to the post-split layout as an explicit subtask, and keep `go test ./...` green (AC-5). Clean the 3 Polish residuals from `staff-review.md` (stale AC-count lines; AC-1(b) attribution; AC-6 89k-soft-spot note).

### #344 — context-budget-spurious-warnings · dispatch-path · DONE (held pre-merge)
Implemented + validated on `spacedock-ensign/context-budget-spurious-warnings` @ `46224f5f` (the `<synthetic>` census skip + the `[1m]`-suffix window promotion). 5 ACs green; golden parity zero-churn; detached audit confirmed the over-suppression guards load-bearing. Just merge with the batch.

### T3 — fo-contract-prose-audit · contract-cleanup · BLOCKED on j9 Phase-1
The 4-step audit method (survey → mechanical cut → comm-officer polish) against the slimmed refs. Step-0 survey is the collapse fork: non-empty inventory → code change; empty/trivial → a recorded roadmap decision (AC-4). Steer the survey to KEEP the budget-probe reuse-condition-0 prose (a deliberate cross-host abstraction split, not collapsible duplication).

## Detached adversarial audit (before merge)

High-stakes surface: **j9** (shipped contract/scaffolding). Run a read-only detached audit on a throwaway checkout of the merge result before merging j9 — refute that the live scenarios + `contractlint` guards would catch a broken edit. **#344** already had its detached audit. **T3** is behavior-preserving (live scenarios) — routine.

## Pre-cut antipattern audit (⚠️ before the v0.20.3 tag)

All merged, tag not yet fired → an INDEPENDENT staff-eng reviewer over the assembled sprint. **Critically: confirm AC-6 actually measured the <60k/89k saving in the live run** — the sprint's whole point. Verify main-PR CI gating. Ship-blockers fixed pre-cut; non-blockers seed the next sprint.

## Cut

Fire `v0.20.3` once the three are merged and the pre-cut audit is clean.

## Out of scope (deferred)

p2/vc (0.20.4 binary-simplification line); xp (cross-session FO↔Commander comms — the coordination gap this sprint hit live); ey (proof-policy port to shipped scaffolding).
45 changes: 45 additions & 0 deletions docs/roadmap/0203-fo-efficiency/index.md
@@ -0,0 +1,45 @@
# 0203 — FO efficiency (0.20.3)

**Sprint:** the entities matching `sprint: 0203-fo-efficiency` (a query, not a hard-coded list) — `j9` (lazy-teamcreate-shallow-boot), `#344` (context-budget-spurious-warnings), `T3` (fo-contract-prose-audit).
**Theme:** make the first officer cheap to boot and run.

## Goal (success criterion)

An FO reaches interactive readiness — greet + state summary + *able to present a gate* — in seconds at **< ~60k** context, versus today's minutes at 126k+. Proven by a live FO-boot drive that **measures** the saving (j9 AC-6), never a grep over the restructured contract.

## Why

Boot forensics (`boot-analysis.md`) measured ~160k peak context and ~13.6 min to greet — with no team created and no worker dispatched. Structural, not a bug.

## Cost levers (ranked)

| Lever | ~boot cost removed | Needs the split? |
|-------|-------------------:|------------------|
| Lazy-TeamCreate (defer the team-mode prefix re-cache) | **~89k** | no |
| Defer contract reads at greet | ~16k | yes (minimal) |
| Defer the human status-table render | ~8.7k | no |
| Defer mod-file reads | ~6.5k | no |

## Definition of Done

0.20.3 ships when, merged to `next` (then `main` at the cut):
- **j9** — the FO contract is split into a boot-resident core + deferred dispatch/merge references; `TeamCreate` deferred off the boot/greet path; shallow-boot-then-greet off `status --boot --json`. AC-1..AC-6 green, including the live shallow-boot scenario, the offline gate staying green post-split, the `contractlint` closure test, and the **measured-saving drive** (greet context < ~60k, no pre-greet ~89k spike).
- **#344** — the context-budget probe emits no spurious `config_drift`/`mixed_models` warnings on healthy members and reads the correct window. (Implemented + validated on `spacedock-ensign/context-budget-spurious-warnings` @ `46224f5f`; held pre-merge — ships with the batch.)
- **T3** — the slimmed FO refs are audited + comm-officer-polished, behavior-preserving (live scenarios green) and measurably smaller — or a recorded roadmap decision if the split left nothing to cut.
- `v0.20.3` cut after the pre-cut antipattern audit is clean.

## Tasks

- **j9** (backbone) — contract split → lazy-TeamCreate → shallow-boot-then-greet. The full spec is the entity body.
- **#344** — context-budget spurious-warnings fix (validated, held pre-merge).
- **T3** — residual-prose audit + polish (blocked on j9 Phase-1; collapses to a decision if nothing to cut).

## Out of scope

p2/vc (0.20.4 binary-simplification line); xp (cross-session FO↔Commander comms — the coordination gap this sprint surfaced); ey (proof-policy port to shipped scaffolding).

## Artifacts

- `staff-review.md` — preflight readiness gap analysis (verdict: READY)
- `dispatch-sprint-execution.md` — cold-boot Commander dispatch package
- `boot-analysis.md` — the boot forensics (evidence base)
32 changes: 32 additions & 0 deletions docs/roadmap/0203-fo-efficiency/staff-review.md
@@ -0,0 +1,32 @@
# 0203 FO-efficiency — preflight staff review

**Verdict: READY** (after one j9 rework cycle). Sprint-wide preflight over the pooled sprint — per-item + cross-cutting. Shaping-FO session, 2026-06-13.

## Per-item readiness

- **j9** (lazy-teamcreate-shallow-boot) — **READY.** The contract restructure (boot-resident/deferred split → lazy-TeamCreate → shallow-boot-then-greet). Reviewed in depth: a 4-lens design panel + adversarial re-verify (6 + 3 findings closed), then this sprint preflight surfaced 3 batch-blockers — all fixed in cycle 4 and re-verified. AC set AC-1..AC-6, all external-proven, no prose-grep.
- **#344** (context-budget-spurious-warnings) — **READY** (validated, held pre-merge). Narrow Go fix; 5 ACs green including golden parity zero-churn; a detached adversarial audit confirmed the over-suppression guards load-bearing. Zero file overlap / behavioral interaction with j9/T3. Merges with the batch.
- **T3** (fo-contract-prose-audit) — **READY.** A 4-step audit method against j9's *planned* modules; ACs external (live-scenario preservation + `wc` size-floor + collapse-decision record); correctly sequenced behind j9 Phase-1; collapses to a roadmap decision if the split leaves nothing to cut.

## Cross-cutting coherence

- **Zero file overlap.** #344 touches `internal/claudeteam` Go; j9/T3 touch the FO markdown refs + `contractlint`/`ensigncycle` tests. No merge conflict is possible.
- **#344 ↔ j9** meet only at the stable `reuse_ok` surface — the contract references the budget probe as a black box, and neither warning key appears anywhere in the refs. Clean.
- **T3 → j9 Phase-1** dependency is sound + non-circular: T3 designs its method (not a cut-list) against j9's planned modules, with an explicit collapse fork.

## Material findings — all CLOSED in j9 cycle 4

- **M1** — the Phase-1 split breaks two hard-coded offline-gate tests (`TestNoUnexpectedModHookOrPRMergeIntroduced` via the `## Hook:` allowlist; `TestGradeMarkerMatchesContract` via the `TERMINAL_TEARDOWN_BOUNDED` marker). → j9 owns an explicit retarget subtask + **AC-5** (offline gate exits 0 post-split).
- **M2** — AC-4's structural proof was unbuildable (cited a test that reads only `SKILL.md`). → rewritten as a real new `contractlint` `os.Stat`-oracle test.
- **M3** — the sprint's headline goal (<~60k / 89k saving) had no AC. → **AC-6** measured-saving live drive (greet-turn context < ceiling + no pre-greet ~89k spike, off `claude-stream.jsonl`).

## Residuals — Polish, fold into implementation (non-blocking)

- j9: three stale historical lines say "AC-1..AC-4" (live set is AC-1..AC-6) — mark superseded.
- j9: AC-1(b) credits the team `config.json` path to a "comm-officer hook" that doesn't ship — loose attribution; the path itself is real.
- j9 AC-6 soft spot: the ~89k is asserted (no team was created in the forensics run); the negative control rides an eager-team fixture — the present/absent cache-creation-spike signal stays falsifiable.
- T3: AC-4's collapse-decision-record cites `README.md`, now the conventional `index.md` — repoint the path at T3 implementation.

## Provenance

Boot forensics: `boot-analysis.md`. Per-item ideation: the entity bodies under `docs/dev/.spacedock-state/`. Preflight (4-lens) + j9 re-verify: shaping-FO session, 2026-06-13.
1 change: 1 addition & 0 deletions docs/specs/scenario-testing-principles.md
@@ -60,6 +60,7 @@ The first foundation is the host-neutral runtime scenarios already shipped and h
- `feedback-3-cycle-escalation` — on the third consecutive REJECTED validation the FO escalates to the human instead of auto-bouncing a fourth time.
- `merge-hook-guardrail` — the FO cannot bypass a registered merge hook by terminalizing without pr, mod-block, or force.
- `filing` — the FO files a new seed entity via the atomic `spacedock new <slug>` path, not the drift-prone `--next-id` + hand-write pair.
- `shallow-boot` — a freshly-booted FO greets and reports accurate state, advances a merged PR before-greet (S7b), with no team created and no worker dispatched, then stops for input.
<!-- /seed-scenarios -->

These IDs are the code-backed source of truth. They mirror the `sharedRuntimeScenarios()` table in `internal/ensigncycle`; the seed IDs declared above must equal that table. This block is machine-readable so a lock test can bind the doc to the code and red on drift in either direction — adding, dropping, or renaming a scenario on one side without the other. This is what makes the doc the human-readable face of a code-backed truth rather than prose bound to nothing.
157 changes: 157 additions & 0 deletions internal/contractlint/boot_resident_closure_test.go
@@ -0,0 +1,157 @@
// ABOUTME: AC-4 reference-closure over the boot-resident FO contract bodies — every
// ABOUTME: deferred load-point they name resolves to a real file on disk (os.Stat oracle).
package contractlint

import (
	"os"
	"path/filepath"
	"regexp"
	"strings"
	"testing"
)

// bootResidentBodies are the two contract bodies the FO loader inlines/reads at
// boot: the slimmed shared core and the Claude runtime adapter. AC-4 walks these
// (NOT a SKILL.md, which the existing TestUserSkillReferenceClosureResolves reads)
// because the loader reads the bodies directly, and only the bodies name the
// deferred load-points the boot core defers to (a sibling reference path, a bare
// skill invocation, a canonical mod file).
var bootResidentBodies = []string{
	filepath.Join("skills", "first-officer", "references", "first-officer-shared-core.md"),
	filepath.Join("skills", "first-officer", "references", "claude-first-officer-runtime.md"),
}

// bodyReferenceRe matches a sibling reference read-path named in a contract body
// (the dispatch/merge references the split defers to), the same path shape the
// SKILL.md closure check uses, applied to body text.
var bodyReferenceRe = regexp.MustCompile(`references/[A-Za-z0-9_./-]+\.md`)

// bodySkillRe matches a lazy skill invocation `spacedock:<name>` the boot core
// names as a deferred load point. The first-dispatch / terminal / gate skills
// (using-claude-team, present-gate, feedback-rejection-flow) each resolve to their
// skills/<name>/SKILL.md.
var bodySkillRe = regexp.MustCompile(`spacedock:([a-z0-9-]+)`)

// bodyModRe matches a CONCRETE _mods reference (e.g. `_mods/pr-merge.md`) the boot
// core names as a deferred mod-file load point. Brace-templated placeholders
// (`_mods/{mod_name}.md`) are NOT concrete load-points and are excluded by the
// non-brace character class.
var bodyModRe = regexp.MustCompile(`_mods/([a-z0-9][a-z0-9_.-]*\.md)`)

// lazyLoadSkills are the skill names a boot-resident body may name as deferred
// load points. The ensign skill is the dispatched-worker contract, not a boot-core
// load point, so it is excluded; the FO-self reference would be a self-load.
var lazyLoadSkills = map[string]bool{
	"using-claude-team":       true,
	"present-gate":            true,
	"feedback-rejection-flow": true,
}

// deferredLoadPoint is one extracted load-point: the on-disk path the body names
// and the literal token that named it (for a useful failure message).
type deferredLoadPoint struct {
	path  string // repo-relative resolved path
	named string // the literal token in the body
}

// extractDeferredLoadPoints parses one boot-resident body's text and returns every
// deferred load-point it names, resolved to a repo-relative on-disk path: sibling
// reference read-paths (resolved under the FO skill dir), lazy skill invocations
// (resolved to skills/<name>/SKILL.md), and concrete _mods files (resolved against
// the canonical mods/ tree the repo ships). It does NOT assert presence/absence of
// any prose — it only collects the paths the body NAMES, for the os.Stat oracle to
// resolve.
func extractDeferredLoadPoints(body string) []deferredLoadPoint {
	foSkillDir := filepath.Join("skills", "first-officer")
	var out []deferredLoadPoint
	seen := map[string]bool{}
	add := func(p deferredLoadPoint) {
		if seen[p.path] {
			return
		}
		seen[p.path] = true
		out = append(out, p)
	}
	for _, m := range bodyReferenceRe.FindAllString(body, -1) {
		if strings.Contains(m, "{") {
			continue
		}
		add(deferredLoadPoint{path: filepath.Join(foSkillDir, m), named: m})
	}
	for _, m := range bodySkillRe.FindAllStringSubmatch(body, -1) {
		name := m[1]
		if !lazyLoadSkills[name] {
			continue
		}
		add(deferredLoadPoint{path: filepath.Join("skills", name, "SKILL.md"), named: m[0]})
	}
	for _, m := range bodyModRe.FindAllStringSubmatch(body, -1) {
		add(deferredLoadPoint{path: filepath.Join("mods", m[1]), named: m[0]})
	}
	return out
}

// TestBootResidentDeferredLoadPointsResolve is the AC-4 reference-closure guard: a
// genuine structural check, not a prose-grep. For each boot-resident contract body
// it extracts every deferred load-point the body NAMES and os.Stats it. The
// EXPECTED value (the target exists on disk) comes from the FILESYSTEM — an
// independent source the contract text can diverge from — so a body that names a
// deferred reference at a moved/renamed/deleted path fails the stat. It is NOT the
// banned present-here/absent-there heading grep (boundary_guard_test.go): it does
// not assert the body contains or lacks any heading; it asserts every load-point
// the body points at resolves to a real file. The empty-walk guard keeps it from
// passing vacuously.
func TestBootResidentDeferredLoadPointsResolve(t *testing.T) {
	root := repoRoot(t)
	total := 0
	for _, rel := range bootResidentBodies {
		data, err := os.ReadFile(filepath.Join(root, rel))
		if err != nil {
			t.Fatalf("read boot-resident body %s: %v", rel, err)
		}
		points := extractDeferredLoadPoints(string(data))
		for _, p := range points {
			total++
			if _, err := os.Stat(filepath.Join(root, p.path)); err != nil {
				t.Errorf("%s names deferred load-point %q which resolves to %s — but no such file exists on disk: %v", rel, p.named, p.path, err)
			}
		}
	}
	if total == 0 {
		t.Fatal("extracted zero deferred load-points from the boot-resident bodies — extraction bug; the closure check would pass vacuously")
	}
}

// TestBootResidentDeferredLoadPointGuardFailsOnDanglingTarget is the AC-4 control:
// it points a boot-resident-style fixture body at a non-existent deferred reference
// and proves the closure logic goes RED, so the guard is shown able to fail (not a
// guard that can only ever pass). It drives the same extraction + os.Stat the real
// guard uses, against a planted fixture, so the control exercises the real code
// path rather than re-implementing it.
func TestBootResidentDeferredLoadPointGuardFailsOnDanglingTarget(t *testing.T) {
	root := repoRoot(t)
	fixture := "At first dispatch, read references/claude-fo-this-file-does-not-exist.md\n" +
		"and at terminal, invoke spacedock:using-claude-team.\n"
	points := extractDeferredLoadPoints(fixture)
	if len(points) == 0 {
		t.Fatal("control fixture extracted no load-points — the dangling-target case never exercises the stat")
	}
	var sawDangling, sawReal bool
	for _, p := range points {
		_, err := os.Stat(filepath.Join(root, p.path))
		if strings.Contains(p.named, "does-not-exist") {
			if err == nil {
				t.Fatalf("control: the dangling reference %q unexpectedly resolved on disk", p.named)
			}
			sawDangling = true
		} else if err == nil {
			sawReal = true
		}
	}
	if !sawDangling {
		t.Fatal("control: the dangling deferred reference was not extracted — the guard cannot fail on a moved/deleted target")
	}
	if !sawReal {
		t.Fatal("control: the real load-point (using-claude-team) was not resolved — the discriminator has nothing to contrast the dangling case against")
	}
}