
Lazy-TeamCreate + shallow-boot-then-greet: reach interactive readiness fast #365

Open
clkao wants to merge 18 commits into main from spacedock-ensign/lazy-teamcreate-shallow-boot

@clkao (Collaborator) commented Jun 14, 2026

Bring the first officer to interactive readiness (the greet) in seconds at ~43k context, not in minutes at ~160k, and without an eager TeamCreate.

What changed

  • Split the FO contract into a boot-resident core + lazily-loaded dispatch/merge references.
  • Defer TeamCreate out of boot; shallow-boot-then-greet from `status --boot --json`, with a merged-PR sweep before the greet.
  • Add the shallow-boot live scenario (Claude+Codex), the AC-4 reference-closure test, and the AC-6 measured-saving drive.
  • Fix the journeymetrics multi-delta parser so the no-TeamCreate oracle can actually fail.
  • Make the rejection-flow prompt host-neutral: Claude dispatches a fresh bare reviewer, Codex reuses its reviewer thread.

Evidence

  • Offline gate: go test ./... 1374/1374 passed.
  • Live shallow-boot (Claude): greet 43,265 (<60k), zero TeamCreate, no 89k spike; AC-3 Claude+Codex rejection-flow live-passed; detached adversarial audit clean.

Review guidance

  • High-stakes FO-contract surface: the Completion-and-Gates spine stays boot-resident while the reuse conditions moved to claude-fo-dispatch.md; verified that no reachable instruction was dropped.

j9

clkao and others added 17 commits June 13, 2026 16:54
j9 = backbone (3 phases): contract split (enabler + contract-audit), lazy-TeamCreate (~89k lever), shallow-boot-then-greet. Phase-1 + Phase-2/3 ideation spikes both VIABLE. T3 prose-audit files along.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r.md ref + add ethos operating-principles

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n index (drop staging spike docs)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o the sprint index

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ked on Phase-1)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ch-sprint-execution.md (README→index)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add journeymetrics.ParseClaudeTurns (per-assistant-turn usage, deduped by
message id, with tool_use names) so a caller can measure a single turn's
context window — the field ParseClaudeJSONL's whole-run sum cannot report.

Add the ensigncycle AC-6 oracle assertShallowBootMeasured: identify the greet
turn (last non-dispatch turn) and assert the greet-turn context is below the
~60k ceiling and no pre-greet turn shows the ~89k team-mode prefix re-cache
spike. De-risked offline against committed real-shape stream fixtures (a
greet-and-stop positive, an eager-team negative) before the live drive relies
on it, with independent signal isolation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
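A minimal sketch of the greet-turn oracle described above, assuming a simplified per-turn record. The `Turn` type, the `isDispatch` rule (a turn that calls `Agent`), and the 80k spike cutoff are assumptions; only the "last non-dispatch turn" rule and the ~60k ceiling come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// Turn is a hypothetical per-assistant-turn record; the field names are
// assumptions, not the real journeymetrics types.
type Turn struct {
	ContextTokens       int      // input + cache-read + cache-creation tokens
	CacheCreationTokens int      // tokens written to the prompt cache this turn
	Tools               []string // tool_use names seen on this turn
}

const (
	greetCeiling = 60_000 // the ~60k greet-turn ceiling
	spikeFloor   = 80_000 // assumed cutoff for the ~89k team-mode re-cache spike
)

// isDispatch assumes a dispatch turn is one that spawns a worker via "Agent".
func isDispatch(t Turn) bool {
	for _, name := range t.Tools {
		if name == "Agent" {
			return true
		}
	}
	return false
}

// checkShallowBootMeasured picks the greet turn (the last non-dispatch turn),
// bounds its context, and rejects any earlier turn showing the re-cache spike.
func checkShallowBootMeasured(turns []Turn) error {
	greet := -1
	for i := len(turns) - 1; i >= 0; i-- {
		if !isDispatch(turns[i]) {
			greet = i
			break
		}
	}
	if greet < 0 {
		return errors.New("no greet turn found")
	}
	if c := turns[greet].ContextTokens; c >= greetCeiling {
		return fmt.Errorf("greet turn %d: context %d >= %d", greet, c, greetCeiling)
	}
	for i := 0; i < greet; i++ {
		if cc := turns[i].CacheCreationTokens; cc >= spikeFloor {
			return fmt.Errorf("pre-greet turn %d: cache_creation %d looks like the team-mode spike", i, cc)
		}
	}
	return nil
}

func main() {
	greetAndStop := []Turn{{ContextTokens: 30_000, CacheCreationTokens: 25_000}, {ContextTokens: 43_265}}
	eagerTeam := []Turn{{ContextTokens: 95_000, CacheCreationTokens: 89_000, Tools: []string{"TeamCreate"}}, {ContextTokens: 50_000}}
	fmt.Println(checkShallowBootMeasured(greetAndStop) == nil) // true
	fmt.Println(checkShallowBootMeasured(eagerTeam) == nil)    // false
}
```

The two fixtures in `main` mirror the positive (greet-and-stop) and negative (eager-team) shapes the commit describes.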
Add TestBootResidentDeferredLoadPointsResolve in the contractlint quarantine:
os.ReadFile the two boot-resident FO contract bodies (slimmed shared core +
Claude runtime adapter), extract every deferred load-point they name (sibling
references/*.md read-paths, lazy spacedock:<skill> invocations →
skills/<name>/SKILL.md, concrete _mods/*.md → canonical mods/), and os.Stat
each. The filesystem is the independent oracle, so a body naming a moved or
deleted target fails — not a prose-grep. Empty-walk guard + a dangling-target
control prove it can fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
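The load-point check can be sketched as a regex pass plus `os.Stat`. The three patterns, the root-relative resolution of `references/*.md`, and the `loadPoints`/`checkBody` helpers are hypothetical; the real contractlint test may extract and resolve paths differently. What the sketch keeps is the shape: an empty extraction fails, and any named target missing on disk fails.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// Hypothetical patterns for the three load-point kinds.
var (
	refRe   = regexp.MustCompile(`references/[A-Za-z0-9_.-]+\.md`)
	skillRe = regexp.MustCompile(`spacedock:([a-z0-9-]+)`)
	modRe   = regexp.MustCompile(`_mods/([A-Za-z0-9_.-]+\.md)`)
)

// loadPoints maps every deferred load-point named in body to a root-relative path.
func loadPoints(body string) []string {
	seen := map[string]bool{}
	var out []string
	add := func(p string) {
		if !seen[p] {
			seen[p] = true
			out = append(out, p)
		}
	}
	for _, m := range refRe.FindAllString(body, -1) {
		add(m) // sibling reference read-path
	}
	for _, m := range skillRe.FindAllStringSubmatch(body, -1) {
		add(filepath.Join("skills", m[1], "SKILL.md")) // lazy skill invocation
	}
	for _, m := range modRe.FindAllStringSubmatch(body, -1) {
		add(filepath.Join("mods", m[1])) // concrete _mods/ name -> canonical mods/
	}
	return out
}

// checkBody uses the filesystem as the oracle: every named target must exist.
func checkBody(root, body string) error {
	pts := loadPoints(body)
	if len(pts) == 0 {
		return errors.New("empty walk: body names no load points") // guards a vacuous pass
	}
	var dangling []string
	for _, p := range pts {
		if _, err := os.Stat(filepath.Join(root, p)); err != nil {
			dangling = append(dangling, p)
		}
	}
	if len(dangling) > 0 {
		return fmt.Errorf("dangling load points: %v", dangling)
	}
	return nil
}

func main() {
	body := "Read references/claude-fo-dispatch.md at first dispatch; invoke spacedock:example-skill; see _mods/pr-merge.md."
	fmt.Println(loadPoints(body))
	fmt.Println(checkBody(".", body))
}
```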
The shallow-boot greet and the S7b before-greet merged-PR sweep rest on
status --boot --json's pr_state.entries[].state reflecting LIVE gh merge state,
not the stored pr field. Pin it: a stubbed gh reporting MERGED for a PR-bearing
non-terminal entity must surface as state=MERGED (live), and an absent gh must
surface as pr_state.status="gh not available" with no entries (the M6 degraded
branch the greet keys off to report merge state UNKNOWN). Offline, deterministic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
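For illustration, here is how a caller might read the two `pr_state` cases pinned above. Only `pr_state`, `status`, `entries`, `state`, `MERGED`, and the `"gh not available"` string come from the commit; the `entity` key and the `greetMergeState` helper are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bootStatus mirrors only the pr_state fields named in the commit message.
type bootStatus struct {
	PRState struct {
		Status  string `json:"status"`
		Entries []struct {
			Entity string `json:"entity"` // hypothetical identifier key
			State  string `json:"state"`  // live gh state, e.g. "MERGED"
		} `json:"entries"`
	} `json:"pr_state"`
}

// greetMergeState returns unknown=true on the gh-absent degraded branch;
// otherwise it returns the entities whose live state is MERGED.
func greetMergeState(raw []byte) (merged []string, unknown bool, err error) {
	var s bootStatus
	if e := json.Unmarshal(raw, &s); e != nil {
		return nil, false, e
	}
	if s.PRState.Status == "gh not available" {
		return nil, true, nil
	}
	for _, entry := range s.PRState.Entries {
		if entry.State == "MERGED" {
			merged = append(merged, entry.Entity)
		}
	}
	return merged, false, nil
}

func main() {
	live := []byte(`{"pr_state":{"entries":[{"entity":"a","state":"MERGED"},{"entity":"b","state":"OPEN"}]}}`)
	fmt.Println(greetMergeState(live))
	fmt.Println(greetMergeState([]byte(`{"pr_state":{"status":"gh not available"}}`)))
}
```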
P2 (Claude runtime adapter): the team-harness Skill invocation moves from "at
startup" to "before the first team-mode dispatch" — a greet-and-stop boot never
creates a team and never pays the ~89k team-mode prefix re-cache.
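The P2 change amounts to lazy initialization. A hypothetical sketch using `sync.Once`, where `createTeam` stands in for the team-harness Skill invocation; the `Dispatcher` type is illustrative, not from the codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// Dispatcher is a hypothetical model of the FO dispatch path.
type Dispatcher struct {
	createTeam func() error // stand-in for the team-harness Skill (TeamCreate)
	once       sync.Once
	teamErr    error
	dispatched []string
}

// ensureTeam creates the team at most once, on first use.
func (d *Dispatcher) ensureTeam() error {
	d.once.Do(func() { d.teamErr = d.createTeam() })
	return d.teamErr
}

// Dispatch creates the team only when a team-mode dispatch actually happens.
// A boot that greets and stops never calls Dispatch, so it never creates a team.
func (d *Dispatcher) Dispatch(task string, teamMode bool) error {
	if teamMode {
		if err := d.ensureTeam(); err != nil {
			return fmt.Errorf("team create before dispatching %q: %w", task, err)
		}
	}
	d.dispatched = append(d.dispatched, task)
	return nil
}

// newCountingDispatcher counts team creations, for demonstration.
func newCountingDispatcher(count *int) *Dispatcher {
	return &Dispatcher{createTeam: func() error { *count++; return nil }}
}

func main() {
	creates := 0
	d := newCountingDispatcher(&creates)
	fmt.Println("after greet:", creates) // 0: no team yet
	_ = d.Dispatch("implement", true)
	_ = d.Dispatch("validate", true)
	fmt.Println("after two team dispatches:", creates) // 1
}
```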

P3 (shared-core Startup): slim step 4 to the README frontmatter (defer the
body); read status --boot --json (no mod-file read, no team); split MODS into
the boot-resident MODS-REPORT vs the deferred RUN-STARTUP-HOOKS action and
PR_STATE into the boot-resident report vs the S7b action; add S7b, the
before-greet merged-PR sweep gated on a MERGED pr_state entry (reads pr-merge.md
only when there is a merge to advance, skips on gh-absent); end with greet-then-
stop, with all expensive deferrals (team, dispatch/merge modules, comm-officer)
past the greet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
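The P3 ordering can be modeled as a plan builder. This is a sketch only: the step labels are illustrative and the contract itself is prose, but it captures the two gates on S7b (gh available and at least one live MERGED entry) and the greet-then-stop ending, with nothing expensive before the greet.

```go
package main

import "fmt"

// bootPlan returns illustrative step labels for a shallow boot.
func bootPlan(ghAvailable bool, mergedEntries int) []string {
	plan := []string{
		"read README frontmatter (body deferred)",
		"status --boot --json",
		"MODS-REPORT (RUN-STARTUP-HOOKS deferred)",
		"PR_STATE report",
	}
	// S7b: only a live MERGED entry justifies reading pr-merge.md before the greet.
	if ghAvailable && mergedEntries > 0 {
		plan = append(plan, "read pr-merge.md", "S7b: advance and archive merged entities")
	}
	// Team, dispatch/merge modules, and comm-officer all come after this point.
	return append(plan, "greet", "stop")
}

// contains reports whether want appears in steps.
func contains(steps []string, want string) bool {
	for _, s := range steps {
		if s == want {
			return true
		}
	}
	return false
}

func main() {
	for _, step := range bootPlan(true, 1) {
		fmt.Println(step)
	}
}
```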
Wire a new host-neutral shallow-boot scenario per the README 4-step procedure:
the definition + meta-test pin + doc-lock seed block; a fixture (a gate-check at
a human gate + a PR-bearing entity whose stubbed gh reports MERGED, the canonical
pr-merge mod registered) and prompt; a host-neutral durable-state assertion
(assertShallowBoot: greet + gate presented, S7b merged-PR advanced+archived
before-greet, no team config on disk, no dispatch); an offline negative case
proving each sub-assertion goes red; Claude + Codex runners + a Pi coverage entry.

The Claude runner also grades AC-2 (no TeamCreate before greet, over the tool
stream) and AC-6 (greet-turn context below the ~60k ceiling, no pre-greet ~89k
cache_creation spike, over the captured token stream). Parity/definition/doc-lock
guards pass at zero spend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
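The host-neutral durable-state assertion and its offline negative case can be sketched as a function that reports every failed sub-assertion, plus one mutation per sub-assertion. `BootState` and its fields are hypothetical stand-ins for what the real test reads from the fixture on disk.

```go
package main

import "fmt"

// BootState is a hypothetical summary of the durable state a shallow-boot run leaves.
type BootState struct {
	GreetShown      bool
	GatePresented   bool
	MergedArchived  bool // the MERGED entity was advanced and archived before the greet
	TeamConfigFound bool // any team config on disk
	Dispatches      int
}

// shallowBootFailures reports all failures rather than stopping at the first,
// so each negative mutation can be shown to trip exactly one check.
func shallowBootFailures(s BootState) []string {
	var f []string
	if !s.GreetShown {
		f = append(f, "greet not shown")
	}
	if !s.GatePresented {
		f = append(f, "gate not presented")
	}
	if !s.MergedArchived {
		f = append(f, "merged PR not advanced and archived before greet")
	}
	if s.TeamConfigFound {
		f = append(f, "team config on disk")
	}
	if s.Dispatches > 0 {
		f = append(f, "dispatch happened")
	}
	return f
}

// mutations each break exactly one sub-assertion, for the negative case.
var mutations = map[string]func(*BootState){
	"no greet":        func(s *BootState) { s.GreetShown = false },
	"no gate":         func(s *BootState) { s.GatePresented = false },
	"merge not swept": func(s *BootState) { s.MergedArchived = false },
	"team on disk":    func(s *BootState) { s.TeamConfigFound = true },
	"dispatched":      func(s *BootState) { s.Dispatches = 1 },
}

func main() {
	good := BootState{GreetShown: true, GatePresented: true, MergedArchived: true}
	fmt.Println("good:", shallowBootFailures(good))
	for name, mutate := range mutations {
		s := good
		mutate(&s)
		fmt.Println(name, "->", shallowBootFailures(s))
	}
}
```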
Split the FO contract by when it is needed so a boot reads only the
boot-resident core:
- first-officer-shared-core.md: add the Operating-principles (ethos) section at
  the top, drop the agents/first-officer.md cross-reference, and replace the
  Dispatch / Completion-reuse-conditions / Merge-and-Cleanup / Worktree-Ownership
  / Standing-Teammates / Mod-Block prose with load-point pointers. The
  gate-presentation spine (checklist review, AC cross-check, not-a-stopping-point,
  gated-stage decisions) and the split-root state-sync / rebase-conflict halt stay
  boot-resident.
- claude-first-officer-runtime.md: keep Captain Interaction, Agent Back-off, and
  Entity-Body Inspection; name the dispatch reference (read at first dispatch) and
  the merge reference (read at terminalization).
- New references/claude-fo-dispatch.md (Team Creation, standing teammates, Worker
  Resolution, Dispatch Adapter, Degraded Mode, Context Budget, Event Loop) and
  references/claude-fo-merge.md (Merge-and-Cleanup incl. the TERMINAL_TEARDOWN_BOUNDED
  marker, Ship-Local, Worktree-removal, Mod-Block Enforcement).

AC-5 retarget, same commit so go test ./... never goes red:
- allowedHookFiles += claude-fo-merge.md + claude-fo-dispatch.md (the relocated
  ## Hook: prose).
- TestGradeMarkerMatchesContract contractFiles repointed to claude-fo-merge.md
  (now owns TERMINAL_TEARDOWN_BOUNDED).
- isClaudeAdapter recognizes claude-fo-*.md so the relocated ~/.claude coupling
  stays exempt from the HOME-rooted portability check.

AC-4 resolves green against the post-split layout (the boot bodies now name both
new references, which exist on disk). go test ./... exits 0 (1329 pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
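A hypothetical reconstruction of the `isClaudeAdapter` widening using `filepath.Match`; the real function's rules are not shown in this PR, so treat the exact base names as assumptions.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isClaudeAdapter treats the Claude runtime adapter and any claude-fo-*.md
// reference as Claude-specific, so a portability check could exempt them.
func isClaudeAdapter(path string) bool {
	base := filepath.Base(path)
	if base == "claude-first-officer-runtime.md" {
		return true
	}
	// filepath.Match only errors on a malformed pattern; this one is constant.
	ok, _ := filepath.Match("claude-fo-*.md", base)
	return ok
}

func main() {
	for _, p := range []string{
		"references/claude-fo-merge.md",
		"references/claude-fo-dispatch.md",
		"references/first-officer-shared-core.md",
	} {
		fmt.Println(p, isClaudeAdapter(p))
	}
}
```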
…wer-reuse finding

Pin the END-STATE shape the live single-entity (-p) Claude rejection-flow run
produces — two bare validation Agent spawns — without running the model or
touching the validator's live run (backlog seed e3z, bare-mode-coverage-baseline).
The two bare spawns red assertClaudeReviewerReuse on the >1-validation-spawn #141
keepalive violation (the live failure's shape); a team-mode control (one reviewer
reused by agentId) passes, proving the red is caused by the extra bare spawn, not
an unsatisfiable assertion.

Recon for the captain's fix-direction call — no fix applied.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… behavior (AC-3)

Option (b) per the captain: the live rejection-flow runs `spacedock claude -- -p`
naming one entity, so it is single-entity → bare. The old assertClaudeReviewerReuse
encoded a TEAM-mode #141 keepalive a `-p` run can never satisfy (the bare cycle-1
reviewer hard-fails reuse-condition-1). The contract already makes the bare flow
deterministic: "Feedback Rejection Flow (bare mode) … sequential: dispatch fix
agent … then dispatch reviewer" (claude-fo-dispatch.md, a rule that predates the
P1 split). So the contract-correct end-state is two distinct fresh validation
spawns with the fix agent and reviewer as separate dispatches.

- New assertClaudeSingleEntityRejectionFlow: >=2 distinct validation spawns AND no
  impl-as-validator (a SendMessage to an implementation worker to re-review). This
  catches BOTH observed non-deterministic live shapes — the 2-fresh-spawns run
  (PASS) and the impl-reused-through-validation run (FAIL).
- Claude runner points at it instead of the team-mode assertClaudeReviewerReuse.
- Shared prompt: drop "REUSE the kept-alive validation reviewer" (which drove the
  impl-as-validator hack) for a contract-faithful "follow your contract's feedback
  flow; fix agent and reviewer are separate; no self-review" — host-neutral, so
  Codex's contract-valid thread reuse (no team requirement) is unaffected.

No contract files touched — the determinism rests on the pre-existing bare-mode
rule; only the test encoded a wrong team-mode assumption. go test ./... 1334 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
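The two-part check can be sketched over a flattened tool stream. The `Event` shape, the stage labels, and the rule that a validation-stage `SendMessage` to an implementation worker counts as impl-as-validator are assumptions made for this sketch, not the real assertion's code.

```go
package main

import "fmt"

// Event is a hypothetical flattening of the tool stream. For an "Agent" spawn,
// AgentID names the new worker and Stage its stage; for a "SendMessage",
// Target is the recipient and Stage is the stage the message serves.
type Event struct {
	Tool    string
	AgentID string
	Target  string
	Stage   string
}

// checkSingleEntityRejectionFlow requires >= 2 distinct validation spawns and
// rejects any validation-stage message routed to an implementation worker.
func checkSingleEntityRejectionFlow(events []Event) error {
	stageOf := map[string]string{}
	validators := map[string]bool{}
	for _, e := range events {
		switch e.Tool {
		case "Agent":
			stageOf[e.AgentID] = e.Stage
			if e.Stage == "validation" {
				validators[e.AgentID] = true
			}
		case "SendMessage":
			if e.Stage == "validation" && stageOf[e.Target] == "implementation" {
				return fmt.Errorf("implementation worker %q reused as validator", e.Target)
			}
		}
	}
	if len(validators) < 2 {
		return fmt.Errorf("want >= 2 distinct validation spawns, got %d", len(validators))
	}
	return nil
}

func main() {
	twoFresh := []Event{
		{Tool: "Agent", AgentID: "impl-1", Stage: "implementation"},
		{Tool: "Agent", AgentID: "val-1", Stage: "validation"},
		{Tool: "Agent", AgentID: "fix-1", Stage: "implementation"},
		{Tool: "Agent", AgentID: "val-2", Stage: "validation"},
	}
	implReused := []Event{
		{Tool: "Agent", AgentID: "impl-1", Stage: "implementation"},
		{Tool: "Agent", AgentID: "val-1", Stage: "validation"},
		{Tool: "SendMessage", Target: "impl-1", Stage: "validation"},
	}
	fmt.Println(checkSingleEntityRejectionFlow(twoFresh))
	fmt.Println(checkSingleEntityRejectionFlow(implReused))
}
```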
Cycle-1 dropped "REUSE the kept-alive reviewer" entirely — correct for Claude
(bare -p, no team, can't reuse) but it broke Codex, which CAN reuse via a
persistent send_input thread and was green doing so. Make the prompt
host-CONDITIONAL (captain option a): route the cycle-2 re-review to the kept-alive
reviewer IF the host supports reusing it across the feedback cycle, otherwise
dispatch fresh. Claude → fresh (satisfies assertClaudeSingleEntityRejectionFlow);
Codex → reuses (satisfies the unchanged assertCodexReviewerReuse). Both hosts
contract-correct, neither assertion relaxed. The separate-workers / no-self-review
guard stays.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
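The host-conditional routing rule reads naturally as a small decision function. This models the prompt's logic rather than reproducing repository code; `Host`, `routeReReview`, and `checkSeparateWorkers` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Host is a hypothetical capability flag set; only reviewer reuse matters here.
type Host struct {
	Name             string
	CanReuseReviewer bool // e.g. a persistent thread the FO can send input to again
}

// routeReReview reuses the kept-alive reviewer only when the host supports it;
// otherwise it dispatches a fresh reviewer.
func routeReReview(h Host, keptAlive string) string {
	if h.CanReuseReviewer && keptAlive != "" {
		return "reuse " + keptAlive
	}
	return "dispatch fresh reviewer"
}

// checkSeparateWorkers encodes the guard that holds on both hosts.
func checkSeparateWorkers(fixer, reviewer string) error {
	if fixer == reviewer {
		return errors.New("self-review: fix agent and reviewer must be separate workers")
	}
	return nil
}

func main() {
	claude := Host{Name: "claude (bare -p)", CanReuseReviewer: false}
	codex := Host{Name: "codex", CanReuseReviewer: true}
	fmt.Println(claude.Name, "->", routeReReview(claude, "reviewer-1"))
	fmt.Println(codex.Name, "->", routeReReview(codex, "reviewer-1"))
	fmt.Println(checkSeparateWorkers("fixer-1", "reviewer-1"))
}
```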
…mCreate

ParseClaudeTurns deduped by message id taking only the FIRST delta (continue-on-
seen-id). But real runner streams are multi-delta: delta[0] is thinking/text and
the tool_use block lands on a LATER delta (verified against the committed real
captures — sonnet_teamdelete_hang has TeamCreate on delta[1], TeamDelete on
delta[2]). So assertNoTeamCreateBeforeGreet read tools from the wrong row and
could not see a TeamCreate — the lazy-TeamCreate proof was hollow, hidden by the
hand-written single-delta fixtures.

Fix: merge every delta's tool_use names into the turn (dedup by the tool_use
block's unique id so a repeated delta doesn't double-count); usage is identical
across deltas, so the first-delta usage is kept. Regenerate the AC-2/AC-6 fixtures
as multi-delta (thinking delta + tool_use/text delta per message), trim a real
multi-delta capture (claude_multidelta_team.stream.jsonl) for the journeymetrics
test, and add a positive control: a pre-greet TeamCreate on a later delta now
makes assertNoTeamCreateBeforeGreet RED (it false-passed before).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pture

Use the committed real-runner stream sonnet_teamdelete_hang.stream.jsonl (the
validator's verified ready oracle: 20/27 message ids multi-delta; its TeamCreate
and TeamDelete each on a non-first delta) as the parser regression — the FIXED
ParseClaudeTurns surfaces TeamCreate=true where the pre-fix first-delta-only parse
reported false across all 27 turns. Driving the FULL committed fixture pins the fix
to the exact stream the forensics verified the defect on; drop the redundant
trimmed copy + its journeymetrics test (superseded by the full-fixture regression).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
clkao added a commit that referenced this pull request Jun 14, 2026
clkao deployed to CI-E2E June 14, 2026 01:57 with GitHub Actions (Active)
clkao added a commit that referenced this pull request Jun 14, 2026
…us wrong-root harness leak)

PR #365 CI: codex rejection-flow = known czza collab:wait flake (re-run);
opus TestLiveEnsignCycle = wrong-root wander from a GITHUB_WORKSPACE env leak
in the live-test harness. Captain: block + test-faithful env-scrub (test-only).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…detector (PR #365)

The PR #365 opus runtime-live-e2e failed TestLiveEnsignCycle at TeamCreate
because isolatedClaudeEnv's CI path passed the whole os.Environ() through
to the child FO — including GITHUB_WORKSPACE (= the real spacedock checkout).
That lured the FO to cd into the real repo and boot its docs/dev workflow
instead of the test's tmpdir fixture; it found nothing dispatchable and
(correctly, per the lazy contract) greeted-and-stopped, surfacing only as a
confusing pre-TeamCreate timeout. A test-harness env leak, not a contract
defect — real `spacedock claude` use has no such CI var.

- cleanEnviron now drops the GITHUB_*/RUNNER_* family (isCIRepoNamingVar) so
  both Claude live lanes (and Codex, which shares cleanEnviron) reproduce a
  production-clean child env. ANTHROPIC_API_KEY, CLAUDE_CONFIG_DIR (resolved
  before the child env is built), and PATH survive — verified by the existing
  config-dir/credential tests staying green.
- detectWrongRootBoot: a pure, model-agnostic detector that names the expected
  fixture root vs the wandered-to path, keyed on cd-off-fixture /
  --workflow-dir-outside / workflow-README-read-outside (a contract-skill Read
  from --plugin-dir is NOT flagged). Wired into TestLiveEnsignCycle and the
  shared Claude runner so a future leak fails loud and early.

Test-only; zero skills/** touched (captain: test-faithful env fix, no
FO-contract change). Offline gate `go test ./...` exit 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
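The env scrub can be sketched as a prefix filter. The `isCIRepoNamingVar` signature (taking a `KEY=VALUE` entry) and this `cleanEnviron` body are assumptions; the real function may filter more than the CI family.

```go
package main

import (
	"fmt"
	"strings"
)

// isCIRepoNamingVar reports whether a KEY=VALUE entry is in the GITHUB_*/RUNNER_* family.
func isCIRepoNamingVar(kv string) bool {
	name, _, _ := strings.Cut(kv, "=")
	return strings.HasPrefix(name, "GITHUB_") || strings.HasPrefix(name, "RUNNER_")
}

// cleanEnviron sketches only the CI filter: everything else (PATH,
// ANTHROPIC_API_KEY, CLAUDE_CONFIG_DIR, ...) passes through unchanged.
func cleanEnviron(env []string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		if !isCIRepoNamingVar(kv) {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	env := []string{
		"PATH=/usr/bin",
		"GITHUB_WORKSPACE=/path/to/real/checkout",
		"RUNNER_TEMP=/tmp/runner",
		"ANTHROPIC_API_KEY=redacted",
	}
	fmt.Println(cleanEnviron(env)) // [PATH=/usr/bin ANTHROPIC_API_KEY=redacted]
}
```

Filtering by name prefix keeps the child env production-clean without enumerating every CI variable individually.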