Lazy-TeamCreate + shallow-boot-then-greet: reach interactive readiness fast #365
Open
clkao wants to merge 18 commits into
j9 = backbone (3 phases): contract split (enabler + contract-audit), lazy-TeamCreate (~89k lever), shallow-boot-then-greet. Phase-1 and Phase-2/3 ideation spikes: both VIABLE. The T3 prose-audit is filed alongside. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r.md ref + add ethos operating-principles Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n index (drop staging spike docs) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o the sprint index Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ked on Phase-1) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ch-sprint-execution.md (README→index) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add journeymetrics.ParseClaudeTurns (per-assistant-turn usage, deduped by message id, with tool_use names) so a caller can measure a single turn's context window — the field ParseClaudeJSONL's whole-run sum cannot report. Add the ensigncycle AC-6 oracle assertShallowBootMeasured: identify the greet turn (last non-dispatch turn) and assert the greet-turn context is below the ~60k ceiling and no pre-greet turn shows the ~89k team-mode prefix re-cache spike. De-risked offline against committed real-shape stream fixtures (a greet-and-stop positive, an eager-team negative) before the live drive relies on it, with independent signal isolation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
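Here is a minimal Go sketch of the kind of check assertShallowBootMeasured performs. It assumes a simplified per-turn record: the `Turn` type, the `Agent`/`TeamCreate` dispatch classifier, and the 80k spike floor are illustrative assumptions, not the real journeymetrics or ensigncycle code. The sketch treats the last non-dispatch turn as the greet turn and compares that turn's context against the ~60k ceiling. It then rejects any earlier turn whose cache write is the size of a team prefix.

```go
package main

import "fmt"

// Turn is a hypothetical per-assistant-turn record; the field names are
// illustrative and are not the real journeymetrics types.
type Turn struct {
	InputTokens         int
	CacheReadTokens     int
	CacheCreationTokens int
	Tools               []string
}

// Context approximates the turn's context window as fresh input plus cache
// reads plus cache writes (an assumption about how the usage fields add up).
func (t Turn) Context() int {
	return t.InputTokens + t.CacheReadTokens + t.CacheCreationTokens
}

// isDispatchTurn is an assumed classifier: a turn that spawns a worker or
// creates a team counts as a dispatch turn.
func isDispatchTurn(t Turn) bool {
	for _, name := range t.Tools {
		if name == "Agent" || name == "TeamCreate" {
			return true
		}
	}
	return false
}

const (
	greetCeiling = 60_000 // the ~60k greet-turn ceiling
	spikeFloor   = 80_000 // assumed floor for detecting the ~89k re-cache
)

// checkShallowBootMeasured finds the greet turn (the last non-dispatch turn),
// checks it against the ceiling, then scans earlier turns for a spike.
func checkShallowBootMeasured(turns []Turn) error {
	greet := -1
	for i := len(turns) - 1; i >= 0; i-- {
		if !isDispatchTurn(turns[i]) {
			greet = i
			break
		}
	}
	if greet < 0 {
		return fmt.Errorf("no greet turn found")
	}
	if c := turns[greet].Context(); c >= greetCeiling {
		return fmt.Errorf("greet turn %d context %d >= %d", greet, c, greetCeiling)
	}
	for i := 0; i < greet; i++ {
		if cc := turns[i].CacheCreationTokens; cc >= spikeFloor {
			return fmt.Errorf("pre-greet turn %d wrote %d cache tokens", i, cc)
		}
	}
	return nil
}
```

A greet-and-stop shape (small cache writes, greet last) returns nil. An eager-team shape with a large pre-greet cache write returns an error. This is the property the offline fixtures are meant to de-risk.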
Add TestBootResidentDeferredLoadPointsResolve in the contractlint quarantine: os.ReadFile the two boot-resident FO contract bodies (slimmed shared core + Claude runtime adapter), extract every deferred load-point they name (sibling references/*.md read-paths, lazy spacedock:<skill> invocations → skills/<name>/SKILL.md, concrete _mods/*.md → canonical mods/), and os.Stat each. The filesystem is the independent oracle, so a body naming a moved or deleted target fails — not a prose-grep. Empty-walk guard + a dangling-target control prove it can fail. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The shallow-boot greet and the S7b before-greet merged-PR sweep rest on status --boot --json's pr_state.entries[].state reflecting LIVE gh merge state, not the stored pr field. Pin it: a stubbed gh reporting MERGED for a PR-bearing non-terminal entity must surface as state=MERGED (live), and an absent gh must surface as pr_state.status="gh not available" with no entries (the M6 degraded branch the greet keys off to report merge state UNKNOWN). Offline, deterministic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
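Below is a sketch of how a consumer might read the pinned fields. It assumes only the `pr_state.status` and `pr_state.entries[].state` keys named above. The `entity` key, the helper name, and the `known` flag are hypothetical. On the degraded branch, the helper reports the merge state as unknown rather than guessing from stored data.

```go
package main

import "encoding/json"

// bootStatus mirrors only the pr_state fields named in the commit; any other
// fields of `status --boot --json` are ignored here.
type bootStatus struct {
	PRState struct {
		Status  string `json:"status"`
		Entries []struct {
			Entity string `json:"entity"` // hypothetical key
			State  string `json:"state"`
		} `json:"entries"`
	} `json:"pr_state"`
}

// greetMergeReport returns known=false on the gh-absent degraded branch;
// otherwise it returns the entities whose live state is MERGED.
func greetMergeReport(raw []byte) (known bool, merged []string, err error) {
	var s bootStatus
	if err = json.Unmarshal(raw, &s); err != nil {
		return false, nil, err
	}
	if s.PRState.Status == "gh not available" {
		return false, nil, nil
	}
	for _, e := range s.PRState.Entries {
		if e.State == "MERGED" {
			merged = append(merged, e.Entity)
		}
	}
	return true, merged, nil
}
```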
P2 (Claude runtime adapter): the team-harness Skill invocation moves from "at startup" to "before the first team-mode dispatch", so a greet-and-stop boot never creates a team and never pays the ~89k team-mode prefix re-cache. P3 (shared-core Startup): slim step 4 to the README frontmatter (defer the body); read status --boot --json (no mod-file read, no team); split MODS into the boot-resident MODS-REPORT vs the deferred RUN-STARTUP-HOOKS action, and PR_STATE into the boot-resident report vs the S7b action; add S7b, the before-greet merged-PR sweep gated on a MERGED pr_state entry (it reads pr-merge.md only when there is a merge to advance, and skips when gh is absent); end with greet-then-stop, with all expensive deferrals (team, dispatch/merge modules, comm-officer) placed after the greet. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
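The P2 move is contract prose, but its effect resembles lazy initialization. The following Go analogy assumes a hypothetical `createTeam` callback that stands in for TeamCreate. The team is built on the first dispatch and never on a boot that only greets.

```go
package main

import "sync"

// lazyTeam defers team creation to the first team-mode dispatch.
// createTeam is a hypothetical stand-in for the TeamCreate tool call.
type lazyTeam struct {
	once       sync.Once
	createTeam func() string
	name       string
}

// ensure creates the team at most once and returns its name.
func (l *lazyTeam) ensure() string {
	l.once.Do(func() { l.name = l.createTeam() })
	return l.name
}

// dispatch is the only path that triggers team creation; a greet-only boot
// never calls it, so it never pays the team cost.
func (l *lazyTeam) dispatch(task string) string {
	return l.ensure() + "/" + task
}
```

`sync.Once` is one way to express "at most once, on first use". The contract gets the same ordering from where it places the Skill invocation.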
Wire a new host-neutral shallow-boot scenario per the README 4-step procedure: the definition + meta-test pin + doc-lock seed block; a fixture (a gate-check at a human gate + a PR-bearing entity whose stubbed gh reports MERGED, the canonical pr-merge mod registered) and prompt; a host-neutral durable-state assertion (assertShallowBoot: greet + gate presented, S7b merged-PR advanced+archived before-greet, no team config on disk, no dispatch); an offline negative case proving each sub-assertion goes red; Claude + Codex runners + a Pi coverage entry. The Claude runner also grades AC-2 (no TeamCreate before greet, over the tool stream) and AC-6 (greet-turn context below the ~60k ceiling, no pre-greet ~89k cache_creation spike, over the captured token stream). Parity/definition/doc-lock guards pass at zero spend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
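The "each sub-assertion goes red" negative case follows a common pattern: start from a passing state and break one property per case. The sketch below is hypothetical. The `bootState` fields and the check names are illustrative and do not come from the real assertShallowBoot.

```go
package main

// bootState is a hypothetical snapshot of the durable state a host-neutral
// assertion could read from disk after the shallow-boot run.
type bootState struct {
	Greeted          bool // greet emitted
	GatePresented    bool // gate-check entity presented at its human gate
	MergedArchived   bool // S7b advanced+archived the MERGED entity before greet
	TeamConfigOnDisk bool // must stay false: no team config written
	Dispatched       bool // must stay false: no worker dispatched
}

// checkShallowBoot returns the names of every failing sub-assertion.
func checkShallowBoot(s bootState) []string {
	var fails []string
	if !s.Greeted {
		fails = append(fails, "greet")
	}
	if !s.GatePresented {
		fails = append(fails, "gate")
	}
	if !s.MergedArchived {
		fails = append(fails, "s7b")
	}
	if s.TeamConfigOnDisk {
		fails = append(fails, "no-team")
	}
	if s.Dispatched {
		fails = append(fails, "no-dispatch")
	}
	return fails
}

// negativeCases breaks exactly one sub-assertion per case, starting from a
// passing state, so each case should turn exactly that check red.
func negativeCases() map[string]bootState {
	good := bootState{Greeted: true, GatePresented: true, MergedArchived: true}
	mutate := map[string]func(*bootState){
		"greet":       func(s *bootState) { s.Greeted = false },
		"gate":        func(s *bootState) { s.GatePresented = false },
		"s7b":         func(s *bootState) { s.MergedArchived = false },
		"no-team":     func(s *bootState) { s.TeamConfigOnDisk = true },
		"no-dispatch": func(s *bootState) { s.Dispatched = true },
	}
	cases := make(map[string]bootState, len(mutate))
	for name, m := range mutate {
		s := good
		m(&s)
		cases[name] = s
	}
	return cases
}
```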
Split the FO contract by when it is needed, so a boot reads only the boot-resident core:
- first-officer-shared-core.md: add the Operating-principles (ethos) section at the top, drop the agents/first-officer.md cross-reference, and replace the Dispatch / Completion-reuse-conditions / Merge-and-Cleanup / Worktree-Ownership / Standing-Teammates / Mod-Block prose with load-point pointers. The gate-presentation spine (checklist review, AC cross-check, not-a-stopping-point, gated-stage decisions) and the split-root state-sync / rebase-conflict halt stay boot-resident.
- claude-first-officer-runtime.md: keep Captain Interaction, Agent Back-off, and Entity-Body Inspection; name the dispatch reference (read at first dispatch) and the merge reference (read at terminalization).
- New references/claude-fo-dispatch.md (Team Creation, standing teammates, Worker Resolution, Dispatch Adapter, Degraded Mode, Context Budget, Event Loop) and references/claude-fo-merge.md (Merge-and-Cleanup incl. the TERMINAL_TEARDOWN_BOUNDED marker, Ship-Local, Worktree-removal, Mod-Block Enforcement).
AC-5 retarget, in the same commit so go test ./... never goes red:
- allowedHookFiles += claude-fo-merge.md + claude-fo-dispatch.md (the relocated ## Hook: prose).
- TestGradeMarkerMatchesContract contractFiles repointed to claude-fo-merge.md (now owns TERMINAL_TEARDOWN_BOUNDED).
- isClaudeAdapter recognizes claude-fo-*.md so the relocated ~/.claude coupling stays exempt from the HOME-rooted portability check.
AC-4 resolves green against the post-split layout (the boot bodies now name both new references, which exist on disk). go test ./... exits 0 (1329 pass). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
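The isClaudeAdapter change amounts to a filename pattern. The version below is hypothetical and uses `filepath.Match`. The function name, the runtime-adapter special case, and matching on the base name are all assumptions.

```go
package main

import "path/filepath"

// isClaudeAdapterFile is a hypothetical re-creation of the exemption check:
// Claude runtime files may mention ~/.claude, so the HOME-rooted portability
// check skips them. The claude-fo-*.md pattern covers the split-out references.
func isClaudeAdapterFile(path string) bool {
	base := filepath.Base(path)
	if base == "claude-first-officer-runtime.md" {
		return true
	}
	matched, err := filepath.Match("claude-fo-*.md", base)
	return err == nil && matched
}
```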
…wer-reuse finding Pin the END-STATE shape the live single-entity (-p) Claude rejection-flow run produces — two bare validation Agent spawns — without running the model or touching the validator's live run (backlog seed e3z, bare-mode-coverage-baseline). The two bare spawns red assertClaudeReviewerReuse on the >1-validation-spawn #141 keepalive violation (the live failure's shape); a team-mode control (one reviewer reused by agentId) passes, proving the red is caused by the extra bare spawn, not an unsatisfiable assertion. Recon for the captain's fix-direction call — no fix applied. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… behavior (AC-3) Option (b) per the captain: the live rejection-flow runs `spacedock claude -- -p` naming one entity, so it is single-entity → bare. The old assertClaudeReviewerReuse encoded a TEAM-mode #141 keepalive that a `-p` run can never satisfy (the bare cycle-1 reviewer hard-fails reuse-condition-1). The contract already makes the bare flow deterministic: "Feedback Rejection Flow (bare mode) … sequential: dispatch fix agent … then dispatch reviewer" (claude-fo-dispatch.md, a rule that predates the P1 split). So the contract-correct end-state is two distinct fresh validation spawns, with the fix agent and reviewer as separate dispatches.
- New assertClaudeSingleEntityRejectionFlow: >=2 distinct validation spawns AND no impl-as-validator (a SendMessage to an implementation worker to re-review). This catches BOTH observed non-deterministic live shapes: the 2-fresh-spawns run (PASS) and the impl-reused-through-validation run (FAIL).
- The Claude runner points at it instead of the team-mode assertClaudeReviewerReuse.
- Shared prompt: drop "REUSE the kept-alive validation reviewer" (which drove the impl-as-validator hack) in favor of a contract-faithful "follow your contract's feedback flow; fix agent and reviewer are separate; no self-review". It is host-neutral, so Codex's contract-valid thread reuse (no team requirement) is unaffected.
No contract files touched: the determinism rests on the pre-existing bare-mode rule; only the test encoded a wrong team-mode assumption. go test ./... 1334 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
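Here is a hypothetical sketch of the two conditions that assertClaudeSingleEntityRejectionFlow combines. It runs over an assumed flattened tool-call list. The `toolCall` type and the stage strings are illustrative and do not reflect the real stream schema.

```go
package main

import "fmt"

// toolCall is a hypothetical flattened record of one Agent or SendMessage
// call. For Agent, Stage is the stage the spawned worker serves; for
// SendMessage, Stage is the stage the message asks the recipient to do.
type toolCall struct {
	Tool  string
	Stage string
	Agent string // spawned agent id (Agent) or recipient id (SendMessage)
}

// checkSingleEntityRejectionFlow requires >=2 distinct validation spawns and
// rejects any re-review request sent to an implementation worker.
func checkSingleEntityRejectionFlow(calls []toolCall) error {
	spawnedFor := map[string]string{} // agent id -> stage it was spawned for
	validators := map[string]bool{}   // distinct validation spawns
	for _, c := range calls {
		switch c.Tool {
		case "Agent":
			spawnedFor[c.Agent] = c.Stage
			if c.Stage == "validation" {
				validators[c.Agent] = true
			}
		case "SendMessage":
			if c.Stage == "validation" && spawnedFor[c.Agent] == "implementation" {
				return fmt.Errorf("implementation worker %s asked to re-review", c.Agent)
			}
		}
	}
	if len(validators) < 2 {
		return fmt.Errorf("want >=2 distinct validation spawns, got %d", len(validators))
	}
	return nil
}
```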
Cycle-1 dropped "REUSE the kept-alive reviewer" entirely — correct for Claude (bare -p, no team, can't reuse) but it broke Codex, which CAN reuse via a persistent send_input thread and was green doing so. Make the prompt host-CONDITIONAL (captain option a): route the cycle-2 re-review to the kept-alive reviewer IF the host supports reusing it across the feedback cycle, otherwise dispatch fresh. Claude → fresh (satisfies assertClaudeSingleEntityRejectionFlow); Codex → reuses (satisfies the unchanged assertCodexReviewerReuse). Both hosts contract-correct, neither assertion relaxed. The separate-workers / no-self-review guard stays. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mCreate ParseClaudeTurns deduped by message id taking only the FIRST delta (continue-on-seen-id). But real runner streams are multi-delta: delta[0] is thinking/text and the tool_use block lands on a LATER delta (verified against the committed real captures: sonnet_teamdelete_hang has TeamCreate on delta[1], TeamDelete on delta[2]). So assertNoTeamCreateBeforeGreet read tools from the wrong row and could not see a TeamCreate; the lazy-TeamCreate proof was hollow, hidden by the hand-written single-delta fixtures. Fix: merge every delta's tool_use names into the turn (dedup by the tool_use block's unique id so a repeated delta doesn't double-count); usage is identical across deltas, so the first-delta usage is kept. Regenerate the AC-2/AC-6 fixtures as multi-delta (thinking delta + tool_use/text delta per message), trim a real multi-delta capture (claude_multidelta_team.stream.jsonl) for the journeymetrics test, and add a positive control: a pre-greet TeamCreate on a later delta now makes assertNoTeamCreateBeforeGreet RED (it false-passed before). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pture Use the committed real-runner stream sonnet_teamdelete_hang.stream.jsonl (the validator's verified ready oracle: 20/27 message ids multi-delta; its TeamCreate and TeamDelete each on a non-first delta) as the parser regression — the FIXED ParseClaudeTurns surfaces TeamCreate=true where the pre-fix first-delta-only parse reported false across all 27 turns. Driving the FULL committed fixture pins the fix to the exact stream the forensics verified the defect on; drop the redundant trimmed copy + its journeymetrics test (superseded by the full-fixture regression). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
clkao added a commit that referenced this pull request Jun 14, 2026
clkao added a commit that referenced this pull request Jun 14, 2026
…us wrong-root harness leak) PR #365 CI: codex rejection-flow = known czza collab:wait flake (re-run); opus TestLiveEnsignCycle = wrong-root wander from a GITHUB_WORKSPACE env leak in the live-test harness. Captain: block + test-faithful env-scrub (test-only). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…detector (PR #365) The PR #365 opus runtime-live-e2e failed TestLiveEnsignCycle at TeamCreate because isolatedClaudeEnv's CI path passed the whole os.Environ() through to the child FO, including GITHUB_WORKSPACE (= the real spacedock checkout). That lured the FO to cd into the real repo and boot its docs/dev workflow instead of the test's tmpdir fixture; it found nothing dispatchable and (correctly, per the lazy contract) greeted-and-stopped, surfacing only as a confusing pre-TeamCreate timeout. This is a test-harness env leak, not a contract defect: real `spacedock claude` use has no such CI var.
- cleanEnviron now drops the GITHUB_*/RUNNER_* family (isCIRepoNamingVar) so both Claude live lanes (and Codex, which shares cleanEnviron) reproduce a production-clean child env. ANTHROPIC_API_KEY, CLAUDE_CONFIG_DIR (resolved before the child env is built), and PATH survive, as verified by the existing config-dir/credential tests staying green.
- detectWrongRootBoot: a pure, model-agnostic detector that names the expected fixture root vs the wandered-to path, keyed on cd-off-fixture / --workflow-dir-outside / workflow-README-read-outside (a contract-skill Read from --plugin-dir is NOT flagged). It is wired into TestLiveEnsignCycle and the shared Claude runner so a future leak fails loud and early.
Test-only; zero skills/** touched (captain: test-faithful env fix, no FO-contract change). Offline gate `go test ./...` exit 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
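Here is a sketch of the env scrub. It assumes cleanEnviron filters a `KEY=VALUE` slice such as `os.Environ()`. The signatures are guesses, and the real helpers may differ. Only the `GITHUB_` and `RUNNER_` prefixes are dropped, so `PATH`, `ANTHROPIC_API_KEY` and `CLAUDE_CONFIG_DIR` pass through.

```go
package main

import "strings"

// isCIRepoNamingVar reports whether a KEY=VALUE entry belongs to the
// GITHUB_*/RUNNER_* family that can point a child at the real checkout.
func isCIRepoNamingVar(kv string) bool {
	key, _, _ := strings.Cut(kv, "=")
	return strings.HasPrefix(key, "GITHUB_") || strings.HasPrefix(key, "RUNNER_")
}

// cleanEnviron returns a copy of env without CI repo-naming variables, so the
// child process sees a production-clean environment.
func cleanEnviron(env []string) []string {
	out := make([]string, 0, len(env))
	for _, kv := range env {
		if isCIRepoNamingVar(kv) {
			continue
		}
		out = append(out, kv)
	}
	return out
}
```

The filter keys on prefixes, not on GITHUB_WORKSPACE alone. Any other variable in the family is therefore also barred from naming the real repository to the child.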
Get the first officer to interactive readiness (its greet) in seconds at ~43k context, not minutes at 160k, with no eager team-create.
What changed
- … `status --boot --json` with a before-greet merged-PR sweep.
- … `shallow-boot` live scenario (Claude+Codex), the AC-4 reference-closure test, and the AC-6 measured-saving drive.
Evidence
- `go test ./...`: 1374/1374 passed.
- `shallow-boot` (Claude): greet-turn context 43,265 (<60k), zero TeamCreate, no 89k spike; AC-3 Claude+Codex rejection-flow live-passed; detached adversarial audit clean.
Review guidance
- … `claude-fo-dispatch.md`: verified no reachable instruction dropped.
j9