From 8a2bfff904fc1aa5059132ef17058bceaeef71d9 Mon Sep 17 00:00:00 2001 From: CL Kao Date: Sat, 13 Jun 2026 00:31:17 -0700 Subject: [PATCH] status --next-id: use-new hint; FO contract + filing scenario for atomic create AC-1: `status --next-id` plain-text emits a stderr hint pointing at `spacedock new` (atomic-create), leaving stdout the bare id; the --json path stays hint-free and unchanged. The hint is a native-only divergence stripped in the parity normalizer (like STATE_BACKEND); its presence is pinned by a dedicated test in internal/status. AC-2: teach `spacedock new [--folder] [--id-seed S --id-actor A]` as the blessed atomic-create path in first-officer-shared-core.md (Status Viewer, ID Styles, FO Write Scope) and claude-first-officer-runtime.md, keeping --next-id as candidate-preview only. Proven by a new `filing` shared scenario in internal/ensigncycle: empty-workflow fixture + a prompt asking the FO to file one seed, graded on the FO's tool-call stream (filed via `new`, did NOT use the --next-id + hand-write pair) for both Claude and Codex, with a Pi coverage entry so TestSharedScenarioRunnerCoverage stays green. Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/specs/scenario-testing-principles.md | 1 + .../ensigncycle/claude_live_runner_test.go | 22 +++ .../ensigncycle/codex_live_runner_test.go | 22 +++ .../ensigncycle/pi_shared_coverage_test.go | 4 + .../shared_filing_negative_test.go | 98 +++++++++++ internal/ensigncycle/shared_filing_test.go | 155 ++++++++++++++++++ internal/ensigncycle/shared_fixtures_test.go | 47 ++++++ .../ensigncycle/shared_scenarios_meta_test.go | 1 + internal/ensigncycle/shared_scenarios_test.go | 5 + internal/status/harness_test.go | 13 ++ internal/status/native_runner.go | 5 + internal/status/nextid_new_hint_test.go | 62 +++++++ internal/status/zz_independent_parity_test.go | 4 + .../claude-first-officer-runtime.md | 4 + .../references/first-officer-shared-core.md | 12 +- 15 files changed, 449 insertions(+), 6 deletions(-) create mode 100644 internal/ensigncycle/shared_filing_negative_test.go create mode 100644 internal/ensigncycle/shared_filing_test.go create mode 100644 internal/status/nextid_new_hint_test.go diff --git a/docs/specs/scenario-testing-principles.md b/docs/specs/scenario-testing-principles.md index a87c687a2..3def6c945 100644 --- a/docs/specs/scenario-testing-principles.md +++ b/docs/specs/scenario-testing-principles.md @@ -59,6 +59,7 @@ The first foundation is the host-neutral runtime scenarios already shipped and h - `rejection-flow` — the FO drives a two-cycle rejection trajectory: route the finding back through implementation, re-implement, and re-validate a second cycle reusing the kept-alive reviewer. - `feedback-3-cycle-escalation` — on the third consecutive REJECTED validation the FO escalates to the human instead of auto-bouncing a fourth time. - `merge-hook-guardrail` — the FO cannot bypass a registered merge hook by terminalizing without pr, mod-block, or force. +- `filing` — the FO files a new seed entity via the atomic `spacedock new ` path, not the drift-prone `--next-id` + hand-write pair. These IDs are the code-backed source of truth. They mirror the `sharedRuntimeScenarios()` table in `internal/ensigncycle`; the seed IDs declared above must equal that table. This block is machine-readable so a lock test can bind the doc to the code and red on drift in either direction — adding, dropping, or renaming a scenario on one side without the other. This is what makes the doc the human-readable face of a code-backed truth rather than prose bound to nothing. diff --git a/internal/ensigncycle/claude_live_runner_test.go b/internal/ensigncycle/claude_live_runner_test.go index ea0ec4124..892d8f61e 100644 --- a/internal/ensigncycle/claude_live_runner_test.go +++ b/internal/ensigncycle/claude_live_runner_test.go @@ -102,6 +102,7 @@ func claudeScenarioRunners() map[string]func(*testing.T, claudeLiveRunner, share "rejection-flow": runClaudeRejectionFlowScenario, "feedback-3-cycle-escalation": runClaudeFeedback3CycleEscalationScenario, "merge-hook-guardrail": runClaudeMergeHookGuardrailScenario, + "filing": runClaudeFilingScenario, } } @@ -203,6 +204,27 @@ func runClaudeMergeHookGuardrailScenario(t *testing.T, runner claudeLiveRunner, emitClaudeScenarioMetrics(t, scenario, result, runner.model) } +// runClaudeFilingScenario drives the real FO against an EMPTY workflow and asks it +// to file one seed entity. It grades the FO's recorded tool-call stream — the FO +// filed via `spacedock … new `, not the `--next-id` + `Write` pair — because +// the durable end-state file is indistinguishable between the two paths. The file +// must also actually land (the run produced a real seed), so the stream grade is +// proof of HOW, not just THAT, the entity was filed. +func runClaudeFilingScenario(t *testing.T, runner claudeLiveRunner, scenario sharedRuntimeScenario) { + t.Helper() + workflowRoot := t.TempDir() + entityPath := writeFilingWorkflow(t, workflowRoot) + + result := runner.run(t, scenario, workflowRoot, filingPrompt()) + if _, err := os.Stat(entityPath); err != nil { + t.Fatalf("the FO did not land the seed entity at %s: %v\nFinal message:\n%s\nArtifacts: %s", entityPath, err, result.finalMessage, result.artifactDir) + } + if err := assertClaudeFilingViaNew(result.stream, filingSlug); err != nil { + t.Fatalf("%v\nFinal message:\n%s\nArtifacts: %s", err, result.finalMessage, result.artifactDir) + } + emitClaudeScenarioMetrics(t, scenario, result, runner.model) +} + // run launches the real `spacedock claude` front door for one shared scenario and // returns the (finalMessage, full stream) the shared assertions consume. The // launch shape is the spike WINNER: --plugin-dir + --skip-contract-check are the diff --git a/internal/ensigncycle/codex_live_runner_test.go b/internal/ensigncycle/codex_live_runner_test.go index ea40eb814..5fd4413e0 100644 --- a/internal/ensigncycle/codex_live_runner_test.go +++ b/internal/ensigncycle/codex_live_runner_test.go @@ -75,6 +75,7 @@ func codexScenarioRunners() map[string]func(*testing.T, codexLiveRunner, sharedR "rejection-flow": runCodexRejectionFlowScenario, "feedback-3-cycle-escalation": runCodexFeedback3CycleEscalationScenario, "merge-hook-guardrail": runCodexMergeHookGuardrailScenario, + "filing": runCodexFilingScenario, } } @@ -214,6 +215,27 @@ func runCodexMergeHookGuardrailScenario(t *testing.T, runner codexLiveRunner, sc emitCodexScenarioMetrics(t, scenario, result) } +// runCodexFilingScenario drives the real FO against an EMPTY workflow and asks it +// to file one seed entity. Like the Claude runner it grades the FO's recorded +// command stream — the FO filed via `spacedock … new `, not a `--next-id` +// preview-then-write — because the durable end-state file is indistinguishable +// between the two paths. The file must also actually land, so the stream grade is +// proof of HOW, not just THAT, the entity was filed. +func runCodexFilingScenario(t *testing.T, runner codexLiveRunner, scenario sharedRuntimeScenario) { + t.Helper() + workflowRoot := t.TempDir() + entityPath := writeFilingWorkflow(t, workflowRoot) + + result := runner.run(t, scenario, workflowRoot, filingPrompt()) + if _, err := os.Stat(entityPath); err != nil { + t.Fatalf("the FO did not land the seed entity at %s: %v\nFinal message:\n%s\nArtifacts: %s", entityPath, err, result.finalMessage, result.artifactDir) + } + if err := assertCodexFilingViaNew(result.jsonl, filingSlug); err != nil { + t.Fatalf("%v\nFinal message:\n%s\nArtifacts: %s", err, result.finalMessage, result.artifactDir) + } + emitCodexScenarioMetrics(t, scenario, result) +} + // run launches `codex exec --json` for one shared scenario. Liveness is the SAME // streamWatcher the Claude runner and the live cycle use — one mechanism, no second // impl. drainToExit runs the process to exit accumulating the full --json diff --git a/internal/ensigncycle/pi_shared_coverage_test.go b/internal/ensigncycle/pi_shared_coverage_test.go index f36f22d36..e26881698 100644 --- a/internal/ensigncycle/pi_shared_coverage_test.go +++ b/internal/ensigncycle/pi_shared_coverage_test.go @@ -27,6 +27,10 @@ func piSharedScenarioCoverageMap() map[string]piSharedScenarioCoverage { mode: "gap", reason: "Pi currently has durable live coverage for subagent dispatch/front-door setup, but not a live-safe shared first-officer merge-hook runner.", }, + "filing": { + mode: "gap", + reason: "Pi currently has durable live coverage for subagent dispatch/front-door setup, but not a live-safe shared first-officer filing runner.", + }, } } diff --git a/internal/ensigncycle/shared_filing_negative_test.go b/internal/ensigncycle/shared_filing_negative_test.go new file mode 100644 index 000000000..a70231025 --- /dev/null +++ b/internal/ensigncycle/shared_filing_negative_test.go @@ -0,0 +1,98 @@ +package ensigncycle + +import "testing" + +// Offline positive + negative cases for the `filing` scenario assertions. They +// build synthetic host streams — a stream that filed via `new` (passes) and the +// SPECIFIC manual-flow streams the assertion guards against (`--next-id` + a +// hand-write, must go red) — so a tautological assertion that only checked "a new +// command appeared" would stay green on the manual flow and these cases fail it. +// Offline (default tag): the assertions are pure functions over the transcript. + +// claudeToolUse builds a stream-json assistant line carrying one tool_use block. +func claudeToolUse(name, inputJSON string) string { + return `{"type":"assistant","message":{"content":[{"type":"tool_use","name":"` + name + `","input":` + inputJSON + `}]}}` +} + +// codexCommand builds a `codex exec --json` command_execution item line. +func codexCommand(command string) string { + return `{"type":"item.completed","item":{"type":"command_execution","command":"` + command + `"}}` +} + +func TestAssertClaudeFilingViaNew(t *testing.T) { + slug := filingSlug + + // Positive: the FO filed via `spacedock new ` piping a body on stdin. + filed := claudeToolUse("Bash", `{"command":"spacedock new `+slug+` --workflow-dir . <<'EOF'\n# Wire The Thing\nbody\nEOF"}`) + if err := assertClaudeFilingViaNew(filed, slug); err != nil { + t.Fatalf("expected a `new`-filed stream to pass: %v", err) + } + + // Positive: the `--new` flag alias also counts. + filedAlias := claudeToolUse("Bash", `{"command":"spacedock status --new `+slug+` --workflow-dir ."}`) + if err := assertClaudeFilingViaNew(filedAlias, slug); err != nil { + t.Fatalf("expected the `--new` alias to count as atomic filing: %v", err) + } + + // Negative: no atomic filing at all — the FO only previewed the id and never + // committed to a create path. Must fail on the missing-`new` half. + previewOnly := claudeToolUse("Bash", `{"command":"spacedock status --next-id --workflow-dir ."}`) + if err := assertClaudeFilingViaNew(previewOnly, slug); err == nil { + t.Fatal("expected a stream with no `new` command to fail") + } + + // Negative: the manual pair — `--next-id` preview THEN a `Write` of the entity + // file. This is the drift-prone flow `new` replaces; it must fail even though + // the durable file would look identical. + manualPair := claudeToolUse("Bash", `{"command":"spacedock status --next-id --workflow-dir ."}`) + "\n" + + claudeToolUse("Write", `{"file_path":"001-`+slug+`.md","content":"---\nid: 001\n---\n"}`) + if err := assertClaudeFilingViaNew(manualPair, slug); err == nil { + t.Fatal("expected the manual `--next-id` + `Write` pair to fail even with no `new` command") + } + + // Negative: BOTH `new` AND the manual pair appear — a run that filed atomically + // but ALSO hand-wrote must still fail on the pair check, so the positive half + // cannot mask a manual write. + newPlusManual := filed + "\n" + manualPair + if err := assertClaudeFilingViaNew(newPlusManual, slug); err == nil { + t.Fatal("expected `new` plus the manual `--next-id` + `Write` pair to fail on the pair check") + } + + // A `--next-id` alone alongside `new` (no entity Write) is fine — previewing the + // candidate is not the manual flow without the hand-write that pairs with it. + newWithPreview := filed + "\n" + claudeToolUse("Bash", `{"command":"spacedock status --next-id --workflow-dir ."}`) + if err := assertClaudeFilingViaNew(newWithPreview, slug); err != nil { + t.Fatalf("expected `new` plus a bare `--next-id` preview (no entity Write) to pass: %v", err) + } +} + +func TestAssertCodexFilingViaNew(t *testing.T) { + slug := filingSlug + + // Positive: the FO filed via a `spacedock new ` command_execution. + filed := codexCommand("spacedock new " + slug + " --workflow-dir .") + if err := assertCodexFilingViaNew(filed, slug); err != nil { + t.Fatalf("expected a `new`-filed Codex stream to pass: %v", err) + } + + // Negative: no atomic filing — must fail on the missing-`new` half. + none := codexCommand("spacedock status --workflow-dir .") + if err := assertCodexFilingViaNew(none, slug); err == nil { + t.Fatal("expected a Codex stream with no `new` command to fail") + } + + // Negative: the manual flow's id source — a `--next-id` command — appears. On + // Codex (no Write tool) the `--next-id` command itself is the discriminator; + // `new` needs none. Must fail even if `new` was also run. + newPlusNextID := filed + "\n" + codexCommand("spacedock status --next-id --workflow-dir .") + if err := assertCodexFilingViaNew(newPlusNextID, slug); err == nil { + t.Fatal("expected a `--next-id` filing command on Codex to fail even alongside `new`") + } + + // Negative: only the manual `--next-id` preview, no `new` — fails on both halves + // (caught by the missing-`new` check first). + nextIDOnly := codexCommand("spacedock status --next-id --workflow-dir .") + if err := assertCodexFilingViaNew(nextIDOnly, slug); err == nil { + t.Fatal("expected a `--next-id`-only Codex stream to fail") + } +} diff --git a/internal/ensigncycle/shared_filing_test.go b/internal/ensigncycle/shared_filing_test.go new file mode 100644 index 000000000..ada5e7cf1 --- /dev/null +++ b/internal/ensigncycle/shared_filing_test.go @@ -0,0 +1,155 @@ +package ensigncycle + +import ( + "encoding/json" + "fmt" + "regexp" + "strings" +) + +// The host-specific filing assertions for the `filing` scenario. They grade the +// FO's recorded tool-call STREAM — not the end-state file, which looks identical +// whether filed via `spacedock new` or hand-assembled, and not a grep of the +// contract prose. The producer signal is: the FO ran a `spacedock … new ` +// invocation (the atomic-create path) and did NOT fall back to the manual +// `--next-id` + file-write pair. Claude runs commands via the `Bash` tool and +// writes files via the `Write` tool; Codex runs everything (including file +// writes) as `command_execution` items — so the manual-pair shape differs per +// host and the assertions live behind host adapters, like reviewer-reuse. They +// sit under the DEFAULT build tags (stdlib JSON only) so the offline negative +// tests exercise them without spending a model. + +// newInvocation matches a spacedock atomic-create invocation in a command string: +// either the `new` subcommand or the `--new` flag (its alias), in a `spacedock` +// or `${SPACEDOCK_BIN…}` launcher call. The slug is matched separately so the +// command must carry BOTH the create verb and the requested slug. +var newInvocation = regexp.MustCompile(`(?:spacedock|SPACEDOCK_BIN)[^\n]*?(?:\bnew\b|--new)`) + +// nextIDInvocation matches a `status --next-id` candidate-preview command — the +// first half of the manual filing pair the atomic path replaces. +var nextIDInvocation = regexp.MustCompile(`--next-id\b`) + +// commandFilesViaNew reports whether a command string is the atomic-create call +// for the requested slug: a `new`/`--new` invocation that names the slug. +func commandFilesViaNew(command, slug string) bool { + return newInvocation.MatchString(command) && strings.Contains(command, slug) +} + +// assertClaudeFilingViaNew scans the stream-json transcript for the FO filing the +// seed via `spacedock … new ` (a Bash tool call) and NOT via the manual +// `--next-id` + `Write` pair. It enforces both halves, because either alone +// false-passes the manual flow: +// +// 1. The FO ran a `spacedock … new ` Bash command (the atomic-create path). +// 2. The FO did NOT emit the manual pair: a `--next-id` Bash command AND a +// `Write` tool_use creating the entity `.md`. A `Write` of the entity file +// after a `--next-id` preview is exactly the drift-prone flow `new` replaces, +// so its presence FAILS even if `new` was also run. +func assertClaudeFilingViaNew(stream, slug string) error { + filedViaNew := false + sawNextID := false + wroteEntityFile := false + + for _, line := range strings.Split(stream, "\n") { + line = strings.TrimSpace(line) + if line == "" { + continue + } + var entry struct { + Message *struct { + Content []struct { + Type string `json:"type"` + Name string `json:"name"` + Input struct { + Command string `json:"command"` + FilePath string `json:"file_path"` + } `json:"input"` + } `json:"content"` + } `json:"message"` + } + if err := json.Unmarshal([]byte(line), &entry); err != nil || entry.Message == nil { + continue + } + for _, block := range entry.Message.Content { + if block.Type != "tool_use" { + continue + } + switch block.Name { + case "Bash": + if commandFilesViaNew(block.Input.Command, slug) { + filedViaNew = true + } + if nextIDInvocation.MatchString(block.Input.Command) { + sawNextID = true + } + case "Write": + if strings.Contains(block.Input.FilePath, slug) && strings.HasSuffix(block.Input.FilePath, ".md") { + wroteEntityFile = true + } + } + } + } + + if !filedViaNew { + return fmt.Errorf("the FO did not file the seed via a `spacedock … new %s` command — it never used the atomic-create path", slug) + } + if sawNextID && wroteEntityFile { + return fmt.Errorf("the FO emitted the manual `--next-id` + `Write %s.md` pair — it hand-assembled the entity instead of letting `new` write it atomically", slug) + } + return nil +} + +// codexCommandItem is one `codex exec --json` command_execution item: Codex runs +// every shell action — including writing a file via heredoc/apply_patch — as a +// command_execution, so both the `new` invocation and any manual file-write land +// here. +type codexCommandItem struct { + Type string `json:"type"` + Item struct { + Type string `json:"type"` + Command string `json:"command"` + } `json:"item"` +} + +// assertCodexFilingViaNew scans the `codex exec --json` transcript for the FO +// filing the seed via `spacedock … new ` and NOT via the manual flow. On +// Codex there is no `Write` tool — the manual pair would be a `--next-id` command +// followed by a shell file-write — so the discriminator is the `--next-id` +// candidate-preview command: the atomic path needs none. It enforces both halves: +// +// 1. The FO ran a `spacedock … new ` command_execution. +// 2. The FO did NOT run a `--next-id` filing command (the manual pair's id +// source). `new` mints the id itself, so a `--next-id` here means the FO +// reached for the manual flow. +func assertCodexFilingViaNew(jsonl, slug string) error { + filedViaNew := false + sawNextID := false + + for _, line := range strings.Split(jsonl, "\n") { + line = strings.TrimSpace(line) + if line == "" { + continue + } + var ev codexCommandItem + if err := json.Unmarshal([]byte(line), &ev); err != nil { + continue + } + if ev.Item.Type != "command_execution" { + continue + } + if commandFilesViaNew(ev.Item.Command, slug) { + filedViaNew = true + } + if nextIDInvocation.MatchString(ev.Item.Command) { + sawNextID = true + } + } + + if !filedViaNew { + return fmt.Errorf("the FO did not file the seed via a `spacedock … new %s` command — it never used the atomic-create path", slug) + } + if sawNextID { + return fmt.Errorf("the FO ran a `--next-id` filing command — it reached for the manual preview-then-write flow instead of the atomic `new` path") + } + return nil +} diff --git a/internal/ensigncycle/shared_fixtures_test.go b/internal/ensigncycle/shared_fixtures_test.go index 799776e0b..13634fcce 100644 --- a/internal/ensigncycle/shared_fixtures_test.go +++ b/internal/ensigncycle/shared_fixtures_test.go @@ -293,3 +293,50 @@ func mergeHookGuardPrompt() string { "Do not edit, archive, approve, force, set mod-block, or retry terminalization. Your final response must include the guard error mentioning merge hooks.", ) } + +// filingSlug is the slug the FO is asked to file. It is what the positive +// assertion looks for in the `spacedock … new ` command and what the entity +// file lands as on disk. +const filingSlug = "wire-the-thing" + +func writeFilingWorkflow(t *testing.T, root string) string { + t.Helper() + writeFile(t, filepath.Join(root, "README.md"), filingReadme()) + gitInit(t, root) + // The entity does NOT exist yet — the FO files the first seed during the run. + // `spacedock new ` writes the flat `.md` form (the minted id is + // stamped INTO the frontmatter, not into the filename). The runner stats this + // path AFTER the run to confirm the seed landed. + return filepath.Join(root, filingSlug+".md") +} + +func filingReadme() string { + return "---\n" + + "commissioned-by: spacedock@1\n" + + "entity-type: task\n" + + "id-style: sequential\n" + + "stages:\n" + + " defaults:\n" + + " worktree: false\n" + + " concurrency: 1\n" + + " states:\n" + + " - name: backlog\n" + + " initial: true\n" + + " - name: done\n" + + " terminal: true\n" + + "---\n" + + "# Filing Fixture\n\n" + + "This fixture starts EMPTY: there are no entities yet. The first officer is asked to file one seed task. The id-style is `sequential`, so the manual flow (`status --next-id` then hand-writing the file) is available — the scenario proves the FO instead uses the atomic-create path.\n\n" + + "### backlog\n\nSeed tasks land here.\n\n- **Outputs:** A filed seed entity.\n\n" + + "### done\n\nTerminal state.\n" +} + +func filingPrompt() string { + return fmt.Sprintf("%s\n\n%s\n%s\n%s\n%s", + "Use $spacedock:first-officer for this whole run.", + "Workflow directory: .", + "This workflow is empty. File one new seed task with the slug `"+filingSlug+"` and the title `Wire The Thing`, landing it in the initial backlog stage with a one-line description body.", + "File it using the blessed atomic-create path your contract teaches, not by hand-assembling frontmatter after a candidate-id preview.", + "Do not dispatch any workers and do not advance the entity past backlog. Your final response must confirm the seed task was filed.", + ) +} diff --git a/internal/ensigncycle/shared_scenarios_meta_test.go b/internal/ensigncycle/shared_scenarios_meta_test.go index b858fac0e..751a5d2a1 100644 --- a/internal/ensigncycle/shared_scenarios_meta_test.go +++ b/internal/ensigncycle/shared_scenarios_meta_test.go @@ -33,6 +33,7 @@ func TestSharedRuntimeScenarioDefinitions(t *testing.T) { "rejection-flow", "feedback-3-cycle-escalation", "merge-hook-guardrail", + "filing", } if !reflect.DeepEqual(got, want) { t.Fatalf("shared runtime scenarios = %v, want %v", got, want) diff --git a/internal/ensigncycle/shared_scenarios_test.go b/internal/ensigncycle/shared_scenarios_test.go index 06c1960dd..aba634dc0 100644 --- a/internal/ensigncycle/shared_scenarios_test.go +++ b/internal/ensigncycle/shared_scenarios_test.go @@ -39,5 +39,10 @@ func sharedRuntimeScenarios() []sharedRuntimeScenario { oldPythonTest: "tests/test_merge_hook_guardrail.py", intent: "FO cannot bypass a registered merge hook by terminalizing without pr, mod-block, or force.", }, + { + name: "filing", + oldPythonTest: "n/a (new behavior — `spacedock new` adopted post-Python port)", + intent: "FO files a new seed entity via the atomic `spacedock new ` path, not the drift-prone `--next-id` + hand-write pair.", + }, } } diff --git a/internal/status/harness_test.go b/internal/status/harness_test.go index 8553bba9d..00d6360f3 100644 --- a/internal/status/harness_test.go +++ b/internal/status/harness_test.go @@ -65,6 +65,19 @@ func stripSandbox(s string) string { return sandboxLineRe.ReplaceAllString(s, "") } +// nextIDHintLineRe matches the native-only `--next-id` use-`new` hint line. Like +// the STATE_BACKEND banner, this is an intentional native/oracle divergence (the +// retired Python oracle emits no hint) that the parity normalizers strip from +// both sides so the certified id-on-stdout parity still byte-matches. The hint's +// presence itself is pinned by TestNextIDPlainTextEmitsNewHintOnStderr. +var nextIDHintLineRe = regexp.MustCompile(`hint: ` + "`spacedock new" + `[^\n]*\n`) + +// stripNextIDHint removes the native-only --next-id use-`new` hint so a native +// run that carries it normalizes to the same stderr bytes as the oracle. +func stripNextIDHint(s string) string { + return nextIDHintLineRe.ReplaceAllString(s, "") +} + // bootNextIDLineRe matches ONLY the boot `NEXT_ID:` line's sd-b32 value — the // minted candidate, which hashes the realpath'd workflow dir and so varies per // checkout path. Anchored to the line prefix so it does NOT touch a stored diff --git a/internal/status/native_runner.go b/internal/status/native_runner.go index 846379990..7c3f1028f 100644 --- a/internal/status/native_runner.go +++ b/internal/status/native_runner.go @@ -221,6 +221,11 @@ func dispatch(probe claudeteam.TeamStateProbe, args []string, dir string, e env, emitJSON(stdout, singletonJSON("next-id", "id", id)) return 0 } + // A --next-id candidate is a preview, not a reservation; point the operator + // at `spacedock new`, which mints the id and atomically writes the stamped + // entity in one call. The hint goes to stderr so it never pollutes the id + // that callers parse from stdout. + fmt.Fprintln(stderr, "hint: `spacedock new ` files an entity atomically (mints the id and writes the stamped file in one step)") fmt.Fprintln(stdout, id) return 0 } diff --git a/internal/status/nextid_new_hint_test.go b/internal/status/nextid_new_hint_test.go new file mode 100644 index 000000000..098847718 --- /dev/null +++ b/internal/status/nextid_new_hint_test.go @@ -0,0 +1,62 @@ +// ABOUTME: AC-1 — `status --next-id` emits a use-`new` hint on stderr without +// ABOUTME: altering the stdout id, and the --next-id --json path stays hint-free. +package status + +import ( + "strings" + "testing" +) + +// TestNextIDPlainTextEmitsNewHintOnStderr pins that the plain-text --next-id +// path writes ONLY the computed id to stdout (unchanged from the parity bytes) +// while pointing the operator at the atomic `spacedock new` path on stderr. The +// expectation source is independent of native_runner.go: the id is what +// computeNextID returns (observed via stdout), not the hint string read back +// out of the runner, so the assertion cannot be satisfied by echoing prose. +func TestNextIDPlainTextEmitsNewHintOnStderr(t *testing.T) { + root := indSeqWorkflow(t) + env := indEnv(t) + + out, errOut, code := indRunNative(t, root, env, "", "--workflow-dir", root, "--next-id") + if code != 0 { + t.Fatalf("--next-id exit=%d stderr=%q", code, errOut) + } + + // stdout is exactly the id and nothing else (single trailing newline). The + // id is computeNextID's output; the seq fixture's next sequential id is 006. + if out != "006\n" { + t.Fatalf("--next-id stdout must be exactly the id, got %q", out) + } + + // stderr carries a hint that points at `spacedock new` as the atomic path. + if !strings.Contains(errOut, "spacedock new") { + t.Fatalf("--next-id stderr must hint at `spacedock new`, got %q", errOut) + } + + // The hint must not leak the id into stderr (the id belongs on stdout only). + if strings.Contains(errOut, strings.TrimSpace(out)) { + t.Fatalf("--next-id hint must not echo the id %q onto stderr: %q", strings.TrimSpace(out), errOut) + } +} + +// TestNextIDJSONStaysHintFree pins that the machine-readable --next-id --json +// path is unchanged: stdout is the single-key {"command":"next-id","id":...} +// object and stderr carries NO hint, so a JSON consumer sees no new noise. +func TestNextIDJSONStaysHintFree(t *testing.T) { + root := indSeqWorkflow(t) + env := indEnv(t) + + out, errOut, code := indRunNative(t, root, env, "", "--workflow-dir", root, "--next-id", "--json") + if code != 0 { + t.Fatalf("--next-id --json exit=%d stderr=%q", code, errOut) + } + if got := strings.TrimSpace(out); got != `{"command":"next-id","id":"006"}` { + t.Fatalf("--next-id --json stdout changed, got %q", got) + } + if strings.Contains(errOut, "spacedock new") { + t.Fatalf("--next-id --json must stay hint-free on stderr, got %q", errOut) + } + if errOut != "" { + t.Fatalf("--next-id --json stderr must be empty, got %q", errOut) + } +} diff --git a/internal/status/zz_independent_parity_test.go b/internal/status/zz_independent_parity_test.go index 33e0a92a3..63e0a81a5 100644 --- a/internal/status/zz_independent_parity_test.go +++ b/internal/status/zz_independent_parity_test.go @@ -69,6 +69,10 @@ func indNormalize(s, root string) string { // SANDBOX is likewise native-only and PATH-dependent — strip it from the // parity body; the state-from-inputs behavior is pinned by boot_sandbox_test.go. s = stripSandbox(s) + // Strip the native-only --next-id use-`new` hint (another documented + // native/oracle divergence — the oracle emits no hint); its presence is + // pinned by TestNextIDPlainTextEmitsNewHintOnStderr instead. + s = stripNextIDHint(s) if root != "" { if real, err := filepath.EvalSymlinks(root); err == nil && real != root { s = strings.ReplaceAll(s, real, "") diff --git a/skills/first-officer/references/claude-first-officer-runtime.md b/skills/first-officer/references/claude-first-officer-runtime.md index 4714b78a1..e2f2c5aef 100644 --- a/skills/first-officer/references/claude-first-officer-runtime.md +++ b/skills/first-officer/references/claude-first-officer-runtime.md @@ -222,3 +222,7 @@ For the dispatch-idle and idle-hallucination guardrails, see `## Awaiting Comple ## Entity-Body Inspection See `## Probe and Ideation Discipline` in the shared core for the Grep-over-Read rule. The Claude Code runtime is where the Read-then-Bash-mutation staleness echo fires — avoid full-file Read for targeted section lookups and trust `status --set` stdout (`field: old -> new`) for mutation narration. + +## Filing New Entities + +To file a seed task, do NOT use the Write tool to hand-assemble frontmatter after a `status --next-id` preview — that two-step flow can land a stale id when a `--next-id` candidate drifts between the preview and the Write. Use `spacedock new [--folder] [--id-seed S --id-actor A]` via Bash, piping a complete entity stub on stdin (frontmatter with `id` omitted or blank, followed by the brief description body): it mints the id, stamps it into the frontmatter, and atomically writes the stamped entity as flat `.md` in one call (see `## FO Write Scope` in the shared core for the full contract). `--next-id` stays a candidate-preview surface only. `new` writes but does not commit; for split-root state checkouts the FO still does the path-scoped commit + push after `new` (per the shared core's State Management rule). diff --git a/skills/first-officer/references/first-officer-shared-core.md b/skills/first-officer/references/first-officer-shared-core.md index 6656b6a27..511f898e4 100644 --- a/skills/first-officer/references/first-officer-shared-core.md +++ b/skills/first-officer/references/first-officer-shared-core.md @@ -38,7 +38,7 @@ ${SPACEDOCK_BIN:-spacedock} status --workflow-dir {workflow_dir} [--next-id|--ne - `--boot` — startup roll-up (mods, ID style, next-ID candidate, orphans, PR state, dispatchables). Incompatible with `--next`, `--next-id`, `--archived`, `--where`. - `--validate` — run before trusting manually edited workflow state. - `--resolve REF` — deterministic lookup by slug, exact stored ID, or sd-b32 address prefix; `--root` rejects unqualified cross-workflow ambiguity rather than guessing. -- `--next-id` — immediately before filing a new task for `sequential` and `sd-b32` (n/a for `slug`). For `sd-b32`, pass `--id-seed "{slug-or-title}"` and optionally `--id-actor "{actor-or-agent}"` so creation context enters the candidate. +- `--next-id` — preview the next-id candidate for `sequential` and `sd-b32` (n/a for `slug`). For `sd-b32`, pass `--id-seed "{slug-or-title}"` and optionally `--id-actor "{actor-or-agent}"` so creation context enters the candidate. To file a new entity, do NOT pair `--next-id` with a hand-written file — use `spacedock new` (see FO Write Scope), which mints the id and atomically writes the stamped entity in one call. `--next-id` is candidate-preview only. - `--next` / `--where "pr !="` — targeted event-loop queries. The `--set` flag updates entity frontmatter fields: @@ -69,11 +69,11 @@ Distinct from event-loop `status` calls (the `--next` / `--where` the FO runs af README frontmatter `id-style` defines how new entities are addressed: -- `sequential` — `id` is the numeric ID returned by `status --next-id`; counts active plus archived. -- `sd-b32` — `id` is the 24-char SD-B32 (Spacedock Base32, alphabet `0123456789abcdefghjkmnpqrstvwxyz`, SHA-derived) returned by `status --next-id --id-seed "{slug-or-title}"`. Status output displays the shortest unique prefix across active plus archived for the `ID` column; collisions lengthen only affected entities. Duplicate full stored ID is a validation failure. -- `slug` — identity derives from the entity slug. Omit or blank `id` on creation; do not call `status --next-id`. +- `sequential` — `id` is a numeric ID counting active plus archived. `spacedock new ` mints it; `status --next-id` previews the same candidate. +- `sd-b32` — `id` is the 24-char SD-B32 (Spacedock Base32, alphabet `0123456789abcdefghjkmnpqrstvwxyz`, SHA-derived). `spacedock new --id-seed "{slug-or-title}"` mints it; `status --next-id --id-seed "{slug-or-title}"` previews the candidate. Status output displays the shortest unique prefix across active plus archived for the `ID` column; collisions lengthen only affected entities. Duplicate full stored ID is a validation failure. +- `slug` — identity derives from the entity slug. `spacedock new ` files it with a blank `id`; `--next-id` is n/a. -SD-B32 `NEXT_ID` from `--boot` / `--next-id` is a candidate, not a reservation — call `--next-id --id-seed "{slug-or-title}"` immediately before writing the entity. Short sd-b32 references shown to operators are shortest unique prefixes with `MIN_PREFIX: 2`; use `status --resolve` before mutating if the reference came from a human or older transcript. +A `--next-id` candidate (SD-B32 `NEXT_ID` from `--boot` / `--next-id` included) is a preview, not a reservation — between the preview and the write, a peer's filing can shift it, so a hand-assembled file can land a stale id. `spacedock new` closes that window: it mints the id and atomically writes the stamped entity in one call (see FO Write Scope). Short sd-b32 references shown to operators are shortest unique prefixes with `MIN_PREFIX: 2`; use `status --resolve` before mutating if the reference came from a human or older transcript. ## Single-Entity Mode @@ -249,7 +249,7 @@ This matches the escalate-rather-than-guess discipline. A full lock model is out The FO may write these on main — nothing else: - **Entity frontmatter** — via `spacedock status --set` for all field updates -- **New entity files** — seed task creation (frontmatter + brief description body) +- **New entity files** — seed task creation via `spacedock new [--folder] [--id-seed S --id-actor A] < stub`, the blessed atomic-create path. Pipe a complete entity stub on stdin — frontmatter (title, status, and the rest, with `id` omitted or blank) followed by the brief description body — and `new` mints the id, stamps it into that frontmatter, and atomically writes the stamped entity in one call, so no `--next-id` candidate can drift between preview and write. The file lands as flat `.md` (or `/index.md` with `--folder`); the minted id goes in the frontmatter, not the filename. Pass `--id-seed`/`--id-actor` for sd-b32; `new` rejects them for id-style slug. Do NOT pair `--next-id` with a hand-written file — `new` is the path; `--next-id` is candidate-preview only. `new` writes the file but does not commit: for split-root state checkouts the FO still does the path-scoped commit + push after `new` (per State Management's concurrency-safe state-commit rule). - **`### Feedback Cycles` section** — in entity bodies, tracking rejection rounds. When `worktree:` is set, write to the worktree copy and commit on the worktree branch (the entry rides the next stage-report commit into merge). When `worktree:` is empty, write to main. Under stage-worktree stickiness, `worktree:` is empty only before the first worktree-creating dispatch. - **Archive moves** — relocating entity files to `{workflow_dir}/_archive/` - **State-transition commits** — dispatch, advance, merge boundary commits