Add deterministic prompt seam-contract checks (Tier 1, #23) #37
Merged
CDD's slash-commands are agentic prompts whose steps hand artifacts to each other; a one-sided edit can silently strand a downstream prompt-driven step.

Add scripts/prompt-seam-check.sh (+ prompt-seam-whitelist.txt), extending the proven command-drift-check.sh pattern with four grep-only, no-LLM seam contracts:

- /cdd-* references resolve to an existing command file (catches rename fallout like #27/#31); known non-commands are whitelisted.
- the gh_issue_NN branch token is produced (cdd-next-step) and consumed (cdd-pre-pr -> Closes #NN) in agreement.
- backticked file paths in the command files + CLAUDE.md + README resolve.
- each cdd-*.md keeps its load-bearing headings.

CDD-repo-only (not shipped in the template). Wired into CI (template-smoke.yml), /cdd-pre-pr (cdd-only section + checklist), the engineering-practices enforced list, CLAUDE.md, and the roadmap (Phase 11: Tier-1 done; Tiers 2-3 recorded as deferred, per the #23 investigation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
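As a rough illustration of the first seam, a grep-only resolver could look like the sketch below. The function name, the token pattern, and the one-entry-per-line whitelist format are assumptions made for this example; the actual scripts/prompt-seam-check.sh may differ in all three.

```shell
# Hypothetical sketch of seam 1: every /cdd-<name> token found in markdown must
# map to .claude/commands/cdd-<name>.md or to a whitelist entry.
# Assumed whitelist format: one token per line, e.g. "/cdd-worktree".
check_cdd_refs() {
  root="$1"
  whitelist="$2"
  failed=0
  # Collect the unique /cdd-* tokens from all markdown files under root.
  refs=$(grep -rhoE --include='*.md' '/cdd-[a-z0-9-]+' "$root" | sort -u)
  for ref in $refs; do
    # Resolved: a command file with the same name exists.
    [ -f "$root/.claude/commands/${ref#/}.md" ] && continue
    # Resolved: the token is a known non-command listed in the whitelist.
    grep -qxF -- "$ref" "$whitelist" 2>/dev/null && continue
    echo "unresolved reference: $ref"
    failed=1
  done
  return "$failed"
}
```

A call such as `check_cdd_refs . scripts/prompt-seam-whitelist.txt` would then exit non-zero as soon as a rename leaves a stale reference behind, which is the failure mode a CI step can act on.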
) Pre-PR reconciliation:

- prompt-seam-check.sh tripped shellcheck SC2016 on the deliberately single-quoted grep/sed backtick patterns in the path-existence check, which would have failed the template-smoke CI shellcheck step. Added a `# shellcheck disable=SC2016` directive before the compound command.
- doc/architecture/overview.md: note the prompt-seam check in the scripts/ tree comment and add a sibling paragraph describing it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
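To show why SC2016 fires here and how a scoped directive silences it, the sketch below extracts backticked tokens from a markdown file. The function name and the exact grep/sed patterns are assumptions for illustration, not the script's real code.

```shell
# Hypothetical sketch: print each backticked token in a markdown file, one per line.
# The patterns are single-quoted on purpose so the backticks reach grep and sed
# literally. shellcheck reports SC2016 for that, so the directive is placed
# directly before the function definition and covers only this compound command.
# shellcheck disable=SC2016
list_backticked() {
  grep -oE '`[^`]+`' "$1" | sed 's/^`//; s/`$//'
}
```

Scoping the directive to one command keeps SC2016 active for the rest of the file, where an unexpanded `$var` inside single quotes would more likely be a genuine mistake.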
drabaioli commented Jun 29, 2026
#23) Tier 2 (generalize into a prompt-lint framework) is premature with a single consumer; Tier 3 (LLM-as-judge evals) is rejected on principle: a headless eval can only run by stripping the human checkpoints that define CDD. Neither is planned work, so neither belongs on the roadmap. The rationale now lives in ADR 0002, and the Tier 1 roadmap line + architecture ADR index point at it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ple (#23) The earlier draft argued LLM-as-judge evals were inapplicable because a headless eval "strips the human checkpoints that define CDD." That objection only holds for a naive full end-to-end autonomous run; for the realistic per-prompt behavioral checks, stubbing the human approval is exactly what an eval does and is legitimate. The honest reasons to defer are practical: cost (mitigable with a paths-filtered CI job), judge-calibration effort, flakiness, and ROI on a single-maintainer repo already covered by the seam checks + human dogfood.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The four-seam enumeration was duplicated in three places; the roadmap line now points at the script header and engineering-practices.md for detail and keeps only what-landed + wiring + the ADR 0002 scope pointer. No info lost.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
drabaioli added a commit that referenced this pull request Jun 30, 2026
…UDE.md

Merging main brought in scripts/prompt-seam-check.sh (#37), whose check-1 regex extracts /cdd-state from file paths like tools/cdd-state.sh, the same false positive /cdd-worktree is whitelisted for. Add /cdd-state to the whitelist, and list cdd-state.sh in CLAUDE.md's bash -n line and tools/ row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
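The false positive is easy to reproduce with any pattern that matches a slash followed by cdd-. The pattern below is an assumed stand-in for the check-1 regex, which this page does not show:

```shell
# Assumed pattern, for illustration only; the real check-1 regex is not shown here.
extract_cdd_tokens() {
  grep -oE '/cdd-[a-z0-9-]+'
}

# A file path and a slash-command mention yield tokens of the same shape.
printf '%s\n' 'bash -n tools/cdd-state.sh' | extract_cdd_tokens   # prints /cdd-state
printf '%s\n' 'then run /cdd-pre-pr' | extract_cdd_tokens         # prints /cdd-pre-pr
```

Because grep cannot tell the two contexts apart, listing the helper's token in the whitelist is the cheaper fix compared with making the pattern context-aware.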
drabaioli added a commit that referenced this pull request Jun 30, 2026
* Add per-task state record (status + session links)

Introduce a small JSON file colocated with the handoff at ~/.cdd/handoffs/<repo>/<branch>.state.json, written by the slash commands at their stage transitions and read by external consumers (cdd-dash). It captures a task's stage + finer status, the PR number once opened, and the append-only chain of Claude Code sessions that worked it (each session link is `claude --resume <id>`, derived from $CLAUDE_CODE_SESSION_ID).

The record is advisory and reconstructible: a local cache describing work on this machine, explicitly not a cross-machine transfer mechanism (issue #22) and not an event history (snapshot + timestamps only). It is an additive sibling of the handoff and fits the frozen worktree-helper contract without enlarging it; the helper only deletes it.

- Process doc: new §2.13 documenting the schema, stage/status vocabulary, session mechanism, and non-goals; pointers from §2.6/§2.8/§3.3.
- Commands (repo + template copies, drift-checked parity): /cdd-next-step seeds the record and instructs the implementation session to write plan_approved / implementation_done; /cdd-merge-base, /cdd-pre-pr (checks_passed + pr_open), and /cdd-process-pr record their transitions. In-worktree writers derive the path at runtime (drift-free); next-step uses its existing hardcoded-slug path.
- cdd-worktree.sh: cdd-worktree-done deletes the .state.json sibling alongside the handoff (only when the branch is deleted); the stale sweep in /cdd-next-step removes it too.
- settings.json (both copies): allow Write to the handoff dir, rm of *.state.json, and hostname / date -u for stamping.

cdd-dash consumption is a mechanical follow-up in that repo; no roadmap change (off-roadmap, intent-driven).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Track per-task state record work as roadmap Phase 13

Pre-PR reconciliation: the state-record feature was off-roadmap intent-driven work with no existing checkbox. Record the landed item plus its named downstream follow-ups (consume the record in cdd-worktree-list/cdd-dash; multi-machine resume, issue #22).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Capture state-write hardening design in Phase 13 roadmap

Record the investigation outcome: a dual-mode cdd-state helper for atomic writes plus a gh-pr-create PostToolUse hook for the one outcome-bearing transition. Note that a fully guaranteed UserPromptSubmit/Expansion trigger fires at invocation only, so it cannot capture outcomes (checks_passed, PR number) the model produces mid-run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Route state writes through a cdd-state helper, simplify the schema

Replace the prompt-driven JSON hand-editing with tools/cdd-state.sh, a dual-mode self-installing helper (like cdd-worktree.sh): cdd-state seed <branch> creates the record, cdd-state set <stage> [--pr NN] advances it. Writes are atomic (temp-file + rename) and well-formed, so the malformed-JSON / wrong-field failure mode of hand-editing is gone; the model still decides when. Absent jq or an absent record, it no-ops.

Simplify the schema per review: squash stage+status into a single stage enum, trim session entries to {id, stage}, and drop the url, machine, created_at, updated_at, recorded_at fields (all speculative / no current consumer). Every command's verbose state paragraph collapses to one cdd-state line, which also removes the process-doc references from the prompts. settings.json swaps the Write/hostname/date allows for Bash(cdd-state*).

Addresses PR #38 review: #3494362739, #3494407885, #3494415079, #3494288014, #3494420020, #3494418136, #3494308377, #3493923487, #3494324754.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Rewrite state-record docs: cdd-state helper, slim schema, resume-chain motivation

Process doc §2.13 rewritten around the session resume-chain (find/resume the session that worked a branch) as the primary payoff; cdd-dash is one downstream consumer, not the justification. Documents the cdd-state helper, the single stage enum, and the slimmed schema. §2.8 gains a note on the companion helper; the frozen three-command worktree contract is unchanged.

Roadmap: mark the helper done (Phase 13a), narrow the remaining hardening to the gh-pr-create PostToolUse hook, and scope the consume item to cdd-dash (cdd-worktree-list infers fine, doesn't need the record).

Architecture/feature/BOOTSTRAP updated for the second self-installing helper and the settings allow change.

Addresses PR #38 review: #3494396840, #3494425533, #3494430578, #3494436426.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Swap repo settings allow-list to cdd-state (match template)

The repo's own .claude/settings.json now mirrors the template: drop the unused Write/hostname/date allows and add Bash(cdd-state*), so this repo's CDD sessions can run the state helper without a prompt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Whitelist /cdd-state in the prompt-seam check; note the helper in CLAUDE.md

Merging main brought in scripts/prompt-seam-check.sh (#37), whose check-1 regex extracts /cdd-state from file paths like tools/cdd-state.sh, the same false positive /cdd-worktree is whitelisted for. Add /cdd-state to the whitelist, and list cdd-state.sh in CLAUDE.md's bash -n line and tools/ row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Compact the Phase 13 roadmap items

Trim the completed state-record item to a summary plus a pointer to process doc §2.13 (which holds the full design/schema, so no info is lost), and tighten the PR-number-hook follow-up.

Addresses PR #38 review: #3500597168.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Track process-doc reference trimming as a Phase 11 follow-up

Capture the deferred cleanup surfaced in PR #38 review: commands point at the large process doc ~14 times, loading it into context each run. Record as a tracked efficiency task rather than expanding this PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
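The "temp-file + rename" write mentioned for the cdd-state helper can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is hypothetical, the record holds only a stage field, and printf stands in for the jq-based JSON handling the commit message implies.

```shell
# Hypothetical sketch of an atomic state write (not the real tools/cdd-state.sh).
# The record is written to a temp file in the SAME directory and then renamed over
# the target, so a reader sees either the old record or the new one, never a
# half-written file. Assumes the stage value contains no quotes or backslashes.
write_state() {
  state_file="$1"
  stage="$2"
  dir=$(dirname "$state_file")
  # No handoff directory: treat as a no-op rather than an error.
  [ -d "$dir" ] || return 0
  tmp=$(mktemp "$dir/.state.XXXXXX") || return 1
  if printf '{"stage": "%s"}\n' "$stage" > "$tmp"; then
    mv -f -- "$tmp" "$state_file"   # rename within one filesystem replaces atomically
  else
    rm -f -- "$tmp"
    return 1
  fi
}
```

Creating the temp file next to the target matters: a rename is only atomic within one filesystem, which a temp file under /tmp would not guarantee.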
Summary
Adds scripts/prompt-seam-check.sh (+ scripts/prompt-seam-whitelist.txt): deterministic, grep-only seam-contract checks over the CDD repo's own prompts (slash-commands and the docs around them), guarding against a one-sided edit silently stranding a downstream prompt-driven step. No LLM, no API key; the same proven shape as command-drift-check.sh. CDD-repo-only; not shipped in the template.

Four seams are pinned:
- Every /cdd-* reference across the repo's markdown resolves to a .claude/commands/cdd-*.md (or a whitelisted non-command). Catches rename fallout like #27 (Prepend all commands with 'cdd-') and #31 (Dynamic default branch, /cdd-merge-base rename, CamelCase PROJECT_DIR).
- The gh_issue_NN token produced in cdd-next-step.md is still consumed (as a Closes #NN line) in cdd-pre-pr.md.
- Backticked file paths in the command files, CLAUDE.md, and README.md resolve to real files.
- Each cdd-*.md keeps its load-bearing headings.

Wired into CI (template-smoke.yml), /cdd-pre-pr, and the engineering-practices enforced list.

Wiring & docs
- .github/workflows/template-smoke.yml: new prompt-seam check step.
- .claude/commands/cdd-pre-pr.md: cdd-only-fenced "Prompt-seam checks" section.
- CLAUDE.md, doc/knowledge_base/engineering-practices.md, doc/knowledge_base/roadmap.md, doc/architecture/overview.md: updated to list the new check.

Verification
- bash -n, command-drift, prompt-seam, install smoke, end-to-end bootstrap smoke, and demo seed-overlay smoke all pass locally.
- SC2016 shellcheck failure in the new script (deliberately literal single-quoted backtick patterns); fixed with a scoped `# shellcheck disable=SC2016` directive.

Verdict + Tier 2/3 deferral rationale: issue #23 comment.
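The second seam, the gh_issue_NN token being produced in one prompt and consumed in another, reduces to two greps that must both succeed. The sketch below is a hypothetical rendering of that idea; the function name and the literal patterns are assumptions, and the real script may match more strictly.

```shell
# Hypothetical sketch of seam 2: the producer prompt must still mention the
# gh_issue_ branch token and the consumer prompt must still mention "Closes #".
check_issue_token_seam() {
  producer="$1"   # e.g. .claude/commands/cdd-next-step.md
  consumer="$2"   # e.g. .claude/commands/cdd-pre-pr.md
  broken=0
  if ! grep -q 'gh_issue_' "$producer"; then
    echo "seam broken: $producer no longer produces gh_issue_NN"
    broken=1
  fi
  if ! grep -q 'Closes #' "$consumer"; then
    echo "seam broken: $consumer no longer consumes it as Closes #NN"
    broken=1
  fi
  return "$broken"
}
```

Checking both sides in one function is the point of a seam contract: editing either file alone is enough to fail the check, so a one-sided change surfaces before merge.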
Closes #23
🤖 Generated with Claude Code