Add deterministic prompt seam-contract checks (Tier 1, #23)#37

Merged
drabaioli merged 5 commits into main from gh_issue_23_prompt_seam_checks
Jun 29, 2026

Conversation

@drabaioli

Owner

Summary

Adds scripts/prompt-seam-check.sh (+ scripts/prompt-seam-whitelist.txt): deterministic, grep-only seam-contract checks over the CDD repo's own prompts (slash-commands and the docs around them), guarding against a one-sided edit silently stranding a downstream prompt-driven step. No LLM, no API key — the same proven shape as command-drift-check.sh. CDD-repo-only; not shipped in the template.

Four seams are pinned:

  1. Command-name resolution — every /cdd-* reference across the repo's markdown resolves to a .claude/commands/cdd-*.md (or a whitelisted non-command). Catches rename fallout like #27 (Prepend all commands with 'cdd-') and #31 (Dynamic default branch, /cdd-merge-base rename, CamelCase PROJECT_DIR).
  2. Branch-token contract — the gh_issue_NN token produced in cdd-next-step.md is still consumed (as a Closes #NN line) in cdd-pre-pr.md.
  3. Path-existence — backticked repo-relative file paths in the command files, CLAUDE.md, and README.md resolve to real files.
  4. Required-section presence — each cdd-*.md keeps its load-bearing headings.
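
As an illustration of the first seam, here is a minimal sketch of a grep-only command-name resolution check. It is not the contents of scripts/prompt-seam-check.sh: the reference regex, the one-token-per-line whitelist format, and the function name `check_command_refs` are all assumptions made for the example.

```shell
# Sketch of seam 1 (command-name resolution). The regex, the whitelist
# format (one token per line), and the function name are assumptions.
check_command_refs() {
  cr_root=$1       # repo root to scan
  cr_whitelist=$2  # file listing known non-commands, e.g. /cdd-worktree

  cr_missing=$(
    # Collect every unique /cdd-* token from the markdown files.
    grep -rhoE --include='*.md' '/cdd-[a-z0-9-]+' "$cr_root" | sort -u |
      while IFS= read -r cr_ref; do
        # Skip tokens that are whitelisted as non-commands.
        if [ -f "$cr_whitelist" ] && grep -qxF -e "$cr_ref" "$cr_whitelist"; then
          continue
        fi
        # /cdd-foo must map to .claude/commands/cdd-foo.md.
        [ -f "$cr_root/.claude/commands/${cr_ref#/}.md" ] || printf '%s\n' "$cr_ref"
      done
  )

  [ -z "$cr_missing" ] && return 0
  printf '%s\n' "$cr_missing" | sed 's/^/unresolved command reference: /'
  return 1
}
```

Called as `check_command_refs . scripts/prompt-seam-whitelist.txt`, the function prints one line per dangling reference and returns non-zero, which is enough to fail a CI step after a one-sided rename.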

Wired into CI (template-smoke.yml), /cdd-pre-pr, and the engineering-practices enforced list.

Wiring & docs

  • .github/workflows/template-smoke.yml: new prompt-seam check step.
  • .claude/commands/cdd-pre-pr.md: cdd-only-fenced "Prompt-seam checks" section.
  • CLAUDE.md, doc/knowledge_base/engineering-practices.md, doc/knowledge_base/roadmap.md, doc/architecture/overview.md: updated to list the new check.

Verification

  • shellcheck (exact CI invocation), bash -n, command-drift, prompt-seam, install smoke, end-to-end bootstrap smoke, and demo seed-overlay smoke all pass locally.
  • Pre-PR review caught a CI-breaking SC2016 shellcheck failure in the new script (deliberately literal single-quoted backtick patterns); fixed with a scoped # shellcheck disable=SC2016 directive.

Verdict + Tier 2/3 deferral rationale: issue #23 comment.

Closes #23

🤖 Generated with Claude Code

drabaioli and others added 2 commits June 29, 2026 12:11
CDD's slash-commands are agentic prompts whose steps hand artifacts to each
other; a one-sided edit can silently strand a downstream prompt-driven step.
Add scripts/prompt-seam-check.sh (+ prompt-seam-whitelist.txt), extending the
proven command-drift-check.sh pattern with four grep-only, no-LLM seam
contracts:

- /cdd-* references resolve to an existing command file (catches rename
  fallout like #27/#31); known non-commands are whitelisted.
- the gh_issue_NN branch token is produced (cdd-next-step) and consumed
  (cdd-pre-pr -> Closes #NN) in agreement.
- backticked file paths in the command files + CLAUDE.md + README resolve.
- each cdd-*.md keeps its load-bearing headings.
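
The branch-token contract in the list above can be sketched as a pair of greps, one per side of the seam. This is a hedged illustration only: the literal patterns (`gh_issue_` and `Closes #`), the messages, and the function name `check_branch_token` are assumptions, not the script's actual implementation.

```shell
# Sketch of seam 2 (branch-token contract). Patterns and names are assumed.
check_branch_token() {
  bt_dir=$1   # directory holding the cdd-*.md command files
  bt_status=0

  # Producer side: cdd-next-step.md must still mention the branch token.
  if ! grep -qF 'gh_issue_' "$bt_dir/cdd-next-step.md" 2>/dev/null; then
    echo "producer missing: no gh_issue_ token in cdd-next-step.md"
    bt_status=1
  fi
  # Consumer side: cdd-pre-pr.md must still turn it into a Closes line.
  if ! grep -qF 'Closes #' "$bt_dir/cdd-pre-pr.md" 2>/dev/null; then
    echo "consumer missing: no 'Closes #' line in cdd-pre-pr.md"
    bt_status=1
  fi
  return "$bt_status"
}
```

Checking both sides in one pass means an edit that removes either the producer or the consumer is reported, rather than only the one that happens to be grepped first.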

CDD-repo-only (not shipped in the template). Wired into CI
(template-smoke.yml), /cdd-pre-pr (cdd-only section + checklist), the
engineering-practices enforced list, CLAUDE.md, and the roadmap (Phase 11:
Tier-1 done; Tiers 2-3 recorded as deferred, per the #23 investigation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
)

Pre-PR reconciliation:
- prompt-seam-check.sh tripped shellcheck SC2016 on the deliberately
  single-quoted grep/sed backtick patterns in the path-existence check,
  which would have failed the template-smoke CI shellcheck step. Added a
  `# shellcheck disable=SC2016` directive before the compound command.
- doc/architecture/overview.md: note the prompt-seam check in the scripts/
  tree comment and add a sibling paragraph describing it.
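
To make the SC2016 point concrete, here is a minimal sketch of a path-existence check whose grep and tr patterns are single-quoted on purpose so the backticks stay literal. The regex, the function name `check_backticked_paths`, and the output format are assumptions; only the `# shellcheck disable=SC2016` directive placed before the compound command comes from the commit message above.

```shell
# Sketch of seam 3 (path-existence). Regex and names are assumed.
check_backticked_paths() {
  bp_root=$1   # repo root that paths are resolved against
  shift        # remaining arguments: markdown files to scan

  # The patterns are deliberately single-quoted so the backticks stay
  # literal; shellcheck reports that as SC2016, hence the scoped directive.
  # shellcheck disable=SC2016
  bp_missing=$(
    grep -hoE '`[A-Za-z0-9_.-]+(/[A-Za-z0-9_.-]+)+`' "$@" | tr -d '`' | sort -u |
      while IFS= read -r bp_path; do
        # -e accepts files and directories; a stricter check could use -f.
        [ -e "$bp_root/$bp_path" ] || printf '%s\n' "$bp_path"
      done
  )

  [ -z "$bp_missing" ] && return 0
  printf '%s\n' "$bp_missing" | sed 's/^/missing path: /'
  return 1
}
```

Because the directive sits immediately before the assignment, it silences SC2016 for that one command only and leaves the rest of the script subject to the warning.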

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread doc/knowledge_base/roadmap.md Outdated
drabaioli and others added 3 commits June 29, 2026 14:16
#23)

Tier 2 (generalize into a prompt-lint framework) is premature with a single
consumer; Tier 3 (LLM-as-judge evals) is rejected on principle — a headless
eval can only run by stripping the human checkpoints that define CDD. Neither
is planned work, so neither belongs on the roadmap. The rationale now lives in
ADR 0002, and the Tier 1 roadmap line + architecture ADR index point at it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ple (#23)

The earlier draft argued LLM-as-judge evals were inapplicable because a headless
eval "strips the human checkpoints that define CDD." That objection only holds
for a naive full end-to-end autonomous run; for the realistic per-prompt
behavioral checks, stubbing the human approval is exactly what an eval does and
is legitimate. The honest reasons to defer are practical — cost (mitigable with
a paths-filtered CI job), judge-calibration effort, flakiness, and ROI on a
single-maintainer repo already covered by the seam checks + human dogfood.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The four-seam enumeration was duplicated in three places; the roadmap line now
points at the script header and engineering-practices.md for detail and keeps
only what-landed + wiring + the ADR 0002 scope pointer. No info lost.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@drabaioli drabaioli merged commit 3e6010e into main Jun 29, 2026
1 check passed
@drabaioli drabaioli deleted the gh_issue_23_prompt_seam_checks branch June 29, 2026 17:06
drabaioli added a commit that referenced this pull request Jun 30, 2026
…UDE.md

Merging main brought in scripts/prompt-seam-check.sh (#37), whose check-1
regex extracts /cdd-state from file paths like tools/cdd-state.sh — the
same false positive /cdd-worktree is whitelisted for. Add /cdd-state to
the whitelist, and list cdd-state.sh in CLAUDE.md's bash -n line and
tools/ row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
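
The false positive described in this commit can be reproduced with a small sketch. The regex below is an assumed stand-in for the script's check-1 pattern, and `extract_refs` and `filter_whitelisted` are hypothetical helper names; the point is only that a pattern anchored on `/cdd-` cannot tell a slash-command from the tail of a file path.

```shell
# Extract /cdd-* tokens from stdin the way a naive reference regex would.
# The pattern is an assumption, not the one in prompt-seam-check.sh.
extract_refs() {
  grep -oE '/cdd-[a-z0-9-]+' | sort -u
}

# Drop tokens listed (one per line) in the whitelist file given as $1.
# grep exits 1 when every token is filtered out, so that is not an error.
filter_whitelisted() {
  grep -vxFf "$1" || true
}
```

For example, `printf 'see tools/cdd-state.sh and /cdd-pre-pr\n' | extract_refs` yields both `/cdd-pre-pr` and `/cdd-state`, because the match in `tools/cdd-state.sh` stops at the dot. Piping that through `filter_whitelisted` with a whitelist containing `/cdd-state` leaves only the real command reference.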
drabaioli added a commit that referenced this pull request Jun 30, 2026
* Add per-task state record (status + session links)

Introduce a small JSON file colocated with the handoff at
~/.cdd/handoffs/<repo>/<branch>.state.json, written by the slash commands at
their stage transitions and read by external consumers (cdd-dash). It captures
a task's stage + finer status, the PR number once opened, and the append-only
chain of Claude Code sessions that worked it (each session link is
`claude --resume <id>`, derived from $CLAUDE_CODE_SESSION_ID).

The record is advisory and reconstructible: a local cache describing work on
this machine, explicitly not a cross-machine transfer mechanism (issue #22) and
not an event history (snapshot + timestamps only). It is an additive sibling of
the handoff and fits the frozen worktree-helper contract without enlarging it —
the helper only deletes it.

- Process doc: new §2.13 documenting the schema, stage/status vocabulary,
  session mechanism, and non-goals; pointers from §2.6/§2.8/§3.3.
- Commands (repo + template copies, drift-checked parity): /cdd-next-step seeds
  the record and instructs the implementation session to write plan_approved /
  implementation_done; /cdd-merge-base, /cdd-pre-pr (checks_passed + pr_open),
  and /cdd-process-pr record their transitions. In-worktree writers derive the
  path at runtime (drift-free); next-step uses its existing hardcoded-slug path.
- cdd-worktree.sh: cdd-worktree-done deletes the .state.json sibling alongside
  the handoff (only when the branch is deleted); the stale sweep in
  /cdd-next-step removes it too.
- settings.json (both copies): allow Write to the handoff dir, rm of
  *.state.json, and hostname / date -u for stamping.

cdd-dash consumption is a mechanical follow-up in that repo; no roadmap change
(off-roadmap, intent-driven).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Track per-task state record work as roadmap Phase 13

Pre-PR reconciliation: the state-record feature was off-roadmap
intent-driven work with no existing checkbox. Record the landed item
plus its named downstream follow-ups (consume the record in
cdd-worktree-list/cdd-dash; multi-machine resume, issue #22).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Capture state-write hardening design in Phase 13 roadmap

Record the investigation outcome: a dual-mode cdd-state helper for
atomic writes plus a gh-pr-create PostToolUse hook for the one
outcome-bearing transition. Note that a fully guaranteed
UserPromptSubmit/Expansion trigger fires at invocation only, so it
cannot capture outcomes (checks_passed, PR number) the model produces
mid-run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Route state writes through a cdd-state helper, simplify the schema

Replace the prompt-driven JSON hand-editing with tools/cdd-state.sh, a
dual-mode self-installing helper (like cdd-worktree.sh): cdd-state seed
<branch> creates the record, cdd-state set <stage> [--pr NN] advances it.
Writes are atomic (temp-file + rename) and well-formed, so the
malformed-JSON / wrong-field failure mode of hand-editing is gone; the
model still decides when. If jq or the record is absent, it no-ops.
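
The temp-file + rename pattern described above can be sketched as follows. This is an illustrative sketch, not tools/cdd-state.sh itself: the function name `state_set`, its argument order, and the top-level `stage` field name are assumptions.

```shell
# Sketch of an atomic state update. Function name, arguments, and the
# "stage" field name are assumptions made for this example.
state_set() {
  ss_file=$1    # path to the <branch>.state.json record
  ss_stage=$2   # new stage value to store

  # No-op when jq is unavailable or the record has not been seeded.
  command -v jq >/dev/null 2>&1 || return 0
  [ -f "$ss_file" ] || return 0

  # Create the temp file next to the record so mv is a same-filesystem rename.
  ss_tmp=$(mktemp "$ss_file.XXXXXX") || return 1

  # jq emits well-formed JSON or fails; only a successful write is renamed in.
  # shellcheck disable=SC2016
  if jq --arg stage "$ss_stage" '.stage = $stage' "$ss_file" > "$ss_tmp"; then
    mv -f "$ss_tmp" "$ss_file"
  else
    rm -f "$ss_tmp"
    return 1
  fi
}
```

A reader of the record sees either the old file or the complete new one, never a partially written file, because the rename replaces the path in a single step.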

Simplify the schema per review: squash stage+status into a single stage
enum, trim session entries to {id, stage}, and drop the url, machine,
created_at, updated_at, recorded_at fields (all speculative / no current
consumer). Every command's verbose state paragraph collapses to one
cdd-state line, which also removes the process-doc references from the
prompts. settings.json swaps the Write/hostname/date allows for
Bash(cdd-state*).

Addresses PR #38 review: #3494362739, #3494407885, #3494415079,
#3494288014, #3494420020, #3494418136, #3494308377, #3493923487,
#3494324754.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Rewrite state-record docs: cdd-state helper, slim schema, resume-chain motivation

Process doc §2.13 rewritten around the session resume-chain (find/resume
the session that worked a branch) as the primary payoff; cdd-dash is one
downstream consumer, not the justification. Documents the cdd-state
helper, the single stage enum, and the slimmed schema. §2.8 gains a note
on the companion helper; the frozen three-command worktree contract is
unchanged.

Roadmap: mark the helper done (Phase 13a), narrow the remaining
hardening to the gh-pr-create PostToolUse hook, and scope the consume
item to cdd-dash (cdd-worktree-list infers fine, doesn't need the
record). Architecture/feature/BOOTSTRAP updated for the second
self-installing helper and the settings allow change.

Addresses PR #38 review: #3494396840, #3494425533, #3494430578,
#3494436426.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Swap repo settings allow-list to cdd-state (match template)

The repo's own .claude/settings.json now mirrors the template: drop the
unused Write/hostname/date allows and add Bash(cdd-state*), so this
repo's CDD sessions can run the state helper without a prompt.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Whitelist /cdd-state in the prompt-seam check; note the helper in CLAUDE.md

Merging main brought in scripts/prompt-seam-check.sh (#37), whose check-1
regex extracts /cdd-state from file paths like tools/cdd-state.sh — the
same false positive /cdd-worktree is whitelisted for. Add /cdd-state to
the whitelist, and list cdd-state.sh in CLAUDE.md's bash -n line and
tools/ row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Compact the Phase 13 roadmap items

Trim the completed state-record item to a summary plus a pointer to
process doc §2.13 (which holds the full design/schema, so no info is
lost), and tighten the PR-number-hook follow-up.

Addresses PR #38 review: #3500597168.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Track process-doc reference trimming as a Phase 11 follow-up

Capture the deferred cleanup surfaced in PR #38 review: commands point at
the large process doc ~14 times, loading it into context each run. Record
as a tracked efficiency task rather than expanding this PR.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Investigate how to "CI/test" LLM prompts