test(canaries): add tier-1 semantic canaries on compiled prompt output#54
Conversation
Closes SingularityAI-Dev#47. Pins 13 executive phrasings the compiler is supposed to produce. If a refactor accidentally drops or alters any of these structures while keeping unit tests green, the corresponding canary fires loudly. Each canary is authored against current compiler output — codifies present behaviour as the contract going forward, not aspirational. Refinements per SingularityAI-Dev#47 review: - 'must be valid JSON' is load-bearing in the contract narrative; pinned - Strategy preamble matches on the strategy name appearing somewhere, not on a specific sentence (lower brittleness, same regression-catching power) - Retry-context section header and previousFailureReason content are asserted independently — both halves can break independently - Output-schema fenced block (json) is asserted as a serialisation contract distinct from the surrounding 'must be valid JSON' prose Coverage: 13 canaries - 3 on output-schema rendering (header, must-be-valid-JSON, fenced json) - 3 on quality-gate rendering (header, before-responding-verify, checkbox) - 2 on confidence config (minimum threshold, escalate_below) - 2 on retry context (header, previousFailureReason content) - 1 on reasoning strategy preamble - 1 on branch context - 1 on step identity (Current Step header) Verified loudly-failing on a deliberate mutation: changed 'must be valid JSON' to 'must be JSON-shaped' in compiler.ts; the load-bearing canary failed cleanly with a clear assertion message. Restored before commit. Tier 2 (snapshot tests for conformance fixtures) deliberately deferred per Rain's SingularityAI-Dev#47 comment: 'Get tier 1 in, run it for a few weeks, see what slips through.'
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughA new canary test suite is added to ChangesCompiled Prompt Canaries
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
b75fa12
into
SingularityAI-Dev:main
Closes #47.
Pins 13 executive phrasings the compiler is supposed to produce. If a refactor accidentally drops or alters any of these structures while keeping unit tests green, the corresponding canary fires loudly.
Authored against current compiler output — codifies present behaviour as the contract going forward, not aspirational.
Refinements per your #47 review (all baked in)
\\\json\\s*\\n[\\s\\S]+?\\n\\\\\/`Coverage breakdown (13 canaries)
Verification
Sanity check that canaries fail loudly when they should: I temporarily mutated `compiler.ts` to change the load-bearing phrase from `"must be valid JSON matching"` to `"must be JSON-shaped matching"`. Result:
Restored compiler.ts before commit. The canary fires cleanly with a useful assertion message — no false negatives.
Sequencing
This is tier 1 only per your #47 review:
Tier 2 (fixture-output snapshots) deliberately deferred. Will revisit once tier 1 has had a few weeks of real signal.
Note on the count
Originally proposed 8-12 canaries; landed on 13. The extra one came from your refinement to split retry-context into independent header / content assertions — that's now two canaries instead of one. If thirteen feels like one too many, easiest cut is the quality gate Markdown checkbox canary which is the most tightly coupled to a specific implementation detail.
Summary by cubic
Add tier-1 semantic canary tests for the compiled prompt output to pin 13 contract phrases and catch silent regressions. Implements #47 feedback: match strategy by name, split retry header/content, keep “must be valid JSON”, and require a fenced json code block.
Written for commit e21c7bc. Summary will update on new commits.
Summary by CodeRabbit