Skip to content

test(canaries): add tier-1 semantic canaries on compiled prompt output#54

Merged
SingleSourceStudios merged 2 commits intoSingularityAI-Dev:mainfrom
antnewman:tests/canaries-tier-1
May 7, 2026
Merged

test(canaries): add tier-1 semantic canaries on compiled prompt output#54
SingleSourceStudios merged 2 commits intoSingularityAI-Dev:mainfrom
antnewman:tests/canaries-tier-1

Conversation

@antnewman
Copy link
Copy Markdown
Contributor

@antnewman antnewman commented May 7, 2026

Closes #47.

Pins 13 executive phrasings the compiler is supposed to produce. If a refactor accidentally drops or alters any of these structures while keeping unit tests green, the corresponding canary fires loudly.

Authored against current compiler output — codifies present behaviour as the contract going forward, not aspirational.

Refinements per your #47 review (all baked in)

Your refinement How it landed
Pin `"must be valid JSON"` (load-bearing in contract narrative) Dedicated canary: output schema produces the load-bearing 'must be valid JSON' phrasing
Strategy preamble — match on strategy name appearing, not specific sentence Canary uses `expect(segment).toContain("react")` — no sentence pinning
Retry-context: section header and previousFailureReason content should break independently Split into two separate canaries — retry context produces a section header AND retry context surfaces previousFailureReason content
Add canary on fenced ```json` block (serialisation contract distinct from surrounding prose) Dedicated canary: output schema is rendered inside a fenced ```json block, not inlined as prose — matches `/\\\\json\\s*\\n[\\s\\S]+?\\n\\\\\/`

Coverage breakdown (13 canaries)

Block Count What's pinned
Output schema rendering 3 Required Output header, must be valid JSON, fenced ```json block
Quality-gate rendering 3 Pre-Response Checklist header, "Before responding, verify:" framing, Markdown checkbox marker
Confidence config 2 Minimum threshold language, escalate_below language
Retry context 2 Section header, previousFailureReason content (independent)
Reasoning strategy preamble 1 Strategy name appearing
Branch context 1 Section header + reason text
Step identity 1 Current Step header with step name

Verification

$ npx vitest run packages/core/canaries.test.ts

 Test Files  1 passed (1)
      Tests  13 passed (13)

Sanity check that canaries fail loudly when they should: I temporarily mutated `compiler.ts` to change the load-bearing phrase from `"must be valid JSON matching"` to `"must be JSON-shaped matching"`. Result:

× output schema produces the load-bearing 'must be valid JSON' phrasing
  AssertionError: expected '...' to contain 'must be valid JSON'
Tests  1 failed | 12 passed (13)

Restored compiler.ts before commit. The canary fires cleanly with a useful assertion message — no false negatives.

Sequencing

This is tier 1 only per your #47 review:

Tier 1 is the right starting point. Eight to twelve canaries is the right sizing; more becomes brittle, fewer leaves coverage holes.

Tier 2 (snapshot tests for conformance fixtures): hold for now. Get tier 1 in, run it for a few weeks, see what slips through.

Tier 2 (fixture-output snapshots) deliberately deferred. Will revisit once tier 1 has had a few weeks of real signal.

Note on the count

Originally proposed 8-12 canaries; landed on 13. The extra one came from your refinement to split retry-context into independent header / content assertions — that's now two canaries instead of one. If thirteen feels like one too many, easiest cut is the quality gate Markdown checkbox canary which is the most tightly coupled to a specific implementation detail.


Summary by cubic

Add tier-1 semantic canary tests for the compiled prompt output to pin 13 contract phrases and catch silent regressions. Implements #47 feedback: match strategy by name, split retry header/content, keep “must be valid JSON”, and require a fenced json code block.

  • New Features
    • Output schema: header, “must be valid JSON”, fenced json code block.
    • Quality gates: checklist header, “Before responding, verify:”, markdown checkbox.
    • Confidence: minimum threshold and escalate_below phrasing.
    • Retry context: header and previousFailureReason (independent).
    • Context identity: reasoning strategy name, Branch Context, Current Step header.

Written for commit e21c7bc. Summary will update on new commits.

Summary by CodeRabbit

  • Tests
    • Added comprehensive test suite to validate compiled prompt output structure and content consistency.

Closes SingularityAI-Dev#47.

Pins 13 executive phrasings the compiler is supposed to produce. If a
refactor accidentally drops or alters any of these structures while
keeping unit tests green, the corresponding canary fires loudly.

Each canary is authored against current compiler output — codifies
present behaviour as the contract going forward, not aspirational.

Refinements per SingularityAI-Dev#47 review:

- 'must be valid JSON' is load-bearing in the contract narrative; pinned
- Strategy preamble matches on the strategy name appearing somewhere,
  not on a specific sentence (lower brittleness, same regression-catching
  power)
- Retry-context section header and previousFailureReason content are
  asserted independently — both halves can break independently
- Output-schema fenced block (json) is asserted as a serialisation
  contract distinct from the surrounding 'must be valid JSON' prose

Coverage: 13 canaries
- 3 on output-schema rendering (header, must-be-valid-JSON, fenced json)
- 3 on quality-gate rendering (header, before-responding-verify, checkbox)
- 2 on confidence config (minimum threshold, escalate_below)
- 2 on retry context (header, previousFailureReason content)
- 1 on reasoning strategy preamble
- 1 on branch context
- 1 on step identity (Current Step header)

Verified loudly-failing on a deliberate mutation: changed
'must be valid JSON' to 'must be JSON-shaped' in compiler.ts; the
load-bearing canary failed cleanly with a clear assertion message.
Restored before commit.

Tier 2 (snapshot tests for conformance fixtures) deliberately deferred
per Rain's SingularityAI-Dev#47 comment: 'Get tier 1 in, run it for a few weeks, see
what slips through.'
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3b131198-0ddb-49d1-9fc6-eaf66b28e2cf

📥 Commits

Reviewing files that changed from the base of the PR and between 5622e4f and e21c7bc.

📒 Files selected for processing (1)
  • packages/core/canaries.test.ts

📝 Walkthrough

Walkthrough

A new canary test suite is added to packages/core/canaries.test.ts to assert semantic invariants in compiled prompt output. The tests construct minimal fixtures and invoke compileStep, then verify specific executive phrasings appear in systemPromptSegment across output schemas, quality gates, confidence configuration, retry context, strategy naming, branch routing, and current step headers.

Changes

Compiled Prompt Canaries

Layer / File(s) Summary
Canary Test Suite
packages/core/canaries.test.ts
New Vitest suite with helper constructors and eight+ deterministic canary assertions pinning down headline compiler-output semantics: required output formatting ("must be valid JSON" and fenced schema), quality gates (section header and markdown checkboxes), confidence thresholds, retry context, strategy naming, branch routing, and current step headers.

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A warren of canaries begins to sing,
Each note a promise the compiler will ring—
"Must be valid JSON," the quality gates align,
No silent regressions when semantics define!
Truth in the phrasings, forever enshrined.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 1 file

@SingleSourceStudios SingleSourceStudios enabled auto-merge (squash) May 7, 2026 20:48
@SingleSourceStudios SingleSourceStudios merged commit b75fa12 into SingularityAI-Dev:main May 7, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tests: missing canaries on compiled prompt semantics — silent regression risk to headline phrasings

2 participants