Assertions live under `expect:` and cover several surfaces:

- `calls`: step counts and provider effects (GitHub/Slack ops)
- `prompts`: final AI prompts (post templating/context)
- `outputs`: step outputs with history and selectors
- `workflow_output`: workflow-level outputs (for workflow testing)
- `no_calls`: assert that specific steps or provider ops were NOT called
- `fail`: assert that the case failed with a specific message
- `strict_violation`: assert a strict-mode failure for a step missing an expect
- `llm_judge`: semantic evaluation of outputs using an LLM (pass/fail verdicts or structured extraction)
- `use`: reference reusable macros defined in `tests.defaults.macros`
```yaml
expect:
  calls:
    - step: overview
      exactly: 1
    - provider: github
      op: labels.add
      at_least: 1
      args:
        contains: [feature, "review/effort:2"]
    - provider: slack
      op: chat.postMessage
      at_least: 1
      args:
        contains: ["Review complete"]
```

Counts are consistent everywhere: `exactly`, `at_least`, `at_most`.
Supported providers:
- `github`: GitHub API operations (e.g., `labels.add`, `issues.createComment`, `pulls.createReview`, `checks.create`)
- `slack`: Slack API operations (e.g., `chat.postMessage`)
The `args` field supports:

- `contains`: array of values that must be present (for labels) or substrings (for Slack text)
```yaml
expect:
  prompts:
    - step: overview
      contains: ["feat: add user search", "diff --git a/src/search.ts"]
    - step: comment-assistant
      matches: "(?i)\\/visor\\s+help"
    - step: overview
      # Select the prompt that mentions a specific file
      where:
        contains: ["src/search.ts"]
      contains: ["diff --git a/src/search.ts"]
```

- `contains`: required substrings
- `not_contains`: forbidden substrings
- `matches`: regex (prefix `(?i)` for case-insensitive)
- `index`: `first` | `last` | N (default: last)
- `where`: selector to choose a prompt from history using `contains`/`not_contains`/`matches` before applying the assertion
Tip: Use the `--prompt-max-chars` CLI flag or the `tests.defaults.prompt_max_chars` config setting to cap stored prompt size for large diffs.
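As a sketch, the config-side cap might look like this (the character limit shown is an arbitrary illustration, not a recommended value):

```yaml
tests:
  defaults:
    # Hypothetical value; cap stored prompts at ~20k characters
    prompt_max_chars: 20000
```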
Use `path` with dot/bracket syntax. You can select by `index` or by a `where` probe over the same output history.
```yaml
expect:
  outputs:
    - step: validate-fact
      index: 0
      path: fact_id
      equals: f1
    - step: validate-fact
      where: { path: fact_id, equals: f2 }
      path: confidence
      equals: high
    - step: aggregate-validations
      path: all_valid
      equals: true
```

Supported comparators:

- `equals` (primitive)
- `equalsDeep` (structural)
- `matches` (regex)
- `contains_unordered` (array membership, ignoring order)
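The remaining comparators follow the same shape. A sketch, assuming hypothetical step output fields (a `tags` array and a `summary` object that are not part of the examples above):

```yaml
expect:
  outputs:
    - step: aggregate-validations
      path: tags                    # hypothetical array field
      contains_unordered: ["security", "style"]
    - step: aggregate-validations
      path: summary                 # hypothetical object field
      equalsDeep: { total: 3, valid: 3 }
    - step: validate-fact
      path: fact_id
      matches: "^f\\d+$"
```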
For workflow testing, use `workflow_output` to assert on workflow-level outputs (defined in the workflow's `outputs:` section):
```yaml
expect:
  workflow_output:
    - path: summary
      contains: "Review completed"
    - path: issues_found
      equals: 3
    - path: categories
      contains_unordered: ["security", "performance"]
```

Supported comparators for workflow outputs:

- `equals` (primitive)
- `equalsDeep` (structural)
- `matches` (regex)
- `contains` (substring check; value can be a string or an array)
- `not_contains` (forbidden substrings)
- `contains_unordered` (array membership, ignoring order)
- `where` (selector with `path` + `equals`/`matches`)
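For illustration, the regex and structural comparators on workflow outputs might look like this (the `stats` path and its fields are hypothetical):

```yaml
expect:
  workflow_output:
    - path: summary
      matches: "(?i)review completed"
    - path: stats                   # hypothetical structured output
      equalsDeep: { issues_found: 3, files_reviewed: 12 }
```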
Strict mode (default) fails any executed step without a corresponding `expect.calls` entry. You can also assert absence explicitly:
```yaml
expect:
  no_calls:
    - provider: github
      op: issues.createComment
    - provider: slack
      op: chat.postMessage
    - step: extract-facts
```

Assert that a test case fails with a specific error message:
```yaml
expect:
  fail:
    message_contains: "validation failed"
```

Assert that strict mode caught an unexpected step execution:
```yaml
expect:
  strict_violation:
    for_step: unexpected-step
    message_contains: "Step executed without expect"
```

Define reusable assertion blocks in `tests.defaults.macros` and reference them with `use`:
```yaml
tests:
  defaults:
    macros:
      basic-github-check:
        calls:
          - provider: github
            op: checks.create
            at_least: 1
  cases:
    - name: my-test
      event: pr_opened
      expect:
        use: [basic-github-check]
        calls:
          - step: overview
            exactly: 1
```

Macros are merged with inline expectations, allowing you to compose reusable assertion patterns.
Use `llm_judge` for semantic evaluation of outputs using an LLM. This is useful when exact string matching or regex isn't enough, for example when verifying that a response is technically accurate, helpful, or follows specific criteria.
```yaml
expect:
  llm_judge:
    - step: chat
      path: text
      prompt: |
        The user asked "How does rate limiting work?"
        Evaluate whether the response:
        1. Actually explains the mechanism (not generic)
        2. Mentions specific technical details
        3. Is well-structured and helpful
```

The LLM returns `{ pass: boolean, reason: string }`. If `pass` is false, the test fails with the reason.
Define a custom schema to extract structured fields, then assert on them:
```yaml
expect:
  llm_judge:
    - step: generate-response
      path: text
      prompt: |
        Analyze this technical response about authentication.
        Extract the requested properties.
      schema:
        properties:
          mentions_oauth:
            type: boolean
            description: "Does the response mention OAuth?"
          mentions_jwt:
            type: boolean
            description: "Does the response mention JWT tokens?"
          quality:
            type: string
            enum: [poor, adequate, good, excellent]
            description: "Overall response quality"
        required: [mentions_oauth, mentions_jwt, quality]
      assert:
        mentions_oauth: true
        quality: "good"
```

Custom schemas always include `pass` and `reason` fields automatically.
Supported `assert` value forms:

- boolean: `field: true` or `field: false` (exact match)
- string: `field: "value"` (exact string match)
- array: `field: ["item1", "item2"]` (checks that all listed items are present in the array)
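Putting the array form together with a custom schema, a sketch follows; the step name, the `topics` field, and the asserted values are hypothetical, not part of the examples above:

```yaml
expect:
  llm_judge:
    - step: generate-response
      path: text
      prompt: |
        List which authentication topics the response covers.
      schema:
        properties:
          topics:
            type: array
            items: { type: string }
            description: "Topics covered by the response"
        required: [topics]
      assert:
        topics: ["oauth", "jwt"]   # all listed items must be present
```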
| Field | Type | Description |
|---|---|---|
| `step` | string | Step name to evaluate (uses output history) |
| `path` | string | Dot/bracket path into the output |
| `index` | `first` \| `last` \| number | Which output from history (default: last) |
| `workflow_output` | boolean | Use workflow output instead of step output |
| `prompt` | string | Required. Evaluation criteria sent to the LLM |
| `model` | string | Override model (default: from config or env) |
| `schema` | `verdict` \| object | Schema mode (default: `verdict` = pass/fail) |
| `assert` | object | Field-level assertions on extracted result |
Set defaults for all `llm_judge` assertions in `tests.defaults`:
```yaml
tests:
  defaults:
    llm_judge:
      model: gemini-2.0-flash
      provider: google  # google | openai | anthropic
```

Or override per-assertion with the `model` field. The judge uses ProbeAgent internally, so it respects the same environment variables (`GOOGLE_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.).
You can also set the `VISOR_JUDGE_MODEL` environment variable as a global default.
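As a minimal sketch, setting the global default from a shell before a test run (the model name here is purely illustrative):

```shell
# Set a global default judge model for this shell session
# (model name is an example; use any model your provider supports)
export VISOR_JUDGE_MODEL=gemini-2.0-flash
echo "$VISOR_JUDGE_MODEL"
```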