Writing Assertions

Assertions live under expect: and cover several surfaces:

calls: step counts and provider effects (GitHub/Slack ops)
prompts: final AI prompts (after templating and context injection)
outputs: step outputs with history and selectors
workflow_output: workflow-level outputs (for workflow testing)
no_calls: assert that specific steps or provider ops were NOT called
fail: assert that the case failed with a specific message
strict_violation: assert strict mode failure for a missing expect on a step
llm_judge: semantic evaluation of outputs using an LLM (pass/fail verdicts or structured extraction)
use: reference reusable macros defined in tests.defaults.macros

Calls

expect:
  calls:
    - step: overview
      exactly: 1
    - provider: github
      op: labels.add
      at_least: 1
      args:
        contains: [feature, "review/effort:2"]
    - provider: slack
      op: chat.postMessage
      at_least: 1
      args:
        contains: ["Review complete"]

Counts are consistent everywhere: exactly, at_least, at_most.
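
As a minimal sketch of the remaining count keys, the fragment below caps one step with at_most and pins a provider op with exactly. The step name retry-fetch is hypothetical and only illustrates the shape; the provider op is taken from the list of supported operations.

```yaml
expect:
  calls:
    # Hypothetical step name: passes if the step ran two times or fewer
    - step: retry-fetch
      at_most: 2
    # Passes only if exactly one comment was created
    - provider: github
      op: issues.createComment
      exactly: 1
```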

Supported providers:

github: GitHub API operations (e.g., labels.add, issues.createComment, pulls.createReview, checks.create)
slack: Slack API operations (e.g., chat.postMessage)

The args field supports:

contains: array of values that must be present in the call arguments (label names for GitHub label ops, substrings of the message text for Slack)

Prompts

expect:
  prompts:
    - step: overview
      contains: ["feat: add user search", "diff --git a/src/search.ts"]
    - step: comment-assistant
      matches: "(?i)\\/visor\\s+help"
    - step: overview
      # Select the prompt that mentions a specific file
      where:
        contains: ["src/search.ts"]
      contains: ["diff --git a/src/search.ts"]

contains: required substrings
not_contains: forbidden substrings
matches: regex (prefix (?i) for case-insensitive)
index: first | last | N (default: last)
where: selector to choose a prompt from history using contains/not_contains/matches before applying the assertion
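
The options above that the earlier example does not exercise (not_contains and index) might be combined as in the following sketch. The file name src/legacy.ts and the matched phrase are made-up values for illustration.

```yaml
expect:
  prompts:
    # Check the first prompt recorded for the step instead of the default (last)
    - step: overview
      index: first
      not_contains: ["src/legacy.ts"]   # hypothetical file that must not leak into the prompt
    # Check the last prompt explicitly, case-insensitively
    - step: overview
      index: last
      matches: "(?i)add user search"
```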

Tip: Use the --prompt-max-chars CLI flag or the tests.defaults.prompt_max_chars config setting to cap the stored prompt size for large diffs.

Outputs

Use path with dot/bracket syntax to address a value inside a step's output. When a step has run more than once, select one entry from its output history by index or with a where probe.

expect:
  outputs:
    - step: validate-fact
      index: 0
      path: fact_id
      equals: f1
    - step: validate-fact
      where: { path: fact_id, equals: f2 }
      path: confidence
      equals: high
    - step: aggregate-validations
      path: all_valid
      equals: true

Supported comparators:

equals (primitive)
equalsDeep (structural)
matches (regex)
contains_unordered (array membership ignoring order)
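
The comparators not shown in the example above could be used roughly as follows. This is a sketch under assumptions: the step names match earlier examples, but the output shapes (a facts array of objects, a tags array) are hypothetical.

```yaml
expect:
  outputs:
    # Structural comparison of a whole object (assumed output shape)
    - step: extract-facts
      path: facts[0]
      equalsDeep: { id: f1, claim: "rate limit is 100 rps" }
    # Regex comparison on a string value
    - step: validate-fact
      index: 0
      path: fact_id
      matches: "^f\\d+$"
    # Array membership, order ignored (assumed tags field)
    - step: overview
      path: tags
      contains_unordered: [security, performance]
```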

Workflow Outputs

For workflow testing, use workflow_output to assert on workflow-level outputs (defined in the workflow's outputs: section):

expect:
  workflow_output:
    - path: summary
      contains: "Review completed"
    - path: issues_found
      equals: 3
    - path: categories
      contains_unordered: ["security", "performance"]

Supported comparators for workflow outputs:

equals (primitive)
equalsDeep (structural)
matches (regex)
contains (substring check, can be string or array)
not_contains (forbidden substrings)
contains_unordered (array membership ignoring order)
where (selector with path + equals/matches)
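
For the comparators in this list that the example above leaves out, a hedged sketch might look like this. The paths reuse the ones above; the array form of not_contains is an assumption based on how contains is described.

```yaml
expect:
  workflow_output:
    # Regex check on a string output
    - path: summary
      matches: "^Review completed"
    # Forbidden substring (array form assumed)
    - path: summary
      not_contains: ["internal error"]
    # Structural comparison of the whole array, order-sensitive
    - path: categories
      equalsDeep: ["security", "performance"]
```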

Strict mode and "no calls"

Strict mode (default) fails any executed step without a corresponding expect.calls entry. You can also assert absence explicitly:

expect:
  no_calls:
    - provider: github
      op: issues.createComment
    - provider: slack
      op: chat.postMessage
    - step: extract-facts

Failure Assertions

Assert that a test case fails with a specific error message:

expect:
  fail:
    message_contains: "validation failed"

Assert that strict mode caught an unexpected step execution:

expect:
  strict_violation:
    for_step: unexpected-step
    message_contains: "Step executed without expect"

Reusable Macros

Define reusable assertion blocks in tests.defaults.macros and reference them with use:

tests:
  defaults:
    macros:
      basic-github-check:
        calls:
          - provider: github
            op: checks.create
            at_least: 1

  cases:
    - name: my-test
      event: pr_opened
      expect:
        use: [basic-github-check]
        calls:
          - step: overview
            exactly: 1

Macros are merged with inline expectations, allowing you to compose reusable assertion patterns.
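
Because use takes a list, several macros can presumably be composed in one case. The sketch below assumes that a macro may hold any expect key (here no_calls as well as calls); the macro and case names are hypothetical.

```yaml
tests:
  defaults:
    macros:
      # Hypothetical macro: no Slack messages were posted
      no-slack:
        no_calls:
          - provider: slack
            op: chat.postMessage
      # Hypothetical macro: the overview step ran once
      one-overview:
        calls:
          - step: overview
            exactly: 1

  cases:
    - name: quiet-review
      event: pr_opened
      expect:
        use: [no-slack, one-overview]
```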

LLM Judge

Use llm_judge for semantic evaluation of outputs using an LLM. This is useful when exact string matching or regex isn't enough — for example, verifying that a response is technically accurate, helpful, or follows specific criteria.

Simple verdict (pass/fail)

expect:
  llm_judge:
    - step: chat
      path: text
      prompt: |
        The user asked "How does rate limiting work?"
        Evaluate whether the response:
        1. Actually explains the mechanism (not generic)
        2. Mentions specific technical details
        3. Is well-structured and helpful

The LLM returns { pass: boolean, reason: string }. If pass is false, the test fails with the reason.

Structured extraction with custom schema

Define a custom schema to extract structured fields, then assert on them:

expect:
  llm_judge:
    - step: generate-response
      path: text
      prompt: |
        Analyze this technical response about authentication.
        Extract the requested properties.
      schema:
        properties:
          mentions_oauth:
            type: boolean
            description: "Does the response mention OAuth?"
          mentions_jwt:
            type: boolean
            description: "Does the response mention JWT tokens?"
          quality:
            type: string
            enum: [poor, adequate, good, excellent]
            description: "Overall response quality"
        required: [mentions_oauth, mentions_jwt, quality]
      assert:
        mentions_oauth: true
        quality: "good"

The pass and reason fields are added to every custom schema automatically, so you do not need to declare them.

Assertion types for `assert`

boolean: field: true or field: false — exact match
string: field: "value" — exact string match
array: field: ["item1", "item2"] — checks that all listed items are present in the array
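
The array form is the only one not shown in the earlier example. A minimal sketch follows, assuming the schema accepts JSON Schema style array properties (type: array with items); the mechanisms field is hypothetical.

```yaml
expect:
  llm_judge:
    - step: generate-response
      path: text
      prompt: |
        List the authentication mechanisms the response mentions,
        in lowercase.
      schema:
        properties:
          mechanisms:            # hypothetical extracted field
            type: array
            items:
              type: string
            description: "Authentication mechanisms mentioned"
        required: [mechanisms]
      assert:
        # Passes if both items appear in the extracted array
        mechanisms: ["oauth", "jwt"]
```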

Field reference

| Field | Type | Description |
|-------|------|-------------|
| `step` | string | Step name to evaluate (uses output history) |
| `path` | string | Dot/bracket path into the output |
| `index` | `first` \| `last` \| number | Which output from history (default: `last`) |
| `workflow_output` | boolean | Use workflow output instead of step output |
| `prompt` | string | Required. Evaluation criteria sent to the LLM |
| `model` | string | Override model (default: from config or env) |
| `schema` | `verdict` \| object | Schema mode (default: `verdict` = pass/fail) |
| `assert` | object | Field-level assertions on extracted result |
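
To show the fields that the earlier examples do not use (index, workflow_output, model, and the explicit verdict schema), here is a sketch. The prompts are illustrative, and the summary path assumes the workflow defines an output with that name.

```yaml
expect:
  llm_judge:
    # Judge the first recorded output of a step, with the default schema spelled out
    - step: chat
      index: first
      path: text
      schema: verdict
      prompt: |
        Does the response directly answer the user's question?
    # Judge a workflow-level output with a per-assertion model override
    - workflow_output: true
      path: summary
      model: gemini-2.0-flash
      prompt: |
        Is the summary concise and free of contradictions?
```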

Configuring the judge model

Set defaults for all llm_judge assertions in tests.defaults:

tests:
  defaults:
    llm_judge:
      model: gemini-2.0-flash
      provider: google      # google | openai | anthropic

Or override per-assertion with the model field. The judge uses ProbeAgent internally, so it respects the same environment variables (GOOGLE_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).

You can also set the VISOR_JUDGE_MODEL environment variable as a global default.
