Test DSL Reference

This page documents the .visor.tests.yaml schema used by the Visor test runner.

version: "1.0"
extends: ".visor.yaml"   # required; base config to run under tests

tests:
  defaults:
    strict: true                # default strict mode
    ai_provider: mock           # force AI provider to mock
    prompt_max_chars: 16000     # truncate captured prompts (optional)
    ai_include_code_context: false  # include PR diff/context in AI prompts (default: false)
    fail_on_unexpected_calls: false # fail if unexpected provider calls occur
    frontends: ["github"]       # enable specific frontends during tests
    github_recorder:            # optional negative modes
      error_code: 0             # e.g., 429
      timeout_ms: 0             # e.g., 1000
    macros:                     # reusable expect blocks (see Reusable Macros)
      basic-check:
        calls:
          - step: overview
            at_least: 1
    llm_judge:                    # defaults for LLM-as-judge assertions
      model: gemini-2.0-flash    # judge model (or VISOR_JUDGE_MODEL env)
      provider: google            # google | openai | anthropic
    # Optional: include/exclude checks by tags (same semantics as main CLI)
    tags: "local,fast"         # or [local, fast]
    exclude_tags: "experimental,slow"  # or [experimental, slow]

  hooks:                        # (optional) lifecycle hooks
    before_all:
      exec: <shell-command>     # runs once before all cases
    after_all:
      exec: <shell-command>     # runs once after all cases (always)
    before_each:
      exec: <shell-command>     # runs before each case
    after_each:
      exec: <shell-command>     # runs after each case (always)

  fixtures: []                  # (optional) suite-level custom fixtures

  cases:
    - name: <string>
      description: <markdown>
      skip: false|true
      ai_include_code_context: false  # per-case override

      hooks:                       # (optional) per-case lifecycle hooks
        before:
          exec: <shell-command>  # runs before this case
        after:
          exec: <shell-command>  # runs after this case (always)
          timeout: 10000         # optional timeout in ms

      # Single-event case
      event: pr_opened | pr_updated | pr_closed | issue_opened | issue_comment | manual
      fixture: <builtin|{ builtin, overrides }>
      env: { <KEY>: <VALUE>, ... }
      mocks: { <step>: <value>, <step>[]: [<value>...] }
      workflow_input: { <key>: <value>, ... }  # inputs for workflow testing
      expect: <expect-block>
      strict: true|false         # overrides defaults.strict
      tags: "security,fast"     # optional per-case include filter
      exclude_tags: "slow"      # optional per-case exclude filter
      github_recorder:           # per-case recorder overrides
        error_code: 429

      # OR conversation sugar (auto-expands to flow)
      conversation:
        - role: user|assistant
          text: <string>
          user: <string>                 # optional — sets conversation.current.user
          mocks: { <step>: <value> }     # per-turn mocks
          expect: <expect-block>         # per-turn assertions
      # OR conversation with config
      conversation:
        transport: slack               # default: slack
        thread_id: <string>            # default: auto-generated
        fixture: <string>              # default: local.minimal
        routing: { max_loops: 0 }      # default: { max_loops: 0 }
        turns:
          - role: user
            text: <string>
            user: <string>             # optional — sets conversation.current.user
            mocks: ...
            expect: ...

      # OR flow case
      flow:
        - name: <string>
          event: ...             # per-stage event and fixture
          fixture: ...
          env: ...
          mocks: ...             # merged with flow-level mocks
          routing:               # per-stage routing overrides
            max_loops: 10
          expect: <expect-block>
          strict: true|false     # per-stage fallback to case/defaults
          tags: "security"       # optional per-stage include filter
          exclude_tags: "slow"   # optional per-stage exclude filter
          github_recorder:       # per-stage recorder overrides
            error_code: 500

Lifecycle Hooks

Hooks let you run shell commands at key points in the test lifecycle — useful for seeding databases, starting servers, or cleaning up test data.

Suite-level hooks

Defined under tests.hooks:

tests:
  hooks:
    before_all:
      exec: npx tsx test-data/seed-db.ts
    after_all:
      exec: npx tsx test-data/clean-db.ts
    before_each:
      exec: npx tsx test-data/reset-state.ts
    after_each:
      exec: npx tsx test-data/cleanup-case.ts
  cases: [...]

  • before_all: runs once before any case. If it fails, all cases are skipped.
  • after_all: runs once after all cases. Always runs (like finally).
  • before_each: runs before every case. If it fails, that case is skipped.
  • after_each: runs after every case. Always runs (like finally).

Case-level hooks

Defined under case.hooks:

cases:
  - name: update-settlement
    hooks:
      before:
        exec: npx tsx test-data/seed-db.ts --case update-settlement
      after:
        exec: npx tsx test-data/seed-db.ts --clean
        timeout: 10000   # optional, default 30000ms
    event: manual
    mocks: { ... }

  • before: runs before this specific case (after before_each). If it fails, the case is skipped.
  • after: runs after this specific case (before after_each). Always runs (like finally).

Hook properties

  • exec (string, required): shell command to run.
  • timeout (number, optional): timeout in ms (default: 30000).

Execution order

For each case, hooks run in this order:

  1. before_each (suite)
  2. before (case)
  3. test execution
  4. after (case)
  5. after_each (suite)

Hooks inherit all environment variables from the parent process, so seed scripts can use the same DB_PATH, API keys, etc. that your checks use.

Error handling

  • If before_all fails → all cases are skipped and reported as failed
  • If before_each or before fails → that case is skipped and reported as failed
  • after, after_each, and after_all always run, even if the test or a prior hook failed

Fixtures

  • Built-in GitHub fixtures: gh.pr_open.minimal, gh.pr_sync.minimal, gh.pr_closed.minimal, gh.issue_open.minimal, gh.issue_comment.standard, gh.issue_comment.visor_help, gh.issue_comment.visor_regenerate.
  • Use overrides to tweak titles, numbers, payload slices.

See Fixtures and Mocks for details.
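
For example, a case can select a built-in fixture and override part of its payload. A minimal sketch (the override path below is illustrative; the exact keys depend on the fixture's payload shape):

cases:
  - name: renamed-pr
    event: pr_opened
    fixture:
      builtin: gh.pr_open.minimal
      overrides:
        pull_request:
          title: "feat: add WebSocket support"   # hypothetical payload path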

Mocks

  • Keys are step names; for forEach children use step[] (e.g., validate-fact[]).
  • AI mocks may be structured JSON if a schema is configured for the step; otherwise provide text plus any optional fields your templates read.
  • Command/HTTP mocks emulate the provider's shape (stdout and exit_code, or HTTP body, status, and headers) and bypass real execution.

See Fixtures and Mocks for detailed mock examples.

Inline example (AI with schema + list mocks):

mocks:
  overview:
    text: "Overview body"
    tags: { label: feature, review-effort: 2 }
  extract-facts:
    - { id: f1, claim: "max_parallelism defaults to 4" }
    - { id: f2, claim: "Fast mode is enabled by default" }
  validate-fact[]:
    - { fact_id: f1, is_valid: false, correction: "max_parallelism defaults to 3" }
    - { fact_id: f2, is_valid: true }

Expect block

expect:
  use: [macro-name]           # reference macros from tests.defaults.macros

  calls:
    - step: <name>
      exactly|at_least|at_most: <number>
    - provider: github|slack   # provider-level calls
      op: <rest.op>            # e.g., labels.add, chat.postMessage
      exactly|at_least|at_most: <number>
      args: { contains: [..] }   # provider args matching

  no_calls:
    - step: <name>
    - provider: github|slack
      op: <rest.op>

  prompts:
    - step: <name>
      index: first|last|<N>     # default: last
      where:                    # select a prompt from history, then assert
        contains: [..] | not_contains: [..] | matches: <regex>
      contains: [..]
      not_contains: [..]
      matches: <regex>

  outputs:
    - step: <name>
      index: first|last|<N>
      where: { path: <expr>, equals|matches: <v> }
      path: <expr>              # dot/bracket, e.g. tags['review-effort']
      equals: <primitive>
      equalsDeep: <object>
      matches: <regex>
      contains_unordered: [..]

  workflow_output:              # assert on workflow-level outputs (for workflow testing)
    - path: <output-name>       # path into workflow outputs object
      equals: <primitive>
      equalsDeep: <object>
      matches: <regex>
      contains: <string|[..]>   # substring check
      not_contains: <string|[..]>
      contains_unordered: [..]
      where: { path: <expr>, equals|matches: <v> }

  fail:
    message_contains: <string>  # assert overall case failure message

  strict_violation:             # assert strict failure for a missing expect on a step
    for_step: <name>
    message_contains: <string>

  llm_judge:                     # semantic evaluation via LLM
    - step: <name>               # step to evaluate (uses output history)
      path: <expr>               # dot/bracket path into output
      index: first|last|<N>      # which output (default: last)
      turn: <N>|current          # 1-based turn number (conversation sugar only)
      workflow_output: true       # use workflow output instead
      prompt: <string>           # evaluation criteria (required)
      model: <string>            # override judge model
      schema: verdict|<object>   # verdict (default) or custom schema
      assert:                    # field-level assertions on result
        <field>: <expected>

Supported providers for calls and no_calls:

  • github: GitHub API operations (labels.add, issues.createComment, pulls.createReview, checks.create, checks.update)
  • slack: Slack API operations (chat.postMessage)

See Assertions for detailed assertion syntax and examples (including LLM-as-judge).

Inline example (calls + prompts + outputs):

expect:
  calls:
    - step: overview
      exactly: 1
    - provider: github
      op: labels.add
      at_least: 1
      args: { contains: [feature] }
  prompts:
    - step: overview
      contains: ["feat:", "diff --git a/"]
  outputs:
    - step: overview
      path: "tags['review-effort']"
      equals: 2
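
Inline example (llm_judge), a minimal sketch assuming the default verdict schema and the judge model from tests.defaults.llm_judge; the step name and criteria are illustrative:

expect:
  llm_judge:
    - step: overview
      path: text
      prompt: "Does the overview accurately describe the change without unrelated claims?"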

Note on dependencies: test execution honors your base config routing, including depends_on. You can express ANY‑OF groups using pipe syntax in the base config (e.g., depends_on: ["issue-assistant|comment-assistant"]). The runner mixes these with normal ALL‑OF deps.
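
For illustration, ANY-OF routing lives in the base config rather than the test file. A sketch (assuming checks are declared under a top-level checks: map; the check names are illustrative):

checks:
  summarize-thread:
    depends_on: ["issue-assistant|comment-assistant"]   # ANY-OF: either dependency satisfies it
  label-pr:
    depends_on: ["overview"]                            # normal ALL-OF dependency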

Conversation Sugar

The conversation: format is a shorthand for multi-turn conversation tests. It auto-expands into flow: stages, building execution_context.conversation.messages from prior turns and inserting mock responses into the history.

- name: multi-turn-test
  strict: false
  conversation:
    - role: user
      text: "What is ticket TT-5000 about?"
      mocks:
        chat: { text: "TT-5000 is about WebSocket support.", intent: chat }
      expect:
        calls:
          - step: chat
            exactly: 1
    - role: user
      text: "What middleware changes are needed?"
      mocks:
        chat: { text: "The middleware changes involve...", intent: chat }
      expect:
        outputs:
          - step: chat
            turn: 1              # reference turn 1's output (1-based)
            path: text
            matches: "TT-5000"
        llm_judge:
          - step: chat
            turn: current        # current turn's output
            path: text
            prompt: Does this discuss middleware specifics?

Key features:

  • turn: N (1-based) — references the Nth user turn's output across the conversation. Transformed to index: N-1 internally.
  • turn: current — aliases to index: last
  • Mock response text is automatically added as assistant messages in subsequent turns' history
  • Explicit assistant turns can override mock-inferred responses in the history
  • Config overrides (transport, fixture, routing) via object format with turns: key
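
A sketch of the object form with config overrides (the step name and texts are illustrative):

- name: slack-thread-test
  strict: false
  conversation:
    transport: slack
    fixture: local.minimal
    routing: { max_loops: 0 }
    turns:
      - role: user
        text: "Summarize ticket TT-5000"
        mocks:
          chat: { text: "TT-5000 adds WebSocket support.", intent: chat }
        expect:
          calls:
            - step: chat
              exactly: 1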

Strict mode semantics

  • When strict: true (default), any executed step must appear in expect.calls with a matching count; otherwise the case/stage fails.
  • Use no_calls for explicit absence checks.
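
A minimal sketch: with strict mode on, every executed step needs a matching calls entry, and no_calls documents steps that must not run (step names are illustrative):

cases:
  - name: strict-overview-only
    event: pr_opened
    strict: true
    mocks:
      overview: { text: "Overview body" }
    expect:
      calls:
        - step: overview
          exactly: 1
      no_calls:
        - step: security-scan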

Selectors and paths

  • index: first, last, or 0‑based integer.
  • where: evaluates against the same prompt/output history and selects a single item by content.
  • path: dot/bracket (supports quoted keys: tags['review-effort']).
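
For example, where can select one output from the history before path/equals are applied to it. A sketch reusing the overview mock from the earlier example:

outputs:
  - step: overview
    where: { path: "tags.label", equals: feature }   # pick the output whose tags.label is "feature"
    path: "tags['review-effort']"
    equals: 2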

CLI shortcuts

  • Validate only: visor test --validate --config <path>
  • Run one case: visor test --only label-flow
  • Run one stage: visor test --only pr-review-e2e-flow#facts-invalid
  • JSON/JUnit/Markdown reporters: --json, --report junit:<path>, --summary md:<path>

See CLI Reference for all available options.

Reusable Macros

Define reusable assertion blocks in tests.defaults.macros and reference them with use:

tests:
  defaults:
    macros:
      basic-github-check:
        calls:
          - provider: github
            op: checks.create
            at_least: 1
      overview-ran:
        calls:
          - step: overview
            exactly: 1

  cases:
    - name: my-test
      event: pr_opened
      expect:
        use: [basic-github-check, overview-ran]
        calls:
          - step: extra-step
            exactly: 1

Macros are merged with inline expectations, allowing you to compose reusable assertion patterns.

Workflow Testing

Test standalone workflows by providing workflow_input and asserting on workflow_output:

tests:
  cases:
    - name: test-workflow
      event: manual
      workflow_input:
        repo_url: "https://github.com/example/repo"
        branch: "main"
      mocks:
        fetch-data:
          status: 200
          data: { items: [1, 2, 3] }
      expect:
        workflow_output:
          - path: summary
            contains: "completed"
          - path: items_count
            equals: 3

Tags default semantics in tests

  • The test runner passes tags to the engine using the same rules as the main CLI.
  • If no tags/exclude_tags are specified anywhere (suite defaults, case, or stage), only untagged checks run by default; tagged checks are skipped. This keeps tests deterministic and fast unless you explicitly opt into groups (for example, github).
  • To run GitHub‑tagged checks in tests, add:

tests:
  defaults:
    tags: "github"

JavaScript in Tests and Routing (run_js, goto_js, value_js, transform_js)

Visor evaluates your run_js, goto_js, value_js and transform_js snippets inside a hardened JavaScript sandbox. The goal is to provide a great developer experience with modern JS, while keeping the engine safe and deterministic.

What you can use by default (Node 24, ES2023)

  • Language features: const/let, arrow functions, template strings, destructuring, spread, async/await, Array.prototype.at, findLast/findLastIndex.
  • Arrays: iteration helpers (map, filter, some, every, reduce, keys/values/entries, forEach), non‑mutating helpers (toReversed, toSorted, toSpliced, with), and flat/flatMap.
  • Strings: replaceAll, matchAll, trimStart/End, at, repeat, normalize.
  • Maps/Sets: get/set/has/delete/keys/values/entries/forEach.
  • Date/RegExp: toISOString, getTime, test, exec.
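
A small illustrative snippet that stays within the helpers listed above (it assumes outputs_history is available to the snippet, as in the run_js example further below):

value_js: |
  // Build a sorted id list and an id -> claim lookup from the last extract-facts wave.
  const facts = (outputs_history['extract-facts'] || []).at(-1) || [];
  const ids = facts.map(f => String(f.id || '')).filter(Boolean).toSorted();
  const claimsById = Object.fromEntries(facts.map(f => [f.id, f.claim]));
  return { ids, lastId: ids.at(-1), claimsById };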

What remains intentionally restricted

  • Prototype mutation and reflective escape hatches (e.g., Object.defineProperty, __proto__, setPrototypeOf) are not exposed to sandboxed code.
  • if: and fail_if: conditions are parsed by a small expression DSL (not full JS). Keep them simple (no optional chaining or nullish coalescing in those), or move complex logic to run_js/goto_js.

Tips

  • Prefer non‑mutating array helpers (toReversed, toSorted, with) when deriving new arrays for clarity and correctness.
  • Use Array.prototype.at(-1) to read the last item. Example: const last = (outputs_history['validate-fact'] || []).at(-1) || [];.
  • For reshaping small maps, Object.entries + Object.fromEntries is concise and readable.

Example: wave-scoped correction gate

run_js: |
  const facts = (outputs_history['extract-facts'] || []).at(-1) || [];
  const ids = facts.map(f => String(f.id || '')).filter(Boolean);
  const vf = outputs_history['validate-fact'] || [];
  const lastItems = vf.filter(v => ids.includes(String((v && v.fact_id) || '')));
  const hasProblems = lastItems.some(v => v.is_valid !== true || v.confidence !== 'high');
  if (!hasProblems) return [];
  return (event && event.name) === 'issue_opened' ? ['issue-assistant'] : ['comment-assistant'];

This evaluates the last extract-facts wave, finds the corresponding validate-fact results, and schedules a single correction pass when any item is invalid or low-confidence.