This page documents the `.visor.tests.yaml` schema used by the Visor test runner.
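For orientation, here is a minimal suite sketch (step and fixture names are illustrative, drawn from the examples on this page):

```yaml
version: "1.0"
extends: ".visor.yaml"
tests:
  defaults:
    strict: true
    ai_provider: mock
  cases:
    - name: pr-open-smoke
      event: pr_opened
      fixture: gh.pr_open.minimal
      mocks:
        overview: { text: "Looks good." }
      expect:
        calls:
          - step: overview
            exactly: 1
```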
```yaml
version: "1.0"
extends: ".visor.yaml"              # required; base config to run under tests
tests:
  defaults:
    strict: true                    # default strict mode
    ai_provider: mock               # force AI provider to mock
    prompt_max_chars: 16000         # truncate captured prompts (optional)
    ai_include_code_context: false  # include PR diff/context in AI prompts (default: false)
    fail_on_unexpected_calls: false # fail if unexpected provider calls occur
    frontends: ["github"]           # enable specific frontends during tests
    github_recorder:                # optional negative modes
      error_code: 0                 # e.g., 429
      timeout_ms: 0                 # e.g., 1000
    macros:                         # reusable expect blocks (see Reusable Macros)
      basic-check:
        calls:
          - step: overview
            at_least: 1
    llm_judge:                      # defaults for LLM-as-judge assertions
      model: gemini-2.0-flash       # judge model (or VISOR_JUDGE_MODEL env)
      provider: google              # google | openai | anthropic
    # Optional: include/exclude checks by tags (same semantics as main CLI)
    tags: "local,fast"              # or [local, fast]
    exclude_tags: "experimental,slow"  # or [experimental, slow]
  hooks:                            # (optional) lifecycle hooks
    before_all:
      exec: <shell-command>         # runs once before all cases
    after_all:
      exec: <shell-command>         # runs once after all cases (always)
    before_each:
      exec: <shell-command>         # runs before each case
    after_each:
      exec: <shell-command>         # runs after each case (always)
  fixtures: []                      # (optional) suite-level custom fixtures
  cases:
    - name: <string>
      description: <markdown>
      skip: false|true
      ai_include_code_context: false  # per-case override
      hooks:                        # (optional) per-case lifecycle hooks
        before:
          exec: <shell-command>     # runs before this case
        after:
          exec: <shell-command>     # runs after this case (always)
          timeout: 10000            # optional timeout in ms (valid on any hook)
      # Single-event case
      event: pr_opened | pr_updated | pr_closed | issue_opened | issue_comment | manual
      fixture: <builtin|{ builtin, overrides }>
      env: { <KEY>: <VALUE>, ... }
      mocks: { <step>: <value>, <step>[]: [<value>...] }
      workflow_input: { <key>: <value>, ... }  # inputs for workflow testing
      expect: <expect-block>
      strict: true|false            # overrides defaults.strict
      tags: "security,fast"         # optional per-case include filter
      exclude_tags: "slow"          # optional per-case exclude filter
      github_recorder:              # per-case recorder overrides
        error_code: 429
      # OR conversation sugar (auto-expands to flow)
      conversation:
        - role: user|assistant
          text: <string>
          user: <string>            # optional — sets conversation.current.user
          mocks: { <step>: <value> }  # per-turn mocks
          expect: <expect-block>    # per-turn assertions
      # OR conversation with config
      conversation:
        transport: slack            # default: slack
        thread_id: <string>         # default: auto-generated
        fixture: <string>           # default: local.minimal
        routing: { max_loops: 0 }   # default: { max_loops: 0 }
        turns:
          - role: user
            text: <string>
            user: <string>          # optional — sets conversation.current.user
            mocks: ...
            expect: ...
      # OR flow case
      flow:
        - name: <string>
          event: ...                # per-stage event and fixture
          fixture: ...
          env: ...
          mocks: ...                # merged with flow-level mocks
          routing:                  # per-stage routing overrides
            max_loops: 10
          expect: <expect-block>
          strict: true|false        # per-stage fallback to case/defaults
          tags: "security"          # optional per-stage include filter
          exclude_tags: "slow"      # optional per-stage exclude filter
          github_recorder:          # per-stage recorder overrides
            error_code: 500
```

Hooks let you run shell commands at key points in the test lifecycle — useful for seeding databases, starting servers, or cleaning up test data.
Suite-level hooks are defined under `tests.hooks`:

```yaml
tests:
  hooks:
    before_all:
      exec: npx tsx test-data/seed-db.ts
    after_all:
      exec: npx tsx test-data/clean-db.ts
    before_each:
      exec: npx tsx test-data/reset-state.ts
    after_each:
      exec: npx tsx test-data/cleanup-case.ts
  cases: [...]
```

| Hook | When | Runs |
|---|---|---|
| `before_all` | Once before any case | If it fails, all cases are skipped |
| `after_all` | Once after all cases | Always runs (like `finally`) |
| `before_each` | Before every case | If it fails, that case is skipped |
| `after_each` | After every case | Always runs (like `finally`) |
Per-case hooks are defined under `case.hooks`:

```yaml
cases:
  - name: update-settlement
    hooks:
      before:
        exec: npx tsx test-data/seed-db.ts --case update-settlement
      after:
        exec: npx tsx test-data/seed-db.ts --clean
        timeout: 10000 # optional, default 30000ms
    event: manual
    mocks: { ... }
```

| Hook | When | Runs |
|---|---|---|
| `before` | Before this specific case (after `before_each`) | If it fails, the case is skipped |
| `after` | After this specific case (before `after_each`) | Always runs (like `finally`) |
| Property | Type | Required | Description |
|---|---|---|---|
| `exec` | string | yes | Shell command to run |
| `timeout` | number | no | Timeout in ms (default: 30000) |
For each case, hooks run in this order:

1. `before_each` (suite)
2. `before` (case)
3. Test execution
4. `after` (case)
5. `after_each` (suite)
Hooks inherit all environment variables from the parent process, so seed scripts can use the same `DB_PATH`, API keys, etc. that your checks use.
- If `before_all` fails → all cases are skipped and reported as failed
- If `before_each` or `before` fails → that case is skipped and reported as failed
- `after`, `after_each`, and `after_all` always run, even if the test or a prior hook failed
- Built-in GitHub fixtures: `gh.pr_open.minimal`, `gh.pr_sync.minimal`, `gh.pr_closed.minimal`, `gh.issue_open.minimal`, `gh.issue_comment.standard`, `gh.issue_comment.visor_help`, `gh.issue_comment.visor_regenerate`.
- Use `overrides` to tweak titles, numbers, and payload slices.
See Fixtures and Mocks for details.
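As a sketch, a case can start from a built-in fixture and override a slice of the payload; the override paths below are illustrative (see Fixtures and Mocks for the exact payload shapes):

```yaml
cases:
  - name: large-feature-pr
    event: pr_opened
    fixture:
      builtin: gh.pr_open.minimal
      overrides:
        pull_request:
          title: "feat: add WebSocket support"
          number: 1234
```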
- Keys are step names; for forEach children use `step[]` (e.g., `validate-fact[]`).
- AI mocks may be structured JSON if a schema is configured for the step; otherwise use `text` and optional fields used by templates.
- Command/HTTP mocks emulate the provider shape (`stdout`, `exit_code`, or HTTP body/status/headers) and bypass real execution.
See Fixtures and Mocks for detailed mock examples.
Inline example (AI with schema + list mocks):

```yaml
mocks:
  overview:
    text: "Overview body"
    tags: { label: feature, review-effort: 2 }
  extract-facts:
    - { id: f1, claim: "max_parallelism defaults to 4" }
    - { id: f2, claim: "Fast mode is enabled by default" }
  validate-fact[]:
    - { fact_id: f1, is_valid: false, correction: "max_parallelism defaults to 3" }
    - { fact_id: f2, is_valid: true }
```

The `expect` block schema:

```yaml
expect:
  use: [macro-name]            # reference macros from tests.defaults.macros
  calls:
    - step: <name>
      exactly|at_least|at_most: <number>
    - provider: github|slack   # provider-level calls
      op: <rest.op>            # e.g., labels.add, chat.postMessage
      exactly|at_least|at_most: <number>
      args: { contains: [..] } # provider args matching
  no_calls:
    - step: <name>
    - provider: github|slack
      op: <rest.op>
  prompts:
    - step: <name>
      index: first|last|<N>    # default: last
      where:                   # select a prompt from history, then assert
        contains: [..] | not_contains: [..] | matches: <regex>
      contains: [..]
      not_contains: [..]
      matches: <regex>
  outputs:
    - step: <name>
      index: first|last|<N>
      where: { path: <expr>, equals|matches: <v> }
      path: <expr>             # dot/bracket, e.g. tags['review-effort']
      equals: <primitive>
      equalsDeep: <object>
      matches: <regex>
      contains_unordered: [..]
  workflow_output:             # assert on workflow-level outputs (for workflow testing)
    - path: <output-name>      # path into workflow outputs object
      equals: <primitive>
      equalsDeep: <object>
      matches: <regex>
      contains: <string|[..]>  # substring check
      not_contains: <string|[..]>
      contains_unordered: [..]
      where: { path: <expr>, equals|matches: <v> }
  fail:
    message_contains: <string> # assert overall case failure message
  strict_violation:            # assert strict failure for a missing expect on a step
    for_step: <name>
    message_contains: <string>
  llm_judge:                   # semantic evaluation via LLM
    - step: <name>             # step to evaluate (uses output history)
      path: <expr>             # dot/bracket path into output
      index: first|last|<N>    # which output (default: last)
      turn: <N>|current        # 1-based turn number (conversation sugar only)
      workflow_output: true    # use workflow output instead
      prompt: <string>         # evaluation criteria (required)
      model: <string>          # override judge model
      schema: verdict|<object> # verdict (default) or custom schema
      assert:                  # field-level assertions on result
        <field>: <expected>
```

Supported providers for `calls` and `no_calls`:
- `github`: GitHub API operations (`labels.add`, `issues.createComment`, `pulls.createReview`, `checks.create`, `checks.update`)
- `slack`: Slack API operations (`chat.postMessage`)
See Assertions for detailed assertion syntax and examples (including LLM-as-judge).
Inline example (calls + prompts + outputs):

```yaml
expect:
  calls:
    - step: overview
      exactly: 1
    - provider: github
      op: labels.add
      at_least: 1
      args: { contains: [feature] }
  prompts:
    - step: overview
      contains: ["feat:", "diff --git a/"]
  outputs:
    - step: overview
      path: "tags['review-effort']"
      equals: 2
```

Note on dependencies: test execution honors your base config routing, including `depends_on`. You can express ANY-OF groups using pipe syntax in the base config (e.g., `depends_on: ["issue-assistant|comment-assistant"]`). The runner mixes these with normal ALL-OF deps.
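For instance, a base-config excerpt mixing an ANY-OF group with a normal dependency might look like this (the `checks:` key and check names are assumptions for illustration):

```yaml
checks:
  summarize:
    # Runs after overview AND after EITHER assistant completes.
    depends_on: ["overview", "issue-assistant|comment-assistant"]
```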
The `conversation:` format is a shorthand for multi-turn conversation tests. It auto-expands into `flow:` stages, building `execution_context.conversation.messages` from prior turns and inserting mock responses into the history.
```yaml
- name: multi-turn-test
  strict: false
  conversation:
    - role: user
      text: "What is ticket TT-5000 about?"
      mocks:
        chat: { text: "TT-5000 is about WebSocket support.", intent: chat }
      expect:
        calls:
          - step: chat
            exactly: 1
    - role: user
      text: "What middleware changes are needed?"
      mocks:
        chat: { text: "The middleware changes involve...", intent: chat }
      expect:
        outputs:
          - step: chat
            turn: 1          # reference turn 1's output (1-based)
            path: text
            matches: "TT-5000"
        llm_judge:
          - step: chat
            turn: current    # current turn's output
            path: text
            prompt: Does this discuss middleware specifics?
```

Key features:
- `turn: N` (1-based) references the Nth user turn's output across the conversation. Transformed to `index: N-1` internally.
- `turn: current` aliases to `index: last`.
- Mock response text is automatically added as assistant messages in subsequent turns' history.
- Explicit `assistant` turns can override mock-inferred responses in the history.
- Config overrides (transport, fixture, routing) are available via the object format with a `turns:` key.
- When `strict: true` (the default), any executed step must appear in `expect.calls` with a matching count; otherwise the case/stage fails.
- Use `no_calls` for explicit absence checks.
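Under `strict: true`, an expect block therefore has to enumerate every executed step; a minimal sketch (step and op names are illustrative):

```yaml
expect:
  calls:
    - step: overview
      exactly: 1
  no_calls:
    - provider: github
      op: issues.createComment
```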
- `index`: `first`, `last`, or a 0-based integer.
- `where`: evaluates against the same prompt/output history and selects a single item by content.
- `path`: dot/bracket (supports quoted keys: `tags['review-effort']`).
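For example, `where` can pick one prompt out of several before asserting on it (step name and strings are illustrative):

```yaml
prompts:
  - step: overview
    where:
      contains: ["diff --git"]  # select the prompt that included the diff
    not_contains: ["TODO"]      # then assert on the selected prompt
```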
- Validate only: `visor test --validate --config <path>`
- Run one case: `visor test --only label-flow`
- Run one stage: `visor test --only pr-review-e2e-flow#facts-invalid`
- JSON/JUnit/Markdown reporters: `--json`, `--report junit:<path>`, `--summary md:<path>`
See CLI Reference for all available options.
Define reusable assertion blocks in `tests.defaults.macros` and reference them with `use:`:
```yaml
tests:
  defaults:
    macros:
      basic-github-check:
        calls:
          - provider: github
            op: checks.create
            at_least: 1
      overview-ran:
        calls:
          - step: overview
            exactly: 1
  cases:
    - name: my-test
      event: pr_opened
      expect:
        use: [basic-github-check, overview-ran]
        calls:
          - step: extra-step
            exactly: 1
```

Macros are merged with inline expectations, allowing you to compose reusable assertion patterns.
Test standalone workflows by providing `workflow_input` and asserting on `workflow_output`:
```yaml
tests:
  cases:
    - name: test-workflow
      event: manual
      workflow_input:
        repo_url: "https://github.com/example/repo"
        branch: "main"
      mocks:
        fetch-data:
          status: 200
          data: { items: [1, 2, 3] }
      expect:
        workflow_output:
          - path: summary
            contains: "completed"
          - path: items_count
            equals: 3
```

- The test runner passes tags to the engine using the same rules as the main CLI.
- If no tags/exclude_tags are specified anywhere (suite defaults, case, or stage), only untagged checks run by default; tagged checks are skipped. This keeps tests deterministic and fast unless you explicitly opt into groups (for example, `github`).
- To run GitHub-tagged checks in tests, add:
```yaml
tests:
  defaults:
    tags: "github"
```

Visor evaluates your `run_js`, `goto_js`, `value_js` and `transform_js` snippets inside a hardened JavaScript sandbox. The goal is to provide a great developer experience with modern JS, while keeping the engine safe and deterministic.
What you can use by default (Node 24, ES2023)
- Language features: `const`/`let`, arrow functions, template strings, destructuring, spread, async/await, `Array.prototype.at`, `findLast`/`findLastIndex`.
- Arrays: iteration helpers (`map`, `filter`, `some`, `every`, `reduce`, `keys`/`values`/`entries`, `forEach`), non-mutating helpers (`toReversed`, `toSorted`, `toSpliced`, `with`), and `flat`/`flatMap`.
- Strings: `replaceAll`, `matchAll`, `trimStart`/`trimEnd`, `at`, `repeat`, `normalize`.
- Maps/Sets: `get`/`set`/`has`/`delete`/`keys`/`values`/`entries`/`forEach`.
- Date/RegExp: `toISOString`, `getTime`, `test`, `exec`.
What remains intentionally restricted
- Prototype mutation and reflective escape hatches (e.g., `Object.defineProperty`, `__proto__`, `setPrototypeOf`) are not exposed to sandboxed code.
- `if:` and `fail_if:` conditions are parsed by a small expression DSL (not full JS). Keep them simple (no optional chaining or nullish coalescing in those), or move complex logic to `run_js`/`goto_js`.
Tips
- Prefer non-mutating array helpers (`toReversed`, `toSorted`, `with`) when deriving new arrays for clarity and correctness.
- Use `Array.prototype.at(-1)` to read the last item. Example: `const last = (outputs_history['validate-fact'] || []).at(-1) || [];`
- For reshaping small maps, `Object.entries` + `Object.fromEntries` is concise and readable.
Example: wave-scoped correction gate
```yaml
run_js: |
  const facts = (outputs_history['extract-facts'] || []).at(-1) || [];
  const ids = facts.map(f => String(f.id || '')).filter(Boolean);
  const vf = outputs_history['validate-fact'] || [];
  const lastItems = vf.filter(v => ids.includes(String((v && v.fact_id) || '')));
  const hasProblems = lastItems.some(v => v.is_valid !== true || v.confidence !== 'high');
  if (!hasProblems) return [];
  return (event && event.name) === 'issue_opened' ? ['issue-assistant'] : ['comment-assistant'];
```

This evaluates the last `extract-facts` wave, finds the corresponding `validate-fact` results, and schedules a single correction pass when any item is invalid or low-confidence.
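Outside the sandbox, the same gate can be exercised as a plain function; the `history` shape below (a flat item list for `validate-fact`) and the data are assumptions for illustration:

```javascript
// Standalone sketch of the correction gate above, in plain Node.
// `history` and `event` stand in for the sandbox globals outputs_history and event.
function correctionGate(history, event) {
  const facts = (history['extract-facts'] || []).at(-1) || [];
  const ids = facts.map(f => String(f.id || '')).filter(Boolean);
  const vf = history['validate-fact'] || [];
  const lastItems = vf.filter(v => ids.includes(String((v && v.fact_id) || '')));
  const hasProblems = lastItems.some(v => v.is_valid !== true || v.confidence !== 'high');
  if (!hasProblems) return [];
  return (event && event.name) === 'issue_opened' ? ['issue-assistant'] : ['comment-assistant'];
}

const history = {
  'extract-facts': [[{ id: 'f1' }, { id: 'f2' }]],
  'validate-fact': [
    { fact_id: 'f1', is_valid: false, confidence: 'high' }, // invalid -> triggers the gate
    { fact_id: 'f2', is_valid: true, confidence: 'high' },
  ],
};
console.log(correctionGate(history, { name: 'issue_opened' }));
```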