Common issues and how to fix them quickly.
- The runner forces `ai_provider=mock` by default, but if a step overrides the provider to a real one, it can execute with keys from your env.
- Add a mock under `mocks:` for that step to silence the warning, or keep `ai_provider: mock` and avoid overriding providers in tests (see the sketch after this list).
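A minimal sketch of that setup. The exact nesting may differ in your suite (see the DSL Reference); the step name `summarize` and its output field are hypothetical:

```yaml
tests:
  defaults:
    ai_provider: mock          # keep every AI step on the mock provider
cases:
  - name: no-provider-override
    mocks:
      summarize:               # hypothetical AI step; a mock here silences the warning
        summary: "stubbed summary text"
```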
- The step may not have executed in that stage (strict mode would also flag it).
- Your `index` might be out of bounds (e.g., `last` when there are no prompts). Drop the `index` or use `where` to select by content (see the sketch after this list).
- Ensure the step is an AI provider (prompts are only captured for AI steps).
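An illustrative shape for a content-based prompt assertion. The real assertion schema is in Assertions; the step name and matcher value here are assumptions:

```yaml
expect:
  prompts:
    - step: summarize          # hypothetical step name
      where: "Redis"           # select the captured prompt by content, not position
```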
- Check that your AI mock provides the fields your provider expects for the operation (e.g., `overview.tags.label`); see the sketch after this list.
- Use `VISOR_DEBUG=true` to print provider debug output. Confirm the step ran and that the recorder printed the intended operation.
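For the `overview.tags.label` example above, the mock must carry that nested field. Whether `tags` is a list or an object depends on your schema, so treat this as a sketch:

```yaml
mocks:
  overview:
    tags:
      - label: "bugfix"        # the nested field the provider reads (overview.tags.label)
```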
- Add the missing step to `expect.calls` with a count. If the step should not run, add it to `expect.no_calls` and fix your config to avoid running it (sketch below).
- Temporarily set `strict: false` at the case or stage level to iterate quickly (not recommended long term).
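A sketch of those expectations, assuming `calls` maps step names to counts and `no_calls` is a list of step names (the step names are hypothetical):

```yaml
strict: false                  # temporary escape hatch while iterating
expect:
  calls:
    extract-facts: 1           # the flagged step, with its expected call count
  no_calls:
    - post-comment             # hypothetical step that must not run
```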
- Provide array mocks for `extract-facts` and per-call list mocks for `validate-fact[]` (sketch below).
- Remember that aggregation uses the latest validation wave (size inferred from the last `extract-facts` output).
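A sketch of the two mock shapes; the fact fields (`claim`, `valid`, `reason`) are hypothetical placeholders:

```yaml
mocks:
  extract-facts:               # array output: one element per fact to fan out
    - { claim: "uses Redis for caching" }
    - { claim: "adds retry logic" }
  "validate-fact[]":           # per-call list, consumed in order across the wave
    - { valid: true }
    - { valid: false, reason: "no retry code in the diff" }
```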
- Use `--prompt-max-chars` (or `tests.defaults.prompt_max_chars`) to truncate captured text for assertions (config sketch below).
- In config, you can keep the AI context focused with `skip_code_context` if available, or provide narrower mocks in tests.
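The config equivalent of the CLI flag, as a sketch:

```yaml
tests:
  defaults:
    prompt_max_chars: 2000     # truncate captured prompt text before assertions run
```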
- Use the `--debug` CLI flag or set `VISOR_DEBUG=true`.
- To isolate the problem, run a single stage: `visor test --only case#stage` (name substring match) or `--only case#N` (1-based index).
- Example: `visor test --debug --only pr-review-e2e-flow#facts-invalid`
Symptom: a short pause appears before the runner prints the coverage table, often preceded by a line like:
`⏭ on_finish: no result found for "extract-facts" — skip`
What it was: the engine always entered the `on_finish` scan, even when none of the `forEach` parents had produced any results in the current run. Internally it waited for a (now redundant) scan window to complete.
What we changed: we added an early return in `handleOnFinishHooks` when there are zero `forEach` parents with results in the current grouped run. This preserves behavior (no hooks to run) and removes the delay entirely.
Downsides: none functionally. The only trade-off is that debug visibility is slightly reduced in that specific “no parents ran” case; enable `VISOR_DEBUG=true` if you need to trace the discovery step regardless.
Guarantee: if a check executed and defines `on_finish`, `on_finish` still executes for that check once its `forEach` finishes. The early return only triggers when no eligible parent produced results in the run.
- Ensure mock shapes match what the step schema expects. For AI steps with a schema, provide structured fields directly (not wrapped in `returns:`).
- For command/HTTP provider steps, include `stdout`, `exit_code`, `status`, or `body` as appropriate (sketch below).
- Review Fixtures and Mocks for detailed mock examples.
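A sketch of both shapes; the step names are hypothetical:

```yaml
mocks:
  run-linter:                  # hypothetical command step
    stdout: "0 problems found"
    exit_code: 0
  fetch-issue:                 # hypothetical HTTP step
    status: 200
    body: { title: "Fix retry logic" }
```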
- Each stage computes coverage as a delta from the previous stage. If a step from a prior stage executes again, you need to account for it in the current stage's `expect.calls`.
- Check that mocks are merged correctly: stage mocks override flow-level mocks (`{...flow.mocks, ...stage.mocks}`), as sketched below.
- Use `--only case#stage` to isolate and debug a single stage.
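A sketch of the merge behavior inside a flow; the field nesting is illustrative:

```yaml
mocks:                         # flow-level mocks apply to every stage
  overview: { summary: "base summary" }
stages:
  - name: second-pass
    mocks:                     # merged as {...flow.mocks, ...stage.mocks}
      overview: { summary: "stage-specific summary" }
    expect:
      calls:
        overview: 1            # a re-executed prior-stage step counts in this stage's delta
```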
- Ensure `ai_provider: mock` is set in `tests.defaults` for offline, fast execution.
- Use `--max-parallel` to run cases concurrently within a suite.
- Use `--max-suites` when running multiple test files.
- Consider `--prompt-max-chars` to reduce memory usage for large diffs.
- Check for environment variable differences. CI auto-detects and adjusts some defaults (e.g., `VISOR_TEST_PROMPT_MAX_CHARS`).
- Ensure fixtures don't depend on local file paths or network access.
- Run with `--debug` in CI to capture more diagnostic output.
- The judge uses ProbeAgent, which needs API keys in the environment. Set `GOOGLE_API_KEY`, `OPENAI_API_KEY`, or `ANTHROPIC_API_KEY` depending on your provider.
- Configure the provider in test defaults: `tests.defaults.llm_judge.provider: google` (sketch below).
- The default model is `gemini-2.0-flash`. Override with the `VISOR_JUDGE_MODEL` env var or `tests.defaults.llm_judge.model`.
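Put together, the defaults block looks like this sketch:

```yaml
tests:
  defaults:
    llm_judge:
      provider: google
      model: gemini-2.0-flash  # or override via the VISOR_JUDGE_MODEL env var
```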
- The step name doesn't match any executed step. Check your mock structure: with `max_loops: 0` and `chat[]` mocks, outputs land at `chat`, not `chat.generate-response` (sketch below).
- Use `--debug` to see which steps executed and their output history keys.
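A sketch of that situation; the mock's field name is an assumption:

```yaml
mocks:
  "chat[]":                    # per-call list mocks; with max_loops: 0 the loop runs once
    - { response: "first reply" }
expect:
  calls:
    chat: 1                    # outputs land under `chat`, not `chat.generate-response`
```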
- LLM judgments are inherently non-deterministic. Use broader assertions (e.g., check `mentions_redis: true` rather than exact string matches), as sketched below.
- For enum fields, consider whether the LLM might reasonably choose a different value (e.g., "moderate" vs "deep").
- The `pass`/`reason` fields are always present; check the `reason` in the test output for the LLM's explanation.
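An illustrative judge assertion; the nesting under `expect` and any field names besides `pass`/`reason` are assumptions:

```yaml
expect:
  llm_judge:
    mentions_redis: true       # broad boolean check, robust to wording variation
```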
- The LLM didn't return valid JSON. This is rare with schema-constrained output but can happen with some models.
- Try a more capable model (e.g., `gemini-2.0-flash` or `gpt-4o`).
- The judge handles markdown-wrapped JSON (```` ```json ... ``` ````) automatically.
- Getting Started - Introduction to the test framework
- DSL Reference - Complete test YAML schema
- Assertions - Available assertion types
- Fixtures and Mocks - Managing test data
- Flows - Multi-stage test flows
- Cookbook - Copy-pasteable test recipes
- CLI - Test runner command line options
- CI Integration - Running tests in CI pipelines