EvalTuitor is a terminal-native evaluation runner for LLM applications. It allows developers to define evaluation suites in TOML, execute them against local or remote OpenAI-compatible models, and browse and compare results inside the terminal.
- Performance: Compiled Rust binary with asynchronous parallel execution support.
- Suite Definition: Configuration-driven test suites defined using standard TOML.
- Interactive Interface: Terminal workspace to inspect test runs, failure outputs, and logs.
- Directory Explorer: Integrated directory tree browser to preview files and switch project paths.
- Configuration Management: Interactive terminal configuration editor to adjust models, limits, and API endpoints.
- Run Comparison: Side-by-side comparison interface to examine output changes between historical runs.
- Git Integration: Direct installer for git hooks to automate evaluations prior to committing or pushing code.
- Report Export: Capability to export detailed run summaries to Markdown files.
A Rust toolchain must be installed on your system.
Clone the repository and compile the release binary:
cargo build --release
The resulting binary will be generated at ./target/release/evaltuitor.
Initialize a new project context in the current directory:
./target/release/evaltuitor --init
This creates:
- evaltuitor.toml (configuration file)
- evals/example.toml (sample evaluation suite)
Execute all evaluation suites in the evals/ directory and launch the interface:
./target/release/evaltuitor
Execute a specific suite:
./target/release/evaltuitor evals/suite.toml
Override the default model:
./target/release/evaltuitor --model openai/gpt-4o
Run evaluations and output results to stdout without launching the interface:
./target/release/evaltuitor --no-tui
Global and provider configurations are set in evaltuitor.toml in the project root:
[defaults]
model = "ollama/llama3.1"
temperature = 0.0
max_tokens = 2048
timeout_secs = 30
parallelism = 4
[providers.ollama]
base_url = "http://localhost:11434"
[providers.openai]
api_key_env = "OPENAI_API_KEY"
[providers.vllm]
base_url = "http://localhost:8000"
Suite files should be stored under the evals/ directory:
[suite]
name = "Summarization Suite"
description = "Verifies LLM summary behaviors"
[config]
model = "openai/gpt-4o"
temperature = 0.2
[[tests]]
id = "summary-length-check"
prompt = "Summarize this: {{input}}"
input = "Evaluating AI systems requires structured testing..."
assert.type = "contains-all"
assert.values = ["AI", "testing"]
Supported assertion types:
- contains-all / contains-any / contains-none (substring checks)
- exact-match (string equivalence)
- regex (regular expression verification)
- llm-judge (semantic evaluation scoring using a rubric prompt)
- max-length / min-length (character length boundaries)
- json-schema (structured output JSON validation)
- custom (arbitrary external shell command execution)
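To make the simpler assertion types concrete, here is a minimal Python sketch of how the `{{input}}` placeholder could be filled and how the substring, equality, regex, and length checks could be evaluated against a model output. It is an illustration only, not EvalTuitor's Rust implementation. The function names `render_prompt` and `check_assertion` and the parameters `expected`, `pattern`, and `limit` are hypothetical, and case-sensitive matching is an assumption. The llm-judge, json-schema, and custom types are not covered.

```python
import re


def render_prompt(template, variables):
    """Fill {{name}} placeholders; names without a value are left untouched."""
    def substitute(match):
        return variables.get(match.group(1), match.group(0))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def check_assertion(kind, output, values=(), expected=None, pattern=None, limit=None):
    """Return True when `output` satisfies the assertion named by `kind`.

    Assumption: all comparisons are case-sensitive.
    """
    if kind == "contains-all":
        return all(v in output for v in values)
    if kind == "contains-any":
        return any(v in output for v in values)
    if kind == "contains-none":
        return not any(v in output for v in values)
    if kind == "exact-match":
        return output == expected
    if kind == "regex":
        return re.search(pattern, output) is not None
    if kind == "max-length":
        return len(output) <= limit
    if kind == "min-length":
        return len(output) >= limit
    raise ValueError(f"assertion type not covered by this sketch: {kind}")


prompt = render_prompt(
    "Summarize this: {{input}}",
    {"input": "Evaluating AI systems requires structured testing..."},
)
# A stand-in for a model response; no model is called here.
sample_output = "Structured testing is central to evaluating AI systems."
passed = check_assertion("contains-all", sample_output, values=["AI", "testing"])
print(prompt)
print(passed)
```

The sample test case from the suite passes here because the stand-in output contains both "AI" and "testing" as substrings.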
Keyboard shortcuts in the main interface:
- j / k or Arrow keys: Navigate suites and test cases.
- Tab: Cycle focus between Suites list, Tests list, and Details pane.
- f: Toggle the display of failed tests only.
- /: Filter test cases by ID or output string.
- s: Toggle the configuration editor.
- o: Toggle the project and directory explorer.
- C: Open the run comparison list.
- E: Export current view details to Markdown.
- R: Re-execute failed test cases.
- ?: Toggle the help overlay.
- q / Esc: Close overlays or exit the application.
Keyboard shortcuts in the project explorer:
- j / k or Arrow keys: Move selection up and down.
- Space / Enter: Expand or collapse the selected directory tree node.
- a: Activate and open the selected folder (or parent folder) as the project.
- Backspace / u: Navigate to the parent directory.
- [ / ]: Scroll up and down in the preview content pane.
- q / Esc: Exit the project explorer.
Manage git pre-commit, post-merge, and pre-push hooks directly:
# Install hooks in the current repository
./target/release/evaltuitor --install-hooks
# List installed hooks
./target/release/evaltuitor --list-hooks
# Uninstall hooks
./target/release/evaltuitor --uninstall-hooks
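This README does not show what the installed hooks contain. As a rough sketch of the general idea, a hook can run the evaluations headlessly and pass the runner's exit code back to git, which aborts a commit or push when a pre-commit or pre-push hook exits non-zero. The Python sketch below assumes the binary path from the build step and the `--no-tui` flag shown earlier. Whether EvalTuitor exits non-zero when tests fail is an assumption, and `run_gate` is a hypothetical helper name.

```python
import subprocess
import sys

# Assumption: path from the build step plus the --no-tui flag documented above.
# Assumption: the runner exits non-zero when at least one test fails.
EVAL_COMMAND = ["./target/release/evaltuitor", "--no-tui"]


def run_gate(command):
    """Run the command and return its exit code, or 127 if it cannot be found."""
    try:
        return subprocess.run(command, check=False).returncode
    except FileNotFoundError:
        print(f"hook: {command[0]} not found", file=sys.stderr)
        return 127


# In an actual hook script, the last line would hand the result back to git:
#     sys.exit(run_gate(EVAL_COMMAND))
```

Returning the code instead of exiting inside `run_gate` keeps the helper easy to test with any command.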