afftab/evaltuitor

EvalTuitor

EvalTuitor is a terminal-native evaluation runner for LLM applications. It allows developers to define evaluation suites in TOML, execute them against local or remote OpenAI-compatible models, and browse and compare results inside the terminal.


Technical Features

  • Performance: Compiled Rust binary with asynchronous parallel execution support.
  • Suite Definition: Configuration-driven test suites defined using standard TOML.
  • Interactive Interface: Terminal workspace to inspect test runs, failure outputs, and logs.
  • Directory Explorer: Integrated directory tree browser to preview files and switch project paths.
  • Configuration Management: Interactive terminal configuration editor to adjust models, limits, and API endpoints.
  • Run Comparison: Side-by-side comparison interface to examine output changes between historical runs.
  • Git Integration: Direct installer for git hooks to automate evaluations prior to committing or pushing code.
  • Report Export: Capability to export detailed run summaries to Markdown files.

Installation

Prerequisites

A Rust toolchain must be installed on your system.

Build Step

After cloning the repository, compile the release binary from the repository root:

cargo build --release

The resulting binary is written to ./target/release/evaltuitor.


Project Initialization

Initialize a new project context in the current directory:

./target/release/evaltuitor --init

This creates:

  • evaltuitor.toml (Configuration file)
  • evals/example.toml (Sample evaluation suite)

Command Line Usage

Execute all evaluation suites in the evals/ directory and launch the interface:

./target/release/evaltuitor

Execute a specific suite:

./target/release/evaltuitor evals/suite.toml

Override the default model:

./target/release/evaltuitor --model openai/gpt-4o

Run evaluations and output results to stdout without launching the interface:

./target/release/evaltuitor --no-tui

Configuration Schema (evaltuitor.toml)

Global defaults and provider settings are set in evaltuitor.toml at the project root:

[defaults]
model = "ollama/llama3.1"
temperature = 0.0
max_tokens = 2048
timeout_secs = 30
parallelism = 4

[providers.ollama]
base_url = "http://localhost:11434"

[providers.openai]
api_key_env = "OPENAI_API_KEY"

[providers.vllm]
base_url = "http://localhost:8000"
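
The model values in these examples ("ollama/llama3.1", "openai/gpt-4o") look like a provider name and a model name joined by a slash, with the provider name matching a [providers.<name>] table. That mapping is an inference from the examples, not something this README states. Under that assumption, a minimal Rust sketch of splitting such a value could look like the following; ModelSpec and parse_model_spec are hypothetical names, not part of EvalTuitor.

```rust
/// Hypothetical type: a model reference split into provider and model parts.
#[derive(Debug, PartialEq)]
struct ModelSpec<'a> {
    provider: &'a str,
    model: &'a str,
}

/// Splits "provider/model" at the first '/'.
/// Assumption: everything after the first slash belongs to the model name.
/// Returns None when the separator is missing or either side is empty.
fn parse_model_spec(spec: &str) -> Option<ModelSpec<'_>> {
    let (provider, model) = spec.split_once('/')?;
    if provider.is_empty() || model.is_empty() {
        return None;
    }
    Some(ModelSpec { provider, model })
}
```

Splitting at the first slash keeps model names that themselves contain slashes intact, which may or may not match how the tool resolves them.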

Test Suite Schema

Suite files should be stored under the evals/ directory:

[suite]
name = "Summarization Suite"
description = "Verifies LLM summary behaviors"

[config]
model = "openai/gpt-4o"
temperature = 0.2

[[tests]]
id = "summary-length-check"
prompt = "Summarize this: {{input}}"
input = "Evaluating AI systems requires structured testing..."
assert.type = "contains-all"
assert.values = ["AI", "testing"]
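
The prompt in this example contains an {{input}} placeholder, which presumably is replaced by the test's input value before the request is sent. As a rough illustration, assuming plain textual substitution with no escaping or nested templates, the step could be sketched in Rust as below; render_prompt is a hypothetical helper, not EvalTuitor's API.

```rust
/// Hypothetical helper: replaces every `{{input}}` placeholder in
/// `template` with `input`. Assumes literal substitution only.
fn render_prompt(template: &str, input: &str) -> String {
    template.replace("{{input}}", input)
}
```

A template without the placeholder is returned unchanged.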

Supported Assertion Types

  • contains-all / contains-any / contains-none (substring checks)
  • exact-match (string equivalence)
  • regex (regular expression verification)
  • llm-judge (semantic evaluation scoring using a rubric prompt)
  • max-length / min-length (character length boundaries)
  • json-schema (structured output JSON validation)
  • custom (arbitrary external shell command execution)
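
To make the simpler assertion types in the list above concrete, here is a minimal Rust sketch of how the substring, equality, and length checks might be evaluated. The Assertion enum and its check method are hypothetical. Case-sensitive matching and counting length in Unicode scalar values are assumptions, since this README does not specify either.

```rust
/// Hypothetical in-memory form of a few assertion types.
enum Assertion {
    ContainsAll(Vec<String>),
    ContainsAny(Vec<String>),
    ContainsNone(Vec<String>),
    ExactMatch(String),
    MaxLength(usize),
    MinLength(usize),
}

impl Assertion {
    /// Returns true when `output` satisfies the assertion.
    /// Assumptions: case-sensitive substrings, length counted in chars.
    fn check(&self, output: &str) -> bool {
        match self {
            Assertion::ContainsAll(vals) => vals.iter().all(|v| output.contains(v.as_str())),
            Assertion::ContainsAny(vals) => vals.iter().any(|v| output.contains(v.as_str())),
            Assertion::ContainsNone(vals) => !vals.iter().any(|v| output.contains(v.as_str())),
            Assertion::ExactMatch(expected) => output == expected.as_str(),
            Assertion::MaxLength(n) => output.chars().count() <= *n,
            Assertion::MinLength(n) => output.chars().count() >= *n,
        }
    }
}
```

The regex, llm-judge, json-schema, and custom types are left out because they need external crates, a model call, or a subprocess.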

Keyboard Reference

Standard Interface

  • j / k or Arrow keys: Navigate suites and test cases.
  • Tab: Cycle focus between Suites list, Tests list, and Details pane.
  • f: Toggle the display of failed tests only.
  • /: Filter test cases by ID or output string.
  • s: Toggle the configuration editor.
  • o: Toggle the project and directory explorer.
  • C: Open the run comparison list.
  • E: Export current view details to Markdown.
  • R: Re-execute failed test cases.
  • ?: Toggle the help overlay.
  • q / Esc: Close overlays or exit the application.
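
The f and / keys described above both narrow the visible test list. One way the two filters could combine is sketched below in Rust. TestResult and visible are hypothetical names, and applying both filters together with case-sensitive substring matching is an assumption, not documented behavior.

```rust
/// Hypothetical record of one executed test case.
struct TestResult {
    id: String,
    output: String,
    passed: bool,
}

/// Returns the results that stay visible under the two filters.
/// `failed_only` mirrors the `f` toggle; `query` mirrors the `/` filter
/// and matches against the test ID or the output (empty query matches all).
fn visible<'a>(results: &'a [TestResult], failed_only: bool, query: &str) -> Vec<&'a TestResult> {
    results
        .iter()
        .filter(|r| !failed_only || !r.passed)
        .filter(|r| query.is_empty() || r.id.contains(query) || r.output.contains(query))
        .collect()
}
```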

Directory Explorer & Project Opener

  • j / k or Arrow keys: Move selection up and down.
  • Space / Enter: Expand or collapse the selected directory tree node.
  • a: Activate and open the selected folder (or parent folder) as the project.
  • Backspace / u: Navigate to the parent directory.
  • [ / ]: Scroll up and down in the preview content pane.
  • q / Esc: Exit the project explorer.

Git Hooks Setup

Manage Git pre-commit, post-merge, and pre-push hooks from the command line:

# Install hooks in the current repository
./target/release/evaltuitor --install-hooks

# List installed hooks
./target/release/evaltuitor --list-hooks

# Uninstall hooks
./target/release/evaltuitor --uninstall-hooks
