Local Model Bench

Local Model Bench is a local bilingual benchmark UI for comparing language models served through LM Studio.

It focuses on practical, objective model quality checks: every benchmark case expects one canonical JSON answer and is scored automatically. The UI supports matched German and English benchmark suites, stores suite metadata with each run, and keeps rankings separated by suite language. It also shows live progress, per-category results, per-test comparisons, speed metrics, quantization metadata, and ranking views.


What It Does

Runs matched German and English 200-case benchmark suites against LM Studio models.
Scores model outputs with exact, objective JSON checks.
Switches the UI and active suite between German and English.
Keeps German and English runs separate for comparisons and auto-batch decisions.
Shows live test progress, streaming output, TTFT, prefill timing, tokens per second, and total time.
Compares models by overall score, category score, individual tests, speed, size, quantization, and model type.
Detects reported reasoning support and requests the strongest available reasoning mode.
Supports batch testing of not-yet-tested model variants.
Explicitly unloads the previous LM Studio model before starting the next batch model.
Stores all benchmark results locally.

Requirements

Node.js 18 or newer.
LM Studio with the local server enabled.
One or more local chat models available in LM Studio.

No npm install step is required for the current app.

Quick Start

Start LM Studio.
Enable the local LM Studio server, usually at http://localhost:1234/v1.
Start Local Model Bench.

On Windows:

.\start-local-model-bench.cmd

On Ubuntu/Linux:

chmod +x ./start-local-model-bench.sh
./start-local-model-bench.sh

The starter picks the first free port at or above 8787, starts the UI server, and opens the browser automatically.
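
The port selection can be sketched roughly as follows. This is a minimal illustration, not the code in start_ui.mjs: the helper names findFreePort and canBindLocal, the 127.0.0.1 bind address, and the limit of 50 attempts are all assumptions.

```javascript
// Hypothetical sketch: probe ports upward from a start port until one binds.
// Not taken from start_ui.mjs; names and the retry limit are assumptions.
async function canBindLocal(port) {
  const net = await import("node:net");
  return new Promise((resolve) => {
    const server = net.createServer();
    server.once("error", () => resolve(false)); // for example EADDRINUSE
    server.once("listening", () => server.close(() => resolve(true)));
    server.listen(port, "127.0.0.1");
  });
}

async function findFreePort(startPort = 8787, canBind = canBindLocal, maxTries = 50) {
  for (let port = startPort; port < startPort + maxTries; port += 1) {
    if (await canBind(port)) return port;
  }
  throw new Error(`No free port found in ${startPort}..${startPort + maxTries - 1}`);
}
```

The probe function is passed in as a parameter so the search logic can be exercised without opening real sockets.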

To start the UI without opening the browser automatically:

node start_ui.mjs --no-open

Using the UI

Load one or more models in LM Studio.
Open Local Model Bench.
Check that the top-right status shows loaded models.
Select a specific model variant, or choose auto to batch-test all not-yet-tested loaded variants.
Keep temperature at 0 and top_p at 1 for deterministic comparisons.
Start the run.
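
To illustrate the sampling settings from the steps above, here is a sketch of an OpenAI-style chat completion request body, the request format that LM Studio's local server accepts. The helper name buildChatRequest, the example model ID, and the use of streaming are assumptions; the payload the app actually sends may differ.

```javascript
// Hypothetical helper: build an OpenAI-style chat completion request body
// with the deterministic sampling settings recommended above.
function buildChatRequest(modelId, userPrompt) {
  return {
    model: modelId,
    messages: [{ role: "user", content: userPrompt }],
    temperature: 0, // greedy-style decoding in most runtimes
    top_p: 1, // nucleus sampling effectively disabled
    stream: true, // assumed, so time to first token can be observed
  };
}

const body = buildChatRequest("example-model", 'Return {"ok": true} as JSON.');
console.log(JSON.stringify(body, null, 2));
```

A body like this would be sent as a POST request to the chat completions path under the server URL from Quick Start, for example http://localhost:1234/v1/chat/completions.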

Results are written to the local runs folder. That folder is ignored by Git because it can contain private prompts, model outputs, timings, and model names.

Language And Suite Handling

Local Model Bench ships with two matched suites:

The German UI loads the frozen German reference suite.
The English UI loads the English suite derived from that reference.

Both suites contain the same 200 case IDs in the same order, with the same categories, tags, difficulty metadata, and points. The English suite changes only language, field names, and language-specific enum values where needed. Numbers, dates, IDs, ordering constraints, and the expected underlying solution are kept aligned.
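
The alignment rule can be expressed as a small check. This is a sketch under assumptions: the field names id, category, and points are guesses at the case format, and findSuiteMismatches is a hypothetical helper, not the validator the project ships.

```javascript
// Hypothetical check that two suites are aligned case by case.
// The field names (id, category, points) are assumptions about the case format.
function findSuiteMismatches(referenceCases, translatedCases) {
  const problems = [];
  if (referenceCases.length !== translatedCases.length) {
    problems.push(`case count differs: ${referenceCases.length} vs ${translatedCases.length}`);
  }
  const n = Math.min(referenceCases.length, translatedCases.length);
  for (let i = 0; i < n; i += 1) {
    const ref = referenceCases[i];
    const tr = translatedCases[i];
    if (ref.id !== tr.id) {
      // Same position must hold the same case ID in both suites.
      problems.push(`position ${i}: id ${ref.id} vs ${tr.id}`);
      continue;
    }
    for (const field of ["category", "points"]) {
      if (ref[field] !== tr[field]) {
        problems.push(`${ref.id}: ${field} differs (${ref[field]} vs ${tr[field]})`);
      }
    }
  }
  return problems;
}

const germanCases = [{ id: "fmt-001", category: "instruction_format", points: 1 }];
const englishCases = [{ id: "fmt-001", category: "instruction_format", points: 1 }];
console.log(findSuiteMismatches(germanCases, englishCases)); // prints []
```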

Runs store the suite language, suite ID, and suite file. Older runs without this metadata are treated as German runs. Ranking views, comparison filters, and auto-batch mode use the currently selected UI language by default, so a German run does not silently compete with an English run.
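
A minimal sketch of the language fallback and filtering described above. The field name suiteLanguage and the language codes "de" and "en" in run records are assumptions; only the rule that runs without metadata count as German comes from this README.

```javascript
// Hypothetical run records; the field name suiteLanguage is an assumption.
function runLanguage(run) {
  // Older runs without suite metadata are treated as German.
  return run.suiteLanguage ?? "de";
}

function runsForLanguage(runs, uiLanguage) {
  return runs.filter((run) => runLanguage(run) === uiLanguage);
}

const storedRuns = [
  { model: "model-a", suiteLanguage: "en" },
  { model: "model-b", suiteLanguage: "de" },
  { model: "model-c" }, // legacy run without metadata
];
console.log(runsForLanguage(storedRuns, "de").map((r) => r.model)); // prints [ 'model-b', 'model-c' ]
```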

The language switch is disabled while a benchmark is running. This keeps a live run tied to the suite it started with.

Batch Mode

When auto is selected, the app tests loaded model variants that do not already have a complete run for the current suite language and test selection.
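
The selection rule might look like the sketch below. The run fields (model, suiteLanguage, selection, complete) and the function name pendingVariants are hypothetical and do not reflect the app's real schema.

```javascript
// Hypothetical selection of batch candidates. The run fields (model,
// suiteLanguage, selection, complete) are assumptions, not the real schema.
function pendingVariants(loadedVariants, savedRuns, language, selection) {
  const done = new Set(
    savedRuns
      .filter((run) => run.complete)
      .filter((run) => (run.suiteLanguage ?? "de") === language)
      .filter((run) => run.selection === selection)
      .map((run) => run.model)
  );
  // Keep only variants that have no complete run for this language and selection.
  return loadedVariants.filter((variant) => !done.has(variant));
}

const savedRuns = [
  { model: "model-a@q4", suiteLanguage: "en", selection: "all", complete: true },
  { model: "model-b@q8", suiteLanguage: "en", selection: "all", complete: false },
];
console.log(pendingVariants(["model-a@q4", "model-b@q8"], savedRuns, "en", "all"));
// prints [ 'model-b@q8' ]
```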

Between two batch runs, Local Model Bench:

sends an unload request to LM Studio for the completed model,
checks LM Studio's loaded_instances,
waits until the model is fully unloaded,
starts the next model only after that confirmation.

This avoids loading a large second model while the previous one still occupies GPU or unified memory.
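
The wait-for-unload step can be sketched as a polling loop. This is an illustration only: listLoadedInstances is a hypothetical injected function that returns the identifiers LM Studio currently reports as loaded, and the polling interval and timeout are assumed values. How the app actually queries LM Studio is not shown here.

```javascript
// Hypothetical polling loop. listLoadedInstances is an injected async function
// returning the identifiers currently reported as loaded; the interval and
// timeout defaults are assumptions.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitUntilUnloaded(modelId, listLoadedInstances, options = {}) {
  const { intervalMs = 1000, timeoutMs = 120000 } = options;
  const deadline = Date.now() + timeoutMs;
  while (Date.now() <= deadline) {
    const loaded = await listLoadedInstances();
    if (!loaded.includes(modelId)) return true; // confirmed unloaded
    await sleep(intervalMs);
  }
  throw new Error(`Timed out waiting for ${modelId} to unload`);
}
```

The next model would be started only after this function resolves, which mirrors the confirmation step in the list above.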

Benchmark Design

Each suite contains 10 equally weighted categories with 20 tests each:

Instruction & Format
Documents & Context
Data & Tables
Finance & Business
Reasoning & Planning
Coding: Bugfixing
Coding: Review & Architecture
Tool Use & OS
Agentic Behavior & Safety
Multi-Turn & Context
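
One way to read "equally weighted categories" is that the final score is the plain average of the category scores. The sketch below assumes exactly that, plus a result shape of { category, points, passed } per test; both are assumptions, and the app's real aggregation may differ.

```javascript
// Hypothetical aggregation: each category contributes equally to the final score.
// results: [{ category, points, passed }] with one entry per test (assumed shape).
function scoreRun(results) {
  const byCategory = new Map();
  for (const { category, points, passed } of results) {
    const entry = byCategory.get(category) ?? { earned: 0, possible: 0 };
    entry.possible += points;
    if (passed) entry.earned += points;
    byCategory.set(category, entry);
  }
  const categoryScores = {};
  for (const [category, { earned, possible }] of byCategory) {
    categoryScores[category] = possible > 0 ? earned / possible : 0;
  }
  const values = Object.values(categoryScores);
  const finalScore =
    values.length > 0 ? values.reduce((sum, v) => sum + v, 0) / values.length : 0;
  return { categoryScores, finalScore };
}

const demo = scoreRun([
  { category: "Data & Tables", points: 1, passed: true },
  { category: "Data & Tables", points: 1, passed: false },
  { category: "Tool Use & OS", points: 1, passed: true },
]);
console.log(demo.finalScore); // prints 0.75
```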

Every default test has exactly one expected canonical JSON result. A semantically similar but structurally different answer is scored as wrong. This keeps model comparisons objective and reproducible.
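
As an illustration of exact matching, the sketch below compares a model's raw output against an expected object. It assumes that object key order is ignored while array order and value types must match; the README does not say how the real scorer treats key order, and the names canonicalize and isExactMatch are hypothetical.

```javascript
// Hypothetical exact-match scorer. Assumptions: the model output must parse as
// JSON, object key order is ignored, array order and value types must match.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) sorted[key] = canonicalize(value[key]);
    return sorted;
  }
  return value;
}

function isExactMatch(modelOutput, expected) {
  let parsed;
  try {
    parsed = JSON.parse(modelOutput);
  } catch {
    return false; // output is not valid JSON
  }
  return JSON.stringify(canonicalize(parsed)) === JSON.stringify(canonicalize(expected));
}

const expected = { total: 42, items: ["a", "b"] };
console.log(isExactMatch('{"items":["a","b"],"total":42}', expected)); // true
console.log(isExactMatch('{"total":"42","items":["a","b"]}', expected)); // false
console.log(isExactMatch('{"total":42,"items":["b","a"]}', expected)); // false
```

The second and third calls show answers that carry the same information but are still scored as wrong: one changes a value's type, the other changes array order.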

The German suite is the frozen reference for historical comparability. The English suite is maintained as a paired translation and validated against that reference before release.

Metrics

Local Model Bench stores quality and speed metrics per test and per run:

final score,
category scores,
pass/fail status per individual test,
time to first token,
prompt processing or prefill time,
tokens per second,
total runtime,
input and output tokens,
model format,
quantization,
model size,
model type when inferable.
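
The speed metrics can be derived from a few timestamps. The sketch below is an assumption about how such values could be computed, not the app's actual formulas: it measures tokens per second over the generation phase only (after the first token), and the parameter names are hypothetical.

```javascript
// Hypothetical derivation of speed metrics from timestamps in milliseconds.
// Assumption: tokens per second is measured over the generation phase only.
function speedMetrics({ requestStartMs, firstTokenMs, endMs, outputTokens }) {
  const ttftMs = firstTokenMs - requestStartMs; // time to first token
  const generationMs = endMs - firstTokenMs;
  return {
    ttftMs,
    totalMs: endMs - requestStartMs,
    tokensPerSecond: generationMs > 0 ? outputTokens / (generationMs / 1000) : null,
  };
}

console.log(
  speedMetrics({ requestStartMs: 0, firstTokenMs: 400, endMs: 2400, outputTokens: 100 })
);
// prints { ttftMs: 400, totalMs: 2400, tokensPerSecond: 50 }
```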

CLI

Validate the benchmark cases:

node run_eval.mjs --dry-run

Validate the English suite:

node run_eval.mjs --dry-run --lang en

Run a CLI benchmark against the currently loaded model:

node run_eval.mjs

Run the English CLI benchmark:

node run_eval.mjs --lang en

Compare two saved runs:

node compare_runs.mjs runs/<run-a> runs/<run-b> --out runs/comparison.md

The graphical UI is the recommended path for most users.

Privacy

Local Model Bench runs locally and talks to your configured LM Studio server. It does not upload benchmark results anywhere.

Be careful when sharing the runs folder or screenshots. They may include model names, model outputs, timings, prompts, and local configuration details.

Development

Check the project:

npm run check

Rebuild and validate the paired English benchmark suite:

npm run build:cases

Start the UI without opening a browser:

npm start -- --no-open

Maintainer publishing steps are documented in docs/PUBLISHING.md.

Support The Project

If you find Local Model Bench useful, you can support the maintainer through the repository's GitHub Sponsor button once funding links are configured.

Maintainers can configure .github/FUNDING.yml with GitHub Sponsors, Ko-fi, Buy Me a Coffee, PayPal, or another supported funding link.

License

Local Model Bench is released under the MIT License.
