# Model-Benchmark-Suite

A user-friendly Streamlit UI for running various lm_eval-supported benchmarks on large language models and comparing the results with one another.

## Supported Benchmarks

- gpqa_diamond_zeroshot
- gsm8k
- winogrande
- arc_challenge
- hellaswag
- truthfulqa_mc2
- mmlu

## Quick Start

Clone the repo:

```shell
git clone https://github.com/TeichAI/Model-Benchmark-Suite.git
cd Model-Benchmark-Suite
```

Install the dependencies and start the app:

```shell
pip install -r requirements.txt
streamlit run app.py
```

Streamlit serves the app locally (by default at http://localhost:8501).
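The app compares models across shared benchmarks; the repo's actual comparison logic is not shown here, but a minimal stdlib sketch of a per-task score comparison (hypothetical model names and illustrative numbers, not real results) could look like:

```python
# Hypothetical per-task scores, as the comparison view might tabulate them.
scores = {
    "model-a": {"gsm8k": 0.57, "hellaswag": 0.79, "winogrande": 0.72},
    "model-b": {"gsm8k": 0.63, "hellaswag": 0.81, "winogrande": 0.70},
}

def compare(a: str, b: str, scores: dict) -> dict:
    """Return per-task score deltas (b minus a) for tasks both models ran."""
    shared = scores[a].keys() & scores[b].keys()
    return {task: round(scores[b][task] - scores[a][task], 4)
            for task in sorted(shared)}

deltas = compare("model-a", "model-b", scores)
for task, delta in deltas.items():
    print(f"{task}: {delta:+.4f}")
```

Restricting the comparison to tasks both models actually ran avoids misleading deltas when one model has an incomplete benchmark sweep.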