Priority: Medium
README-only documentation won't scale. SWE-bench, for comparison, has a full MkDocs site.
Pages Needed
- Getting Started (install, first run)
- Scenarios (how they work, how to write custom ones)
- Dimensions (what each metric means, how to interpret)
- Comparing Configs (workflow for A/B testing model configs)
- API Reference (types, scoring, engine)
- Contributing (scenario format spec)
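If the site is built with MkDocs, as the SWE-bench comparison suggests, the page list above could map onto a `nav` section roughly like the sketch below. The site name and every file path are placeholders chosen for illustration, not existing files; only the `site_name` and `nav` keys are standard MkDocs configuration.

```yaml
# mkdocs.yml (sketch only; site name and file paths are hypothetical)
site_name: Project Docs   # placeholder, replace with the real project name
nav:
  - Getting Started: getting-started.md      # install, first run
  - Scenarios: scenarios.md                  # how they work, writing custom ones
  - Dimensions: dimensions.md                # what each metric means
  - Comparing Configs: comparing-configs.md  # A/B testing workflow
  - API Reference: api-reference.md          # types, scoring, engine
  - Contributing: contributing.md            # scenario format spec
```

By default MkDocs looks for these Markdown files under a `docs/` directory next to `mkdocs.yml`, and `mkdocs serve` previews the site locally.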