Priority: Future
SWE-bench publishes public rankings at swebench.com.
Proposal
Accept JSON result submissions via PR to a results/ directory. A GitHub Pages site renders a leaderboard table sortable by any of the 4 dimensions. Each entry links to the full JSON with its model card.
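
A minimal sketch of the rendering step, to make the proposal concrete. It assumes each submitted file is a JSON object with `model`, `hardware`, `config`, and a `scores` mapping; these field names, the placeholder dimension name `dim_a`, and the helper names `load_results` and `render_table` are all hypothetical, since the result schema is not fixed yet.

```python
import json
from pathlib import Path


def load_results(results_dir):
    """Read every *.json file in results_dir into a list of dicts."""
    entries = []
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as f:
            entry = json.load(f)
        # Remember the file name so the table can link back to the full JSON.
        entry["source"] = path.name
        entries.append(entry)
    return entries


def render_table(entries, sort_key):
    """Return a Markdown table sorted by scores[sort_key], highest first.

    Assumes (hypothetically) that a higher score is better for sort_key.
    """
    ranked = sorted(entries, key=lambda e: e["scores"][sort_key], reverse=True)
    lines = [
        "| Model | Hardware | Config | Score | Result |",
        "| --- | --- | --- | --- | --- |",
    ]
    for e in ranked:
        link = f"[{e['source']}](results/{e['source']})"
        lines.append(
            f"| {e['model']} | {e['hardware']} | {e['config']} "
            f"| {e['scores'][sort_key]} | {link} |"
        )
    return "\n".join(lines)
```

A Pages build step could call `render_table(load_results("results"), "dim_a")` once per dimension and write each table to its own page.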
Categories:
- By model (qwen3-coder, llama, mistral, etc.)
- By hardware (GB10, A100, H100, consumer GPU)
- By config (FP8, NVFP4, speculative, etc.)
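
The three category views could be built by grouping entries on one field. The sketch below assumes each entry is a dict with `model`, `hardware`, and `config` keys; those field names and the `group_by` helper are hypothetical, not part of any agreed schema.

```python
from collections import defaultdict

# Hypothetical entry fields, one per leaderboard category.
CATEGORIES = ("model", "hardware", "config")


def group_by(entries, category):
    """Group result entries by one category field, preserving input order."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    groups = defaultdict(list)
    for entry in entries:
        groups[entry[category]].append(entry)
    return dict(groups)
```

For example, `group_by(entries, "hardware")` would return one list of entries per hardware value, each of which can then be ranked on its own.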