@mentatbot mentatbot bot commented Aug 22, 2025

This PR updates the default-selected models on the main page chart:

  • Remove: GPT-4.1 (openai/gpt-4.1), o3 (openai/o3), Grok 3 (x-ai/grok-3-beta)
  • Add: GPT-5 (medium) (openai/gpt-5), GPT-5 (minimal) (openai/gpt-5minimal), Claude Opus 4.1 (anthropic/claude-opus-4.1)

Rationale:

  • Aligns the default selection with the latest models we want to highlight
  • Keeps Sonnet 4, Gemini 2.5 Pro 06-05, and Grok 4 as part of the default comparison set

Implementation details:

  • Modify defaultSelectedModels in docs/index.html to reflect the new default set
  • No changes to data aggregation or chart rendering logic
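
For reviewers, the set change described above can be sketched as a small standalone script. This is only an illustration under stated assumptions, not the code in docs/index.html: the helper name `update_defaults` is hypothetical, the previous default list and its order are assumed, and the three `kept-*` entries are placeholders because this description does not give the identifiers for Sonnet 4, Gemini 2.5 Pro 06-05, or Grok 4. Only the six removed and added identifiers are copied from the lists above.

```python
# Identifiers copied from the PR description.
REMOVED = ["openai/gpt-4.1", "openai/o3", "x-ai/grok-3-beta"]
ADDED = ["openai/gpt-5", "openai/gpt-5minimal", "anthropic/claude-opus-4.1"]


def update_defaults(current, removed, added):
    """Return a new list: drop `removed`, keep the rest in order,
    then append each entry of `added` that is not already present."""
    removed_set = set(removed)
    result = [model for model in current if model not in removed_set]
    for model in added:
        if model not in result:
            result.append(model)
    return result


# ASSUMPTION: hypothetical previous default set. The "kept-*" strings are
# placeholders for Sonnet 4, Gemini 2.5 Pro 06-05, and Grok 4, not real
# identifiers from the repository.
old_defaults = [
    "openai/gpt-4.1",
    "openai/o3",
    "x-ai/grok-3-beta",
    "kept-sonnet-4",
    "kept-gemini-2.5-pro-06-05",
    "kept-grok-4",
]

new_defaults = update_defaults(old_defaults, REMOVED, ADDED)
```

Under these assumptions the default set stays at six entries: the three kept models followed by the three new ones. Appending additions after the kept models is a choice made for this sketch; the actual ordering in defaultSelectedModels may differ.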

No changes to benchmark data or generated results files.


🤖 This PR was created with Mentat.

…GPT-5 (medium + minimal) and Opus 4.1

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/59642fbf-854d-4924-819c-c2aab6411962

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>
@mentatbot mentatbot bot requested a review from biobootloader August 22, 2025 20:31
mentatbot bot and others added 2 commits August 22, 2025 20:32
…_pages.py to add GPT-5 (medium + minimal) and Opus 4.1; remove GPT-4.1, o3, Grok 3

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/10f2d4fe-7d66-4b91-a738-ef5687e1793b

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>
…for locodiff-250425

Mentat precommit script passed. Log: https://mentat.ai/gh/AbanteAI/LoCoDiff-bench/log/bb7eda81-0095-47f7-a49b-4df9e19b1b61

Co-authored-by: biobootloader <128252497+biobootloader@users.noreply.github.com>
@biobootloader biobootloader merged commit 96f9cbf into main Aug 22, 2025
1 check passed