fix(model-stats): publish benchmarks for new models by lambertjosh · Pull Request #4051 · Kilo-Org/cloud

lambertjosh · 2026-06-16T13:40:50Z

Summary

This fixes a limitation in the Terminal Bench sync: core model stats data is only generated for models that are recommended. Before this PR, kilo bench data was therefore missing from the returned result for non-recommended models.

Because kilo bench data should not be limited to recommended models, the sync now also populates an initial model stats entry for non-recommended models.

Changes

Promote Kimi K2.7 Code as the current recommended Kimi model.
Create canonical model stats rows when Terminal Bench promotions arrive before model catalog synchronization.
Repair existing orphaned promotions and backfill deterministic model identity metadata during later OpenRouter syncs.
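
A rough sketch of the "create a row early, backfill identity later" idea from the list above. This is not the PR's code: the row shape, the `slugFor` rule, and the function names `ensureStatsRow` and `backfillIdentity` are all hypothetical, and an in-memory `Map` stands in for the real table.

```typescript
// Hypothetical row shape; the real schema is not shown in this PR.
interface ModelStatsRow {
  openrouterId: string;
  slug: string;
  name: string | null; // unknown until the catalog sync sees the model
  terminalBenchScore: number | null;
}

type StatsTable = Map<string, ModelStatsRow>;

// Assumed rule: derive the slug purely from the id, so repeated runs agree.
function slugFor(openrouterId: string): string {
  return openrouterId
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Create a stub row when a promotion arrives before the catalog sync.
function ensureStatsRow(table: StatsTable, openrouterId: string): ModelStatsRow {
  let row = table.get(openrouterId);
  if (!row) {
    row = { openrouterId, slug: slugFor(openrouterId), name: null, terminalBenchScore: null };
    table.set(openrouterId, row);
  }
  return row;
}

// Later catalog sync: fill in only the metadata that is still missing.
function backfillIdentity(table: StatsTable, openrouterId: string, name: string): void {
  const row = ensureStatsRow(table, openrouterId);
  if (row.name === null) row.name = name;
}
```

Under these assumptions, calling `ensureStatsRow` twice for the same id returns the same row, so a promotion that arrives early and a later catalog sync end up writing to one record rather than two.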

Verification

Seeded production-shaped orphaned Kimi K2.7 Terminal Bench promotions locally.
Ran model-eval synchronization and confirmed the rows linked to a newly created model stats record.
Manually confirmed the local models API and CLI model sidebar expose the 50.3% Terminal Bench score.

Visual Changes

Models that are not recommended now show their benchmark data.

Reviewer Notes

The reconciliation runs during regular model-eval syncs, so existing orphaned Terminal Bench rows are repaired without a one-off migration. Non-Terminal-Bench promotions continue to leave unknown models untouched.
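
To make the reconciliation behaviour concrete, here is a minimal in-memory sketch, not the actual `sync.ts` logic. The `Promotion` and `StatsRow` shapes, the `"terminal-bench"` label, and the `reconcile` function are assumptions made for illustration. It also assumes that a non-Terminal-Bench promotion is still linked when its model already has a stats row, which the notes above do not confirm.

```typescript
// Hypothetical shapes for illustration only.
interface Promotion {
  modelId: string;
  benchmark: string; // e.g. "terminal-bench" (assumed label)
  score: number;
  statsRowId: number | null; // null means the promotion is orphaned
}

interface StatsRow {
  id: number;
  modelId: string;
}

// Runs on every sync; returns how many orphaned promotions were linked.
function reconcile(promotions: Promotion[], stats: StatsRow[]): number {
  let repaired = 0;
  for (const p of promotions) {
    if (p.statsRowId !== null) continue; // already linked, nothing to repair
    let row = stats.find((s) => s.modelId === p.modelId);
    if (!row) {
      // Only Terminal Bench promotions may create a row for an unknown model.
      if (p.benchmark !== "terminal-bench") continue;
      row = { id: stats.length + 1, modelId: p.modelId };
      stats.push(row);
    }
    p.statsRowId = row.id;
    repaired++;
  }
  return repaired;
}
```

Because linked promotions are skipped, a second pass over the same data repairs nothing, which is the property that lets this run inside the regular sync instead of a one-off migration.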

kilo-code-bot · 2026-06-16T13:45:26Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Executive Summary

The previously flagged onConflictDoNothing suggestion has been resolved by switching to untargeted conflict handling in b159da70a. No new issues found in the incremental change.

Files Reviewed (6 files)

apps/web/src/lib/ai-gateway/providers/moonshotai.ts
apps/web/src/lib/model-stats/sync-openrouter.ts
packages/worker-utils/src/kilo-model-id.test.ts
packages/worker-utils/src/kilo-model-id.ts
services/model-eval-ingest/src/sync.test.ts
services/model-eval-ingest/src/sync.ts

Previous Review Summary (commit 2baab6f)

Current summary above is authoritative. Previous snapshots are kept for context only.


Status: 1 Issue Found | Recommendation: Minor suggestion, safe to merge

Executive Summary

Clean refactor that extracts shared model identity logic and adds orphaned Terminal Bench promotion repair — the one concern is that onConflictDoNothing doesn't cover the unique slug constraint.

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 0 |
| SUGGESTION | 1 |

Issue Details

SUGGESTION

| File | Line | Issue |
| --- | --- | --- |
| `services/model-eval-ingest/src/sync.ts` | 143 | `onConflictDoNothing` targets only `openrouterId`; unique `slug` column could cause a collision error |
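
The difference between targeted and untargeted conflict handling can be illustrated with a small simulation. It assumes `onConflictDoNothing` maps to SQL `ON CONFLICT ... DO NOTHING`, where naming a target suppresses only violations of that one constraint. The `insertDoNothing` helper and the in-memory array are hypothetical stand-ins, not the ORM or the PR's code.

```typescript
// In-memory stand-in for a table with two unique columns.
interface Row {
  openrouterId: string;
  slug: string;
}
type UniqueColumn = keyof Row;

// conflictTarget set: mimics ON CONFLICT (<column>) DO NOTHING.
// conflictTarget undefined: mimics the untargeted ON CONFLICT DO NOTHING.
// Returns true if the row was inserted, false if it was silently skipped.
function insertDoNothing(rows: Row[], row: Row, conflictTarget?: UniqueColumn): boolean {
  const uniqueColumns: UniqueColumn[] = ["openrouterId", "slug"];
  for (const column of uniqueColumns) {
    const clash = rows.some((r) => r[column] === row[column]);
    if (!clash) continue;
    if (conflictTarget === undefined || conflictTarget === column) {
      return false; // conflict is covered, so the insert is skipped
    }
    // Conflict on a column the target does not cover: surfaces as an error.
    throw new Error(`duplicate value for unique column ${column}`);
  }
  rows.push(row);
  return true;
}
```

With a table that already holds a row with slug `one`, inserting a different `openrouterId` with the same slug throws when the target is `openrouterId`, but is quietly skipped when no target is given. That is the collision the suggestion describes.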

Files Reviewed (6 files)

apps/web/src/lib/ai-gateway/providers/moonshotai.ts — 0 issues
apps/web/src/lib/model-stats/sync-openrouter.ts — 0 issues
packages/worker-utils/src/kilo-model-id.test.ts — 0 issues
packages/worker-utils/src/kilo-model-id.ts — 0 issues
services/model-eval-ingest/src/sync.test.ts — 0 issues
services/model-eval-ingest/src/sync.ts — 1 issue


Reviewed by deepseek-v4-pro-20260423 · 196,307 tokens

Review guidance: REVIEW.md from base branch main

2baab6f · fix(model-stats): publish benchmarks for new models

kilo-code-bot reviewed on Jun 16, 2026 and left a comment thread on services/model-eval-ingest/src/sync.ts (now outdated).

b159da7 · fix(model-stats): tolerate identity collisions

lambertjosh wants to merge 2 commits into main from fix/model-eval-auto-stats
