feat(auto-routing): auto-sync decider benchmark models by iscekic · Pull Request #4078 · Kilo-Org/cloud

iscekic · 2026-06-17T16:37:23Z

Summary

Adds automatic decider benchmark candidate syncing from Kilo Bench cost data. The benchmark worker now runs a daily scheduled sync, persists synced auto decider models and exclusions in D1, preserves per-model reasoning effort, and starts a decider benchmark when the effective model set changes.
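For reviewers who want the sync rule in one place, here is a rough sketch of how "effective model set changed" could be decided. The `SyncedModel` shape, `syncDeciderModels`, and the model IDs are hypothetical names for illustration, not the code in `auto-decider-sync.ts`. It assumes exclusions are applied before comparison and that a model already persisted keeps its stored reasoning effort.

```typescript
// Hypothetical row shape for a persisted auto decider model.
type SyncedModel = { modelId: string; reasoningEffort: string | null };

// Deterministic ordering so comparisons do not depend on input order.
function compareIds(a: SyncedModel, b: SyncedModel): number {
  return a.modelId < b.modelId ? -1 : a.modelId > b.modelId ? 1 : 0;
}

// Stable string key for a model set (IDs plus reasoning effort).
function modelSetKey(models: readonly SyncedModel[]): string {
  const sorted = [...models].sort(compareIds);
  return JSON.stringify(sorted.map((m) => [m.modelId, m.reasoningEffort]));
}

// Builds the next synced set from candidate IDs, dropping exclusions and
// preserving the reasoning effort of models that were already persisted.
function syncDeciderModels(
  persisted: readonly SyncedModel[],
  candidateIds: readonly string[],
  excludedIds: ReadonlySet<string>,
): { models: SyncedModel[]; changed: boolean } {
  const effortById = new Map(
    persisted.map((m) => [m.modelId, m.reasoningEffort] as const),
  );
  const uniqueIds = Array.from(new Set(candidateIds)).filter(
    (id) => !excludedIds.has(id),
  );
  const models: SyncedModel[] = uniqueIds
    .map((modelId) => ({
      modelId,
      reasoningEffort: effortById.get(modelId) ?? null,
    }))
    .sort(compareIds);
  // A new decider run would only be started when this flag is true.
  return { models, changed: modelSetKey(persisted) !== modelSetKey(models) };
}
```

In this sketch a daily sync that yields the same models after exclusions reports `changed: false`, so no new benchmark run would be triggered.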

Adds configurable auto-decider cost bounds to the benchmark config, defaulting to $15-$25 for existing configs. The web candidate endpoint filters terminal-bench models using the saved bounds, and the admin UI exposes compact controls for changing the min/max average run cost.
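A minimal sketch of the bound filtering, assuming inclusive bounds and an average run cost in USD per model. `CostBounds`, `BenchModelStat`, `resolveBounds`, and `filterByAverageRunCost` are illustrative names only; the real candidate logic lives in `auto-routing-decider-candidates.ts` and may differ.

```typescript
// Hypothetical config shape for the auto-decider cost bounds.
type CostBounds = { minUsd: number; maxUsd: number };

// Default applied to configs saved before the bounds existed ($15-$25).
const DEFAULT_BOUNDS: CostBounds = { minUsd: 15, maxUsd: 25 };

// Hypothetical per-model stat row; a null cost means no cost data yet.
type BenchModelStat = { modelId: string; avgRunCostUsd: number | null };

// Falls back to the defaults for any bound missing from the saved config.
function resolveBounds(saved?: Partial<CostBounds> | null): CostBounds {
  const bounds = {
    minUsd: saved?.minUsd ?? DEFAULT_BOUNDS.minUsd,
    maxUsd: saved?.maxUsd ?? DEFAULT_BOUNDS.maxUsd,
  };
  if (bounds.minUsd > bounds.maxUsd) {
    throw new Error("min average run cost must not exceed max");
  }
  return bounds;
}

// Keeps models whose average run cost falls inside the bounds.
// Inclusive comparison on both ends is an assumption of this sketch.
function filterByAverageRunCost(
  stats: readonly BenchModelStat[],
  bounds: CostBounds,
): string[] {
  return stats
    .filter(
      (s) =>
        s.avgRunCostUsd !== null &&
        s.avgRunCostUsd >= bounds.minUsd &&
        s.avgRunCostUsd <= bounds.maxUsd,
    )
    .map((s) => s.modelId);
}
```

With saved bounds of $12-$24, a model averaging $24.50 per run would be excluded here, while the same model would pass under the $15-$25 default.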

Also chunks carried summary inserts when starting a run so larger carried decider result sets stay under D1 bind limits.
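The chunking idea can be sketched as below. The helper name and the default of 100 bound parameters per statement are assumptions for illustration; check the current D1 limits and the actual implementation in `db.ts` before relying on either.

```typescript
// Splits rows into chunks so that one multi-row INSERT per chunk stays
// under a bound-parameter limit. `maxBinds = 100` is an assumed limit.
function chunkRowsForBindLimit<T>(
  rows: readonly T[],
  columnsPerRow: number,
  maxBinds = 100,
): T[][] {
  // Each row consumes one bound parameter per column.
  const rowsPerChunk = Math.floor(maxBinds / columnsPerRow);
  if (!Number.isInteger(columnsPerRow) || columnsPerRow < 1 || rowsPerChunk < 1) {
    throw new Error("a single row does not fit within the bind limit");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    chunks.push(rows.slice(i, i + rowsPerChunk));
  }
  return chunks;
}
```

For example, 25 summary rows with 8 columns each would be written as three statements of 12, 12, and 1 rows under the assumed limit, instead of one statement with 200 bound parameters.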

Verification

Ran the local auto-routing stack in tmux with Next.js, auto-routing, and auto-routing-benchmark. Seeded local model_stats with two in-band terminal-bench models and one out-of-band control, seeded D1 prior decider summaries, saved benchmark config with $12-$24 auto bounds, and triggered the local scheduled handler. The sync added only the two in-band models, started and completed a decider run from carried summaries, and published a routing table with all 18 routes.

Visual Changes

N/A

Reviewer Notes

The local E2E uses carried prior summaries so the scheduled rerun exercises the start/complete/publish path without invoking real CLI benchmark containers or consuming benchmark credits.

kilo-code-bot · 2026-06-17T16:42:56Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Executive Summary

Incremental (7 files): Replaces unsafe `as` casts on `reasoning_effort` with runtime `safeParse` validation and strips the `kilo/` gateway prefix from auto-decider candidate model IDs. Clean, well-tested improvements.
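As a dependency-free illustration of the two changes summarized above, the sketch below validates a persisted value at runtime instead of casting it, and normalizes a prefixed model ID. The allowed effort values, `safeParseReasoningEffort`, and `stripGatewayPrefix` are assumptions for this example; the PR itself uses schema `safeParse` validation, and its actual enum is not shown on this page.

```typescript
// Assumed set of allowed values; the real enum may differ.
const REASONING_EFFORTS = ["low", "medium", "high"] as const;
type ReasoningEffort = (typeof REASONING_EFFORTS)[number];

// Result shape modeled on the common safeParse convention.
type ParseResult<T> = { success: true; data: T } | { success: false };

// Validates an untrusted persisted value instead of using an `as` cast.
function safeParseReasoningEffort(value: unknown): ParseResult<ReasoningEffort> {
  const match = REASONING_EFFORTS.find((effort) => effort === value);
  return match !== undefined
    ? { success: true, data: match }
    : { success: false };
}

// Removes a leading "kilo/" gateway prefix; other IDs pass through unchanged.
function stripGatewayPrefix(modelId: string): string {
  const prefix = "kilo/";
  return modelId.startsWith(prefix) ? modelId.slice(prefix.length) : modelId;
}
```

A row holding an unexpected effort string would then yield `{ success: false }`, letting the caller fall back or skip the row rather than carrying an invalid value through a cast.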

Incremental Files Reviewed (7 files)

apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts
services/auto-routing-benchmark/src/config.test.ts
services/auto-routing-benchmark/src/config.ts
services/auto-routing-benchmark/src/db.ts
services/auto-routing-benchmark/src/reasoning-effort.ts
services/auto-routing-benchmark/src/run.ts

Carried Forward (unchanged since last review, 20 files)

apps/web/src/app/admin/api/auto-routing/benchmark-config/route.test.ts
apps/web/src/app/admin/auto-routing/BenchmarksSection.test.ts
apps/web/src/app/admin/auto-routing/BenchmarksSection.tsx
apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.test.ts
apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.ts
apps/web/src/lib/ai-gateway/auto-routing-benchmark-admin-client.test.ts
packages/auto-routing-contracts/src/benchmark.ts
packages/auto-routing-contracts/src/contracts.test.ts
services/auto-routing-benchmark/migrations/0003_chunky_ogun.sql
services/auto-routing-benchmark/migrations/meta/0003_snapshot.json
services/auto-routing-benchmark/migrations/meta/_journal.json
services/auto-routing-benchmark/src/admin.test.ts
services/auto-routing-benchmark/src/auto-decider-sync.test.ts
services/auto-routing-benchmark/src/auto-decider-sync.ts
services/auto-routing-benchmark/src/db-replace-summaries.test.ts
services/auto-routing-benchmark/src/db-schema.ts
services/auto-routing-benchmark/src/index.ts
services/auto-routing-benchmark/wrangler.jsonc
services/auto-routing-benchmark/migrations/0002_magical_wendell_rand.sql
services/auto-routing-benchmark/migrations/meta/0002_snapshot.json

Previous Review Summaries (3 snapshots, latest commit 9a3bf34)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit `9a3bf34`)

Status: No Issues Found | Recommendation: Merge

Executive Summary

Incremental (2 files): Removes the 5-attempt minimum gate from decider candidate filtering, allowing models with fewer benchmark attempts to be included as auto-decider candidates. Test updated to cover the single-attempt case.

Incremental Files Reviewed (2 files)

apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts

Carried Forward (unchanged since last review, 20 files)

apps/web/src/app/admin/auto-routing/BenchmarksSection.test.ts
apps/web/src/app/admin/auto-routing/BenchmarksSection.tsx
apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.test.ts
apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.ts
apps/web/src/lib/ai-gateway/auto-routing-benchmark-admin-client.test.ts
apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts
apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
packages/auto-routing-contracts/src/benchmark.ts
packages/auto-routing-contracts/src/contracts.test.ts
services/auto-routing-benchmark/migrations/0003_chunky_ogun.sql
services/auto-routing-benchmark/migrations/meta/0003_snapshot.json
services/auto-routing-benchmark/migrations/meta/_journal.json
services/auto-routing-benchmark/src/admin.test.ts
services/auto-routing-benchmark/src/auto-decider-sync.test.ts
services/auto-routing-benchmark/src/auto-decider-sync.ts
services/auto-routing-benchmark/src/config.test.ts
services/auto-routing-benchmark/src/config.ts
services/auto-routing-benchmark/src/db-replace-summaries.test.ts
services/auto-routing-benchmark/src/db-schema.ts
services/auto-routing-benchmark/src/db.ts
services/auto-routing-benchmark/wrangler.jsonc
services/auto-routing-benchmark/src/index.ts
services/auto-routing-benchmark/migrations/0002_magical_wendell_rand.sql
services/auto-routing-benchmark/migrations/meta/0002_snapshot.json

Previous review (commit `4b5212f`)

Status: No Issues Found | Recommendation: Merge

Executive Summary

Incremental changes add configurable auto decider cost bounds and fix D1 SQL variable limits with summary chunking — clean additions with comprehensive test coverage across contracts, API, sync, and UI layers.

Incremental Files Reviewed (17 changed files)

apps/web/src/app/admin/auto-routing/BenchmarksSection.test.ts
apps/web/src/app/admin/auto-routing/BenchmarksSection.tsx
apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.test.ts
apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.ts
apps/web/src/lib/ai-gateway/auto-routing-benchmark-admin-client.test.ts
apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts
apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
packages/auto-routing-contracts/src/benchmark.ts
packages/auto-routing-contracts/src/contracts.test.ts
services/auto-routing-benchmark/migrations/0003_chunky_ogun.sql
services/auto-routing-benchmark/migrations/meta/0003_snapshot.json
services/auto-routing-benchmark/migrations/meta/_journal.json
services/auto-routing-benchmark/src/admin.test.ts
services/auto-routing-benchmark/src/auto-decider-sync.test.ts
services/auto-routing-benchmark/src/auto-decider-sync.ts
services/auto-routing-benchmark/src/config.test.ts
services/auto-routing-benchmark/src/config.ts
services/auto-routing-benchmark/src/db-replace-summaries.test.ts
services/auto-routing-benchmark/src/db-schema.ts
services/auto-routing-benchmark/src/db.ts
services/auto-routing-benchmark/wrangler.jsonc

Carried Forward (unchanged since last review, 3 files)

services/auto-routing-benchmark/src/index.ts
services/auto-routing-benchmark/migrations/0002_magical_wendell_rand.sql
services/auto-routing-benchmark/migrations/meta/0002_snapshot.json

Previous review (commit `5dd959a`)

Status: No Issues Found | Recommendation: Merge

Executive Summary

Well-structured feature that adds automatic decider benchmark model syncing from Kilo Bench cost data with proper backward compatibility, exclusion management, and comprehensive test coverage across all layers.

Files Reviewed (20 files)

apps/web/src/app/admin/auto-routing/BenchmarksSection.test.ts
apps/web/src/app/admin/auto-routing/BenchmarksSection.tsx
apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.test.ts
apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.ts
apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts
apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
packages/auto-routing-contracts/src/benchmark.ts
packages/auto-routing-contracts/src/contracts.test.ts
services/auto-routing-benchmark/migrations/0002_magical_wendell_rand.sql
services/auto-routing-benchmark/migrations/meta/0002_snapshot.json
services/auto-routing-benchmark/migrations/meta/_journal.json
services/auto-routing-benchmark/src/admin.test.ts
services/auto-routing-benchmark/src/auto-decider-sync.test.ts
services/auto-routing-benchmark/src/auto-decider-sync.ts
services/auto-routing-benchmark/src/config.test.ts
services/auto-routing-benchmark/src/config.ts
services/auto-routing-benchmark/src/db-schema.ts
services/auto-routing-benchmark/src/db.ts
services/auto-routing-benchmark/src/index.ts
services/auto-routing-benchmark/wrangler.jsonc

Reviewed by deepseek-v4-pro-20260423 · 437,576 tokens

Review guidance: REVIEW.md from base branch main

feat(auto-routing): auto-sync decider benchmark models

5dd959a

iscekic self-assigned this Jun 17, 2026

feat(auto-routing): configure auto decider cost bounds

4b5212f

iscekic requested a review from pandemicsyn June 17, 2026 17:44

iscekic added 3 commits June 17, 2026 19:53

fix(auto-routing): allow one-attempt auto decider candidates

9a3bf34

fix(auto-routing): validate persisted reasoning effort

76f7b69

fix(auto-routing): normalize kilo bench model ids

2c8d393

pandemicsyn approved these changes Jun 17, 2026

iscekic merged commit 83f1cec into main Jun 17, 2026
58 checks passed

iscekic deleted the feat/auto-routing-auto-decider-models branch June 17, 2026 18:15

iscekic merged 5 commits into main from feat/auto-routing-auto-decider-models

2 participants
