feat(auto-routing): auto-sync decider benchmark models#4078

Merged: iscekic merged 5 commits into main from feat/auto-routing-auto-decider-models on Jun 17, 2026
@iscekic iscekic commented Jun 17, 2026

Contributor

Summary

Adds automatic decider benchmark candidate syncing from Kilo Bench cost data. The benchmark worker now runs a daily scheduled sync, persists synced auto-decider models and exclusions in D1, preserves per-model reasoning effort, and starts a decider benchmark when the effective model set changes.
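For reviewers who want a mental model of the "effective model set changes" trigger, here is a minimal sketch. It is not the code from this PR: the type `DeciderModel`, the functions `effectiveModels`, `fingerprint`, and `shouldStartRun`, and the rule that manually configured entries win over synced ones are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real D1 schema and type names in the PR may differ.
interface DeciderModel {
  modelId: string;
  reasoningEffort?: string;
}

/**
 * Merge manual and synced models and drop exclusions.
 * Assumption: a manual entry overrides a synced entry with the same id.
 */
function effectiveModels(
  manual: DeciderModel[],
  synced: DeciderModel[],
  excluded: ReadonlySet<string>,
): DeciderModel[] {
  const byId = new Map<string, DeciderModel>();
  // Synced entries go in first so that manual entries overwrite them.
  for (const model of [...synced, ...manual]) {
    if (!excluded.has(model.modelId)) byId.set(model.modelId, model);
  }
  return [...byId.values()];
}

/** Order-independent fingerprint that also captures reasoning effort. */
function fingerprint(models: DeciderModel[]): string {
  return models
    .map((m) => `${m.modelId}@${m.reasoningEffort ?? ""}`)
    .sort()
    .join("|");
}

/** A new decider run is only needed when the fingerprint changes. */
function shouldStartRun(previous: DeciderModel[], next: DeciderModel[]): boolean {
  return fingerprint(previous) !== fingerprint(next);
}
```

Including the reasoning effort in the fingerprint means a change of effort for an existing model would also count as a change to the set; whether the PR treats that case the same way is not stated here.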

Adds configurable auto-decider cost bounds to the benchmark config, defaulting to $15-$25 for existing configs. The web candidate endpoint filters terminal-bench models using the saved bounds, and the admin UI exposes compact controls for changing the min/max average run cost.
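The cost-bound filtering can be pictured roughly as follows. This is a sketch under stated assumptions, not the endpoint's actual code: `CostBounds`, `ModelStat`, `resolveBounds`, and `filterCandidates` are hypothetical names, and treating both bounds as inclusive is an assumption.

```typescript
// Hypothetical types; field names are illustrative only.
interface CostBounds {
  minUsd: number;
  maxUsd: number;
}

interface ModelStat {
  modelId: string;
  benchmark: string;
  avgRunCostUsd: number | null;
}

// Defaults taken from the PR description ($15-$25 for existing configs).
const DEFAULT_BOUNDS: CostBounds = { minUsd: 15, maxUsd: 25 };

/** Fall back to the defaults when a saved config predates the new fields. */
function resolveBounds(saved?: Partial<CostBounds>): CostBounds {
  return {
    minUsd: saved?.minUsd ?? DEFAULT_BOUNDS.minUsd,
    maxUsd: saved?.maxUsd ?? DEFAULT_BOUNDS.maxUsd,
  };
}

/** Keep terminal-bench models whose average run cost is within the bounds. */
function filterCandidates(stats: ModelStat[], bounds: CostBounds): string[] {
  return stats
    .filter(
      (s) =>
        s.benchmark === "terminal-bench" &&
        s.avgRunCostUsd !== null &&
        s.avgRunCostUsd >= bounds.minUsd && // assumption: inclusive lower bound
        s.avgRunCostUsd <= bounds.maxUsd, // assumption: inclusive upper bound
    )
    .map((s) => s.modelId);
}
```

With the $12-$24 bounds used in the verification below, a model averaging $13 per run would be kept and one averaging $30 would be dropped.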

Also chunks carried summary inserts when starting a run so larger carried decider result sets stay under D1 bind limits.
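The chunking idea can be sketched as below. This is not the implementation in `db.ts`: `chunkRows` and `MAX_BOUND_PARAMS` are hypothetical names, and the value of 100 bound parameters per statement is an assumption that should be checked against the current D1 limits documentation.

```typescript
// Assumption: D1 allows at most this many bound parameters per statement.
const MAX_BOUND_PARAMS = 100;

/**
 * Split rows into chunks so that each multi-row INSERT binds at most
 * `maxParams` values (rows per chunk = floor(maxParams / columnsPerRow)).
 */
function chunkRows<T>(
  rows: readonly T[],
  columnsPerRow: number,
  maxParams: number = MAX_BOUND_PARAMS,
): T[][] {
  if (columnsPerRow < 1 || columnsPerRow > maxParams) {
    throw new RangeError("columnsPerRow must be between 1 and maxParams");
  }
  const rowsPerChunk = Math.floor(maxParams / columnsPerRow);
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    chunks.push(rows.slice(i, i + rowsPerChunk));
  }
  return chunks;
}
```

Under these assumptions, a summary row with 8 columns allows 12 rows per statement, so 25 carried summaries would be written in three inserts of 12, 12, and 1 rows.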

Verification

Ran the local auto-routing stack in tmux with Next.js, auto-routing, and auto-routing-benchmark. Seeded local model_stats with two in-band terminal-bench models and one out-of-band control, seeded D1 prior decider summaries, saved benchmark config with $12-$24 auto bounds, and triggered the local scheduled handler. The sync added only the two in-band models, started and completed a decider run from carried summaries, and published a routing table with all 18 routes.

Visual Changes

N/A

Reviewer Notes

The local E2E uses carried prior summaries so the scheduled rerun exercises the start/complete/publish path without invoking real CLI benchmark containers or consuming benchmark credits.

@iscekic iscekic self-assigned this Jun 17, 2026
kilo-code-bot Bot commented Jun 17, 2026

Contributor

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Executive Summary

Incremental (7 files): Replaces unsafe as casts on reasoning_effort with runtime safeParse validation and strips the kilo/ gateway prefix from auto-decider candidate model IDs. Clean, well-tested improvements.
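To illustrate the two changes named above, here is a dependency-free sketch. It only mimics the `safeParse` result shape; the PR presumably uses a schema library for this, and the allowed values in `REASONING_EFFORTS`, as well as the names `safeParseReasoningEffort` and `stripGatewayPrefix`, are assumptions rather than code from `reasoning-effort.ts`.

```typescript
// Assumption: the real set of allowed reasoning efforts may differ.
const REASONING_EFFORTS = ["low", "medium", "high"] as const;
type ReasoningEffort = (typeof REASONING_EFFORTS)[number];

type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

/** Validate at runtime instead of asserting the type with an `as` cast. */
function safeParseReasoningEffort(value: unknown): ParseResult<ReasoningEffort> {
  const match = REASONING_EFFORTS.find((effort) => effort === value);
  if (match !== undefined) return { success: true, data: match };
  return { success: false, error: `invalid reasoning effort: ${String(value)}` };
}

const GATEWAY_PREFIX = "kilo/";

/** Drop a leading gateway prefix; ids without the prefix are returned unchanged. */
function stripGatewayPrefix(modelId: string): string {
  return modelId.startsWith(GATEWAY_PREFIX)
    ? modelId.slice(GATEWAY_PREFIX.length)
    : modelId;
}
```

Returning a result object rather than throwing lets the caller decide what to do with a bad stored value, for example dropping the effort while keeping the model.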

Incremental Files Reviewed (7 files)
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts
  • services/auto-routing-benchmark/src/config.test.ts
  • services/auto-routing-benchmark/src/config.ts
  • services/auto-routing-benchmark/src/db.ts
  • services/auto-routing-benchmark/src/reasoning-effort.ts
  • services/auto-routing-benchmark/src/run.ts
Carried Forward (unchanged since last review, 20 files)
  • apps/web/src/app/admin/api/auto-routing/benchmark-config/route.test.ts
  • apps/web/src/app/admin/auto-routing/BenchmarksSection.test.ts
  • apps/web/src/app/admin/auto-routing/BenchmarksSection.tsx
  • apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.test.ts
  • apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.ts
  • apps/web/src/lib/ai-gateway/auto-routing-benchmark-admin-client.test.ts
  • packages/auto-routing-contracts/src/benchmark.ts
  • packages/auto-routing-contracts/src/contracts.test.ts
  • services/auto-routing-benchmark/migrations/0003_chunky_ogun.sql
  • services/auto-routing-benchmark/migrations/meta/0003_snapshot.json
  • services/auto-routing-benchmark/migrations/meta/_journal.json
  • services/auto-routing-benchmark/src/admin.test.ts
  • services/auto-routing-benchmark/src/auto-decider-sync.test.ts
  • services/auto-routing-benchmark/src/auto-decider-sync.ts
  • services/auto-routing-benchmark/src/db-replace-summaries.test.ts
  • services/auto-routing-benchmark/src/db-schema.ts
  • services/auto-routing-benchmark/src/index.ts
  • services/auto-routing-benchmark/wrangler.jsonc
  • services/auto-routing-benchmark/migrations/0002_magical_wendell_rand.sql
  • services/auto-routing-benchmark/migrations/meta/0002_snapshot.json
Previous Review Summaries (3 snapshots, latest commit 9a3bf34)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit 9a3bf34)

Status: No Issues Found | Recommendation: Merge

Executive Summary

Incremental (2 files): Removes the 5-attempt minimum gate from decider candidate filtering, allowing models with fewer benchmark attempts to be included as auto-decider candidates. Test updated to cover the single-attempt case.

Incremental Files Reviewed (2 files)
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts
Carried Forward (unchanged since last review, 20 files)
  • apps/web/src/app/admin/auto-routing/BenchmarksSection.test.ts
  • apps/web/src/app/admin/auto-routing/BenchmarksSection.tsx
  • apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.test.ts
  • apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.ts
  • apps/web/src/lib/ai-gateway/auto-routing-benchmark-admin-client.test.ts
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
  • packages/auto-routing-contracts/src/benchmark.ts
  • packages/auto-routing-contracts/src/contracts.test.ts
  • services/auto-routing-benchmark/migrations/0003_chunky_ogun.sql
  • services/auto-routing-benchmark/migrations/meta/0003_snapshot.json
  • services/auto-routing-benchmark/migrations/meta/_journal.json
  • services/auto-routing-benchmark/src/admin.test.ts
  • services/auto-routing-benchmark/src/auto-decider-sync.test.ts
  • services/auto-routing-benchmark/src/auto-decider-sync.ts
  • services/auto-routing-benchmark/src/config.test.ts
  • services/auto-routing-benchmark/src/config.ts
  • services/auto-routing-benchmark/src/db-replace-summaries.test.ts
  • services/auto-routing-benchmark/src/db-schema.ts
  • services/auto-routing-benchmark/src/db.ts
  • services/auto-routing-benchmark/wrangler.jsonc
  • services/auto-routing-benchmark/src/index.ts
  • services/auto-routing-benchmark/migrations/0002_magical_wendell_rand.sql
  • services/auto-routing-benchmark/migrations/meta/0002_snapshot.json

Previous review (commit 4b5212f)

Status: No Issues Found | Recommendation: Merge

Executive Summary

Incremental changes add configurable auto decider cost bounds and fix D1 SQL variable limits with summary chunking — clean additions with comprehensive test coverage across contracts, API, sync, and UI layers.

Incremental Files Reviewed (17 changed files)
  • apps/web/src/app/admin/auto-routing/BenchmarksSection.test.ts
  • apps/web/src/app/admin/auto-routing/BenchmarksSection.tsx
  • apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.test.ts
  • apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.ts
  • apps/web/src/lib/ai-gateway/auto-routing-benchmark-admin-client.test.ts
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
  • packages/auto-routing-contracts/src/benchmark.ts
  • packages/auto-routing-contracts/src/contracts.test.ts
  • services/auto-routing-benchmark/migrations/0003_chunky_ogun.sql
  • services/auto-routing-benchmark/migrations/meta/0003_snapshot.json
  • services/auto-routing-benchmark/migrations/meta/_journal.json
  • services/auto-routing-benchmark/src/admin.test.ts
  • services/auto-routing-benchmark/src/auto-decider-sync.test.ts
  • services/auto-routing-benchmark/src/auto-decider-sync.ts
  • services/auto-routing-benchmark/src/config.test.ts
  • services/auto-routing-benchmark/src/config.ts
  • services/auto-routing-benchmark/src/db-replace-summaries.test.ts
  • services/auto-routing-benchmark/src/db-schema.ts
  • services/auto-routing-benchmark/src/db.ts
  • services/auto-routing-benchmark/wrangler.jsonc
Carried Forward (unchanged since last review, 3 files)
  • services/auto-routing-benchmark/src/index.ts
  • services/auto-routing-benchmark/migrations/0002_magical_wendell_rand.sql
  • services/auto-routing-benchmark/migrations/meta/0002_snapshot.json

Previous review (commit 5dd959a)

Status: No Issues Found | Recommendation: Merge

Executive Summary

Well-structured feature that adds automatic decider benchmark model syncing from Kilo Bench cost data with proper backward compatibility, exclusion management, and comprehensive test coverage across all layers.

Files Reviewed (20 files)
  • apps/web/src/app/admin/auto-routing/BenchmarksSection.test.ts
  • apps/web/src/app/admin/auto-routing/BenchmarksSection.tsx
  • apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.test.ts
  • apps/web/src/app/api/internal/auto-routing-benchmark/decider-candidates/route.ts
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.test.ts
  • apps/web/src/lib/model-stats/auto-routing-decider-candidates.ts
  • packages/auto-routing-contracts/src/benchmark.ts
  • packages/auto-routing-contracts/src/contracts.test.ts
  • services/auto-routing-benchmark/migrations/0002_magical_wendell_rand.sql
  • services/auto-routing-benchmark/migrations/meta/0002_snapshot.json
  • services/auto-routing-benchmark/migrations/meta/_journal.json
  • services/auto-routing-benchmark/src/admin.test.ts
  • services/auto-routing-benchmark/src/auto-decider-sync.test.ts
  • services/auto-routing-benchmark/src/auto-decider-sync.ts
  • services/auto-routing-benchmark/src/config.test.ts
  • services/auto-routing-benchmark/src/config.ts
  • services/auto-routing-benchmark/src/db-schema.ts
  • services/auto-routing-benchmark/src/db.ts
  • services/auto-routing-benchmark/src/index.ts
  • services/auto-routing-benchmark/wrangler.jsonc

Reviewed by deepseek-v4-pro-20260423 · 437,576 tokens

Review guidance: REVIEW.md from base branch main

@iscekic iscekic requested a review from pandemicsyn June 17, 2026 17:44
@iscekic iscekic merged commit 83f1cec into main Jun 17, 2026
58 checks passed
@iscekic iscekic deleted the feat/auto-routing-auto-decider-models branch June 17, 2026 18:15