
feat(routing): syftr-pattern Pareto search over debate-workflow configs #8496

Draft
scarmani wants to merge 1 commit into main from feat/debate-config-pareto-search

@scarmani

Collaborator

What

A dependency-free, syftr-inspired Pareto search over debate-workflow configurations (rounds × consensus mode × reviewing model-family set), complementing the existing provider-level pareto_frontier in cost_quality_optimizer.
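Because the space is small and discrete, it can be enumerated exhaustively rather than sampled. Below is a minimal sketch that assumes hypothetical field names (rounds, consensus, families) and illustrative axis values (a "unanimous" mode, a third "gpt" family, up to 3 rounds), none of which come from this PR. Each axis is listed cheapest-first, so the first config yielded is the cheap path.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator


@dataclass(frozen=True)
class DebateConfig:
    rounds: int
    consensus: str              # assumed mode names, e.g. "majority" / "unanimous"
    families: tuple[str, ...]   # reviewing model-family set


@dataclass(frozen=True)
class DebateSearchSpace:
    # Hypothetical defaults; every axis is ordered cheapest-first.
    rounds: tuple[int, ...] = (1, 2, 3)
    consensus: tuple[str, ...] = ("majority", "unanimous")
    families: tuple[tuple[str, ...], ...] = (
        ("claude", "deepseek"),          # cheapest family set first
        ("claude", "deepseek", "gpt"),   # "gpt" is an illustrative assumption
    )

    def configs(self) -> Iterator[DebateConfig]:
        # product() varies the last axis fastest, so all cheap-family configs
        # are yielded before any premium-family config.
        for fams, n_rounds, mode in product(self.families, self.rounds, self.consensus):
            yield DebateConfig(rounds=n_rounds, consensus=mode, families=fams)

    def size(self) -> int:
        return len(self.rounds) * len(self.consensus) * len(self.families)
```

With these defaults the space has 3 × 2 × 2 = 12 configs, which is small enough that a bounded trial budget can cover the cheap end of it directly.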

Why

Directly serves the predictable-cost mandate: surfaces the cost/quality/latency-optimal debate configs so the cheap path (e.g. claude + deepseek, 1 round, majority) is used when it suffices and premium configs are reserved for high-stakes decisions.

Design

  • No new deps — bounded enumeration of the small discrete config space + 3-objective non-domination, instead of Optuna/MOTPE (syftr's approach).
  • Injectable objective (evaluator: DebateConfig -> ConfigEvaluation) keeps the costly, credential-bound debate execution outside the searchable core, so the search/frontier/recommend logic is fully unit-testable offline.
  • Cost-aware: trials are bounded (max_trials, default 6) because each trial runs a real debate and incurs real spend. Run the search once, then reuse the recommended config.
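The 3-objective non-domination check can be sketched as follows, minimizing cost and latency and maximizing quality, matching the direction given for ConfigEvaluation.dominates. Field units are assumptions, and config is a plain label here rather than a DebateConfig to keep the sketch self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigEvaluation:
    config: str      # label standing in for a DebateConfig (simplification)
    cost: float      # lower is better (assumed: spend per debate)
    quality: float   # higher is better (assumed: judge score)
    latency: float   # lower is better (assumed: seconds)

    def dominates(self, other: "ConfigEvaluation") -> bool:
        # At least as good on every objective...
        no_worse = (
            self.cost <= other.cost
            and self.quality >= other.quality
            and self.latency <= other.latency
        )
        # ...and strictly better on at least one of them.
        strictly_better = (
            self.cost < other.cost
            or self.quality > other.quality
            or self.latency < other.latency
        )
        return no_worse and strictly_better


def pareto_optimal(evals: list[ConfigEvaluation]) -> list[ConfigEvaluation]:
    # Keep every evaluation that no other evaluation dominates. A config never
    # dominates itself (strictly_better is False), so no self-exclusion is needed.
    return [e for e in evals if not any(o.dominates(e) for o in evals)]
```

A plain O(n²) pairwise scan is adequate here because the trial budget keeps n in single digits.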

Follow-up (separate PR)

A live evaluator that runs a real debate and reads cost from billing.cost_tracker, quality from evaluation.llm_judge, and latency from wall-clock time, wiring the search to production (needs credentials).
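One possible shape for that follow-up evaluator is sketched below under explicit assumptions. The real billing.cost_tracker and evaluation.llm_judge interfaces are not shown in this PR, so they are injected as hypothetical callables (read_cost, judge), and run_debate is a hypothetical stand-in for executing one real debate.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigEvaluation:
    cost: float
    quality: float
    latency: float


def make_live_evaluator(
    run_debate: Callable[[object], str],  # hypothetical: runs one debate, returns transcript
    read_cost: Callable[[], float],       # hypothetical: cumulative spend (e.g. via billing.cost_tracker)
    judge: Callable[[str], float],        # hypothetical: quality score (e.g. via evaluation.llm_judge)
) -> Callable[[object], ConfigEvaluation]:
    def evaluate(config: object) -> ConfigEvaluation:
        spend_before = read_cost()
        start = time.perf_counter()
        transcript = run_debate(config)
        latency = time.perf_counter() - start  # wall-clock seconds for the debate only
        cost = read_cost() - spend_before      # spend attributed to this trial
        # Judge after measuring cost and latency so the judge's own spend/time
        # are not charged to the config being evaluated.
        return ConfigEvaluation(cost=cost, quality=judge(transcript), latency=latency)

    return evaluate
```

Reading spend before and after the debate attributes only that trial's cost to the config, which matters because cumulative trackers keep growing across trials.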

Tests

6 passing: domination, tradeoff non-domination, frontier extraction, bounded trials, constraint-aware recommend.

🤖 Generated with Claude Code

Complements the provider-level pareto_frontier (cost_quality_optimizer) by
searching the debate *workflow* space — rounds × consensus mode × reviewing
family set — for the cost/quality/latency-optimal configs.

Inspired by DataRobot's syftr (Pareto-optimized agentic workflows) but
dependency-free: a bounded enumeration of the small discrete config space +
3-objective non-domination, instead of Optuna/MOTPE. The objective is injectable
(evaluator: DebateConfig -> ConfigEvaluation) so the costly, credential-bound
debate execution lives outside the searchable core — the search/frontier/recommend
logic is fully unit-testable offline.
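Because the evaluator is injected, the bounded trial loop can be exercised offline with a deterministic fake. This is a minimal sketch of the loop only (frontier extraction omitted), assuming a hypothetical search_pareto_configs signature that accepts any iterable of configs.

```python
from collections.abc import Callable, Hashable, Iterable
from dataclasses import dataclass, field
from itertools import islice


@dataclass(frozen=True)
class ConfigEvaluation:
    config: Hashable
    cost: float
    quality: float
    latency: float


@dataclass
class SearchResult:
    trials: list[ConfigEvaluation] = field(default_factory=list)


def search_pareto_configs(
    configs: Iterable[Hashable],
    evaluator: Callable[[Hashable], ConfigEvaluation],
    max_trials: int = 6,
) -> SearchResult:
    # islice enforces the budget: at most max_trials evaluator calls (real,
    # paid debates in production), and later configs are never generated.
    result = SearchResult()
    for config in islice(configs, max_trials):
        result.trials.append(evaluator(config))
    return result
```

In a unit test the evaluator can be a pure function that records its calls, so the budget and ordering are checked without credentials or spend.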

- DebateSearchSpace (cheapest-leaning defaults, claude+deepseek first)
- ConfigEvaluation.dominates (cost↓ quality↑ latency↓), pareto_optimal
- search_pareto_configs (bounded max_trials — each trial is a real debate = spend)
- SearchResult.recommend (constraint-aware, always returns a usable config)
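A constraint-aware recommend could return the cheapest frontier config that clears the caller's quality bar and cost/latency limits, so the cheap path wins whenever it suffices. This is a sketch under assumptions: the parameter names are hypothetical, and the fallback (highest-quality config when nothing is feasible) is one possible reading of "always returns a usable config", not necessarily this PR's policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfigEvaluation:
    config: str
    cost: float
    quality: float
    latency: float


def recommend(
    frontier: list[ConfigEvaluation],
    min_quality: float = 0.0,
    max_cost: Optional[float] = None,     # hypothetical constraint names
    max_latency: Optional[float] = None,
) -> ConfigEvaluation:
    if not frontier:
        raise ValueError("recommend() needs at least one evaluated config")
    feasible = [
        e for e in frontier
        if e.quality >= min_quality
        and (max_cost is None or e.cost <= max_cost)
        and (max_latency is None or e.latency <= max_latency)
    ]
    if feasible:
        # Cheapest config that satisfies every constraint; higher quality breaks ties.
        return min(feasible, key=lambda e: (e.cost, -e.quality))
    # Nothing satisfies every constraint: fall back to the best-quality config
    # instead of returning None (assumed fallback policy).
    return max(frontier, key=lambda e: (e.quality, -e.cost))
```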

Tests: domination, tradeoff non-domination, frontier extraction, bounded trials,
constraint-aware recommend. 6 passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@scarmani

Collaborator Author

Closing this stale draft as part of the queue-drain close path.

Reason: this PR has been idle as a draft since 2026-06-17 and depends on a separate live, credential-bound follow-up before it has an autonomous merge path. Current live checks show no active non-Codex lane owner, no unread steering, no reviews, and no local worktree to preserve. Closing it reduces queue pressure while leaving the work recoverable.

No branch deletion was requested; branch feat/debate-config-pareto-search is preserved for revival if the operator wants to continue this feature outside the drain loop.

Head closed: fec3f26b47307030a1fd27bf811584f07f23c7b6.

scarmani closed this Jun 30, 2026
scarmani reopened this Jun 30, 2026

Labels

None yet

Projects

None yet


1 participant