feat(routing): syftr-pattern Pareto search over debate-workflow configs #8496
Draft
scarmani wants to merge 1 commit into
Complements the provider-level pareto_frontier (cost_quality_optimizer) by searching the debate *workflow* space (rounds × consensus mode × reviewing family set) for the cost/quality/latency-optimal configs.

Inspired by DataRobot's syftr (Pareto-optimized agentic workflows) but dependency-free: a bounded enumeration of the small discrete config space plus 3-objective non-domination, instead of Optuna/MOTPE.

The objective is injectable (evaluator: DebateConfig -> ConfigEvaluation), so the costly, credential-bound debate execution lives outside the searchable core and the search/frontier/recommend logic is fully unit-testable offline.

- DebateSearchSpace (cheapest-leaning defaults, claude+deepseek first)
- ConfigEvaluation.dominates (cost↓ quality↑ latency↓), pareto_optimal
- search_pareto_configs (bounded max_trials; each trial is a real debate = spend)
- SearchResult.recommend (constraint-aware, always returns a usable config)

Tests: domination, tradeoff non-domination, frontier extraction, bounded trials, constraint-aware recommend. 6 passing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
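For reviewers unfamiliar with 3-objective non-domination, the rule named in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the field names `cost_usd`, `quality`, and `latency_s` are assumptions, and the real `ConfigEvaluation` may carry more data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigEvaluation:
    """Measured outcome of one debate config (field names are assumed)."""
    cost_usd: float    # lower is better
    quality: float     # higher is better
    latency_s: float   # lower is better

    def dominates(self, other: "ConfigEvaluation") -> bool:
        # No worse on every objective...
        no_worse = (
            self.cost_usd <= other.cost_usd
            and self.quality >= other.quality
            and self.latency_s <= other.latency_s
        )
        # ...and strictly better on at least one.
        strictly_better = (
            self.cost_usd < other.cost_usd
            or self.quality > other.quality
            or self.latency_s < other.latency_s
        )
        return no_worse and strictly_better


def pareto_optimal(evals: list[ConfigEvaluation]) -> list[ConfigEvaluation]:
    """Keep every evaluation that no other evaluation dominates."""
    return [e for e in evals if not any(o.dominates(e) for o in evals)]


# Illustrative numbers only, not measurements from a real debate.
cheap = ConfigEvaluation(cost_usd=0.02, quality=0.70, latency_s=4.0)
premium = ConfigEvaluation(cost_usd=0.30, quality=0.92, latency_s=15.0)
wasteful = ConfigEvaluation(cost_usd=0.35, quality=0.80, latency_s=20.0)
frontier = pareto_optimal([cheap, premium, wasteful])
```

Here `cheap` and `premium` are a genuine tradeoff, so both stay on the frontier, while `wasteful` is dropped because `premium` beats it on all three objectives.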
This was referenced Jun 17, 2026
Closing this stale draft as part of the queue-drain close path. Reason: this PR has been idle as a draft since 2026-06-17 and depends on a separate live, credential-bound follow-up before it has an autonomous merge path. Current live checks show no active non-Codex lane owner, no unread steering, no reviews, and no local worktree to preserve. Closing it reduces queue pressure while leaving the work recoverable. No branch deletion was requested; branch Head closed:
What
A dependency-free, syftr-inspired Pareto search over debate-workflow configurations (rounds × consensus mode × reviewing model-family set), complementing the existing provider-level `pareto_frontier` in `cost_quality_optimizer`.

Why
Directly serves the predictable-cost mandate: it surfaces the cost/quality/latency-optimal debate configs so the cheap path (e.g. `claude + deepseek, 1 round, majority`) is used when it suffices and premium configs are reserved for high-stakes decisions.

Design
- `evaluator: DebateConfig -> ConfigEvaluation` keeps the costly, credential-bound debate execution outside the searchable core, so search/frontier/recommend logic is fully unit-testable offline.
- Bounded search (`max_trials`, default 6); each trial is a real debate = real spend. Run once, reuse the recommended config.

Follow-up (separate PR)
A live `evaluator` that runs a real debate and reads cost from `billing.cost_tracker`, quality from `evaluation.llm_judge`, and latency from wall-clock time, wiring the search to production (needs credentials).

Tests
6 passing: domination, tradeoff non-domination, frontier extraction, bounded trials, constraint-aware recommend.
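
To make the offline-testability claim concrete, here is a rough sketch of how a bounded search with an injected evaluator and a constraint-aware `recommend` might fit together. It is not the code in this PR. The field names, the `unanimous` consensus mode, the keyword arguments, the `stub_evaluator`, and the fallback to the cheapest frontier config when no constraint is met are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import islice, product
from typing import Callable, Optional


@dataclass(frozen=True)
class DebateConfig:
    rounds: int
    consensus: str
    families: tuple[str, ...]


@dataclass(frozen=True)
class ConfigEvaluation:
    cost_usd: float    # lower is better
    quality: float     # higher is better
    latency_s: float   # lower is better

    def dominates(self, other: "ConfigEvaluation") -> bool:
        # Negate quality so that "smaller is better" holds on every axis.
        a = (self.cost_usd, -self.quality, self.latency_s)
        b = (other.cost_usd, -other.quality, other.latency_s)
        return all(x <= y for x, y in zip(a, b)) and a != b


Trial = tuple[DebateConfig, ConfigEvaluation]


@dataclass(frozen=True)
class SearchResult:
    frontier: list[Trial]

    def recommend(self, max_cost_usd: Optional[float] = None,
                  min_quality: Optional[float] = None) -> DebateConfig:
        feasible = [
            (cfg, ev) for cfg, ev in self.frontier
            if (max_cost_usd is None or ev.cost_usd <= max_cost_usd)
            and (min_quality is None or ev.quality >= min_quality)
        ]
        # Assumed fallback: if nothing satisfies the constraints, still
        # return the cheapest frontier config so callers get something usable.
        pool = feasible or self.frontier
        return min(pool, key=lambda trial: trial[1].cost_usd)[0]


def search_pareto_configs(
    evaluator: Callable[[DebateConfig], ConfigEvaluation],
    rounds=(1, 2, 3),
    consensus=("majority", "unanimous"),
    family_sets=(("claude", "deepseek"),),
    max_trials: int = 6,
) -> SearchResult:
    if max_trials < 1:
        raise ValueError("max_trials must be at least 1")
    candidates = (
        DebateConfig(r, c, f) for r, c, f in product(rounds, consensus, family_sets)
    )
    # islice caps the number of evaluator calls: each call is one debate.
    trials = [(cfg, evaluator(cfg)) for cfg in islice(candidates, max_trials)]
    frontier = [
        (cfg, ev) for cfg, ev in trials
        if not any(other.dominates(ev) for _, other in trials)
    ]
    return SearchResult(frontier)


def stub_evaluator(cfg: DebateConfig) -> ConfigEvaluation:
    """Offline stand-in for a live debate; the numbers are synthetic."""
    n = len(cfg.families)
    return ConfigEvaluation(
        cost_usd=float(cfg.rounds * n),
        quality=float(cfg.rounds),
        latency_s=float(cfg.rounds),
    )


result = search_pareto_configs(stub_evaluator, max_trials=4)
choice = result.recommend(min_quality=2.0)
```

Because the evaluator is just a callable, a test can pass a stub like the one above and count its invocations to check the `max_trials` bound without credentials or spend.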
🤖 Generated with Claude Code