F2-flip — confer + orchestrate + audit honor ctx.cheap_mode #105
Merged
The payoff to F2-prep: closes the ~96%-scaffolding-cost leak the
2026-06-07 incident review uncovered in create_cheap. Before F2-flip,
create_cheap's scope-opinion + peer-review steps ran on the full
premium panel regardless of cheap_mode; the cheap routing only kicked
in inside orchestrate's per-node executor. With F2-flip, ctx.cheap_mode
threads through confer, orchestrate, and audit and biases each of
their pickers toward the low/med tier as appropriate.
What lands
src/tools/confer.ts — the substantial change:
  Cheap-mode tier-aware panel narrowing right after resolveProviders.
  Triggers when ALL three hold:
    - opts.ctx?.cheap_mode === true
    - caller did NOT explicitly pin a `providers` list (so we
      respect explicit caller intent; explicit > inherited)
    - selected.length > 1 (no narrowing needed for a single-provider panel)
  Path:
    1. Build availableSet from selected.
    2. selectForDifficulty({pricing, tier: "low", available, weights, allowOnly})
       (the same ranker the orchestrate cheap-mode branch uses).
    3. On pick:
       - look up the base provider in opts.providers
       - retargetProvider(base, pick.model), using the picked CHEAP
         model, not the provider's default
       - replace `selected` with [retargeted]
    4. On miss (no pricing / no low-tier candidate):
       - keep the full panel
       - surface the reason in `cheap_mode_panel.reason` so
         operators see we tried
  Envelope gains an optional `cheap_mode_panel` object:
    { before, after, picked: {provider, model} | null, reason }
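The narrowing path above can be sketched roughly as follows. This is a minimal illustration, not the code in confer.ts: the `Provider`, `PanelPick`, and `LowTierPicker` shapes and the `narrowForCheapMode` helper are hypothetical, `pickLowTier` stands in for selectForDifficulty, the base provider is looked up in `selected` rather than opts.providers for brevity, and the miss-case reason string is invented.

```typescript
// Hypothetical shapes for illustration; the real types in confer.ts may differ.
interface Provider { name: string; model: string }
interface PanelPick { provider: string; model: string }
interface CheapModePanel {
  before: number;
  after: number;
  picked: PanelPick | null;
  reason: string;
}

// Stand-in for selectForDifficulty: returns a low-tier pick, or null on a miss.
type LowTierPicker = (available: Set<string>) => PanelPick | null;

function narrowForCheapMode(
  selected: Provider[],
  cheapMode: boolean | undefined,
  callerPinnedProviders: boolean,
  pickLowTier: LowTierPicker,
): { selected: Provider[]; panel?: CheapModePanel } {
  // All three trigger conditions must hold; otherwise the panel is untouched.
  if (cheapMode !== true || callerPinnedProviders || selected.length <= 1) {
    return { selected };
  }

  const available = new Set(selected.map((p) => p.name));
  const pick = pickLowTier(available);
  const base = pick ? selected.find((p) => p.name === pick.provider) : undefined;

  if (!pick || !base) {
    // Miss: keep the full panel but record why (reason text is an assumption).
    return {
      selected,
      panel: {
        before: selected.length,
        after: selected.length,
        picked: null,
        reason: "ctx.cheap_mode=true; no low-tier candidate, full panel kept",
      },
    };
  }

  // Retarget the base provider to the picked cheap model, not its default.
  const retargeted: Provider = { ...base, model: pick.model };
  return {
    selected: [retargeted],
    panel: {
      before: selected.length,
      after: 1,
      picked: pick,
      reason: "ctx.cheap_mode=true; narrowed to cheapest low-tier candidate",
    },
  };
}
```

Returning the panel metadata on a miss as well as on a pick mirrors the intent stated above: operators can see that narrowing was attempted even when the full panel ran.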
src/tools/orchestrate.ts (1 line of logic):
args["cheap_mode"] default is now `opts.ctx?.cheap_mode ?? false`
instead of plain `false`. Explicit args still win — caller pinning
beats inherited context, so a sub-task can opt out of a cheap parent
if it really needs the full premium DAG.
src/tools/audit.ts (1 line of logic):
Same pattern: args["cheap_mode"] default is
`opts.ctx?.cheap_mode ?? true`. Audit's historical default was
true (cheap auditor); ctx.cheap_mode=false from a premium parent
macro now correctly biases toward a non-cheap auditor.
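The precedence shared by orchestrate and audit (explicit arg, then inherited ctx, then the tool's historical default) might look like the sketch below. `resolveCheapMode` and `CallContext` are hypothetical names for illustration; the actual change is described above as a single line in each tool.

```typescript
// Hypothetical minimal context shape; the real call context carries more.
interface CallContext { cheap_mode?: boolean }

function resolveCheapMode(
  args: Record<string, unknown>,
  ctx: CallContext | undefined,
  toolDefault: boolean,
): boolean {
  // An explicit boolean argument always wins over inherited context.
  const explicit = args["cheap_mode"];
  if (typeof explicit === "boolean") return explicit;
  // Otherwise inherit from ctx, falling back to the tool's own default.
  return ctx?.cheap_mode ?? toolDefault;
}

// orchestrate historically defaulted to false, audit to true.
const orchestrateCheap = resolveCheapMode({}, { cheap_mode: true }, false);
const auditCheap = resolveCheapMode({}, { cheap_mode: false }, true);
```

Under these assumptions, `orchestrateCheap` is true (inherited from a cheap parent) and `auditCheap` is false (a premium parent overrides audit's cheap default).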
Tests (full suite green — 1,303 passing, +2 over F2-prep's 1,301)
test/core/call-context-threading.test.ts is rewritten from the
F2-prep "no-op invariant" canary into the F2-flip "ctx is active"
canary. Six tests:
  confer (4):
    - pricing wired → panel narrows to 1, retarget to cheap model,
      envelope surfaces cheap_mode_panel meta
    - no ctx → full panel runs, no meta (baseline / regression guard)
    - ctx but no pricing → graceful fallback, full panel kept,
      meta.reason explains
    - caller-supplied providers list overrides ctx.cheap_mode
      (explicit > inherited)
  orchestrate (2):
    - ctx.cheap_mode=true (no args.cheap_mode) → worker calls
      retarget to low-tier model
    - args.cheap_mode=false explicit override beats ctx.cheap_mode=true
No churn elsewhere: every existing test (confer, orchestrate,
audit, create, parity, etc.) stays green because:
- The narrowing path requires ctx.cheap_mode=true; legacy callers
don't set ctx, so they don't trigger it.
- args["cheap_mode"] explicit values still win — no envelope
drift for any test that pinned cheap_mode directly.
Operator visibility
When create_cheap runs, confer's envelope now carries:
  "cheap_mode_panel": {
    "before": 3, "after": 1,
    "picked": {"provider": "gemini", "model": "gemini-2.5-flash-lite"},
    "reason": "ctx.cheap_mode=true; narrowed to cheapest low-tier candidate"
  }
This metadata bubbles up through runCreate (review wraps confer, audit
reports its own auditor pick), so the full cost-savings path is auditable
in the resulting envelope without extra instrumentation.
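For operators consuming the envelope, a defensive reader along these lines could turn the optional `cheap_mode_panel` field into a one-line summary. The helper names `readCheapModePanel` and `summarizePanel` are hypothetical, and the envelope is treated as an untyped record because its full shape is not shown here.

```typescript
interface CheapModePanel {
  before: number;
  after: number;
  picked: { provider: string; model: string } | null;
  reason: string;
}

// Returns the panel metadata if present and well-formed, otherwise null.
function readCheapModePanel(envelope: Record<string, unknown>): CheapModePanel | null {
  const raw = envelope["cheap_mode_panel"];
  if (typeof raw !== "object" || raw === null) return null;
  const p = raw as Partial<CheapModePanel>;
  if (
    typeof p.before !== "number" ||
    typeof p.after !== "number" ||
    typeof p.reason !== "string"
  ) {
    return null;
  }
  return { before: p.before, after: p.after, picked: p.picked ?? null, reason: p.reason };
}

// Formats the metadata for a log line or dashboard.
function summarizePanel(panel: CheapModePanel): string {
  const target = panel.picked
    ? `${panel.picked.provider}/${panel.picked.model}`
    : "no pick";
  return `panel ${panel.before} -> ${panel.after} (${target}): ${panel.reason}`;
}
```

Because the field is optional, callers that never set ctx.cheap_mode simply get null back and need no special handling.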
Scoreboard
TS: 1,303 / 1,303
typecheck + build: clean
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Re-opened after #101 auto-closed when its base branch (#100's `feature/f2-prep-callcontext`) was deleted. Same diff as #101, rebased onto current main (which now has F2-prep squashed in).
The payoff to F2-prep (#100, now merged): closes the ~96%-scaffolding-cost leak the 2026-06-07 incident review uncovered in `create_cheap`. `ctx.cheap_mode` now threads through `confer`, `orchestrate`, and `audit`, biasing each picker toward the appropriate tier.
What lands
`src/tools/confer.ts` — the substantial change
Cheap-mode tier-aware panel narrowing right after `resolveProviders`. Triggers when ALL three hold:
- `opts.ctx?.cheap_mode === true`
- the caller did not explicitly pin a `providers` list (explicit > inherited)
- `selected.length > 1`

Path: `selectForDifficulty({tier: "low", ...})` → retarget base provider to the picked cheap model → replace `selected` with the single retargeted entry. Falls back gracefully (full panel kept + reason surfaced in envelope) when there is no pricing or no low-tier candidate.
Envelope gains optional `cheap_mode_panel: {before, after, picked, reason}`.
`src/tools/orchestrate.ts` + `src/tools/audit.ts` — 1 line each
`args["cheap_mode"]` default now reads from `opts.ctx?.cheap_mode`. Explicit args still win.
Scoreboard
Test plan
🤖 Generated with Claude Code