feat(outside-voice): add llm CLI fallback between Codex and Claude subagent #1631
Open
diogolealassis wants to merge 1 commit into
…bagent

When Codex is unavailable or errors, plan-review and second-opinion skills fall back to the Claude subagent. That fallback works, but loses cross-model independence: the subagent is the same model family as the primary review.

This adds a middle step: try the `llm` CLI (datasette/llm) with a configured model. When `llm` is installed and the model is registered (e.g. `llm-xai` plugin for Grok), the outside voice goes through a genuinely different model family before falling back to Claude.

## Why this matters today

Codex CLI currently fails at startup for any user whose `~/.codex/config.toml` has an xAI provider with `wire_api = "chat"`, which is the default per xAI's setup docs but deprecated per github.com/openai/codex#7782:

    Error loading config.toml: `wire_api = "chat"` is no longer supported.
    in `model_providers.xai.wire_api`

The error fires at CLI startup BEFORE the provider is selected, so calls targeted at OpenAI also fail. Users in this state currently get the Claude subagent fallback on every outside-voice step. They lose the cross-model value the skill is designed to deliver.

Hit twice in a single session 2026-05-20 on /plan-eng-review and /plan-ceo-review. Existing memory `codex-cli-broken-for-xai` documents the upstream Codex side; this PR adds the gstack-side fallback path.

## What changed

Two functions in scripts/resolvers/review.ts produce the outside-voice section that gets templated into 4 skill files (office-hours, plan-ceo-review, plan-devex-review, plan-eng-review):

- `generateCodexPlanReview`: used by /plan-*-review skills
- `generateCodexSecondOpinion`: used by /office-hours

Both now produce a Codex → llm → Claude subagent chain. The llm step is gated on:

1. `outside_voice_llm_model` config key (new, default `grok-4-fast`)
2. `command -v llm` succeeds
3. `llm models` lists the configured model

If any check fails, the chain falls through cleanly to the existing Claude subagent path. No behavior change for users who don't have llm installed.

The default model is `grok-4-fast` rather than `grok-4-latest` because the latter's reasoning mode can hang 2-5 minutes on big prompts (observed 2026-05-18: the llm stream is empty during hidden reasoning, which looks like a hang). `grok-4-fast` is the right shape for routine outside-voice work.

## Config key

```
gstack-config set outside_voice_llm_model grok-4-fast   # default
gstack-config set outside_voice_llm_model ""            # disable, skip llm step
gstack-config set outside_voice_llm_model gpt-4o        # any model `llm models` lists
```

Empty string skips the llm step entirely (back to Codex → subagent direct).

## File-level diff

- `bin/gstack-config`: new key registration in `lookup_default` case statement + CONFIG_HEADER docs section
- `scripts/resolvers/review.ts`: llm fallback block inserted between Codex error handling and the "If CODEX_NOT_AVAILABLE" section in both generator functions. Codex cleanup in `generateCodexSecondOpinion` changed to defer `$CODEX_PROMPT_FILE` deletion until end of chain (the llm step reads it).
- Regenerated SKILL.md outputs for the 4 affected skills via `bun run gen:skill-docs`. `.gbrain/` outputs also regenerated locally (gitignored, not committed).

## Verification

- `bun run scripts/skill-check.ts` passes, all freshness checks green
- Spot-checked generated SKILL.md files: llm fallback section present, bash blocks syntactically correct, `outside_voice_llm_model` referenced consistently
- Manually traced the 4-state matrix (codex/llm available × works/errors) through the generated bash to confirm fallthrough behavior

## What this doesn't do

- Doesn't change `generateAdversarialStep` (the /ship + /review adversarial flow). That's a different shape (always-on, not configurable). Could benefit from the same fallback but out of scope for this PR; happy to do as a follow-up if you want it.
- Doesn't bump VERSION or add a CHANGELOG entry; leaving that to the maintainer's release workflow.
- Doesn't update the `outside_voice_llm_model` to be settable via AskUserQuestion during a first-run flow. The default is sensible enough that most users won't need to touch it.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
One config-surface gap: this adds […] That means […] Please add […]
## What

When Codex is unavailable or errors, plan-review and second-opinion skills currently fall back to the Claude subagent. That fallback works, but loses cross-model independence: the subagent is the same model family as the primary review. This PR adds a middle step: try the `llm` CLI (datasette/llm) with a configured model before falling back to Claude.

New chain: Codex → llm → Claude subagent

When `llm` is installed and the configured model is registered (e.g. via `llm install llm-xai` for Grok), the outside voice goes through a genuinely different model family before falling through to Claude. When `llm` isn't available, behavior is unchanged.

## Why this matters today

Codex CLI currently fails at startup for users whose `~/.codex/config.toml` has an `xai` provider with `wire_api = "chat"`, which is the default per xAI's setup docs but deprecated per openai/codex#7782:

    Error loading config.toml: `wire_api = "chat"` is no longer supported.
    in `model_providers.xai.wire_api`

The error fires at CLI startup BEFORE the provider is selected, so calls targeted at OpenAI also fail. Users in this state get the Claude subagent fallback on every outside-voice step. They lose the cross-model value the skill is designed to deliver.
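To check whether a given machine is in this state, a line-based scan of the Codex config is usually enough. The sketch below is illustrative only: `codex_config_uses_chat_wire_api` is a hypothetical helper name, and it uses a plain `grep` rather than a real TOML parser, so it assumes the setting sits on its own line.

```shell
# Hypothetical helper (not part of gstack or Codex): succeeds when the
# config file contains an uncommented line of the form  wire_api = "chat".
# Assumption: a line-based match is good enough; this is not a TOML parser.
codex_config_uses_chat_wire_api() {
  # Default to the path named in the PR description when no file is given.
  local config="${1:-$HOME/.codex/config.toml}"
  [ -r "$config" ] || return 1
  grep -Eq -- '^[[:space:]]*wire_api[[:space:]]*=[[:space:]]*"chat"' "$config"
}
```

Calling `codex_config_uses_chat_wire_api && echo "Codex will fail at startup"` would flag the affected setup without launching Codex.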
Hit twice in a single 2026-05-20 session on `/plan-eng-review` and `/plan-ceo-review`. The fix on the user side is one line (`wire_api = "responses"`), but for users who never had OpenAI auth in the first place (they use Codex/llm because of Grok/Anthropic preference), restoring Codex doesn't actually help: they need a non-Codex cross-model path.

## How

Two functions in `scripts/resolvers/review.ts` produce the outside-voice section that gets templated into 4 skill files:

- `generateCodexPlanReview`: used by `/plan-*-review` skills
- `generateCodexSecondOpinion`: used by `/office-hours`

Both now produce a Codex → llm → Claude subagent chain. The llm step is gated on:

1. `outside_voice_llm_model` config key (default `grok-4-fast`)
2. `command -v llm` succeeds
3. `llm models` lists the configured model

If any check fails, the chain falls through cleanly to the existing Claude subagent path. No behavior change for users who don't have llm installed.
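The three gates and the fallthrough order can be sketched in shell. This is a minimal illustration under stated assumptions, not the text the resolvers actually generate: `llm_step_enabled`, `try_codex`, `try_llm`, `claude_subagent`, and `outside_voice` are hypothetical names, the three step functions are stubs, and the model lookup is a plain substring match on `llm models` output.

```shell
# Gate for the llm step: succeeds only when all three checks pass.
llm_step_enabled() {
  local model="$1"
  [ -n "$model" ] || return 1                     # 1. config value is non-empty
  command -v llm >/dev/null 2>&1 || return 1      # 2. llm CLI is available
  llm models 2>/dev/null | grep -Fq -- "$model"   # 3. model is listed (substring match, an assumption)
}

# Hypothetical stubs standing in for the real steps. Each prints a result
# and returns 0 on success, or returns non-zero to signal "fall through".
try_codex()       { return 1; }                                   # pretend Codex fails at startup
try_llm()         { llm_step_enabled "$1" && echo "llm:$1"; }     # runs only when the gate passes
claude_subagent() { echo "claude-subagent"; }                     # existing last-resort path

# Walk the chain in order and stop at the first step that succeeds.
outside_voice() {
  local model="$1" step out
  for step in try_codex try_llm claude_subagent; do
    if out=$("$step" "$model"); then
      printf '%s\n' "$out"
      return 0
    fi
  done
  return 1
}
```

With these stubs, `outside_voice grok-4-fast` yields the llm result only when `llm` is installed and lists that model; in every other case it lands on the subagent path, which matches the "no behavior change" claim for users without `llm`.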
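The default `grok-4-fast` rather than `grok-4-latest` is deliberate: the latter's reasoning mode can hang 2-5 minutes on big prompts (the `llm` stream is empty during hidden reasoning, which looks like a hang). `grok-4-fast` is the right shape for routine outside-voice work.

## Configuration

    gstack-config set outside_voice_llm_model grok-4-fast   # default
    gstack-config set outside_voice_llm_model ""            # disable, skip llm step
    gstack-config set outside_voice_llm_model gpt-4o        # any model `llm models` lists

Empty string skips the llm step entirely (Codex → subagent direct).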
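The distinction between "never set" (use the default) and "set to empty" (disable) is easy to lose in shell. The sketch below shows one way to keep it, using a hypothetical `resolve_outside_voice_model` function; how `gstack-config` itself stores or returns the value is not shown in this PR, so treat the calling convention as an assumption.

```shell
# Hypothetical resolver: call with no argument when the key was never set,
# or with the stored value (possibly empty) when it was.
resolve_outside_voice_model() {
  # ${1-default} (no colon) substitutes only when $1 is unset, so an
  # explicit empty string survives and can mean "skip the llm step".
  printf '%s\n' "${1-grok-4-fast}"
}
```

Using `${1:-grok-4-fast}` (with the colon) instead would silently turn the empty "disable" value back into the default.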
## Files changed

The SKILL.md changes are mechanically regenerated by `bun run gen:skill-docs`; the only hand-edits are in `bin/gstack-config` and `scripts/resolvers/review.ts`.

## Verification

- `bun run scripts/skill-check.ts` passes, all freshness checks green
- Spot-checked generated SKILL.md files: llm fallback section present, bash blocks syntactically correct, `outside_voice_llm_model` referenced consistently
- `generateCodexSecondOpinion`: `$CODEX_PROMPT_FILE` deletion moved from "after Codex" to "after chain end" since the llm step reuses the prompt file
- […] `llm` invocation with my Grok key: the resolver functions produce templated prose instructions for the agent, not executable code per se. Happy to do this if you want a smoke-test screenshot.

## What this doesn't do

- Doesn't change `generateAdversarialStep` (the `/ship` + `/review` adversarial flow). Different shape (always-on, not configurable). Could benefit from the same fallback but out of scope for this PR; happy to do as a follow-up.
- Doesn't bump `VERSION` or add a `CHANGELOG.md` entry; leaving those to your release workflow.
- […] `outside_voice_llm_model` via `AskUserQuestion`. The default `grok-4-fast` is sensible enough that most users won't need to touch it. If you want a first-run prompt I can add it.

## Generated by

A real `/plan-eng-review` + outside-voice flow on a downstream user project (diogolealassis/gws-platform) hit the Codex failure twice in one session and surfaced the gap. Existing project memory `codex-cli-broken-for-xai` and the newly logged `codex-cli-global-config-blocks-all-providers` document the trigger. This PR closes the loop on the gstack side.

Happy to break this into smaller commits or split out the `generateCodexSecondOpinion` cleanup-sequencing fix if you'd prefer.

🤖 Generated with Claude Code