feat(codex): wire cross-model verification into verify, ship, fix, nuclear-review #59

Merged
arzafran merged 2 commits into main from feat/codex-verifier-integration
Jun 19, 2026

arzafran commented Jun 19, 2026

What this does

Wires the Codex bridge into the six skills where Claude reviews its own work: verify, ship, fix, nuclear-review, review, and refactor. In each, an independent review from a different model family (Codex) runs alongside Claude's, so a blind spot that Claude's self-review would repeat gets a second pair of eyes. It's opt-in by availability: if Codex isn't installed or logged in, every one of these skills runs exactly as before, Claude-only.

The point is the self-preferential-bias surface — Claude judging Claude tends to wave its own output through. A different family doesn't share those blind spots, so this is where the bridge earns its keep. Codex is the roomy pool (600–3,000 messages / 5h on Pro), so quota is not a reason to be sparing — the high-frequency review skills are wired in too.

Summary

  • verify — adds a non-Claude finder (codex-verifier) in parallel with the Claude Finder, merged into the issue superset before the Adversary stage. The skill's whole reason to exist is breaking self-review bias; its three-agent panel was 100% Claude.
  • ship — parallel cross-model review at Step 6, right before a commit lands. A Claude/Codex disagreement on a Critical/HIGH becomes a human gate before Step 7.
  • fix — cross-model review alongside the reviewer for security-sensitive fixes (auth, crypto, permissions, input validation).
  • nuclear-review — a whole-codebase ask pass folded into Phase 3 synthesis. Deliberately not the diff-scoped review — a repo-wide audit has no diff, so it uses codex-run.ts ask pointed at the tree.
  • review — runs as the reviewer agent (has Bash, no Agent tool), so it calls codex-run.ts review directly and reconciles findings into the verdict (HIGH→Critical, MEDIUM→Warning, LOW→Suggestion).
  • refactor: codex-verifier in parallel with the reviewer; refactors are a top source of subtle regressions, exactly where a second family pays off.
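As a rough illustration of the reconciliation described in the bullets above, the sketch below maps Codex severities onto the reviewer's verdict levels (HIGH→Critical, MEDIUM→Warning, LOW→Suggestion), merges both finders' issues into one superset, and flags a disagreement on a Critical as needing a human gate. The `Finding` shape, the location key, and all function names are hypothetical; this PR does not show the actual output format of `codex-run.ts` or the `codex-verifier` agent.

```typescript
// Hypothetical types: the real bridge output format is not shown in this PR.
type CodexSeverity = "HIGH" | "MEDIUM" | "LOW";
type ReviewSeverity = "Critical" | "Warning" | "Suggestion";

interface Finding {
  file: string;
  line: number;
  severity: ReviewSeverity;
  message: string;
  source: "claude" | "codex";
}

// Severity mapping as stated in the review bullet.
const SEVERITY_MAP: Record<CodexSeverity, ReviewSeverity> = {
  HIGH: "Critical",
  MEDIUM: "Warning",
  LOW: "Suggestion",
};

const RANK: Record<ReviewSeverity, number> = { Critical: 2, Warning: 1, Suggestion: 0 };

// Convert a raw Codex finding into the reviewer's vocabulary.
function fromCodex(raw: {
  file: string;
  line: number;
  severity: CodexSeverity;
  message: string;
}): Finding {
  return { ...raw, severity: SEVERITY_MAP[raw.severity], source: "codex" };
}

// Assumption: two findings refer to the same issue when file and line match.
const keyOf = (f: Finding): string => `${f.file}:${f.line}`;

// Union of both finders' issues; on a location clash, keep the higher severity.
function mergeSuperset(claude: Finding[], codex: Finding[]): Finding[] {
  const merged = new Map<string, Finding>();
  for (const f of claude.concat(codex)) {
    const prev = merged.get(keyOf(f));
    if (!prev || RANK[f.severity] > RANK[prev.severity]) merged.set(keyOf(f), f);
  }
  return Array.from(merged.values());
}

// True when one model reports a Critical at a location where the other does not.
function needsHumanGate(claude: Finding[], codex: Finding[]): boolean {
  const criticalKeys = (fs: Finding[]): Set<string> =>
    new Set(fs.filter((f) => f.severity === "Critical").map(keyOf));
  const a = criticalKeys(claude);
  const b = criticalKeys(codex);
  return (
    Array.from(a).some((k) => !b.has(k)) || Array.from(b).some((k) => !a.has(k))
  );
}
```

Keeping the higher severity on a clash is one possible policy; the skills themselves may reconcile differently, for example by surfacing both findings to the Adversary stage.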

Diff-based skills with Agent access use the codex-verifier agent; review (a forked reviewer) and nuclear-review (diff-less) call codex-run.ts directly. Every insertion states the bridge is gated and fails open.
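The "gated and fails open" behaviour can be sketched as a small wrapper: the Claude review always runs, the Codex review runs in parallel only if the bridge reports itself available, and any bridge error collapses to "no Codex result" instead of failing the skill. This is a minimal sketch under stated assumptions; `withOptionalCodex`, `Review`, and the three callbacks are hypothetical names, and the SKILL.md files express this in prose instructions, not code.

```typescript
// Hypothetical result shape for a single review pass.
type Review = { findings: string[] };

// Runs the Claude review unconditionally and the Codex review opportunistically.
// Fail-open: an unavailable or crashing bridge yields codex = null, never an error.
async function withOptionalCodex(
  runClaude: () => Promise<Review>,
  runCodex: () => Promise<Review>,
  isCodexAvailable: () => Promise<boolean>,
): Promise<{ claude: Review; codex: Review | null }> {
  // Start the Claude review right away so the two passes overlap.
  const claudePromise = runClaude();

  // Gate on availability; swallow any bridge failure (check or run).
  const codexPromise: Promise<Review | null> = isCodexAvailable()
    .then((ok) => (ok ? runCodex() : null))
    .catch(() => null);

  // A Claude failure still propagates: only the bridge is optional.
  const [claude, codex] = await Promise.all([claudePromise, codexPromise]);
  return { claude, codex };
}
```

A caller would treat `codex === null` as "run Claude-only, exactly as before" and only attempt reconciliation when a Codex result is present.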

Test Plan

  • bun run lint:skills — 35 skills, 0 errors/warnings
  • Diff scoped to the six SKILL.md files only
  • Reviewer: confirm nuclear-review uses ask (correct for a diff-less audit) and review uses the direct codex-run.ts call (the reviewer fork has no Agent tool)

arzafran added 2 commits June 19, 2026 13:19
…clear-review

Slots the Codex bridge into the four skills where Claude judges its own output —
the self-preferential-bias surface cross-model review exists to break. Each
insertion is gated and fails open: if the bridge is unavailable, the skill runs
Claude-only.

- verify: adds a non-Claude finder (codex-verifier) in parallel with the Claude
  Finder, merged into the superset before the Adversary stage — a different
  model family in an otherwise all-Claude panel.
- ship: parallel cross-model review at Step 6, right before commit; a Claude/Codex
  disagreement on a Critical becomes a human gate before Step 7.
- fix: cross-model review alongside the reviewer for security-sensitive fixes
  (auth, crypto, permissions, input validation).
- nuclear-review: a whole-codebase 'ask' pass (NOT diff-scoped review — there is
  no diff in a repo-wide audit) folded into Phase 3 synthesis as a second opinion.

Verifier uses the codex-verifier agent for diff-based skills; nuclear-review uses
codex-run.ts ask directly since it audits the whole tree, not a change.

Codex is the roomy pool (600–3000 msgs / 5h on Pro); quota was never the
constraint, so the two high-frequency review skills get the cross-model pass as
well. review runs as the reviewer agent (Bash, no Agent tool), so it calls
codex-run.ts review directly; refactor is in orchestration mode and uses the
codex-verifier agent in parallel — refactors are a top source of subtle
regressions, exactly where a second model family pays off. Both gated, fail open.
arzafran merged commit ac5f632 into main on Jun 19, 2026
15 checks passed
arzafran deleted the feat/codex-verifier-integration branch on June 19, 2026 at 20:37