feat(codex): advisory cross-model semantic probe in proof-of-work #60
Merged
The mechanical battery (typecheck/test/lint) proves a diff is self-consistent, not correct — a bug that compiles and passes the tests you wrote sails through. Adds an opt-in Codex review as a semantic probe on top of the gate, treated like react-doctor/deslop: advisory, reported alongside the verdict, never flips it. Deliberately kept OUT of `bun run proof` — that gate is cheapest-first and runs constantly, so a remote model call would slow every proof. Run on non-trivial diffs only. Gated and fails open.
What this does
Adds an optional cross-model check to `proof-of-work`. The machine gate (`bun run proof`) proves a diff compiles, passes its tests, and lints clean, but not that it's correct; a bug that typechecks and passes the tests you wrote goes straight through. This lets you run a Codex review from a different model family as a semantic probe on top of that gate, so a whole class of "green but wrong" diffs gets a second look before a human spends attention on it.

It's advisory and opt-in: it never changes the review-ready verdict, and it's silent when Codex isn't available.
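To make the "advisory, fails open" contract concrete, here is a minimal sketch of how such a probe could sit beside a gate verdict. Every name in it (`Verdict`, `ProbeResult`, `failOpen`, `buildReport`) is hypothetical and chosen for illustration; none of it is taken from `codex-run.ts` or the `proof-of-work` skill, and the real wiring may look quite different.

```typescript
// Hypothetical types for illustration; not the skill's actual data model.
type Verdict = "review-ready" | "not-ready";

interface ProbeResult {
  name: string;
  status: "ok" | "findings" | "skipped";
  notes: string[];
}

interface Report {
  verdict: Verdict;
  advisories: ProbeResult[];
}

// A probe inspects a diff and returns its findings.
type Probe = (diff: string) => ProbeResult;

// Wrap a probe so that an unavailable tool or a thrown error yields a quiet
// "skipped" result instead of a failure: the probe fails open.
function failOpen(name: string, available: () => boolean, probe: Probe): Probe {
  return (diff) => {
    if (!available()) return { name, status: "skipped", notes: [] };
    try {
      return probe(diff);
    } catch {
      return { name, status: "skipped", notes: [] };
    }
  };
}

// The verdict is derived from the mechanical gate alone. Advisory results are
// attached to the report but never consulted when deciding the verdict.
function buildReport(gatePassed: boolean, diff: string, probes: Probe[]): Report {
  const verdict: Verdict = gatePassed ? "review-ready" : "not-ready";
  const advisories = probes.map((probe) => probe(diff));
  return { verdict, advisories };
}

// Usage: a stand-in semantic probe that reports a finding.
const semanticProbe = failOpen("semantic-probe", () => true, () => ({
  name: "semantic-probe",
  status: "findings",
  notes: ["possible off-by-one in loop bound"],
}));

const report = buildReport(true, "diff --git a/x.ts b/x.ts", [semanticProbe]);
console.log(report.verdict, report.advisories.map((a) => a.status));
```

In this sketch the gate result is the only input to `verdict`, so a probe that reports findings, gets skipped, or throws can add information to the report without ever changing the outcome.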
Summary
- Documents the probe in `skills/proof-of-work/SKILL.md`.
- Opt-in `codex-run.ts review`, treated exactly like the existing `react-doctor`/`deslop` advisory probes: reported alongside the verdict, never flips it.
- Deliberately kept out of `bun run proof`: that gate is cheapest-first and runs constantly, so baking in a remote model call would slow every proof. Documented to run deliberately on non-trivial diffs.

Test Plan
- `bun run lint:skills`: 35 skills, 0 errors/warnings
- No changes to the `proof` script itself