fix(review): cycle to next model when output fails publish validation#315
Conversation
…validation The model pool ran opencode_review_normalize_output.py on the raw candidate output, but the publish step ANSI-strips the output before running the same normalizer. A model (e.g. deepseek-v3) whose raw output passed but whose ANSI-stripped form failed was recorded as the pool's "success", so the pool stopped instead of trying the next model — and the publish step then failed the whole review with "Selected successful OpenCode output did not include a valid control conclusion" (observed after models were reordered by quota). Strip terminal escapes in normalize_opencode_output before validating, matching the publish step exactly, so the pool only records success for output the publish step will accept and otherwise falls through to the next model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RTAMs4bpSZS77Xe3RQjv9P
OpenCode Review Overview
Pull request overviewOpenCode reviewed the current-head bounded evidence and found no blocking issues. FindingsNo blocking findings. SummaryApproval sufficiency: bounded evidence supplied affirmative approval evidence for changed files, coverage/docstring posture, risk surfaces, and current-head verification; approval is not based merely on the absence of known blockers.
Changed-File Evidence Mapflowchart LR
PR["PR changed files"] --> Evidence["OpenCode bounded evidence"]
Evidence --> S1["CI script: run_opencode_review_model_pool.sh"]
S1 --> I1["review and security gate shell path"]
I1 --> R1["Review risk: CI script: run_opencode_review_model_pool.sh"]
R1 --> V1["bash -n plus Strix self-test"]
|
There was a problem hiding this comment.
Pull request overview
OpenCode reviewed the current-head bounded evidence and found no blocking issues.
Findings
No blocking findings.
Summary
Approval sufficiency: bounded evidence supplied affirmative approval evidence for changed files, coverage/docstring posture, risk surfaces, and current-head verification; approval is not based merely on the absence of known blockers.
Verification posture: CodeGraph evidence was initialized and bounded current-head evidence reviewed for changed-file evidence including scripts/ci/run_opencode_review_model_pool.sh.
Linter/static: workflow/static review evidence is bounded by the current-head GitHub Checks gate and changed-file evidence.
TDD/regression: coverage execution evidence and focused changed hunks were reviewed from bounded-review-evidence.md.
Coverage: coverage execution evidence reports test coverage as not applicable because no supported changed source files or package manifests were found.
Docstring coverage: coverage execution evidence reports docstring coverage as not applicable because no supported changed source files or package manifests were found.
DAG: CodeGraph/source-backed behavior map connects scripts/ci/run_opencode_review_model_pool.sh to the affected review, runtime, or workflow path and required checks.
PoC/execution: coverage-evidence job executed on the current head and reported PASS.
DDD/domain: workflow and repository-governance invariants were reviewed against changed files in bounded evidence.
CDD/context: CodeGraph evidence, changed-file history, and focused hunks were reviewed from bounded-review-evidence.md.
Similar issues: changed-file history evidence was reviewed for comparable local precedents.
Claim/concept check: bounded evidence, repository source, current-head workflow evidence, and, where numeric, scientific, statistical, or literature-backed claims are affected, original-paper/formula evidence and parameter-recovery expectations were used for claims.
Standards search: standards and external-source checks are delegated to configured OpenCode web_search/Context7/DeepWiki sources when applicable; no evidence-backed standards blocker is present in bounded evidence.
Compatibility/convention: changed workflow/script conventions, object naming, and reserved-word safety for schema/API/config/code surfaces were checked in bounded evidence.
Breaking-change/backcompat: deployment evidence and changed-file history were checked for backward-compatibility risk.
Performance: changed surfaces were checked for performance risk in bounded evidence.
Developer experience: changed automation, review, test, setup, and maintenance surfaces were checked for helpful or obstructive DX impact in bounded evidence.
User experience: connected user, operator, API, CLI, documentation, review-comment, status-check, rendering, and workflow-reader behavior was checked for contradictions against code, docs, and tests in bounded evidence.
Visual/DOM: Playwright visual, DOM locator, ARIA snapshot, console, and responsive evidence were checked when a web UI surface was present; for non-web surfaces, API/CLI/log/docs/workflow interaction evidence was reviewed instead.
Accessibility/i18n: accessibility, localization, and human-readable text surfaces were checked where UI, CLI, API message, docs, logs, or review text changed.
Supply-chain/license: dependency, package, model, container, and external-tool changes were checked in bounded evidence.
Packaging: package, build, test, lint, and security contracts were checked in bounded evidence.
Security/privacy: workflow-token, review-gate, and repository-automation security/privacy boundaries were checked in bounded evidence.
- Result: APPROVE
- Reason: The PR addresses a critical issue where the model pool could incorrectly mark a model as successful when its ANSI-stripped output fails validation, preventing the pool from cycling to the next model. The fix ensures consistency between the pool's validation and the publish step's validation.
- Head SHA:
58e66f703eb51a763f7a4e475abf1f3ea48b27c9 - Workflow run: 28740596616
- Workflow attempt: 1
Changed-File Evidence Map
flowchart LR
PR["PR changed files"] --> Evidence["OpenCode bounded evidence"]
Evidence --> S1["CI script: run_opencode_review_model_pool.sh"]
S1 --> I1["review and security gate shell path"]
I1 --> R1["Review risk: CI script: run_opencode_review_model_pool.sh"]
R1 --> V1["bash -n plus Strix self-test"]
문제 (순환 버그)
모델 풀은
normalize_output.py를 원본에 돌려 성공 판정하지만, publish 스텝은 ANSI 제거 후 같은 검증을 돌립니다. 원본은 통과하나 ANSI 제거본은 실패하는 모델(예: deepseek-v3)이 풀의 "성공"으로 기록돼 다음 모델로 순환하지 않고 종료 → publish가 "Selected successful OpenCode output did not include a valid control conclusion"로 리뷰 전체 실패. (모델을 쿼터순 재배치한 뒤 표면화됨.)수정
normalize_opencode_output가 검증 전에 먼저 ANSI를 제거(publish와 동일 regex)하도록 정렬. 이제 풀은 publish가 받아들일 출력만 성공으로 기록하고, 아니면 다음 모델(o4-mini 등 형식 안정 모델)로 폴백합니다.범위: 순환 버그만 수정. 모델 순서·타임아웃·ATTEMPTS 등은 미변경.
테스트
shell-syntax + normalize 테스트 26개 통과.
🤖 Generated with Claude Code