fix(review): cycle to next model when output fails publish validation by seonghobae · Pull Request #315 · ContextualWisdomLab/.github

seonghobae · 2026-07-05T12:21:22Z

문제 (순환 버그)

모델 풀은 normalize_output.py를 원본에 돌려 성공 판정하지만, publish 스텝은 ANSI 제거 후 같은 검증을 돌립니다. 원본은 통과하나 ANSI 제거본은 실패하는 모델(예: deepseek-v3)이 풀의 "성공"으로 기록돼 다음 모델로 순환하지 않고 종료 → publish가 "Selected successful OpenCode output did not include a valid control conclusion"로 리뷰 전체 실패. (모델을 쿼터순 재배치한 뒤 표면화됨.)

수정

normalize_opencode_output가 검증 전에 먼저 ANSI를 제거(publish와 동일 regex)하도록 정렬. 이제 풀은 publish가 받아들일 출력만 성공으로 기록하고, 아니면 다음 모델(o4-mini 등 형식 안정 모델)로 폴백합니다.

범위: 순환 버그만 수정. 모델 순서·타임아웃·ATTEMPTS 등은 미변경.

테스트

shell-syntax + normalize 테스트 26개 통과.

🤖 Generated with Claude Code

…validation The model pool ran opencode_review_normalize_output.py on the raw candidate output, but the publish step ANSI-strips the output before running the same normalizer. A model (e.g. deepseek-v3) whose raw output passed but whose ANSI-stripped form failed was recorded as the pool's "success", so the pool stopped instead of trying the next model — and the publish step then failed the whole review with "Selected successful OpenCode output did not include a valid control conclusion" (observed after models were reordered by quota). Strip terminal escapes in normalize_opencode_output before validating, matching the publish step exactly, so the pool only records success for output the publish step will accept and otherwise falls through to the next model. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RTAMs4bpSZS77Xe3RQjv9P

opencode-agent · 2026-07-05T12:28:35Z

OpenCode Review Overview

Head SHA: 58e66f703eb51a763f7a4e475abf1f3ea48b27c9
Workflow run: 28740596616
Workflow attempt: 1
Gate result: APPROVE (approval step)

Pull request overview

OpenCode reviewed the current-head bounded evidence and found no blocking issues.

Findings

No blocking findings.

Summary

Approval sufficiency: bounded evidence supplied affirmative approval evidence for changed files, coverage/docstring posture, risk surfaces, and current-head verification; approval is not based merely on the absence of known blockers.
Verification posture: CodeGraph evidence was initialized and bounded current-head evidence reviewed for changed-file evidence including scripts/ci/run_opencode_review_model_pool.sh.
Linter/static: workflow/static review evidence is bounded by the current-head GitHub Checks gate and changed-file evidence.
TDD/regression: coverage execution evidence and focused changed hunks were reviewed from bounded-review-evidence.md.
Coverage: coverage execution evidence reports test coverage as not applicable because no supported changed source files or package manifests were found.
Docstring coverage: coverage execution evidence reports docstring coverage as not applicable because no supported changed source files or package manifests were found.
DAG: CodeGraph/source-backed behavior map connects scripts/ci/run_opencode_review_model_pool.sh to the affected review, runtime, or workflow path and required checks.
PoC/execution: coverage-evidence job executed on the current head and reported PASS.
DDD/domain: workflow and repository-governance invariants were reviewed against changed files in bounded evidence.
CDD/context: CodeGraph evidence, changed-file history, and focused hunks were reviewed from bounded-review-evidence.md.
Similar issues: changed-file history evidence was reviewed for comparable local precedents.
Claim/concept check: bounded evidence, repository source, current-head workflow evidence, and, where numeric, scientific, statistical, or literature-backed claims are affected, original-paper/formula evidence and parameter-recovery expectations were used for claims.
Standards search: standards and external-source checks are delegated to configured OpenCode web_search/Context7/DeepWiki sources when applicable; no evidence-backed standards blocker is present in bounded evidence.
Compatibility/convention: changed workflow/script conventions, object naming, and reserved-word safety for schema/API/config/code surfaces were checked in bounded evidence.
Breaking-change/backcompat: deployment evidence and changed-file history were checked for backward-compatibility risk.
Performance: changed surfaces were checked for performance risk in bounded evidence.
Developer experience: changed automation, review, test, setup, and maintenance surfaces were checked for helpful or obstructive DX impact in bounded evidence.
User experience: connected user, operator, API, CLI, documentation, review-comment, status-check, rendering, and workflow-reader behavior was checked for contradictions against code, docs, and tests in bounded evidence.
Visual/DOM: Playwright visual, DOM locator, ARIA snapshot, console, and responsive evidence were checked when a web UI surface was present; for non-web surfaces, API/CLI/log/docs/workflow interaction evidence was reviewed instead.
Accessibility/i18n: accessibility, localization, and human-readable text surfaces were checked where UI, CLI, API message, docs, logs, or review text changed.
Supply-chain/license: dependency, package, model, container, and external-tool changes were checked in bounded evidence.
Packaging: package, build, test, lint, and security contracts were checked in bounded evidence.
Security/privacy: workflow-token, review-gate, and repository-automation security/privacy boundaries were checked in bounded evidence.

Result: APPROVE
Reason: The PR addresses a critical issue where the model pool could incorrectly mark a model as successful when its ANSI-stripped output fails validation, preventing the pool from cycling to the next model. The fix ensures consistency between the pool's validation and the publish step's validation.
Head SHA: 58e66f703eb51a763f7a4e475abf1f3ea48b27c9
Workflow run: 28740596616
Workflow attempt: 1

Changed-File Evidence Map

flowchart LR
  PR["PR changed files"] --> Evidence["OpenCode bounded evidence"]
  Evidence --> S1["CI script: run_opencode_review_model_pool.sh"]
  S1 --> I1["review and security gate shell path"]
  I1 --> R1["Review risk: CI script: run_opencode_review_model_pool.sh"]
  R1 --> V1["bash -n plus Strix self-test"]

opencode-agent

Pull request overview

OpenCode reviewed the current-head bounded evidence and found no blocking issues.

Findings

No blocking findings.

Summary

Approval sufficiency: bounded evidence supplied affirmative approval evidence for changed files, coverage/docstring posture, risk surfaces, and current-head verification; approval is not based merely on the absence of known blockers.
Verification posture: CodeGraph evidence was initialized and bounded current-head evidence reviewed for changed-file evidence including scripts/ci/run_opencode_review_model_pool.sh.
Linter/static: workflow/static review evidence is bounded by the current-head GitHub Checks gate and changed-file evidence.
TDD/regression: coverage execution evidence and focused changed hunks were reviewed from bounded-review-evidence.md.
Coverage: coverage execution evidence reports test coverage as not applicable because no supported changed source files or package manifests were found.
Docstring coverage: coverage execution evidence reports docstring coverage as not applicable because no supported changed source files or package manifests were found.
DAG: CodeGraph/source-backed behavior map connects scripts/ci/run_opencode_review_model_pool.sh to the affected review, runtime, or workflow path and required checks.
PoC/execution: coverage-evidence job executed on the current head and reported PASS.
DDD/domain: workflow and repository-governance invariants were reviewed against changed files in bounded evidence.
CDD/context: CodeGraph evidence, changed-file history, and focused hunks were reviewed from bounded-review-evidence.md.
Similar issues: changed-file history evidence was reviewed for comparable local precedents.
Claim/concept check: bounded evidence, repository source, current-head workflow evidence, and, where numeric, scientific, statistical, or literature-backed claims are affected, original-paper/formula evidence and parameter-recovery expectations were used for claims.
Standards search: standards and external-source checks are delegated to configured OpenCode web_search/Context7/DeepWiki sources when applicable; no evidence-backed standards blocker is present in bounded evidence.
Compatibility/convention: changed workflow/script conventions, object naming, and reserved-word safety for schema/API/config/code surfaces were checked in bounded evidence.
Breaking-change/backcompat: deployment evidence and changed-file history were checked for backward-compatibility risk.
Performance: changed surfaces were checked for performance risk in bounded evidence.
Developer experience: changed automation, review, test, setup, and maintenance surfaces were checked for helpful or obstructive DX impact in bounded evidence.
User experience: connected user, operator, API, CLI, documentation, review-comment, status-check, rendering, and workflow-reader behavior was checked for contradictions against code, docs, and tests in bounded evidence.
Visual/DOM: Playwright visual, DOM locator, ARIA snapshot, console, and responsive evidence were checked when a web UI surface was present; for non-web surfaces, API/CLI/log/docs/workflow interaction evidence was reviewed instead.
Accessibility/i18n: accessibility, localization, and human-readable text surfaces were checked where UI, CLI, API message, docs, logs, or review text changed.
Supply-chain/license: dependency, package, model, container, and external-tool changes were checked in bounded evidence.
Packaging: package, build, test, lint, and security contracts were checked in bounded evidence.
Security/privacy: workflow-token, review-gate, and repository-automation security/privacy boundaries were checked in bounded evidence.

Result: APPROVE
Reason: The PR addresses a critical issue where the model pool could incorrectly mark a model as successful when its ANSI-stripped output fails validation, preventing the pool from cycling to the next model. The fix ensures consistency between the pool's validation and the publish step's validation.
Head SHA: 58e66f703eb51a763f7a4e475abf1f3ea48b27c9
Workflow run: 28740596616
Workflow attempt: 1

Changed-File Evidence Map

flowchart LR
  PR["PR changed files"] --> Evidence["OpenCode bounded evidence"]
  Evidence --> S1["CI script: run_opencode_review_model_pool.sh"]
  S1 --> I1["review and security gate shell path"]
  I1 --> R1["Review risk: CI script: run_opencode_review_model_pool.sh"]
  R1 --> V1["bash -n plus Strix self-test"]

seonghobae merged commit 80bf91e into main Jul 5, 2026
12 checks passed

opencode-agent Bot approved these changes Jul 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(review): cycle to next model when output fails publish validation#315

fix(review): cycle to next model when output fails publish validation#315
seonghobae merged 1 commit into
mainfrom
fix/review-pool-cycle-on-invalid

seonghobae commented Jul 5, 2026

Uh oh!

Uh oh!

opencode-agent Bot commented Jul 5, 2026 •

edited by github-actions Bot

Loading

Uh oh!

opencode-agent Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

seonghobae commented Jul 5, 2026

문제 (순환 버그)

수정

테스트

Uh oh!

Uh oh!

opencode-agent Bot commented Jul 5, 2026 • edited by github-actions Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

OpenCode Review Overview

Pull request overview

Findings

Summary

Changed-File Evidence Map

Uh oh!

opencode-agent Bot left a comment

Choose a reason for hiding this comment

Pull request overview

Findings

Summary

Changed-File Evidence Map

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

opencode-agent Bot commented Jul 5, 2026 •

edited by github-actions Bot

Loading