fix(review): give the publish gate the same evidence file as the model pool #317

Merged
seonghobae merged 2 commits into main from fix/publish-evidence-repair-parity on Jul 5, 2026

Conversation

@seonghobae

Contributor

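Root cause (confirmed by detailed CI debugging)

The review failure "Selected successful OpenCode output did not include a valid control conclusion" has the following cause: when validating an APPROVE, valid_control() → repair_approval_summary() fills in the 24 missing review labels from the bounded evidence file. The evidence file is located through the OPENCODE_EVIDENCE_FILE env variable.

  • Model pool step: OPENCODE_EVIDENCE_FILE set → repair succeeds → success recorded
  • publish step: OPENCODE_EVIDENCE_FILE not set → repair impossible → no labels → NO_CONCLUSION (exit 4) → review fails

This surfaced once non-reasoning models (which emit a bare APPROVE summary) were moved to first place in quota order.

Fix

Add the same OPENCODE_EVIDENCE_FILE that the pool step uses to the publish step env → both steps repair and validate identically. A valid APPROVE is now actually published.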
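
The parity requirement described above could be guarded by a small check. The following is only a sketch: the function name `missing_env_parity`, the plain-dict representation of each step's env, and the value `evidence.md` are hypothetical and not taken from the repository.

```python
from typing import Mapping, Sequence

# Env keys that both steps are expected to share (illustrative list).
EVIDENCE_KEYS = ("OPENCODE_EVIDENCE_FILE",)


def missing_env_parity(
    pool_env: Mapping[str, str],
    publish_env: Mapping[str, str],
    keys: Sequence[str] = EVIDENCE_KEYS,
) -> list[str]:
    """Return keys the pool step sets that the publish step lacks or sets differently."""
    return [k for k in keys if k in pool_env and publish_env.get(k) != pool_env[k]]


# Hypothetical step envs before and after the fix.
pool = {"OPENCODE_EVIDENCE_FILE": "evidence.md"}
before = missing_env_parity(pool, {})
after = missing_env_parity(pool, {"OPENCODE_EVIDENCE_FILE": "evidence.md"})
print(before, after)
```

With the pre-fix envs the check reports the missing key; with matching envs it reports nothing.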

🤖 Generated with Claude Code

seonghobae and others added 2 commits July 5, 2026 21:57
…succeeds

Follow-up to the pool-cycling fix. opencode_review_normalize_output.py rewrites
its input in place and is NOT idempotent, and the publish step normalizes the
selected output again. The pool was normalizing (mutating) the very file it then
handed to the publish step, so a model whose FIRST normalize passed (recorded as
the pool's success) failed the publish step's SECOND normalize — ending the run
in "Selected successful OpenCode output did not include a valid control
conclusion" instead of the review completing.

Normalize/approve-gate a throwaway ANSI-stripped copy and leave the model output
pristine, so the publish step performs the one-and-only normalize of that
content and its result matches the pool's decision. Verified locally against a
non-idempotent normalizer stub: pool passes, output stays pristine, publish
normalize passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RTAMs4bpSZS77Xe3RQjv9P
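
The throwaway-copy approach in this commit message can be illustrated roughly as follows. This is a minimal sketch, not the repository's script: `normalize_in_place` is a hypothetical non-idempotent stub standing in for opencode_review_normalize_output.py, and `pool_check`, the marker string, and the ANSI pattern are assumptions made for the example.

```python
import re
import tempfile
from pathlib import Path

ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # simple CSI escape matcher
MARKER = "Result: APPROVE"


def normalize_in_place(path: Path) -> bool:
    """Hypothetical non-idempotent normalizer stub.

    Accepts the file only if the marker is present, then rewrites the file
    without the marker, so a second run on the same file fails.
    """
    text = path.read_text(encoding="utf-8")
    if MARKER not in text:
        return False
    path.write_text(text.replace(MARKER, "Normalized result"), encoding="utf-8")
    return True


def pool_check(model_output: Path) -> bool:
    """Gate an ANSI-stripped scratch copy and leave model_output untouched."""
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp) / model_output.name
        stripped = ANSI_RE.sub("", model_output.read_text(encoding="utf-8"))
        scratch.write_text(stripped, encoding="utf-8")
        return normalize_in_place(scratch)


def demo() -> tuple[bool, bool, bool]:
    """Return (pool passed, output still pristine, publish normalize passed)."""
    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp) / "model-output.txt"
        original = "\x1b[32mResult: APPROVE\x1b[0m\n"
        output.write_text(original, encoding="utf-8")
        pool_ok = pool_check(output)
        pristine = output.read_text(encoding="utf-8") == original
        publish_ok = normalize_in_place(output)  # the only normalize of the real file
        return pool_ok, pristine, publish_ok


print(demo())
```

Because the pool only ever mutates the scratch copy, the publish step's normalize is the first one applied to the real output, which is the property the commit relies on.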
…l pool

Root cause of reviews failing with "Selected successful OpenCode output did not
include a valid control conclusion": for an APPROVE, valid_control() repairs the
summary by filling the required review labels from the bounded evidence file
resolved via OPENCODE_EVIDENCE_FILE / OPENCODE_APPROVAL_REPAIR_EVIDENCE_FILE. The
model pool step sets OPENCODE_EVIDENCE_FILE, so the pool accepts a repaired
APPROVE and records success. The publish step did NOT set it, so its re-run of
the normalizer could not repair the same summary, rejected it (NO_CONCLUSION,
exit 4), and failed the whole review — even though a model had produced a valid
APPROVE. This surfaced once non-reasoning models (which emit a bare APPROVE
summary) were ordered first.

Expose the same OPENCODE_EVIDENCE_FILE in the publish step so both steps repair
and validate identically.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RTAMs4bpSZS77Xe3RQjv9P
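
The failure mode in this commit message can be reproduced with a small model of the gate. Everything below is a sketch under stated assumptions: `resolve_evidence_file`, `repair_summary`, and `gate` are hypothetical names, the two-label list is an illustrative subset, and the lookup order of the two env variables is assumed because the source does not state it.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional

EXIT_OK = 0
EXIT_NO_CONCLUSION = 4  # exit code named in the commit message

# Illustrative subset; the real gate requires many more labels.
REQUIRED_LABELS = ("Coverage", "Security/privacy")
# Assumed precedence; the source only says either variable may be used.
EVIDENCE_ENV_VARS = ("OPENCODE_APPROVAL_REPAIR_EVIDENCE_FILE", "OPENCODE_EVIDENCE_FILE")


def resolve_evidence_file(env: Mapping[str, str]) -> Optional[Path]:
    """Return the first configured evidence file that exists, if any."""
    for name in EVIDENCE_ENV_VARS:
        value = env.get(name, "")
        if value and Path(value).is_file():
            return Path(value)
    return None


def labels_in(text: str) -> set[str]:
    """Collect the 'Label' part of every 'Label: value' line."""
    return {line.split(":", 1)[0] for line in text.splitlines() if ":" in line}


def repair_summary(summary: str, env: Mapping[str, str]) -> str:
    """Append missing required label lines copied from the evidence file."""
    evidence = resolve_evidence_file(env)
    if evidence is None:
        return summary  # nothing to repair from
    present = labels_in(summary)
    additions = []
    for line in evidence.read_text(encoding="utf-8").splitlines():
        label = line.split(":", 1)[0]
        if ":" in line and label in REQUIRED_LABELS and label not in present:
            additions.append(line)
            present.add(label)
    return "\n".join([summary, *additions])


def gate(summary: str, env: Mapping[str, str]) -> int:
    """Accept an APPROVE only if every required label is present after repair."""
    repaired = repair_summary(summary, env)
    has_all = all(name in labels_in(repaired) for name in REQUIRED_LABELS)
    return EXIT_OK if "Result: APPROVE" in repaired and has_all else EXIT_NO_CONCLUSION


def demo() -> tuple[int, int]:
    """Return (pool exit code, publish exit code) for the same bare APPROVE."""
    with tempfile.TemporaryDirectory() as tmp:
        evidence = Path(tmp) / "bounded-review-evidence.md"
        evidence.write_text(
            "Coverage: not applicable\nSecurity/privacy: checked\n", encoding="utf-8"
        )
        bare = "Result: APPROVE"
        pool = gate(bare, {"OPENCODE_EVIDENCE_FILE": str(evidence)})
        publish = gate(bare, {})  # pre-fix publish step: variable not set
        return pool, publish


print(demo())
```

The same bare summary passes where the evidence variable is set and is rejected where it is not, which is why exposing the variable in both steps makes their decisions agree.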
@seonghobae seonghobae merged commit 7b5285a into main Jul 5, 2026
12 checks passed

@opencode-agent opencode-agent Bot left a comment

Contributor

Pull request overview

OpenCode reviewed the current-head bounded evidence and found no blocking issues.

Findings

No blocking findings.

Summary

Approval sufficiency: bounded evidence supplied affirmative approval evidence for changed files, coverage/docstring posture, risk surfaces, and current-head verification; approval is not based merely on the absence of known blockers.
Verification posture: CodeGraph evidence was initialized and bounded current-head evidence reviewed for changed-file evidence including .github/workflows/opencode-review.yml, scripts/ci/run_opencode_review_model_pool.sh.
Linter/static: workflow/static review evidence is bounded by the current-head GitHub Checks gate and changed-file evidence.
TDD/regression: coverage execution evidence and focused changed hunks were reviewed from bounded-review-evidence.md.
Coverage: coverage execution evidence reports test coverage as not applicable because no supported changed source files or package manifests were found.
Docstring coverage: coverage execution evidence reports docstring coverage as not applicable because no supported changed source files or package manifests were found.
DAG: CodeGraph/source-backed behavior map connects .github/workflows/opencode-review.yml to the affected review, runtime, or workflow path and required checks.
PoC/execution: coverage-evidence job executed on the current head and reported PASS.
DDD/domain: workflow and repository-governance invariants were reviewed against changed files in bounded evidence.
CDD/context: CodeGraph evidence, changed-file history, and focused hunks were reviewed from bounded-review-evidence.md.
Similar issues: changed-file history evidence was reviewed for comparable local precedents.
Claim/concept check: bounded evidence, repository source, current-head workflow evidence, and, where numeric, scientific, statistical, or literature-backed claims are affected, original-paper/formula evidence and parameter-recovery expectations were used for claims.
Standards search: standards and external-source checks are delegated to configured OpenCode web_search/Context7/DeepWiki sources when applicable; no evidence-backed standards blocker is present in bounded evidence.
Compatibility/convention: changed workflow/script conventions, object naming, and reserved-word safety for schema/API/config/code surfaces were checked in bounded evidence.
Breaking-change/backcompat: deployment evidence and changed-file history were checked for backward-compatibility risk.
Performance: changed surfaces were checked for performance risk in bounded evidence.
Developer experience: changed automation, review, test, setup, and maintenance surfaces were checked for helpful or obstructive DX impact in bounded evidence.
User experience: connected user, operator, API, CLI, documentation, review-comment, status-check, rendering, and workflow-reader behavior was checked for contradictions against code, docs, and tests in bounded evidence.
Visual/DOM: Playwright visual, DOM locator, ARIA snapshot, console, and responsive evidence were checked when a web UI surface was present; for non-web surfaces, API/CLI/log/docs/workflow interaction evidence was reviewed instead.
Accessibility/i18n: accessibility, localization, and human-readable text surfaces were checked where UI, CLI, API message, docs, logs, or review text changed.
Supply-chain/license: dependency, package, model, container, and external-tool changes were checked in bounded evidence.
Packaging: package, build, test, lint, and security contracts were checked in bounded evidence.
Security/privacy: workflow-token, review-gate, and repository-automation security/privacy boundaries were checked in bounded evidence.

  • Result: APPROVE
  • Reason: The changes ensure consistency in evidence file handling between the model pool and publish gate, resolving a previous validation issue.
  • Head SHA: 13ce9bfe3fe2361e1f5fddba0058c11b63bda841
  • Workflow run: 28742457026
  • Workflow attempt: 1

Changed-File Evidence Map

flowchart LR
  PR["PR changed files"] --> Evidence["OpenCode bounded evidence"]
  Evidence --> S1["Workflow: opencode-review.yml"]
  S1 --> I1["GitHub Actions review job"]
  I1 --> R1["Review risk: Workflow: opencode-review.yml"]
  R1 --> V1["actionlint plus required checks"]
  Evidence --> S2["CI script: run_opencode_review_model_pool.sh"]
  S2 --> I2["review and security gate shell path"]
  I2 --> R2["Review risk: CI script: run_opencode_review_model_pool.sh"]
  R2 --> V2["bash -n plus Strix self-test"]

@github-actions

github-actions Bot commented Jul 5, 2026

Contributor

OpenCode Review Overview

  • Head SHA: 13ce9bfe3fe2361e1f5fddba0058c11b63bda841
  • Workflow run: 28742457026
  • Workflow attempt: 1
  • Gate result: APPROVE (approval step)
