fix(review): order review models by GitHub Models quota, not gpt-5 first#314
Conversation
#295 put openai/gpt-5 first in OPENCODE_MODEL_CANDIDATES. On GitHub Models, gpt-5/o3 are the "Reasoning" tier with the smallest quota (8-12 requests/day, 1-2/min) and hang or rate-limit constantly, so with gpt-5 first the pool burned the entire 350-min step on a stalled flagship and never fell back — reviews ran ~6 h and failed org-wide (observed: appguardrail review stuck >5 h in the model pool step, PRs unmergeable). Reorder candidates by quota allowance, largest first: non-reasoning "Low" tier (deepseek-v3, mistral-medium, llama-4: 150-450 req/day) then mini-reasoning (o4-mini, o3-mini, gpt-5-mini/nano/chat) then DeepSeek-R1 then the flagships (o3, gpt-5) last as quality fallback. Only the candidate order changes; ATTEMPTS(5), RUN_TIMEOUT(20400), step timeout(350) are left as-is per request. Reconcile test_opencode_agent_contract.py with the new order and with the already-current #295 settings it still asserted stale (ATTEMPTS 1->5, RUN_TIMEOUT 600->20400, step 285->350) so the guard test is green and accurate again. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RTAMs4bpSZS77Xe3RQjv9P
There was a problem hiding this comment.
Pull request overview
OpenCode cannot approve yet because required coverage evidence did not pass.
Review outcome
1. HIGH .github/workflows/opencode-review.yml:1 - Coverage evidence did not prove required test/docstring evidence
-
Problem: The required coverage-evidence job result was
failure, so OpenCode cannot establish approval sufficiency for this head. -
Root cause: Automated approval is only valid when the same-head coverage-evidence job proves supported repository test suites passed and configured docstring gates passed or were advisory, or reports not applicable because no supported source files or package manifests exist. Missing, failed, skipped, unavailable, or unsupported-tooling test evidence is a blocker.
-
Fix: Install or configure the repository test/docstring evidence tooling when source files or package manifests exist, rerun the current-head coverage-evidence job, and approve only after it reports
successwith required evidence or explicit no-source not-applicable evidence. -
Regression test: Keep the approval branch checking
needs.coverage-evidence.result == successbefore posting APPROVE, and publish REQUEST_CHANGES when coverage-evidence blocker states such as cancelled, skipped, failed, unsupported-tooling, or below-100 evidence are present. -
Result: REQUEST_CHANGES
-
Reason: coverage-evidence result was
failure, so required test/docstring evidence was not proven for current head38ac3ebe6f2ad5c6a2b6a697a343a9314ee20a8a. -
Head SHA:
38ac3ebe6f2ad5c6a2b6a697a343a9314ee20a8a -
Workflow run: 28737665901
-
Workflow attempt: 1
Coverage evidence
Coverage Evidence
- Head SHA:
38ac3ebe6f2ad5c6a2b6a697a343a9314ee20a8a - Required test evidence: supported repository test suites must pass.
- Required docstring evidence: repository-owned docstring gates must pass when configured; otherwise docstring coverage is advisory.
Python project dependencies (.)
Using CPython 3.12.3 interpreter at: /usr/bin/python3
Creating virtual environment at: .venv
Resolved 17 packages in 118ms
Downloading pygments (1.2MiB)
Downloaded pygments
Prepared 13 packages in 100ms
Installed 13 packages in 16ms
+ attrs==26.1.0
+ click==8.4.2
+ colorama==0.4.6
+ coverage==7.15.0
+ iniconfig==2.3.0
+ interrogate==1.7.0
+ packaging==26.2
+ pluggy==1.6.0
+ py==1.11.0
+ pygments==2.20.0
+ pytest==9.1.1
+ pytest-cov==7.1.0
+ tabulate==0.10.0
- Result: PASS
Python coverage with missing-line report (.)
============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-9.1.1, pluggy-1.6.0
rootdir: /home/runner/work/.github/.github/pr-head
configfile: pyproject.toml
plugins: cov-7.1.0
collected 166 items
tests/test_assert_opencode_reasoning_effort.py ........ [ 4%]
tests/test_codeql_pr_workflow_contract.py . [ 5%]
tests/test_noema_review_gate.py .......F... [ 12%]
tests/test_opencode_agent_contract.py ............. [ 19%]
tests/test_opencode_review_normalize_output.py ......................... [ 34%]
[ 34%]
tests/test_opencode_workflow_shell_syntax.py . [ 35%]
tests/test_pr_governance_audit_contract.py ... [ 37%]
tests/test_pr_review_fix_scheduler.py ................... [ 48%]
tests/test_pr_review_fix_scheduler_coverage.py .. [ 50%]
tests/test_pr_review_merge_scheduler.py ................................ [ 69%]
.............................. [ 87%]
tests/test_render_opencode_prompt_template.py .... [ 89%]
tests/test_review_execution_contracts.py .. [ 90%]
tests/test_sandboxed_verify.py ......... [ 96%]
tests/test_sandboxed_web_e2e.py ...... [100%]
=================================== FAILURES ===================================
_______________ test_call_llm_handles_configuration_and_verdicts _______________
monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f77ba18bd40>
def test_call_llm_handles_configuration_and_verdicts(monkeypatch):
pr = make_pr()
monkeypatch.delenv("NOEMA_LLM_API_URL", raising=False)
monkeypatch.delenv("NOEMA_LLM_API_KEY", raising=False)
assert noema.call_llm("owner/repo", 1, pr, "diff", False) is None
monkeypatch.setenv("NOEMA_LLM_API_URL", "file:///etc/passwd")
monkeypatch.setenv("NOEMA_LLM_API_KEY", "secret")
> with pytest.raises(ValueError, match="must start with http:// or https://"):
E AssertionError: Regex pattern did not match.
E Expected regex: 'must start with http:// or https://'
E Actual message: 'URL scheme must be http or https'
tests/test_noema_review_gate.py:209: AssertionError
----------------------------- Captured stdout call -----------------------------
Noema LLM review unavailable: NOEMA_LLM_API_URL or NOEMA_LLM_API_KEY is not configured.
=============================== warnings summary ===============================
tests/test_assert_opencode_reasoning_effort.py::test_module_entrypoint_success
<frozen runpy>:128: RuntimeWarning: 'scripts.ci.assert_opencode_reasoning_effort' found in sys.modules after import of package 'scripts.ci', but prior to execution of 'scripts.ci.assert_opencode_reasoning_effort'; this may result in unpredictable behaviour
tests/test_render_opencode_prompt_template.py::test_module_entrypoint
<frozen runpy>:128: RuntimeWarning: 'scripts.ci.render_opencode_prompt_template' found in sys.modules after import of package 'scripts.ci', but prior to execution of 'scripts.ci.render_opencode_prompt_template'; this may result in unpredictable behaviour
tests/test_review_execution_contracts.py::test_discovers_package_managers_java_r_json_and_main
<frozen runpy>:128: RuntimeWarning: 'scripts.ci.review_execution_contracts' found in sys.modules after import of package 'scripts.ci', but prior to execution of 'scripts.ci.review_execution_contracts'; this may result in unpredictable behaviour
tests/test_sandboxed_verify.py::test_module_main_entrypoint
<frozen runpy>:128: RuntimeWarning: 'scripts.ci.sandboxed_verify' found in sys.modules after import of package 'scripts.ci', but prior to execution of 'scripts.ci.sandboxed_verify'; this may result in unpredictable behaviour
tests/test_sandboxed_web_e2e.py::test_module_import_and_main_entrypoint
<frozen runpy>:128: RuntimeWarning: 'scripts.ci.sandboxed_web_e2e' found in sys.modules after import of package 'scripts.ci', but prior to execution of 'scripts.ci.sandboxed_web_e2e'; this may result in unpredictable behaviour
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/test_noema_review_gate.py::test_call_llm_handles_configuration_and_verdicts - AssertionError: Regex pattern did not match.
Expected regex: 'must start with http:// or https://'
Actual message: 'URL scheme must be http or https'
================== 1 failed, 165 passed, 5 warnings in 5.85s ===================
- Result: FAIL (exit 1)
Python docstring coverage advisory
RESULT: PASSED (minimum: 100.0%, actual: 100.0%)
- Result: PASS
Coverage Decision
- Result: FAIL
- Test evidence: not proven passing
- Docstring evidence: not proven passing when configured
- Failure count: 1
Changed-File Evidence Map
flowchart LR
PR["PR changed files"] --> Evidence["OpenCode bounded evidence"]
Evidence --> S1["Workflow: opencode-review.yml"]
S1 --> I1["GitHub Actions review job"]
I1 --> R1["Review risk: Workflow: opencode-review.yml"]
R1 --> V1["actionlint plus required checks"]
Evidence --> S2["Test: test_opencode_agent_contract.py"]
S2 --> I2["regression suite"]
I2 --> R2["Review risk: Test: test_opencode_agent_contract.py"]
R2 --> V2["targeted test run"]
OpenCode Review Overview
Pull request overviewOpenCode cannot approve yet because required coverage evidence did not pass. Review outcome1. HIGH .github/workflows/opencode-review.yml:1 - Coverage evidence did not prove required test/docstring evidence
Coverage evidenceCoverage Evidence
Python project dependencies (.)
Python coverage with missing-line report (.)
Python docstring coverage advisory
Coverage Decision
Changed-File Evidence Mapflowchart LR
PR["PR changed files"] --> Evidence["OpenCode bounded evidence"]
Evidence --> S1["Workflow: opencode-review.yml"]
S1 --> I1["GitHub Actions review job"]
I1 --> R1["Review risk: Workflow: opencode-review.yml"]
R1 --> V1["actionlint plus required checks"]
Evidence --> S2["Test: test_opencode_agent_contract.py"]
S2 --> I2["regression suite"]
I2 --> R2["Review risk: Test: test_opencode_agent_contract.py"]
R2 --> V2["targeted test run"]
|
문제 (진단)
#295가
OPENCODE_MODEL_CANDIDATES1순위를openai/gpt-5로 바꿈. GitHub Models에서 gpt-5/o3는 "Reasoning" 티어로 일일 쿼터가 가장 작음(812 req/day, 12/min) — rate-limit·hang이 잦음. gpt-5가 1순위 +ATTEMPTS=5+RUN_TIMEOUT=20400+ 스텝 350분이라, 멈춘 flagship에 스텝 예산 전체를 소진하고 폴백 못 함 → 리뷰가 ~6시간 돌다 실패, 조직 전체 머지 차단.실측: appguardrail 04:59 리뷰의 model pool 스텝이 5시간+ 정체.
변경 (지시대로 후보 순서만)
쿼터 허용량 큰 순서로 재배치:
ATTEMPTS=5,RUN_TIMEOUT=20400, 스텝timeout-minutes=350은 그대로 둠(요청대로 설정 미변경).테스트
test_opencode_agent_contract.py를 새 순서 + #295가 방치한 현 설정값(ATTEMPTS 1→5, RUN_TIMEOUT 600→20400, 스텝 285→350)에 맞춰 정합화 → 13개 전부 green(설정값은 변경 아님, 테스트를 현실에 맞춤).주의
리뷰가 현재 깨진 상태(gpt-5 정체)라 이 PR 자체 리뷰도 막힘 → break-glass 머지 예정.
🤖 Generated with Claude Code