Skip to content

chore(pa): backfill PR #221 — tests + review fixes that didn't push before merge#249

Merged
galanko merged 7 commits into
mainfrom
chore/pa-pr1-backfill
May 31, 2026
Merged

chore(pa): backfill PR #221 — tests + review fixes that didn't push before merge#249
galanko merged 7 commits into
mainfrom
chore/pa-pr1-backfill

Conversation

@galanko
Copy link
Copy Markdown
Collaborator

@galanko galanko commented May 31, 2026

Why this PR exists

The squash-merge of #221 captured the state of the branch at `c602567` (OpenAI probe fix) — the last commit that successfully reached origin. Four commits I made after that bounced on push auth and never landed on the PR branch, so the merge silently shipped without them. This PR backfills the lost work onto `main`.

These are not new ideas — every commit here was reviewed in the PR #221 discussion thread and already approved in principle. The fix here is mechanical: push-then-merge in a clean PR.

What's in this PR (cherry-picks, in order)

Commit What
`de3d120` Test A + Test B — `TestPaResolverWiring::test_run_pa_no_tools_resolves_env_and_model_per_call` locks down that the executor awaits both resolvers per call (no caching) + `test_executor_provider_chain.py` parametrized over all 6 providers, asserts `build_model` produces the right Model subclass + `model_name` for each. 7 new test cases.
`64021e7` Review nits — `pipeline.py` stale "All 4" comment → count-independent; `executor.py` per-attempt-timeout comment in the rate-limit loop; `CHANGELOG.md` `[Unreleased] Fixed` entry for the OpenAI BYOK probe fix in #221.
`e13a406` `/code-review` findings (5 of 6 actionable): `e2e` fixture wires both resolvers (was constructing executor without them — 3 e2e tests would have hit the resolver-not-wired guard); `owner_resolver` → `exposure_analyzer` assertion (PR #1 dropped owner_resolver from PIPELINE_ORDER); `raw_text` → `structured_output` assertion (PA returns parsed output directly, no prose); progress-callback test inverted to lock down the "no streaming for no-tools agents" behavior; `UnexpectedModelBehavior` corrective re-roll in `_run_pa_no_tools` (1 attempt, separate counter — PA's internal output_type retry doesn't fire for prose-only responses, this restores the OpenCode-era `_RETRY_PROMPT` safety net); `ContextDocument._evidence_section`/`_plan_section` promoted to public (`runtime/_prompts.py` was reaching into private API); `no_tools.py` raw `KeyError` on unknown agent_type → typed `ValueError` guard.
`194891a` e2e OpenCode isolation — `settings.opencode_port = 4097` at conftest module-import time + rebind `config_manager.opencode_client`. The default `opencode_port=4096` was causing the e2e suite to silently share OpenCode with a running daemon (`OpenCodeProcess.start()` succeeds if anyone answers on that port). The model-update e2e test consequently flapped against whatever model the user had selected. With the dedicated port, the full e2e suite is 31/33 PASS / 2 docs-skipped / 0 FAIL.

Test plan

  • `uv run ruff check cliff/ tests/` — clean
  • `uv run pytest tests/test_executor.py tests/test_pipeline.py tests/agents/ -m "not e2e"` — 124 passed, 2 docs-skipped (~7m, slow because tests/agents/ includes the LLM-driven `test_plain_description_eval.py`)
  • e2e suite verified pre-merge in the previous PR session — 31/33 pass / 2 docs-skipped / 0 fail

Why these mattered enough to recover

  • The two new test files close the resolver-staleness invariant and the per-provider chain — same provider testing feat(agents): migrate six no-tools agents to Pydantic AI substrate #221's review thread explicitly asked for.
  • The `UnexpectedModelBehavior` retry restores parity with the OpenCode-era `_RETRY_PROMPT` for weak-model users (prose-only responses without it surface as an immediate `AgentProcessError`).
  • The e2e port isolation is the difference between an e2e suite that works and one that silently shares state with the local daemon — anyone running the e2e suite while a daemon is up will get bitten until this lands.
  • The ContextDocument rename promotes private API that `runtime/_prompts.py` already depends on — without it, a future refactor of ContextDocument silently breaks every PA agent prompt.

Risk

Low. Each cherry-pick applied cleanly, no conflicts. The runtime changes are additive (new retry branch, new guard, new tests). The renames are internal (`_evidence_section` had only two call sites, both updated in the same commit). No dependency, no migration, no API contract change.

Stacked PRs

PR #2 (`feat/pa-migrate-executor`) is queued behind this — should rebase onto main once this merges.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Restored OpenAI BYOK key saving.
    • Clearer handling and messages when an unknown no-tools agent type is used.
  • Improvements

    • One automatic retry for certain unparseable agent outputs to reduce run failures.
    • Improved assembly of prior-context (evidence/plan) in prompts for more accurate agent context.
  • Documentation

    • Changelog updated with the BYOK fix.

galanko and others added 4 commits May 31, 2026 21:07
Adds the two tests the survey flagged as the highest-value gaps before
the PA migration merges:

1. test_executor.py — TestPaResolverWiring asserts the executor awaits
   both ai_env_resolver and ai_model_resolver on EVERY _run_pa_no_tools
   call (not cached at __init__), and that build_model receives the
   per-call resolved values — prevents the silent staleness regression
   where a UI provider-switch wouldn't take effect until daemon restart.

2. tests/agents/test_executor_provider_chain.py — parametrized over the
   six advertised providers (openrouter / anthropic / openai / google /
   ollama / custom). Uses the REAL build_model factory and captures the
   Model that run_no_tools_agent actually receives. Asserts subclass +
   model_name per provider. Closes the handshake gap between the
   per-provider factory tests and the per-agent runtime tests, and
   doubles as a regression net for any future probe-shape bug like the
   gpt-5 max_tokens one fixed in c602567.

7 new test cases, ~0.5s combined runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from the PR review:

- pipeline.py: drop the stale "All 4 pipeline sections" count from the
  fall-through comment. Owner_resolver was dropped from _PIPELINE_CHECKS
  in this PR, taking the count to 5; future drops would re-stale a
  fixed number. "All pipeline sections" is count-independent.

- executor.py: note that ``timeout`` in the PA rate-limit retry loop
  is per-attempt, not a wall-clock cap on the whole agent run. Mirrors
  the OpenCode-era loop semantics and prevents the next reader from
  assuming a budget-cap that isn't there.

- CHANGELOG.md: add an Unreleased "Fixed" entry for the OpenAI BYOK
  probe-model change in c602567 (gpt-5 → gpt-4o-mini). User-visible
  behavior change; deserved a release note.

No code-behavior changes, no test changes; ruff + the 26 PA-touched
tests (TestPaResolverWiring + provider-chain + test_pipeline) stay
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five fixes from the extra-high-recall code review pass:

1. tests/e2e/test_agent_pipeline_e2e.py — the executor fixture was
   constructing AgentExecutor(pool, context_builder) without the
   ai_env_resolver / ai_model_resolver this PR added; three tests
   would have hit the resolvers-not-wired guard at executor.py:1401
   on the first PA call. Extract the resolver pair to module-level
   helpers shared by the pool and executor fixtures; e2e job now
   exercises the PA path end-to-end.

2. Same file, test_suggest_next_advances_after_enrichment — the
   assertion expected owner_resolver as the next step after
   enrichment. owner_resolver was dropped from PIPELINE_ORDER in
   PR #1, so the next step is now exposure_analyzer. Updated docstring
   + assertion to match the new 5-agent forward walk.

3. executor.py — added one-attempt corrective re-roll on
   UnexpectedModelBehavior. PA's internal output_type retry only fires
   on schema-shaped-but-invalid responses; for prose-only / empty /
   thinking-only responses (the OpenCode-era _RETRY_PROMPT target),
   PA raises immediately. One re-roll at the same prompt often
   produces a well-formed response on the next sample. Separate
   parse-retry counter so it doesn't share budget with rate-limit
   retries.

4. context_document.py — promote _evidence_section and _plan_section
   from private (single-underscore) to public. runtime/_prompts.py
   reaches into both methods; the private naming made any future
   ContextDocument refactor a silent breakage with no type-checker
   warning. Both names are now stable public API.

5. runtime/no_tools.py — replace raw _RUNTIME_BUILDERS[agent_type]
   dict access with a guarded .get() that raises ValueError("Unknown
   no-tools agent type: ...") if missing. Defense in depth: the
   executor.py:1391 gate normally protects, but a regression or a
   direct caller would otherwise surface a bare KeyError instead of
   the typed taxonomy. Test updated to expect ValueError.

Tests:
- ruff clean
- 208 unit + integration pass (8 min runtime)
- 5/5 e2e pipeline tests pass post-fix (was 3/5 pre-fix)
- 1 pre-existing e2e failure surfaces (test_settings_e2e.py::test_update_model_and_verify — TestClient noop_lifespan reads installed-app model state instead of the in-memory fixture; file last touched in PR #9, not by this branch)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_update_model_and_verify was failing because the e2e session
shared OpenCode state with the user's installed daemon:

- ``settings.opencode_port`` defaults to 4096. The e2e session calls
  ``OpenCodeProcess()`` which tries to start an OpenCode subprocess
  on that port. If the daemon is already listening on 4096,
  ``OpenCodeProcess.start()`` succeeds as long as the port answers
  — even though the answerer is somebody else's process — and the
  e2e suite silently runs against the daemon's OpenCode.
- ``opencode_client = OpenCodeClient()`` is constructed at module-
  import time with ``settings.opencode_url`` baked in. The
  ``config_manager`` import path binds its own reference to this
  singleton. ``app_client`` rebinds the ``sessions``/``chat`` module
  references with a fresh client but never rebound
  ``config_manager.opencode_client``, so even with the port override
  the settings/model route hit the daemon's OpenCode.

Two-line fix:

1. Mutate ``settings.opencode_port = 4097`` at conftest module-import
   time so the session-scoped OpenCodeProcess starts on the dedicated
   port and the fresh client built in ``app_client`` picks it up.
2. Also rebind ``config_manager.opencode_client`` (and the source
   singleton in ``cliff.engine.client``) in ``app_client`` so the
   settings/model route hits the e2e-isolated OpenCode.

Verification:
- ``pytest tests/e2e/test_settings_e2e.py`` — 5/5 PASS (was 4/5)
- ``pytest tests/e2e/`` — 31/33 PASS, 2 docs-skipped, 0 FAIL
  (was 28/33 + 2 skipped + 3 failed on the pre-this-session full run)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 31, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 473887dc-494f-45c8-8c16-b700367bbd3c

📥 Commits

Reviewing files that changed from the base of the PR and between c7c8a22 and 53ed6a4.

📒 Files selected for processing (1)
  • backend/tests/e2e/conftest.py

📝 Walkthrough

Walkthrough

This PR hardens the Pydantic‑AI no‑tools agent path with a one-time re-roll for unparseable responses, promotes ContextDocument section helpers to public, improves error messages, and updates E2E fixtures and tests to match the new runtime behavior.

Changes

Pydantic-AI No-Tools Path Improvements

Layer / File(s) Summary
Context document public API surface
backend/cliff/workspace/context_document.py, backend/cliff/agents/runtime/_prompts.py
ContextDocument._evidence_section() and _plan_section() are renamed to public evidence_section() and plan_section(). The generate() method and _prompts.py caller are updated to use the public names.
No-tools runtime error handling
backend/cliff/agents/runtime/no_tools.py, backend/tests/agents/test_runtime_no_tools.py
run_no_tools_agent now uses guarded .get() instead of direct dict indexing, raising a typed ValueError with registered builder keys when agent_type is unknown instead of letting KeyError propagate. Tests/docstring updated accordingly.
PA unparseable response retry mechanism
backend/cliff/agents/executor.py, backend/tests/agents/test_executor_provider_chain.py
AgentExecutor._run_pa_no_tools adds a parse_retries_used counter and re-rolls once on UnexpectedModelBehavior when PA returns prose-only/empty outputs; raises AgentProcessError on the second occurrence. ModelHTTPError handling prefers exc.message when present. New provider-chain test verifies the concrete model instance and model_name passed to the agent.
Provider chain & unit tests
backend/tests/agents/test_executor_provider_chain.py, backend/tests/test_executor.py
Adds provider-chain E2E test and TestPaResolverWiring unit test to ensure per-call AI env/model resolution and that build_model receives the latest resolved values.
E2E environment isolation and shared resolvers
backend/tests/e2e/conftest.py, backend/tests/e2e/test_agent_pipeline_e2e.py
Conftest forces an isolated OpenCode port at import time and resets OpenCode client singletons across modules. Adds module-level async resolver helpers and wires them into WorkspaceProcessPool and AgentExecutor.
E2E assertion and pipeline updates
backend/tests/e2e/test_agent_pipeline_e2e.py
Enricher assertions now check parse_result.success and structured_output. Pipeline advancement expects exposure_analyzer after enrichment. Progress callback test asserts progress_texts is empty.
Changelog and minor comment
CHANGELOG.md, backend/cliff/agents/pipeline.py
Changelog documents OpenAI BYOK save fix (validator probe updated to gpt-4o-mini); minor comment wording generalized in suggest_next().

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • cliff-security/cliff#221: The main PR's Pydantic-AI no-tools retry logic and error handling directly build on the same no-tools substrate introduced in this retrieved PR, which also includes the OpenAI gpt-4o-mini validator probe change.

Poem

🐰 I found a stray validator stuck on gpt-5,
I nudged the parser once — then tried it twice,
Sections shed their underscores and took a bow,
Tests hop to their own port and pass somehow,
I nibble logs and hum: the runtime's feeling nice.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.85% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the PR as backfilling tests and review fixes from PR #221 that didn't push before merge, which aligns with the changeset containing new tests, CHANGELOG updates, and code-review refinements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch chore/pa-pr1-backfill

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/tests/agents/test_runtime_no_tools.py (1)

12-12: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update docstring to reflect ValueError contract.

The comment still references "KeyError" but the test now expects ValueError per the updated no-tools runtime contract.

📝 Proposed fix
-* ``test_run_no_tools_agent_unknown_agent_type`` — defense-in-depth
-  KeyError for an agent type that isn't registered.
+* ``test_run_no_tools_agent_unknown_agent_type`` — defense-in-depth
+  ValueError for an agent type that isn't registered.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/tests/agents/test_runtime_no_tools.py` at line 12, Update the test
docstring in backend/tests/agents/test_runtime_no_tools.py to mention ValueError
instead of KeyError so the documentation matches the test's contract; locate the
test function (e.g., the test in this file that asserts a ValueError for
unregistered agent types) and replace any "KeyError" wording in its docstring
with "ValueError" (ensure the docstring language aligns with the existing
assertion that expects ValueError).
backend/tests/e2e/test_agent_pipeline_e2e.py (1)

236-273: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Dead else: pytest.skip branch contradicts the new hard asserts.

Lines 240-246 now assert result.parse_result.success and a truthy structured_output. That makes the if at line 249 always true and the else (pytest.skip) at 267-273 unreachable. The skip was the old graceful path for non-structured LLM output; with the asserts it can never run. Verify persistence unconditionally and drop the dead branch.

♻️ Remove unreachable branch
-    # If structured output was parsed successfully, verify persistence
-    if result.parse_result.success and result.parse_result.structured_output:
-        # Context file should exist on disk
-        enrichment_file = ws_dir / "context" / "enrichment.json"
-        assert enrichment_file.exists(), "enrichment.json not written to disk"
-
-        enrichment_data = json.loads(enrichment_file.read_text())
-        assert isinstance(enrichment_data, dict)
-
-        # CONTEXT.md should have been regenerated
-        context_md = ws_dir / "CONTEXT.md"
-        assert context_md.exists()
-        context_text = context_md.read_text()
-        assert "enrichment" in context_text.lower() or "CVE" in context_text
-
-        # Sidebar should have been updated
-        assert result.sidebar_updated is True
-        assert result.context_version is not None
-        assert result.context_version > 0
-    else:
-        # Even if parse failed, we should still have completed successfully
-        # (the LLM responded, just not in the expected format)
-        pytest.skip(
-            f"LLM responded but output wasn't structured JSON: "
-            f"{result.parse_result.error}"
-        )
+    # success / structured_output are asserted above — verify persistence.
+    enrichment_file = ws_dir / "context" / "enrichment.json"
+    assert enrichment_file.exists(), "enrichment.json not written to disk"
+
+    enrichment_data = json.loads(enrichment_file.read_text())
+    assert isinstance(enrichment_data, dict)
+
+    # CONTEXT.md should have been regenerated
+    context_md = ws_dir / "CONTEXT.md"
+    assert context_md.exists()
+    context_text = context_md.read_text()
+    assert "enrichment" in context_text.lower() or "CVE" in context_text
+
+    # Sidebar should have been updated
+    assert result.sidebar_updated is True
+    assert result.context_version is not None
+    assert result.context_version > 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/tests/e2e/test_agent_pipeline_e2e.py` around lines 236 - 273, The
current test makes hard assertions on result.parse_result.success and
result.parse_result.structured_output, which renders the later "else:
pytest.skip" branch unreachable; remove the dead else branch (the pytest.skip
block) and unconditionally perform the persistence checks (existence and
contents of enrichment_file, CONTEXT.md checks, and sidebar/context_version
assertions) after the asserts; keep references to result.parse_result,
enrichment_file (ws_dir / "context" / "enrichment.json"), context_md (ws_dir /
"CONTEXT.md"), and the sidebar/context_version assertions so the persistence
verification runs always when the earlier asserts pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@backend/tests/agents/test_runtime_no_tools.py`:
- Line 12: Update the test docstring in
backend/tests/agents/test_runtime_no_tools.py to mention ValueError instead of
KeyError so the documentation matches the test's contract; locate the test
function (e.g., the test in this file that asserts a ValueError for unregistered
agent types) and replace any "KeyError" wording in its docstring with
"ValueError" (ensure the docstring language aligns with the existing assertion
that expects ValueError).

In `@backend/tests/e2e/test_agent_pipeline_e2e.py`:
- Around line 236-273: The current test makes hard assertions on
result.parse_result.success and result.parse_result.structured_output, which
renders the later "else: pytest.skip" branch unreachable; remove the dead else
branch (the pytest.skip block) and unconditionally perform the persistence
checks (existence and contents of enrichment_file, CONTEXT.md checks, and
sidebar/context_version assertions) after the asserts; keep references to
result.parse_result, enrichment_file (ws_dir / "context" / "enrichment.json"),
context_md (ws_dir / "CONTEXT.md"), and the sidebar/context_version assertions
so the persistence verification runs always when the earlier asserts pass.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 2ad5b25f-c1ec-41ea-8ca6-9bc2e5121371

📥 Commits

Reviewing files that changed from the base of the PR and between 6e29fd7 and 194891a.

📒 Files selected for processing (11)
  • CHANGELOG.md
  • backend/cliff/agents/executor.py
  • backend/cliff/agents/pipeline.py
  • backend/cliff/agents/runtime/_prompts.py
  • backend/cliff/agents/runtime/no_tools.py
  • backend/cliff/workspace/context_document.py
  • backend/tests/agents/test_executor_provider_chain.py
  • backend/tests/agents/test_runtime_no_tools.py
  • backend/tests/e2e/conftest.py
  • backend/tests/e2e/test_agent_pipeline_e2e.py
  • backend/tests/test_executor.py

galanko and others added 2 commits May 31, 2026 22:03
Two quick wins from the CodeRabbit review:

- test_runtime_no_tools.py module docstring: the line describing
  test_run_no_tools_agent_unknown_agent_type still said "KeyError" but
  the test was updated to expect ValueError in the parent PR. Aligned.

- test_agent_pipeline_e2e.py::test_execute_enricher_e2e: after the
  hard asserts on parse_result.success + structured_output were added,
  the subsequent ``if/else`` branch was unreachable (the else's
  pytest.skip could never fire). Removed the dead branch and made the
  persistence checks unconditional — they're guaranteed to run after
  the asserts pass.

No behavior change; same passes (6/6 on test_runtime_no_tools; e2e
verified during the prior PR cycle).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings from the recall-mode /code-review pass on the backfill PR:

1. executor.py — UnexpectedModelBehavior and ModelHTTPError __str__ embed
   the raw provider response body, which can echo prompt content and
   therefore any credentials the agent was working with. Logging exc
   directly at WARNING level leaked the body to log sinks. Log + raise
   with exc.message (bounded reason) instead of str(exc) (which is
   "{message}, body:\n{body}" for these classes). The raw body stays
   on the chained __cause__ for DEBUG-level investigation.

2. tests/e2e/conftest.py — 5 modules import opencode_client via
   "from cliff.engine.client import opencode_client", but the
   app_client fixture only rebound 4 (sessions, chat, config_manager,
   client). The /health route was using its captured port-4096 client
   instead of the e2e-isolated port-4097 one — test_health_e2e tests
   would silently hit the running daemon's OpenCode. Added rebinds for
   health, routes/settings, integrations/normalizer, and main. The
   ai/service.py importer is lazy (re-imports per call) so it picks
   up the source-module rebind automatically.

3. tests/e2e/conftest.py — hardcoded port 4097 had the same silent-share
   risk on 4097 as the original 4096 (if anything else is on 4097,
   OpenCodeProcess.start succeeds against whoever answers). Added a
   bind-test at conftest module-load with a clear RuntimeError + a
   CLIFF_OPENCODE_PORT override knob. Failure now happens at conftest
   load, not mid-test.

Refuted findings:
- GoogleProvider ADC risk — explicit api_key short-circuits ADC.
- Missing-continue in UnexpectedModelBehavior handler — Python excepts
  are mutually exclusive, no fall-through.
- gpt-4.1-nano invalidity — it IS in OpenAI's catalog.

Verified:
- ruff clean
- 47 unit (TestPaResolverWiring + 6-provider chain + runtime_no_tools
  + pipeline + rate_limit_backoff) — all pass
- 7 e2e (test_health_e2e + test_settings_e2e — the routes the rebind
  fix actually targets) — all pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/tests/e2e/conftest.py`:
- Around line 54-56: Compute the e2e availability flags (_opencode_available and
_api_key_set) first and only then call _verify_e2e_port_free(_E2E_OPENCODE_PORT)
and assign settings.opencode_port = _E2E_OPENCODE_PORT when those flags indicate
e2e tests will run; in other words move the port-check and the global mutation
behind the availability gate so _verify_e2e_port_free and settings.opencode_port
are not executed at import time unconditionally, and remove the now-duplicate
_opencode_available/_api_key_set definitions at the later lines referenced in
the review.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: e301582c-7f06-4052-b69c-c5206e3c17e7

📥 Commits

Reviewing files that changed from the base of the PR and between 194891a and c7c8a22.

📒 Files selected for processing (4)
  • backend/cliff/agents/executor.py
  • backend/tests/agents/test_runtime_no_tools.py
  • backend/tests/e2e/conftest.py
  • backend/tests/e2e/test_agent_pipeline_e2e.py

Comment thread backend/tests/e2e/conftest.py Outdated
CodeRabbit (correctly) flagged that the import-time port bind-test +
``settings.opencode_port = 4097`` mutation ran unconditionally:

- A combined run like ``pytest backend/tests`` imports this conftest
  at collection. If port 4097 happens to be busy, the RuntimeError
  aborted the ENTIRE tree — even though e2e would have skipped for
  a missing OpenCode binary or API key.
- The settings.opencode_port mutation is process-global; non-e2e tests
  sharing the cliff.config.settings singleton in the same run
  observed the e2e port. Defeats the "e2e is skip-eligible" contract.

Fix: compute ``_opencode_available`` / ``_api_key_set`` first, then
gate ``_verify_e2e_port_free()`` + the settings mutation behind both
flags. The previous duplicate flag computation is folded into the
single source — the skip markers below read the same values.

Behavior matrix after fix:
- prereqs missing + port busy   → conftest imports clean, e2e skipped
- prereqs missing + port free   → conftest imports clean, e2e skipped
- prereqs present + port busy   → fail loud at conftest load (desired)
- prereqs present + port free   → port check + mutation, e2e runs

Verified: ruff clean; test_health_e2e (2/2) passes (gate fires in the
prereqs-present path, fresh client routes through port 4097 as before).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@galanko galanko merged commit 325de45 into main May 31, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant