chore(pa): backfill PR #221 — tests + review fixes that didn't push before merge by galanko · Pull Request #249 · cliff-security/cliff

galanko · 2026-05-31T18:51:17Z

Why this PR exists

The squash-merge of #221 captured the state of the branch at `c602567` (OpenAI probe fix) — the last commit that successfully reached origin. Four commits I made after that bounced on push auth and never landed on the PR branch, so the merge silently shipped without them. This PR backfills the lost work onto `main`.

These are not new ideas — every commit here was reviewed in the PR #221 discussion thread and already approved in principle. The fix here is mechanical: push-then-merge in a clean PR.

What's in this PR (cherry-picks, in order)

Commit	What
`de3d120`	Test A + Test B — `TestPaResolverWiring::test_run_pa_no_tools_resolves_env_and_model_per_call` locks down that the executor awaits both resolvers per call (no caching) + `test_executor_provider_chain.py` parametrized over all 6 providers, asserts `build_model` produces the right Model subclass + `model_name` for each. 7 new test cases.
`64021e7`	Review nits — `pipeline.py` stale "All 4" comment → count-independent; `executor.py` per-attempt-timeout comment in the rate-limit loop; `CHANGELOG.md` `[Unreleased] Fixed` entry for the OpenAI BYOK probe fix in #221.
`e13a406`	`/code-review` findings (5 of 6 actionable): `e2e` fixture wires both resolvers (was constructing executor without them — 3 e2e tests would have hit the resolver-not-wired guard); `owner_resolver` → `exposure_analyzer` assertion (PR #1 dropped owner_resolver from PIPELINE_ORDER); `raw_text` → `structured_output` assertion (PA returns parsed output directly, no prose); progress-callback test inverted to lock down the "no streaming for no-tools agents" behavior; `UnexpectedModelBehavior` corrective re-roll in `_run_pa_no_tools` (1 attempt, separate counter — PA's internal output_type retry doesn't fire for prose-only responses, this restores the OpenCode-era `_RETRY_PROMPT` safety net); `ContextDocument._evidence_section`/`_plan_section` promoted to public (`runtime/_prompts.py` was reaching into private API); `no_tools.py` raw `KeyError` on unknown agent_type → typed `ValueError` guard.
`194891a`	e2e OpenCode isolation — `settings.opencode_port = 4097` at conftest module-import time + rebind `config_manager.opencode_client`. The default `opencode_port=4096` was causing the e2e suite to silently share OpenCode with a running daemon (`OpenCodeProcess.start()` succeeds if anyone answers on that port). The model-update e2e test consequently flapped against whatever model the user had selected. With the dedicated port, the full e2e suite is 31/33 PASS / 2 docs-skipped / 0 FAIL.

Test plan

`uv run ruff check cliff/ tests/` — clean
`uv run pytest tests/test_executor.py tests/test_pipeline.py tests/agents/ -m "not e2e"` — 124 passed, 2 docs-skipped (~7m, slow because tests/agents/ includes the LLM-driven `test_plain_description_eval.py`)
e2e suite verified pre-merge in the previous PR session — 31/33 pass / 2 docs-skipped / 0 fail

Why these mattered enough to recover

The two new test files close the resolver-staleness invariant and the per-provider chain — same provider testing feat(agents): migrate six no-tools agents to Pydantic AI substrate #221's review thread explicitly asked for.
The `UnexpectedModelBehavior` retry restores parity with the OpenCode-era `_RETRY_PROMPT` for weak-model users (prose-only responses without it surface as an immediate `AgentProcessError`).
The e2e port isolation is the difference between an e2e suite that works and one that silently shares state with the local daemon — anyone running the e2e suite while a daemon is up will get bitten until this lands.
The ContextDocument rename promotes private API that `runtime/_prompts.py` already depends on — without it, a future refactor of ContextDocument silently breaks every PA agent prompt.

Risk

Low. Each cherry-pick applied cleanly, no conflicts. The runtime changes are additive (new retry branch, new guard, new tests). The renames are internal (`_evidence_section` had only two call sites, both updated in the same commit). No dependency, no migration, no API contract change.

Stacked PRs

PR #2 (`feat/pa-migrate-executor`) is queued behind this — should rebase onto main once this merges.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Restored OpenAI BYOK key saving.
- Clearer handling and messages when an unknown no-tools agent type is used.
Improvements
- One automatic retry for certain unparseable agent outputs to reduce run failures.
- Improved assembly of prior-context (evidence/plan) in prompts for more accurate agent context.
Documentation
- Changelog updated with the BYOK fix.

Adds the two tests the survey flagged as the highest-value gaps before the PA migration merges: 1. test_executor.py — TestPaResolverWiring asserts the executor awaits both ai_env_resolver and ai_model_resolver on EVERY _run_pa_no_tools call (not cached at __init__), and that build_model receives the per-call resolved values — prevents the silent staleness regression where a UI provider-switch wouldn't take effect until daemon restart. 2. tests/agents/test_executor_provider_chain.py — parametrized over the six advertised providers (openrouter / anthropic / openai / google / ollama / custom). Uses the REAL build_model factory and captures the Model that run_no_tools_agent actually receives. Asserts subclass + model_name per provider. Closes the handshake gap between the per-provider factory tests and the per-agent runtime tests, and doubles as a regression net for any future probe-shape bug like the gpt-5 max_tokens one fixed in c602567. 7 new test cases, ~0.5s combined runtime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three follow-ups from the PR review: - pipeline.py: drop the stale "All 4 pipeline sections" count from the fall-through comment. Owner_resolver was dropped from _PIPELINE_CHECKS in this PR, taking the count to 5; future drops would re-stale a fixed number. "All pipeline sections" is count-independent. - executor.py: note that ``timeout`` in the PA rate-limit retry loop is per-attempt, not a wall-clock cap on the whole agent run. Mirrors the OpenCode-era loop semantics and prevents the next reader from assuming a budget-cap that isn't there. - CHANGELOG.md: add an Unreleased "Fixed" entry for the OpenAI BYOK probe-model change in c602567 (gpt-5 → gpt-4o-mini). User-visible behavior change; deserved a release note. No code-behavior changes, no test changes; ruff + the 26 PA-touched tests (TestPaResolverWiring + provider-chain + test_pipeline) stay green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Five fixes from the extra-high-recall code review pass: 1. tests/e2e/test_agent_pipeline_e2e.py — the executor fixture was constructing AgentExecutor(pool, context_builder) without the ai_env_resolver / ai_model_resolver this PR added; three tests would have hit the resolvers-not-wired guard at executor.py:1401 on the first PA call. Extract the resolver pair to module-level helpers shared by the pool and executor fixtures; e2e job now exercises the PA path end-to-end. 2. Same file, test_suggest_next_advances_after_enrichment — the assertion expected owner_resolver as the next step after enrichment. owner_resolver was dropped from PIPELINE_ORDER in PR #1, so the next step is now exposure_analyzer. Updated docstring + assertion to match the new 5-agent forward walk. 3. executor.py — added one-attempt corrective re-roll on UnexpectedModelBehavior. PA's internal output_type retry only fires on schema-shaped-but-invalid responses; for prose-only / empty / thinking-only responses (the OpenCode-era _RETRY_PROMPT target), PA raises immediately. One re-roll at the same prompt often produces a well-formed response on the next sample. Separate parse-retry counter so it doesn't share budget with rate-limit retries. 4. context_document.py — promote _evidence_section and _plan_section from private (single-underscore) to public. runtime/_prompts.py reaches into both methods; the private naming made any future ContextDocument refactor a silent breakage with no type-checker warning. Both names are now stable public API. 5. runtime/no_tools.py — replace raw _RUNTIME_BUILDERS[agent_type] dict access with a guarded .get() that raises ValueError("Unknown no-tools agent type: ...") if missing. Defense in depth: the executor.py:1391 gate normally protects, but a regression or a direct caller would otherwise surface a bare KeyError instead of the typed taxonomy. Test updated to expect ValueError. Tests: - ruff clean - 208 unit + integration pass (8 min runtime) - 5/5 e2e pipeline tests pass post-fix (was 3/5 pre-fix) - 1 pre-existing e2e failure surfaces (test_settings_e2e.py::test_update_model_and_verify — TestClient noop_lifespan reads installed-app model state instead of the in-memory fixture; file last touched in PR #9, not by this branch) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

test_update_model_and_verify was failing because the e2e session shared OpenCode state with the user's installed daemon: - ``settings.opencode_port`` defaults to 4096. The e2e session calls ``OpenCodeProcess()`` which tries to start an OpenCode subprocess on that port. If the daemon is already listening on 4096, ``OpenCodeProcess.start()`` succeeds as long as the port answers — even though the answerer is somebody else's process — and the e2e suite silently runs against the daemon's OpenCode. - ``opencode_client = OpenCodeClient()`` is constructed at module- import time with ``settings.opencode_url`` baked in. The ``config_manager`` import path binds its own reference to this singleton. ``app_client`` rebinds the ``sessions``/``chat`` module references with a fresh client but never rebound ``config_manager.opencode_client``, so even with the port override the settings/model route hit the daemon's OpenCode. Two-line fix: 1. Mutate ``settings.opencode_port = 4097`` at conftest module-import time so the session-scoped OpenCodeProcess starts on the dedicated port and the fresh client built in ``app_client`` picks it up. 2. Also rebind ``config_manager.opencode_client`` (and the source singleton in ``cliff.engine.client``) in ``app_client`` so the settings/model route hits the e2e-isolated OpenCode. Verification: - ``pytest tests/e2e/test_settings_e2e.py`` — 5/5 PASS (was 4/5) - ``pytest tests/e2e/`` — 31/33 PASS, 2 docs-skipped, 0 FAIL (was 28/33 + 2 skipped + 3 failed on the pre-this-session full run) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-31T18:51:27Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 473887dc-494f-45c8-8c16-b700367bbd3c

📥 Commits

Reviewing files that changed from the base of the PR and between c7c8a22 and 53ed6a4.

📒 Files selected for processing (1)

backend/tests/e2e/conftest.py

📝 Walkthrough

Walkthrough

This PR hardens the Pydantic‑AI no‑tools agent path with a one-time re-roll for unparseable responses, promotes ContextDocument section helpers to public, improves error messages, and updates E2E fixtures and tests to match the new runtime behavior.

Changes

Pydantic-AI No-Tools Path Improvements

Layer / File(s)	Summary
Context document public API surface `backend/cliff/workspace/context_document.py`, `backend/cliff/agents/runtime/_prompts.py`	`ContextDocument._evidence_section()` and `_plan_section()` are renamed to public `evidence_section()` and `plan_section()`. The `generate()` method and `_prompts.py` caller are updated to use the public names.
No-tools runtime error handling `backend/cliff/agents/runtime/no_tools.py`, `backend/tests/agents/test_runtime_no_tools.py`	`run_no_tools_agent` now uses guarded `.get()` instead of direct dict indexing, raising a typed `ValueError` with registered builder keys when `agent_type` is unknown instead of letting `KeyError` propagate. Tests/docstring updated accordingly.
PA unparseable response retry mechanism `backend/cliff/agents/executor.py`, `backend/tests/agents/test_executor_provider_chain.py`	`AgentExecutor._run_pa_no_tools` adds a `parse_retries_used` counter and re-rolls once on `UnexpectedModelBehavior` when PA returns prose-only/empty outputs; raises `AgentProcessError` on the second occurrence. `ModelHTTPError` handling prefers `exc.message` when present. New provider-chain test verifies the concrete model instance and `model_name` passed to the agent.
Provider chain & unit tests `backend/tests/agents/test_executor_provider_chain.py`, `backend/tests/test_executor.py`	Adds provider-chain E2E test and `TestPaResolverWiring` unit test to ensure per-call AI env/model resolution and that `build_model` receives the latest resolved values.
E2E environment isolation and shared resolvers `backend/tests/e2e/conftest.py`, `backend/tests/e2e/test_agent_pipeline_e2e.py`	Conftest forces an isolated OpenCode port at import time and resets OpenCode client singletons across modules. Adds module-level async resolver helpers and wires them into `WorkspaceProcessPool` and `AgentExecutor`.
E2E assertion and pipeline updates `backend/tests/e2e/test_agent_pipeline_e2e.py`	Enricher assertions now check `parse_result.success` and `structured_output`. Pipeline advancement expects `exposure_analyzer` after enrichment. Progress callback test asserts `progress_texts` is empty.
Changelog and minor comment `CHANGELOG.md`, `backend/cliff/agents/pipeline.py`	Changelog documents OpenAI BYOK save fix (validator probe updated to `gpt-4o-mini`); minor comment wording generalized in `suggest_next()`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

cliff-security/cliff#221: The main PR's Pydantic-AI no-tools retry logic and error handling directly build on the same no-tools substrate introduced in this retrieved PR, which also includes the OpenAI gpt-4o-mini validator probe change.

Poem

🐰 I found a stray validator stuck on gpt-5,
I nudged the parser once — then tried it twice,
Sections shed their underscores and took a bow,
Tests hop to their own port and pass somehow,
I nibble logs and hum: the runtime's feeling nice.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 53.85% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the PR as backfilling tests and review fixes from PR `#221` that didn't push before merge, which aligns with the changeset containing new tests, CHANGELOG updates, and code-review refinements.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chore/pa-pr1-backfill

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

backend/tests/agents/test_runtime_no_tools.py (1)

12-12: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update docstring to reflect ValueError contract.

The comment still references "KeyError" but the test now expects ValueError per the updated no-tools runtime contract.

📝 Proposed fix

-* ``test_run_no_tools_agent_unknown_agent_type`` — defense-in-depth
-  KeyError for an agent type that isn't registered.
+* ``test_run_no_tools_agent_unknown_agent_type`` — defense-in-depth
+  ValueError for an agent type that isn't registered.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/tests/agents/test_runtime_no_tools.py` at line 12, Update the test
docstring in backend/tests/agents/test_runtime_no_tools.py to mention ValueError
instead of KeyError so the documentation matches the test's contract; locate the
test function (e.g., the test in this file that asserts a ValueError for
unregistered agent types) and replace any "KeyError" wording in its docstring
with "ValueError" (ensure the docstring language aligns with the existing
assertion that expects ValueError).

backend/tests/e2e/test_agent_pipeline_e2e.py (1)

236-273: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Dead else: pytest.skip branch contradicts the new hard asserts.

Lines 240-246 now assert result.parse_result.success and a truthy structured_output. That makes the if at line 249 always true and the else (pytest.skip) at 267-273 unreachable. The skip was the old graceful path for non-structured LLM output; with the asserts it can never run. Verify persistence unconditionally and drop the dead branch.

♻️ Remove unreachable branch

-    # If structured output was parsed successfully, verify persistence
-    if result.parse_result.success and result.parse_result.structured_output:
-        # Context file should exist on disk
-        enrichment_file = ws_dir / "context" / "enrichment.json"
-        assert enrichment_file.exists(), "enrichment.json not written to disk"
-
-        enrichment_data = json.loads(enrichment_file.read_text())
-        assert isinstance(enrichment_data, dict)
-
-        # CONTEXT.md should have been regenerated
-        context_md = ws_dir / "CONTEXT.md"
-        assert context_md.exists()
-        context_text = context_md.read_text()
-        assert "enrichment" in context_text.lower() or "CVE" in context_text
-
-        # Sidebar should have been updated
-        assert result.sidebar_updated is True
-        assert result.context_version is not None
-        assert result.context_version > 0
-    else:
-        # Even if parse failed, we should still have completed successfully
-        # (the LLM responded, just not in the expected format)
-        pytest.skip(
-            f"LLM responded but output wasn't structured JSON: "
-            f"{result.parse_result.error}"
-        )
+    # success / structured_output are asserted above — verify persistence.
+    enrichment_file = ws_dir / "context" / "enrichment.json"
+    assert enrichment_file.exists(), "enrichment.json not written to disk"
+
+    enrichment_data = json.loads(enrichment_file.read_text())
+    assert isinstance(enrichment_data, dict)
+
+    # CONTEXT.md should have been regenerated
+    context_md = ws_dir / "CONTEXT.md"
+    assert context_md.exists()
+    context_text = context_md.read_text()
+    assert "enrichment" in context_text.lower() or "CVE" in context_text
+
+    # Sidebar should have been updated
+    assert result.sidebar_updated is True
+    assert result.context_version is not None
+    assert result.context_version > 0

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/tests/e2e/test_agent_pipeline_e2e.py` around lines 236 - 273, The
current test makes hard assertions on result.parse_result.success and
result.parse_result.structured_output, which renders the later "else:
pytest.skip" branch unreachable; remove the dead else branch (the pytest.skip
block) and unconditionally perform the persistence checks (existence and
contents of enrichment_file, CONTEXT.md checks, and sidebar/context_version
assertions) after the asserts; keep references to result.parse_result,
enrichment_file (ws_dir / "context" / "enrichment.json"), context_md (ws_dir /
"CONTEXT.md"), and the sidebar/context_version assertions so the persistence
verification runs always when the earlier asserts pass.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@backend/tests/agents/test_runtime_no_tools.py`:
- Line 12: Update the test docstring in
backend/tests/agents/test_runtime_no_tools.py to mention ValueError instead of
KeyError so the documentation matches the test's contract; locate the test
function (e.g., the test in this file that asserts a ValueError for unregistered
agent types) and replace any "KeyError" wording in its docstring with
"ValueError" (ensure the docstring language aligns with the existing assertion
that expects ValueError).

In `@backend/tests/e2e/test_agent_pipeline_e2e.py`:
- Around line 236-273: The current test makes hard assertions on
result.parse_result.success and result.parse_result.structured_output, which
renders the later "else: pytest.skip" branch unreachable; remove the dead else
branch (the pytest.skip block) and unconditionally perform the persistence
checks (existence and contents of enrichment_file, CONTEXT.md checks, and
sidebar/context_version assertions) after the asserts; keep references to
result.parse_result, enrichment_file (ws_dir / "context" / "enrichment.json"),
context_md (ws_dir / "CONTEXT.md"), and the sidebar/context_version assertions
so the persistence verification runs always when the earlier asserts pass.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 2ad5b25f-c1ec-41ea-8ca6-9bc2e5121371

📥 Commits

Reviewing files that changed from the base of the PR and between 6e29fd7 and 194891a.

📒 Files selected for processing (11)

CHANGELOG.md
backend/cliff/agents/executor.py
backend/cliff/agents/pipeline.py
backend/cliff/agents/runtime/_prompts.py
backend/cliff/agents/runtime/no_tools.py
backend/cliff/workspace/context_document.py
backend/tests/agents/test_executor_provider_chain.py
backend/tests/agents/test_runtime_no_tools.py
backend/tests/e2e/conftest.py
backend/tests/e2e/test_agent_pipeline_e2e.py
backend/tests/test_executor.py

Two quick wins from the CodeRabbit review: - test_runtime_no_tools.py module docstring: the line describing test_run_no_tools_agent_unknown_agent_type still said "KeyError" but the test was updated to expect ValueError in the parent PR. Aligned. - test_agent_pipeline_e2e.py::test_execute_enricher_e2e: after the hard asserts on parse_result.success + structured_output were added, the subsequent ``if/else`` branch was unreachable (the else's pytest.skip could never fire). Removed the dead branch and made the persistence checks unconditional — they're guaranteed to run after the asserts pass. No behavior change; same passes (6/6 on test_runtime_no_tools; e2e verified during the prior PR cycle). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three findings from the recall-mode /code-review pass on the backfill PR: 1. executor.py — UnexpectedModelBehavior and ModelHTTPError __str__ embed the raw provider response body, which can echo prompt content and therefore any credentials the agent was working with. Logging exc directly at WARNING level leaked the body to log sinks. Log + raise with exc.message (bounded reason) instead of str(exc) (which is "{message}, body:\n{body}" for these classes). The raw body stays on the chained __cause__ for DEBUG-level investigation. 2. tests/e2e/conftest.py — 5 modules import opencode_client via "from cliff.engine.client import opencode_client", but the app_client fixture only rebound 4 (sessions, chat, config_manager, client). The /health route was using its captured port-4096 client instead of the e2e-isolated port-4097 one — test_health_e2e tests would silently hit the running daemon's OpenCode. Added rebinds for health, routes/settings, integrations/normalizer, and main. The ai/service.py importer is lazy (re-imports per call) so it picks up the source-module rebind automatically. 3. tests/e2e/conftest.py — hardcoded port 4097 had the same silent-share risk on 4097 as the original 4096 (if anything else is on 4097, OpenCodeProcess.start succeeds against whoever answers). Added a bind-test at conftest module-load with a clear RuntimeError + a CLIFF_OPENCODE_PORT override knob. Failure now happens at conftest load, not mid-test. Refuted findings: - GoogleProvider ADC risk — explicit api_key short-circuits ADC. - Missing-continue in UnexpectedModelBehavior handler — Python excepts are mutually exclusive, no fall-through. - gpt-4.1-nano invalidity — it IS in OpenAI's catalog. Verified: - ruff clean - 47 unit (TestPaResolverWiring + 6-provider chain + runtime_no_tools + pipeline + rate_limit_backoff) — all pass - 7 e2e (test_health_e2e + test_settings_e2e — the routes the rebind fix actually targets) — all pass Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/tests/e2e/conftest.py`:
- Around line 54-56: Compute the e2e availability flags (_opencode_available and
_api_key_set) first and only then call _verify_e2e_port_free(_E2E_OPENCODE_PORT)
and assign settings.opencode_port = _E2E_OPENCODE_PORT when those flags indicate
e2e tests will run; in other words move the port-check and the global mutation
behind the availability gate so _verify_e2e_port_free and settings.opencode_port
are not executed at import time unconditionally, and remove the now-duplicate
_opencode_available/_api_key_set definitions at the later lines referenced in
the review.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: e301582c-7f06-4052-b69c-c5206e3c17e7

📥 Commits

Reviewing files that changed from the base of the PR and between 194891a and c7c8a22.

📒 Files selected for processing (4)

backend/cliff/agents/executor.py
backend/tests/agents/test_runtime_no_tools.py
backend/tests/e2e/conftest.py
backend/tests/e2e/test_agent_pipeline_e2e.py

CodeRabbit (correctly) flagged that the import-time port bind-test + ``settings.opencode_port = 4097`` mutation ran unconditionally: - A combined run like ``pytest backend/tests`` imports this conftest at collection. If port 4097 happens to be busy, the RuntimeError aborted the ENTIRE tree — even though e2e would have skipped for a missing OpenCode binary or API key. - The settings.opencode_port mutation is process-global; non-e2e tests sharing the cliff.config.settings singleton in the same run observed the e2e port. Defeats the "e2e is skip-eligible" contract. Fix: compute ``_opencode_available`` / ``_api_key_set`` first, then gate ``_verify_e2e_port_free()`` + the settings mutation behind both flags. The previous duplicate flag computation is folded into the single source — the skip markers below read the same values. Behavior matrix after fix: - prereqs missing + port busy → conftest imports clean, e2e skipped - prereqs missing + port free → conftest imports clean, e2e skipped - prereqs present + port busy → fail loud at conftest load (desired) - prereqs present + port free → port check + mutation, e2e runs Verified: ruff clean; test_health_e2e (2/2) passes (gate fires in the prereqs-present path, fresh client routes through port 4097 as before). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

galanko and others added 4 commits May 31, 2026 21:07

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

galanko and others added 2 commits May 31, 2026 22:03

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

Comment thread backend/tests/e2e/conftest.py Outdated

galanko merged commit 325de45 into main May 31, 2026
7 checks passed

galanko mentioned this pull request Jun 1, 2026

feat(agents): migrate the remediation_executor to Pydantic AI (PR #2 of 3) #251

Open

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(pa): backfill PR #221 — tests + review fixes that didn't push before merge#249

chore(pa): backfill PR #221 — tests + review fixes that didn't push before merge#249
galanko merged 7 commits into
mainfrom
chore/pa-pr1-backfill

galanko commented May 31, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 31, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

galanko commented May 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why this PR exists

What's in this PR (cherry-picks, in order)

Test plan

Why these mattered enough to recover

Risk

Stacked PRs

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

galanko commented May 31, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 31, 2026 •

edited

Loading