chore(pa): backfill PR #221 — tests + review fixes that didn't push before merge#249
Conversation
Adds the two tests the survey flagged as the highest-value gaps before the PA migration merges: 1. test_executor.py — TestPaResolverWiring asserts the executor awaits both ai_env_resolver and ai_model_resolver on EVERY _run_pa_no_tools call (not cached at __init__), and that build_model receives the per-call resolved values — prevents the silent staleness regression where a UI provider-switch wouldn't take effect until daemon restart. 2. tests/agents/test_executor_provider_chain.py — parametrized over the six advertised providers (openrouter / anthropic / openai / google / ollama / custom). Uses the REAL build_model factory and captures the Model that run_no_tools_agent actually receives. Asserts subclass + model_name per provider. Closes the handshake gap between the per-provider factory tests and the per-agent runtime tests, and doubles as a regression net for any future probe-shape bug like the gpt-5 max_tokens one fixed in c602567. 7 new test cases, ~0.5s combined runtime. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from the PR review: - pipeline.py: drop the stale "All 4 pipeline sections" count from the fall-through comment. Owner_resolver was dropped from _PIPELINE_CHECKS in this PR, taking the count to 5; future drops would re-stale a fixed number. "All pipeline sections" is count-independent. - executor.py: note that ``timeout`` in the PA rate-limit retry loop is per-attempt, not a wall-clock cap on the whole agent run. Mirrors the OpenCode-era loop semantics and prevents the next reader from assuming a budget-cap that isn't there. - CHANGELOG.md: add an Unreleased "Fixed" entry for the OpenAI BYOK probe-model change in c602567 (gpt-5 → gpt-4o-mini). User-visible behavior change; deserved a release note. No code-behavior changes, no test changes; ruff + the 26 PA-touched tests (TestPaResolverWiring + provider-chain + test_pipeline) stay green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five fixes from the extra-high-recall code review pass: 1. tests/e2e/test_agent_pipeline_e2e.py — the executor fixture was constructing AgentExecutor(pool, context_builder) without the ai_env_resolver / ai_model_resolver this PR added; three tests would have hit the resolvers-not-wired guard at executor.py:1401 on the first PA call. Extract the resolver pair to module-level helpers shared by the pool and executor fixtures; e2e job now exercises the PA path end-to-end. 2. Same file, test_suggest_next_advances_after_enrichment — the assertion expected owner_resolver as the next step after enrichment. owner_resolver was dropped from PIPELINE_ORDER in PR #1, so the next step is now exposure_analyzer. Updated docstring + assertion to match the new 5-agent forward walk. 3. executor.py — added one-attempt corrective re-roll on UnexpectedModelBehavior. PA's internal output_type retry only fires on schema-shaped-but-invalid responses; for prose-only / empty / thinking-only responses (the OpenCode-era _RETRY_PROMPT target), PA raises immediately. One re-roll at the same prompt often produces a well-formed response on the next sample. Separate parse-retry counter so it doesn't share budget with rate-limit retries. 4. context_document.py — promote _evidence_section and _plan_section from private (single-underscore) to public. runtime/_prompts.py reaches into both methods; the private naming made any future ContextDocument refactor a silent breakage with no type-checker warning. Both names are now stable public API. 5. runtime/no_tools.py — replace raw _RUNTIME_BUILDERS[agent_type] dict access with a guarded .get() that raises ValueError("Unknown no-tools agent type: ...") if missing. Defense in depth: the executor.py:1391 gate normally protects, but a regression or a direct caller would otherwise surface a bare KeyError instead of the typed taxonomy. Test updated to expect ValueError. Tests: - ruff clean - 208 unit + integration pass (8 min runtime) - 5/5 e2e pipeline tests pass post-fix (was 3/5 pre-fix) - 1 pre-existing e2e failure surfaces (test_settings_e2e.py::test_update_model_and_verify — TestClient noop_lifespan reads installed-app model state instead of the in-memory fixture; file last touched in PR #9, not by this branch) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_update_model_and_verify was failing because the e2e session shared OpenCode state with the user's installed daemon: - ``settings.opencode_port`` defaults to 4096. The e2e session calls ``OpenCodeProcess()`` which tries to start an OpenCode subprocess on that port. If the daemon is already listening on 4096, ``OpenCodeProcess.start()`` succeeds as long as the port answers — even though the answerer is somebody else's process — and the e2e suite silently runs against the daemon's OpenCode. - ``opencode_client = OpenCodeClient()`` is constructed at module- import time with ``settings.opencode_url`` baked in. The ``config_manager`` import path binds its own reference to this singleton. ``app_client`` rebinds the ``sessions``/``chat`` module references with a fresh client but never rebound ``config_manager.opencode_client``, so even with the port override the settings/model route hit the daemon's OpenCode. Two-line fix: 1. Mutate ``settings.opencode_port = 4097`` at conftest module-import time so the session-scoped OpenCodeProcess starts on the dedicated port and the fresh client built in ``app_client`` picks it up. 2. Also rebind ``config_manager.opencode_client`` (and the source singleton in ``cliff.engine.client``) in ``app_client`` so the settings/model route hits the e2e-isolated OpenCode. Verification: - ``pytest tests/e2e/test_settings_e2e.py`` — 5/5 PASS (was 4/5) - ``pytest tests/e2e/`` — 31/33 PASS, 2 docs-skipped, 0 FAIL (was 28/33 + 2 skipped + 3 failed on the pre-this-session full run) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR hardens the Pydantic‑AI no‑tools agent path with a one-time re-roll for unparseable responses, promotes ContextDocument section helpers to public, improves error messages, and updates E2E fixtures and tests to match the new runtime behavior. ChangesPydantic-AI No-Tools Path Improvements
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/tests/agents/test_runtime_no_tools.py (1)
12-12:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate docstring to reflect ValueError contract.
The comment still references "KeyError" but the test now expects
ValueErrorper the updated no-tools runtime contract.📝 Proposed fix
-* ``test_run_no_tools_agent_unknown_agent_type`` — defense-in-depth - KeyError for an agent type that isn't registered. +* ``test_run_no_tools_agent_unknown_agent_type`` — defense-in-depth + ValueError for an agent type that isn't registered.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/tests/agents/test_runtime_no_tools.py` at line 12, Update the test docstring in backend/tests/agents/test_runtime_no_tools.py to mention ValueError instead of KeyError so the documentation matches the test's contract; locate the test function (e.g., the test in this file that asserts a ValueError for unregistered agent types) and replace any "KeyError" wording in its docstring with "ValueError" (ensure the docstring language aligns with the existing assertion that expects ValueError).backend/tests/e2e/test_agent_pipeline_e2e.py (1)
236-273: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winDead
else: pytest.skipbranch contradicts the new hard asserts.Lines 240-246 now assert
result.parse_result.successand a truthystructured_output. That makes theifat line 249 always true and theelse(pytest.skip) at 267-273 unreachable. The skip was the old graceful path for non-structured LLM output; with the asserts it can never run. Verify persistence unconditionally and drop the dead branch.♻️ Remove unreachable branch
- # If structured output was parsed successfully, verify persistence - if result.parse_result.success and result.parse_result.structured_output: - # Context file should exist on disk - enrichment_file = ws_dir / "context" / "enrichment.json" - assert enrichment_file.exists(), "enrichment.json not written to disk" - - enrichment_data = json.loads(enrichment_file.read_text()) - assert isinstance(enrichment_data, dict) - - # CONTEXT.md should have been regenerated - context_md = ws_dir / "CONTEXT.md" - assert context_md.exists() - context_text = context_md.read_text() - assert "enrichment" in context_text.lower() or "CVE" in context_text - - # Sidebar should have been updated - assert result.sidebar_updated is True - assert result.context_version is not None - assert result.context_version > 0 - else: - # Even if parse failed, we should still have completed successfully - # (the LLM responded, just not in the expected format) - pytest.skip( - f"LLM responded but output wasn't structured JSON: " - f"{result.parse_result.error}" - ) + # success / structured_output are asserted above — verify persistence. + enrichment_file = ws_dir / "context" / "enrichment.json" + assert enrichment_file.exists(), "enrichment.json not written to disk" + + enrichment_data = json.loads(enrichment_file.read_text()) + assert isinstance(enrichment_data, dict) + + # CONTEXT.md should have been regenerated + context_md = ws_dir / "CONTEXT.md" + assert context_md.exists() + context_text = context_md.read_text() + assert "enrichment" in context_text.lower() or "CVE" in context_text + + # Sidebar should have been updated + assert result.sidebar_updated is True + assert result.context_version is not None + assert result.context_version > 0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/tests/e2e/test_agent_pipeline_e2e.py` around lines 236 - 273, The current test makes hard assertions on result.parse_result.success and result.parse_result.structured_output, which renders the later "else: pytest.skip" branch unreachable; remove the dead else branch (the pytest.skip block) and unconditionally perform the persistence checks (existence and contents of enrichment_file, CONTEXT.md checks, and sidebar/context_version assertions) after the asserts; keep references to result.parse_result, enrichment_file (ws_dir / "context" / "enrichment.json"), context_md (ws_dir / "CONTEXT.md"), and the sidebar/context_version assertions so the persistence verification runs always when the earlier asserts pass.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@backend/tests/agents/test_runtime_no_tools.py`:
- Line 12: Update the test docstring in
backend/tests/agents/test_runtime_no_tools.py to mention ValueError instead of
KeyError so the documentation matches the test's contract; locate the test
function (e.g., the test in this file that asserts a ValueError for unregistered
agent types) and replace any "KeyError" wording in its docstring with
"ValueError" (ensure the docstring language aligns with the existing assertion
that expects ValueError).
In `@backend/tests/e2e/test_agent_pipeline_e2e.py`:
- Around line 236-273: The current test makes hard assertions on
result.parse_result.success and result.parse_result.structured_output, which
renders the later "else: pytest.skip" branch unreachable; remove the dead else
branch (the pytest.skip block) and unconditionally perform the persistence
checks (existence and contents of enrichment_file, CONTEXT.md checks, and
sidebar/context_version assertions) after the asserts; keep references to
result.parse_result, enrichment_file (ws_dir / "context" / "enrichment.json"),
context_md (ws_dir / "CONTEXT.md"), and the sidebar/context_version assertions
so the persistence verification runs always when the earlier asserts pass.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 2ad5b25f-c1ec-41ea-8ca6-9bc2e5121371
📒 Files selected for processing (11)
CHANGELOG.mdbackend/cliff/agents/executor.pybackend/cliff/agents/pipeline.pybackend/cliff/agents/runtime/_prompts.pybackend/cliff/agents/runtime/no_tools.pybackend/cliff/workspace/context_document.pybackend/tests/agents/test_executor_provider_chain.pybackend/tests/agents/test_runtime_no_tools.pybackend/tests/e2e/conftest.pybackend/tests/e2e/test_agent_pipeline_e2e.pybackend/tests/test_executor.py
Two quick wins from the CodeRabbit review: - test_runtime_no_tools.py module docstring: the line describing test_run_no_tools_agent_unknown_agent_type still said "KeyError" but the test was updated to expect ValueError in the parent PR. Aligned. - test_agent_pipeline_e2e.py::test_execute_enricher_e2e: after the hard asserts on parse_result.success + structured_output were added, the subsequent ``if/else`` branch was unreachable (the else's pytest.skip could never fire). Removed the dead branch and made the persistence checks unconditional — they're guaranteed to run after the asserts pass. No behavior change; same passes (6/6 on test_runtime_no_tools; e2e verified during the prior PR cycle). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings from the recall-mode /code-review pass on the backfill PR:
1. executor.py — UnexpectedModelBehavior and ModelHTTPError __str__ embed
the raw provider response body, which can echo prompt content and
therefore any credentials the agent was working with. Logging exc
directly at WARNING level leaked the body to log sinks. Log + raise
with exc.message (bounded reason) instead of str(exc) (which is
"{message}, body:\n{body}" for these classes). The raw body stays
on the chained __cause__ for DEBUG-level investigation.
2. tests/e2e/conftest.py — 5 modules import opencode_client via
"from cliff.engine.client import opencode_client", but the
app_client fixture only rebound 4 (sessions, chat, config_manager,
client). The /health route was using its captured port-4096 client
instead of the e2e-isolated port-4097 one — test_health_e2e tests
would silently hit the running daemon's OpenCode. Added rebinds for
health, routes/settings, integrations/normalizer, and main. The
ai/service.py importer is lazy (re-imports per call) so it picks
up the source-module rebind automatically.
3. tests/e2e/conftest.py — hardcoded port 4097 had the same silent-share
risk on 4097 as the original 4096 (if anything else is on 4097,
OpenCodeProcess.start succeeds against whoever answers). Added a
bind-test at conftest module-load with a clear RuntimeError + a
CLIFF_OPENCODE_PORT override knob. Failure now happens at conftest
load, not mid-test.
Refuted findings:
- GoogleProvider ADC risk — explicit api_key short-circuits ADC.
- Missing-continue in UnexpectedModelBehavior handler — Python excepts
are mutually exclusive, no fall-through.
- gpt-4.1-nano invalidity — it IS in OpenAI's catalog.
Verified:
- ruff clean
- 47 unit (TestPaResolverWiring + 6-provider chain + runtime_no_tools
+ pipeline + rate_limit_backoff) — all pass
- 7 e2e (test_health_e2e + test_settings_e2e — the routes the rebind
fix actually targets) — all pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/tests/e2e/conftest.py`:
- Around line 54-56: Compute the e2e availability flags (_opencode_available and
_api_key_set) first and only then call _verify_e2e_port_free(_E2E_OPENCODE_PORT)
and assign settings.opencode_port = _E2E_OPENCODE_PORT when those flags indicate
e2e tests will run; in other words move the port-check and the global mutation
behind the availability gate so _verify_e2e_port_free and settings.opencode_port
are not executed at import time unconditionally, and remove the now-duplicate
_opencode_available/_api_key_set definitions at the later lines referenced in
the review.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: e301582c-7f06-4052-b69c-c5206e3c17e7
📒 Files selected for processing (4)
backend/cliff/agents/executor.pybackend/tests/agents/test_runtime_no_tools.pybackend/tests/e2e/conftest.pybackend/tests/e2e/test_agent_pipeline_e2e.py
CodeRabbit (correctly) flagged that the import-time port bind-test + ``settings.opencode_port = 4097`` mutation ran unconditionally: - A combined run like ``pytest backend/tests`` imports this conftest at collection. If port 4097 happens to be busy, the RuntimeError aborted the ENTIRE tree — even though e2e would have skipped for a missing OpenCode binary or API key. - The settings.opencode_port mutation is process-global; non-e2e tests sharing the cliff.config.settings singleton in the same run observed the e2e port. Defeats the "e2e is skip-eligible" contract. Fix: compute ``_opencode_available`` / ``_api_key_set`` first, then gate ``_verify_e2e_port_free()`` + the settings mutation behind both flags. The previous duplicate flag computation is folded into the single source — the skip markers below read the same values. Behavior matrix after fix: - prereqs missing + port busy → conftest imports clean, e2e skipped - prereqs missing + port free → conftest imports clean, e2e skipped - prereqs present + port busy → fail loud at conftest load (desired) - prereqs present + port free → port check + mutation, e2e runs Verified: ruff clean; test_health_e2e (2/2) passes (gate fires in the prereqs-present path, fresh client routes through port 4097 as before). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why this PR exists
The squash-merge of #221 captured the state of the branch at `c602567` (OpenAI probe fix) — the last commit that successfully reached origin. Four commits I made after that bounced on push auth and never landed on the PR branch, so the merge silently shipped without them. This PR backfills the lost work onto `main`.
These are not new ideas — every commit here was reviewed in the PR #221 discussion thread and already approved in principle. The fix here is mechanical: push-then-merge in a clean PR.
What's in this PR (cherry-picks, in order)
Test plan
Why these mattered enough to recover
Risk
Low. Each cherry-pick applied cleanly, no conflicts. The runtime changes are additive (new retry branch, new guard, new tests). The renames are internal (`_evidence_section` had only two call sites, both updated in the same commit). No dependency, no migration, no API contract change.
Stacked PRs
PR #2 (`feat/pa-migrate-executor`) is queued behind this — should rebase onto main once this merges.
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Improvements
Documentation