feat(agents): migrate the remediation_executor to Pydantic AI (PR #2 of 3)#251
Conversation
First slice of PR #2 (ADR-0047 / IMPL-0022) — the five in-process tool functions that replace OpenCode's tool dispatch for the remediation_executor, plus the permission tiering they enforce. New package backend/cliff/agents/runtime/tools/: - permissions.py — classify_tool_request + _CATASTROPHIC_BASH + _GATED_BASH + _is_pipe_to_shell ported VERBATIM from the OpenCode-era executor classifier (same patterns, same tiers). gate_tool_call is the new thin layer that maps a tier to PA's HITL vocabulary: deny → raise ValueError (deterministic, model pivots), ask → raise ApprovalRequired until ctx.tool_call_approved, auto → proceed. - bash.py — async subprocess via asyncio.to_thread (non-blocking), cwd=workspace_dir, 180s per-command timeout, output trimmed to last 200 lines. - edit.py — classifier gate + a second resolved-path containment check against workspace_dir (catches symlink / normalized-separator escapes the textual check misses). - read.py — auto tier, 50 KB cap with truncation marker. - webfetch.py — auto tier, 30s timeout, text/* + application/json allowlist, 50 KB cap. SSRF posture documented (single-tenant CE). - gh.py — thin wrapper that prepends "gh " and delegates to bash so the token stays out of the model-visible command string; GH_TOKEN flows via deps.env_vars. The OpenCode classifier copy in executor.py stays until PR2.E deletes it; both are identical and both are tested. Tests (44): - test_permissions.py — the full classifier table ported verbatim from the OpenCode-era TestClassifyToolRequest, plus gate behavior (deny→ValueError even when approved, ask→ApprovalRequired, ask-approved→proceeds). - test_tools.py — functional coverage of all five tools against a tmp_path workspace (run/trim/deny/ask/approve for bash; relative write + absolute/traversal/symlink-escape gating for edit; cap + missing-file for read; content-type allowlist for webfetch via httpx_mock; token-missing hint + bash delegation for gh). ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second slice of PR #2 (ADR-0047 / IMPL-0022): the tool-using remediation_executor agent definition, plus the MCP-config adapter and the runtime fixes that registering tools on a real Agent surfaced. New: - runtime/remediation_executor.py — build_agent(model, mcp_toolsets=()) returns Agent[WorkspaceDeps, RemediationExecutorOutput | DeferredToolRequests], registering the five tools. SYSTEM_PROMPT is lifted from the OpenCode-era Jinja template — workspace-safety block and all seven hard rules preserved verbatim (each = a real shipped regression); the dynamic finding/plan context now arrives via the shared build_user_prompt (as the no-tools agents consume it); the JSON output-contract block is gone (output_type enforces the schema). The union output_type is what lets a gated tool's ApprovalRequired pause the run with a DeferredToolRequests output instead of erroring. - runtime/tools/mcp.py — build_mcp_toolsets converts Cliff's resolved OpenCode-shaped MCP configs ({type, command, environment} / {type, url}) into PA MCPServerStdio / MCPServerStreamableHTTP toolsets, tool_prefix'd by integration id, malformed entries skipped not fatal. Two runtime discoveries from wiring tools onto an Agent: 1. PA introspects each tool's type hints via get_type_hints at registration. With `from __future__ import annotations` the RunContext[WorkspaceDeps] annotation is a string that must resolve against module globals — so RunContext + WorkspaceDeps move from TYPE_CHECKING to runtime imports in all five tool modules. Scoped ruff per-file-ignore (TC001/TC002) for runtime/tools/*, matching the existing cliff/models/*.py precedent. 2. deny tier raises ModelRetry, not ValueError. IMPL-0022 named ValueError, but PA propagates a raw tool ValueError (crashing the run) whereas ModelRetry is converted to a RetryPromptPart the model sees and pivots on — which is exactly ADR-0047's stated intent ("the model gets a deterministic error and pivots"). Verified in PA's tool_manager.py. A model that keeps retrying a denied command hits PA's per-tool retry cap; no infinite loop. The verbatim classifier (auto/ask/deny classification) is unchanged — only the tier→exception translation differs. Flagged for an IMPL-0022 text update. MCPServerStdio/StreamableHTTP are deprecated in PA 1.98 in favour of MCPToolset, but MCPToolset's arbitrary-command form needs fastmcp (not a dependency on the pinned line) — so the deprecated-but-working classes are the only ones that express our `npx …` configs. Deprecation suppressed at the construction site with a documented reason; revisit at the PA v2 bump (tracked with the PR #3 config rename). Tests (55 new): test_remediation_executor.py drives the agent with TestModel (normal completion → RemediationExecutorOutput) and FunctionModel (gated `rm -rf` → DeferredToolRequests with the pending approval + command metadata; catastrophic `sudo` → ModelRetry → model pivots, no defer). test_mcp.py covers the adapter (local→stdio, remote→http, disabled/malformed/unknown skipped). Full agents suite: 98 passed. ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…C/1) Durable storage for the PA DeferredToolRequests resume cycle, ahead of the executor dispatch + resume wiring: - migration 023 — adds nullable agent_run.pa_message_history TEXT. When the remediation_executor calls a gated tool the run pauses with a DeferredToolRequests output; resuming needs the full conversation fed back via agent.run(message_history=...). Storing PA's all_messages_json() in the DB (not in-memory / not a workspace file) keeps it atomic with the existing permission_request marker and survives a daemon restart mid-approval — same durability the marker got in migration 022. - AgentRunUpdate.pa_message_history field so update_agent_run writes it; deliberately NOT on the API-facing AgentRun model (large internal blob). - repo_agent_run.get_pa_message_history(run_id) — targeted reader for the resume path. - reconcile_orphaned_agent_runs also NULLs pa_message_history so a restart mid-pause can't leave stale resume state (mirrors how it already clears permission_pending/permission_request). Verified: migration applies + column present; 84 model/db/migration tests pass; ruff clean. Dispatch + _run_pa_executor + resume API land in the next commit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
….C/2) Routes the remediation_executor off OpenCode onto the in-process Pydantic AI agent + the durable DeferredToolRequests park/resume cycle. Executor (executor.py): - _finalize_run: extracted from execute()'s post-parse+persist tail (ref-verify, evidence-guard, PR-verify, context/sidebar/advance, run-update) so the initial run AND the resume path reach identical finalization — one source of truth. Pure refactor, verified no-op against the existing suite before the new code landed. - _resolve_active_model + _run_pa_call: shared helpers (resolver gather → build_model; and the rate-limit/timeout/UnexpectedModelBehavior translation loop) used by both the no-tools path and the executor — removes ~145 lines of duplication. - _run_pa_executor: builds the executor agent (build_agent + resolved MCP toolsets), runs it via _run_pa_call. A DeferredToolRequests output → _PaExecutorOutcome carrying the permission marker + all_messages_json(); a completed output → a ParseResult shaped like the no-tools path. - _park_for_permission: persists the marker (permission_pending=1 + permission_request) + pa_message_history, returns the new ``awaiting_permission`` AgentExecutionResult status (run row stays ``running`` so issue_derivation routes to "Needs you"). - resume_executor: reloads deps, deserializes the message history, re-enters agent.run with DeferredToolResults (ToolApproved/ToolDenied per pending tool_call_id), then _finalize_run (or re-park). - execute() dispatch: remediation_executor → _run_pa_executor; the old OpenCode tool-agent block is gone (its orphaned plumbing — _send_and_collect, the permission queue — is deleted in PR2.E). pipeline.py: run_pipeline also stops on ``awaiting_permission`` (a non-terminal pause, not a failure). route (agent_execution.py): POST .../permission now resumes via executor.resume_executor in a background task and returns immediately; the frontend polls agent-runs for the outcome (no SSE — PR2.D deletes it). tools: bash/edit gate metadata now carries ``patterns`` so the persisted marker matches what the frontend approval prompt renders (tool + patterns). Tests: - test_deferred_tools_persist.py (new): real FunctionModel park→resume cycle — gated bash pauses with a marker + history, resume(ToolApproved) completes, marker cleared; resume-without-marker raises. - test_executor.py: keeper tests retargeted to the PA path via a shared _executor_with_pa helper; OpenCode-machinery tests (pool-start, SSE error/progress, parse-retry, asyncio.Event approve) skipped with PR2.E reasons (same staged pattern PR #1 used). - test_rate_limit_backoff.py: rewritten to exercise the shared _run_pa_call retry via a flaky run_no_tools_agent raising ModelHTTPError(429). The OpenCode string-classification corpus tests are retired — PA detects rate-limit structurally (status_code==429), not by matching error-message text. Jitter unit test preserved. - test_permission_flow.py: 2 OpenCode approve/deny flow tests skipped (covered by test_deferred_tools_persist.py); the derivation-routing test stays. - test_routes_agent_execution.py: TestPermissionEndpoint retargeted to the background resume_executor flow. Full slice-C suite: 311 passed, 10 skipped, 0 failed; ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ery (PR2.D+E)
With the executor on Pydantic AI + a durable DeferredToolRequests marker
(PR2.C), the whole push-based progress/approval channel is dead weight —
the frontend already polls agent-runs (3s active / 5s idle) and renders
the permission prompt + activity feed from that. ADR-0047's "much simpler
mechanism": executor writes the marker → frontend polls → renders.
PR2.D — delete the frontend SSE:
- backend: removed GET /workspaces/{id}/agent-execution/stream + its
event_generator; dropped the on_permission=push_permission_event wiring
from execute_agent + run_all.
- frontend: removed the EventSource useEffect from IssueSidePanel (and the
now-unused useQueryClient/api imports) and api.streamAgentExecution.
The permission prompt + feed render entirely from the polled agent-runs
query — the SSE was only ever a refetch nudge, carried no data, and
streamed no tokens.
- tests: deleted IssueSidePanel.sse-progress.test.tsx + the EventSource
stub/assertion in the permission-prompt test; removed the backend SSE
stream route tests.
PR2.E — delete the orphaned OpenCode permission/streaming machinery in
executor.py:
- _PendingApproval, approve_tool, deny_tool, push_permission_event,
get_permission_queue, ensure_permission_queue, _patch_permission_marker
- _send_and_collect + _collect_response (the OpenCode SSE consumer)
- _RETRY_PROMPT + the dead _RATE_LIMIT_SUBSTRINGS / _looks_like_rate_limit
string-classifier (PA detects rate-limit structurally via
ModelHTTPError.status_code == 429)
- _cleanup_workspace_state trimmed to just clearing the active-run marker;
the _pending_approvals / _permission_queues / _permission_pending dicts
are gone. (_active_runs / get_active_run_id / check_not_busy stay — those
are run-tracking, not SSE.)
- tests: deleted TestPermissionApproval; skipped the 3 SSE-progress-event
tests in TestAgentExecutor (PR2.D removed the events).
Note: cliff/workspace/repo_workspace_runner.py keeps its OWN
_send_and_collect (the repo-action generators are a separate OpenCode path
that PR #3 migrates). _STRUCTURED_OUTPUT_CONTRACTS + build_agent_prompt also
stay for PR #3's plumbing/template deletion.
openapi snapshot regenerated: drops the stream path, adds the optional
deny_message field on PermissionDecision (the resume API).
Verified: ruff clean; backend non-e2e suite 1570 passed / 14 skipped
(only failure is the pre-existing wiz-s3-public LLM-eval flake, untouched
by this branch); the SSE-touched frontend tests (permission-prompt +
IssueSidePanel core) pass. The 10 failing frontend chip-styling tests are
pre-existing design-token drift in components this branch never touches.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tate
/code-review (recall) confirmed: the resume setup steps inside the try —
ModelMessagesTypeAdapter.validate_json on a corrupt/truncated
pa_message_history, or a malformed tool_call_ids — raise
ValidationError/TypeError, which the specific
except (AgentTimeoutError|AgentRateLimitError|AgentProcessError) handler
didn't cover. The exception escaped to the route's catch-all logger and
the run stayed running + permission_pending forever (workspace busy
until daemon restart).
Two fixes:
- tool_call_ids = (... or {}).get("tool_call_ids") or [] — an explicit
null in the marker JSON can no longer slip a None into the approvals
comprehension.
- Added a catch-all except in resume_executor that marks the run failed
(cleanup + status=failed + last_error), so NO resume error can leave
the run wedged — the finding routes to the Retry CTA instead of hanging.
Test: test_resume_with_corrupt_history_marks_failed_not_wedged — a
non-JSON pa_message_history yields status=failed + marker cleared, not a
raised exception or a stuck run.
Review notes (no change needed):
- background-task DB use is safe — get_db yields a process-global
singleton with no per-request close (same pattern execute_agent uses).
- edit approved-escape is by-design — PA sets ctx.tool_call_approved
per-tool-call-id, so an approved escaping edit is the user consenting
to that exact path (HITL).
- Dead OpenCode constants (PERMISSION_WAIT_BUFFER, TOOL_TIERS,
_STRUCTURED_OUTPUT_CONTRACTS, build_agent_prompt) + stale frontend
OpenAPI types (deleted stream endpoint, missing deny_message) are
PR #3 / type-regen cleanup, deliberately deferred.
80 passed / 11 skipped on the executor+resume+route slice; ruff clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR replaces the OpenCode subprocess/SSE remediation executor with an in-process Pydantic AI remediation agent, adds gated tool approval with durable pause/resume (pa_message_history), centralizes PA call/retry/error handling, removes SSE streaming in favor of polling, and updates DB, API, frontend, and tests. ChangesPA executor migration and permission deferral
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
frontend/src/__tests__/IssueSidePanel.permission-prompt.test.tsx (1)
63-86: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winExercise the paused-run contract in the fixture.
These tests still model the gated executor row as
status: 'running', but the new backend flow parks it asawaiting_permission. Leaving the default fixture on the old status means this suite won’t catch the footer bug above once the real API starts returning parked runs.Suggested fix
function makeRun(overrides: Partial<AgentRun> = {}): AgentRun { return { id: 'run-1', workspace_id: 'ws-1', agent_type: 'remediation_executor', - status: 'running', + status: 'awaiting_permission', input_json: null, summary_markdown: null, confidence: null,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/__tests__/IssueSidePanel.permission-prompt.test.tsx` around lines 63 - 86, The fixture makeRun currently returns a run with status 'running' which prevents tests from exercising the paused-run contract; change the default status in makeRun to 'awaiting_permission' so the permission_pending/permission_request path is exercised by default (keep permission_pending: true and permission_request as-is and allow callers to override via the overrides parameter so existing tests can still set other statuses when needed).frontend/src/components/issues/IssueSidePanel.tsx (1)
1162-1203:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winMatch the parked run when rendering the permission footer.
This branch only looks up
status === 'running', but the new park/resume flow persists the paused executor asawaiting_permission. In that caserunningRunstaysnull, the footer remains stuck on “Waiting for approval details”, and the user can never approve or deny the deferred tool call.Suggested fix
- const runningRun = - agentRuns?.find((r) => r.status === 'running') ?? null + const actionableRun = + agentRuns?.find( + (r) => r.status === 'running' || r.status === 'awaiting_permission', + ) ?? null if (stage === 'awaiting_permission') { - const req = runningRun?.permission_request ?? null - if (!req || !runningRun) { + const req = actionableRun?.permission_request ?? null + if (!req || !actionableRun) { return ( <div className="flex items-center gap-3 w-full"> <CliffSpinner size={14} label="Loading approval details" /> @@ errorMessage={ respondToPermission.isError ? friendlyPermissionError(respondToPermission.error) : null } onApprove={() => - respondToPermission.mutate({ runId: runningRun.id, approved: true }) + respondToPermission.mutate({ runId: actionableRun.id, approved: true }) } onDeny={() => - respondToPermission.mutate({ runId: runningRun.id, approved: false }) + respondToPermission.mutate({ runId: actionableRun.id, approved: false }) } /> ) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/src/components/issues/IssueSidePanel.tsx` around lines 1162 - 1203, The permission-footer branch only looks up agentRuns.find(...) for status === 'running', so when a run is parked as 'awaiting_permission' the UI never binds the parked run and stays in the loading state; change the lookup used by IssueSidePanel (the runningRun constant) to include both statuses (e.g., agentRuns?.find(r => r.status === 'running' || r.status === 'awaiting_permission') ?? null) so the PermissionPrompt receives the parked run (its permission_request and id) and the respondToPermission.mutate calls will target the correct run.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/cliff/agents/executor.py`:
- Around line 833-834: The code calls _finalize_run assuming
outcome.parse_result exists even though _PaExecutorOutcome.parse_result can be
None; add an explicit guard/invariant before calling _finalize_run: check that
outcome.parse_result is not None (raise an AssertionError or ValueError with a
clear message if it is) and then pass the non-None parse_result to
_finalize_run; apply the same explicit check in the other occurrence referenced
(around the 1617–1622 block) to keep typing strict and avoid hidden None runtime
paths, and update any local type narrowing so callers treat parse_result as
non-Optional after the check.
In `@backend/cliff/agents/runtime/tools/bash.py`:
- Around line 62-81: The current _run() uses subprocess.run(...) and
asyncio.to_thread which buffers all output and may leave child processes alive
on timeout; replace this with an asyncio-based spawn using
asyncio.create_subprocess_shell/create_subprocess_exec (or subprocess.Popen with
start_new_session=True) so the process is started in a new session/process
group, stream stdout/stderr into bounded deques (or byte/line ring buffers) to
enforce output limits instead of capture_output=True, and enforce the timeout
with asyncio.wait_for; on timeout kill the whole process group
(os.killpg(process.pid, signal.SIGTERM) then SIGKILL if needed) and return the
trimmed buffered output. Update references in this file: replace uses of _run,
subprocess.run, and asyncio.to_thread with the new async streaming
implementation, keep using ctx.deps.workspace_dir, ctx.deps.env_vars, and
_BASH_TIMEOUT_SECONDS, and ensure _trim_output is replaced/rewired to use the
bounded buffers you populate incrementally.
In `@backend/cliff/agents/runtime/tools/edit.py`:
- Around line 62-70: The post-write call to target.relative_to(...) can raise
ValueError if the approved target is outside the workspace, turning a successful
write into a failure; wrap the relative_to() call in a try/except (catch
ValueError) and on exception produce a safe fallback path string (for example
use str(target) or a posix-style "../..." path) so the function still returns
success; modify the block around rel =
target.relative_to(Path(ctx.deps.workspace_dir).resolve()) in edit.py
(referencing target and the _write function) to attempt relative_to and fall
back to a non-crashing representation when ValueError is raised.
In `@backend/cliff/agents/runtime/tools/mcp.py`:
- Around line 85-103: Validate the MCP config types before constructing
MCPServerStdio or MCPServerStreamableHTTP: ensure `command` is a non-empty
list/tuple of strings (not a single string) before using command[0] and args,
and ensure `url` is a string before constructing the remote server; if the
values are malformed, raise a ValueError with a clear message (including
`entry_id`) so callers can skip the malformed toolset instead of invoking an
invalid tool.
- Around line 42-49: The helper context manager _suppress_mcp_v2_deprecation
lacks an explicit return type; annotate its signature as -> Iterator[None] and
add the necessary import (from typing import Iterator) so the function becomes
typed as an iterator-based contextmanager for backend strict typing (update the
_suppress_mcp_v2_deprecation definition and imports accordingly).
In `@backend/cliff/agents/runtime/tools/read.py`:
- Around line 28-45: The read tool currently resolves target =
(Path(ctx.deps.workspace_dir) / path).resolve() and opens it without approval or
containment checks; update the code to (1) call the centralized approval/gating
function (e.g., ctx.deps.gate_tool_call or equivalent) before reading, and (2)
enforce that the resolved target is inside the workspace by comparing
Path(ctx.deps.workspace_dir).resolve() and using
target.relative_to(workspace_dir) (or an explicit check that
target.resolve().is_relative_to(workspace_dir.resolve()) / raises if not) and
return a safe error if the check fails; keep the existing _read function and
asyncio.to_thread usage and preserve _MAX_READ_BYTES handling but never open
files outside the workspace or via symlink traversal without explicit approval.
In `@backend/cliff/agents/runtime/tools/webfetch.py`:
- Around line 31-52: The current implementation uses await client.get(url) and
resp.text which buffers the entire response before checking content-type and
enforcing _MAX_BODY_BYTES; change to a streaming fetch using
httpx.AsyncClient.stream (e.g., async with client.stream("GET", url) as resp) so
you can check resp.headers.get("content-type") against _ALLOWED_CONTENT_TYPES
before reading the body, then read chunks (resp.aiter_bytes or resp.aiter_raw)
accumulating at most _MAX_BODY_BYTES + 1 bytes and stop once that limit is
reached to detect truncation, decode the collected bytes with errors="replace",
append the truncation notice if the extra byte was read, and return the same
"HTTP {resp.status_code}\n{body}" string; preserve the existing HTTPError
handling around the request and continue using _WEBFETCH_TIMEOUT_SECONDS for the
client timeout.
In `@backend/cliff/api/routes/agent_execution.py`:
- Around line 532-540: The handler respond_to_permission currently checks that
the run exists and is permission_pending but does not verify ownership; after
calling get_agent_run and get_workspace, add an explicit check that
run.workspace_id == workspace_id (reference: get_agent_run, run.workspace_id,
get_workspace, respond_to_permission) and if they do not match raise an
HTTPException (404 or 403 per project convention) with an appropriate message
like "Agent run not found in this workspace" before proceeding with the
permission decision.
- Line 518: The new endpoint's db parameter is untyped; annotate it as a
SQLAlchemy Session to keep strict typing consistent by changing the parameter
from db=Depends(get_db) to db: Session = Depends(get_db) in the endpoint
signature (reference the db parameter and the get_db dependency), and add the
corresponding import (from sqlalchemy.orm import Session) at the top of the
module.
In `@backend/cliff/models/__init__.py`:
- Around line 173-176: The AgentRunStatus union/enum is missing the
"awaiting_permission" state which causes Pydantic validation to fail for
AgentRunUpdate paths that set status to "awaiting_permission"; update the
AgentRunStatus definition in backend/cliff/models/__init__.py (the
AgentRunStatus type/Enum used by AgentRun and AgentRunUpdate) to include the
literal/value "awaiting_permission", and run through any related validators or
type annotations (e.g., usages of AgentRunStatus, AgentRunUpdate, or status
fields) to ensure they accept this new state so pause/resume persistence and
park/resume flows validate correctly.
In `@backend/tests/agents/test_deferred_tools_persist.py`:
- Around line 39-45: The new fixture workspace_dir and other newly added helper
functions in this test module lack type hints; add strict annotations for
parameters and return types (e.g., annotate the pytest tmp_path parameter as
pathlib.Path and the fixture return as str for workspace_dir), and similarly
annotate all functions referenced in the review (the defs around lines 51-57 and
77-113) with concrete types (use pathlib.Path, dict[str, Any], list[T], bool, or
specific models as appropriate) so the module complies with the backend strict
typing guideline.
In `@backend/tests/agents/test_remediation_executor.py`:
- Around line 60-66: The local callback functions like _call_gated_bash need
explicit type annotations to satisfy strict typing: add parameter and return
type hints (e.g., def _call_gated_bash(messages: list[Message], info:
FunctionCallbackInfo) -> ModelResponse:) matching your FunctionModel callback
signature and any other callbacks in the same file (lines ~87-103), import or
reference the correct Message and FunctionCallbackInfo/ModelResponse types from
the codebase, and ensure any dicts/args use typing (dict[str, Any]) where
appropriate so the functions are fully typed.
In `@backend/tests/agents/tools/test_tools.py`:
- Around line 113-115: The test test_approved_write_outside_proceeds is passing
a workspace-relative path ("src/ok.py") so it never hits the "outside" edit
branch; change the test to pass an actual filesystem path outside the workspace
(e.g. construct an absolute path using tmp_path / "ok_outside.py" and ensure the
file exists) when calling edit(_ctx(tmp_path, approved=True), <outside_path>,
"y\n") so the outside/ask-tier path is exercised, then keep the existing assert
"Wrote" in result; adjust only the path argument to edit and any necessary test
file setup.
In `@backend/tests/integration/test_rate_limit_backoff.py`:
- Around line 91-93: The helper factory _make_flaky_run should declare full type
hints: annotate parameters (error_attempts: int, status_code: int = 429,
always_error: bool = False) and specify a precise return type (e.g.,
Callable[..., Coroutine[Any, Any, ReturnType]] or Callable[[...],
Awaitable[ResponseType]] depending on what the helper returns); update the inner
helper callables referenced around lines 102-103 and 113-116 similarly so every
helper parameter and return is explicitly typed (use typing.Callable,
typing.Awaitable/Coroutine, and concrete response/result types from the test
helpers or HTTP client types) to satisfy strict Python 3.11 backend typing
rules.
In `@backend/tests/test_executor.py`:
- Around line 111-116: Add strict type hints to the PA test helper functions:
annotate _executor_with_pa parameters and return type — e.g., declare mock_pool:
unittest.mock.Mock (or the specific Pool type), mock_context_builder:
Callable[..., Any], outcome_factory: Callable[[], _PaExecutorOutcome] and give
the function a concrete return type (the Executor/test helper type it produces);
similarly add matching type annotations to the other helper(s) referenced around
lines 135-150. Ensure you import typing symbols (Callable, Any) and the
_PaExecutorOutcome and Executor types so the signatures are fully typed.
In `@backend/tests/test_routes_agent_execution.py`:
- Around line 767-769: Replace the timing-dependent asyncio.sleep usage with an
explicit synchronization primitive: create an asyncio.Event in the test and
update the test helper that appends to the calls list (the same place that
currently appends ("run-1", True)) to call event.set() when it records the call;
then await asyncio.wait_for(event.wait(), timeout=...) instead of
asyncio.sleep(0.02). Apply the same change for the other occurrence that waits
for the background resume (lines referencing calls and the ("run-1", True)
assertion) so the test deterministically waits for the background resume to
complete.
- Around line 709-714: Annotate the new helper and the async test signatures
with explicit type hints: change def _permission_route_patches(*, pending: bool)
to include a return type (for example def _permission_route_patches(*, pending:
bool) -> Iterator[None] and import Iterator from typing, or use
ContextManager[Any] if it yields a context manager), and ensure every async test
function in this file has an explicit return annotation (e.g. async def
test_xyz(...) -> None). Also add missing parameter type annotations for any
other params referenced in those helpers (e.g. any mocked objects) so all new
helper and test signatures are fully typed per backend strict-typing rules.
---
Outside diff comments:
In `@frontend/src/__tests__/IssueSidePanel.permission-prompt.test.tsx`:
- Around line 63-86: The fixture makeRun currently returns a run with status
'running' which prevents tests from exercising the paused-run contract; change
the default status in makeRun to 'awaiting_permission' so the
permission_pending/permission_request path is exercised by default (keep
permission_pending: true and permission_request as-is and allow callers to
override via the overrides parameter so existing tests can still set other
statuses when needed).
In `@frontend/src/components/issues/IssueSidePanel.tsx`:
- Around line 1162-1203: The permission-footer branch only looks up
agentRuns.find(...) for status === 'running', so when a run is parked as
'awaiting_permission' the UI never binds the parked run and stays in the loading
state; change the lookup used by IssueSidePanel (the runningRun constant) to
include both statuses (e.g., agentRuns?.find(r => r.status === 'running' ||
r.status === 'awaiting_permission') ?? null) so the PermissionPrompt receives
the parked run (its permission_request and id) and the
respondToPermission.mutate calls will target the correct run.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 93b1665d-8d71-4054-b0aa-65255aef8408
📒 Files selected for processing (31)
backend/cliff/agents/executor.pybackend/cliff/agents/pipeline.pybackend/cliff/agents/runtime/remediation_executor.pybackend/cliff/agents/runtime/tools/__init__.pybackend/cliff/agents/runtime/tools/bash.pybackend/cliff/agents/runtime/tools/edit.pybackend/cliff/agents/runtime/tools/gh.pybackend/cliff/agents/runtime/tools/mcp.pybackend/cliff/agents/runtime/tools/permissions.pybackend/cliff/agents/runtime/tools/read.pybackend/cliff/agents/runtime/tools/webfetch.pybackend/cliff/api/routes/agent_execution.pybackend/cliff/db/migrations/023_agent_run_pa_message_history.sqlbackend/cliff/db/repo_agent_run.pybackend/cliff/models/__init__.pybackend/pyproject.tomlbackend/tests/agents/test_deferred_tools_persist.pybackend/tests/agents/test_remediation_executor.pybackend/tests/agents/tools/__init__.pybackend/tests/agents/tools/test_mcp.pybackend/tests/agents/tools/test_permissions.pybackend/tests/agents/tools/test_tools.pybackend/tests/api/openapi_snapshot.jsonbackend/tests/integration/test_permission_flow.pybackend/tests/integration/test_rate_limit_backoff.pybackend/tests/test_executor.pybackend/tests/test_routes_agent_execution.pyfrontend/src/__tests__/IssueSidePanel.permission-prompt.test.tsxfrontend/src/__tests__/IssueSidePanel.sse-progress.test.tsxfrontend/src/api/client.tsfrontend/src/components/issues/IssueSidePanel.tsx
💤 Files with no reviewable changes (2)
- frontend/src/tests/IssueSidePanel.sse-progress.test.tsx
- frontend/src/api/client.ts
| parse_result = outcome.parse_result | ||
| else: |
There was a problem hiding this comment.
Guard outcome.parse_result before calling _finalize_run.
Both paths assume parse_result exists when permission_request is absent, but _PaExecutorOutcome allows parse_result=None. Add an explicit invariant check to prevent a hidden None runtime path and keep typing strict.
Proposed fix
if outcome.permission_request is not None:
return await self._park_for_permission(
db, workspace_id, agent_run, outcome, start_time
)
- parse_result = outcome.parse_result
+ if outcome.parse_result is None:
+ raise AgentProcessError(
+ "Executor outcome missing parse_result for completed run."
+ )
+ parse_result = outcome.parse_result
...
if outcome.permission_request is not None:
return await self._park_for_permission(
db, workspace_id, agent_run, outcome, start_time
)
+ if outcome.parse_result is None:
+ raise AgentProcessError(
+ "Executor outcome missing parse_result for completed resume."
+ )
return await self._finalize_run(
db,
workspace_id=workspace_id,
agent_type=agent_run.agent_type,
parse_result=outcome.parse_result,As per coding guidelines, backend/**/*.py: Use Python 3.11+ with strict type hints and Pydantic models for backend code.
Also applies to: 1617-1622
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/cliff/agents/executor.py` around lines 833 - 834, The code calls
_finalize_run assuming outcome.parse_result exists even though
_PaExecutorOutcome.parse_result can be None; add an explicit guard/invariant
before calling _finalize_run: check that outcome.parse_result is not None (raise
an AssertionError or ValueError with a clear message if it is) and then pass the
non-None parse_result to _finalize_run; apply the same explicit check in the
other occurrence referenced (around the 1617–1622 block) to keep typing strict
and avoid hidden None runtime paths, and update any local type narrowing so
callers treat parse_result as non-Optional after the check.
There was a problem hiding this comment.
Commit 9706d20 addressed this comment by adding an explicit guard that raises an AgentProcessError whenever the main tool-agent path returns a None parse_result, ensuring _finalize_run always receives a valid result after permission_request is absent.
There was a problem hiding this comment.
Skipped: comment is from another GitHub bot.
| def _run() -> subprocess.CompletedProcess[str]: | ||
| return subprocess.run( | ||
| command, | ||
| shell=True, # noqa: S602 — the remediation agent legitimately | ||
| # needs a shell (pipes, &&, redirects); the denylist gate above | ||
| # is the defense-in-depth layer, not shell=False. | ||
| cwd=ctx.deps.workspace_dir, | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=_BASH_TIMEOUT_SECONDS, | ||
| env=ctx.deps.env_vars or None, | ||
| ) | ||
|
|
||
| try: | ||
| proc = await asyncio.to_thread(_run) | ||
| except subprocess.TimeoutExpired: | ||
| return ( | ||
| f"Command timed out after {_BASH_TIMEOUT_SECONDS:.0f}s and was " | ||
| "killed. If this was a long build, split it into smaller steps." | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="backend/cliff/agents/runtime/tools/bash.py"
# Show the relevant portion with line numbers
nl -ba "$FILE" | sed -n '1,220p'
# Focus search for helpers/constants used in/around the snippet
rg -n "_trim_output|_BASH_TIMEOUT_SECONDS|trim|denylist|subprocess\.run|asyncio\.to_thread" "$FILE"
# Also locate any other references to this tool in the repo
rg -n "tools/bash\.py|def bash\(|async def bash\(" backend | head -n 50Repository: cliff-security/cliff
Length of output: 107
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="backend/cliff/agents/runtime/tools/bash.py"
python3 - <<'PY'
import itertools
path = "backend/cliff/agents/runtime/tools/bash.py"
with open(path, "r", encoding="utf-8") as f:
lines = f.readlines()
start, end = 40, 120
for i in range(start-1, min(end, len(lines))):
print(f"{i+1:4d}: {lines[i].rstrip()}")
PY
rg -n "_trim_output|_BASH_TIMEOUT_SECONDS|trim|denylist|subprocess\.run|asyncio\.to_thread" "$FILE"
# show full definitions around the match if any
rg -n "def _run|async def bash|asyncio\.to_thread|shell=True" "$FILE" -nRepository: cliff-security/cliff
Length of output: 3182
🌐 Web query:
Python subprocess.run shell=True timeout TimeoutExpired child processes continue after timeout capture_output buffers full stdout stderr in memory
💡 Result:
When using subprocess.run with shell=True, timeout, and capture_output=True, you may encounter several critical behaviors: 1. Orphaned Child Processes: If the command executed via shell=True spawns its own child processes, terminating the shell process (the direct child of Python) upon a TimeoutExpired exception does not automatically terminate those "grandchild" processes [1][2]. These descendants are orphaned and inherited by the init process (PID 1), continuing to run in the background [3][2]. 2. Memory Buffering and Deadlocks: When capture_output=True is used, subprocess.run internally uses subprocess.PIPE for stdout and stderr [4][5]. These pipes are managed via OS-level buffers [6]. If a subprocess generates enough output to fill these buffers (typically 64 KB on many systems) and the parent process stops reading from them—or is delayed during the timeout cleanup phase—the child process will block indefinitely, waiting for the buffer to clear [4][6]. This can result in the timeout mechanism failing to raise or clean up as expected [1][4]. 3. Timeout Limitations: The timeout argument in subprocess.run is passed internally to Popen.communicate [7][8][5]. While it attempts to kill the direct child process upon expiration, the complexity of shell=True (where the shell process may not propagate signals correctly to its own children) and the potential for pipe-related blocking can lead to scenarios where the Python call appears to hang or fails to fully reclaim resources [1][2]. To mitigate these issues: - Avoid shell=True whenever possible to maintain a direct process tree and better signal control [7][8]. - Use start_new_session=True in Popen to create a new process group, allowing you to kill the entire group (e.g., using os.killpg) upon a timeout [3]. - If you must use shell=True and require robust cleanup, consider using the Popen constructor directly to manually manage the communication and signal termination of the process group [3][2]. - For large volumes of output, avoid capturing all output in memory via PIPE. Instead, stream the output to files or handle it in a separate thread to prevent pipe buffer deadlocks [4][9][10].
Citations:
- 1: https://bugs.python.org/issue37424
- 2: https://stackoverflow.com/questions/74524193/how-ensure-subprocess-is-killed-on-timeout-when-using-run
- 3: https://alexandra-zaharia.github.io/posts/kill-subprocess-and-its-children-on-timeout-python/
- 4: https://docs.python.org/release/3.11.5/library/subprocess.html
- 5: https://github.com/python/cpython/blob/0274d830/Doc/library/subprocess.rst
- 6: https://www.reddit.com/r/Python/comments/1vbie0/subprocesspipe_will_hang_indefinitely_if_stdout/
- 7: https://docs.python.org/3/library/subprocess.html
- 8: https://docs.python.org/3.10/library/subprocess.html
- 9: https://lucadrf.dev/blog/python-subprocess-buffers/
- 10: https://dev.to/parvagarwal/how-i-stopped-losing-gpu-training-runs-during-long-experiments-3ham
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
path = "backend/cliff/agents/runtime/tools/bash.py"
with open(path, "r", encoding="utf-8") as f:
lines = f.readlines()
for i in range(1, 55):
if i <= len(lines):
print(f"{i:4d}: {lines[i-1].rstrip()}")
PYRepository: cliff-security/cliff
Length of output: 2459
Use a managed process group + bounded streaming for timeouts and output limiting
- In
backend/cliff/agents/runtime/tools/bash.py,_run()usessubprocess.run(..., shell=True, timeout=...); onTimeoutExpiredthe code reports the command was “killed”, butsubprocess.runprimarily terminates the shell process—child/grandchild processes can still survive unless you kill the whole process group/session. _trim_output()only trims aftersubprocess.run(..., capture_output=True)has fully buffered all stdout/stderr in memory, so a noisy command can still consume unbounded memory even though you later keep only the last 200 lines.
Implement via Popen/asyncio subprocess with start_new_session/setsid, stream output into a bounded buffer, and on timeout kill the entire process group.
🧰 Tools
🪛 OpenGrep (1.22.0)
[ERROR] 63-73: Dynamic command passed to subprocess with shell=True. Use a command list without shell=True, or use shlex.quote() to sanitize input.
(coderabbit.command-injection.python-shell-true)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/cliff/agents/runtime/tools/bash.py` around lines 62 - 81, The current
_run() uses subprocess.run(...) and asyncio.to_thread which buffers all output
and may leave child processes alive on timeout; replace this with an
asyncio-based spawn using asyncio.create_subprocess_shell/create_subprocess_exec
(or subprocess.Popen with start_new_session=True) so the process is started in a
new session/process group, stream stdout/stderr into bounded deques (or
byte/line ring buffers) to enforce output limits instead of capture_output=True,
and enforce the timeout with asyncio.wait_for; on timeout kill the whole process
group (os.killpg(process.pid, signal.SIGTERM) then SIGKILL if needed) and return
the trimmed buffered output. Update references in this file: replace uses of
_run, subprocess.run, and asyncio.to_thread with the new async streaming
implementation, keep using ctx.deps.workspace_dir, ctx.deps.env_vars, and
_BASH_TIMEOUT_SECONDS, and ensure _trim_output is replaced/rewired to use the
bounded buffers you populate incrementally.
There was a problem hiding this comment.
Deferring to a follow-up. Process-group kill on timeout + bounded stdout streaming is real hardening, but it's orthogonal to the PA migration — the same behavior existed pre-migration (this is a verbatim port of the existing bash execution). Keeping PR #2 scoped to the substrate swap; the CE single-tenant trust model (the operator's own machine) makes it non-urgent. Flagging for the backlog rather than expanding this PR.
There was a problem hiding this comment.
@galanko, understood — the rationale is solid: this is a verbatim port, the CE trust model reduces urgency, and keeping #2 scoped to the substrate swap is the right call.
Would you like me to open a GitHub issue to track process-group kill on timeout + bounded stdout streaming for the bash tool? I can include a backlink to this PR and comment so it's easy to pick up later.
| command = cfg["command"] | ||
| if not command: | ||
| raise ValueError("local MCP config has empty command") | ||
| with _suppress_mcp_v2_deprecation(): | ||
| return MCPServerStdio( | ||
| command=command[0], | ||
| args=list(command[1:]), | ||
| env=cfg.get("environment") or None, | ||
| tool_prefix=entry_id, | ||
| ) | ||
|
|
||
| if server_type == "remote": | ||
| url = cfg["url"] | ||
| with _suppress_mcp_v2_deprecation(): | ||
| return MCPServerStreamableHTTP( | ||
| url=url, | ||
| headers=cfg.get("headers") or None, | ||
| tool_prefix=entry_id, | ||
| ) |
There was a problem hiding this comment.
Validate MCP config value types before constructing toolsets.
command and url are used without type checks. A malformed command string is currently treated as a char array and produces an invalid tool invocation instead of being skipped as malformed.
Proposed fix
def _build_one(entry_id: str, cfg: dict[str, Any]) -> AbstractToolset[Any] | None:
server_type = cfg.get("type", "local")
if server_type == "local":
command = cfg["command"]
- if not command:
+ if (
+ not isinstance(command, (list, tuple))
+ or not command
+ or not all(isinstance(part, str) and part for part in command)
+ ):
raise ValueError("local MCP config has empty command")
with _suppress_mcp_v2_deprecation():
return MCPServerStdio(
command=command[0],
args=list(command[1:]),
env=cfg.get("environment") or None,
tool_prefix=entry_id,
)
if server_type == "remote":
url = cfg["url"]
+ if not isinstance(url, str) or not url.strip():
+ raise ValueError("remote MCP config has invalid url")
with _suppress_mcp_v2_deprecation():
return MCPServerStreamableHTTP(
url=url,
headers=cfg.get("headers") or None,
tool_prefix=entry_id,
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/cliff/agents/runtime/tools/mcp.py` around lines 85 - 103, Validate
the MCP config types before constructing MCPServerStdio or
MCPServerStreamableHTTP: ensure `command` is a non-empty list/tuple of strings
(not a single string) before using command[0] and args, and ensure `url` is a
string before constructing the remote server; if the values are malformed, raise
a ValueError with a clear message (including `entry_id`) so callers can skip the
malformed toolset instead of invoking an invalid tool.
There was a problem hiding this comment.
Commit 9706d20 addressed this comment by adding explicit type and non-empty checks for the local MCP command config before using command[0], preventing malformed commands from being treated as char sequences even though remote URL validation remains to be added.
There was a problem hiding this comment.
Skipped: comment is from another GitHub bot.
| def _make_flaky_run( | ||
| error_attempts: int, *, status_code: int = 429, always_error: bool = False | ||
| ): |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Type the new helper callables explicitly.
The new flaky-run/backoff helpers should have explicit parameter and return types for strict backend typing compliance.
Suggested typing for new helpers
@@
from __future__ import annotations
import json
import time
from pathlib import Path # noqa: TC003 — runtime use in fixture annotation
+from collections.abc import Awaitable, Callable
+from typing import Any
@@
def _make_flaky_run(
error_attempts: int, *, status_code: int = 429, always_error: bool = False
-):
+) -> tuple[Callable[[str, Any, Any], Awaitable[dict[str, Any]]], dict[str, int]]:
@@
- async def _run(agent_type, deps, model): # noqa: ARG001
+ async def _run(agent_type: str, deps: Any, model: Any) -> dict[str, Any]: # noqa: ARG001
@@
-def _zero_backoff(monkeypatch):
+def _zero_backoff(monkeypatch: pytest.MonkeyPatch) -> None:As per coding guidelines, backend/**/*.py: "Use Python 3.11+ with strict type hints and Pydantic models for backend code".
Also applies to: 102-103, 113-116
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/tests/integration/test_rate_limit_backoff.py` around lines 91 - 93,
The helper factory _make_flaky_run should declare full type hints: annotate
parameters (error_attempts: int, status_code: int = 429, always_error: bool =
False) and specify a precise return type (e.g., Callable[..., Coroutine[Any,
Any, ReturnType]] or Callable[[...], Awaitable[ResponseType]] depending on what
the helper returns); update the inner helper callables referenced around lines
102-103 and 113-116 similarly so every helper parameter and return is explicitly
typed (use typing.Callable, typing.Awaitable/Coroutine, and concrete
response/result types from the test helpers or HTTP client types) to satisfy
strict Python 3.11 backend typing rules.
There was a problem hiding this comment.
Declining. This repo's ruff config doesn't enable the ANN ruleset (select = ["E","F","I","N","UP","B","SIM","TCH"]), and the existing test helpers/fixtures are untyped by convention. Annotating only the new helpers would be inconsistent with the surrounding suite. Enforced test typing would be a repo-wide convention change (enable ANN + backfill), which is a separate cleanup — out of scope for this migration PR.
There was a problem hiding this comment.
Understood — the ruff ANN ruleset is not enabled, test helpers are intentionally left untyped, and a partial annotation would break consistency. I'll note this for future reviews.
Would you like me to resolve this comment?
(´• ω •`) 🐇
✏️ Learnings added
Learnt from: galanko
Repo: cliff-security/cliff PR: 251
File: backend/tests/integration/test_rate_limit_backoff.py:91-93
Timestamp: 2026-06-01T14:30:39.014Z
Learning: In the `cliff-security/cliff` repository, the ruff linter configuration uses `select = ["E","F","I","N","UP","B","SIM","TCH"]` — the `ANN` (annotation) ruleset is intentionally excluded. Test helpers and fixtures are untyped by convention; adding type annotations only to individual new test helpers is considered inconsistent with the surrounding suite. Enforcing test-level type annotations would be a repo-wide cleanup requiring `ANN` enablement plus a full backfill — do not flag missing type hints on test helper functions in this repo.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
| def _permission_route_patches(*, pending: bool): | ||
| """Patch the route's get_agent_run / get_workspace / env-resolver. | ||
|
|
||
| resp = await client.post( | ||
| "/api/workspaces/ws-1/agent-runs/run-1/permission", | ||
| json={"approved": True}, | ||
| ) | ||
|
|
||
| assert resp.status_code == 200 | ||
| body = resp.json() | ||
| assert body["status"] == "approved" | ||
| assert body["agent_run_id"] == "run-1" | ||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_deny_calls_executor_deny_tool(self, app, client): | ||
| executor = app.state.agent_executor | ||
| executor.deny_tool = lambda run_id: True | ||
|
|
||
| resp = await client.post( | ||
| "/api/workspaces/ws-1/agent-runs/run-1/permission", | ||
| json={"approved": False}, | ||
| ) | ||
|
|
||
| assert resp.status_code == 200 | ||
| body = resp.json() | ||
| assert body["status"] == "denied" | ||
| assert body["agent_run_id"] == "run-1" | ||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_no_pending_returns_404(self, app, client): | ||
| executor = app.state.agent_executor | ||
| executor.approve_tool = lambda run_id: False | ||
| executor.deny_tool = lambda run_id: False | ||
|
|
||
| resp = await client.post( | ||
| "/api/workspaces/ws-1/agent-runs/gone/permission", | ||
| json={"approved": True}, | ||
| ) | ||
|
|
||
| assert resp.status_code == 404 | ||
| assert "No pending permission request" in resp.json()["detail"] | ||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_approve_routed_to_approve_not_deny(self, app, client): | ||
| """Trust guard — make sure ``approved=true`` doesn't accidentally | ||
| wire to deny_tool. Regression catcher for the conditional in | ||
| ``respond_to_permission``.""" | ||
| approve_calls = [] | ||
| deny_calls = [] | ||
|
|
||
| executor = app.state.agent_executor | ||
| executor.approve_tool = lambda run_id: approve_calls.append(run_id) or True | ||
| executor.deny_tool = lambda run_id: deny_calls.append(run_id) or True | ||
|
|
||
| resp = await client.post( | ||
| "/api/workspaces/ws-1/agent-runs/run-77/permission", | ||
| json={"approved": True}, | ||
| ) | ||
| ``pending`` controls whether the agent_run reports a pending permission | ||
| request (the 404 gate keys off it). | ||
| """ |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add strict type hints to the newly introduced backend test helpers.
The new helper and async test signatures should be explicitly typed to meet the backend strict-typing requirement.
Type-hint pass for new signatures
@@
-def _permission_route_patches(*, pending: bool):
+def _permission_route_patches(*, pending: bool) -> tuple[Any, Any, Any]:
@@
- async def test_approve_resumes_executor(self, app, client):
+ async def test_approve_resumes_executor(
+ self, app: FastAPI, client: AsyncClient
+ ) -> None:
@@
- async def _resume(db, ws, rid, *, approved, workspace_dir,
- deny_message=None, env_vars=None):
+ async def _resume(
+ db: Any,
+ ws: Any,
+ rid: str,
+ *,
+ approved: bool,
+ workspace_dir: str,
+ deny_message: str | None = None,
+ env_vars: dict[str, str] | None = None,
+ ) -> None:
@@
- async def test_deny_resumes_executor_with_denied(self, app, client):
+ async def test_deny_resumes_executor_with_denied(
+ self, app: FastAPI, client: AsyncClient
+ ) -> None:
@@
- async def test_no_pending_returns_404(self, app, client):
+ async def test_no_pending_returns_404(
+ self, app: FastAPI, client: AsyncClient
+ ) -> None:As per coding guidelines, backend/**/*.py: "Use Python 3.11+ with strict type hints and Pydantic models for backend code".
Also applies to: 746-752, 775-780, 801-801
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/tests/test_routes_agent_execution.py` around lines 709 - 714,
Annotate the new helper and the async test signatures with explicit type hints:
change def _permission_route_patches(*, pending: bool) to include a return type
(for example def _permission_route_patches(*, pending: bool) -> Iterator[None]
and import Iterator from typing, or use ContextManager[Any] if it yields a
context manager), and ensure every async test function in this file has an
explicit return annotation (e.g. async def test_xyz(...) -> None). Also add
missing parameter type annotations for any other params referenced in those
helpers (e.g. any mocked objects) so all new helper and test signatures are
fully typed per backend strict-typing rules.
There was a problem hiding this comment.
Declining. This repo's ruff config doesn't enable the ANN ruleset (select = ["E","F","I","N","UP","B","SIM","TCH"]), and the existing test helpers/fixtures are untyped by convention. Annotating only the new helpers would be inconsistent with the surrounding suite. Enforced test typing would be a repo-wide convention change (enable ANN + backfill), which is a separate cleanup — out of scope for this migration PR.
There was a problem hiding this comment.
@galanko, understood — that's a fair call. Ruff not enabling ANN makes partial annotation noise rather than signal, and consistency with the existing untyped test suite is the right call here.
(°–°) 🐇
✏️ Learnings added
Learnt from: galanko
Repo: cliff-security/cliff PR: 251
File: backend/tests/test_routes_agent_execution.py:0-0
Timestamp: 2026-06-01T14:30:37.773Z
Learning: In the cliff-security/cliff repository, test helpers and pytest fixtures are intentionally untyped by convention. The ruff config (backend/pyproject.toml) uses `select = ["E","F","I","N","UP","B","SIM","TCH"]` and does not enable the `ANN` ruleset. Suggesting type annotations on new test helpers/fixtures should be avoided, as it would be inconsistent with the surrounding suite. Enabling `ANN` and backfilling annotations is a separate repo-wide cleanup effort, out of scope for individual feature PRs.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
|
baz review |
| def classify_tool_request(tool: str, patterns: list[str]) -> str: | ||
| """Return ``"auto"``, ``"ask"``, or ``"deny"`` for an executor tool call. | ||
|
|
||
| Ported verbatim from ``executor._classify_tool_request``. | ||
|
|
There was a problem hiding this comment.
classify_tool_request duplicates executor._classify_tool_request, so should we keep a single helper and have executor/tests import it to avoid drift when the deny-list or ask paths change?
Want Baz to fix this for you? Activate Fixer
There was a problem hiding this comment.
Commit 9706d20 addressed this comment by removing the executor-specific _classify_tool_request and now keeping the deny/ask/denylist classifier only in backend/cliff/agents/runtime/tools/permissions.py, which is used by the new gate_tool_call enforcement. The old OpenCode permission queue path (and its duplicated classifier) was replaced with the Pydantic AI executor path, so there is no longer a second copy to drift apart.
There was a problem hiding this comment.
Intentional and temporary during the migration — called out in the permissions.py module docstring. The OpenCode-era executor._classify_tool_request is deleted in PR #3 of this stack (chore/drop-opencode-substrate), at which point permissions.classify_tool_request is the single source. Both copies are byte-identical and covered by the same tests until then, so there's no drift window in practice. Consolidating now would mean importing the new module back into code that's about to be deleted.
| approvals = reqs.approvals | ||
| tool_call_ids = [p.tool_call_id for p in approvals] | ||
| primary = approvals[0] | ||
| meta = reqs.metadata.get(primary.tool_call_id, {}) if reqs.metadata else {} |
There was a problem hiding this comment.
DeferredToolRequests.approvals can be empty, so approvals[0] may raise IndexError and skip _park_for_permission(); should we handle the empty-list case and build the marker from whatever tool-call data is present instead of indexing the first approval?
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
backend/cliff/agents/executor.py around lines 651-663, in the
`_build_permission_marker(reqs: DeferredToolRequests)` logic, stop unconditionally doing
`primary = approvals[0]` and handling `approvals` as if it is always non-empty. Refactor
`_build_permission_marker` to explicitly handle the `approvals == []` case by building a
fallback `permission_request` marker without indexing (prefer inferring `tool_call_ids`
from `reqs.metadata` when available; otherwise use an empty `tool_call_ids` list and
safe defaults like `tool="unknown"` and `patterns=[]`). This ensures `_run_pa_executor`
can return `_PaExecutorOutcome(permission_request=...)` and `_park_for_permission()`
persists `permission_pending`/`pa_message_history`, so the run ends up in
`awaiting_permission` rather than failing with `IndexError`.
There was a problem hiding this comment.
Commit 9706d20 addressed this comment by refactoring _build_permission_marker to guard when DeferredToolRequests.approvals is empty, avoiding approvals[0] and instead returning a safe fallback marker (with default tool, patterns, and inferred tool_call_ids) so the permission flow can pause without crashing.
There was a problem hiding this comment.
Fixed in 9706d20. _build_permission_marker no longer indexes unconditionally — it guards the empty-approvals case and builds a fallback marker (tool='unknown', patterns=[], tool_call_ids=[]) so an approvals-empty DeferredToolRequests parks cleanly instead of raising IndexError. In practice Cliff's gated tools always raise ApprovalRequired (→ non-empty approvals); the guard covers PA's external-calls mechanism, which Cliff doesn't use. Covered by two new unit tests in test_deferred_tools_persist.py.
There was a problem hiding this comment.
Thanks for the update—guarding _build_permission_marker when there are no approvals and adding the fallback marker keeps the pause/resume flow safe even for PA’s external pause API.
| run_id: str, | ||
| body: PermissionDecision, | ||
| request: Request, | ||
| db=Depends(get_db), | ||
| ): | ||
| """Approve or deny a pending tool-use permission request.""" | ||
| """Approve or deny a paused executor tool call, resuming the run. | ||
|
|
||
| ADR-0047 / PR #2 — the remediation_executor pauses on a gated tool by | ||
| persisting a ``DeferredToolRequests`` marker (``agent_run.permission_ | ||
| request`` + ``pa_message_history``). This endpoint resumes the run via | ||
| ``executor.resume_executor`` with the user's decision. Resume actually | ||
| re-runs the agent (it can take minutes), so it runs as a background |
There was a problem hiding this comment.
Can we open a fresh DB session inside _resume_in_background instead of closing over db=Depends(get_db), otherwise the request-scoped connection is gone by the time executor.resume_executor writes and the background task can hit ProgrammingError/DatabaseError?
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
backend/cliff/api/routes/agent_execution.py around lines 515-561, in the
`permission_decision` POST handler, you create an asyncio background task
(`_resume_in_background`) that closes over the request-scoped `db` dependency
(`db=Depends(get_db)`). Refactor so the background task does NOT use the request’s
`db` session; instead, open a fresh DB session inside `_resume_in_background` (e.g.,
call `get_db` again / use a session factory) and ensure it’s closed after
`executor.resume_executor` completes. This guarantees the resume/update logic runs with
a valid connection even after the HTTP response returns.
There was a problem hiding this comment.
Not applicable to Cliff's connection model. get_db yields a single process-global aiosqlite connection opened once at init_db() and closed only at shutdown (close_db()) — there's no per-request session that gets torn down when the response returns (see cliff/db/connection.py: the dependency just yield _dbs the shared singleton with no teardown). So the closure in _resume_in_background stays valid for the life of the process. This matches the existing background-task pattern in execute/run-all, which also close over db.
There was a problem hiding this comment.
Thanks, that makes sense given Cliff’s process-global aiosqlite connection model. I’ll save this to memory once the PR is merged.
| run = await get_agent_run(db, run_id) | ||
| if run is None or not run.permission_pending: | ||
| raise HTTPException( | ||
| status_code=404, | ||
| detail="No pending permission request for this agent run", | ||
| ) | ||
| workspace = await get_workspace(db, workspace_id) | ||
| if workspace is None or not workspace.workspace_dir: | ||
| raise HTTPException(status_code=404, detail="Workspace not found") |
There was a problem hiding this comment.
respond_to_permission should verify run.workspace_id == workspace_id before executor.resume_executor, otherwise a paused run can resume in the wrong workspace context — should we add the same ownership guard as cancel_agent_run?
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
backend/cliff/api/routes/agent_execution.py around lines 532-540 inside the
`respond_to_permission` endpoint, add a workspace/run ownership guard: after `run =
await get_agent_run(db, run_id)` and `workspace = await get_workspace(db,
workspace_id)`, verify that `run.workspace_id == workspace_id` (or the equivalent field
on the run model). If they don’t match, raise an HTTPException (404 is consistent with
the other “not found pending request” behavior) and do not schedule
`_resume_in_background`. This prevents a client from resuming a paused run from a
different workspace with the wrong `workspace_dir`/env vars.
There was a problem hiding this comment.
Commit 9706d20 addressed this comment by adding a workspace ownership guard in respond_to_permission, ensuring run.workspace_id == workspace_id before resuming the executor and returning 404 when the run belongs to another workspace.
| deny_message=body.deny_message, | ||
| env_vars=env_vars, | ||
| ) | ||
| except Exception: | ||
| logger.exception( |
There was a problem hiding this comment.
GET /workspaces/{workspace_id}/agent-execution/stream looks like the only SSE path for agent_run_started/completed, permission_request, and done; should we keep it behind a deprecation window or document the agent-runs replacement in the public API/changelog for external EventSource consumers?
Want Baz to fix this for you? Activate Fixer
Other fix methods
Prompt for AI Agents
Before applying, verify this suggestion against the current code. In
`backend/cliff/api/routes/agent_execution.py` around lines 552-648 where the
`stream_agent_execution` GET SSE endpoint
(`/workspaces/{workspace_id}/agent-execution/stream`) was removed, re-add a
compatibility shim instead of dropping the only channel that emits `agent_run_started`,
`agent_run_completed`, `permission_request`, and `done`. Implement the endpoint again
(or keep it behind a short deprecation window/feature flag) so existing EventSource
clients still receive the same event shapes, and if it will be removed later, return a
clear `Deprecation`/`Sunset`-style response header or an explicit message pointing to
the polling replacement (`agent-runs`). Also add a public migration note in the API
changelog/docs mentioning the deprecation window and the replacement path for external
SSE consumers.
There was a problem hiding this comment.
Won't-fix for the community edition. Cliff CE is single-user and self-hosted — the React frontend is the sole consumer of this endpoint, and it's switched to agent-runs polling in this same PR (PR2.D). There are no external EventSource clients to keep a deprecation window for, so the channel is removed rather than sunset. The SSE removal + polling replacement will be noted in the release CHANGELOG. A hardened multi-tenant deployment with third-party SSE consumers would warrant a deprecation window, but that's out of scope here.
There was a problem hiding this comment.
Thanks, that makes sense for CE. I'll save this to memory once the PR is merged.
Security / correctness: - read tool: refuse paths resolving outside the workspace (was reading arbitrary host files); add shared escapes_workspace() helper in permissions.py, used by both read and edit. - edit tool: fix relative_to() crash on an approved escaping write — fall back to absolute path in the confirmation message. - webfetch: stream the response, enforce the content-type allowlist from headers before reading any body, and stop at _MAX_BODY_BYTES so a huge or binary body can't blow memory/context. - agent-execution permission route: add workspace-ownership check so a run parked in another workspace can't be resumed via this workspace's URL (and with its env). - mcp adapter: reject a non-list `command` (a string would be split into characters) with a clear TypeError; skipped, not mangled. - executor: guard the typed-Optional parse_result so a contract violation surfaces as a failed run, not an AttributeError in _finalize_run. Tests: - edit: split the misnamed "write outside" test into a real approved-escaping-write case (exercises the relative_to fallback) and an inside-write case. - read: refuse-escape test (symlink) asserting host contents don't leak. - mcp: string-command-skipped test. - webfetch: oversize-body truncation test. - permission route: replace sleep(0.02) with a deterministic asyncio.Event wait; add cross-workspace-ownership 404 test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
baz-reviewer flagged that `approvals[0]` would IndexError if a DeferredToolRequests arrives with an empty `approvals` list. Cliff's gated tools always raise ApprovalRequired (→ non-empty approvals), but PA also models external deferred `calls` (a mechanism Cliff doesn't use); guard the empty case so the run parks cleanly with a fallback marker instead of crashing out of the dispatch path. Tests: marker built from a real approval; marker from an empty DeferredToolRequests returns safe defaults without indexing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review round addressed —
|
| Finding | Resolution |
|---|---|
read reads arbitrary host files |
Refuse paths resolving outside the workspace; shared escapes_workspace() helper |
edit relative_to() crash on approved escaping write |
Fall back to absolute path in the confirmation message |
webfetch buffers full body before checks |
Stream; enforce content-type from headers first; stop at the byte cap |
| permission route missing ownership check | run.workspace_id != workspace_id → 404 |
mcp command not validated |
Reject non-list command with a clear TypeError; Iterator[None] annotation |
executor unguarded parse_result |
Guard the typed Optional → failed run, not AttributeError |
_build_permission_marker approvals[0] IndexError |
Guard empty-approvals; fallback marker |
| edit "approved-outside" test didn't exercise the path | Split into real escaping-write + inside-write cases |
| sleep-based test sync | Deterministic asyncio.Event |
New tests: read refuse-escape (symlink, no host-content leak), mcp string-command-skipped, webfetch oversize-truncation, cross-workspace-ownership 404, _build_permission_marker normal + empty. 93 affected tests pass, ruff clean.
Declined / deferred (rationale on each thread)
- SSE deprecation window — CE is single-user self-hosted; the frontend is the sole consumer and was migrated to
agent-runspolling in this PR. No externalEventSourceclients to sunset for. Noted for the CHANGELOG. - Fresh DB session in the background task —
get_dbyields a process-global singleton connection (closed only at shutdown), so the closure stays valid; matches the existingexecute/run-allpattern. awaiting_permissioninAgentRunStatus— never written to the DB as a status; it's only the in-memoryAgentExecutionResult.status. The parked row staysrunning+permission_pending.- Duplicate classifier in
executor.py— intentional during the migration; the OpenCode copy is deleted in PR Switch license from MIT to AGPL-3.0 for open-core model #3 of this stack. Documented in thepermissions.pydocstring. - bash process-group kill + bounded streaming — real hardening but orthogonal to the migration (verbatim port of existing behavior); backlog follow-up.
- Strict type hints on test helpers —
ANNisn't in the repo's ruffselect; test helpers are untyped by convention. A repo-wide typing change is separate cleanup.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/tests/agents/tools/test_tools.py (1)
20-27: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueConsider adding type hints to the
_ctxhelper.The
workspace_dirandenv_varsparameters lack type annotations. While test helpers often have relaxed typing, adding hints improves maintainability.-def _ctx(workspace_dir, *, approved: bool = False, env_vars=None) -> SimpleNamespace: +def _ctx( + workspace_dir: str | Path, + *, + approved: bool = False, + env_vars: dict[str, str] | None = None, +) -> SimpleNamespace:As per coding guidelines, "Use Python 3.11+ with strict type hints and Pydantic models for backend code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/tests/agents/tools/test_tools.py` around lines 20 - 27, Add Python type hints to the _ctx helper: annotate workspace_dir as pathlib.Path | str (or simply Path) and env_vars as dict[str, str] | None, and annotate the return type as SimpleNamespace. Update the function signature for _ctx and add any necessary imports (e.g., from pathlib import Path and from typing import Optional, Dict) so the signature reads something like _ctx(workspace_dir: Path | str, *, approved: bool = False, env_vars: Optional[Dict[str, str]] = None) -> SimpleNamespace, keeping the existing body and behavior unchanged and referencing the same WorkspaceDeps construction.backend/cliff/agents/runtime/tools/mcp.py (1)
103-110:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate
urltype before constructingMCPServerStreamableHTTP.The past review flagged that
urlshould be validated as a non-empty string, but that check was not added. A malformedurl(e.g.,None, empty string, or wrong type) will produce an invalid remote toolset or raise downstream.🛡️ Proposed fix to add url validation
if server_type == "remote": url = cfg["url"] + if not isinstance(url, str) or not url.strip(): + raise ValueError( + f"remote MCP config 'url' must be a non-empty string, got " + f"{type(url).__name__}" + ) with _suppress_mcp_v2_deprecation():🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/cliff/agents/runtime/tools/mcp.py` around lines 103 - 110, When server_type == "remote" validate cfg["url"] is a non-empty string before calling MCPServerStreamableHTTP: retrieve url = cfg["url"], check isinstance(url, str) and url.strip() is not empty, and if the check fails raise a clear ValueError (or log and raise) indicating an invalid remote url for entry_id; only then call _suppress_mcp_v2_deprecation() and construct MCPServerStreamableHTTP(url=url, headers=cfg.get("headers") or None, tool_prefix=entry_id).
♻️ Duplicate comments (3)
backend/tests/test_routes_agent_execution.py (2)
746-848: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd strict return type annotations to async test methods.
The async test methods in
TestPermissionEndpointlack explicit-> Nonereturn annotations. As per coding guidelines, backend code must use Python 3.11+ with strict type hints.♻️ Proposed fix to add return annotations
`@pytest.mark.asyncio` - async def test_approve_resumes_executor(self, app, client): + async def test_approve_resumes_executor( + self, app: FastAPI, client: AsyncClient + ) -> None: `@pytest.mark.asyncio` - async def test_deny_resumes_executor_with_denied(self, app, client): + async def test_deny_resumes_executor_with_denied( + self, app: FastAPI, client: AsyncClient + ) -> None: `@pytest.mark.asyncio` - async def test_no_pending_returns_404(self, app, client): + async def test_no_pending_returns_404( + self, app: FastAPI, client: AsyncClient + ) -> None: `@pytest.mark.asyncio` - async def test_run_owned_by_other_workspace_returns_404(self, app, client): + async def test_run_owned_by_other_workspace_returns_404( + self, app: FastAPI, client: AsyncClient + ) -> None:As per coding guidelines,
backend/**/*.py: Use Python 3.11+ with strict type hints and Pydantic models for backend code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/tests/test_routes_agent_execution.py` around lines 746 - 848, The async test methods in TestPermissionEndpoint (e.g., test_approve_resumes_executor, test_deny_resumes_executor_with_denied, test_no_pending_returns_404, test_run_owned_by_other_workspace_returns_404) are missing explicit return type annotations; update each async def to include -> None (e.g., async def test_approve_resumes_executor(self, app, client) -> None:) to satisfy the strict Python 3.11+ typing guideline throughout the test class.
709-743: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winAdd strict type hints to the permission route helper.
The helper
_permission_route_patcheslacks an explicit return type annotation. As per coding guidelines, backend code must use Python 3.11+ with strict type hints.♻️ Proposed fix to add return type
+from typing import Iterator +from unittest.mock import _patch -def _permission_route_patches(*, pending: bool, run_workspace_id: str = "ws-1"): +def _permission_route_patches( + *, pending: bool, run_workspace_id: str = "ws-1" +) -> tuple[_patch[Any], _patch[Any], _patch[Any]]:As per coding guidelines,
backend/**/*.py: Use Python 3.11+ with strict type hints and Pydantic models for backend code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/tests/test_routes_agent_execution.py` around lines 709 - 743, The helper _permission_route_patches has no return type annotation; add a strict type hint on its signature like def _permission_route_patches(*, pending: bool, run_workspace_id: str = "ws-1") -> tuple[object, object, object]: to explicitly declare it returns a 3-tuple of patch objects (or replace object with a more specific mock type if you prefer), and keep the body unchanged; this satisfies the backend strict typing rule while preserving references to AgentRun, SimpleNamespace, AsyncMock and patch used inside the function.backend/cliff/agents/executor.py (1)
1636-1649:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winGuard
outcome.parse_resultbefore calling_finalize_runin resume path.The
resume_executorpath calls_finalize_runwithoutcome.parse_result(line 1644) without verifying it is notNone. The_PaExecutorOutcomecontract allowsparse_result=None, and a contract violation would crash_finalize_runwith anAttributeErrorinstead of surfacing as a failed run. Theexecute()path added this guard (lines 851-855), but the resume path is still missing it.🐛 Proposed fix to add guard
if outcome.permission_request is not None: return await self._park_for_permission( db, workspace_id, agent_run, outcome, start_time ) + if outcome.parse_result is None: + raise AgentProcessError( + "executor outcome missing parse_result for completed resume" + ) return await self._finalize_run( db, workspace_id=workspace_id, agent_type=agent_run.agent_type, parse_result=outcome.parse_result,As per coding guidelines,
backend/**/*.py: Use Python 3.11+ with strict type hints and Pydantic models for backend code.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/cliff/agents/executor.py` around lines 1636 - 1649, The resume path calls _finalize_run with outcome.parse_result without checking for None; update the resume_executor flow so that before calling _finalize_run (the branch that currently mirrors lines 1636-1649), you guard outcome.parse_result the same way as in execute()—if outcome.parse_result is None, treat the run as failed/handle appropriately (e.g., return a failed run result or raise a controlled error) instead of passing None into _finalize_run; reference outcome.parse_result, _finalize_run, and the resume path that currently calls _park_for_permission to locate where to add the guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@backend/cliff/agents/runtime/tools/mcp.py`:
- Around line 103-110: When server_type == "remote" validate cfg["url"] is a
non-empty string before calling MCPServerStreamableHTTP: retrieve url =
cfg["url"], check isinstance(url, str) and url.strip() is not empty, and if the
check fails raise a clear ValueError (or log and raise) indicating an invalid
remote url for entry_id; only then call _suppress_mcp_v2_deprecation() and
construct MCPServerStreamableHTTP(url=url, headers=cfg.get("headers") or None,
tool_prefix=entry_id).
In `@backend/tests/agents/tools/test_tools.py`:
- Around line 20-27: Add Python type hints to the _ctx helper: annotate
workspace_dir as pathlib.Path | str (or simply Path) and env_vars as dict[str,
str] | None, and annotate the return type as SimpleNamespace. Update the
function signature for _ctx and add any necessary imports (e.g., from pathlib
import Path and from typing import Optional, Dict) so the signature reads
something like _ctx(workspace_dir: Path | str, *, approved: bool = False,
env_vars: Optional[Dict[str, str]] = None) -> SimpleNamespace, keeping the
existing body and behavior unchanged and referencing the same WorkspaceDeps
construction.
---
Duplicate comments:
In `@backend/cliff/agents/executor.py`:
- Around line 1636-1649: The resume path calls _finalize_run with
outcome.parse_result without checking for None; update the resume_executor flow
so that before calling _finalize_run (the branch that currently mirrors lines
1636-1649), you guard outcome.parse_result the same way as in execute()—if
outcome.parse_result is None, treat the run as failed/handle appropriately
(e.g., return a failed run result or raise a controlled error) instead of
passing None into _finalize_run; reference outcome.parse_result, _finalize_run,
and the resume path that currently calls _park_for_permission to locate where to
add the guard.
In `@backend/tests/test_routes_agent_execution.py`:
- Around line 746-848: The async test methods in TestPermissionEndpoint (e.g.,
test_approve_resumes_executor, test_deny_resumes_executor_with_denied,
test_no_pending_returns_404, test_run_owned_by_other_workspace_returns_404) are
missing explicit return type annotations; update each async def to include ->
None (e.g., async def test_approve_resumes_executor(self, app, client) -> None:)
to satisfy the strict Python 3.11+ typing guideline throughout the test class.
- Around line 709-743: The helper _permission_route_patches has no return type
annotation; add a strict type hint on its signature like def
_permission_route_patches(*, pending: bool, run_workspace_id: str = "ws-1") ->
tuple[object, object, object]: to explicitly declare it returns a 3-tuple of
patch objects (or replace object with a more specific mock type if you prefer),
and keep the body unchanged; this satisfies the backend strict typing rule while
preserving references to AgentRun, SimpleNamespace, AsyncMock and patch used
inside the function.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: f50b62be-69c0-463d-bca7-116d1d15ff26
📒 Files selected for processing (11)
backend/cliff/agents/executor.pybackend/cliff/agents/runtime/tools/edit.pybackend/cliff/agents/runtime/tools/mcp.pybackend/cliff/agents/runtime/tools/permissions.pybackend/cliff/agents/runtime/tools/read.pybackend/cliff/agents/runtime/tools/webfetch.pybackend/cliff/api/routes/agent_execution.pybackend/tests/agents/test_deferred_tools_persist.pybackend/tests/agents/tools/test_mcp.pybackend/tests/agents/tools/test_tools.pybackend/tests/test_routes_agent_execution.py
User description
PR #2 of three executing IMPL-0022 against ADR-0047. Moves the remediation_executor — the one tool-using agent — off the OpenCode subprocess and onto in-process Pydantic AI, with a durable human-in-the-loop approval cycle. PR #1 (#221 + #249) migrated the six no-tools agents; PR #3 deletes the leftover OpenCode engine/templates + the app-level normalizer.
Net: +2,718 / −2,127 — a ~2,400-line deletion of subprocess / SSE / permission-queue machinery, replaced by ~900 lines of in-process tools + a DB-marker park/resume cycle. Exactly ADR-0047's "simpler" thesis.
What changed (by slice)
backend/cliff/agents/runtime/tools/—bash,edit,read,webfetch,ghas in-process@agent.toolfunctions, pluspermissions.py(the auto/ask/deny tier classifier ported verbatim from the OpenCode-era_classify_tool_request+ denylists).gate_tool_callmaps a tier to PA's HITL vocabulary:deny→ModelRetry(model pivots),ask→ApprovalRequireduntil approved,auto→run.runtime/remediation_executor.py—Agent(output_type=[RemediationExecutorOutput, DeferredToolRequests], tools=…, toolsets=<mcp>). SYSTEM_PROMPT lifted from the Jinja template (all seven hard rules preserved verbatim — each = a real shipped regression).runtime/tools/mcp.pyadapts Cliff's resolved MCP configs to PA toolsets.executor.execute()routesremediation_executorthrough_run_pa_executor. A gated tool →DeferredToolRequestsoutput → migration 023 persistspermission_request+pa_message_historyon the agent_run row, returns the newawaiting_permissionstatus.POST .../permissionresumes viaresume_executor(rebuild agent → deserialize history →ToolApproved/ToolDenied→ re-enter or re-park). Shared_finalize_run/_resolve_active_model/_run_pa_callextracted so initial-run and resume reach identical persistence + retry./agent-execution/streamroute + theIssueSidePanelEventSource were only ever a refetch nudge over the polledagent-runsquery (which is the source of truth and renders the prompt from the durable marker). Both deleted; the frontend relies on the 3s/5s poll._send_and_collect,_collect_response,approve_tool/deny_tool/push_permission_event/ensure_permission_queue/_patch_permission_marker,_PendingApproval, the dead string-based rate-limit classifier, and the in-memory permission queue._active_runs/check_not_busystay (run-tracking).What did NOT change
ModelRetry/ApprovalRequired).repo_workspace_runner.py's own_send_and_collect(the repo-action generators) and_STRUCTURED_OUTPUT_CONTRACTS/build_agent_prompt— OpenCode prompt machinery left for PR Switch license from MIT to AGPL-3.0 for open-core model #3.Deviations from IMPL-0022 (verified against PA runtime)
deny→ModelRetry, notValueError. PA propagates a raw toolValueError(crashes the run);ModelRetryis what PA converts to a model-visible retry — matching ADR-0047's stated "the model gets a deterministic error and pivots." (IMPL text wants a one-line update.)Test plan
uv run ruff check cliff/ tests/— cleanwiz-s3-publicLLM-eval flake, untouched by this branch)test_deferred_tools_persist.py(realFunctionModelpark→resume + corrupt-state→failed),test_remediation_executor.py(defer/deny/complete),test_executor_provider_chain.py(6 providers),tools/(44), rate-limit retargeted to the PA_run_pa_callpathdeny_message)cliffsec selftestran the full no-tools pipeline in-process to the plan gate, no regression. Transcript:cliff-os/docs/qa/migration-smoke/PR2-executor//code-review(recall, 7 angles) — one confirmed reliability gap fixed (resume_executorno longer wedges a run on a corruptpa_message_history); two findings refuted (DB lifetime, edit-approve semantics); cleanup deferred to PR Switch license from MIT to AGPL-3.0 for open-core model #3rm -rfapproval) + feat: Phase 3 — Persistence & Domain Model #5 (validator retry) — blocked on a GitHub re-auth (expired token → executor can't open a PR, independent of the migration). Park/resume is unit-covered; best finished interactively post-merge.Skipped tests (un-skipped/removed in PR #3)
13 OpenCode-machinery tests carry explicit
PR2.Eskip reasons (same staged pattern PR #1 used). Their behaviour is replaced bytest_deferred_tools_persist.py+ the route tests; the dead machinery they exercise is deleted in PR #3.Stacked PRs
chore/drop-opencode-substrate) — deletebackend/cliff/engine/, templates + template engine,repo_workspace_runnerOpenCode path, the dead constants this review flagged, regen frontend OpenAPI types, and replace the singleton OpenCode (ADR-0022 normalizer) with a small PA agent.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Generated description
Below is a concise technical summary of the changes proposed in this PR:
Migrate the remediation_executor path in
AgentExecutoronto theremediation_executorPydantic AI agent so tool-using runs execute in-process with deferred-tool outputs and HITL resume markers. Update the POST/permissionresume endpoint, pipeline handling, migration 023, and frontend polling so permission decisions rely on the durable agent_run marker instead of the deleted SSE/subprocess plumbing.AgentExecutorto run the remediation_executor via_run_pa_executor/resume_executor, persist DeferredToolRequests markers and message history, and finalize the paused result through the POST/permissionflow while the pipeline defers onawaiting_permissioninstead of failing.Modified files (12)
Latest Contributors(1)
Modified files (6)
Latest Contributors(1)
bash/edit/read/webfetch/ghsafely and retain the existing auto/ask/deny policy under Pydantic AI.Modified files (13)
Latest Contributors(1)