feat(llm): add Claude Agent SDK adapter behind agent.runtime feature flag#2
feat(llm): add Claude Agent SDK adapter behind agent.runtime feature flag#2PalmPalm7 wants to merge 12 commits into
Conversation
…flag Introduces src.llm package with: - AgentSdkClient: thin async wrapper around claude_agent_sdk.query() that aggregates the streamed AssistantMessage/UserMessage/ResultMessage flow into a single SdkResult (text + tool invocations + usage) - get_runtime() helper reading the new agent.runtime config flag (legacy | sdk), defaulting to legacy so existing deploys are unchanged - Lazy import of claude_agent_sdk: the adapter file is importable without the package installed; calls raise AgentSdkUnavailableError with install hint instead Tests inject a fake sdk module via sys.modules so coverage exercises the message-stream aggregation, options pass-through, error capture, and config defaults without requiring the real SDK in CI. No orchestrator changes — additive only. Phase 2 of the migration will wire AgentSdkClient into a new AgentRunner dispatcher that reads agent.runtime. See artifacts/parsec-agent-sdk-migration-plan.md.
Add admin ability to view all users' saved conversations: - All Users toggle in sidebar (admin only) - Download all conversations as JSON export - GET /api/conversations?all_users=true with admin guard - GET /api/conversations/export endpoint for bulk download - Admins can read (not delete) other users' conversations Add OpenShift group-based admin access: - is_admin_user_async() checks email list + OpenShift groups - learnings.admin_groups config (ace-octo-team) - ace-octo-team added to auth.allowed_groups Incorporate production learnings into agent prompts: - shared_context.md: mandatory schema-first rule, column name gotchas - aap2_agent.md: get_job_log tip for specific job IDs - icinga_agent.md: downtime check-first, OCP Nodes Ready pattern
Real-SDK regression caught during live smoke test against api.anthropic.com on 2026-05-21. The adapter was reading ResultMessage.model, but the actual claude_agent_sdk ResultMessage (>=0.2.x) has no model field at all — model lives on AssistantMessage. Result: SdkResult.model came back as None for every real call. Fix: - Capture model + session_id from AssistantMessage during the stream (last value wins, in case the conversation switches models). - ResultMessage still provides session_id as fallback + the authoritative usage/cost/num_turns. - Refactor the message loop into _ingest_assistant / _ingest_user / _ingest_result helpers to keep complete() under the mccabe complexity limit after the added branches. Added regression test test_complete_captures_model_from_assistant_even_when_result_has_none to lock in the new contract. 19 tests passing (was 18). All quality gates (ruff/black/mypy) clean.
Local test report — green after one real-world bug catchRan the full local test suite plus a live smoke test against Setupgit fetch origin migration/agent-sdk-adapter
git checkout migration/agent-sdk-adapter
python3 -m venv .venv && source .venv/bin/activate
pip install pyyaml pytest pytest-asyncio fastapi dynaconf ruff black mypy
Layer 1 — Unit tests with fake SDKpytest tests/test_agent_sdk_client.py -vResult: ✅ Coverage spans:
Layer 2 — Quality gatesruff check src/llm tests/test_agent_sdk_client.py
black --check src/llm tests/test_agent_sdk_client.py
mypy src/llmResults:
Layer 3 — Lazy import contractVerified that from src.llm import AgentSdkClient, get_runtime, AgentSdkUnavailableError, SdkResult, SdkUsage
# imports OK
print(get_runtime({})) # → 'legacy'
print(get_runtime({'agent': {'runtime': 'sdk'}})) # → 'sdk'
print(get_runtime({'agent': {'runtime': 'skd'}})) # → 'legacy' (logged warning)Layer 4 — Error path when SDK is absentimport asyncio
from src.llm import AgentSdkClient, AgentSdkUnavailableError
from src.llm.agent_sdk_client import AgentSdkConfig
async def main():
c = AgentSdkClient(AgentSdkConfig(model='claude-sonnet-4-5'))
try:
await c.complete(prompt='hi')
except AgentSdkUnavailableError as e:
print(f'Got expected error: {e}')
asyncio.run(main())Result: ✅ Raises Layer 5 — Live smoke test against
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: af6d243a3c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if allowed_tools is not None: | ||
| options_kwargs["allowed_tools"] = allowed_tools |
There was a problem hiding this comment.
Deny tools outside allowed_tools
When callers pass allowed_tools as the advertised whitelist for an SDK task, this option only auto-approves those tools; it does not restrict the agent to them. The Claude Agent SDK docs for ClaudeAgentOptions.allowed_tools state that unlisted tools fall through to permission_mode/can_use_tool, so without also setting a denying mode such as permission_mode="dontAsk" (or an equivalent permission callback), an SDK-enabled Parsec run can attempt tools outside the intended sandbox or stall/fail on permission prompts instead of reliably enforcing the supplied tool set.
Useful? React with 👍 / 👎.
Staging environment test (real OpenShift cluster, live Vertex)The adapter was exercised end-to-end on a real OpenShift cluster (NERC Why a special test pathThe standard Parsec image can't exercise Procedure# 1. Provision the Vertex credential secret once (mounts a GCP ADC):
LLM_BACKEND=vertex scripts/deploy.sh
# 2. Build a test image (Parsec branch + Node + claude CLI + claude-agent-sdk)
# and run a Job that calls AgentSdkClient.complete() against Vertex:
NAMESPACE=parsec-palmpalm7 pr2-test/run-pr2-test.shThe Job sets client = AgentSdkClient(AgentSdkConfig(model="claude-sonnet-4-5@20250929"))
result = await client.complete(prompt="Reply with exactly: SDK ADAPTER OK ...")Result — PASSThe SDK spawned the Worth flagging for the migration discussionThat trivial call cost $0.094 — consistent with the migration plan's Risk rhpds#4 (SDK prompt overhead vs. a bare What this validates for review
|
Adds a "Skills" tab to the sidebar that calls GET /api/skills (added in the skills-loader change) and renders the discovered skills: name, source badge, parsec-native badge, description, version/domain/tool-count, and any loader warnings. Mirrors the existing Examples/Debug tab pattern. Read-only and additive — no behavior change to the agent loop; this is an operator-visibility surface for the skills the loader finds. Skill metadata is rendered via textContent (never innerHTML), so arbitrary on-disk SKILL.md content cannot inject markup. Companion to the skills-loader PR (depends on its /api/skills endpoint).
…e path - .skills-panel: flex:1 + min-height:0 + overflow-y:auto so a tall list scrolls inside the sidebar instead of overflowing the viewport - render each skill's skill_path (where the loader discovered it from)
docs: updated contributing.md for local testing
feat(ui): read-only Skills sidebar tab (visualizes GET /api/skills)
…onfig fixes Addresses review on rhpds#24: - No timeout/cancellation: a hung sdk.query() leaked the coroutine and the child CLI process. complete() now takes a `timeout` (seconds), defaulting to the new `agent.sdk.timeout` config (300s; null/0 disables). The query loop is wrapped in asyncio.timeout(), and on expiry the result is marked an error instead of hanging. asyncio.timeout(None) is a no-op, so one code path covers bounded + unbounded. - Fragile model capture: warn when AssistantMessage.model changes mid-conversation instead of silently letting "last wins". - Incorrect session_id fallback: the `or state[...]` form read as falling back to a stale id; replaced with an explicit truthy guard that takes the ResultMessage id when present without clobbering the AssistantMessage id. - Config access mismatch: _get_section now coerces to a real dict (mirrors skills loader's _get_skills_section), giving a concrete dict[str, Any] return type. Documented why attribute access (cfg.agent) isn't used — from_config must also accept the plain-dict configs the tests pass. Adds tests: per-call + config-default timeout, model-change warning, session_id preservation, and the timeout config knob. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f5390b694
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for block in getattr(message, "content", []) or []: | ||
| if isinstance(block, sdk.TextBlock): | ||
| state["text_parts"].append(getattr(block, "text", "")) | ||
| elif isinstance(block, sdk.ToolUseBlock): |
There was a problem hiding this comment.
Preserve tool results from assistant content blocks
When the Agent SDK emits a ToolResultBlock inside AssistantMessage.content (the Python SDK reference's progress-monitoring example handles ToolResultBlock while iterating assistant content), this loop drops it because only TextBlock and ToolUseBlock are handled here and _ingest_user() is the only place that pairs results. In SDK runs that use tools, SdkResult.tool_invocations can therefore contain the tool call but no result/is_error, so callers lose the execution outcome even though the tool completed.
Useful? React with 👍 / 👎.
…ion, phase 1) (rhpds#23) * feat(skills): add SKILL.md loader compatible with rhdp-rca-plugin Discovers SKILL.md directories from project, plugin, and user roots, parses YAML frontmatter + Markdown body, and validates against the Anthropic Agent Skills Spec v1.0. Layers a `parsec:` extension block for version, domain, MCP deps, and permissions — unknown keys warn rather than fail so plain Claude Code skills still load. Includes GET /api/skills diagnostic endpoint for operators to verify discovery, and a tests/skills_fixtures/ tree exercising all validation paths. Foundational for upcoming Agent SDK migration (see artifacts/parsec-agent-sdk-migration-plan.md, phase 1 PR #1). No behavior change — additive only, default config registers ./skills which is empty. * fix(skills): reject symlinked skill dirs and cap SKILL.md size Addresses review on rhpds#23: - Symlink path traversal: iterdir() follows symlinks, so a symlinked child of a source root could resolve outside it (relevant because plugin_paths can point at external mounts). Skip symlinked skill directories and validate that each child resolves within the root. Safe for ConfigMap-mounted skills — K8s symlinks the files inside a skill dir (SKILL.md), not the skill dir itself. - Unbounded read: read_text() had no size limit, so an oversized SKILL.md from an untrusted mount could OOM the pod. Stat and reject files over MAX_SKILL_SIZE_BYTES (1 MiB) before reading. Adds tests for both: symlinked-dir skip and oversized-file rejection. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Brings the SDK-adapter branch up to date with main (now includes the merged skills loader rhpds#23, Skills UI rhpds#25, contributing docs rhpds#22, and the admin/group-based-access changes). Conflict: config/config.yaml — main added the `skills:` section (via rhpds#23) and this branch added the `agent:`/`agent.sdk` section. Resolved by keeping both; they are independent and complementary (agent.runtime selects the runtime, skills.* configures SKILL.md discovery). Local gate green on the merged tree: black, ruff, mypy (src/) + pytest (87 passed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6835602197
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @router.get("/skills") | ||
| async def list_skills(): |
There was a problem hiding this comment.
Require configured auth before listing skills
When Parsec relies on its in-app _check_user_allowed enforcement (as the query, conversations, learnings, and share APIs do), this new handler is reachable without any user headers or group/user whitelist check. A direct request to /api/skills can therefore enumerate configured skill names, allowed tools, metadata, and filesystem paths even for callers who would be rejected by the normal auth path; add the same Request/header parameters and _check_user_allowed gate before loading and serializing manifests.
Useful? React with 👍 / 👎.
Summary
Adds
src.llm— a thin async adapter aroundclaude_agent_sdk.query()behind a newagent.runtime: legacy|sdkfeature flag. Defaults tolegacyso existing deploys are unchanged. Theclaude_agent_sdkimport is lazy — module is importable without the SDK installed; only.complete()triggers the import and raisesAgentSdkUnavailableErrorwith an install hint if missing.Behavior change: none. Adapter is unused until a future PR wires it into the orchestrator. This PR is purely scaffolding so we can iterate on the SDK surface in isolation.
Sibling PR #1 (
migration/skills-loader) is independent — they can land in either order. Seeartifacts/parsec-agent-sdk-migration-plan.mdfor the migration plan.What's in the box
src/llm/__init__.py— public APIsrc/llm/agent_sdk_client.py—AgentSdkClient.complete()aggregates the SDK's streamedAssistantMessage/UserMessage/ResultMessageflow into a singleSdkResult(text + tool invocations + usage)src/llm/runtime.py—get_runtime(config)helper for the new flagsrc/llm/types.py—SdkResult,SdkUsagedataclassesconfig/config.yaml— newagent:section with runtime flag and optional SDK overridestests/test_agent_sdk_client.py— 18 passing tests (fake SDK injected viasys.modules)Why lazy import
Keeping
claude-agent-sdkout of required deps means:When Phase 2 wires the adapter into the orchestrator, we'll add
claude-agent-sdkto an optional[sdk]dep set.Usage shape (for reviewers)
Test plan
pytest tests/test_agent_sdk_client.py— 18/18 passruff check src/llm tests/test_agent_sdk_client.py— cleanblack --check src/llm tests/test_agent_sdk_client.py— cleanclaude_agent_sdkNOT installedcomplete()raisesAgentSdkUnavailableError(not a vanillaImportError) when SDK is missingpip install claude-agent-sdkis run in a dev venv🤖 Generated with Claude Code