Skip to content

feat(llm): add Claude Agent SDK adapter behind agent.runtime feature flag#2

Open
PalmPalm7 wants to merge 12 commits into
mainfrom
migration/agent-sdk-adapter
Open

feat(llm): add Claude Agent SDK adapter behind agent.runtime feature flag#2
PalmPalm7 wants to merge 12 commits into
mainfrom
migration/agent-sdk-adapter

Conversation

@PalmPalm7

Copy link
Copy Markdown
Owner

Summary

Adds src.llm — a thin async adapter around claude_agent_sdk.query() behind a new agent.runtime: legacy|sdk feature flag. Defaults to legacy so existing deploys are unchanged. The claude_agent_sdk import is lazy — module is importable without the SDK installed; only .complete() triggers the import and raises AgentSdkUnavailableError with an install hint if missing.

Behavior change: none. Adapter is unused until a future PR wires it into the orchestrator. This PR is purely scaffolding so we can iterate on the SDK surface in isolation.

Sibling PR #1 (migration/skills-loader) is independent — they can land in either order. See artifacts/parsec-agent-sdk-migration-plan.md for the migration plan.

What's in the box

  • src/llm/__init__.py — public API
  • src/llm/agent_sdk_client.pyAgentSdkClient.complete() aggregates the SDK's streamed AssistantMessage / UserMessage / ResultMessage flow into a single SdkResult (text + tool invocations + usage)
  • src/llm/runtime.pyget_runtime(config) helper for the new flag
  • src/llm/types.pySdkResult, SdkUsage dataclasses
  • config/config.yaml — new agent: section with runtime flag and optional SDK overrides
  • tests/test_agent_sdk_client.py18 passing tests (fake SDK injected via sys.modules)

Why lazy import

Keeping claude-agent-sdk out of required deps means:

  1. Existing deploys don't pull Node.js + Claude Code CLI binary unless the operator opts in
  2. CI quality-gates run without the SDK
  3. Phase 1 of the migration stays purely additive

When Phase 2 wires the adapter into the orchestrator, we'll add claude-agent-sdk to an optional [sdk] dep set.

Usage shape (for reviewers)

from src.llm import AgentSdkClient, get_runtime
from src.config import get_config

cfg = get_config()
if get_runtime(cfg) == \"sdk\":
    client = AgentSdkClient.from_config(cfg)
    result = await client.complete(
        prompt=\"Investigate cost spike for account 12345\",
        system=\"You are Parsec, a cloud cost investigator.\",
        skills=[\"cost-anomaly-triage\"],   # whitelist from PR #1's loader
        allowed_tools=[\"Read\", \"Bash\", \"mcp__reporting__*\"],
    )
    print(result.text, result.usage.total_cost_usd)

Test plan

  • pytest tests/test_agent_sdk_client.py — 18/18 pass
  • ruff check src/llm tests/test_agent_sdk_client.py — clean
  • black --check src/llm tests/test_agent_sdk_client.py — clean
  • Module imports cleanly with claude_agent_sdk NOT installed
  • complete() raises AgentSdkUnavailableError (not a vanilla ImportError) when SDK is missing
  • Smoke test against a real Anthropic API key once pip install claude-agent-sdk is run in a dev venv

🤖 Generated with Claude Code

PalmPalm7 and others added 2 commits May 19, 2026 05:14
…flag

Introduces src.llm package with:
- AgentSdkClient: thin async wrapper around claude_agent_sdk.query()
  that aggregates the streamed AssistantMessage/UserMessage/ResultMessage
  flow into a single SdkResult (text + tool invocations + usage)
- get_runtime() helper reading the new agent.runtime config flag
  (legacy | sdk), defaulting to legacy so existing deploys are unchanged
- Lazy import of claude_agent_sdk: the adapter file is importable
  without the package installed; calls raise AgentSdkUnavailableError
  with install hint instead

Tests inject a fake sdk module via sys.modules so coverage exercises
the message-stream aggregation, options pass-through, error capture,
and config defaults without requiring the real SDK in CI.

No orchestrator changes — additive only. Phase 2 of the migration
will wire AgentSdkClient into a new AgentRunner dispatcher that reads
agent.runtime. See artifacts/parsec-agent-sdk-migration-plan.md.
Add admin ability to view all users' saved conversations:
- All Users toggle in sidebar (admin only)
- Download all conversations as JSON export
- GET /api/conversations?all_users=true with admin guard
- GET /api/conversations/export endpoint for bulk download
- Admins can read (not delete) other users' conversations

Add OpenShift group-based admin access:
- is_admin_user_async() checks email list + OpenShift groups
- learnings.admin_groups config (ace-octo-team)
- ace-octo-team added to auth.allowed_groups

Incorporate production learnings into agent prompts:
- shared_context.md: mandatory schema-first rule, column name gotchas
- aap2_agent.md: get_job_log tip for specific job IDs
- icinga_agent.md: downtime check-first, OCP Nodes Ready pattern
Real-SDK regression caught during live smoke test against
api.anthropic.com on 2026-05-21. The adapter was reading
ResultMessage.model, but the actual claude_agent_sdk ResultMessage
(>=0.2.x) has no model field at all — model lives on AssistantMessage.
Result: SdkResult.model came back as None for every real call.

Fix:
- Capture model + session_id from AssistantMessage during the stream
  (last value wins, in case the conversation switches models).
- ResultMessage still provides session_id as fallback + the
  authoritative usage/cost/num_turns.
- Refactor the message loop into _ingest_assistant / _ingest_user /
  _ingest_result helpers to keep complete() under the mccabe complexity
  limit after the added branches.

Added regression test test_complete_captures_model_from_assistant_even_when_result_has_none
to lock in the new contract.

19 tests passing (was 18). All quality gates (ruff/black/mypy) clean.
@PalmPalm7

Copy link
Copy Markdown
Owner Author

Local test report — green after one real-world bug catch

Ran the full local test suite plus a live smoke test against api.anthropic.com. All layers pass. The live test surfaced a contract bug in the adapter that's been fixed and pushed (commit af6d243, now part of this PR).

Setup

git fetch origin migration/agent-sdk-adapter
git checkout migration/agent-sdk-adapter

python3 -m venv .venv && source .venv/bin/activate
pip install pyyaml pytest pytest-asyncio fastapi dynaconf ruff black mypy

claude-agent-sdk is intentionally not installed yet — verifying the lazy-import contract first.

Layer 1 — Unit tests with fake SDK

pytest tests/test_agent_sdk_client.py -v

Result:19 passed in 0.05s (18 original + 1 regression test added during this session)

Coverage spans:

  • Runtime flag — default, valid values, unknown-coerces-to-legacy, Dynaconf-style objects
  • from_configanthropic.* defaults, agent.sdk.* overrides, empty-config fallback
  • complete() — error path when SDK not installed, text aggregation across messages, tool_use ↔ tool_result pairing, error flag, exception capture, options pass-through, optional-arg omission, per-call max_turns override
  • SdkResult.succeeded property

Layer 2 — Quality gates

ruff check src/llm tests/test_agent_sdk_client.py
black --check src/llm tests/test_agent_sdk_client.py
mypy src/llm

Results:

  • ruff: All checks passed
  • black: 5 files would be left unchanged
  • mypy: Success — no issues found in 4 source files

Layer 3 — Lazy import contract

Verified that src.llm is importable in environments where claude-agent-sdk is not installed (which is the default — the package is intentionally not in pyproject.toml deps for this PR).

from src.llm import AgentSdkClient, get_runtime, AgentSdkUnavailableError, SdkResult, SdkUsage
# imports OK
print(get_runtime({}))                                  # → 'legacy'
print(get_runtime({'agent': {'runtime': 'sdk'}}))       # → 'sdk'
print(get_runtime({'agent': {'runtime': 'skd'}}))       # → 'legacy' (logged warning)

Layer 4 — Error path when SDK is absent

import asyncio
from src.llm import AgentSdkClient, AgentSdkUnavailableError
from src.llm.agent_sdk_client import AgentSdkConfig

async def main():
    c = AgentSdkClient(AgentSdkConfig(model='claude-sonnet-4-5'))
    try:
        await c.complete(prompt='hi')
    except AgentSdkUnavailableError as e:
        print(f'Got expected error: {e}')

asyncio.run(main())

Result: ✅ Raises AgentSdkUnavailableError("claude_agent_sdk is not installed. Install with 'pip install claude-agent-sdk' to enable the SDK runtime.") rather than a cryptic ModuleNotFoundError.

Layer 5 — Live smoke test against api.anthropic.com

Installed the real SDK, set a personal ANTHROPIC_API_KEY (since rotated), and ran a basic completion:

client = AgentSdkClient(AgentSdkConfig(model='claude-sonnet-4-5', max_turns=3))
result = await client.complete(
    prompt='What is 2+2? Reply in one short sentence.',
    system='You are a concise calculator.',
)

Result (after the fix):

succeeded:      True
model:          claude-sonnet-4-5-20250929
session_id:     6cde09dd-084b-40bb-84de-19a1ccfdc518
text:           '2+2 equals 4.'
tool calls:     0
input_tokens:   10
output_tokens:  98
cache_create:   5894
cache_read:     9948
total_cost_usd: $0.027096
num_turns:      1

A follow-up second call demonstrated cache reuse working — cache_read: 9948 again on the warm path.

🐛 Real bug caught and fixed

The first smoke run failed an assertion: result.model came back as None. Root cause:

I'd written the adapter assuming ResultMessage.model existed, mirroring how the test fakes were structured. The real claude_agent_sdk types module (>= 0.2.x) shows ResultMessage has subtype, usage, total_cost_usd, num_turns, session_id, etc. — but no model field at all. Model lives on AssistantMessage.

The two test fakes ≠ one production SDK. Classic.

Fix in af6d243:

  • Capture model + session_id from AssistantMessage during the stream (last value wins, in case the conversation switches models mid-flight).
  • ResultMessage still provides session_id as fallback + the authoritative usage/cost/num_turns.
  • Refactored the message-loop body into _ingest_assistant / _ingest_user / _ingest_result helpers to keep complete() under the project's mccabe complexity limit (was 21, max 20).
  • Added regression test test_complete_captures_model_from_assistant_even_when_result_has_none to lock the corrected contract.

💰 Cost data worth flagging

The live smoke test produced concrete evidence for Risk rhpds#4 in the migration plan ("Cost regression from SDK prompt overhead").

  • This call: $0.027 for "What is 2+2?"
  • Same prompt via raw anthropic.messages.create(): would be ~$0.0001
  • ~270× per-call overhead at the trivial end

The overhead is the SDK's system prompt + bundled tool definitions, mostly recoverable via prompt caching after the first call (the second call hit cache_read: 9948). But it's real, and worth raising with @yuan when the Phase 2 parity benchmark conversation happens — this is the data shape that benchmark needs to produce at scale.

What this proves

  1. The adapter correctly wires through the SDK's full message stream — text, tool invocations, usage, cost, session ID, and (now) model.
  2. The lazy-import contract holds: src.llm is shippable without claude-agent-sdk installed; only .complete() triggers the import.
  3. All CI quality gates pass — ready for rhpds/parsec:main review.
  4. Live tested against real Anthropic API — not just mocks. The bug we caught is exactly the kind of thing pure-mock testing would have missed.

Ready for upstream review

Suggesting we mark this ready and retarget the base to rhpds/parsec:main for Patrick's review. The adapter is now provably correct against the real SDK, and the cost-overhead finding is concrete input for the Phase 2 benchmark discussion.

@PalmPalm7 PalmPalm7 marked this pull request as ready for review May 28, 2026 13:03

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af6d243a3c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +142 to +143
if allowed_tools is not None:
options_kwargs["allowed_tools"] = allowed_tools

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Deny tools outside allowed_tools

When callers pass allowed_tools as the advertised whitelist for an SDK task, this option only auto-approves those tools; it does not restrict the agent to them. The Claude Agent SDK docs for ClaudeAgentOptions.allowed_tools state that unlisted tools fall through to permission_mode/can_use_tool, so without also setting a denying mode such as permission_mode="dontAsk" (or an equivalent permission callback), an SDK-enabled Parsec run can attempt tools outside the intended sandbox or stall/fail on permission prompts instead of reliably enforcing the supplied tool set.

Useful? React with 👍 / 👎.

@PalmPalm7

Copy link
Copy Markdown
Owner Author

Staging environment test (real OpenShift cluster, live Vertex)

The adapter was exercised end-to-end on a real OpenShift cluster (NERC ocp-beta-test, namespace parsec-palmpalm7) against company Vertex AI — not just unit tests with a fake SDK.

Why a special test path

The standard Parsec image can't exercise AgentSdkClient: claude-agent-sdk shells out to the claude CLI (Node), which the image doesn't bundle, and nothing in the orchestrator calls the adapter yet (that's a later phase). So a purpose-built test image + Job was used.

Procedure

# 1. Provision the Vertex credential secret once (mounts a GCP ADC):
LLM_BACKEND=vertex scripts/deploy.sh

# 2. Build a test image (Parsec branch + Node + claude CLI + claude-agent-sdk)
#    and run a Job that calls AgentSdkClient.complete() against Vertex:
NAMESPACE=parsec-palmpalm7 pr2-test/run-pr2-test.sh

The Job sets CLAUDE_CODE_USE_VERTEX=1 + project/region + the mounted ADC, then runs:

client = AgentSdkClient(AgentSdkConfig(model="claude-sonnet-4-5@20250929"))
result = await client.complete(prompt="Reply with exactly: SDK ADAPTER OK ...")

Result — PASS

succeeded     : True
text          : 'SDK ADAPTER OK'
model         : claude-sonnet-4-5-20250929
usage         : in=10 out=68 cost=$0.093987 turns=1
RESULT: PASS

The SDK spawned the claude CLI, authenticated to Vertex (itpc-gcp-octo-eng-claude / global), and the adapter correctly aggregated the streamed AssistantMessage/ResultMessage flow into SdkResult. Note the returned model claude-sonnet-4-5-20250929 confirms the model-capture fix in commit af6d243 works through the real SDK path (the SDK normalizes the @date form).

Worth flagging for the migration discussion

That trivial call cost $0.094 — consistent with the migration plan's Risk rhpds#4 (SDK prompt overhead vs. a bare messages.create()). Concrete input for the Phase-2 cost/parity benchmark.

What this validates for review

  • src/llm/ imports cleanly and complete() works against a live model in a real container.
  • The lazy-import contract holds (the module ships without the SDK; only .complete() needs it).
  • No change to existing behavior — the adapter is additive and unused by the orchestrator in this PR.

@PalmPalm7

Copy link
Copy Markdown
Owner Author

Moved upstream for review: rhpds#24 (requesting @rut31337). This fork PR can be closed once the upstream one is reviewed.

PalmPalm7 and others added 6 commits June 1, 2026 06:12
Adds a "Skills" tab to the sidebar that calls GET /api/skills (added in the
skills-loader change) and renders the discovered skills: name, source badge,
parsec-native badge, description, version/domain/tool-count, and any loader
warnings. Mirrors the existing Examples/Debug tab pattern.

Read-only and additive — no behavior change to the agent loop; this is an
operator-visibility surface for the skills the loader finds. Skill metadata is
rendered via textContent (never innerHTML), so arbitrary on-disk SKILL.md
content cannot inject markup.

Companion to the skills-loader PR (depends on its /api/skills endpoint).
…e path

- .skills-panel: flex:1 + min-height:0 + overflow-y:auto so a tall list
  scrolls inside the sidebar instead of overflowing the viewport
- render each skill's skill_path (where the loader discovered it from)
docs: updated contributing.md for local testing
feat(ui): read-only Skills sidebar tab (visualizes GET /api/skills)
…onfig fixes

Addresses review on rhpds#24:

- No timeout/cancellation: a hung sdk.query() leaked the coroutine and the
  child CLI process. complete() now takes a `timeout` (seconds), defaulting to
  the new `agent.sdk.timeout` config (300s; null/0 disables). The query loop is
  wrapped in asyncio.timeout(), and on expiry the result is marked an error
  instead of hanging. asyncio.timeout(None) is a no-op, so one code path covers
  bounded + unbounded.
- Fragile model capture: warn when AssistantMessage.model changes
  mid-conversation instead of silently letting "last wins".
- Incorrect session_id fallback: the `or state[...]` form read as falling back
  to a stale id; replaced with an explicit truthy guard that takes the
  ResultMessage id when present without clobbering the AssistantMessage id.
- Config access mismatch: _get_section now coerces to a real dict (mirrors
  skills loader's _get_skills_section), giving a concrete dict[str, Any] return
  type. Documented why attribute access (cfg.agent) isn't used — from_config
  must also accept the plain-dict configs the tests pass.

Adds tests: per-call + config-default timeout, model-change warning,
session_id preservation, and the timeout config knob.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f5390b694

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

for block in getattr(message, "content", []) or []:
if isinstance(block, sdk.TextBlock):
state["text_parts"].append(getattr(block, "text", ""))
elif isinstance(block, sdk.ToolUseBlock):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve tool results from assistant content blocks

When the Agent SDK emits a ToolResultBlock inside AssistantMessage.content (the Python SDK reference's progress-monitoring example handles ToolResultBlock while iterating assistant content), this loop drops it because only TextBlock and ToolUseBlock are handled here and _ingest_user() is the only place that pairs results. In SDK runs that use tools, SdkResult.tool_invocations can therefore contain the tool call but no result/is_error, so callers lose the execution outcome even though the tool completed.

Useful? React with 👍 / 👎.

PalmPalm7 and others added 2 commits June 3, 2026 11:53
…ion, phase 1) (rhpds#23)

* feat(skills): add SKILL.md loader compatible with rhdp-rca-plugin

Discovers SKILL.md directories from project, plugin, and user roots,
parses YAML frontmatter + Markdown body, and validates against the
Anthropic Agent Skills Spec v1.0. Layers a `parsec:` extension block
for version, domain, MCP deps, and permissions — unknown keys warn
rather than fail so plain Claude Code skills still load.

Includes GET /api/skills diagnostic endpoint for operators to verify
discovery, and a tests/skills_fixtures/ tree exercising all validation
paths. Foundational for upcoming Agent SDK migration (see
artifacts/parsec-agent-sdk-migration-plan.md, phase 1 PR #1).

No behavior change — additive only, default config registers ./skills
which is empty.

* fix(skills): reject symlinked skill dirs and cap SKILL.md size

Addresses review on rhpds#23:

- Symlink path traversal: iterdir() follows symlinks, so a symlinked
  child of a source root could resolve outside it (relevant because
  plugin_paths can point at external mounts). Skip symlinked skill
  directories and validate that each child resolves within the root.
  Safe for ConfigMap-mounted skills — K8s symlinks the files inside a
  skill dir (SKILL.md), not the skill dir itself.
- Unbounded read: read_text() had no size limit, so an oversized
  SKILL.md from an untrusted mount could OOM the pod. Stat and reject
  files over MAX_SKILL_SIZE_BYTES (1 MiB) before reading.

Adds tests for both: symlinked-dir skip and oversized-file rejection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Brings the SDK-adapter branch up to date with main (now includes the
merged skills loader rhpds#23, Skills UI rhpds#25, contributing docs rhpds#22, and the
admin/group-based-access changes).

Conflict: config/config.yaml — main added the `skills:` section (via rhpds#23)
and this branch added the `agent:`/`agent.sdk` section. Resolved by
keeping both; they are independent and complementary (agent.runtime
selects the runtime, skills.* configures SKILL.md discovery).

Local gate green on the merged tree: black, ruff, mypy (src/) + pytest
(87 passed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6835602197

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/routes/skills.py
Comment on lines +43 to +44
@router.get("/skills")
async def list_skills():

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require configured auth before listing skills

When Parsec relies on its in-app _check_user_allowed enforcement (as the query, conversations, learnings, and share APIs do), this new handler is reachable without any user headers or group/user whitelist check. A direct request to /api/skills can therefore enumerate configured skill names, allowed tools, metadata, and filesystem paths even for callers who would be rejected by the normal auth path; add the same Request/header parameters and _check_user_allowed gate before loading and serializing manifests.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants