feat(agent): Icinga sub-agent on the Agent SDK — skill + profile (migration, phase 2 pilot)#32
feat(agent): Icinga sub-agent on the Agent SDK — skill + profile (migration, phase 2 pilot)#32PalmPalm7 wants to merge 2 commits into
Conversation
Route a sub-agent task to the legacy Anthropic loop or the Claude Agent SDK adapter based on agent.runtime (legacy|sdk, default legacy), returning the same structured result dict either way. Additive and dormant: nothing in the request path imports it yet (mirrors how the rhpds#24 adapter shipped behind the flag). A follow-up PR wires the Icinga sub-agent to dispatch through it. With the default legacy runtime it is a transparent pass-through to src.agent.agents.run_sub_agent — zero behavior change. The SDK branch surfaces token/cost/cache usage under data.usage, which the Phase-2 cost benchmark compares against legacy. 11 tests: runtime resolution, legacy pass-through, SDK result normalization, and SDK-unavailable error handling. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Splits the Icinga sub-agent into a first-class skill + a thin SDK profile, the Phase-2 pilot for running a sub-agent on the Claude Agent SDK. - skills/icinga-triage/SKILL.md: the Icinga triage workflow as an Agent Skill (state model, the two monitoring GitHub repos, the Step-0->diagnose procedure, gated write ops). Loads strict with zero warnings; parsec-native, domain=icinga. This is the "skill = reusable capability" half. - src/agent/icinga_sdk.py: build_icinga_sdk_profile() returns the skill + the monitoring-mcp (SSE) and GitHub (HTTP) MCP servers the legacy query_icinga / github tools already use, so the SDK consumes the same backends directly. Config-only and SDK-import-light (unit-testable). - runner.py: the SDK branch now applies the per-agent profile (sdk_profile_for) — Icinga loads its skill + servers; other agents get an empty profile. The "agent = running instance that loads the skill" half. Still gated by agent.runtime: sdk (default legacy) -> zero behavior change. Skill discovery in-cluster depends on the image baking skills/ (rhpds#27); the runner seam depends on rhpds#30. End-to-end Icinga-on-SDK run is verified in the personal NERC cluster (results to be commented on the PR). 7 Icinga tests + runner suite green; full suite passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Test results.
|
|
In-cluster cost A/B — measured (personal NERC cluster, Vertex
Findings:
Harness: |
Code Review — PR #32 (Draft)Scope: 5 files, +689 lines — Icinga sub-agent as Agent SDK pilot Findings (5)1. 2. When 3. SKILL.md 4. SDK path does not pass 5. Third copy of Cross-PR patterns
|
What
The Phase-2 pilot: run one sub-agent (Icinga) on the Claude Agent SDK, split into the "skill vs agent" model the migration is built around.
skills/icinga-triage/SKILL.md— the Icinga triage workflow as a first-class Agent Skill (state model, the two monitoring GitHub repos, the Step-0→diagnose procedure, gated write ops). Loads strict, 0 warnings,parsec-native,domain: icinga. This is the reusable capability.src/agent/icinga_sdk.py—build_icinga_sdk_profile()returns the skill + the same backends the legacy tools use: themonitoring-mcpsidecar (SSE) and the GitHub MCP (HTTP). Both are real MCP servers, so the SDK consumes them directly — no per-tool shim. Config-only, SDK-import-light → unit-testable.runner.py— the SDK branch now applies a per-agent profile (sdk_profile_for): Icinga loads its skill + servers; other agents get an empty profile. The agent = the running instance that loads the skill.Why
Icinga is the ideal pilot — a clean 3-tool boundary, one MCP, a prompt that maps 1:1 to a SKILL.md, and (from the MLflow baseline) the most expensive legacy sub-agent (~$1.38/query at ~452K uncached input tokens), so it's the biggest cost-saving target for the SDK's prompt-caching.
Scope / safety
Gated by
agent.runtime: sdk(defaultlegacy) → zero behavior change by default. The write-capable Icinga tools (acknowledge_problem/schedule_downtime/…) stay gated in the skill ("only when the user explicitly requests").Dependencies
skills/into the image) — needed for in-cluster skill discovery.How to test
Result (local gate)
black✓ ·ruff✓ ·mypy✓pytest tests/test_icinga_sdk.py→ 7 passed; full suite → 105 passed, no regressionsPlan:
artifacts/parsec-phase2-plan.md.