feat(llm): MLflow tracing for the Agent SDK subprocess (migration, phase 2) #31
PalmPalm7 wants to merge 1 commit into
The SDK runs the Claude Code CLI as a child process, so the legacy
app-side MLflow collector doesn't see its work. Trace it the way RHDP
already traces Claude Code itself: by environment, so the subprocess
exports its own claude_code.* spans (tokens, per-turn latency, tool
calls) to the same MLflow server.
- src/llm/sdk_tracing.py: build_tracing_env() derives MLFLOW_* env from
the existing mlflow.* config (uri, experiment, basic-auth); returns {}
when tracking_url is empty, so it's a no-op by default like the
collector. build_hooks_settings() returns the settings.json hooks block
(Stop + SessionStart) for hook-based wiring.
- AgentSdkClient.from_config now merges the tracing env into the SDK
subprocess env automatically; an explicit agent.sdk.env wins on conflict.
10 tracing tests; adapter suite still green. End-to-end span export is
verified in-cluster (MLflow experiment parsec-agent-metrics).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
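The env-based wiring described in the commit message can be sketched roughly as below. This is a minimal illustration, not the code in src/llm/sdk_tracing.py: the config keys `experiment`, `username`, and `password`, the value `"true"` for MLFLOW_CLAUDE_TRACING_ENABLED, and the helper `merge_subprocess_env` are assumptions made for the example. MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD are MLflow's standard basic-auth variables.

```python
from typing import Any, Mapping, Optional


def build_tracing_env(config: Mapping[str, Any]) -> dict[str, str]:
    """Derive MLFLOW_* variables for the SDK subprocess from mlflow.* config.

    Returns an empty dict when mlflow.tracking_url is unset, so tracing
    stays off by default.
    """
    mlflow_cfg = config.get("mlflow") or {}
    tracking_url = str(mlflow_cfg.get("tracking_url") or "").strip()
    if not tracking_url:
        return {}

    env = {
        # Assumed flag value; the PR only names the variable.
        "MLFLOW_CLAUDE_TRACING_ENABLED": "true",
        "MLFLOW_TRACKING_URI": tracking_url,
    }

    experiment = mlflow_cfg.get("experiment")  # assumed key name
    if experiment:
        env["MLFLOW_EXPERIMENT_NAME"] = str(experiment)

    # Assumed key names for the basic-auth credentials.
    username = mlflow_cfg.get("username")
    password = mlflow_cfg.get("password")
    if username and password:
        env["MLFLOW_TRACKING_USERNAME"] = str(username)
        env["MLFLOW_TRACKING_PASSWORD"] = str(password)

    return env


def merge_subprocess_env(
    tracing_env: Mapping[str, str],
    explicit_env: Optional[Mapping[str, str]],
) -> dict[str, str]:
    """Hypothetical helper: explicit agent.sdk.env entries override tracing env."""
    # Later mappings win in a dict merge, so explicit_env takes precedence.
    return {**tracing_env, **(explicit_env or {})}
```

Putting the explicit env last in the merge is what makes an agent.sdk.env entry win on conflict, and the early return keeps the default deploy unchanged when no tracking URL is configured.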
Test results.
Code Review — PR #31 (Draft)
Scope: 3 files, +214 lines — MLflow tracing for Agent SDK subprocess
Findings (2)
1.
2. Duplicated
No blocking issues. Core tracing logic is correct —
What
The Agent SDK (#24) runs the Claude Code CLI as a child process, so the legacy app-side MLflow collector (src/connections/mlflow_tracking.py) never sees its LLM calls. This adds src/llm/sdk_tracing.py to trace the SDK path the way RHDP already traces Claude Code itself, by environment, so the subprocess exports its own claude_code.* spans (tokens, per-turn latency, tool calls) to the same MLflow server/experiment.
Why
Phase-2 needs trustworthy per-call cost/latency/tool numbers to benchmark the SDK path against legacy. This is the measurement layer.
How
- build_tracing_env(config) derives MLFLOW_CLAUDE_TRACING_ENABLED / MLFLOW_TRACKING_URI / MLFLOW_EXPERIMENT_NAME (+ basic-auth) from the existing mlflow.* config, the same reads as init_mlflow. Returns {} when tracking_url is empty, so it's a no-op by default, exactly like the collector.
- AgentSdkClient.from_config merges that env into the SDK subprocess env automatically; an explicit agent.sdk.env wins on conflict.
- build_hooks_settings() returns the settings.json hooks block (Stop + SessionStart) for deployments that prefer hook-based wiring.
Scope / safety
Additive; only affects the sdk runtime, and only when mlflow.tracking_url is set. No change to the legacy path or default deploy.
How to test
Result (local gate)
- black ✓ · ruff ✓ · mypy ✓
- pytest tests/test_sdk_tracing.py → 9 passed; full suite → 96 passed, no regressions
- parsec-agent-metrics is pending in-cluster verification; results will be commented below once run.
Builds on #24 (SDK adapter) and #30 (AgentRunner). Plan: artifacts/parsec-agent-sdk-migration-plan.md.
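For the hook-based wiring mentioned under How, the settings.json block might have roughly the following shape. This is a sketch under assumptions: `sketch_hooks_settings` is a hypothetical stand-in for build_hooks_settings() (which takes no arguments in the PR), and the hook commands are placeholders, since the PR text does not state which commands the real hooks run. Only the overall layout (event name, then a list of entries holding `type: command` hooks) follows the Claude Code settings format.

```python
import json
from typing import Any


def sketch_hooks_settings(stop_command: str, session_start_command: str) -> dict[str, Any]:
    """Build a settings.json-style hooks block with Stop and SessionStart entries."""

    def entry(command: str) -> list[dict[str, Any]]:
        # One entry per event, each running a single shell command.
        return [{"hooks": [{"type": "command", "command": command}]}]

    return {
        "hooks": {
            "Stop": entry(stop_command),
            "SessionStart": entry(session_start_command),
        }
    }


if __name__ == "__main__":
    # Placeholder commands; substitute whatever the deployment's tracing hooks require.
    settings = sketch_hooks_settings(
        stop_command="echo stop-hook-placeholder",
        session_start_command="echo session-start-placeholder",
    )
    print(json.dumps(settings, indent=2))
```

Returning a plain dict keeps the block easy to merge into an existing settings.json rather than overwriting it.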