diff --git a/.ai/spec/what/telemetry.md b/.ai/spec/what/telemetry.md new file mode 100644 index 0000000..270ecca --- /dev/null +++ b/.ai/spec/what/telemetry.md @@ -0,0 +1,249 @@ +# Agentic Telemetry Data Model + +## Overview + +The agentic system collects telemetry at multiple levels of detail, designed to be extensible without breaking downstream consumers. + +## Telemetry Tiers + +| Tier | Scope | Description | +|------|-------|-------------| +| Step metrics | Individual agent call | Cost, latency, token counts, model/provider per step | +| Proposal summary | Entire proposal lifecycle | Aggregated outcome, total cost/duration, approval decisions | +| Transcript | Prompt/response/tool calls | Full LLM interaction trace for debugging and analysis | +| Audit trail | Identity, authorization, decisions | Compliance-grade record of who did what and why | + +--- + +## Step Metrics + +Collected from the sandbox agent's response envelope and stored on each Result CR status. The data is available in-cluster but not yet exported to any external telemetry system. + +### Schema: `StepMetrics` + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `latencyMs` | int64 | Yes | Wall-clock time (ms) the agent spent processing | +| `inputTokens` | int64 | No | Input tokens consumed by the LLM | +| `outputTokens` | int64 | No | Output tokens produced by the LLM | +| `costUsd` | string | No | Estimated cost in USD (decimal string, e.g. "0.05"). Omitted when unknown. | +| `model` | string | No | LLM model used (e.g. "claude-opus-4-6") | +| `provider` | string | No | LLM provider (e.g. "anthropic", "openai") | +| `toolCallsCount` | int32 | No | Number of tool invocations the agent made | + +### Storage location + +Each Result CR (AnalysisResult, ExecutionResult, VerificationResult, EscalationResult) has: +```yaml +status: + metrics: + latencyMs: 4500 + inputTokens: 1200 + outputTokens: 350 + costUsd: "0.03" + model: claude-opus-4-6 + provider: anthropic + toolCallsCount: 5 +``` + +--- + +## Proposal Summary + +Aggregated from step metrics + proposal lifecycle events. The proposal summary is a nested structure: proposal-level aggregates at the top, with per-step detail embedded. The per-step detail is derived directly from the Step Metrics tier (read from Result CR `.status.metrics` at terminal phase). + +### Proposal-level fields + +| Field | Type | Source | Description | +|-------|------|--------|-------------| +| `proposalName` | string | Proposal CR | Unique identifier | +| `namespace` | string | Proposal CR | Proposal namespace | +| `finalPhase` | string | DerivePhase() | Terminal phase (Completed, Failed, Denied, Escalated) | +| `totalLatencyMs` | int64 | Sum of steps | End-to-end processing time | +| `totalCostUsd` | string | Sum of steps | Total LLM cost | +| `totalInputTokens` | int64 | Sum of steps | Total input tokens across all steps | +| `totalOutputTokens` | int64 | Sum of steps | Total output tokens across all steps | +| `approvalDecision` | enum | ProposalApproval | Approved, Denied, TimedOut | +| `targetNamespaces` | []string | Proposal spec | Cluster namespaces targeted | +| `createdAt` | timestamp | Proposal CR | When the proposal was created | +| `completedAt` | timestamp | Terminal condition | When it reached terminal state | + +### Per-step fields (nested under `steps[]`) + +Each entry corresponds to a workflow step that ran. Metrics fields come directly from Result CR `.status.metrics`. + +| Field | Type | Source | Description | +|-------|------|--------|-------------| +| `name` | string | Step type | "analysis", "execution", "verification", "escalation" | +| `outcome` | enum | Result CR condition | Succeeded, Failed | +| `model` | string | StepMetrics | LLM model used for this step | +| `provider` | string | StepMetrics | LLM provider used for this step | +| `latencyMs` | int64 | StepMetrics | Wall-clock time for this step | +| `inputTokens` | int64 | StepMetrics | Input tokens for this step | +| `outputTokens` | int64 | StepMetrics | Output tokens for this step | +| `costUsd` | string | StepMetrics | Cost for this step | +| `toolCallsCount` | int32 | StepMetrics | Tool invocations for this step | +| `failureReason` | string | Result CR status | Why this step failed (if applicable) | +| `retryCount` | int32 | Execution status | Retries (execution only) | + +### Example JSON + +```json +{ + "proposalName": "fix-oom-staging", + "namespace": "openshift-lightspeed", + "finalPhase": "Completed", + "totalLatencyMs": 12500, + "totalCostUsd": "0.12", + "totalInputTokens": 3500, + "totalOutputTokens": 900, + "approvalDecision": "Approved", + "targetNamespaces": ["staging"], + "createdAt": "2026-06-16T10:00:00Z", + "completedAt": "2026-06-16T10:02:30Z", + "steps": [ + { + "name": "analysis", + "outcome": "Succeeded", + "model": "claude-opus-4-6", + "provider": "anthropic", + "latencyMs": 4500, + "inputTokens": 1200, + "outputTokens": 350, + "costUsd": "0.03", + "toolCallsCount": 5 + }, + { + "name": "execution", + "outcome": "Succeeded", + "model": "claude-opus-4-6", + "provider": "anthropic", + "latencyMs": 6000, + "inputTokens": 1500, + "outputTokens": 400, + "costUsd": "0.05", + "toolCallsCount": 8, + "retryCount": 0 + }, + { + "name": "verification", + "outcome": "Succeeded", + "model": "claude-opus-4-6", + "provider": "anthropic", + "latencyMs": 2000, + "inputTokens": 800, + "outputTokens": 150, + "costUsd": "0.04", + "toolCallsCount": 3 + } + ] +} +``` + +### Open questions + +- Where is the summary computed? Operator at terminal phase? An exporter job? +- Where is it stored? Proposal status (ephemeral if TTL deletes it), or pushed to external system? +- What's the export target? Dataverse? Prometheus? Both? + +--- + +## Transcript + +Per-step detail including prompt content, tool call traces, and LLM reasoning. This is the **agent's responsibility** — the operator does not collect or store transcript data. The agent logs this information to its own stdout (available via pod logs / cluster log aggregator). + +### Data captured by the agent + +| Field | Type | Description | Notes | +|-------|------|-------------|-------| +| `promptInput` | string | Full prompt sent to LLM | Already in Proposal spec (request text) | +| `llmOutput` | string | Raw LLM response text | Already stored in Result CRs (options, actions, checks) | +| `toolCalls` | []ToolCall | Each tool invocation (name, args, result, duration) | Agent logs only | +| `reasoningTrace` | string | Chain-of-thought / scratchpad (if available) | Agent logs only | +| `inferenceId` | string | Provider-assigned inference ID | Agent logs only | +| `modelVersion` | string | Exact model version/checkpoint | Agent logs only | + +### Per tool call + +| Field | Type | Description | +|-------|------|-------------| +| `name` | string | Tool name (e.g. "kubectl_get") | +| `arguments` | string | Arguments passed | +| `result` | string | Tool output (may contain cluster state) | +| `durationMs` | int64 | Time spent in tool | +| `success` | bool | Whether tool succeeded | + +### Notes + +- The agent already logs transcripts to stdout during execution +- Available via `oc logs` on the sandbox pod (or cluster log aggregator) +- Too large for CRDs (prompts + tool outputs can be 100KB+) +- Must be opt-in for export due to sensitivity + +--- + +## Audit Trail + +Required for PCI DSS 10.2, NIST 800-53 AU-10 compliance. + +Audit Trail extends the Proposal Summary with identity, authorization, and compliance fields. It is not a separate data structure — it adds columns to the same record. + +### Additional proposal-level fields (beyond Proposal Summary) + +| Field | Type | Description | +|-------|------|-------------| +| `escalationTrigger` | string | Why human review was required | +| `reviewerIdentity` | string | Who approved/denied execution | +| `reviewTimestamp` | timestamp | When human intervention occurred | + +### Additional per-step fields (beyond Proposal Summary `steps[]`) + +| Field | Type | Description | +|-------|------|-------------| +| `agentIdentity` | string | ServiceAccount the pod ran as | +| `agentName` | string | Agent CR name used for this step | +| `authorizationScope` | []string | RBAC permissions granted (execution step only) | + +> Note: Fields for failed proposal recovery (restart/retry) will be added when OLS-3278 is implemented. + +### Behavioral telemetry + +Behavioral metrics (action velocity, permission escalation rate, cross-boundary calls, exception rate) are not collected directly — they are computed from the raw telemetry above. Design of derived metrics belongs in the dashboard/alerting layer, not the data model. + +--- + +## Data Classification + +| Data category | Sensitivity | Default treatment | Notes | +|---------------|-------------|-------------------|-------| +| Token counts, latency, cost | **Non-sensitive** | Ship as-is | Operational metrics | +| Model/provider names | **Non-sensitive** | Ship as-is | Infrastructure metadata | +| Proposal outcome, retry count | **Non-sensitive** | Ship as-is | Workflow state | +| Target namespaces | **Low** | Ship as-is | Cluster topology hint | +| Approval decision + stage | **Low** | Ship as-is | Workflow decision | +| Failure reason (condition message) | **Medium** | Redact cluster-specific details | May contain resource names | +| Proposal request text | **Medium** | Opt-in only | May contain PII or context | +| Reviewer/approver identity | **Medium** | Hash for external export | Employee information | +| Tool outputs (pod logs, oc output) | **High** | Do not export without explicit opt-in | Contains cluster state, possibly secrets | +| Prompt content | **High** | Do not export | May contain injected secrets or context | +| LLM reasoning trace | **High** | Do not export | May reflect sensitive logic | + +--- + +## Extensibility Design + +1. **Additive fields only** — new fields added as optional; never remove or rename existing ones +2. **Tiers are independent** — Step Metrics works without Proposal Summary; Proposal Summary works without Transcript +3. **Per-step metrics on Result CRs** — the atomic unit; proposal summaries are computed from these +4. **Export format is decoupled from storage** — CRs store structured data; exporters transform to target format (Dataverse, Prometheus, etc.) +5. **Sensitivity tiers gate export** — non-sensitive ships by default; medium requires opt-in; high requires explicit configuration + +--- + +## Open Questions (Requiring BU Input) + +1. Which Proposal Summary fields does product actually need for "make the system better" analytics? +2. What is the Dataverse table structure? Can we reuse the existing OLS transcript schema or need a new one? +3. What's the export cadence — real-time (webhook/stream) or batch (periodic job)? +4. Is there a retention policy — how long should telemetry be kept? +5. For compliance (Audit Trail) — what regulations apply to this product specifically?