openshift · blublinsky · Jun 16, 2026
diff --git a/.ai/spec/how/reconciler.md b/.ai/spec/how/reconciler.md
@@ -125,7 +125,7 @@ Audience: AI agents. Behavioral rules and phase semantics live in **what/** spec
 
 - **Constructor:** Accepts `SandboxProvider`, `client.Client`, `ClientFactory func(endpoint string) AgentHTTPClientInterface`, operator namespace. `Timeout` defaults to `defaultSandboxTimeout` const.
 - **`callWithSandbox` order:** `SetStep` on provider → `Claim` → `patchSandboxInfo` (status subresource merge) → `WaitReady` → normalize URL (`http://{endpoint}:8080` if no scheme) → `outputSchemaForStep` → `ClientFactory(endpoint).Run(ctx, "", query, schema, agentCtx)`. Template derivation (sandbox-claim mode) happens inside `SandboxManager.Claim`; bare-pod mode builds the pod spec inside `BarePodManager.Claim`.
-- **`Run` contract:** Empty `systemPrompt`; full payload in POST body per `client.go` (`query`, `outputSchema`, `context`). Path constant `/v1/agent/run`.
+- **`Run` contract:** Empty `systemPrompt`; full payload in POST body per `client.go` (`query`, `outputSchema`, `context`). Path constant `/v1/agent/run`. Response is a `{metrics, result}` envelope; `callWithSandbox` returns both the raw result JSON and the parsed `RunMetrics`.
 - **`buildAgentContext`:** `TargetNamespaces`, `ApprovedOption` / `ExecutionResult` per step, `PreviousAttempts` from failed `StepResultRef` outcomes across analysis/execution/verification result lists.
 - **`ReleaseSandboxes`:** Iterates `Status.Steps.{Analysis,Execution,Verification,Escalation}.Sandbox.ClaimName` and calls `Release` for each non-empty.
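
The URL normalization step in `callWithSandbox` (`http://{endpoint}:8080` if no scheme) can be sketched as below. This is a minimal illustration, not the operator's code: the function name `normalizeEndpoint` is hypothetical, the scheme check by prefix is an assumption, and the use of `net.JoinHostPort` (which brackets IPv6 pod IPs) goes beyond what the spec states.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// normalizeEndpoint is a hypothetical helper. It returns the endpoint
// unchanged when it already carries an HTTP scheme, and otherwise builds
// http://{endpoint}:8080 as described in the callWithSandbox bullet.
func normalizeEndpoint(endpoint string) string {
	// Assumption: "has a scheme" means an http:// or https:// prefix.
	if strings.HasPrefix(endpoint, "http://") || strings.HasPrefix(endpoint, "https://") {
		return endpoint
	}
	// JoinHostPort wraps IPv6 literals in brackets, which a plain string
	// concatenation would not do.
	return "http://" + net.JoinHostPort(endpoint, "8080")
}

func main() {
	// A bare pod IP (bare-pod mode) and a service URL that is already absolute.
	fmt.Println(normalizeEndpoint("10.0.0.5"))
	fmt.Println(normalizeEndpoint("http://agent.ns.svc.cluster.local:8080"))
}
```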
 
@@ -135,7 +135,8 @@ Audience: AI agents. Behavioral rules and phase semantics live in **what/** spec
 
 - **`AgentHTTPClientInterface`:** `Run(ctx, systemPrompt, query, outputSchema, agentCtx) (*agentRunResponse, error)`.
 - **`NewAgentHTTPClient`:** Returns concrete type with long HTTP timeout, TLS `InsecureSkipVerify` for in-cluster calls.
-- **`Run`:** Marshals `agentRunRequest`, POSTs, reads capped body size, non-200 → error with truncated body; 200 → raw JSON in `agentRunResponse.Response` for caller to unmarshal phase-specific structs.
+- **`Run`:** Marshals `agentRunRequest`, POSTs, reads capped body size, non-200 → error with truncated body; 200 → parses `{metrics, result}` envelope. Returns `agentRunResponse` containing both `Result json.RawMessage` (per-step workflow data) and `Metrics *RunMetrics` (telemetry). Callers unmarshal `Result` into phase-specific structs; `Metrics` is passed to Result CR creation for storage.
+- **`RunMetrics`:** `LatencyMs int64`, `InputTokens int64`, `OutputTokens int64`, `CostUSD *string` (nil when unknown; decimal string e.g. "0.05"), `Model string`, `Provider string`, `ToolCallsCount int`.
 
 ---
 

diff --git a/.ai/spec/what/sandbox-execution.md b/.ai/spec/what/sandbox-execution.md
@@ -12,7 +12,8 @@ Behavioral specification for how workflow steps run inside ephemeral **sandboxes
 6. **Readiness**: In `sandbox-claim` mode, the controller MUST poll sandbox/claim status until the backing `Sandbox` reports `Ready=True` (standard condition pattern) and exposes a **service FQDN** for in-cluster HTTP, or until a configurable **sandbox wait budget** elapses (error path). In `bare-pod` mode, the controller MUST poll the Pod's conditions until `Ready=True` and extract `status.podIP` as the endpoint, or until the sandbox wait budget elapses.
 7. **Endpoint construction**: Agent HTTP URL MUST be formed from the readiness endpoint; if the endpoint is not already an absolute URL with HTTP scheme, the client MUST prefix standard cluster HTTP scheme and port expected for the agent container.
 8. **HTTP contract**: Each step MUST call the agent **`POST /v1/agent/run`** with JSON body carrying at least `query`, `outputSchema`, and `context`; optional `systemPrompt` and `timeout_ms` exist in the wire shape but **system prompt MUST be sent empty** in the current implementation (prompt material lives in `query` and templates).
-9. **Response handling**: HTTP success responses MUST be parsed as JSON matching the per-step schema (analysis/execution/verification/escalation). Non-success HTTP MUST fail the step with an error surfaced to proposal conditions.
+9. **Response envelope**: HTTP success responses MUST be parsed as a `{metrics, result}` JSON envelope. The `result` field contains the per-step workflow data (analysis options, execution actions, verification checks, escalation content) matching the step's `outputSchema`. The `metrics` field contains sandbox-owned telemetry: `latency_ms`, `input_tokens`, `output_tokens`, `cost_usd` (optional, omitted when unknown), `model`, `provider`, `tool_calls_count`. Non-success HTTP MUST fail the step with an error surfaced to proposal conditions.
+9a. **Metrics handling**: The operator MUST extract `metrics` from the envelope and store them on the corresponding Result CR status (AnalysisResult, ExecutionResult, VerificationResult, EscalationResult). The operator MUST NOT rely on metrics for workflow decisions — they are observability-only data.
 10. **Output schema selection**: `outputSchema` MUST be the step-specific JSON schema: analysis schema depends on `spec.analysisOutput.mode`, whether execution/verification steps exist in the proposal, and optional injected `components` sub-schema from `spec.analysisOutput.schema`; other steps use fixed schemas for their response shapes.
 11. **Analysis query payload**: The `query` string MUST encode the user request or revision-augmented request and encode workflow flags indicating whether execution/verification steps exist (template-rendered).
 12. **Execution query payload**: The `query` MUST include JSON describing the approved remediation option.
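
Rule 9 above can be illustrated with a short Go sketch of decoding the `{metrics, result}` envelope. The field names and wire keys come from the spec text; the type name `envelope`, the function `parseEnvelope`, and the choice to reject a body with no `result` key are assumptions made for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RunMetrics carries the telemetry fields listed in rule 9, using the
// snake_case wire names from the spec.
type RunMetrics struct {
	LatencyMs      int64   `json:"latency_ms"`
	InputTokens    int64   `json:"input_tokens"`
	OutputTokens   int64   `json:"output_tokens"`
	CostUSD        *string `json:"cost_usd,omitempty"` // stays nil when the agent omits it
	Model          string  `json:"model"`
	Provider       string  `json:"provider"`
	ToolCallsCount int     `json:"tool_calls_count"`
}

// envelope is a hypothetical name for the {metrics, result} wrapper.
type envelope struct {
	Metrics *RunMetrics     `json:"metrics"`
	Result  json.RawMessage `json:"result"`
}

// parseEnvelope decodes a success response body. Result is kept as raw JSON
// so the caller can unmarshal it into the step-specific struct.
func parseEnvelope(body []byte) (*envelope, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, fmt.Errorf("decode agent response envelope: %w", err)
	}
	// Assumption: a body without a result key is treated as an error.
	if len(env.Result) == 0 {
		return nil, fmt.Errorf("agent response envelope has no result field")
	}
	return &env, nil
}

func main() {
	body := []byte(`{"metrics":{"latency_ms":1200,"input_tokens":50,"output_tokens":20,` +
		`"model":"mock","provider":"mock","tool_calls_count":2},"result":{"options":[]}}`)
	env, err := parseEnvelope(body)
	if err != nil {
		panic(err)
	}
	// cost_usd was omitted above, so CostUSD is nil.
	fmt.Println(env.Metrics.LatencyMs, env.Metrics.CostUSD == nil, string(env.Result))
}
```

Keeping `result` as `json.RawMessage` matches the split described in rule 9a: workflow logic reads only `result`, while `metrics` travels separately to the Result CR status.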

diff --git a/.tekton/integration-tests/pipelines/agentic-operator-e2e-pipeline.yaml b/.tekton/integration-tests/pipelines/agentic-operator-e2e-pipeline.yaml
@@ -143,7 +143,7 @@ spec:
               - name: NAMESPACE
                 value: "$(params.namespace)"
               - name: SANDBOX_IMAGE
-                value: "quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent"
+                value: "quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric"
             image: registry.redhat.io/openshift4/ose-cli:latest
             script: |
               set -euo pipefail

diff --git a/.tekton/integration-tests/scripts/install-operator.sh b/.tekton/integration-tests/scripts/install-operator.sh
@@ -9,7 +9,7 @@
 # Optional env:
 #   OPERATOR_NAMESPACE  (default: openshift-lightspeed)
 #   SANDBOX_MODE        (default: bare-pod)
-#   SANDBOX_IMAGE       (default: quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent)
+#   SANDBOX_IMAGE       (default: quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric)
 
 set -euo pipefail
 
@@ -18,7 +18,7 @@ set -euo pipefail
 
 OPERATOR_NAMESPACE="${OPERATOR_NAMESPACE:-openshift-lightspeed}"
 SANDBOX_MODE="${SANDBOX_MODE:-bare-pod}"
-SANDBOX_IMAGE="${SANDBOX_IMAGE:-quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent}"
+SANDBOX_IMAGE="${SANDBOX_IMAGE:-quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric}"
 
 echo "=== Agentic operator install ==="
 echo "  IMG:                ${IMG}"

diff --git a/Makefile b/Makefile
@@ -110,7 +110,7 @@ endif
 # Sandbox mode: "bare-pod" (default) or "sandbox-claim".
 SANDBOX_MODE ?= bare-pod
 # Agent sandbox image used by bare-pod mode (the container the operator creates per step).
-SANDBOX_IMAGE ?= quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent
+SANDBOX_IMAGE ?= quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric
 
 # kubernetes-sigs/agent-sandbox release reference (used only for documentation links).
 AGENT_SANDBOX_VERSION ?= v0.4.5

diff --git a/api/v1alpha1/analysisresult_types.go b/api/v1alpha1/analysisresult_types.go
@@ -50,6 +50,10 @@ type AnalysisResultStatus struct {
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=8192
 	FailureReason string `json:"failureReason,omitempty"`
+
+	// metrics contains telemetry from the sandbox agent for this step.
+	// +optional
+	Metrics StepMetrics `json:"metrics,omitzero"`
 }
 
 // AnalysisResultSpec contains the immutable identity fields for an AnalysisResult.

diff --git a/api/v1alpha1/escalationresult_types.go b/api/v1alpha1/escalationresult_types.go
@@ -55,6 +55,10 @@ type EscalationResultStatus struct {
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=8192
 	FailureReason string `json:"failureReason,omitempty"`
+
+	// metrics contains telemetry from the sandbox agent for this step.
+	// +optional
+	Metrics StepMetrics `json:"metrics,omitzero"`
 }
 
 // EscalationResultSpec contains the immutable identity fields for an EscalationResult.

diff --git a/api/v1alpha1/executionresult_types.go b/api/v1alpha1/executionresult_types.go
@@ -55,6 +55,10 @@ type ExecutionResultStatus struct {
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=8192
 	FailureReason string `json:"failureReason,omitempty"`
+
+	// metrics contains telemetry from the sandbox agent for this step.
+	// +optional
+	Metrics StepMetrics `json:"metrics,omitzero"`
 }
 
 // ExecutionResultSpec contains the immutable identity fields for an ExecutionResult.

diff --git a/api/v1alpha1/shared_types.go b/api/v1alpha1/shared_types.go
@@ -191,3 +191,47 @@ type SkillsSource struct {
 	// +kubebuilder:validation:items:MaxLength=512
 	Paths []string `json:"paths,omitempty"`
 }
+
+// StepMetrics contains telemetry data collected during a workflow step execution.
+// Populated from the sandbox agent's response envelope.
+type StepMetrics struct {
+	// latencyMs is the wall-clock time (milliseconds) the agent spent processing.
+	// +required
+	// +kubebuilder:validation:Minimum=0
+	LatencyMs *int64 `json:"latencyMs,omitempty"`
+
+	// inputTokens is the number of input tokens consumed by the LLM.
+	// +optional
+	// +kubebuilder:validation:Minimum=0
+	InputTokens *int64 `json:"inputTokens,omitempty"`
+
+	// outputTokens is the number of output tokens produced by the LLM.
+	// +optional
+	// +kubebuilder:validation:Minimum=0
+	OutputTokens *int64 `json:"outputTokens,omitempty"`
+
+	// costUsd is the estimated cost in US dollars for this step, if known.
+	// Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
+	// +optional
+	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:MaxLength=32
+	// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+(\\\\.[0-9]+)?$')",message="costUsd must be a decimal number string (e.g. '0.05')"
+	CostUSD string `json:"costUsd,omitempty"`
+
+	// model is the LLM model used (e.g. "claude-opus-4-6").
+	// +optional
+	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:MaxLength=128
+	Model string `json:"model,omitempty"`
+
+	// provider is the LLM provider used (e.g. "anthropic", "openai").
+	// +optional
+	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:MaxLength=64
+	Provider string `json:"provider,omitempty"`
+
+	// toolCallsCount is the number of tool invocations the agent made.
+	// +optional
+	// +kubebuilder:validation:Minimum=0
+	ToolCallsCount *int32 `json:"toolCallsCount,omitempty"`
+}
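
A possible mapping from the HTTP client's `RunMetrics` to the `StepMetrics` API type is sketched below. The struct fields follow the definitions in this change (kubebuilder markers omitted); the helper names `ptr` and `toStepMetrics`, and the clamping of `ToolCallsCount` to the `int32` range, are assumptions and may differ from the actual conversion code.

```go
package main

import (
	"fmt"
	"math"
)

// RunMetrics matches the field list given in the reconciler spec.
type RunMetrics struct {
	LatencyMs      int64
	InputTokens    int64
	OutputTokens   int64
	CostUSD        *string
	Model          string
	Provider       string
	ToolCallsCount int
}

// StepMetrics matches the API type added in shared_types.go.
type StepMetrics struct {
	LatencyMs      *int64 `json:"latencyMs,omitempty"`
	InputTokens    *int64 `json:"inputTokens,omitempty"`
	OutputTokens   *int64 `json:"outputTokens,omitempty"`
	CostUSD        string `json:"costUsd,omitempty"`
	Model          string `json:"model,omitempty"`
	Provider       string `json:"provider,omitempty"`
	ToolCallsCount *int32 `json:"toolCallsCount,omitempty"`
}

// ptr returns a pointer to a copy of v (hypothetical helper).
func ptr[T any](v T) *T { return &v }

// toStepMetrics is a hypothetical converter. A nil input yields the zero
// StepMetrics, which the `omitzero` tag on the status field leaves out of
// the serialized object.
func toStepMetrics(m *RunMetrics) StepMetrics {
	if m == nil {
		return StepMetrics{}
	}
	calls := m.ToolCallsCount
	if calls > math.MaxInt32 { // guard the int -> int32 narrowing
		calls = math.MaxInt32
	}
	out := StepMetrics{
		LatencyMs:      ptr(m.LatencyMs),
		InputTokens:    ptr(m.InputTokens),
		OutputTokens:   ptr(m.OutputTokens),
		Model:          m.Model,
		Provider:       m.Provider,
		ToolCallsCount: ptr(int32(calls)),
	}
	if m.CostUSD != nil {
		out.CostUSD = *m.CostUSD // unknown cost stays "" and is omitted
	}
	return out
}

func main() {
	cost := "0.05"
	s := toStepMetrics(&RunMetrics{LatencyMs: 1200, CostUSD: &cost, Model: "mock", ToolCallsCount: 2})
	fmt.Println(*s.LatencyMs, s.CostUSD, *s.ToolCallsCount)
}
```

The pointer fields let a legitimate zero (for example `latencyMs: 0`) survive `omitempty`, which matters because `latencyMs` is marked required in the generated CRD.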
diff --git a/api/v1alpha1/verificationresult_types.go b/api/v1alpha1/verificationresult_types.go
@@ -56,6 +56,10 @@ type VerificationResultStatus struct {
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=8192
 	FailureReason string `json:"failureReason,omitempty"`
+
+	// metrics contains telemetry from the sandbox agent for this step.
+	// +optional
+	Metrics StepMetrics `json:"metrics,omitzero"`
 }
 
 // VerificationResultSpec contains the immutable identity fields for a VerificationResult.

diff --git a/config/crd/bases/agentic.openshift.io_analysisresults.yaml b/config/crd/bases/agentic.openshift.io_analysisresults.yaml
@@ -136,6 +136,58 @@ spec:
                 maxLength: 8192
                 minLength: 1
                 type: string
+              metrics:
+                description: metrics contains telemetry from the sandbox agent for
+                  this step.
+                properties:
+                  costUsd:
+                    description: |-
+                      costUsd is the estimated cost in US dollars for this step, if known.
+                      Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
+                    maxLength: 32
+                    minLength: 1
+                    type: string
+                    x-kubernetes-validations:
+                    - message: costUsd must be a decimal number string (e.g. '0.05')
+                      rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
+                  inputTokens:
+                    description: inputTokens is the number of input tokens consumed
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  latencyMs:
+                    description: latencyMs is the wall-clock time (milliseconds) the
+                      agent spent processing.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  model:
+                    description: model is the LLM model used (e.g. "claude-opus-4-6").
+                    maxLength: 128
+                    minLength: 1
+                    type: string
+                  outputTokens:
+                    description: outputTokens is the number of output tokens produced
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  provider:
+                    description: provider is the LLM provider used (e.g. "anthropic",
+                      "openai").
+                    maxLength: 64
+                    minLength: 1
+                    type: string
+                  toolCallsCount:
+                    description: toolCallsCount is the number of tool invocations
+                      the agent made.
+                    format: int32
+                    minimum: 0
+                    type: integer
+                required:
+                - latencyMs
+                type: object
               options:
                 description: options contains the remediation options returned by
                   the analysis agent.
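
The `costUsd` validation in the CRD above (length bounds plus the CEL `matches` rule) can be mirrored on the client side with Go's `regexp` package, which uses the same RE2 syntax. This is a sketch for pre-checking values before a status update; the function name `validCostUSD` is hypothetical and nothing in the change indicates the operator performs such a check.

```go
package main

import (
	"fmt"
	"regexp"
)

// costPattern is the same expression as the CRD rule:
// one or more digits, optionally followed by a dot and more digits.
var costPattern = regexp.MustCompile(`^[0-9]+(\.[0-9]+)?$`)

// validCostUSD is a hypothetical helper that applies the CRD's length
// bounds (1..32) and the decimal-string pattern.
func validCostUSD(s string) bool {
	return len(s) >= 1 && len(s) <= 32 && costPattern.MatchString(s)
}

func main() {
	// Signed values, exponents, and bare leading or trailing dots are rejected.
	for _, s := range []string{"0.05", "12", ".5", "1.", "-1", "1e3"} {
		fmt.Println(s, validCostUSD(s))
	}
}
```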

diff --git a/config/crd/bases/agentic.openshift.io_escalationresults.yaml b/config/crd/bases/agentic.openshift.io_escalationresults.yaml
@@ -142,6 +142,58 @@ spec:
                 maxLength: 8192
                 minLength: 1
                 type: string
+              metrics:
+                description: metrics contains telemetry from the sandbox agent for
+                  this step.
+                properties:
+                  costUsd:
+                    description: |-
+                      costUsd is the estimated cost in US dollars for this step, if known.
+                      Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
+                    maxLength: 32
+                    minLength: 1
+                    type: string
+                    x-kubernetes-validations:
+                    - message: costUsd must be a decimal number string (e.g. '0.05')
+                      rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
+                  inputTokens:
+                    description: inputTokens is the number of input tokens consumed
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  latencyMs:
+                    description: latencyMs is the wall-clock time (milliseconds) the
+                      agent spent processing.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  model:
+                    description: model is the LLM model used (e.g. "claude-opus-4-6").
+                    maxLength: 128
+                    minLength: 1
+                    type: string
+                  outputTokens:
+                    description: outputTokens is the number of output tokens produced
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  provider:
+                    description: provider is the LLM provider used (e.g. "anthropic",
+                      "openai").
+                    maxLength: 64
+                    minLength: 1
+                    type: string
+                  toolCallsCount:
+                    description: toolCallsCount is the number of tool invocations
+                      the agent made.
+                    format: int32
+                    minimum: 0
+                    type: integer
+                required:
+                - latencyMs
+                type: object
               sandbox:
                 description: sandbox tracks the sandbox pod used for this escalation.
                 properties:

diff --git a/config/crd/bases/agentic.openshift.io_executionresults.yaml b/config/crd/bases/agentic.openshift.io_executionresults.yaml
@@ -202,6 +202,58 @@ spec:
                 maxLength: 8192
                 minLength: 1
                 type: string
+              metrics:
+                description: metrics contains telemetry from the sandbox agent for
+                  this step.
+                properties:
+                  costUsd:
+                    description: |-
+                      costUsd is the estimated cost in US dollars for this step, if known.
+                      Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
+                    maxLength: 32
+                    minLength: 1
+                    type: string
+                    x-kubernetes-validations:
+                    - message: costUsd must be a decimal number string (e.g. '0.05')
+                      rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
+                  inputTokens:
+                    description: inputTokens is the number of input tokens consumed
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  latencyMs:
+                    description: latencyMs is the wall-clock time (milliseconds) the
+                      agent spent processing.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  model:
+                    description: model is the LLM model used (e.g. "claude-opus-4-6").
+                    maxLength: 128
+                    minLength: 1
+                    type: string
+                  outputTokens:
+                    description: outputTokens is the number of output tokens produced
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  provider:
+                    description: provider is the LLM provider used (e.g. "anthropic",
+                      "openai").
+                    maxLength: 64
+                    minLength: 1
+                    type: string
+                  toolCallsCount:
+                    description: toolCallsCount is the number of tool invocations
+                      the agent made.
+                    format: int32
+                    minimum: 0
+                    type: integer
+                required:
+                - latencyMs
+                type: object
               sandbox:
                 description: sandbox tracks the sandbox pod used for this execution.
                 properties: