diff --git a/.ai/spec/how/reconciler.md b/.ai/spec/how/reconciler.md
index 62bd0997..5bf77061 100644
--- a/.ai/spec/how/reconciler.md
+++ b/.ai/spec/how/reconciler.md
@@ -125,7 +125,7 @@ Audience: AI agents. Behavioral rules and phase semantics live in **what/** spec
 
 - **Constructor:** Accepts `SandboxProvider`, `client.Client`, `ClientFactory func(endpoint string) AgentHTTPClientInterface`, operator namespace. `Timeout` defaults to `defaultSandboxTimeout` const.
 - **`callWithSandbox` order:** `SetStep` on provider → `Claim` → `patchSandboxInfo` (status subresource merge) → `WaitReady` → normalize URL (`http://{endpoint}:8080` if no scheme) → `outputSchemaForStep` → `ClientFactory(endpoint).Run(ctx, "", query, schema, agentCtx)`. Template derivation (sandbox-claim mode) happens inside `SandboxManager.Claim`; bare-pod mode builds the pod spec inside `BarePodManager.Claim`.
-- **`Run` contract:** Empty `systemPrompt`; full payload in POST body per `client.go` (`query`, `outputSchema`, `context`). Path constant `/v1/agent/run`.
+- **`Run` contract:** Empty `systemPrompt`; full payload in POST body per `client.go` (`query`, `outputSchema`, `context`). Path constant `/v1/agent/run`. Response is a `{metrics, result}` envelope; `callWithSandbox` returns both the raw result JSON and the parsed `RunMetrics`.
 - **`buildAgentContext`:** `TargetNamespaces`, `ApprovedOption` / `ExecutionResult` per step, `PreviousAttempts` from failed `StepResultRef` outcomes across analysis/execution/verification result lists.
 - **`ReleaseSandboxes`:** Iterates `Status.Steps.{Analysis,Execution,Verification,Escalation}.Sandbox.ClaimName` and calls `Release` for each non-empty.
 
@@ -135,7 +135,8 @@ Audience: AI agents. Behavioral rules and phase semantics live in **what/** spec
 
 - **`AgentHTTPClientInterface`:** `Run(ctx, systemPrompt, query, outputSchema, agentCtx) (*agentRunResponse, error)`.
 - **`NewAgentHTTPClient`:** Returns concrete type with long HTTP timeout, TLS `InsecureSkipVerify` for in-cluster calls.
-- **`Run`:** Marshals `agentRunRequest`, POSTs, reads capped body size, non-200 → error with truncated body; 200 → raw JSON in `agentRunResponse.Response` for caller to unmarshal phase-specific structs.
+- **`Run`:** Marshals `agentRunRequest`, POSTs, reads capped body size; non-200 → error with truncated body; 200 → parses the `{metrics, result}` envelope and returns an error if the body fails to parse or if `result` is missing or `null`. Returns `agentRunResponse` containing both `Result json.RawMessage` (per-step workflow data) and `Metrics *RunMetrics` (telemetry; `nil` when the envelope carries no `metrics`). Callers unmarshal `Result` into phase-specific structs; `Metrics` is passed to Result CR creation for storage.
+- **`RunMetrics`:** `LatencyMs int64`, `InputTokens int64`, `OutputTokens int64`, `CostUSD *string` (nil when unknown; decimal string e.g. "0.05"), `Model string`, `Provider string`, `ToolCallsCount int`.
 
 ---
 
diff --git a/.ai/spec/what/sandbox-execution.md b/.ai/spec/what/sandbox-execution.md
index d3a9dcb8..4f4e23d4 100644
--- a/.ai/spec/what/sandbox-execution.md
+++ b/.ai/spec/what/sandbox-execution.md
@@ -12,7 +12,8 @@ Behavioral specification for how workflow steps run inside ephemeral **sandboxes
 6. **Readiness**: In `sandbox-claim` mode, the controller MUST poll sandbox/claim status until the backing `Sandbox` reports `Ready=True` (standard condition pattern) and exposes a **service FQDN** for in-cluster HTTP, or until a configurable **sandbox wait budget** elapses (error path). In `bare-pod` mode, the controller MUST poll the Pod's conditions until `Ready=True` and extract `status.podIP` as the endpoint, or until the sandbox wait budget elapses.
 7. **Endpoint construction**: Agent HTTP URL MUST be formed from the readiness endpoint; if the endpoint is not already an absolute URL with HTTP scheme, the client MUST prefix standard cluster HTTP scheme and port expected for the agent container.
 8. **HTTP contract**: Each step MUST call the agent **`POST /v1/agent/run`** with JSON body carrying at least `query`, `outputSchema`, and `context`; optional `systemPrompt` and `timeout_ms` exist in the wire shape but **system prompt MUST be sent empty** in the current implementation (prompt material lives in `query` and templates).
-9. **Response handling**: HTTP success responses MUST be parsed as JSON matching the per-step schema (analysis/execution/verification/escalation). Non-success HTTP MUST fail the step with an error surfaced to proposal conditions.
+9. **Response envelope**: HTTP success responses MUST be parsed as a `{metrics, result}` JSON envelope. The `result` field contains the per-step workflow data (analysis options, execution actions, verification checks, escalation content) matching the step's `outputSchema`. The `metrics` field contains sandbox-owned telemetry: `latency_ms`, `input_tokens`, `output_tokens`, `cost_usd` (optional, omitted when unknown), `model`, `provider`, `tool_calls_count`. Non-success HTTP MUST fail the step with an error surfaced to proposal conditions.
+9a. **Metrics handling**: The operator MUST extract `metrics` from the envelope and store them on the corresponding Result CR status (AnalysisResult, ExecutionResult, VerificationResult, EscalationResult). The operator MUST NOT rely on metrics for workflow decisions; they are observability-only data.
 10. **Output schema selection**: `outputSchema` MUST be the step-specific JSON schema: analysis schema depends on `spec.analysisOutput.mode`, whether execution/verification steps exist in the proposal, and optional injected `components` sub-schema from `spec.analysisOutput.schema`; other steps use fixed schemas for their response shapes.
 11. **Analysis query payload**: The `query` string MUST encode the user request or revision-augmented request and encode workflow flags indicating whether execution/verification steps exist (template-rendered).
 12. **Execution query payload**: The `query` MUST include JSON describing the approved remediation option.
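Reviewer sketch for rules 9 and 9a: a minimal, standalone illustration of how the `{metrics, result}` envelope could be decoded. The wire field names come from the spec text above; `runMetrics`, `envelope`, and `parseEnvelope` are hypothetical stand-ins for illustration, not the operator's actual types.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// runMetrics mirrors the snake_case telemetry fields named in rule 9.
// Hypothetical stand-in type for this sketch.
type runMetrics struct {
	LatencyMs      int64   `json:"latency_ms"`
	InputTokens    int64   `json:"input_tokens"`
	OutputTokens   int64   `json:"output_tokens"`
	CostUSD        *string `json:"cost_usd,omitempty"` // nil when the agent omits it
	Model          string  `json:"model"`
	Provider       string  `json:"provider"`
	ToolCallsCount int     `json:"tool_calls_count"`
}

// envelope is the assumed shape of a successful POST /v1/agent/run body.
type envelope struct {
	Metrics *runMetrics     `json:"metrics"`
	Result  json.RawMessage `json:"result"`
}

// parseEnvelope (hypothetical helper) decodes the envelope and rejects
// bodies whose result is absent or JSON null. Metrics may legitimately
// be nil, since they are observability-only.
func parseEnvelope(body []byte) (*envelope, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return nil, fmt.Errorf("parse response envelope: %w", err)
	}
	if len(env.Result) == 0 || string(env.Result) == "null" {
		return nil, errors.New("response envelope missing or null 'result' field")
	}
	return &env, nil
}

func main() {
	body := []byte(`{"metrics":{"latency_ms":100,"input_tokens":10,"output_tokens":5,"model":"test","provider":"test","tool_calls_count":0},"result":{"success":true}}`)
	env, err := parseEnvelope(body)
	if err != nil {
		panic(err)
	}
	// Result stays raw so each step can unmarshal it into its own struct.
	fmt.Println(string(env.Result), env.Metrics.LatencyMs, env.Metrics.CostUSD == nil)
}
```

Keeping `result` as raw JSON lets one decode path serve all four steps, while a missing `metrics` object only costs telemetry and does not fail the step.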
diff --git a/.tekton/integration-tests/pipelines/agentic-operator-e2e-pipeline.yaml b/.tekton/integration-tests/pipelines/agentic-operator-e2e-pipeline.yaml
index 6635670e..fe41f9b8 100644
--- a/.tekton/integration-tests/pipelines/agentic-operator-e2e-pipeline.yaml
+++ b/.tekton/integration-tests/pipelines/agentic-operator-e2e-pipeline.yaml
@@ -143,7 +143,7 @@ spec:
               - name: NAMESPACE
                 value: "$(params.namespace)"
               - name: SANDBOX_IMAGE
-                value: "quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent"
+                value: "quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric"
             image: registry.redhat.io/openshift4/ose-cli:latest
             script: |
               set -euo pipefail
diff --git a/.tekton/integration-tests/scripts/install-operator.sh b/.tekton/integration-tests/scripts/install-operator.sh
index d81f9cf2..a97ee35a 100755
--- a/.tekton/integration-tests/scripts/install-operator.sh
+++ b/.tekton/integration-tests/scripts/install-operator.sh
@@ -9,7 +9,7 @@
 # Optional env:
 #   OPERATOR_NAMESPACE  (default: openshift-lightspeed)
 #   SANDBOX_MODE        (default: bare-pod)
-#   SANDBOX_IMAGE       (default: quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent)
+#   SANDBOX_IMAGE       (default: quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric)
 
 set -euo pipefail
 
@@ -18,7 +18,7 @@ set -euo pipefail
 
 OPERATOR_NAMESPACE="${OPERATOR_NAMESPACE:-openshift-lightspeed}"
 SANDBOX_MODE="${SANDBOX_MODE:-bare-pod}"
-SANDBOX_IMAGE="${SANDBOX_IMAGE:-quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent}"
+SANDBOX_IMAGE="${SANDBOX_IMAGE:-quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric}"
 
 echo "=== Agentic operator install ==="
 echo "  IMG:                ${IMG}"
diff --git a/Makefile b/Makefile
index 9f59008b..dd3ad2cf 100644
--- a/Makefile
+++ b/Makefile
@@ -110,7 +110,7 @@ endif
 # Sandbox mode: "bare-pod" (default) or "sandbox-claim".
 SANDBOX_MODE ?= bare-pod
 # Agent sandbox image used by bare-pod mode (the container the operator creates per step).
-SANDBOX_IMAGE ?= quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent
+SANDBOX_IMAGE ?= quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric
 
 # kubernetes-sigs/agent-sandbox release reference (used only for documentation links).
 AGENT_SANDBOX_VERSION ?= v0.4.5
diff --git a/api/v1alpha1/analysisresult_types.go b/api/v1alpha1/analysisresult_types.go
index 7616238c..884230ba 100644
--- a/api/v1alpha1/analysisresult_types.go
+++ b/api/v1alpha1/analysisresult_types.go
@@ -50,6 +50,10 @@ type AnalysisResultStatus struct {
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=8192
 	FailureReason string `json:"failureReason,omitempty"`
+
+	// metrics contains telemetry from the sandbox agent for this step.
+	// +optional
+	Metrics StepMetrics `json:"metrics,omitzero"`
 }
 
 // AnalysisResultSpec contains the immutable identity fields for an AnalysisResult.
diff --git a/api/v1alpha1/escalationresult_types.go b/api/v1alpha1/escalationresult_types.go
index c4634974..405e72e5 100644
--- a/api/v1alpha1/escalationresult_types.go
+++ b/api/v1alpha1/escalationresult_types.go
@@ -55,6 +55,10 @@ type EscalationResultStatus struct {
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=8192
 	FailureReason string `json:"failureReason,omitempty"`
+
+	// metrics contains telemetry from the sandbox agent for this step.
+	// +optional
+	Metrics StepMetrics `json:"metrics,omitzero"`
 }
 
 // EscalationResultSpec contains the immutable identity fields for an EscalationResult.
diff --git a/api/v1alpha1/executionresult_types.go b/api/v1alpha1/executionresult_types.go
index f6d6215f..0eba0698 100644
--- a/api/v1alpha1/executionresult_types.go
+++ b/api/v1alpha1/executionresult_types.go
@@ -55,6 +55,10 @@ type ExecutionResultStatus struct {
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=8192
 	FailureReason string `json:"failureReason,omitempty"`
+
+	// metrics contains telemetry from the sandbox agent for this step.
+	// +optional
+	Metrics StepMetrics `json:"metrics,omitzero"`
 }
 
 // ExecutionResultSpec contains the immutable identity fields for an ExecutionResult.
diff --git a/api/v1alpha1/shared_types.go b/api/v1alpha1/shared_types.go
index 27eb633c..5c9a8e9a 100644
--- a/api/v1alpha1/shared_types.go
+++ b/api/v1alpha1/shared_types.go
@@ -191,3 +191,47 @@ type SkillsSource struct {
 	// +kubebuilder:validation:items:MaxLength=512
 	Paths []string `json:"paths,omitempty"`
 }
+
+// StepMetrics contains telemetry data collected during a workflow step execution.
+// Populated from the sandbox agent's response envelope.
+type StepMetrics struct {
+	// latencyMs is the wall-clock time (milliseconds) the agent spent processing.
+	// +required
+	// +kubebuilder:validation:Minimum=0
+	LatencyMs *int64 `json:"latencyMs,omitempty"`
+
+	// inputTokens is the number of input tokens consumed by the LLM.
+	// +optional
+	// +kubebuilder:validation:Minimum=0
+	InputTokens *int64 `json:"inputTokens,omitempty"`
+
+	// outputTokens is the number of output tokens produced by the LLM.
+	// +optional
+	// +kubebuilder:validation:Minimum=0
+	OutputTokens *int64 `json:"outputTokens,omitempty"`
+
+	// costUsd is the estimated cost in US dollars for this step, if known.
+	// Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
+	// +optional
+	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:MaxLength=32
+	// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+(\\\\.[0-9]+)?$')",message="costUsd must be a decimal number string (e.g. '0.05')"
+	CostUSD string `json:"costUsd,omitempty"`
+
+	// model is the LLM model used (e.g. "claude-opus-4-6").
+	// +optional
+	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:MaxLength=128
+	Model string `json:"model,omitempty"`
+
+	// provider is the LLM provider used (e.g. "anthropic", "openai").
+	// +optional
+	// +kubebuilder:validation:MinLength=1
+	// +kubebuilder:validation:MaxLength=64
+	Provider string `json:"provider,omitempty"`
+
+	// toolCallsCount is the number of tool invocations the agent made.
+	// +optional
+	// +kubebuilder:validation:Minimum=0
+	ToolCallsCount *int32 `json:"toolCallsCount,omitempty"`
+}
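Reviewer sketch for the `costUsd` validation: the pattern and length bounds below are taken from the markers on `CostUSD`; the helper name `validCostUSD` is hypothetical. It approximates in plain Go what the CRD rule would accept, which may help when writing unit tests for agents that emit `cost_usd`. This is an approximation of the admission check, not a replacement for it.

```go
package main

import (
	"fmt"
	"regexp"
)

// costPattern is the regular expression from the XValidation rule on costUsd.
var costPattern = regexp.MustCompile(`^[0-9]+(\.[0-9]+)?$`)

// validCostUSD (hypothetical helper) reports whether s satisfies the
// MinLength=1 / MaxLength=32 bounds and the decimal-string pattern.
// Any string matching the pattern is ASCII, so byte length equals
// character length here.
func validCostUSD(s string) bool {
	return len(s) >= 1 && len(s) <= 32 && costPattern.MatchString(s)
}

func main() {
	for _, s := range []string{"0.05", "12", ".5", "1.", "-0.05", "1e-3"} {
		fmt.Printf("%q -> %v\n", s, validCostUSD(s))
	}
}
```

Under this pattern only plain non-negative decimals pass; a leading or trailing dot, a sign, or exponent notation is rejected.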
diff --git a/api/v1alpha1/verificationresult_types.go b/api/v1alpha1/verificationresult_types.go
index 11a3bbe8..a9682956 100644
--- a/api/v1alpha1/verificationresult_types.go
+++ b/api/v1alpha1/verificationresult_types.go
@@ -56,6 +56,10 @@ type VerificationResultStatus struct {
 	// +kubebuilder:validation:MinLength=1
 	// +kubebuilder:validation:MaxLength=8192
 	FailureReason string `json:"failureReason,omitempty"`
+
+	// metrics contains telemetry from the sandbox agent for this step.
+	// +optional
+	Metrics StepMetrics `json:"metrics,omitzero"`
 }
 
 // VerificationResultSpec contains the immutable identity fields for a VerificationResult.
diff --git a/config/crd/bases/agentic.openshift.io_analysisresults.yaml b/config/crd/bases/agentic.openshift.io_analysisresults.yaml
index 62922c52..97a4658c 100644
--- a/config/crd/bases/agentic.openshift.io_analysisresults.yaml
+++ b/config/crd/bases/agentic.openshift.io_analysisresults.yaml
@@ -136,6 +136,58 @@ spec:
                 maxLength: 8192
                 minLength: 1
                 type: string
+              metrics:
+                description: metrics contains telemetry from the sandbox agent for
+                  this step.
+                properties:
+                  costUsd:
+                    description: |-
+                      costUsd is the estimated cost in US dollars for this step, if known.
+                      Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
+                    maxLength: 32
+                    minLength: 1
+                    type: string
+                    x-kubernetes-validations:
+                    - message: costUsd must be a decimal number string (e.g. '0.05')
+                      rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
+                  inputTokens:
+                    description: inputTokens is the number of input tokens consumed
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  latencyMs:
+                    description: latencyMs is the wall-clock time (milliseconds) the
+                      agent spent processing.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  model:
+                    description: model is the LLM model used (e.g. "claude-opus-4-6").
+                    maxLength: 128
+                    minLength: 1
+                    type: string
+                  outputTokens:
+                    description: outputTokens is the number of output tokens produced
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  provider:
+                    description: provider is the LLM provider used (e.g. "anthropic",
+                      "openai").
+                    maxLength: 64
+                    minLength: 1
+                    type: string
+                  toolCallsCount:
+                    description: toolCallsCount is the number of tool invocations
+                      the agent made.
+                    format: int32
+                    minimum: 0
+                    type: integer
+                required:
+                - latencyMs
+                type: object
               options:
                 description: options contains the remediation options returned by
                   the analysis agent.
diff --git a/config/crd/bases/agentic.openshift.io_escalationresults.yaml b/config/crd/bases/agentic.openshift.io_escalationresults.yaml
index bc54ccab..e964fe08 100644
--- a/config/crd/bases/agentic.openshift.io_escalationresults.yaml
+++ b/config/crd/bases/agentic.openshift.io_escalationresults.yaml
@@ -142,6 +142,58 @@ spec:
                 maxLength: 8192
                 minLength: 1
                 type: string
+              metrics:
+                description: metrics contains telemetry from the sandbox agent for
+                  this step.
+                properties:
+                  costUsd:
+                    description: |-
+                      costUsd is the estimated cost in US dollars for this step, if known.
+                      Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
+                    maxLength: 32
+                    minLength: 1
+                    type: string
+                    x-kubernetes-validations:
+                    - message: costUsd must be a decimal number string (e.g. '0.05')
+                      rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
+                  inputTokens:
+                    description: inputTokens is the number of input tokens consumed
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  latencyMs:
+                    description: latencyMs is the wall-clock time (milliseconds) the
+                      agent spent processing.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  model:
+                    description: model is the LLM model used (e.g. "claude-opus-4-6").
+                    maxLength: 128
+                    minLength: 1
+                    type: string
+                  outputTokens:
+                    description: outputTokens is the number of output tokens produced
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  provider:
+                    description: provider is the LLM provider used (e.g. "anthropic",
+                      "openai").
+                    maxLength: 64
+                    minLength: 1
+                    type: string
+                  toolCallsCount:
+                    description: toolCallsCount is the number of tool invocations
+                      the agent made.
+                    format: int32
+                    minimum: 0
+                    type: integer
+                required:
+                - latencyMs
+                type: object
               sandbox:
                 description: sandbox tracks the sandbox pod used for this escalation.
                 properties:
diff --git a/config/crd/bases/agentic.openshift.io_executionresults.yaml b/config/crd/bases/agentic.openshift.io_executionresults.yaml
index e31a4883..88994264 100644
--- a/config/crd/bases/agentic.openshift.io_executionresults.yaml
+++ b/config/crd/bases/agentic.openshift.io_executionresults.yaml
@@ -202,6 +202,58 @@ spec:
                 maxLength: 8192
                 minLength: 1
                 type: string
+              metrics:
+                description: metrics contains telemetry from the sandbox agent for
+                  this step.
+                properties:
+                  costUsd:
+                    description: |-
+                      costUsd is the estimated cost in US dollars for this step, if known.
+                      Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
+                    maxLength: 32
+                    minLength: 1
+                    type: string
+                    x-kubernetes-validations:
+                    - message: costUsd must be a decimal number string (e.g. '0.05')
+                      rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
+                  inputTokens:
+                    description: inputTokens is the number of input tokens consumed
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  latencyMs:
+                    description: latencyMs is the wall-clock time (milliseconds) the
+                      agent spent processing.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  model:
+                    description: model is the LLM model used (e.g. "claude-opus-4-6").
+                    maxLength: 128
+                    minLength: 1
+                    type: string
+                  outputTokens:
+                    description: outputTokens is the number of output tokens produced
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  provider:
+                    description: provider is the LLM provider used (e.g. "anthropic",
+                      "openai").
+                    maxLength: 64
+                    minLength: 1
+                    type: string
+                  toolCallsCount:
+                    description: toolCallsCount is the number of tool invocations
+                      the agent made.
+                    format: int32
+                    minimum: 0
+                    type: integer
+                required:
+                - latencyMs
+                type: object
               sandbox:
                 description: sandbox tracks the sandbox pod used for this execution.
                 properties:
diff --git a/config/crd/bases/agentic.openshift.io_verificationresults.yaml b/config/crd/bases/agentic.openshift.io_verificationresults.yaml
index af892bdd..11a78d12 100644
--- a/config/crd/bases/agentic.openshift.io_verificationresults.yaml
+++ b/config/crd/bases/agentic.openshift.io_verificationresults.yaml
@@ -194,6 +194,58 @@ spec:
                 maxLength: 8192
                 minLength: 1
                 type: string
+              metrics:
+                description: metrics contains telemetry from the sandbox agent for
+                  this step.
+                properties:
+                  costUsd:
+                    description: |-
+                      costUsd is the estimated cost in US dollars for this step, if known.
+                      Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
+                    maxLength: 32
+                    minLength: 1
+                    type: string
+                    x-kubernetes-validations:
+                    - message: costUsd must be a decimal number string (e.g. '0.05')
+                      rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
+                  inputTokens:
+                    description: inputTokens is the number of input tokens consumed
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  latencyMs:
+                    description: latencyMs is the wall-clock time (milliseconds) the
+                      agent spent processing.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  model:
+                    description: model is the LLM model used (e.g. "claude-opus-4-6").
+                    maxLength: 128
+                    minLength: 1
+                    type: string
+                  outputTokens:
+                    description: outputTokens is the number of output tokens produced
+                      by the LLM.
+                    format: int64
+                    minimum: 0
+                    type: integer
+                  provider:
+                    description: provider is the LLM provider used (e.g. "anthropic",
+                      "openai").
+                    maxLength: 64
+                    minLength: 1
+                    type: string
+                  toolCallsCount:
+                    description: toolCallsCount is the number of tool invocations
+                      the agent made.
+                    format: int32
+                    minimum: 0
+                    type: integer
+                required:
+                - latencyMs
+                type: object
               sandbox:
                 description: sandbox tracks the sandbox pod used for this verification.
                 properties:
diff --git a/controller/proposal/agent.go b/controller/proposal/agent.go
index e6698ae3..50d96d9c 100644
--- a/controller/proposal/agent.go
+++ b/controller/proposal/agent.go
@@ -10,6 +10,7 @@ import (
 type AnalysisOutput struct {
 	Success bool
 	Options []agenticv1alpha1.RemediationOption
+	Metrics *RunMetrics
 }
 
 // ExecutionOutput holds the execution agent's output.
@@ -17,6 +18,7 @@ type ExecutionOutput struct {
 	Success      bool
 	ActionsTaken []agenticv1alpha1.ExecutionAction
 	Verification agenticv1alpha1.ExecutionVerification
+	Metrics      *RunMetrics
 }
 
 // VerificationOutput holds the verification agent's output.
@@ -24,6 +26,7 @@ type VerificationOutput struct {
 	Success bool
 	Checks  []agenticv1alpha1.VerifyCheck
 	Summary string
+	Metrics *RunMetrics
 }
 
 // EscalationOutput holds the escalation agent's output.
@@ -31,6 +34,7 @@ type EscalationOutput struct {
 	Success bool
 	Summary string
 	Content string
+	Metrics *RunMetrics
 }
 
 // AgentCaller abstracts the agent invocation path. The reconciler
diff --git a/controller/proposal/client.go b/controller/proposal/client.go
index 2ac08385..c8aeca21 100644
--- a/controller/proposal/client.go
+++ b/controller/proposal/client.go
@@ -64,8 +64,21 @@ type agentPreviousAttempt struct {
 	FailureReason string `json:"failureReason,omitempty"`
 }
 
+// RunMetrics contains sandbox-owned telemetry from the agent response envelope.
+type RunMetrics struct {
+	LatencyMs      int64   `json:"latency_ms"`
+	InputTokens    int64   `json:"input_tokens"`
+	OutputTokens   int64   `json:"output_tokens"`
+	CostUSD        *string `json:"cost_usd,omitempty"`
+	Model          string  `json:"model"`
+	Provider       string  `json:"provider"`
+	ToolCallsCount int     `json:"tool_calls_count"`
+}
+
+// agentRunResponse is the envelope returned by POST /v1/agent/run.
 type agentRunResponse struct {
-	Response json.RawMessage
+	Metrics *RunMetrics     `json:"metrics"`
+	Result  json.RawMessage `json:"result"`
 }
 
 // AgentHTTPClientInterface abstracts HTTP calls to the agent service for testability.
@@ -129,5 +142,13 @@ func (c *AgentHTTPClient) Run(ctx context.Context, systemPrompt, query string, o
 		return nil, fmt.Errorf("POST %s returned HTTP %d: %s", runPath, resp.StatusCode, truncated)
 	}
 
-	return &agentRunResponse{Response: respBody}, nil
+	var response agentRunResponse
+	if err := json.Unmarshal(respBody, &response); err != nil {
+		return nil, fmt.Errorf("parse response envelope: %w", err)
+	}
+	if len(response.Result) == 0 || string(response.Result) == "null" {
+		return nil, fmt.Errorf("response envelope missing or null 'result' field")
+	}
+
+	return &response, nil
 }
diff --git a/controller/proposal/client_test.go b/controller/proposal/client_test.go
index 51ef7e15..590b0d7d 100644
--- a/controller/proposal/client_test.go
+++ b/controller/proposal/client_test.go
@@ -31,7 +31,7 @@ func TestAgentHTTPClient_RunSuccess(t *testing.T) {
 		}
 
 		w.Header().Set("Content-Type", "application/json")
-		w.Write([]byte(`{"options": [{"title": "Fix it"}]}`))
+		w.Write([]byte(`{"metrics":{"latency_ms":100,"input_tokens":10,"output_tokens":5,"model":"test","provider":"test","tool_calls_count":0},"result":{"options": [{"title": "Fix it"}]}}`))
 	}))
 	defer server.Close()
 
@@ -40,8 +40,11 @@ func TestAgentHTTPClient_RunSuccess(t *testing.T) {
 	if err != nil {
 		t.Fatalf("unexpected error: %v", err)
 	}
-	if len(resp.Response) == 0 {
-		t.Error("expected non-empty response")
+	if len(resp.Result) == 0 {
+		t.Error("expected non-empty result")
+	}
+	if resp.Metrics == nil {
+		t.Error("expected non-nil metrics")
 	}
 }
 
@@ -96,7 +99,7 @@ func TestAgentHTTPClient_RunWithExecutionResult(t *testing.T) {
 		}
 
 		w.Header().Set("Content-Type", "application/json")
-		w.Write([]byte(`{"success": true}`))
+		w.Write([]byte(`{"metrics":{"latency_ms":50,"input_tokens":5,"output_tokens":3,"model":"test","provider":"test","tool_calls_count":0},"result":{"success": true}}`))
 	}))
 	defer server.Close()
 
@@ -131,7 +134,7 @@ func TestAgentHTTPClient_RunWithoutExecutionResult(t *testing.T) {
 		}
 
 		w.Header().Set("Content-Type", "application/json")
-		w.Write([]byte(`{"success": true}`))
+		w.Write([]byte(`{"metrics":{"latency_ms":50,"input_tokens":5,"output_tokens":3,"model":"test","provider":"test","tool_calls_count":0},"result":{"success": true}}`))
 	}))
 	defer server.Close()
 
@@ -165,7 +168,7 @@ func TestAgentHTTPClient_RunWithContext(t *testing.T) {
 		}
 
 		w.Header().Set("Content-Type", "application/json")
-		w.Write([]byte(`{"success": true}`))
+		w.Write([]byte(`{"metrics":{"latency_ms":50,"input_tokens":5,"output_tokens":3,"model":"test","provider":"test","tool_calls_count":0},"result":{"success": true}}`))
 	}))
 	defer server.Close()
 
diff --git a/controller/proposal/reconciler_test.go b/controller/proposal/reconciler_test.go
index 1fd40fc1..e4c6a304 100644
--- a/controller/proposal/reconciler_test.go
+++ b/controller/proposal/reconciler_test.go
@@ -251,7 +251,7 @@ func newMockSandboxAgent(analysisJSON, executionJSON, verificationJSON string) (
 		ClientFactory: func(_ string) AgentHTTPClientInterface {
 			resp := responses[callCount%len(responses)]
 			callCount++
-			httpClient.response = &agentRunResponse{Response: json.RawMessage(resp)}
+			httpClient.response = &agentRunResponse{Result: json.RawMessage(resp)}
 			return httpClient
 		},
 		Namespace: "test-ns",
diff --git a/controller/proposal/results.go b/controller/proposal/results.go
index 64390140..91338596 100644
--- a/controller/proposal/results.go
+++ b/controller/proposal/results.go
@@ -19,6 +19,25 @@ const (
 	ErrPatchResultStatus          = "patch"
 )
 
+func toStepMetrics(m *RunMetrics) agenticv1alpha1.StepMetrics {
+	if m == nil {
+		return agenticv1alpha1.StepMetrics{}
+	}
+	toolCalls := int32(m.ToolCallsCount)
+	sm := agenticv1alpha1.StepMetrics{
+		LatencyMs:      &m.LatencyMs,
+		InputTokens:    &m.InputTokens,
+		OutputTokens:   &m.OutputTokens,
+		Model:          m.Model,
+		Provider:       m.Provider,
+		ToolCallsCount: &toolCalls,
+	}
+	if m.CostUSD != nil {
+		sm.CostUSD = *m.CostUSD
+	}
+	return sm
+}
+
 func resultCRName(proposalName, step string, index int) string {
 	return truncateK8sName(fmt.Sprintf("%s-%s-%d", proposalName, step, index))
 }
@@ -111,6 +130,7 @@ func (r *ProposalReconciler) createAnalysisResult(
 
 	if result != nil {
 		cr.Status.Options = result.Options
+		cr.Status.Metrics = toStepMetrics(result.Metrics)
 	}
 
 	return crName, createIdempotent(ctx, r.Client, cr, "AnalysisResult")
@@ -158,6 +178,7 @@ func (r *ProposalReconciler) createExecutionResult(
 	if result != nil {
 		cr.Status.ActionsTaken = result.ActionsTaken
 		cr.Status.Verification = result.Verification
+		cr.Status.Metrics = toStepMetrics(result.Metrics)
 	}
 
 	return crName, createIdempotent(ctx, r.Client, cr, "ExecutionResult")
@@ -205,6 +226,7 @@ func (r *ProposalReconciler) createVerificationResult(
 	if result != nil {
 		cr.Status.Checks = result.Checks
 		cr.Status.Summary = result.Summary
+		cr.Status.Metrics = toStepMetrics(result.Metrics)
 	}
 
 	return crName, createIdempotent(ctx, r.Client, cr, "VerificationResult")
@@ -251,6 +273,7 @@ func (r *ProposalReconciler) createEscalationResult(
 	if result != nil {
 		cr.Status.Summary = result.Summary
 		cr.Status.Content = result.Content
+		cr.Status.Metrics = toStepMetrics(result.Metrics)
 	}
 
 	return crName, createIdempotent(ctx, r.Client, cr, "EscalationResult")
@@ -272,6 +295,7 @@ func copyResultStatus(dst, src client.Object) {
 			d.Status.Options = s.Status.Options
 			d.Status.FailureReason = s.Status.FailureReason
 			d.Status.Sandbox = s.Status.Sandbox
+			d.Status.Metrics = s.Status.Metrics
 		}
 	case *agenticv1alpha1.ExecutionResult:
 		if s, ok := src.(*agenticv1alpha1.ExecutionResult); ok {
@@ -279,6 +303,7 @@ func copyResultStatus(dst, src client.Object) {
 			d.Status.Verification = s.Status.Verification
 			d.Status.FailureReason = s.Status.FailureReason
 			d.Status.Sandbox = s.Status.Sandbox
+			d.Status.Metrics = s.Status.Metrics
 		}
 	case *agenticv1alpha1.VerificationResult:
 		if s, ok := src.(*agenticv1alpha1.VerificationResult); ok {
@@ -286,6 +311,7 @@ func copyResultStatus(dst, src client.Object) {
 			d.Status.Summary = s.Status.Summary
 			d.Status.FailureReason = s.Status.FailureReason
 			d.Status.Sandbox = s.Status.Sandbox
+			d.Status.Metrics = s.Status.Metrics
 		}
 	case *agenticv1alpha1.EscalationResult:
 		if s, ok := src.(*agenticv1alpha1.EscalationResult); ok {
@@ -293,6 +319,7 @@ func copyResultStatus(dst, src client.Object) {
 			d.Status.Content = s.Status.Content
 			d.Status.FailureReason = s.Status.FailureReason
 			d.Status.Sandbox = s.Status.Sandbox
+			d.Status.Metrics = s.Status.Metrics
 		}
 	}
 }
diff --git a/controller/proposal/sandbox_agent.go b/controller/proposal/sandbox_agent.go
index fec57c4b..703f91cf 100644
--- a/controller/proposal/sandbox_agent.go
+++ b/controller/proposal/sandbox_agent.go
@@ -77,7 +77,7 @@ func stepString(step agenticv1alpha1.SandboxStep) string {
 
 func (s *SandboxAgentCaller) Analyze(ctx context.Context, proposal *agenticv1alpha1.Proposal, step resolvedStep, requestText string, serviceAccount string) (*AnalysisOutput, error) {
 	query := buildAnalysisQuery(requestText, proposal)
-	raw, err := s.callWithSandbox(ctx, proposal, stepString(agenticv1alpha1.SandboxStepAnalysis), step, query, buildAgentContext(proposal), serviceAccount)
+	raw, metrics, err := s.callWithSandbox(ctx, proposal, stepString(agenticv1alpha1.SandboxStepAnalysis), step, query, buildAgentContext(proposal), serviceAccount)
 	if err != nil {
 		return nil, fmt.Errorf("%s: %w", ErrAnalysisAgentCall, err)
 	}
@@ -90,6 +90,7 @@ func (s *SandboxAgentCaller) Analyze(ctx context.Context, proposal *agenticv1alp
 	return &AnalysisOutput{
 		Success: resp.Success,
 		Options: resp.Options,
+		Metrics: metrics,
 	}, nil
 }
 
@@ -100,7 +101,7 @@ func (s *SandboxAgentCaller) Execute(ctx context.Context, proposal *agenticv1alp
 	}
 
 	query := buildExecutionQuery(option)
-	raw, err := s.callWithSandbox(ctx, proposal, stepString(agenticv1alpha1.SandboxStepExecution), step, query, agentCtx, serviceAccount)
+	raw, metrics, err := s.callWithSandbox(ctx, proposal, stepString(agenticv1alpha1.SandboxStepExecution), step, query, agentCtx, serviceAccount)
 	if err != nil {
 		return nil, fmt.Errorf("%s: %w", ErrExecutionAgentCall, err)
 	}
@@ -113,6 +114,7 @@ func (s *SandboxAgentCaller) Execute(ctx context.Context, proposal *agenticv1alp
 	out := &ExecutionOutput{
 		Success:      resp.Success,
 		ActionsTaken: resp.ActionsTaken,
+		Metrics:      metrics,
 	}
 	if resp.Verification != nil {
 		out.Verification = *resp.Verification
@@ -128,7 +130,7 @@ func (s *SandboxAgentCaller) Verify(ctx context.Context, proposal *agenticv1alph
 	agentCtx.ExecutionResult = executionOutputToAgentResult(exec)
 
 	query := buildVerificationQuery(option, exec)
-	raw, err := s.callWithSandbox(ctx, proposal, stepString(agenticv1alpha1.SandboxStepVerification), step, query, agentCtx, serviceAccount)
+	raw, metrics, err := s.callWithSandbox(ctx, proposal, stepString(agenticv1alpha1.SandboxStepVerification), step, query, agentCtx, serviceAccount)
 	if err != nil {
 		return nil, fmt.Errorf("%s: %w", ErrVerificationAgentCall, err)
 	}
@@ -142,12 +144,13 @@ func (s *SandboxAgentCaller) Verify(ctx context.Context, proposal *agenticv1alph
 		Success: resp.Success,
 		Checks:  resp.Checks,
 		Summary: resp.Summary,
+		Metrics: metrics,
 	}, nil
 }
 
 func (s *SandboxAgentCaller) Escalate(ctx context.Context, proposal *agenticv1alpha1.Proposal, step resolvedStep, requestText string, serviceAccount string) (*EscalationOutput, error) {
 	agentCtx := buildAgentContext(proposal)
-	raw, err := s.callWithSandbox(ctx, proposal, stepString(agenticv1alpha1.SandboxStepEscalation), step, requestText, agentCtx, serviceAccount)
+	raw, metrics, err := s.callWithSandbox(ctx, proposal, stepString(agenticv1alpha1.SandboxStepEscalation), step, requestText, agentCtx, serviceAccount)
 	if err != nil {
 		return nil, fmt.Errorf("%s: %w", ErrEscalationAgentCall, err)
 	}
@@ -165,6 +168,7 @@ func (s *SandboxAgentCaller) Escalate(ctx context.Context, proposal *agenticv1al
 		Success: resp.Success,
 		Summary: resp.Summary,
 		Content: resp.Content,
+		Metrics: metrics,
 	}, nil
 }
 
@@ -176,12 +180,12 @@ func (s *SandboxAgentCaller) callWithSandbox(
 	query string,
 	agentCtx *agentContext,
 	serviceAccount string,
-) (json.RawMessage, error) {
+) (json.RawMessage, *RunMetrics, error) {
 	s.Sandbox.SetStep(step.Agent, step.LLM, step.Tools, serviceAccount)
 
 	claimName, err := s.Sandbox.Claim(ctx, proposal.Name, stepName, "")
 	if err != nil {
-		return nil, fmt.Errorf("%s: %w", ErrClaimSandbox, err)
+		return nil, nil, fmt.Errorf("%s: %w", ErrClaimSandbox, err)
 	}
 
 	// Write sandbox info immediately so the console can stream logs
@@ -195,7 +199,7 @@ func (s *SandboxAgentCaller) callWithSandbox(
 
 	endpoint, err := s.Sandbox.WaitReady(ctx, claimName, timeout)
 	if err != nil {
-		return nil, fmt.Errorf("%s: %w", ErrWaitForSandbox, err)
+		return nil, nil, fmt.Errorf("%s: %w", ErrWaitForSandbox, err)
 	}
 
 	agentURL := endpoint
@@ -208,10 +212,10 @@ func (s *SandboxAgentCaller) callWithSandbox(
 	client := s.ClientFactory(agentURL)
 	resp, err := client.Run(ctx, "", query, schema, agentCtx)
 	if err != nil {
-		return nil, err
+		return nil, nil, err
 	}
 
-	return resp.Response, nil
+	return resp.Result, resp.Metrics, nil
 }
 
 func (s *SandboxAgentCaller) ReleaseSandboxes(ctx context.Context, proposal *agenticv1alpha1.Proposal) error {
diff --git a/controller/proposal/sandbox_agent_test.go b/controller/proposal/sandbox_agent_test.go
index 9a046058..16ce7972 100644
--- a/controller/proposal/sandbox_agent_test.go
+++ b/controller/proposal/sandbox_agent_test.go
@@ -102,7 +102,7 @@ func TestSandboxAgentCaller_Analyze_HappyPath(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "ls-analysis-fix-crash", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
 		response: &agentRunResponse{
-			Response: json.RawMessage(`{"success": true, "options": [{"title": "Increase memory", "diagnosis": {"summary": "OOM", "confidence": "High", "rootCause": "memory limit"}, "proposal": {"description": "Bump memory", "actions": [{"type": "patch", "description": "patch deploy"}], "risk": "Low"}}]}`),
+			Result: json.RawMessage(`{"success": true, "options": [{"title": "Increase memory", "diagnosis": {"summary": "OOM", "confidence": "High", "rootCause": "memory limit"}, "proposal": {"description": "Bump memory", "actions": [{"type": "patch", "description": "patch deploy"}], "risk": "Low"}}]}`),
 		},
 	}
 
@@ -126,7 +126,7 @@ func TestSandboxAgentCaller_Execute_HappyPath(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "ls-execution-fix-crash", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
 		response: &agentRunResponse{
-			Response: json.RawMessage(`{"success": true, "actionsTaken": [{"type": "patch", "description": "Patched deployment", "outcome": "Succeeded"}], "verification": {"conditionOutcome": "Improved", "summary": "Pod running"}}`),
+			Result: json.RawMessage(`{"success": true, "actionsTaken": [{"type": "patch", "description": "Patched deployment", "outcome": "Succeeded"}], "verification": {"conditionOutcome": "Improved", "summary": "Pod running"}}`),
 		},
 	}
 
@@ -151,7 +151,7 @@ func TestSandboxAgentCaller_Verify_HappyPath(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "ls-verification-fix-crash", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
 		response: &agentRunResponse{
-			Response: json.RawMessage(`{"success": true, "checks": [{"name": "pod-running", "source": "oc", "value": "Running", "result": "Passed"}], "summary": "All checks passed"}`),
+			Result: json.RawMessage(`{"success": true, "checks": [{"name": "pod-running", "source": "oc", "value": "Running", "result": "Passed"}], "summary": "All checks passed"}`),
 		},
 	}
 
@@ -221,7 +221,7 @@ func TestSandboxAgentCaller_HTTPError(t *testing.T) {
 func TestSandboxAgentCaller_ParseError(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage("not valid json")},
+		response: &agentRunResponse{Result: json.RawMessage("not valid json")},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -237,7 +237,7 @@ func TestSandboxAgentCaller_ParseError(t *testing.T) {
 func TestSandboxAgentCaller_SandboxNotReleasedAfterCall(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "options": []}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "options": []}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -256,7 +256,7 @@ func TestSandboxAgentCaller_SandboxNotReleasedAfterCall(t *testing.T) {
 func TestSandboxAgentCaller_ContextPropagation(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "options": []}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "options": []}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -301,7 +301,7 @@ func TestSandboxAgentCaller_ContextPropagation(t *testing.T) {
 func TestSandboxAgentCaller_VerifyPassesExecutionResult(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "checks": [], "summary": "ok"}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "checks": [], "summary": "ok"}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -349,7 +349,7 @@ func TestSandboxAgentCaller_VerifyPassesExecutionResult(t *testing.T) {
 func TestSandboxAgentCaller_VerifyNilExecLeavesExecutionResultNil(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "checks": [], "summary": "ok"}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "checks": [], "summary": "ok"}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -366,7 +366,7 @@ func TestSandboxAgentCaller_VerifyNilExecLeavesExecutionResultNil(t *testing.T)
 func TestSandboxAgentCaller_VerifyExecWithoutInlineVerification(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "checks": [], "summary": "ok"}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "checks": [], "summary": "ok"}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -390,7 +390,7 @@ func TestSandboxAgentCaller_VerifyExecWithoutInlineVerification(t *testing.T) {
 func TestSandboxAgentCaller_ExecutePassesApprovedOption(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "actionsTaken": []}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "actionsTaken": []}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -410,7 +410,7 @@ func TestSandboxAgentCaller_ExecutePassesApprovedOption(t *testing.T) {
 func TestSandboxAgentCaller_AnalysisQueryFraming(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "options": []}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "options": []}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -430,7 +430,7 @@ func TestSandboxAgentCaller_AnalysisQueryFraming(t *testing.T) {
 func TestSandboxAgentCaller_ExecutionQueryFraming(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "actionsTaken": []}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "actionsTaken": []}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -460,7 +460,7 @@ func TestSandboxAgentCaller_ExecutionQueryFraming(t *testing.T) {
 func TestSandboxAgentCaller_VerificationQueryFraming(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "checks": [], "summary": "ok"}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "checks": [], "summary": "ok"}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -492,7 +492,7 @@ func TestSandboxAgentCaller_VerificationQueryFraming(t *testing.T) {
 func TestSandboxAgentCaller_ExecutionQueryNilOption(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "claim-1", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
-		response: &agentRunResponse{Response: json.RawMessage(`{"success": true, "actionsTaken": []}`)},
+		response: &agentRunResponse{Result: json.RawMessage(`{"success": true, "actionsTaken": []}`)},
 	}
 
 	caller := newTestSandboxAgentCaller(sandbox, httpClient)
@@ -512,7 +512,7 @@ func TestSandboxAgentCaller_Analyze_PatchesSandboxInfo(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "ls-analysis-fix-crash", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
 		response: &agentRunResponse{
-			Response: json.RawMessage(`{"success": true, "options": [{"title": "Fix it", "diagnosis": {"summary": "broken", "confidence": "High", "rootCause": "bug"}, "proposal": {"description": "fix", "actions": [{"type": "patch", "description": "patch"}], "risk": "Low"}}]}`),
+			Result: json.RawMessage(`{"success": true, "options": [{"title": "Fix it", "diagnosis": {"summary": "broken", "confidence": "High", "rootCause": "bug"}, "proposal": {"description": "fix", "actions": [{"type": "patch", "description": "patch"}], "risk": "Low"}}]}`),
 		},
 	}
 
@@ -541,7 +541,7 @@ func TestSandboxAgentCaller_Execute_PatchesSandboxInfo(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "ls-execution-fix-crash", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
 		response: &agentRunResponse{
-			Response: json.RawMessage(`{"success": true, "actionsTaken": [{"type": "patch", "description": "patched deploy"}]}`),
+			Result: json.RawMessage(`{"success": true, "actionsTaken": [{"type": "patch", "description": "patched deploy"}]}`),
 		},
 	}
 
@@ -567,7 +567,7 @@ func TestSandboxAgentCaller_Verify_PatchesSandboxInfo(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "ls-verification-fix-crash", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
 		response: &agentRunResponse{
-			Response: json.RawMessage(`{"success": true, "checks": [{"name": "pod-running", "source": "oc", "value": "Running", "result": "Passed"}], "summary": "All checks passed"}`),
+			Result: json.RawMessage(`{"success": true, "checks": [{"name": "pod-running", "source": "oc", "value": "Running", "result": "Passed"}], "summary": "All checks passed"}`),
 		},
 	}
 
@@ -593,7 +593,7 @@ func TestSandboxAgentCaller_SandboxInfoPatch_DoesNotBlockOnError(t *testing.T) {
 	sandbox := &mockSandboxProvider{claimName: "ls-analysis-fix-crash", endpoint: "http://sandbox:8080"}
 	httpClient := &mockHTTPClient{
 		response: &agentRunResponse{
-			Response: json.RawMessage(`{"success": true, "options": []}`),
+			Result: json.RawMessage(`{"success": true, "options": []}`),
 		},
 	}
 
diff --git a/test/agent/Makefile b/test/agent/Makefile
index ca287b4a..ace4c6e6 100644
--- a/test/agent/Makefile
+++ b/test/agent/Makefile
@@ -7,7 +7,7 @@
 _AGENT_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
 ROOT := $(abspath $(_AGENT_DIR)/../..)
 CONTAINER_TOOL ?= $(shell which podman >/dev/null 2>&1 && echo podman || echo docker)
-IMAGE ?= quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent
+IMAGE ?= quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric
 
 .PHONY: docker-build docker-push
 
diff --git a/test/agent/main.go b/test/agent/main.go
index 44a20694..1d795a52 100644
--- a/test/agent/main.go
+++ b/test/agent/main.go
@@ -151,11 +151,13 @@ func handleRun(w http.ResponseWriter, r *http.Request) {
 		time.Sleep(d)
 	}
 
-	body := cannedResponse(phase, ns)
+	result := cannedResponse(phase, ns)
+
+	envelope := fmt.Sprintf(`{"metrics":{"latency_ms":1500,"input_tokens":100,"output_tokens":50,"model":"mock-model","provider":"mock","tool_calls_count":1},"result":%s}`, string(result))
 
 	w.Header().Set("Content-Type", "application/json")
 	w.WriteHeader(http.StatusOK)
-	if _, err := w.Write(body); err != nil {
+	if _, err := w.Write([]byte(envelope)); err != nil {
 		log.Printf("write response: %v", err)
 	}
 }
diff --git a/test/agent/sandboxtemplate/sandboxtemplate.yaml b/test/agent/sandboxtemplate/sandboxtemplate.yaml
index 2bf06fbb..ffd930e2 100644
--- a/test/agent/sandboxtemplate/sandboxtemplate.yaml
+++ b/test/agent/sandboxtemplate/sandboxtemplate.yaml
@@ -21,7 +21,7 @@ spec:
       automountServiceAccountToken: false
       containers:
         - name: agent
-          image: quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent
+          image: quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric
           args: ["-addr", ":8080"]
           ports:
             - name: http
diff --git a/test/e2e/helpers_test.go b/test/e2e/helpers_test.go
index 9544e2f4..761dda20 100644
--- a/test/e2e/helpers_test.go
+++ b/test/e2e/helpers_test.go
@@ -201,7 +201,7 @@ func createProposal(t *testing.T, c client.Client, name string) *agenticv1alpha1
 		Spec: agenticv1alpha1.ProposalSpec{
 			Request:          "Pod crash-looping in staging namespace",
 			TargetNamespaces: []string{"staging"},
-			Tools:            agenticv1alpha1.ToolsSpec{Skills: []agenticv1alpha1.SkillsSource{{Image: "quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent", Paths: []string{"/skills"}}}},
+			Tools:            agenticv1alpha1.ToolsSpec{Skills: []agenticv1alpha1.SkillsSource{{Image: "quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric", Paths: []string{"/skills"}}}},
 			Analysis:         agenticv1alpha1.ProposalStep{Agent: "e2e-agent"},
 			Execution:        agenticv1alpha1.ProposalStep{Agent: "e2e-agent"},
 			Verification:     agenticv1alpha1.ProposalStep{Agent: "e2e-agent"},