5 changes: 3 additions & 2 deletions .ai/spec/how/reconciler.md
@@ -125,7 +125,7 @@ Audience: AI agents. Behavioral rules and phase semantics live in **what/** spec

- **Constructor:** Accepts `SandboxProvider`, `client.Client`, `ClientFactory func(endpoint string) AgentHTTPClientInterface`, operator namespace. `Timeout` defaults to `defaultSandboxTimeout` const.
- **`callWithSandbox` order:** `SetStep` on provider → `Claim` → `patchSandboxInfo` (status subresource merge) → `WaitReady` → normalize URL (`http://{endpoint}:8080` if no scheme) → `outputSchemaForStep` → `ClientFactory(endpoint).Run(ctx, "", query, schema, agentCtx)`. Template derivation (sandbox-claim mode) happens inside `SandboxManager.Claim`; bare-pod mode builds the pod spec inside `BarePodManager.Claim`.
- **`Run` contract:** Empty `systemPrompt`; full payload in POST body per `client.go` (`query`, `outputSchema`, `context`). Path constant `/v1/agent/run`.
- **`Run` contract:** Empty `systemPrompt`; full payload in POST body per `client.go` (`query`, `outputSchema`, `context`). Path constant `/v1/agent/run`. Response is a `{metrics, result}` envelope; `callWithSandbox` returns both the raw result JSON and the parsed `RunMetrics`.
- **`buildAgentContext`:** `TargetNamespaces`, `ApprovedOption` / `ExecutionResult` per step, `PreviousAttempts` from failed `StepResultRef` outcomes across analysis/execution/verification result lists.
- **`ReleaseSandboxes`:** Iterates `Status.Steps.{Analysis,Execution,Verification,Escalation}.Sandbox.ClaimName` and calls `Release` for each non-empty.

@@ -135,7 +135,8 @@ Audience: AI agents. Behavioral rules and phase semantics live in **what/** spec

- **`AgentHTTPClientInterface`:** `Run(ctx, systemPrompt, query, outputSchema, agentCtx) (*agentRunResponse, error)`.
- **`NewAgentHTTPClient`:** Returns concrete type with long HTTP timeout, TLS `InsecureSkipVerify` for in-cluster calls.
- **`Run`:** Marshals `agentRunRequest`, POSTs, reads capped body size, non-200 → error with truncated body; 200 → raw JSON in `agentRunResponse.Response` for caller to unmarshal phase-specific structs.
- **`Run`:** Marshals `agentRunRequest`, POSTs, reads capped body size, non-200 → error with truncated body; 200 → parses `{metrics, result}` envelope. Returns `agentRunResponse` containing both `Result json.RawMessage` (per-step workflow data) and `Metrics *RunMetrics` (telemetry). Callers unmarshal `Result` into phase-specific structs; `Metrics` is passed to Result CR creation for storage.
- **`RunMetrics`:** `LatencyMs int64`, `InputTokens int64`, `OutputTokens int64`, `CostUSD *string` (nil when unknown; decimal string e.g. "0.05"), `Model string`, `Provider string`, `ToolCallsCount int`.

---

3 changes: 2 additions & 1 deletion .ai/spec/what/sandbox-execution.md
@@ -12,7 +12,8 @@ Behavioral specification for how workflow steps run inside ephemeral **sandboxes
6. **Readiness**: In `sandbox-claim` mode, the controller MUST poll sandbox/claim status until the backing `Sandbox` reports `Ready=True` (standard condition pattern) and exposes a **service FQDN** for in-cluster HTTP, or until a configurable **sandbox wait budget** elapses (error path). In `bare-pod` mode, the controller MUST poll the Pod's conditions until `Ready=True` and extract `status.podIP` as the endpoint, or until the sandbox wait budget elapses.
7. **Endpoint construction**: Agent HTTP URL MUST be formed from the readiness endpoint; if the endpoint is not already an absolute URL with HTTP scheme, the client MUST prefix standard cluster HTTP scheme and port expected for the agent container.
8. **HTTP contract**: Each step MUST call the agent **`POST /v1/agent/run`** with JSON body carrying at least `query`, `outputSchema`, and `context`; optional `systemPrompt` and `timeout_ms` exist in the wire shape but **system prompt MUST be sent empty** in the current implementation (prompt material lives in `query` and templates).
9. **Response handling**: HTTP success responses MUST be parsed as JSON matching the per-step schema (analysis/execution/verification/escalation). Non-success HTTP MUST fail the step with an error surfaced to proposal conditions.
9. **Response envelope**: HTTP success responses MUST be parsed as a `{metrics, result}` JSON envelope. The `result` field contains the per-step workflow data (analysis options, execution actions, verification checks, escalation content) matching the step's `outputSchema`. The `metrics` field contains sandbox-owned telemetry: `latency_ms`, `input_tokens`, `output_tokens`, `cost_usd` (optional, omitted when unknown), `model`, `provider`, `tool_calls_count`. Non-success HTTP MUST fail the step with an error surfaced to proposal conditions.
9a. **Metrics handling**: The operator MUST extract `metrics` from the envelope and store them on the corresponding Result CR status (AnalysisResult, ExecutionResult, VerificationResult, EscalationResult). The operator MUST NOT rely on metrics for workflow decisions — they are observability-only data.
10. **Output schema selection**: `outputSchema` MUST be the step-specific JSON schema: analysis schema depends on `spec.analysisOutput.mode`, whether execution/verification steps exist in the proposal, and optional injected `components` sub-schema from `spec.analysisOutput.schema`; other steps use fixed schemas for their response shapes.
11. **Analysis query payload**: The `query` string MUST encode the user request or revision-augmented request and encode workflow flags indicating whether execution/verification steps exist (template-rendered).
12. **Execution query payload**: The `query` MUST include JSON describing the approved remediation option.
@@ -143,7 +143,7 @@ spec:
- name: NAMESPACE
value: "$(params.namespace)"
- name: SANDBOX_IMAGE
value: "quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent"
value: "quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric"
image: registry.redhat.io/openshift4/ose-cli:latest
script: |
set -euo pipefail
4 changes: 2 additions & 2 deletions .tekton/integration-tests/scripts/install-operator.sh
@@ -9,7 +9,7 @@
# Optional env:
# OPERATOR_NAMESPACE (default: openshift-lightspeed)
# SANDBOX_MODE (default: bare-pod)
# SANDBOX_IMAGE (default: quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent)
# SANDBOX_IMAGE (default: quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric)

set -euo pipefail

@@ -18,7 +18,7 @@ set -euo pipefail

OPERATOR_NAMESPACE="${OPERATOR_NAMESPACE:-openshift-lightspeed}"
SANDBOX_MODE="${SANDBOX_MODE:-bare-pod}"
SANDBOX_IMAGE="${SANDBOX_IMAGE:-quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent}"
SANDBOX_IMAGE="${SANDBOX_IMAGE:-quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric}"

echo "=== Agentic operator install ==="
echo " IMG: ${IMG}"
2 changes: 1 addition & 1 deletion Makefile
@@ -110,7 +110,7 @@ endif
# Sandbox mode: "bare-pod" (default) or "sandbox-claim".
SANDBOX_MODE ?= bare-pod
# Agent sandbox image used by bare-pod mode (the container the operator creates per step).
SANDBOX_IMAGE ?= quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent
SANDBOX_IMAGE ?= quay.io/openshift-lightspeed/ols-qe:lightspeed-mock-agent-metric

# kubernetes-sigs/agent-sandbox release reference (used only for documentation links).
AGENT_SANDBOX_VERSION ?= v0.4.5
4 changes: 4 additions & 0 deletions api/v1alpha1/analysisresult_types.go
@@ -50,6 +50,10 @@ type AnalysisResultStatus struct {
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=8192
FailureReason string `json:"failureReason,omitempty"`

// metrics contains telemetry from the sandbox agent for this step.
// +optional
Metrics StepMetrics `json:"metrics,omitzero"`
}

// AnalysisResultSpec contains the immutable identity fields for an AnalysisResult.
4 changes: 4 additions & 0 deletions api/v1alpha1/escalationresult_types.go
@@ -55,6 +55,10 @@ type EscalationResultStatus struct {
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=8192
FailureReason string `json:"failureReason,omitempty"`

// metrics contains telemetry from the sandbox agent for this step.
// +optional
Metrics StepMetrics `json:"metrics,omitzero"`
}

// EscalationResultSpec contains the immutable identity fields for an EscalationResult.
4 changes: 4 additions & 0 deletions api/v1alpha1/executionresult_types.go
@@ -55,6 +55,10 @@ type ExecutionResultStatus struct {
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=8192
FailureReason string `json:"failureReason,omitempty"`

// metrics contains telemetry from the sandbox agent for this step.
// +optional
Metrics StepMetrics `json:"metrics,omitzero"`
}

// ExecutionResultSpec contains the immutable identity fields for an ExecutionResult.
44 changes: 44 additions & 0 deletions api/v1alpha1/shared_types.go
@@ -191,3 +191,47 @@ type SkillsSource struct {
// +kubebuilder:validation:items:MaxLength=512
Paths []string `json:"paths,omitempty"`
}

// StepMetrics contains telemetry data collected during a workflow step execution.
// Populated from the sandbox agent's response envelope.
type StepMetrics struct {
// latencyMs is the wall-clock time (milliseconds) the agent spent processing.
// +required
// +kubebuilder:validation:Minimum=0
LatencyMs *int64 `json:"latencyMs,omitempty"`

// inputTokens is the number of input tokens consumed by the LLM.
// +optional
// +kubebuilder:validation:Minimum=0
InputTokens *int64 `json:"inputTokens,omitempty"`

// outputTokens is the number of output tokens produced by the LLM.
// +optional
// +kubebuilder:validation:Minimum=0
OutputTokens *int64 `json:"outputTokens,omitempty"`

// costUsd is the estimated cost in US dollars for this step, if known.
// Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
// +optional
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=32
// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+(\\\\.[0-9]+)?$')",message="costUsd must be a decimal number string (e.g. '0.05')"
CostUSD string `json:"costUsd,omitempty"`

// model is the LLM model used (e.g. "claude-opus-4-6").
// +optional
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=128
Model string `json:"model,omitempty"`

// provider is the LLM provider used (e.g. "anthropic", "openai").
// +optional
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=64
Provider string `json:"provider,omitempty"`

// toolCallsCount is the number of tool invocations the agent made.
// +optional
// +kubebuilder:validation:Minimum=0
ToolCallsCount *int32 `json:"toolCallsCount,omitempty"`
}
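
As an illustration of how the operator-side `RunMetrics` might be mapped onto `StepMetrics` before being stored on a Result CR status, here is a minimal sketch. The function `toStepMetrics` and the `ptr` helper are hypothetical and not part of this change. Dropping a cost string that fails the CRD pattern, rather than failing the status update, is an assumption made for the example, as is clamping `ToolCallsCount` into the `int32` range.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
)

// RunMetrics follows the field list in the how/ spec.
type RunMetrics struct {
	LatencyMs      int64
	InputTokens    int64
	OutputTokens   int64
	CostUSD        *string
	Model          string
	Provider       string
	ToolCallsCount int
}

// StepMetrics repeats the API type from shared_types.go (markers omitted).
type StepMetrics struct {
	LatencyMs      *int64 `json:"latencyMs,omitempty"`
	InputTokens    *int64 `json:"inputTokens,omitempty"`
	OutputTokens   *int64 `json:"outputTokens,omitempty"`
	CostUSD        string `json:"costUsd,omitempty"`
	Model          string `json:"model,omitempty"`
	Provider       string `json:"provider,omitempty"`
	ToolCallsCount *int32 `json:"toolCallsCount,omitempty"`
}

// costPattern is the same decimal pattern the CRD validation rule enforces.
var costPattern = regexp.MustCompile(`^[0-9]+(\.[0-9]+)?$`)

// ptr returns a pointer to a copy of v, so the result does not alias the input.
func ptr[T any](v T) *T { return &v }

// toStepMetrics (hypothetical helper) copies telemetry into the API type.
func toStepMetrics(m *RunMetrics) StepMetrics {
	if m == nil {
		return StepMetrics{}
	}
	// Clamp the tool-call count into the non-negative int32 range.
	calls := m.ToolCallsCount
	if calls < 0 {
		calls = 0
	}
	if calls > math.MaxInt32 {
		calls = math.MaxInt32
	}
	out := StepMetrics{
		LatencyMs:      ptr(m.LatencyMs),
		InputTokens:    ptr(m.InputTokens),
		OutputTokens:   ptr(m.OutputTokens),
		Model:          m.Model,
		Provider:       m.Provider,
		ToolCallsCount: ptr(int32(calls)),
	}
	// Assumption: an unknown or malformed cost is left empty instead of
	// letting the CRD validation reject the whole status update.
	if m.CostUSD != nil && len(*m.CostUSD) <= 32 && costPattern.MatchString(*m.CostUSD) {
		out.CostUSD = *m.CostUSD
	}
	return out
}

func main() {
	cost := "0.05"
	sm := toStepMetrics(&RunMetrics{
		LatencyMs: 1200, InputTokens: 350, OutputTokens: 80,
		CostUSD: &cost, Model: "example-model", Provider: "example",
		ToolCallsCount: 2,
	})
	fmt.Println(*sm.LatencyMs, sm.CostUSD, *sm.ToolCallsCount) // prints "1200 0.05 2"
}
```

Since the spec treats metrics as observability-only data, a lossy conversion like this one should not affect workflow decisions.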
4 changes: 4 additions & 0 deletions api/v1alpha1/verificationresult_types.go
@@ -56,6 +56,10 @@ type VerificationResultStatus struct {
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=8192
FailureReason string `json:"failureReason,omitempty"`

// metrics contains telemetry from the sandbox agent for this step.
// +optional
Metrics StepMetrics `json:"metrics,omitzero"`
}

// VerificationResultSpec contains the immutable identity fields for a VerificationResult.
52 changes: 52 additions & 0 deletions config/crd/bases/agentic.openshift.io_analysisresults.yaml
@@ -136,6 +136,58 @@ spec:
maxLength: 8192
minLength: 1
type: string
metrics:
description: metrics contains telemetry from the sandbox agent for
this step.
properties:
costUsd:
description: |-
costUsd is the estimated cost in US dollars for this step, if known.
Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
maxLength: 32
minLength: 1
type: string
x-kubernetes-validations:
- message: costUsd must be a decimal number string (e.g. '0.05')
rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
inputTokens:
description: inputTokens is the number of input tokens consumed
by the LLM.
format: int64
minimum: 0
type: integer
latencyMs:
description: latencyMs is the wall-clock time (milliseconds) the
agent spent processing.
format: int64
minimum: 0
type: integer
model:
description: model is the LLM model used (e.g. "claude-opus-4-6").
maxLength: 128
minLength: 1
type: string
outputTokens:
description: outputTokens is the number of output tokens produced
by the LLM.
format: int64
minimum: 0
type: integer
provider:
description: provider is the LLM provider used (e.g. "anthropic",
"openai").
maxLength: 64
minLength: 1
type: string
toolCallsCount:
description: toolCallsCount is the number of tool invocations
the agent made.
format: int32
minimum: 0
type: integer
required:
- latencyMs
type: object
options:
description: options contains the remediation options returned by
the analysis agent.
52 changes: 52 additions & 0 deletions config/crd/bases/agentic.openshift.io_escalationresults.yaml
@@ -142,6 +142,58 @@ spec:
maxLength: 8192
minLength: 1
type: string
metrics:
description: metrics contains telemetry from the sandbox agent for
this step.
properties:
costUsd:
description: |-
costUsd is the estimated cost in US dollars for this step, if known.
Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
maxLength: 32
minLength: 1
type: string
x-kubernetes-validations:
- message: costUsd must be a decimal number string (e.g. '0.05')
rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
inputTokens:
description: inputTokens is the number of input tokens consumed
by the LLM.
format: int64
minimum: 0
type: integer
latencyMs:
description: latencyMs is the wall-clock time (milliseconds) the
agent spent processing.
format: int64
minimum: 0
type: integer
model:
description: model is the LLM model used (e.g. "claude-opus-4-6").
maxLength: 128
minLength: 1
type: string
outputTokens:
description: outputTokens is the number of output tokens produced
by the LLM.
format: int64
minimum: 0
type: integer
provider:
description: provider is the LLM provider used (e.g. "anthropic",
"openai").
maxLength: 64
minLength: 1
type: string
toolCallsCount:
description: toolCallsCount is the number of tool invocations
the agent made.
format: int32
minimum: 0
type: integer
required:
- latencyMs
type: object
sandbox:
description: sandbox tracks the sandbox pod used for this escalation.
properties:
52 changes: 52 additions & 0 deletions config/crd/bases/agentic.openshift.io_executionresults.yaml
@@ -202,6 +202,58 @@ spec:
maxLength: 8192
minLength: 1
type: string
metrics:
description: metrics contains telemetry from the sandbox agent for
this step.
properties:
costUsd:
description: |-
costUsd is the estimated cost in US dollars for this step, if known.
Serialized as a string to avoid floating-point portability issues (e.g. "0.05").
maxLength: 32
minLength: 1
type: string
x-kubernetes-validations:
- message: costUsd must be a decimal number string (e.g. '0.05')
rule: self.matches('^[0-9]+(\\.[0-9]+)?$')
inputTokens:
description: inputTokens is the number of input tokens consumed
by the LLM.
format: int64
minimum: 0
type: integer
latencyMs:
description: latencyMs is the wall-clock time (milliseconds) the
agent spent processing.
format: int64
minimum: 0
type: integer
model:
description: model is the LLM model used (e.g. "claude-opus-4-6").
maxLength: 128
minLength: 1
type: string
outputTokens:
description: outputTokens is the number of output tokens produced
by the LLM.
format: int64
minimum: 0
type: integer
provider:
description: provider is the LLM provider used (e.g. "anthropic",
"openai").
maxLength: 64
minLength: 1
type: string
toolCallsCount:
description: toolCallsCount is the number of tool invocations
the agent made.
format: int32
minimum: 0
type: integer
required:
- latencyMs
type: object
sandbox:
description: sandbox tracks the sandbox pod used for this execution.
properties: