Define sandbox → operator metrics handoff by blublinsky · Pull Request #74 · openshift/lightspeed-agentic-sandbox

blublinsky · 2026-06-15T13:21:32Z

Summary

Implements OLS-3130: POST /v1/agent/run now returns a fixed {metrics, result} envelope instead of a flat {success, summary, …} body.

metrics (sandbox-owned): latency_ms, input_tokens, output_tokens, cost_usd?, model, provider, tool_calls_count
result (agent-owned): success, summary, plus structured fields from outputSchema
Both keys are present on success, timeout, agent error, and empty-result paths. Model-emitted metrics keys are stripped from parsed JSON; cost_usd is omitted when unknown (never faked as 0).

Breaking change: the operator must unwrap data["result"] for workflow fields and read data["metrics"] for observability. Operator changes are out of scope here and must land before production rollout.
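For illustration, a minimal sketch of how a Python consumer (for example an eval runner) might unwrap the new envelope. The helper name `unwrap_run_response` and the sample values are hypothetical; only the key names come from the field lists above.

```python
from typing import Any


def unwrap_run_response(data: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a run response into (metrics, result), failing loudly on a bad envelope."""
    missing = [key for key in ("metrics", "result") if key not in data]
    if missing:
        raise ValueError(f"malformed run envelope, missing keys: {missing}")
    return data["metrics"], data["result"]


# Hypothetical response body, shaped after the field lists in the summary.
body = {
    "metrics": {
        "latency_ms": 1840,
        "input_tokens": 512,
        "output_tokens": 96,
        "model": "example-model",
        "provider": "example-provider",
        "tool_calls_count": 2,
    },
    "result": {"success": True, "summary": "done", "root_cause": "disk full"},
}

metrics, result = unwrap_run_response(body)
# cost_usd is optional: a missing key means "unknown", not zero.
cost = metrics.get("cost_usd")
```

Reading `cost_usd` with `.get()` keeps the "unknown" case distinct from a real zero, which matches the rule that the sandbox never fakes the value as 0.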

openshift-ci · 2026-06-15T13:21:55Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign harche for approval. For more information see the Code Review Process.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-06-15T13:22:03Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b2c92503-2179-48e9-97e4-0e2fbc3468f7

📥 Commits

Reviewing files that changed from the base of the PR and between 7ed401e and d83adae.

📒 Files selected for processing (10)

.ai/spec/how/provider-architecture.md
.ai/spec/what/run-api.md
AGENTS.md
evals/runner.py
src/lightspeed_agentic/routes/models.py
src/lightspeed_agentic/routes/query.py
tests/e2e/features/structured_output.feature
tests/e2e/runner.py
tests/e2e/steps/then.py
tests/test_routes.py

📝 Walkthrough

Refactors POST /v1/agent/run response from a flat {success, summary} shape to a strict {metrics, result} envelope. Introduces RunMetrics and RunResult Pydantic models, rewrites the streaming endpoint to populate and return the envelope, and updates all specs, unit tests, E2E steps, and eval tooling to match.

Changes

RunResponse {metrics, result} envelope

| Layer | File(s) | Summary |
|---|---|---|
| Behavioral spec and AGENTS.md references | `.ai/spec/what/run-api.md`, `.ai/spec/how/provider-architecture.md`, `AGENTS.md` | `run-api.md` rewrites the `RunResponse` contract to require exactly `metrics` and `result` at the top level, defines the `metrics` schema, updates all error-path shapes and examples, and adds a verification table. `provider-architecture.md` and `AGENTS.md` update cross-references accordingly. |
| Pydantic model definitions | `src/lightspeed_agentic/routes/models.py` | Adds `RunMetrics` (extra fields forbidden, typed telemetry) and `RunResult` (extra fields allowed), and replaces the old `{success, summary}` `RunResponse` with the two-field envelope model. `ConfigDict` is added to imports. |
| `run_endpoint` streaming and envelope construction | `src/lightspeed_agentic/routes/query.py` | Adds `_normalize_cost_usd`, introduces monotonic timing/counter variables with `_metrics()`/`_response()` closures, replaces the streaming loop with an `EventLogger`-based async `run()` that counts tool calls and captures the terminal result event, and reworks JSON parsing to merge extra fields into `RunResult` while filtering `metrics`/`success`/`summary` keys. |
| Unit test updates and new envelope coverage | `tests/test_routes.py` | Adds `_result`/`_metrics`/`_assert_envelope` helpers and updates all existing tests to read from the envelope. Adds new cases for metrics key stripping, `cost_usd` omission, `tool_calls_count`, agent error envelope, partial tool call metrics on error, and structured fields isolation inside `result`. |
| E2E steps, runner, and eval runner updates | `tests/e2e/steps/then.py`, `tests/e2e/runner.py`, `tests/e2e/features/structured_output.feature`, `evals/runner.py` | `then.py` adds a `_run_result` helper and redirects all field assertions into `body["result"]`, with the envelope shape assertion step explicitly checking for `metrics` and `result` keys. `evals/runner.py` extracts `success`/`summary` from `data["result"]`. Runner docstrings and feature spec references updated. |
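The cost handling and key filtering summarized in the table can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's code: the body of `_normalize_cost_usd` is not shown on this page, so `normalize_cost_usd`, `build_metrics`, and `build_result` below are hypothetical stand-ins, and treating negative or non-finite values as "unknown" is an assumption. The filtering expression follows the diff quoted in the inline review comment.

```python
import math
from typing import Any, Optional

# Keys the sandbox sets itself; a model-emitted copy must not leak through as an extra field.
RESERVED_KEYS = ("success", "summary", "metrics")


def normalize_cost_usd(raw: Any) -> Optional[float]:
    """Return a non-negative finite float, or None when the cost is unknown."""
    if raw is None or isinstance(raw, bool):
        return None
    try:
        cost = float(raw)
    except (TypeError, ValueError):
        return None
    if not math.isfinite(cost) or cost < 0:
        return None  # assumption: invalid values count as unknown
    return cost


def build_metrics(raw_cost: Any, **known: Any) -> dict[str, Any]:
    """Assemble the metrics dict, adding cost_usd only when it is actually known."""
    metrics = dict(known)
    cost = normalize_cost_usd(raw_cost)
    if cost is not None:
        metrics["cost_usd"] = cost
    return metrics


def build_result(parsed: dict[str, Any], text: str) -> dict[str, Any]:
    """Set success and summary explicitly and pass non-reserved extras through."""
    extras = {k: v for k, v in parsed.items() if k not in RESERVED_KEYS}
    return {
        "success": parsed.get("success", True),
        "summary": parsed.get("summary", text),
        **extras,
    }


# Hypothetical model output that tries to emit its own "metrics" key.
parsed = {"summary": "restarted pod", "metrics": {"cost_usd": 0}, "root_cause": "OOMKilled"}
result = build_result(parsed, text="raw model text")
metrics = build_metrics(None, latency_ms=950, input_tokens=200, output_tokens=40, tool_calls_count=1)
```

Here `result` keeps `root_cause` but drops the model-emitted `metrics` key, and `metrics` has no `cost_usd` entry because the cost was unknown.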

Sequence Diagram

sequenceDiagram
  participant Client
  participant run_endpoint
  participant EventLogger
  participant Provider

  Client->>run_endpoint: POST /v1/agent/run
  run_endpoint->>EventLogger: EventLogger("run"), start monotonic timer
  loop provider stream events
    Provider-->>run_endpoint: tool_call event
    run_endpoint->>EventLogger: log, increment tool_calls_count
    Provider-->>run_endpoint: result event
    run_endpoint->>EventLogger: log, capture text + token/cost metrics
  end
  alt timeout
    run_endpoint-->>Client: {metrics: {latency_ms, zeroed tokens}, result: {success: false, summary: "timed out"}}
  else agent error
    run_endpoint-->>Client: {metrics: {partial tool_calls_count}, result: {success: false, summary: "agent error: ..."}}
  else success
    run_endpoint->>run_endpoint: parse JSON → RunResult, filter metrics/success/summary keys
    run_endpoint-->>Client: {metrics: RunMetrics, result: {success, summary, ...extra fields}}
  end
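The flow in the diagram can be approximated with the standard library alone. The sketch below is hypothetical: the event dictionaries, `fake_provider`, `stalled_provider`, and `run_once` are invented for illustration, `EventLogger` is left out, and the `model`, `provider`, and `cost_usd` fields are omitted for brevity. It only shows how a monotonic timer, a tool-call counter, and the three outcome branches could produce the same envelope shape on every path.

```python
import asyncio
import time
from typing import Any, AsyncIterator


async def fake_provider() -> AsyncIterator[dict[str, Any]]:
    """Hypothetical provider stream: two tool calls followed by a terminal result."""
    yield {"type": "tool_call", "name": "get_pods"}
    yield {"type": "tool_call", "name": "get_logs"}
    yield {"type": "result", "text": "all good", "input_tokens": 120, "output_tokens": 30}


async def stalled_provider() -> AsyncIterator[dict[str, Any]]:
    """Hypothetical provider that emits one tool call and then hangs."""
    yield {"type": "tool_call", "name": "get_pods"}
    await asyncio.sleep(10)
    yield {"type": "result", "text": "never reached"}


async def run_once(events: AsyncIterator[dict[str, Any]], timeout_s: float) -> dict[str, Any]:
    started = time.monotonic()
    state: dict[str, Any] = {"tool_calls": 0, "final": None}

    def metrics() -> dict[str, Any]:
        # Token counts stay at zero until a terminal result event is captured.
        final = state["final"] or {}
        return {
            "latency_ms": int((time.monotonic() - started) * 1000),
            "input_tokens": final.get("input_tokens", 0),
            "output_tokens": final.get("output_tokens", 0),
            "tool_calls_count": state["tool_calls"],
        }

    async def consume() -> None:
        async for event in events:
            if event["type"] == "tool_call":
                state["tool_calls"] += 1
            elif event["type"] == "result":
                state["final"] = event

    try:
        await asyncio.wait_for(consume(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"metrics": metrics(), "result": {"success": False, "summary": "timed out"}}
    except Exception as exc:
        return {"metrics": metrics(), "result": {"success": False, "summary": f"agent error: {exc}"}}

    text = state["final"]["text"] if state["final"] else ""
    return {"metrics": metrics(), "result": {"success": True, "summary": text}}


ok = asyncio.run(run_once(fake_provider(), timeout_s=5.0))
timed_out = asyncio.run(run_once(stalled_provider(), timeout_s=0.05))
```

In the timeout case the counter still reports the one tool call seen before the stall, which corresponds to the partial metrics shown on the error branches of the diagram.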

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly reflects the main change: restructuring POST /run response to include a metrics-result envelope for sandbox-operator handoff. |
| Description check | ✅ Passed | The description clearly explains the envelope restructuring, field definitions, scope boundaries, and breaking change implications. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/e2e/steps/then.py (1)
14-16: 💤 Low value

Helper lacks defensive error handling for malformed envelope.

The _run_result() helper raises KeyError if the response body is missing a "result" key. Test steps that use this helper (lines 30, 41, 48, 57, 75) without first validating the envelope (via assert_200_envelope at lines 84–97) will fail with an opaque traceback if the envelope is broken. Consider adding a guard or ensuring all scenarios call envelope validation first.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/steps/then.py` around lines 14 - 16, The _run_result() helper
function lacks defensive error handling and directly accesses body["result"]
without validating that the key exists, causing opaque KeyError exceptions if
the envelope is malformed. Add a check to verify the "result" key exists in the
body parameter before accessing it, and raise a more informative error message
if it is missing (for example, indicating that the response envelope is
malformed or missing the expected "result" key). This ensures that test steps
calling _run_result() at lines 30, 41, 48, 57, and 75 will get clear error
messages even if they skip the envelope validation check at lines 84–97.
Source: Coding guidelines
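One possible shape for the suggested guard, as a minimal sketch. The real `_run_result()` signature is not shown on this page, so `run_result` below is a hypothetical stand-in that takes the decoded response body as a dict.

```python
from typing import Any


def run_result(body: Any) -> dict[str, Any]:
    """Return body["result"], raising a descriptive AssertionError on a malformed envelope."""
    if not isinstance(body, dict) or "result" not in body:
        raise AssertionError(
            f"malformed run envelope: expected a 'result' key, got {body!r}"
        )
    return body["result"]


# A well-formed envelope passes straight through.
good = run_result({"metrics": {}, "result": {"success": True, "summary": "ok"}})

# A legacy flat body now fails with a readable message instead of a bare KeyError.
try:
    run_result({"success": True, "summary": "ok"})
    message = ""
except AssertionError as exc:
    message = str(exc)
```

Raising `AssertionError` rather than `KeyError` makes the failure read like a test assertion in the step output, which is the concern the nitpick raises.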

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1a9bc143-3a0c-4786-97d5-92c7a03f153b

📥 Commits

Reviewing files that changed from the base of the PR and between 951c2af and 7ed401e.

🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

openshift/lightspeed-agentic-operator (manual)

coderabbitai · 2026-06-15T13:28:00Z

+            return _response(
+                RunResult(
+                    success=parsed.get("success", True),
+                    summary=parsed.get("summary", text),
+                    **{
+                        k: v
+                        for k, v in parsed.items()
+                        if k not in ("success", "summary", "metrics")
+                    },
+                )
            )


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add a rollout guard for the response-contract break.

This endpoint now emits only {metrics, result}. The linked operator currently unmarshals flat top-level fields; with this shape change, workflow fields end up unread (zero-valued/empty), so proposal steps can fail or be treated as unsuccessful after deploy unless rollout ordering is strictly enforced. Add a compatibility gate (or versioned endpoint) until operator parsing is updated and deployed.

Also applies to: 171-177

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lightspeed_agentic/routes/query.py` around lines 156 - 166, The response structure in the _response call for the RunResult is changing the contract with the operator, which expects flat top-level fields but will now only receive metrics and result, causing workflow fields to be empty and proposal steps to fail. Add a rollout guard (compatibility gate or feature flag) that conditionally returns the response in the old contract format during rollout until the operator parsing is updated and deployed. This guard should wrap the response generation logic to maintain backward compatibility while the operator is being updated in parallel.

Source: Linked repositories
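A compatibility gate of the kind suggested here could look roughly like the sketch below. Everything in it is an assumption for illustration: the environment variable `RUN_RESPONSE_LEGACY_FLAT` and the function `shape_response` do not appear in the PR, and the PR description states that operator changes are handled separately.

```python
import os
from typing import Any, Mapping

# Hypothetical flag name; not part of the PR.
LEGACY_FLAG = "RUN_RESPONSE_LEGACY_FLAT"


def shape_response(envelope: dict[str, Any], env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Return the old flat body when the legacy flag is set, else the new envelope."""
    if env.get(LEGACY_FLAG, "").strip().lower() in ("1", "true", "yes"):
        # The old contract was a flat {success, summary, ...} body, which is
        # what the "result" half of the envelope already contains.
        return dict(envelope["result"])
    return envelope


envelope = {
    "metrics": {"latency_ms": 1200, "input_tokens": 10, "output_tokens": 5, "tool_calls_count": 0},
    "result": {"success": True, "summary": "done", "root_cause": "disk full"},
}

new_shape = shape_response(envelope, env={})
old_shape = shape_response(envelope, env={LEGACY_FLAG: "true"})
```

Passing `env` explicitly keeps the gate testable without touching the process environment. A versioned endpoint, the alternative named in the comment, would avoid the flag at the cost of keeping two routes alive during rollout.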

openshift-ci · 2026-06-15T13:49:43Z

@blublinsky: all tests passed!

openshift-ci · 2026-06-17T08:50:01Z

PR needs rebase.

openshift-ci Bot requested review from harche and xrajesh June 15, 2026 13:21

coderabbitai Bot requested changes Jun 15, 2026

Define sandbox → operator metrics handoff

d83adae

blublinsky force-pushed the ols-3130-metrics-envelope branch from 7ed401e to d83adae Compare June 15, 2026 13:34

blublinsky mentioned this pull request Jun 16, 2026

OLS-2989: Implement operator-side metrics envelope parsing and Result CR storage openshift/lightspeed-agentic-operator#133

Open

openshift-ci Bot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Jun 17, 2026

Define sandbox → operator metrics handoff #74
blublinsky wants to merge 1 commit into openshift:main from blublinsky:ols-3130-metrics-envelope
