Define sandbox → operator metrics handoff #74
📝 Walkthrough
Refactors […]
Changes: RunResponse {metrics, result} envelope
Sequence Diagram
sequenceDiagram
participant Client
participant run_endpoint
participant EventLogger
participant Provider
Client->>run_endpoint: POST /v1/agent/run
run_endpoint->>EventLogger: EventLogger("run"), start monotonic timer
loop provider stream events
Provider-->>run_endpoint: tool_call event
run_endpoint->>EventLogger: log, increment tool_calls_count
Provider-->>run_endpoint: result event
run_endpoint->>EventLogger: log, capture text + token/cost metrics
end
alt timeout
run_endpoint-->>Client: {metrics: {latency_ms, zeroed tokens}, result: {success: false, summary: "timed out"}}
else agent error
run_endpoint-->>Client: {metrics: {partial tool_calls_count}, result: {success: false, summary: "agent error: ..."}}
else success
run_endpoint->>run_endpoint: parse JSON → RunResult, filter metrics/success/summary keys
run_endpoint-->>Client: {metrics: RunMetrics, result: {success, summary, ...extra fields}}
end
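The event loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the code in `query.py`: the event dictionaries, their `type`, `text`, `input_tokens`, and `output_tokens` keys, and the `collect_run` name are all assumptions made for the example.

```python
import time
from typing import Any, Iterable


def collect_run(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Fold a provider event stream into final text plus sandbox-owned metrics.

    The event shape used here is hypothetical; the real provider events may differ.
    """
    started = time.monotonic()  # monotonic clock, as in the diagram
    tool_calls_count = 0
    text = ""
    usage = {"input_tokens": 0, "output_tokens": 0}

    for event in events:
        kind = event.get("type")
        if kind == "tool_call":
            # Count every tool invocation seen on the stream.
            tool_calls_count += 1
        elif kind == "result":
            # The result event carries the final text and token usage.
            text = event.get("text", "")
            usage["input_tokens"] = event.get("input_tokens", 0)
            usage["output_tokens"] = event.get("output_tokens", 0)

    latency_ms = int((time.monotonic() - started) * 1000)
    return {
        "text": text,
        "metrics": {
            "latency_ms": latency_ms,
            "tool_calls_count": tool_calls_count,
            **usage,
        },
    }
```

Because the counters live outside the loop, a timeout or agent error raised mid-stream can still report the partial `tool_calls_count` gathered so far, which matches the error branches in the diagram.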
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/e2e/steps/then.py (1)
14-16: 💤 Low value
Helper lacks defensive error handling for malformed envelope.
The `_run_result()` helper raises `KeyError` if the response body is missing a `"result"` key. Test steps that use this helper (lines 30, 41, 48, 57, 75) without first validating the envelope (via `assert_200_envelope` at lines 84–97) will fail with an opaque traceback if the envelope is broken. Consider adding a guard or ensuring all scenarios call envelope validation first.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/e2e/steps/then.py` around lines 14 - 16, The _run_result() helper function lacks defensive error handling and directly accesses body["result"] without validating that the key exists, causing opaque KeyError exceptions if the envelope is malformed. Add a check to verify the "result" key exists in the body parameter before accessing it, and raise a more informative error message if it is missing (for example, indicating that the response envelope is malformed or missing the expected "result" key). This ensures that test steps calling _run_result() at lines 30, 41, 48, 57, and 75 will get clear error messages even if they skip the envelope validation check at lines 84–97.
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1a9bc143-3a0c-4786-97d5-92c7a03f153b
📒 Files selected for processing (10)
- .ai/spec/how/provider-architecture.md
- .ai/spec/what/run-api.md
- AGENTS.md
- evals/runner.py
- src/lightspeed_agentic/routes/models.py
- src/lightspeed_agentic/routes/query.py
- tests/e2e/features/structured_output.feature
- tests/e2e/runner.py
- tests/e2e/steps/then.py
- tests/test_routes.py
🔗 Linked repositories identified
CodeRabbit considers these linked repositories for cross-repo context during reviews:
openshift/lightspeed-agentic-operator (manual)
return _response(
    RunResult(
        success=parsed.get("success", True),
        summary=parsed.get("summary", text),
        **{
            k: v
            for k, v in parsed.items()
            if k not in ("success", "summary", "metrics")
        },
    )
)
Add a rollout guard for the response-contract break.
This endpoint now emits only {metrics, result}. The linked operator currently unmarshals flat top-level fields; with this shape change, workflow fields end up unread (zero-valued/empty), so proposal steps can fail or be treated as unsuccessful after deploy unless rollout ordering is strictly enforced. Add a compatibility gate (or versioned endpoint) until operator parsing is updated and deployed.
Also applies to: 171-177
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lightspeed_agentic/routes/query.py` around lines 156 - 166, The response
structure in the _response call for the RunResult is changing the contract with
the operator, which expects flat top-level fields but will now only receive
metrics and result, causing workflow fields to be empty and proposal steps to
fail. Add a rollout guard (compatibility gate or feature flag) that
conditionally returns the response in the old contract format during rollout
until the operator parsing is updated and deployed. This guard should wrap the
response generation logic to maintain backward compatibility while the operator
is being updated in parallel.
Source: Linked repositories
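One way to express the suggested rollout guard is sketched below. The environment variable `RUN_RESPONSE_LEGACY_FLAT` and both function names are hypothetical, and the sketch assumes the old contract was simply the agent-owned fields at the top level with no `metrics` key.

```python
import os
from typing import Any, Mapping


def legacy_flat_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical flag: RUN_RESPONSE_LEGACY_FLAT=1 keeps the old flat body."""
    value = env.get("RUN_RESPONSE_LEGACY_FLAT", "")
    return value.strip().lower() in {"1", "true", "yes"}


def shape_response(
    metrics: dict[str, Any],
    result: dict[str, Any],
    *,
    legacy_flat: bool,
) -> dict[str, Any]:
    """Return either the old flat body or the new {metrics, result} envelope."""
    if legacy_flat:
        # Assumed old contract: workflow fields at the top level, no metrics.
        return dict(result)
    return {"metrics": dict(metrics), "result": dict(result)}
```

With a gate like this the sandbox could ship first with the flag on, and the flag could be turned off once the operator reads the envelope. A versioned endpoint, the alternative named in the comment, would avoid the flag at the cost of maintaining two routes.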
7ed401e to d83adae
@blublinsky: all tests passed!
PR needs rebase.
Summary
Implements OLS-3130: POST /v1/agent/run now returns a fixed {metrics, result} envelope instead of a flat {success, summary, …} body.
metrics (sandbox-owned): latency_ms, input_tokens, output_tokens, cost_usd?, model, provider, tool_calls_count
result (agent-owned): success, summary, plus structured fields from outputSchema
Both keys are present on success, timeout, agent error, and empty-result paths. Model-emitted metrics keys are stripped from parsed JSON; cost_usd is omitted when unknown (never faked as 0).
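The stripping and `cost_usd` rules can be illustrated with plain dictionaries. This is a sketch only: `build_envelope` and its keyword parameters are made up for the example, and the PR itself uses the `RunResult` and `RunMetrics` models rather than raw dicts.

```python
from typing import Any, Optional

# Keys the sandbox owns or sets explicitly; never copied from model output.
RESERVED_KEYS = ("success", "summary", "metrics")


def build_envelope(
    parsed: dict[str, Any],
    text: str,
    *,
    latency_ms: int,
    input_tokens: int,
    output_tokens: int,
    model: str,
    provider: str,
    tool_calls_count: int,
    cost_usd: Optional[float] = None,
) -> dict[str, Any]:
    """Build the {metrics, result} envelope from parsed agent JSON."""
    metrics: dict[str, Any] = {
        "latency_ms": latency_ms,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "model": model,
        "provider": provider,
        "tool_calls_count": tool_calls_count,
    }
    if cost_usd is not None:
        # Only report cost when it is actually known; never default to 0.
        metrics["cost_usd"] = cost_usd

    result = {
        "success": parsed.get("success", True),
        "summary": parsed.get("summary", text),
        # Extra structured fields pass through; a model-emitted "metrics" key is dropped.
        **{k: v for k, v in parsed.items() if k not in RESERVED_KEYS},
    }
    return {"metrics": metrics, "result": result}
```

Omitting `cost_usd` instead of sending 0 lets a consumer distinguish "free" from "unknown", which matters if costs are later summed or averaged.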
Breaking change: the operator must unwrap data["result"] for workflow fields and read data["metrics"] for observability. Operator changes are out of scope here and must land before production rollout.
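On the consumer side, the unwrap step described above could look like the following sketch. It is written in Python purely for illustration; the operator's actual language and parsing code are not shown in this PR, and `unwrap_run_response` is a hypothetical name.

```python
from typing import Any, Mapping, Tuple


def unwrap_run_response(
    data: Mapping[str, Any],
) -> Tuple[Mapping[str, Any], Mapping[str, Any]]:
    """Split a run response into (result, metrics), failing loudly on a flat body."""
    missing = [key for key in ("metrics", "result") if key not in data]
    if missing:
        # A legacy flat body would land here instead of being read as empty fields.
        raise ValueError(f"run response missing envelope keys: {missing}")
    return data["result"], data["metrics"]
```

Failing on a missing key, rather than silently reading zero-valued fields, makes a version mismatch between sandbox and operator visible immediately.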