[codex-analytics] emit tool item events from item lifecycle #17090

Merged
rhan-oai merged 3 commits into main from pr17090 on May 6, 2026

rhan-oai commented Apr 8, 2026

Why

Now that the tool-item schemas are in place, analytics should emit tool-item events from the app-server item lifecycle instead of relying on bespoke tracking at each callsite. The reducer should also reuse the shared thread analytics context introduced lower in the stack, so later event families do not repeat the same reducer joins or the same ladder of missing-state checks.

What changed

  • Tracks tool-item completion notifications and emits the matching tool analytics event when a terminal item arrives.
  • Derives event-specific payload details for command execution, file changes, MCP calls, dynamic tools, collaboration tools, web search, and image generation.
  • Denormalizes thread, app-server client, runtime, and subagent provenance metadata through the shared thread analytics context.
  • Adds reducer coverage for item lifecycle emission and subagent metadata inheritance.
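
As a rough illustration of the lifecycle-driven emission described above, here is a minimal sketch. Every type, field, and method name in it (`Reducer`, `ItemNotification`, `ThreadContext`, `ToolItemEvent`, `ingest`) is a hypothetical stand-in, not the actual codex-analytics API. It only shows the general shape: remember an item when it starts, and emit one event joined with the thread context when the terminal item arrives.

```rust
use std::collections::HashMap;

/// Tool families named in the PR description (variant names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolKind {
    CommandExecution,
    FileChange,
    McpCall,
    DynamicTool,
    CollabTool,
    WebSearch,
    ImageGeneration,
}

/// Hypothetical slice of the shared thread analytics context.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ThreadContext {
    pub thread_id: String,
    pub client_name: String,
}

/// Hypothetical item lifecycle notifications.
pub enum ItemNotification {
    Started { item_id: String, thread_id: String, kind: ToolKind, at_ms: u64 },
    Completed { item_id: String, at_ms: u64 },
}

/// Hypothetical emitted analytics event.
#[derive(Debug, PartialEq, Eq)]
pub struct ToolItemEvent {
    pub item_id: String,
    pub kind: ToolKind,
    pub thread_id: String,
    pub client_name: String,
    pub duration_ms: u64,
}

#[derive(Default)]
pub struct Reducer {
    threads: HashMap<String, ThreadContext>,
    // item_id -> (thread_id, kind, started_at_ms)
    in_flight: HashMap<String, (String, ToolKind, u64)>,
}

impl Reducer {
    pub fn register_thread(&mut self, ctx: ThreadContext) {
        self.threads.insert(ctx.thread_id.clone(), ctx);
    }

    /// Returns an event only when a tracked item reaches its terminal state.
    pub fn ingest(&mut self, notification: ItemNotification) -> Option<ToolItemEvent> {
        match notification {
            ItemNotification::Started { item_id, thread_id, kind, at_ms } => {
                self.in_flight.insert(item_id, (thread_id, kind, at_ms));
                None
            }
            ItemNotification::Completed { item_id, at_ms } => {
                // Missing state (unknown item or thread) yields no event.
                let (thread_id, kind, started_at_ms) = self.in_flight.remove(&item_id)?;
                let ctx = self.threads.get(&thread_id)?;
                Some(ToolItemEvent {
                    item_id,
                    kind,
                    thread_id: ctx.thread_id.clone(),
                    client_name: ctx.client_name.clone(),
                    duration_ms: at_ms.saturating_sub(started_at_ms),
                })
            }
        }
    }
}
```

In this sketch a second `Completed` notification for the same item yields `None`, because the in-flight entry is removed on first completion. Whether the real reducer deduplicates this way is not stated in the PR.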

Duration semantics

duration_ms is computed from the app-server item lifecycle timestamps: completed_at_ms - started_at_ms. That makes it the duration of the lifecycle Codex observed locally, not necessarily the upstream provider's full execution time.

  • Web search usually has a meaningful observed lifecycle because Responses can send response.output_item.added before response.output_item.done; in that case started_at_ms comes from the added event and completed_at_ms comes from the done event.
  • Image generation can be much less precise. In the current observed stream, image generation often arrives only as a completed response.output_item.done; when there is no earlier added event, Codex synthesizes the started item immediately before completion, so duration_ms can be 0 even though upstream image generation took longer.
  • Standalone web search and standalone image generation work is expected to land after this stack. Those paths may introduce more direct lifecycle events or timing points, so the current web-search/image-generation duration semantics should be treated as the best available item-lifecycle approximation, not the final latency contract for those tool families.
  • execution_duration_ms is populated only where the completed item already carries a native execution duration; otherwise it remains null while duration_ms still reflects the local lifecycle interval.
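
The duration rules above can be sketched as follows. The field names `started_at_ms`, `completed_at_ms`, `duration_ms`, and `execution_duration_ms` come from this description; the struct names, the helper functions, the use of `saturating_sub`, and the choice to model a synthesized start as sharing the completion timestamp are assumptions made for illustration.

```rust
/// Hypothetical lifecycle timestamps for one tool item, in milliseconds.
#[derive(Debug, Clone, Copy)]
pub struct ItemLifecycle {
    pub started_at_ms: u64,
    pub completed_at_ms: u64,
    /// Native execution duration carried by the completed item, if any.
    pub native_execution_duration_ms: Option<u64>,
}

/// Hypothetical duration fields of the emitted analytics event.
#[derive(Debug, PartialEq, Eq)]
pub struct ToolItemDurations {
    pub duration_ms: u64,
    pub execution_duration_ms: Option<u64>,
}

/// duration_ms is the locally observed lifecycle interval;
/// execution_duration_ms is passed through only when the item carries one.
pub fn durations(item: ItemLifecycle) -> ToolItemDurations {
    ToolItemDurations {
        // Assumption: clamp at zero rather than underflow on out-of-order timestamps.
        duration_ms: item.completed_at_ms.saturating_sub(item.started_at_ms),
        execution_duration_ms: item.native_execution_duration_ms,
    }
}

/// Models the "only a done event arrived" case: the start is synthesized
/// at completion time (an assumption of this sketch), so the interval is 0.
pub fn synthesized_lifecycle(completed_at_ms: u64) -> ItemLifecycle {
    ItemLifecycle {
        started_at_ms: completed_at_ms,
        completed_at_ms,
        native_execution_duration_ms: None,
    }
}
```

For example, an item started at 1000 ms and completed at 1250 ms yields a `duration_ms` of 250, while `durations(synthesized_lifecycle(5000))` yields a `duration_ms` of 0 and an `execution_duration_ms` of `None`.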

Currently placeholder / partial fields

Some fields are included in the schema for the intended steady-state contract, but this PR does not yet populate them from real approval/review state:

  • review_count, guardian_review_count, and user_review_count currently default to 0.
  • final_approval_outcome currently defaults to unknown.
  • requested_additional_permissions and requested_network_access currently default to false.
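
One way to picture these placeholder defaults is a struct whose `Default` implementation produces exactly the values listed above. The field names are taken from this description; the struct name, the enum name, and the field types are assumptions, and the real schema may group or type these fields differently.

```rust
/// Hypothetical enum; only the placeholder variant is shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum FinalApprovalOutcome {
    #[default]
    Unknown,
}

/// Hypothetical grouping of the approval/review fields that are
/// not yet populated from real state.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct ReviewPlaceholders {
    pub review_count: u32,
    pub guardian_review_count: u32,
    pub user_review_count: u32,
    pub final_approval_outcome: FinalApprovalOutcome,
    pub requested_additional_permissions: bool,
    pub requested_network_access: bool,
}
```

With this shape, `ReviewPlaceholders::default()` gives zero counts, an `Unknown` outcome, and `false` for both request flags, so consumers should not read these values as observed facts until a later change populates them.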

Verification

  • cargo test -p codex-analytics

Stack created with Sapling. Best reviewed with ReviewStack.

rhan-oai force-pushed the pr17090 branch 11 times, most recently from c01eb26 to 94b0702, on April 14, 2026 06:13
rhan-oai changed the title from "[codex-analytics] emit basic tool events from item lifecycle" to "[codex-analytics] emit tool item events from item lifecycle" on Apr 14, 2026
rhan-oai force-pushed the pr17090 branch 8 times, most recently from ff079ef to 35c9400, on April 20, 2026 20:37
rhan-oai force-pushed the pr17090 branch 3 times, most recently from 6d66905 to 6f6d9a6, on April 22, 2026 07:02
rhan-oai force-pushed the pr17090 branch 2 times, most recently from c103b93 to da2363e, on April 28, 2026 21:57
rhan-oai changed the base branch from main to pr17089 on April 28, 2026 22:24
rhan-oai force-pushed the pr17090 branch 2 times, most recently from c2fa71a to 2dcdd29, on April 29, 2026 18:25
rhan-oai force-pushed the pr17089 branch 2 times, most recently from a8a52ed to 090907d, on April 29, 2026 18:27
rhan-oai force-pushed the pr17090 branch 2 times, most recently from 05ee118 to 559e5b8, on April 29, 2026 18:29
rhan-oai force-pushed the pr17089 branch 2 times, most recently from 175603e to 63d64d5, on April 29, 2026 18:58
rhan-oai force-pushed the pr17090 branch 2 times, most recently from ab9f9ea to 7b7ac90, on April 29, 2026 19:05
rhan-oai added a commit that referenced this pull request Apr 29, 2026
## Why

Codex analytics needs a typed seam for app-server-originated
request/response traffic so future tool-approval analytics can consume
those facts without adding bespoke callsite tracking each time. Server
responses arrive as JSON-RPC `id + result` payloads, so analytics has to
reconstruct the matching typed response from the original typed request
while that request context still exists in app-server.
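
A minimal sketch of that reconstruction idea, under heavy assumptions: the real code decodes a JSON-RPC result payload, whereas this stand-in treats the result as a plain string so the example stays dependency-free. The variant names, the `PendingRequests` type, and the signature of `response_from_result` shown here are hypothetical; only the method name itself appears in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical typed server requests.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ServerRequest {
    ApproveCommand { command: String },
    AskUserInput { prompt: String },
}

/// Hypothetical typed server responses, one per request variant.
#[derive(Debug, PartialEq, Eq)]
pub enum ServerResponse {
    ApproveCommand { approved: bool },
    AskUserInput { answer: String },
}

impl ServerRequest {
    /// The stored request decides how the untyped result is decoded.
    /// `result` is a plain string here as a stand-in for a JSON value.
    pub fn response_from_result(&self, result: &str) -> Result<ServerResponse, String> {
        match self {
            ServerRequest::ApproveCommand { .. } => result
                .parse::<bool>()
                .map(|approved| ServerResponse::ApproveCommand { approved })
                .map_err(|e| format!("invalid approval result: {e}")),
            ServerRequest::AskUserInput { .. } => Ok(ServerResponse::AskUserInput {
                answer: result.to_string(),
            }),
        }
    }
}

/// Hypothetical table of outstanding requests keyed by JSON-RPC id.
#[derive(Default)]
pub struct PendingRequests {
    by_id: HashMap<u64, ServerRequest>,
}

impl PendingRequests {
    pub fn track(&mut self, id: u64, request: ServerRequest) {
        self.by_id.insert(id, request);
    }

    /// Looks up the original request by id and decodes the typed response.
    /// Returns None when no request with that id is outstanding.
    pub fn resolve(&mut self, id: u64, result: &str) -> Option<Result<ServerResponse, String>> {
        let request = self.by_id.remove(&id)?;
        Some(request.response_from_result(result))
    }
}
```

The point of the design is that the response alone carries no type information, so decoding has to happen while the original request is still available.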

This also puts analytics on the app-server outbound path, which needs to
avoid keeping the runtime alive during shutdown. The final ownership fix
keeps the normal strong auth-manager retention in analytics and makes
the external-auth refresh bridge hold a weak back-reference to
`OutgoingMessageSender`, breaking the runtime cycle at the bridge
boundary instead of exposing retention policy through the analytics
client API.

## What changed

- Adds typed `ServerRequest` and `ServerResponse` analytics facts, plus
`AnalyticsEventsClient::track_server_request` and
`track_server_response`.
- Renames the existing client-side facts to `ClientRequest` and
`ClientResponse` so reducers can distinguish client-to-server traffic
from server-to-client traffic.
- Adds `ServerRequest::response_from_result`, allowing a stored typed
request to decode the matching typed server response from a raw JSON-RPC
result payload.
- Threads `AnalyticsEventsClient` through `OutgoingMessageSender` and
records targeted server requests, replayed targeted requests, and
matching targeted responses with the responding connection id needed for
correlation.
- Intentionally leaves broadcast server requests/responses out of
analytics for now because the current model is per connection, while
broadcasts fan one logical request out across multiple connections.
- Breaks the app-server shutdown cycle by storing
`Weak<OutgoingMessageSender>` in `ExternalAuthRefreshBridge` and
upgrading it only when an external-auth refresh is actually requested.
- Keeps reducer ingestion of the new server-side facts as no-ops for
now; this PR is plumbing for later tool-approval analytics work.
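
The weak back-reference pattern mentioned above can be sketched roughly as follows. `OutgoingMessageSender` and `ExternalAuthRefreshBridge` are names from this PR, but the fields and methods shown (`name`, `request_refresh`, `new`, `refresh`) are hypothetical stand-ins, not the real definitions.

```rust
use std::sync::{Arc, Weak};

/// Stand-in for the real sender; the field and method are hypothetical.
pub struct OutgoingMessageSender {
    pub name: String,
}

impl OutgoingMessageSender {
    pub fn request_refresh(&self) -> String {
        format!("refresh via {}", self.name)
    }
}

/// The bridge holds only a weak reference, so it never keeps the sender alive.
pub struct ExternalAuthRefreshBridge {
    sender: Weak<OutgoingMessageSender>,
}

impl ExternalAuthRefreshBridge {
    pub fn new(sender: &Arc<OutgoingMessageSender>) -> Self {
        Self { sender: Arc::downgrade(sender) }
    }

    /// Upgrades the weak reference only when a refresh is requested.
    /// Returns None if the sender has already been dropped (for example, during shutdown).
    pub fn refresh(&self) -> Option<String> {
        let sender = self.sender.upgrade()?;
        Some(sender.request_refresh())
    }
}
```

Because the bridge never holds a strong `Arc`, dropping the last external owner of the sender frees it even while the bridge is still alive, which is what breaks the reference cycle.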

## Verification

- `cargo test -p codex-analytics`
- `cargo test -p codex-app-server outgoing_message::tests::`
- Covers typed-response reconstruction plus the targeted, replayed,
broadcast-exclusion, and response-attribution analytics paths.

## Follow-up

This PR intentionally stops at ingestion plumbing, so `ServerRequest`
and `ServerResponse` facts are still reducer no-ops. Once a follow-up PR
adds real downstream analytics output for those facts:

- replace the temporary pre-reducer observation seam with reducer tests
for the emitted event shape;
- add end-to-end coverage in `app-server/tests/suite/v2/analytics.rs`
for the real app-server workflow and captured analytics payload;
- remove the temporary sender-level observer tests added here in favor
of the real-output coverage above.

---

[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/17088).
* #18748
* #18747
* #17090
* #17089
* #20241
* #20239
* __->__ #17088
rhan-oai force-pushed the pr17090 branch 2 times, most recently from 2351355 to 7b7ac90, on April 29, 2026 21:14
rhan-oai commented May 6, 2026

/merge
