[otel-advisor] OTel improvement: surface agent finish_reason (gen_ai.response.finish_reasons) on the reliably-exported conclusion span

### 📡 OTel Instrumentation Improvement: surface agent finish_reason on the conclusion span

**Analysis Date**: 2026-05-31 
**Priority**: High 
**Effort**: Medium (2–4h)

### Problem

The OpenTelemetry GenAI semantic layer — `gen_ai.response.finish_reasons`, `gen_ai.response.model`, and `gen_ai.usage.*` — is gated on `jobName === "agent"` in `actions/setup/js/send_otlp_span.cjs` (the `if (jobName === "agent")` block at lines 1911–1923, and the dedicated agent sub-span at lines 2117–2151). It is therefore attached **only** to the `agent` job's conclusion span (`gh-aw.agent.conclusion`) and its dedicated `gh-aw.agent.agent` sub-span.

Live sampling of the last 24h confirms these signals are **not reaching either backend**: `gen_ai.response.finish_reasons` appears on **zero** spans when checked against the *complete* Grafana Tempo span-tag index, and is likewise absent in Sentry. The agent job's spans are under-represented in the telemetry, so the entire GenAI layer is effectively invisible in dashboards — even though the code comment at lines 1920–1921 explicitly intends finish_reasons to be *"always present ... so length-truncation is always queryable in Sentry/dashboards."*

Because `finish_reasons` is sourced from `readAgentRuntimeMetrics()`, which reads `agent-stdio.log` (lines 1601–1632), and that file exists only in the **agent** job's workspace, it cannot be emitted from the always-present `conclusion` job span today. The result: a DevOps engineer **cannot answer "how often do agent runs stop because they hit the model's length/token limit vs. a content filter vs. a normal end-of-turn vs. an error?"** from the OTel backends, because the span that carries that answer never lands.

<details>
<summary>Why This Matters (DevOps Perspective)</summary>

Finish/stop reason is the single highest-value GenAI observability field for an agentic-workflow platform. It distinguishes:

- `max_tokens` / `length` — the agent was truncated mid-thought (a silent quality failure that does **not** show up as a job failure)
- `content_filter` — a safety stop
- `tool_use` loops / `end_turn` — normal completion
- `error` — engine-side failure

Without it landing in a backend, length-truncation is a **silent failure**: the job is green, the conclusion span says `STATUS_CODE_OK`, but the agent produced incomplete output. Surfacing finish_reason on a reliably-exported span unblocks a Grafana/Sentry panel like *"% of agent runs truncated by token limit, by workflow/engine"* and an alert on truncation-rate spikes — directly reducing MTTR for "the agent's answers got worse" investigations that today require pulling raw logs per run.

</details>

<details>
<summary>Current Behavior</summary>

The finish reason is computed and attached only inside the `jobName === "agent"` branch, on the agent job's own spans:

```javascript
// Current: actions/setup/js/send_otlp_span.cjs (lines 1911-1923)
if (jobName === "agent") {
 attributes.push(buildAttr("gen_ai.operation.name", "chat"));
 if (workflowName) attributes.push(buildAttr("gen_ai.workflow.name", workflowName));
 if (runtimeMetrics.resolvedModel) attributes.push(buildAttr("gen_ai.response.model", runtimeMetrics.resolvedModel));
 // ... fall back to "timeout"/"unknown" so finish_reasons is "always present"
 const effectiveStopReason = runtimeMetrics.stopReason || (isAgentTimedOut ? "timeout" : "unknown");
 attributes.push(buildArrayAttr("gen_ai.response.finish_reasons", [effectiveStopReason]));
}
```

The stop reason originates from `agent-stdio.log`, which lives in the agent job — not the conclusion job:

```javascript
// Current: actions/setup/js/send_otlp_span.cjs (lines 1623-1632)
if (parsed.type !== "result") { return; }
// ...
if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
 metrics.stopReason = parsed.stop_reason;
}
```
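
The extraction above can be exercised in isolation. The following is a minimal sketch, assuming `agent-stdio.log` is newline-delimited JSON mixed with plain log lines and that the last `result` entry wins; `extractStopReason` is a hypothetical helper for illustration, not the function in `send_otlp_span.cjs`.

```javascript
// Hypothetical helper: scan log text line by line and return the last
// non-empty stop_reason found on a `type: "result"` entry, or "" if none.
function extractStopReason(logText) {
  let stopReason = "";
  for (const line of logText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // skip non-JSON log noise
    let parsed;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // tolerate truncated or malformed lines
    }
    if (parsed.type !== "result") continue;
    if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
      stopReason = parsed.stop_reason;
    }
  }
  return stopReason;
}

const sampleLog = [
  "starting agent...",
  '{"type":"assistant","text":"working"}',
  '{"type":"result","stop_reason":"max_tokens"}',
].join("\n");
console.log(extractStopReason(sampleLog)); // prints "max_tokens"
```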

</details>

<details>
<summary>Proposed Change</summary>

Reuse the **existing** agent→conclusion env propagation channel (the same mechanism already used for `GH_AW_AGENT_CONCLUSION`, documented in the `action_conclusion_otlp.cjs` header) so the finish reason rides along to the conclusion job and is emitted on the `gh-aw.*.conclusion` span — which reliably reaches Tempo and Sentry today.

1. In the agent job's conclusion step, export the resolved stop reason (and model) to `$GITHUB_OUTPUT` / the job-result env that already carries `GH_AW_AGENT_CONCLUSION`.
2. In `sendJobConclusionSpan`, read that env and emit the GenAI attributes on the conclusion span regardless of `jobName`. Prefer `runtimeMetrics.stopReason` when `agent-stdio.log` is locally available, and fall back to the propagated env value otherwise:

```javascript
// Proposed addition to actions/setup/js/send_otlp_span.cjs
// Emit finish_reason on the conclusion span too, sourced from the agent job
// via env (GH_AW_AGENT_STOP_REASON) when agent-stdio.log is not local.
const propagatedStopReason = process.env.GH_AW_AGENT_STOP_REASON || "";
const effectiveStopReason =
 runtimeMetrics.stopReason || propagatedStopReason || (isAgentTimedOut ? "timeout" : "");
if (effectiveStopReason) {
 // Standard OTel GenAI attribute for dashboards...
 attributes.push(buildArrayAttr("gen_ai.response.finish_reasons", [effectiveStopReason]));
 // ...and a flat gh-aw.* mirror so it is queryable in Sentry, which does not
 // promote OTLP resource attrs but does index the gh-aw.* span namespace.
 attributes.push(buildAttr("gh-aw.agent.finish_reason", effectiveStopReason));
}
```

Keep the existing agent-span emission intact; this change makes the signal *additionally* present on the conclusion span that actually lands.
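
Step 1 could look roughly like the sketch below. It assumes the standard GitHub Actions `GITHUB_OUTPUT` file mechanism (`name=value` lines appended to the file named by that variable). The output names `agent_stop_reason` and `agent_model` and both helpers are hypothetical, and mapping the job output to `GH_AW_AGENT_STOP_REASON` in the conclusion job's env is left to the compiled workflow.

```javascript
const fs = require("node:fs");

// Hypothetical helper: format step outputs as `name=value` lines.
// Newlines are collapsed so a value cannot inject extra outputs.
function formatStepOutputs(outputs) {
  return Object.entries(outputs)
    .filter(([, value]) => typeof value === "string" && value !== "")
    .map(([name, value]) => `${name}=${value.replace(/[\r\n]+/g, " ")}\n`)
    .join("");
}

// Append the agent's stop reason and model to the step output file.
// Writes nothing when outputPath is unset (e.g. outside GitHub Actions).
function exportAgentRuntimeOutputs(runtimeMetrics, outputPath = process.env.GITHUB_OUTPUT) {
  const lines = formatStepOutputs({
    agent_stop_reason: runtimeMetrics.stopReason, // hypothetical output name
    agent_model: runtimeMetrics.resolvedModel, // hypothetical output name
  });
  if (outputPath && lines) fs.appendFileSync(outputPath, lines);
  return lines;
}
```

Empty values are skipped rather than written, so the conclusion job can tell "no stop reason was resolved" apart from a real value.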

</details>

<details>
<summary>Expected Outcome</summary>

After this change:

- **In Grafana / Sentry / Honeycomb / Datadog**: `gen_ai.response.finish_reasons` (and `gh-aw.agent.finish_reason`) become queryable on the `gh-aw.<job>.conclusion` span — enabling a *"truncation rate by workflow/engine"* panel and a length-limit alert that work even when the agent job's own spans are missing.
- **In the JSONL mirror** (`/tmp/gh-aw/otel.jsonl`): the conclusion-span entry now carries the finish reason, so post-hoc debugging without a live collector can distinguish truncation from a clean stop.
- **For on-call engineers**: "the agent's output looks cut off" becomes a one-query answer (filter `gh-aw.agent.finish_reason = length`) instead of a per-run log dig.
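
As an illustration of the truncation-rate query, here is a minimal sketch over already-parsed span records. It assumes each record carries an OTLP-JSON style `attributes` list (`{ key, value: { stringValue } }`); the actual shape of the `/tmp/gh-aw/otel.jsonl` entries is not described in this issue, and `getStringAttr` and `truncationRate` are hypothetical helpers.

```javascript
// Stop reasons treated as truncation (taken from the list in this issue).
const TRUNCATION_REASONS = new Set(["length", "max_tokens"]);

// Read a flat string attribute from an OTLP-JSON style attribute list.
function getStringAttr(span, key) {
  const attr = (span.attributes || []).find(a => a.key === key);
  return attr && attr.value && typeof attr.value.stringValue === "string" ? attr.value.stringValue : "";
}

// Fraction of spans carrying a finish reason whose reason is a truncation.
function truncationRate(spans) {
  const reasons = spans.map(s => getStringAttr(s, "gh-aw.agent.finish_reason")).filter(Boolean);
  if (reasons.length === 0) return 0;
  const truncated = reasons.filter(r => TRUNCATION_REASONS.has(r)).length;
  return truncated / reasons.length;
}

const spans = [
  { attributes: [{ key: "gh-aw.agent.finish_reason", value: { stringValue: "end_turn" } }] },
  { attributes: [{ key: "gh-aw.agent.finish_reason", value: { stringValue: "max_tokens" } }] },
  { attributes: [] }, // no finish reason: excluded from the denominator
];
console.log(truncationRate(spans)); // prints 0.5
```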

</details>

<details>
<summary>Implementation Steps</summary>

- [ ] In the agent job's conclusion path, export the resolved stop reason (and model) into the env that already propagates `GH_AW_AGENT_CONCLUSION` to the conclusion job (see `action_conclusion_otlp.cjs` header docs)
- [ ] In `actions/setup/js/send_otlp_span.cjs`, read `GH_AW_AGENT_STOP_REASON` and emit `gen_ai.response.finish_reasons` + `gh-aw.agent.finish_reason` on the conclusion span (reference the snippet above), keeping the existing `jobName === "agent"` emission
- [ ] Update `send_otlp_span.test.cjs` to assert the conclusion span carries `gen_ai.response.finish_reasons` / `gh-aw.agent.finish_reason` when the env/metric is set
- [ ] Recompile any affected workflow templates if the propagation touches generated YAML (`pkg/workflow/observability_otlp.go`)
- [ ] Run `make test-unit` (or `cd actions/setup/js && npx vitest run`) to confirm tests pass
- [ ] Run `make fmt`
- [ ] Open a PR referencing this issue
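
The precedence the new test should pin down can be sketched as a pure function. `resolveStopReason` is hypothetical and only mirrors the expression in the Proposed Change snippet; it is not an existing export of `send_otlp_span.cjs`.

```javascript
// Hypothetical pure version of the proposed precedence:
// local log metric, then propagated env value, then "timeout", else "".
function resolveStopReason({ localStopReason = "", env = {}, isAgentTimedOut = false }) {
  const propagated = env.GH_AW_AGENT_STOP_REASON || "";
  return localStopReason || propagated || (isAgentTimedOut ? "timeout" : "");
}

// Conclusion job: no local log, value arrives via env.
console.log(resolveStopReason({ env: { GH_AW_AGENT_STOP_REASON: "length" } })); // prints "length"
```

An empty result means the conclusion span should omit both attributes, matching the `if (effectiveStopReason)` guard in the proposal.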

</details>

<details>
<summary>Evidence from Live OTel Data (Sentry/Grafana)</summary>

Sampled 2026-05-31 ~10:03 UTC. The same trace appears in both backends: `trace_id = c1040eca6d9da732c4df4a4414a523e4`, root span `gh-aw.pre_activation.setup`, service `gh-aw.scout`.

**Grafana Tempo** (`grafanacloud-traces`) — resource attributes healthy, GenAI finish reason missing:

| Attribute | Present | Value |
|---|---|---|
| service.version | ✅ | 2.1.156 |
| github.repository | ✅ | github/gh-aw |
| github.event_name | ✅ | workflow_dispatch |
| github.run_id | ✅ | 26709587451 |
| deployment.environment | ✅ | production |
| `gen_ai.response.finish_reasons` | ❌ | **absent across the full Tempo span-tag index — no such tag is emitted at all** |

**Sentry** (org `github`, project `gh-aw`) — 19,396 spans/24h; transactions seen: `gh-aw.pre_activation.setup` (509), `gh-aw.activation.setup` (169), `gh-aw.conclusion.setup` (6), `gh-aw.pre_activation.conclusion` (1). Notably **no `gh-aw.agent.*` spans appeared in the sample**, and `gen_ai.response.finish_reasons` / `gen_ai.response.finish_reason` are null for all spans. (Sentry also strips the OTLP resource attrs — only `environment=production` and the `gh-aw.*` span namespace are queryable there, which is why the flat `gh-aw.agent.finish_reason` mirror is included in the proposed change.)

Cross-backend: the agent job's spans (which carry the GenAI layer) are under-represented/absent in live telemetry, confirming the gated finish_reason never lands.

</details>

<details>
<summary>Related Files</summary>

- `actions/setup/js/send_otlp_span.cjs` (lines 1601–1632 metrics source; 1911–1923 gated emission; 2117–2151 agent sub-span)
- `actions/setup/js/action_conclusion_otlp.cjs` (env propagation channel — `GH_AW_AGENT_CONCLUSION`)
- `actions/setup/js/send_otlp_span.test.cjs` (assertions)
- `pkg/workflow/observability_otlp.go` (compiled-workflow env wiring, if touched)
- `actions/setup/js/generate_observability_summary.cjs`

</details>

---

*Generated by the [Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/26709512205) workflow*

> Generated by [📊 Daily OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/26709512205) · opus48 3.2M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-otel-instrumentation-advisor%22&type=issues)
> - [x] expires  on Jun 7, 2026, 10:11 AM UTC
