OTel Instrumentation Improvement: surface agent finish_reason on the conclusion span
Analysis Date: 2026-05-31
Priority: High
Effort: Medium (2–4h)
Problem
The OpenTelemetry GenAI semantic layer (gen_ai.response.finish_reasons, gen_ai.response.model, and gen_ai.usage.*) is gated on jobName === "agent" in actions/setup/js/send_otlp_span.cjs: see the if (jobName === "agent") block at lines 1911–1923 and the dedicated agent sub-span at lines 2117–2151. It is therefore attached only to the agent job's conclusion span (gh-aw.agent.conclusion) and its dedicated gh-aw.agent.agent sub-span.
Live sampling of the last 24h shows these signals are not reaching either backend: gen_ai.response.finish_reasons appears on zero spans in the complete Grafana Tempo span-tag index and is likewise absent in Sentry. Because the agent job's spans are under-represented in the telemetry, the entire GenAI layer is effectively invisible in dashboards, even though the code comment at lines 1920–1921 explicitly intends finish_reasons to be "always present ... so length-truncation is always queryable in Sentry/dashboards."
Because finish_reasons is sourced from readAgentRuntimeMetrics(), which reads agent-stdio.log (lines 1601–1632), a file that lives in the agent job's workspace rather than the conclusion job's, it cannot be emitted from the always-present conclusion job span today. As a result, a DevOps engineer cannot use the OTel backends to answer "how often do agent runs stop because they hit the model's length/token limit vs. a content filter vs. a normal end-of-turn vs. an error?", because the span that carries that answer never lands.
Why This Matters (DevOps Perspective)
Finish/stop reason is the single highest-value GenAI observability field for an agentic-workflow platform. It distinguishes:
- max_tokens / length: the agent was truncated mid-thought (a silent quality failure that does not show up as a job failure)
- content_filter: a safety stop
- tool_use loops / end_turn: normal completion
- error: engine-side failure
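To make these categories concrete, here is a minimal sketch that buckets a raw stop reason into them. The helper name `classifyStopReason` and the category labels are hypothetical, and the raw strings are only the ones named in this issue; other engines may report different values, so anything unmapped (including the "timeout" fallback the current code emits) is reported as "other" rather than silently folded into a category.

```javascript
// Hypothetical mapping from raw engine stop reasons (as named in this issue)
// to the four categories above. Not part of gh-aw; illustration only.
const STOP_REASON_CATEGORIES = new Map([
  ["max_tokens", "truncated"],
  ["length", "truncated"],
  ["content_filter", "safety"],
  ["end_turn", "normal"],
  ["tool_use", "normal"],
  ["error", "error"],
]);

function classifyStopReason(stopReason) {
  // A missing or empty value means no finish reason was captured at all.
  if (typeof stopReason !== "string" || stopReason === "") return "unknown";
  // Unmapped values (e.g. a "timeout" fallback) are kept separate as "other".
  return STOP_REASON_CATEGORIES.get(stopReason.toLowerCase()) ?? "other";
}
```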
Unless this field lands in a backend, length-truncation is a silent failure: the job is green and the conclusion span says STATUS_CODE_OK, but the agent produced incomplete output. Surfacing finish_reason on a reliably exported span unblocks a Grafana/Sentry panel such as "% of agent runs truncated by token limit, by workflow/engine" and an alert on truncation-rate spikes. That directly reduces MTTR for "the agent's answers got worse" investigations, which today require pulling raw logs per run.
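The truncation-rate panel boils down to a per-workflow ratio. The sketch below computes it over already-exported records; the record shape (`{ workflow, finishReason }`) and the function name are hypothetical stand-ins for whatever the backend query returns, and treating both "max_tokens" and "length" as truncation is an assumption that follows the list above.

```javascript
// Stop reasons treated as "truncated by token limit" (assumption: some engines
// report "max_tokens", OpenAI-style engines report "length").
const TRUNCATION_REASONS = new Set(["max_tokens", "length"]);

// Hypothetical helper: fraction of runs per workflow that stopped on a token limit.
function truncationRateByWorkflow(records) {
  const totals = new Map(); // workflow -> { runs, truncated }
  for (const { workflow, finishReason } of records) {
    // Runs without a finish reason cannot be classified, so they are left out
    // of the denominator instead of being counted as "not truncated".
    if (!finishReason) continue;
    const entry = totals.get(workflow) ?? { runs: 0, truncated: 0 };
    entry.runs += 1;
    if (TRUNCATION_REASONS.has(finishReason)) entry.truncated += 1;
    totals.set(workflow, entry);
  }
  const rates = {};
  for (const [workflow, { runs, truncated }] of totals) {
    rates[workflow] = truncated / runs;
  }
  return rates;
}
```

Excluding runs with no finish reason keeps missing telemetry (today's situation) from diluting the rate toward zero.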
Current Behavior
The finish reason is computed and attached only inside the jobName === "agent" branch, on the agent job's own spans:
```javascript
// Current: actions/setup/js/send_otlp_span.cjs (lines 1911-1923)
if (jobName === "agent") {
  attributes.push(buildAttr("gen_ai.operation.name", "chat"));
  if (workflowName) attributes.push(buildAttr("gen_ai.workflow.name", workflowName));
  if (runtimeMetrics.resolvedModel) attributes.push(buildAttr("gen_ai.response.model", runtimeMetrics.resolvedModel));
  // ... fall back to "timeout"/"unknown" so finish_reasons is "always present"
  const effectiveStopReason = runtimeMetrics.stopReason || (isAgentTimedOut ? "timeout" : "unknown");
  attributes.push(buildArrayAttr("gen_ai.response.finish_reasons", [effectiveStopReason]));
}
```
The stop reason originates from agent-stdio.log, which lives in the agent job, not the conclusion job:
```javascript
// Current: actions/setup/js/send_otlp_span.cjs (lines 1623-1632)
if (parsed.type !== "result") { return; }
// ...
if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
  metrics.stopReason = parsed.stop_reason;
}
```
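For readers without the full file, here is a standalone sketch of the same extraction. It is not the actual readAgentRuntimeMetrics(): the function name `extractStopReason` is hypothetical, and it assumes agent-stdio.log is line-delimited JSON interleaved with plain-text output. Like the excerpt, it only trusts `type === "result"` records and a non-empty string `stop_reason`, with the last such record winning.

```javascript
// Hypothetical standalone version of the stop-reason extraction shown above.
function extractStopReason(logText) {
  let stopReason = "";
  for (const line of logText.split("\n")) {
    const trimmed = line.trim(); // also strips a trailing "\r"
    if (!trimmed.startsWith("{")) continue; // skip plain-text output
    let parsed;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // tolerate truncated or malformed JSON lines
    }
    if (parsed.type !== "result") continue;
    if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
      stopReason = parsed.stop_reason; // later result records overwrite earlier ones
    }
  }
  return stopReason;
}
```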
Proposed Change
Reuse the existing agent-to-conclusion env propagation channel (the same mechanism already used for GH_AW_AGENT_CONCLUSION, documented in the action_conclusion_otlp.cjs header) so that the finish reason rides along to the conclusion job and is emitted on the gh-aw.*.conclusion span, which reliably reaches Tempo and Sentry today.
- In the agent job's conclusion step, export the resolved stop reason (and model) to $GITHUB_OUTPUT / the job-result env that already carries GH_AW_AGENT_CONCLUSION.
- In sendJobConclusionSpan, read that env and emit the GenAI attributes on the conclusion span regardless of jobName, preferring runtimeMetrics.stopReason when agent-stdio.log is locally available and falling back to the propagated value otherwise:
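For the export step, GitHub Actions lets a step set an output by appending a `name=value` line to the file whose path is in `$GITHUB_OUTPUT`. To reach another job, the producing job must also declare it under `jobs.<job_id>.outputs`, and the consuming job reads `needs.<job_id>.outputs.<name>` (for example, mapping it into `GH_AW_AGENT_STOP_REASON` via `env:`). The sketch below assumes that route; the helper name and the output name `stop_reason` are hypothetical, and whether the existing GH_AW_AGENT_CONCLUSION channel works exactly this way is not confirmed here.

```javascript
const fs = require("node:fs");

// Hypothetical helper for the agent job's conclusion step: write the resolved
// stop reason as a step output named "stop_reason" (name is an assumption).
function exportStopReason(stopReason, outputPath = process.env.GITHUB_OUTPUT) {
  if (!outputPath) return false; // not running inside a GitHub Actions step
  // The simple `name=value` form is single-line, so collapse any newlines.
  const value = String(stopReason || "").replace(/[\r\n]+/g, " ").trim();
  if (!value) return false; // nothing to propagate
  fs.appendFileSync(outputPath, `stop_reason=${value}\n`);
  return true;
}
```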
```javascript
// Proposed addition to actions/setup/js/send_otlp_span.cjs (inside sendJobConclusionSpan).
// Emit finish_reason on the conclusion span too: prefer the locally parsed
// agent-stdio.log value, else the value propagated from the agent job via env
// (GH_AW_AGENT_STOP_REASON), else "timeout" if the agent timed out.
const propagatedStopReason = process.env.GH_AW_AGENT_STOP_REASON || "";
const conclusionStopReason =
  runtimeMetrics.stopReason || propagatedStopReason || (isAgentTimedOut ? "timeout" : "");
if (conclusionStopReason) {
  // Standard OTel GenAI attribute for dashboards. The jobName === "agent" branch
  // already pushes this key onto the same attributes array, so skip it there
  // to avoid emitting a duplicate attribute.
  if (jobName !== "agent") {
    attributes.push(buildArrayAttr("gen_ai.response.finish_reasons", [conclusionStopReason]));
  }
  // ...and a flat gh-aw.* mirror so it is queryable in Sentry, which does not
  // promote OTLP resource attrs and indexes the gh-aw.* span namespace.
  attributes.push(buildAttr("gh-aw.agent.finish_reason", conclusionStopReason));
}
```
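The same decision logic can be pulled into a pure function so it is easy to unit-test in isolation. In this sketch, `buildAttr`/`buildArrayAttr` are hypothetical stand-ins that assume the real helpers produce standard OTLP/JSON `KeyValue` objects (`{ key, value: { stringValue } }` and `{ arrayValue: { values } }`); the function name and its parameter object are also illustrative only. It skips the `gen_ai.*` key on the agent job, whose own branch already emits it.

```javascript
// Hypothetical stand-ins for buildAttr/buildArrayAttr (OTLP/JSON KeyValue, strings only).
const buildAttr = (key, value) => ({ key, value: { stringValue: String(value) } });
const buildArrayAttr = (key, values) => ({
  key,
  value: { arrayValue: { values: values.map((v) => ({ stringValue: String(v) })) } },
});

// Hypothetical pure version of the proposed conclusion-span logic.
function conclusionFinishReasonAttrs({ jobName, runtimeStopReason, env, isAgentTimedOut }) {
  const propagated = (env && env.GH_AW_AGENT_STOP_REASON) || "";
  // Precedence: local agent-stdio.log value, then propagated env, then timeout.
  const stopReason = runtimeStopReason || propagated || (isAgentTimedOut ? "timeout" : "");
  const attrs = [];
  if (!stopReason) return attrs; // emit nothing rather than a placeholder
  if (jobName !== "agent") {
    attrs.push(buildArrayAttr("gen_ai.response.finish_reasons", [stopReason]));
  }
  attrs.push(buildAttr("gh-aw.agent.finish_reason", stopReason));
  return attrs;
}
```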
Keep the existing agent-span emission intact; this change makes the signal additionally present on the conclusion span that actually lands.
Expected Outcome
After this change:
- In Grafana / Sentry / Honeycomb / Datadog: gen_ai.response.finish_reasons (and gh-aw.agent.finish_reason) become queryable on the gh-aw.<job>.conclusion span, enabling a "truncation rate by workflow/engine" panel and a length-limit alert that work even when the agent job's own spans are missing.
- In the JSONL mirror (/tmp/gh-aw/otel.jsonl): the conclusion-span entry now carries the finish reason, so post-hoc debugging without a live collector can distinguish truncation from a clean stop.
- For on-call engineers: "the agent's output looks cut off" becomes a one-query answer (filter gh-aw.agent.finish_reason = length, or max_tokens for engines that report it that way) instead of a per-run log dig.
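For the post-hoc JSONL case, a small reader might look like the sketch below. It assumes, without confirmation from this issue, that each line of the mirror is an OTLP/JSON trace export (`resourceSpans` → `scopeSpans` → `spans`, with `attributes` as `KeyValue` objects); the function name is hypothetical, and only spans whose names end in `.conclusion` are inspected.

```javascript
// Hypothetical reader for the JSONL mirror; the line format is an assumption.
function finishReasonsFromJsonl(text) {
  const results = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    let payload;
    try {
      payload = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    for (const rs of payload?.resourceSpans ?? []) {
      for (const ss of rs.scopeSpans ?? []) {
        for (const span of ss.spans ?? []) {
          if (!span.name?.endsWith(".conclusion")) continue;
          const attr = (span.attributes ?? []).find((a) => a.key === "gh-aw.agent.finish_reason");
          if (attr) results.push({ span: span.name, finishReason: attr.value?.stringValue ?? "" });
        }
      }
    }
  }
  return results;
}

// Usage against the mirror path named in this issue:
// const text = require("node:fs").readFileSync("/tmp/gh-aw/otel.jsonl", "utf8");
// console.log(finishReasonsFromJsonl(text));
```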
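Implementation Steps
- In the agent job, export the stop reason as GH_AW_AGENT_STOP_REASON through the same channel that carries GH_AW_AGENT_CONCLUSION to the conclusion job (see the action_conclusion_otlp.cjs header docs)
- In actions/setup/js/send_otlp_span.cjs, read GH_AW_AGENT_STOP_REASON and emit gen_ai.response.finish_reasons + gh-aw.agent.finish_reason on the conclusion span (reference the snippet above), keeping the existing jobName === "agent" emission
- Update send_otlp_span.test.cjs to assert the conclusion span carries gen_ai.response.finish_reasons / gh-aw.agent.finish_reason when the env/metric is set
- If needed, wire the new env through the compiled workflow (pkg/workflow/observability_otlp.go)
- Run make test-unit (or cd actions/setup/js && npx vitest run) to confirm tests pass
- Run make fmt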
Evidence from Live OTel Data (Sentry/Grafana)
Sampled 2026-05-31 ~10:03 UTC. The same trace appears in both backends: trace_id = c1040eca6d9da732c4df4a4414a523e4, root span gh-aw.pre_activation.setup, service gh-aw.scout.
Grafana Tempo (grafanacloud-traces): resource attributes are healthy, but the GenAI finish reason is missing:
| Attribute | Present | Value |
|---|---|---|
| service.version | yes | 2.1.156 |
| github.repository | yes | github/gh-aw |
| github.event_name | yes | workflow_dispatch |
| github.run_id | yes | 26709587451 |
| deployment.environment | yes | production |
| gen_ai.response.finish_reasons | no | absent across the full Tempo span-tag index; no such tag is emitted at all |
Sentry (org github, project gh-aw): 19,396 spans/24h; transactions seen: gh-aw.pre_activation.setup (509), gh-aw.activation.setup (169), gh-aw.conclusion.setup (6), gh-aw.pre_activation.conclusion (1). Notably, no gh-aw.agent.* spans appeared in the sample, and gen_ai.response.finish_reasons / gen_ai.response.finish_reason are null for all spans. (Sentry also strips the OTLP resource attrs: only environment=production and the gh-aw.* span namespace are queryable there, which is why the proposed change includes the flat gh-aw.agent.finish_reason mirror.)
Cross-backend: the agent job's spans (which carry the GenAI layer) are under-represented/absent in live telemetry, confirming the gated finish_reason never lands.
Related Files
- actions/setup/js/send_otlp_span.cjs (lines 1601–1632 metrics source; 1911–1923 gated emission; 2117–2151 agent sub-span)
- actions/setup/js/action_conclusion_otlp.cjs (env propagation channel for GH_AW_AGENT_CONCLUSION)
- actions/setup/js/send_otlp_span.test.cjs (assertions)
- pkg/workflow/observability_otlp.go (compiled-workflow env wiring, if touched)
- actions/setup/js/generate_observability_summary.cjs
Generated by the Daily OTel Instrumentation Advisor workflow