[otel-advisor] OTel improvement: surface agent finish_reason (gen_ai.response.finish_reasons) on the reliably-exported conclusion span #36101

@github-actions

📡 OTel Instrumentation Improvement: surface agent finish_reason on the conclusion span

Analysis Date: 2026-05-31
Priority: High
Effort: Medium (2–4h)

Problem

The OpenTelemetry GenAI semantic layer (gen_ai.response.finish_reasons, gen_ai.response.model, and gen_ai.usage.*) is gated on jobName === "agent" in actions/setup/js/send_otlp_span.cjs (the if (jobName === "agent") block at lines 1911–1923, and the dedicated agent sub-span at lines 2117–2151). It is therefore attached only to the agent job's conclusion span (gh-aw.agent.conclusion) and its dedicated gh-aw.agent.agent sub-span.

Live sampling of the last 24h confirms these signals are not reaching either backend: gen_ai.response.finish_reasons appears on zero spans when checked against the complete Grafana Tempo span-tag index, and is likewise absent in Sentry. The agent job's spans are under-represented in the telemetry, so the entire GenAI layer is effectively invisible in dashboards, even though the code comment at lines 1920–1921 explicitly intends finish_reasons to be "always present ... so length-truncation is always queryable in Sentry/dashboards."

Because finish_reasons is sourced from readAgentRuntimeMetrics(), which reads agent-stdio.log (lines 1601–1632), a file that exists only in the agent job's workspace, it cannot be emitted from the always-present conclusion job span today. The result: a DevOps engineer cannot answer "how often do agent runs stop because they hit the model's length/token limit vs. a content filter vs. a normal end-of-turn vs. an error?" from the OTel backends, because the span that carries that answer never lands.

Why This Matters (DevOps Perspective)

Finish/stop reason is the single highest-value GenAI observability field for an agentic-workflow platform. It distinguishes:

  • max_tokens / length: the agent was truncated mid-thought (a silent quality failure that does not show up as a job failure)
  • content_filter: a safety stop
  • tool_use loops / end_turn: normal completion
  • error: engine-side failure

Without it landing in a backend, length-truncation is a silent failure: the job is green, the conclusion span says STATUS_CODE_OK, but the agent produced incomplete output. Surfacing finish_reason on a reliably-exported span unblocks a Grafana/Sentry panel like "% of agent runs truncated by token limit, by workflow/engine" and an alert on truncation-rate spikes, directly reducing MTTR for "the agent's answers got worse" investigations that today require pulling raw logs per run.
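
To make the distinction concrete, here is a minimal sketch of how a panel or alert rule might bucket raw finish reasons and compute a truncation rate. The bucket names and the classifyFinishReason / truncationRate helpers are illustrative assumptions, not part of gh-aw, and engines may report strings beyond the ones listed above.

```javascript
// Hypothetical mapping from raw stop reasons to coarse buckets.
// The raw values mirror the list above plus the "timeout" fallback;
// the bucket names are assumptions made for this sketch.
const FINISH_REASON_BUCKETS = new Map([
  ["max_tokens", "truncated"],
  ["length", "truncated"],
  ["content_filter", "filtered"],
  ["end_turn", "normal"],
  ["tool_use", "normal"],
  ["error", "error"],
  ["timeout", "timeout"],
]);

// Returns the bucket for a raw finish reason, or "unknown" if unmapped.
function classifyFinishReason(reason) {
  return FINISH_REASON_BUCKETS.get(reason) ?? "unknown";
}

// Fraction of runs whose finish reason falls in the "truncated" bucket.
function truncationRate(reasons) {
  if (reasons.length === 0) return 0;
  const truncated = reasons.filter(r => classifyFinishReason(r) === "truncated").length;
  return truncated / reasons.length;
}

// Two of these four runs hit a length limit.
console.log(truncationRate(["end_turn", "length", "max_tokens", "end_turn"])); // 0.5
```

In a real backend this grouping would more likely be expressed as a query over the span attribute, with the same buckets encoded in the query rather than in application code.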

Current Behavior

The finish reason is computed and attached only inside the jobName === "agent" branch, on the agent job's own spans:

// Current: actions/setup/js/send_otlp_span.cjs (lines 1911-1923)
if (jobName === "agent") {
  attributes.push(buildAttr("gen_ai.operation.name", "chat"));
  if (workflowName) attributes.push(buildAttr("gen_ai.workflow.name", workflowName));
  if (runtimeMetrics.resolvedModel) attributes.push(buildAttr("gen_ai.response.model", runtimeMetrics.resolvedModel));
  // ... fall back to "timeout"/"unknown" so finish_reasons is "always present"
  const effectiveStopReason = runtimeMetrics.stopReason || (isAgentTimedOut ? "timeout" : "unknown");
  attributes.push(buildArrayAttr("gen_ai.response.finish_reasons", [effectiveStopReason]));
}

The stop reason originates from agent-stdio.log, which lives in the agent job, not the conclusion job:

// Current: actions/setup/js/send_otlp_span.cjs (lines 1623-1632)
if (parsed.type !== "result") { return; }
// ...
if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
  metrics.stopReason = parsed.stop_reason;
}
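
For readers without the file open, the excerpt above can be approximated as a self-contained function. This is a sketch under assumptions: it presumes agent-stdio.log mixes plain text with one JSON object per line and that the last "result" entry wins; extractStopReason is a hypothetical name, and the real readAgentRuntimeMetrics() may differ in both respects.

```javascript
// Sketch: scan log text for "result" entries and return the last
// non-empty stop_reason found, or "" when none is present.
function extractStopReason(logText) {
  let stopReason = "";
  for (const line of logText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue; // skip plain-text log lines
    let parsed;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that look like JSON but are malformed
    }
    if (parsed.type !== "result") continue;
    if (typeof parsed.stop_reason === "string" && parsed.stop_reason) {
      stopReason = parsed.stop_reason;
    }
  }
  return stopReason;
}

const sampleLog = [
  "starting agent",
  '{"type":"assistant","stop_reason":"tool_use"}',
  '{"type":"result","stop_reason":"max_tokens"}',
  "{not json",
].join("\n");
console.log(extractStopReason(sampleLog)); // max_tokens
```
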
Proposed Change

Reuse the existing agent→conclusion env propagation channel (the same mechanism already used for GH_AW_AGENT_CONCLUSION, documented in the action_conclusion_otlp.cjs header) so the finish reason rides along to the conclusion job and is emitted on the gh-aw.*.conclusion span, which reliably reaches Tempo and Sentry today.

  1. In the agent job's conclusion step, export the resolved stop reason (and model) to $GITHUB_OUTPUT / the job-result env that already carries GH_AW_AGENT_CONCLUSION.
  2. In sendJobConclusionSpan, read that env and emit the GenAI attributes on the conclusion span regardless of jobName, preferring runtimeMetrics.stopReason when the file is locally available and using the propagated value otherwise:
// Proposed addition to actions/setup/js/send_otlp_span.cjs
// Emit finish_reason on the conclusion span too, sourced from the agent job
// via env (GH_AW_AGENT_STOP_REASON) when agent-stdio.log is not local.
const propagatedStopReason = process.env.GH_AW_AGENT_STOP_REASON || "";
const effectiveStopReason =
  runtimeMetrics.stopReason || propagatedStopReason || (isAgentTimedOut ? "timeout" : "");
if (effectiveStopReason) {
  // Standard OTel GenAI attribute for dashboards...
  attributes.push(buildArrayAttr("gen_ai.response.finish_reasons", [effectiveStopReason]));
  // ...and a flat gh-aw.* mirror so it is queryable in Sentry, which does not
  // promote OTLP resource attrs and indexes the gh-aw.* span namespace.
  attributes.push(buildAttr("gh-aw.agent.finish_reason", effectiveStopReason));
}

Keep the existing agent-span emission intact; this change makes the signal additionally present on the conclusion span that actually lands.
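
Step 1 is not shown in the snippet above. A minimal sketch of the export side follows, assuming the value is written as a step output through the file named by the standard GITHUB_OUTPUT environment variable. The exportStopReason helper and the output names agent_stop_reason and agent_model are hypothetical, and the wiring that turns the output into GH_AW_AGENT_STOP_REASON in the conclusion job is assumed to live in the compiled workflow YAML.

```javascript
const fs = require("node:fs");

// Hypothetical helper: append name=value lines to the GITHUB_OUTPUT file.
// Newlines are collapsed so a value cannot break the single-line format.
function exportStopReason(stopReason, model, outputPath = process.env.GITHUB_OUTPUT) {
  if (!outputPath) return false; // not running inside a GitHub Actions step
  const clean = value => String(value || "").replace(/[\r\n]+/g, " ").trim();
  const lines = [];
  const reason = clean(stopReason);
  if (reason) lines.push(`agent_stop_reason=${reason}`);
  const resolvedModel = clean(model);
  if (resolvedModel) lines.push(`agent_model=${resolvedModel}`);
  if (lines.length === 0) return false; // nothing worth exporting
  fs.appendFileSync(outputPath, lines.join("\n") + "\n");
  return true;
}
```

Returning false instead of throwing keeps telemetry export best-effort, so a missing output file never fails the agent job itself.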

Expected Outcome

After this change:

  • In Grafana / Sentry / Honeycomb / Datadog: gen_ai.response.finish_reasons (and gh-aw.agent.finish_reason) become queryable on the gh-aw.<job>.conclusion span, enabling a "truncation rate by workflow/engine" panel and a length-limit alert that work even when the agent job's own spans are missing.
  • In the JSONL mirror (/tmp/gh-aw/otel.jsonl): the conclusion-span entry now carries the finish reason, so post-hoc debugging without a live collector can distinguish truncation from a clean stop.
  • For on-call engineers: "the agent's output looks cut off" becomes a one-query answer (filter gh-aw.agent.finish_reason = length) instead of a per-run log dig.
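
As an illustration of the post-hoc debugging case, the sketch below filters a JSONL mirror for spans with a given finish reason. The file layout is an assumption: it presumes each line is an OTLP/JSON payload (resourceSpans → scopeSpans → spans), which this issue does not confirm for /tmp/gh-aw/otel.jsonl. spansWithFinishReason is a hypothetical helper.

```javascript
// Sketch: return the names of spans whose gh-aw.agent.finish_reason
// attribute equals `reason`. Assumes one OTLP/JSON payload per line.
function spansWithFinishReason(jsonlText, reason) {
  const names = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let payload;
    try {
      payload = JSON.parse(line);
    } catch {
      continue; // ignore lines that are not valid JSON
    }
    for (const resourceSpan of payload?.resourceSpans ?? []) {
      for (const scopeSpan of resourceSpan.scopeSpans ?? []) {
        for (const span of scopeSpan.spans ?? []) {
          const hit = (span.attributes ?? []).some(
            attr => attr.key === "gh-aw.agent.finish_reason" && attr.value?.stringValue === reason
          );
          if (hit) names.push(span.name);
        }
      }
    }
  }
  return names;
}
```

Calling spansWithFinishReason(text, "length") on the mirror's contents would then list the conclusion spans of truncated runs, under the layout assumption stated above.
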
Implementation Steps
  • In the agent job's conclusion path, export the resolved stop reason (and model) into the env that already propagates GH_AW_AGENT_CONCLUSION to the conclusion job (see action_conclusion_otlp.cjs header docs)
  • In actions/setup/js/send_otlp_span.cjs, read GH_AW_AGENT_STOP_REASON and emit gen_ai.response.finish_reasons + gh-aw.agent.finish_reason on the conclusion span (reference the snippet above), keeping the existing jobName === "agent" emission
  • Update send_otlp_span.test.cjs to assert the conclusion span carries gen_ai.response.finish_reasons / gh-aw.agent.finish_reason when the env/metric is set
  • Recompile any affected workflow templates if the propagation touches generated YAML (pkg/workflow/observability_otlp.go)
  • Run make test-unit (or cd actions/setup/js && npx vitest run) to confirm tests pass
  • Run make fmt
  • Open a PR referencing this issue
Evidence from Live OTel Data (Sentry/Grafana)

Sampled 2026-05-31 ~10:03 UTC. The same trace appears in both backends: trace_id = c1040eca6d9da732c4df4a4414a523e4, root span gh-aw.pre_activation.setup, service gh-aw.scout.

Grafana Tempo (grafanacloud-traces). Resource attributes are healthy; the GenAI finish reason is missing:

| Attribute | Present | Value |
| --- | --- | --- |
| service.version | ✅ | 2.1.156 |
| github.repository | ✅ | github/gh-aw |
| github.event_name | ✅ | workflow_dispatch |
| github.run_id | ✅ | 26709587451 |
| deployment.environment | ✅ | production |
| gen_ai.response.finish_reasons | ❌ | absent across the full Tempo span-tag index; no such tag is emitted at all |

Sentry (org github, project gh-aw): 19,396 spans/24h; transactions seen: gh-aw.pre_activation.setup (509), gh-aw.activation.setup (169), gh-aw.conclusion.setup (6), gh-aw.pre_activation.conclusion (1). Notably no gh-aw.agent.* spans appeared in the sample, and gen_ai.response.finish_reasons / gen_ai.response.finish_reason are null for all spans. (Sentry also strips the OTLP resource attrs; only environment=production and the gh-aw.* span namespace are queryable there, which is why the flat gh-aw.agent.finish_reason mirror is included in the proposed change.)

Cross-backend: the agent job's spans (which carry the GenAI layer) are under-represented/absent in live telemetry, confirming the gated finish_reason never lands.

Related Files
  • actions/setup/js/send_otlp_span.cjs (lines 1601–1632 metrics source; 1911–1923 gated emission; 2117–2151 agent sub-span)
  • actions/setup/js/action_conclusion_otlp.cjs (env propagation channel: GH_AW_AGENT_CONCLUSION)
  • actions/setup/js/send_otlp_span.test.cjs (assertions)
  • pkg/workflow/observability_otlp.go (compiled-workflow env wiring, if touched)
  • actions/setup/js/generate_observability_summary.cjs

Generated by the Daily OTel Instrumentation Advisor workflow

  • expires on Jun 7, 2026, 10:11 AM UTC
