[otel-advisor] OTel improvement: add deployment.environment and gh-aw.staged to conclusion spans #24702

@github-actions

📡 OTel Instrumentation Improvement: Surface staged (dry-run) flag in OTLP spans

Analysis Date: 2026-04-05
Priority: High
Effort: Small (< 2h)

Problem

The staged flag (dry-run mode) is read from aw_info.json and shown in the GitHub Actions job summary, but it is never propagated to OTLP spans. Every other awInfo field used in the job summary (workflow_name, engine_id, model) is mirrored into span attributes by sendJobConclusionSpan; staged is the sole omission.

As a result, OTLP backends (Grafana, Honeycomb, Datadog, Sentry) cannot distinguish staged (dry-run) workflow executions from real production runs. A DevOps engineer cannot answer: "Is this alert firing because a production workflow failed, or because a dry-run was tested?"

Why This Matters (DevOps Perspective)

  • Alert fatigue: staged runs that "fail" trigger the same error-status span as production failures. On-call engineers investigate dry-run noise.
  • Polluted metrics: dashboard panels showing failure rate, token consumption, or agent conclusions mix staging and production data with no way to separate them.
  • No environment filter: deployment.environment is a resource attribute defined in the OTel semantic conventions (recent versions of the conventions rename it to deployment.environment.name). Its absence means backends cannot split dashboards by environment, exclude staging traffic from SLO calculations, or create environment-scoped alerts.
  • MTTR impact: without this attribute, triaging a page requires opening the GitHub run to determine whether it was a staged run, which is an entirely avoidable manual step.

Current Behavior

sendJobConclusionSpan in actions/setup/js/send_otlp_span.cjs reads many fields from awInfo but drops staged:

// Current: actions/setup/js/send_otlp_span.cjs (lines 543–550)
const workflowName = awInfo.workflow_name || "";
const engineId = awInfo.engine_id || "";
const model = awInfo.model || "";
const jobName = process.env.INPUT_JOB_NAME || "";
const runId = process.env.GITHUB_RUN_ID || "";
const runAttempt = awInfo.run_attempt || process.env.GITHUB_RUN_ATTEMPT || "1";
const actor = process.env.GITHUB_ACTOR || "";
const repository = process.env.GITHUB_REPOSITORY || "";
// ⚠️  awInfo.staged is never read here — gap starts here

The observability summary does expose staged correctly (confirming the data exists):

// actions/setup/js/generate_observability_summary.cjs (line 69)
staged: awInfo.staged === true,

But the span attributes block (lines 576–610) has no corresponding entry for staged, and the resourceAttributes array (lines 612–619) never sets deployment.environment.

Proposed Change

// Proposed addition to actions/setup/js/send_otlp_span.cjs
// In sendJobConclusionSpan, after reading other awInfo fields (~line 550):
const staged = awInfo.staged === true;

// Add to span attributes (after existing attribute pushes, ~line 590):
attributes.push(buildAttr("gh-aw.staged", staged));

// Add to resourceAttributes (after existing entries, ~line 618):
resourceAttributes.push(buildAttr("deployment.environment", staged ? "staging" : "production"));
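
The derivation above can be tried in isolation with the following minimal sketch. Note that buildAttr here is a hypothetical stand-in written for illustration (the real helper in send_otlp_span.cjs may encode values differently), and stagedAttributes is an illustrative name that does not exist in the repository.

```javascript
// Hypothetical stand-in for the repository's buildAttr helper.
// Assumption: attributes use the OTLP/JSON key/value shape.
function buildAttr(key, value) {
  if (typeof value === "boolean") {
    return { key, value: { boolValue: value } };
  }
  return { key, value: { stringValue: String(value) } };
}

// Illustrative helper (not in the repo): derive both proposed attributes
// from the parsed contents of aw_info.json.
function stagedAttributes(awInfo) {
  // Strict comparison: only a literal boolean true counts as staged,
  // so a missing field or the string "true" is treated as production.
  const staged = awInfo.staged === true;
  return {
    spanAttr: buildAttr("gh-aw.staged", staged),
    resourceAttr: buildAttr("deployment.environment", staged ? "staging" : "production"),
  };
}

console.log(JSON.stringify(stagedAttributes({ staged: true })));
console.log(JSON.stringify(stagedAttributes({})));
```

The strict `=== true` check mirrors the one in generate_observability_summary.cjs, so the span and the job summary should agree on what counts as a staged run.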

Expected Outcome

After this change:

  • In Grafana / Honeycomb / Datadog: filter panels by deployment.environment = production to exclude dry-run noise. Create environment-specific SLO rules. Add gh-aw.staged as a dashboard variable to toggle staged runs on and off.
  • In the JSONL mirror (/tmp/gh-aw/otel.jsonl): every span line will include "key":"gh-aw.staged" and "key":"deployment.environment", so local debugging can filter on them with jq.
  • For on-call engineers: when an alert fires, the span detail in the backend immediately shows whether it was a production or staging run — no need to open the GitHub Actions UI.
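
As a rough illustration of the local filtering described above, the sketch below keeps only production lines from the JSONL mirror. It assumes each line is a single OTLP/JSON export payload with a top-level resourceSpans array; the actual line layout of /tmp/gh-aw/otel.jsonl is not shown in this issue, so treat that shape and the function names as assumptions.

```javascript
// Assumed line shape (not confirmed by the issue):
// { "resourceSpans": [ { "resource": { "attributes": [ {key, value} ] }, ... } ] }
function resourceEnvironment(line) {
  const payload = JSON.parse(line);
  for (const rs of payload.resourceSpans || []) {
    const attrs = (rs.resource && rs.resource.attributes) || [];
    const env = attrs.find((a) => a.key === "deployment.environment");
    if (env && env.value) return env.value.stringValue;
  }
  return undefined; // attribute absent, e.g. spans emitted before this change
}

// Keep only the lines whose resource is tagged as production.
function productionLines(jsonlText) {
  return jsonlText
    .split("\n")
    .filter((line) => line.trim() !== "")
    .filter((line) => resourceEnvironment(line) === "production");
}
```

A caller would pass the file contents, for example `productionLines(require("node:fs").readFileSync("/tmp/gh-aw/otel.jsonl", "utf8"))`. Lines written before the attribute existed return undefined and are therefore excluded, which may or may not be the desired default.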

Implementation Steps

  • In actions/setup/js/send_otlp_span.cjs, inside sendJobConclusionSpan, read awInfo.staged and push gh-aw.staged to attributes and deployment.environment to resourceAttributes
  • Update actions/setup/js/send_otlp_span.test.cjs to assert both gh-aw.staged (boolean) and deployment.environment appear in the conclusion span for both staged=true and staged=false cases
  • Run cd actions/setup/js && npx vitest run (or make test-unit) to confirm tests pass
  • Run make fmt to ensure formatting
  • Open a PR referencing this issue

Evidence from Live Sentry Data

A Sentry MCP tool was not available in this workflow run, so live span payloads could not be queried directly. The gap is confirmed statically:

  1. generate_observability_summary.cjs:69 reads awInfo.staged and surfaces it in the job summary.
  2. send_otlp_span.cjs:543–619 reads all other awInfo fields into span attributes but contains no reference to staged or deployment.environment (confirmed via grep across all .cjs files in actions/setup/js/).
  3. The resourceAttributes array (lines 612–619) sets github.repository, github.run_id, github.actions.run_url, and github.event_name — but not deployment.environment.

Related Files

  • actions/setup/js/send_otlp_span.cjs — primary change site (sendJobConclusionSpan)
  • actions/setup/js/send_otlp_span.test.cjs — test assertions to add
  • actions/setup/js/generate_observability_summary.cjs — reference implementation showing staged is available

Generated by the Daily OTel Instrumentation Advisor workflow

  • expires on Apr 12, 2026, 9:50 AM UTC
