[otel-advisor] OTel improvement: add deployment.environment and gh-aw.staged to conclusion spans #24702
Description
📡 OTel Instrumentation Improvement: Surface staged (dry-run) flag in OTLP spans
Analysis Date: 2026-04-05
Priority: High
Effort: Small (< 2h)
Problem
The staged flag (dry-run mode) is read from aw_info.json and shown in the GitHub Actions job summary, but it is never propagated to OTLP spans. Every other awInfo field used in the job summary (workflow_name, engine_id, model) is mirrored into span attributes by sendJobConclusionSpan — staged is the sole omission.
As a result, OTLP backends (Grafana, Honeycomb, Datadog, Sentry) cannot distinguish staged (dry-run) workflow executions from real production runs. A DevOps engineer cannot answer: "Is this alert firing because a production workflow failed, or because a dry-run was tested?"
Why This Matters (DevOps Perspective)
- Alert fatigue: staged runs that "fail" trigger the same error-status span as production failures. On-call engineers investigate dry-run noise.
- Polluted metrics: dashboard panels showing failure rate, token consumption, or agent conclusions mix staging and production data with no way to separate them.
- No environment filter: deployment.environment is an officially recommended OTel resource attribute (semantic conventions). Its absence means backends cannot split dashboards by environment, exclude staging traffic from SLO calculations, or create environment-scoped alerts.
- MTTR impact: without this attribute, triaging a page requires opening the GitHub run to determine whether it was a staged run, an entirely avoidable manual step.
Current Behavior
sendJobConclusionSpan in actions/setup/js/send_otlp_span.cjs reads many fields from awInfo but drops staged:
// Current: actions/setup/js/send_otlp_span.cjs (lines 543–550)
const workflowName = awInfo.workflow_name || "";
const engineId = awInfo.engine_id || "";
const model = awInfo.model || "";
const jobName = process.env.INPUT_JOB_NAME || "";
const runId = process.env.GITHUB_RUN_ID || "";
const runAttempt = awInfo.run_attempt || process.env.GITHUB_RUN_ATTEMPT || "1";
const actor = process.env.GITHUB_ACTOR || "";
const repository = process.env.GITHUB_REPOSITORY || "";
// ⚠️ awInfo.staged is never read here; the gap starts here
The observability summary does expose staged correctly (confirming the data exists):
// actions/setup/js/generate_observability_summary.cjs (line 69)
staged: awInfo.staged === true,
But the span attributes block (lines 576–610) has no corresponding entry for staged, and the resourceAttributes array (lines 612–619) never sets deployment.environment.
Proposed Change
// Proposed addition to actions/setup/js/send_otlp_span.cjs
// In sendJobConclusionSpan, after reading other awInfo fields (~line 550):
const staged = awInfo.staged === true;
// Add to span attributes (after existing attribute pushes, ~line 590):
attributes.push(buildAttr("gh-aw.staged", staged));
// Add to resourceAttributes (after existing entries, ~line 618):
resourceAttributes.push(buildAttr("deployment.environment", staged ? "staging" : "production"));
Expected Outcome
After this change:
- In Grafana / Honeycomb / Datadog: filter panels by deployment.environment = production to exclude dry-run noise. Create environment-specific SLO rules. Add gh-aw.staged as a dashboard variable for toggling.
- In the JSONL mirror (/tmp/gh-aw/otel.jsonl): every span line will include "key":"gh-aw.staged" and "key":"deployment.environment", making local debugging trivially filterable with jq.
- For on-call engineers: when an alert fires, the span detail in the backend immediately shows whether it was a production or staging run, with no need to open the GitHub Actions UI.
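As a rough illustration of the local filtering described above, the sketch below scans JSONL text and keeps only production spans. It assumes each mirror line is a single OTLP/JSON export payload (resourceSpans → scopeSpans → spans); the issue does not confirm that shape, and the helper names (attrValue, summarizeLine, productionSpans) and sample span names are hypothetical.

```javascript
// Sketch only: assumes one OTLP/JSON export payload per JSONL line.
// All helper names and sample span names are hypothetical, not from the repo.

// Look up an attribute by key and unwrap its boolean or string value.
function attrValue(attrs, key) {
  const hit = (attrs || []).find(a => a.key === key);
  if (!hit || !hit.value) return undefined;
  if ("boolValue" in hit.value) return hit.value.boolValue;
  if ("stringValue" in hit.value) return hit.value.stringValue;
  return undefined;
}

// Flatten one JSONL line into { name, environment, staged } records.
function summarizeLine(line) {
  const payload = JSON.parse(line);
  const out = [];
  for (const rs of payload.resourceSpans || []) {
    const resourceAttrs = rs.resource ? rs.resource.attributes : [];
    const environment = attrValue(resourceAttrs, "deployment.environment");
    for (const ss of rs.scopeSpans || []) {
      for (const span of ss.spans || []) {
        out.push({
          name: span.name,
          environment,
          staged: attrValue(span.attributes, "gh-aw.staged"),
        });
      }
    }
  }
  return out;
}

// Keep only spans whose resource is tagged as production.
function productionSpans(jsonlText) {
  return jsonlText
    .split("\n")
    .filter(line => line.trim() !== "")
    .flatMap(summarizeLine)
    .filter(s => s.environment === "production");
}

// Build a payload in the assumed shape for demonstration purposes.
function samplePayload(name, staged) {
  const environment = staged ? "staging" : "production";
  return JSON.stringify({
    resourceSpans: [{
      resource: { attributes: [{ key: "deployment.environment", value: { stringValue: environment } }] },
      scopeSpans: [{ spans: [{ name, attributes: [{ key: "gh-aw.staged", value: { boolValue: staged } }] }] }],
    }],
  });
}

const sample = [samplePayload("real-run", false), samplePayload("dry-run", true)].join("\n");
console.log(productionSpans(sample));
```

In practice the text would come from reading /tmp/gh-aw/otel.jsonl; the string input here just keeps the sketch free of file-system dependencies.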
Implementation Steps
- In actions/setup/js/send_otlp_span.cjs, inside sendJobConclusionSpan, read awInfo.staged and push gh-aw.staged to attributes and deployment.environment to resourceAttributes
- Update actions/setup/js/send_otlp_span.test.cjs to assert both gh-aw.staged (boolean) and deployment.environment appear in the conclusion span for both staged=true and staged=false cases
- Run cd actions/setup/js && npx vitest run (or make test-unit) to confirm tests pass
- Run make fmt to ensure formatting
- Open a PR referencing this issue
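The mapping that the new assertions would need to cover can be sketched in isolation as follows. This is a minimal stand-alone illustration, not the repository's code: buildAttrSketch is a hypothetical stand-in for the real buildAttr helper (whose implementation is not shown in this issue) and assumes the OTLP/JSON attribute encoding; stagedAttributes is likewise a made-up name.

```javascript
// Hypothetical stand-in for the repo's buildAttr; the real helper may differ.
// Encodes booleans as boolValue and everything else as stringValue (OTLP/JSON shape).
function buildAttrSketch(key, value) {
  if (typeof value === "boolean") {
    return { key, value: { boolValue: value } };
  }
  return { key, value: { stringValue: String(value) } };
}

// Hypothetical helper mirroring the proposed change: derive both attributes
// from awInfo.staged, treating anything other than strict `true` as not staged.
function stagedAttributes(awInfo) {
  const staged = awInfo != null && awInfo.staged === true;
  return {
    spanAttr: buildAttrSketch("gh-aw.staged", staged),
    resourceAttr: buildAttrSketch("deployment.environment", staged ? "staging" : "production"),
  };
}

// Exercise the staged=true, staged=false, and "field missing" cases.
for (const awInfo of [{ staged: true }, { staged: false }, {}]) {
  console.log(JSON.stringify(stagedAttributes(awInfo)));
}
```

Note that the strict comparison means a missing or non-boolean staged value falls through to "production", matching how generate_observability_summary.cjs interprets the field.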
Evidence from Live Sentry Data
A Sentry MCP tool was not available in this workflow run, so live span payloads could not be queried directly. The gap is confirmed statically:
- generate_observability_summary.cjs:69 reads awInfo.staged and surfaces it in the job summary.
- send_otlp_span.cjs:543–619 reads all other awInfo fields into span attributes but contains no reference to staged or deployment.environment (confirmed via grep across all .cjs files in actions/setup/js/).
- The resourceAttributes array (lines 612–619) sets github.repository, github.run_id, github.actions.run_url, and github.event_name, but not deployment.environment.
Related Files
- actions/setup/js/send_otlp_span.cjs: primary change site (sendJobConclusionSpan)
- actions/setup/js/send_otlp_span.test.cjs: test assertions to add
- actions/setup/js/generate_observability_summary.cjs: reference implementation showing staged is available
Generated by the Daily OTel Instrumentation Advisor workflow
- expires on Apr 12, 2026, 9:50 AM UTC