
Fix microsoft.gen_ai.main_agent.* propagation and self-promotion issues #196

Open
hectorhdzg wants to merge 1 commit into microsoft:main from hectorhdzg:fix/main-agent-propagation-bugs


@hectorhdzg

Member

Fixes three issues identified during MAF Python and LangChain Python testing:

Issue 1: on_end self-promotion never fires (MAF + LC Python)

Symptom: Root invoke_agent spans never get microsoft.gen_ai.main_agent.* attributes. They have gen_ai.agent.name but self-promotion to microsoft.gen_ai.main_agent.name never happens.

Root cause: The OTel SDK's SpanProcessor.on_end() receives a ReadableSpan, not a Span. ReadableSpan does NOT have set_attribute(). The old code had:

if not hasattr(span, "set_attribute"):
    return  # ALWAYS returns, ReadableSpan never has set_attribute

This guard silently aborted self-promotion for every span.

Fix: Write directly to span._attributes (the internal BoundedAttributes mapping) instead of calling set_attribute(). Downstream SpanExporters read from the same object, so they see these mutations.
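A minimal, self-contained sketch of the idea, not the processor's actual code: FakeReadableSpan is a hypothetical stand-in that, like the SDK's ReadableSpan, offers a read-only attributes view and no set_attribute(). Detecting invoke_agent spans through gen_ai.operation.name and promoting gen_ai.agent.id alongside gen_ai.agent.name are assumptions made for illustration.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping, Optional

MAIN_AGENT_NAME = "microsoft.gen_ai.main_agent.name"

# Assumed source -> target keys; the PR names only gen_ai.agent.name and
# the microsoft.gen_ai.main_agent.* prefix explicitly.
PROMOTED_KEYS = {
    "gen_ai.agent.name": MAIN_AGENT_NAME,
    "gen_ai.agent.id": "microsoft.gen_ai.main_agent.id",
}


class FakeReadableSpan:
    """Hypothetical stand-in for an SDK ReadableSpan: a read-only
    `attributes` view over an internal `_attributes` mapping and no
    set_attribute() method."""

    def __init__(self, name: str, attributes: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        self._attributes: Dict[str, Any] = dict(attributes or {})

    @property
    def attributes(self) -> Mapping[str, Any]:
        return MappingProxyType(self._attributes)


def self_promote_on_end(span: Any) -> None:
    """Copy gen_ai.agent.* onto microsoft.gen_ai.main_agent.* for an
    invoke_agent span that did not inherit main-agent attributes."""
    attrs = span.attributes
    if attrs.get("gen_ai.operation.name") != "invoke_agent":
        return
    if MAIN_AGENT_NAME in attrs:
        return  # already inherited from an ancestor in on_start
    internal = getattr(span, "_attributes", None)
    if internal is None:
        return  # no internal mapping to write to; skip instead of raising
    for src, dst in PROMOTED_KEYS.items():
        if src in attrs:
            internal[dst] = attrs[src]


root = FakeReadableSpan(
    "invoke_agent Main agent",
    {"gen_ai.operation.name": "invoke_agent", "gen_ai.agent.name": "Main agent"},
)
assert not hasattr(root, "set_attribute")  # the old hasattr guard bailed out here
self_promote_on_end(root)
print(root.attributes[MAIN_AGENT_NAME])  # Main agent
```

Checking for `_attributes` with getattr keeps the hook harmless on span types that lack the internal mapping, which mirrors the test helper further down that builds spans both with and without it.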

Issue 2: on_start inheritance misses late-set parent attributes (MAF Python)

Symptom: Some chat, execute_tool, and HTTP spans inside a trace whose invoke_agent ancestor has gen_ai.agent.name end up with no microsoft.gen_ai.main_agent.name.

Root cause: on_start reads parent.attributes when the child span is created. If the parent's gen_ai.agent.* attributes are set after the child span is created (a timing issue in some MAF SDK flows), on_start finds no agent attributes on the parent and propagates nothing. There was no fallback.

Fix: Store a reference to the parent Span during on_start. During on_end, if the span still lacks microsoft.gen_ai.main_agent.* and is not an invoke_agent span, re-read the attributes from the stored parent. By the time the child's on_end runs, the parent's attributes have been set, so the fallback succeeds. Stored references are cleaned up in on_end and shutdown().
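The store-then-fallback flow can be sketched as follows. This is a simplified model, not the PR's code: FakeSpan and FallbackProcessorSketch are hypothetical, the key mapping is assumed, and on_start here receives the parent span directly, whereas the real SDK hook is on_start(span, parent_context) and a processor would have to derive the parent span from that context (for example with opentelemetry.trace.get_current_span).

```python
from typing import Any, Dict, MutableMapping, Optional

MAIN_AGENT_PREFIX = "microsoft.gen_ai.main_agent."
# Assumed mapping from a parent's gen_ai.agent.* keys to main-agent keys.
PROMOTED_KEYS = {
    "gen_ai.agent.name": MAIN_AGENT_PREFIX + "name",
    "gen_ai.agent.id": MAIN_AGENT_PREFIX + "id",
}


class FakeSpan:
    """Hypothetical stand-in for an SDK span: an id plus mutable attributes."""

    def __init__(self, span_id: int, attributes: Optional[Dict[str, Any]] = None) -> None:
        self.span_id = span_id
        self._attributes: Dict[str, Any] = dict(attributes or {})

    @property
    def attributes(self) -> Dict[str, Any]:
        return self._attributes


def _copy_main_agent(parent: FakeSpan, target: MutableMapping[str, Any]) -> None:
    pattrs = parent.attributes
    # Prefer main-agent keys the parent already carries; otherwise derive
    # them from the parent's own gen_ai.agent.* attributes.
    found = {k: v for k, v in pattrs.items() if k.startswith(MAIN_AGENT_PREFIX)}
    if not found:
        found = {dst: pattrs[src] for src, dst in PROMOTED_KEYS.items() if src in pattrs}
    for key, value in found.items():
        target.setdefault(key, value)


class FallbackProcessorSketch:
    def __init__(self) -> None:
        self._parents: Dict[int, FakeSpan] = {}  # child span_id -> parent span

    def on_start(self, span: FakeSpan, parent: Optional[FakeSpan]) -> None:
        if parent is None:
            return
        _copy_main_agent(parent, span._attributes)  # may find nothing yet
        self._parents[span.span_id] = parent  # keep for the on_end fallback

    def on_end(self, span: FakeSpan) -> None:
        parent = self._parents.pop(span.span_id, None)  # always drop the reference
        if parent is None:
            return
        attrs = span.attributes
        if attrs.get("gen_ai.operation.name") == "invoke_agent":
            return  # invoke_agent spans self-promote instead
        if any(k.startswith(MAIN_AGENT_PREFIX) for k in attrs):
            return  # on_start already propagated
        _copy_main_agent(parent, span._attributes)

    def shutdown(self) -> None:
        self._parents.clear()


proc = FallbackProcessorSketch()
agent = FakeSpan(1, {"gen_ai.operation.name": "invoke_agent"})
chat = FakeSpan(2, {"gen_ai.operation.name": "chat"})
proc.on_start(agent, None)
proc.on_start(chat, agent)  # agent name is not set yet
agent._attributes["gen_ai.agent.name"] = "Main agent"  # set late
proc.on_end(chat)
print(chat.attributes["microsoft.gen_ai.main_agent.name"])  # Main agent
```

Popping the entry in on_end means a parent reference lives only as long as its child span is open, and shutdown() covers spans that never end.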

Issue 3: Nested LangChain agent gen_ai.agent.name overwritten by parent (LC Python)

Symptom: A nested invoke_agent span for a sub-agent (e.g. "Data agent") shows the parent agent's name (e.g. "Main agent") in gen_ai.agent.name, while gen_ai.agent.id and gen_ai.agent.description correctly show the sub-agent's values.

Root cause: _resolve_agent_name() always checked self._agent_config.get("agent_name") first, and _agent_config is the shared top-level config. For nested sub-agents, this value clobbered the sub-agent's own name from run metadata. agent_id and agent_description came out correct because extract_agent_metadata(run), applied via set_attributes(), ran after the config values and overwrote them; agent_name, however, came from the config because _resolve_agent_name checked the config first.

Fix: Added use_config parameter to _resolve_agent_name(). Nested agents (detected via _find_agent_ancestor()) pass use_config=False so they derive identity solely from run metadata. Config-based agent_id, agent_description, and agent_version are also skipped for nested agents.
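A hedged sketch of the use_config switch. FakeRun, TracerSketch, the is_agent flag, the "agent_name" metadata key, and the fallback to the run name are all hypothetical; the real tracer also applies extract_agent_metadata(run), which this sketch does not model.

```python
from typing import Any, Dict, Optional


class FakeRun:
    """Hypothetical stand-in for a LangChain tracer run."""

    def __init__(self, name: str, metadata: Optional[Dict[str, Any]] = None,
                 parent: Optional["FakeRun"] = None, is_agent: bool = False) -> None:
        self.name = name
        self.metadata = metadata or {}
        self.parent = parent
        self.is_agent = is_agent


class TracerSketch:
    CONFIG_IDENTITY_KEYS = ("agent_id", "agent_description", "agent_version")

    def __init__(self, agent_config: Dict[str, Any]) -> None:
        self._agent_config = agent_config  # shared, top-level config

    def _find_agent_ancestor(self, run: FakeRun) -> Optional[FakeRun]:
        node = run.parent
        while node is not None:
            if node.is_agent:
                return node
            node = node.parent
        return None

    def _resolve_agent_name(self, run: FakeRun, use_config: bool = True) -> str:
        if use_config and self._agent_config.get("agent_name"):
            return self._agent_config["agent_name"]
        # Nested agents (and unconfigured roots) use run metadata, then the run name.
        return run.metadata.get("agent_name") or run.name

    def agent_identity(self, run: FakeRun) -> Dict[str, Any]:
        # Only the outermost agent may take identity from the shared config.
        use_config = self._find_agent_ancestor(run) is None
        identity = {"agent_name": self._resolve_agent_name(run, use_config=use_config)}
        if use_config:
            for key in self.CONFIG_IDENTITY_KEYS:
                if key in self._agent_config:
                    identity[key] = self._agent_config[key]
        return identity


tracer = TracerSketch({"agent_name": "Main agent", "agent_id": "main-1"})
main = FakeRun("AgentExecutor", is_agent=True)
data = FakeRun("sub_agent", {"agent_name": "Data agent"}, parent=main, is_agent=True)
print(tracer.agent_identity(main))  # {'agent_name': 'Main agent', 'agent_id': 'main-1'}
print(tracer.agent_identity(data))  # {'agent_name': 'Data agent'}
```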

Fix three issues with main_agent attribute handling identified during
MAF Python and LangChain Python testing:

1. on_end self-promotion never fired: ReadableSpan lacks set_attribute(),
   so the hasattr guard always bailed out. Fixed by writing directly to
   the internal _attributes (BoundedAttributes) mapping.

2. on_start inheritance missed late-set parent attributes: When a child
   span was created before the parent's gen_ai.agent.* attributes were
   set, on_start propagation silently skipped them. Fixed by storing the
   parent Span reference in on_start and re-reading from it during
   on_end as a fallback.

3. Nested LangChain agents had gen_ai.agent.name overwritten by the
   top-level agent's config: _resolve_agent_name() unconditionally read
   from the shared _agent_config, clobbering sub-agent identity. Fixed
   by adding a use_config flag and skipping config-based identity for
   nested agents (detected via _find_agent_ancestor).
Copilot AI review requested due to automatic review settings June 8, 2026 22:09
@github-actions

github-actions Bot commented Jun 8, 2026


Performance comparison

Threshold: regressions >15.0% on gating scenarios fail the build. Higher ops/s is better; positive Δ means the PR is slower.

| Scenario | Gating | Baseline (ops/s) | Candidate (ops/s) | Δ % | Status |
|---|---|---|---|---|---|
| azure_monitor_log | yes | 26,990.6 | 26,688.0 | +1.13% | |
| azure_monitor_span | yes | 160,205.1 | 158,957.2 | +0.79% | |
| otel_log | no | 32,725.7 | 32,941.3 | -0.65% | |
| otel_span | no | 34,395.0 | 34,924.7 | -1.52% | |
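The Δ column can be reproduced from the two throughput columns with Δ% = (baseline − candidate) / candidate × 100. That formula is inferred from the published numbers, not taken from the benchmark harness, so treat it as an assumption. The sketch applies the stated 15% threshold only to gating scenarios; delta_pct and gate are hypothetical helper names.

```python
from typing import List, Tuple

THRESHOLD_PCT = 15.0  # regression threshold for gating scenarios


def delta_pct(baseline: float, candidate: float) -> float:
    """Slowdown of the candidate relative to the baseline, in percent
    (positive means the PR is slower)."""
    return (baseline - candidate) / candidate * 100.0


def gate(rows: List[Tuple[str, bool, float, float]]) -> List[Tuple[str, float, bool]]:
    """Return (scenario, delta %, failed) for each row."""
    results = []
    for name, gating, baseline, candidate in rows:
        d = delta_pct(baseline, candidate)
        results.append((name, d, gating and d > THRESHOLD_PCT))
    return results


rows = [
    ("azure_monitor_log", True, 26_990.6, 26_688.0),
    ("azure_monitor_span", True, 160_205.1, 158_957.2),
    ("otel_log", False, 32_725.7, 32_941.3),
    ("otel_span", False, 34_395.0, 34_924.7),
]
for name, d, failed in gate(rows):
    print(f"{name}: {d:+.2f}% {'FAIL' if failed else 'ok'}")
```

Rounded to two decimals, the computed deltas match the table, and none of the gating scenarios comes close to the 15% limit.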

Copilot AI left a comment

Contributor


Pull request overview

This PR updates the GenAI main-agent attribution flow so microsoft.gen_ai.main_agent.* attributes are reliably populated across SDK spans (including timing edge-cases) and LangChain nested-agent runs (avoiding top-level config overriding nested agent identity).

Changes:

  • Fix GenAIMainAgentSpanProcessor.on_end() to self-promote by mutating the underlying span attributes mapping and add an on-end fallback propagation path for late-set parent attributes.
  • Expand real-SDK tests to cover on-end fallback propagation and root-span self-promotion behavior.
  • Adjust LangChain tracer agent-name resolution so nested agents don’t inherit identity from shared top-level _agent_config.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/microsoft/opentelemetry/_genai/main_agent/_processor.py | Adds parent tracking for on-end fallback propagation and switches on-end enrichment to mutate internal attributes. |
| src/microsoft/opentelemetry/_genai/_langchain/_tracer.py | Prevents nested agents from using top-level config overrides for agent identity fields. |
| tests/genai/main_agent/test_span_processor.py | Updates unit tests to validate on-end enrichment via internal attributes mutation. |
| tests/genai/main_agent/test_sdk_propagation.py | Adds/updates real-SDK regression tests for timing recovery and self-promotion scenarios. |


Comment on lines +131 to +135
if has_internal_attrs:
    span._attributes = dict(attributes)
else:
    del span._attributes
return span

@JacksonWeber left a comment

Contributor


LGTM


Labels

None yet

Projects

None yet


4 participants