
fix: otel warning during tail sampling buffer replay#1632

Merged
alexmojaki merged 11 commits into pydantic:main from cyberksh:tail-sampling
Apr 9, 2026

Conversation

@cyberksh
Contributor

When TailSamplingProcessor.push_buffer() replays buffered spans, it called on_start even for spans that had already ended. Wrapped processors like DirectBaggageAttributesSpanProcessor then called span.set_attributes() on those ended spans, triggering "Setting attribute on ended span" warnings from OpenTelemetry.

This fix tracks which buffered spans have already ended and skips the on_start call for them: setting attributes on an ended span has no effect anyway, and the spans are still properly exported via on_end.
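The idea can be sketched like this (a minimal stand-in, not logfire's actual code; the `ReplayBuffer` name and the wrapped-processor interface here are illustrative):

```python
class ReplayBuffer:
    """Buffers (event, span) pairs and replays them to a wrapped processor,
    skipping on_start for spans that already ended."""

    def __init__(self, wrapped):
        self.wrapped = wrapped  # the wrapped span processor (hypothetical)
        self.buffer = []        # list of ("start" | "end", span) tuples

    def on_start(self, span):
        self.buffer.append(("start", span))

    def on_end(self, span):
        self.buffer.append(("end", span))

    def push_buffer(self):
        # First pass: record which buffered spans have already ended.
        ended = {id(span) for event, span in self.buffer if event == "end"}
        for event, span in self.buffer:
            if event == "start" and id(span) in ended:
                # Skip on_start for ended spans: setting attributes on an
                # ended span is a no-op and triggers the OTel warning.
                continue
            if event == "start":
                self.wrapped.on_start(span)
            else:
                self.wrapped.on_end(span)
        self.buffer.clear()
```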

@codecov

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@cyberksh cyberksh marked this pull request as draft January 16, 2026 21:32
@cyberksh
Contributor Author

Reproducer:

"""
Reproducer for logfire tail sampling "Setting attribute on ended span" warning.

The warning occurs when:
1. Tail sampling buffers spans (not immediately sampled)
2. A later event triggers sampling (error, duration threshold)
3. push_buffer() calls on_start for already-ended spans

Run this script and make requests to trigger the warning.
"""

import asyncio
import logging
import time

# Configure logging BEFORE logfire to see warnings
logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s | %(name)s:%(funcName)s:%(lineno)d | %(message)s",
)

import logfire
from fastapi import FastAPI
import uvicorn


def tail_sampling_function(span_info: logfire.sampling.TailSamplingSpanInfo) -> float:
    """
    Tail sampling that triggers on errors or long duration.
    Most traces are buffered and discarded (rate=0.0).
    Only traces with errors or duration > 0.5s are sampled.
    """
    # Only sample if there's an error or duration > 0.5 seconds
    if span_info.level >= "error":
        return 1.0
    if span_info.duration > 0.5:
        return 1.0
    # Don't sample normal requests - they stay buffered then get discarded
    return 0.0


# Configure logfire with tail sampling
logfire.configure(
    service_name="reproducer",
    send_to_logfire=False,
    console=logfire.ConsoleOptions(),  # Enable console output to see traces
    sampling=logfire.SamplingOptions(tail=tail_sampling_function),
)

app = FastAPI()
logfire.instrument_fastapi(app)


@app.get("/fast")
async def fast_endpoint():
    """Fast endpoint - won't trigger sampling, no warning."""
    return {"status": "fast"}


@app.get("/slow")
async def slow_endpoint():
    """
    Slow endpoint - triggers sampling after child spans have ended.
    This WILL produce the warning because:
    1. Child spans are created and end quickly
    2. They get buffered
    3. After 0.6s, duration threshold is exceeded
    4. push_buffer() calls on_start for already-ended child spans
    """
    # Create some child spans that end before the sampling decision
    with logfire.span("child_span_1"):
        await asyncio.sleep(0.05)

    with logfire.span("child_span_2"):
        await asyncio.sleep(0.05)

    with logfire.span("child_span_3"):
        await asyncio.sleep(0.05)

    # Now wait to exceed duration threshold
    # This triggers sampling AFTER child spans have ended
    await asyncio.sleep(0.5)

    return {"status": "slow", "message": "Check logs for 'Setting attribute on ended span' warning"}


@app.get("/error")
async def error_endpoint():
    """
    Error endpoint - triggers sampling when error is logged.
    Similar to slow, but triggers on error level instead of duration.
    """
    with logfire.span("setup_span"):
        await asyncio.sleep(0.05)

    with logfire.span("processing_span"):
        await asyncio.sleep(0.05)

    # Log an error - this triggers sampling after child spans ended
    logfire.error("Something went wrong!")

    return {"status": "error logged"}


@app.get("/nested")
async def nested_endpoint():
    """
    Deeply nested spans - more child spans = more warnings.
    """

    async def do_work(depth: int):
        if depth <= 0:
            return
        with logfire.span(f"nested_level_{depth}"):
            await asyncio.sleep(0.02)
            await do_work(depth - 1)

    await do_work(5)

    # Trigger sampling after all nested spans ended
    await asyncio.sleep(0.5)

    return {"status": "nested complete"}


if __name__ == "__main__":
    print("""
Reproducer for "Setting attribute on ended span" warning.

Endpoints:
  GET /fast   - Fast request, no sampling, no warning
  GET /slow   - Slow request (0.6s), triggers sampling, PRODUCES WARNING
  GET /error  - Logs error, triggers sampling, PRODUCES WARNING
  GET /nested - Nested spans + slow, PRODUCES MULTIPLE WARNINGS

To reproduce:
  curl http://localhost:8000/slow

Watch the console for:
  WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
""")
    uvicorn.run(app, host="0.0.0.0", port=8000, log_level="debug")
Output of `python reproducer.py`:

Reproducer for "Setting attribute on ended span" warning.

Endpoints:
  GET /fast   - Fast request, no sampling, no warning
  GET /slow   - Slow request (0.6s), triggers sampling, PRODUCES WARNING
  GET /error  - Logs error, triggers sampling, PRODUCES WARNING
  GET /nested - Nested spans + slow, PRODUCES MULTIPLE WARNINGS

To reproduce:
  curl http://localhost:8000/slow

Watch the console for:
  WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.

INFO:     Started server process [3139581]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:42244 - "GET /slow HTTP/1.1" 200 OK
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
03:12:53.329 GET /error
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
03:12:53.436   Something went wrong!
INFO:     127.0.0.1:42254 - "GET /error HTTP/1.1" 200 OK
INFO:     127.0.0.1:38208 - "GET /nested HTTP/1.1" 200 OK
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.

@cyberksh cyberksh marked this pull request as ready for review January 16, 2026 21:55
@alexmojaki
Collaborator

Thanks for pointing this out. What we actually need is to call on_start immediately in most cases, even when tail sampling is active; looking at this, delaying it was a mistake. The one thing we must not do in tail sampling is call on_start on the pending span processor, because that doesn't just set attributes: it creates a span and then calls on_end with it. But many on_start implementations only set attributes (the baggage span processor is an important one), and that's not a big deal to do even if tail sampling later leaves out the spans.

@cyberksh
Contributor Author

Do you want to restructure TailSamplingProcessor to call on_start on wrapped processors immediately and only buffer on_end?

@alexmojaki
Collaborator

Yes, with an exception for the pending span processor, and ideally a way for advanced users to mark other processors as similar exceptions.

@cyberksh
Contributor Author

Appreciate any input/suggestions on the current implementation.

Redesign TailSamplingProcessor so that on_start is called immediately
on the main (wrapped) processor, and only on_end is buffered until the
sampling decision. This avoids "Setting attribute on ended span" warnings
from processors like DirectBaggageAttributesSpanProcessor that set
attributes in on_start.

A new deferred_processor parameter allows processors like
PendingSpanProcessor (which create spans in on_start) to have both
on_start and on_end deferred. Advanced users can mark their own
processors for deferral via tail_sampling_defer_on_start = True.
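The design described above can be sketched as follows (illustrative only; the real TailSamplingProcessor in logfire/sampling/_tail_sampling.py differs in detail, and the class and attribute names here are assumptions):

```python
class TailSamplingSketch:
    """Sketch: on_start goes to the main processor right away; on_start for
    the deferred processor and all on_end calls are buffered until the
    trace is chosen for sampling."""

    def __init__(self, main_processor, deferred_processor=None):
        self.main = main_processor          # e.g. sets attributes in on_start
        self.deferred = deferred_processor  # e.g. creates spans in on_start
        self.buffered_starts = []
        self.buffered_ends = []
        self.sampled = False

    def on_start(self, span):
        # Attribute-setting processors must see the span before it ends,
        # so the main processor is always called immediately.
        self.main.on_start(span)
        if self.sampled:
            if self.deferred is not None:
                self.deferred.on_start(span)
        elif self.deferred is not None:
            # Deferred processors (like a pending-span processor) would
            # emit a span from on_start, so hold this back.
            self.buffered_starts.append(span)

    def on_end(self, span):
        if self.sampled:
            self.main.on_end(span)
            if self.deferred is not None:
                self.deferred.on_end(span)
        else:
            # The real processor evaluates the sampling decision here.
            self.buffered_ends.append(span)

    def sample(self):
        """Replay buffered events once the trace is kept."""
        self.sampled = True
        if self.deferred is not None:
            for span in self.buffered_starts:
                self.deferred.on_start(span)
        for span in self.buffered_ends:
            self.main.on_end(span)
            if self.deferred is not None:
                self.deferred.on_end(span)
        self.buffered_starts.clear()
        self.buffered_ends.clear()
```

Because only the deferred processor's on_start is buffered, a baggage-style main processor never sees an ended span, while a pending-span-style processor never emits spans for traces that are ultimately dropped.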

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


@cyberksh cyberksh requested a review from alexmojaki April 3, 2026 18:59
cyberksh added 3 commits April 8, 2026 13:23
  optimize on_start

- Replace `first_span` cached_property with a plain field set at trace creation
- Only buffer started spans when a deferred_processor is present
- Wrap deferred_processor shutdown/force_flush in suppress_instrumentation
@cyberksh cyberksh requested a review from alexmojaki April 8, 2026 12:14
cyberksh added 2 commits April 9, 2026 15:16
…pling

Replace SimpleSpanProcessor(TestExporter) with a custom CollectorProcessor
(no span_exporter attribute) so has_pending=False and deferred_processors
stays empty, covering the if deferred_processors: False branch.
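A minimal collector along those lines might look like this (a sketch of the kind of test helper described; the names are guesses, not the repo's actual code):

```python
class CollectorProcessor:
    """Collects ended spans in memory. Deliberately has no `span_exporter`
    attribute, so tail sampling treats it as having no pending-span support."""

    def __init__(self):
        self.spans = []

    def on_start(self, span, parent_context=None):
        pass  # nothing to do at span start

    def on_end(self, span):
        self.spans.append(span)

    def shutdown(self):
        pass

    def force_flush(self, timeout_millis=30000):
        return True
```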
@cyberksh cyberksh requested a review from alexmojaki April 9, 2026 13:29
@alexmojaki alexmojaki enabled auto-merge (squash) April 9, 2026 14:02
@alexmojaki
Collaborator

Thanks!

@alexmojaki alexmojaki merged commit 955c32a into pydantic:main Apr 9, 2026
16 checks passed
