
fix: otel warning during tail sampling buffer replay#1632

Merged
alexmojaki merged 11 commits into pydantic:main from cyberksh:tail-sampling
Apr 9, 2026

Conversation

@cyberksh
Contributor

When TailSamplingProcessor.push_buffer() replays buffered spans, it called on_start even for spans that had already ended. Wrapped processors like DirectBaggageAttributesSpanProcessor then called span.set_attributes() on those ended spans, triggering "Setting attribute on ended span" warnings from OpenTelemetry.

This fix tracks which buffered spans have already ended and skips the on_start call for them: setting attributes on an ended span has no effect anyway, and the spans are still properly exported via on_end.
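The idea can be sketched like this (a minimal stand-in, not logfire's actual code; the `ReplayBuffer` name and the wrapped-processor interface here are illustrative):

```python
class ReplayBuffer:
    """Buffers (event, span) pairs and replays them to a wrapped processor,
    skipping on_start for spans that already ended."""

    def __init__(self, wrapped):
        self.wrapped = wrapped  # the wrapped span processor (hypothetical)
        self.buffer = []        # list of ("start" | "end", span) tuples

    def on_start(self, span):
        self.buffer.append(("start", span))

    def on_end(self, span):
        self.buffer.append(("end", span))

    def push_buffer(self):
        # First pass: record which buffered spans have already ended.
        ended = {id(span) for event, span in self.buffer if event == "end"}
        for event, span in self.buffer:
            if event == "start" and id(span) in ended:
                # Skip on_start for ended spans: setting attributes on an
                # ended span is a no-op and triggers the OTel warning.
                continue
            if event == "start":
                self.wrapped.on_start(span)
            else:
                self.wrapped.on_end(span)
        self.buffer.clear()
```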

@codecov

codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@cyberksh cyberksh marked this pull request as draft January 16, 2026 21:32
@cyberksh
Contributor Author

Reproducer:

"""
Reproducer for logfire tail sampling "Setting attribute on ended span" warning.

The warning occurs when:
1. Tail sampling buffers spans (not immediately sampled)
2. A later event triggers sampling (error, duration threshold)
3. push_buffer() calls on_start for already-ended spans

Run this script and make requests to trigger the warning.
"""

import asyncio
import logging
import time

# Configure logging BEFORE logfire to see warnings
logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s | %(name)s:%(funcName)s:%(lineno)d | %(message)s",
)

import logfire
from fastapi import FastAPI
import uvicorn


def tail_sampling_function(span_info: logfire.sampling.TailSamplingSpanInfo) -> float:
    """
    Tail sampling that triggers on errors or long duration.
    Most traces are buffered and discarded (rate=0.0).
    Only traces with errors or duration > 0.5s are sampled.
    """
    # Only sample if there's an error or duration > 0.5 seconds
    if span_info.level >= "error":
        return 1.0
    if span_info.duration > 0.5:
        return 1.0
    # Don't sample normal requests - they stay buffered then get discarded
    return 0.0


# Configure logfire with tail sampling
logfire.configure(
    service_name="reproducer",
    send_to_logfire=False,
    console=logfire.ConsoleOptions(),  # Enable console output to see traces
    sampling=logfire.SamplingOptions(tail=tail_sampling_function),
)

app = FastAPI()
logfire.instrument_fastapi(app)


@app.get("/fast")
async def fast_endpoint():
    """Fast endpoint - won't trigger sampling, no warning."""
    return {"status": "fast"}


@app.get("/slow")
async def slow_endpoint():
    """
    Slow endpoint - triggers sampling after child spans have ended.
    This WILL produce the warning because:
    1. Child spans are created and end quickly
    2. They get buffered
    3. After 0.6s, duration threshold is exceeded
    4. push_buffer() calls on_start for already-ended child spans
    """
    # Create some child spans that end before the sampling decision
    with logfire.span("child_span_1"):
        await asyncio.sleep(0.05)

    with logfire.span("child_span_2"):
        await asyncio.sleep(0.05)

    with logfire.span("child_span_3"):
        await asyncio.sleep(0.05)

    # Now wait to exceed duration threshold
    # This triggers sampling AFTER child spans have ended
    await asyncio.sleep(0.5)

    return {"status": "slow", "message": "Check logs for 'Setting attribute on ended span' warning"}


@app.get("/error")
async def error_endpoint():
    """
    Error endpoint - triggers sampling when error is logged.
    Similar to slow, but triggers on error level instead of duration.
    """
    with logfire.span("setup_span"):
        await asyncio.sleep(0.05)

    with logfire.span("processing_span"):
        await asyncio.sleep(0.05)

    # Log an error - this triggers sampling after child spans ended
    logfire.error("Something went wrong!")

    return {"status": "error logged"}


@app.get("/nested")
async def nested_endpoint():
    """
    Deeply nested spans - more child spans = more warnings.
    """

    async def do_work(depth: int):
        if depth <= 0:
            return
        with logfire.span(f"nested_level_{depth}"):
            await asyncio.sleep(0.02)
            await do_work(depth - 1)

    await do_work(5)

    # Trigger sampling after all nested spans ended
    await asyncio.sleep(0.5)

    return {"status": "nested complete"}


if __name__ == "__main__":
    print("""
Reproducer for "Setting attribute on ended span" warning.

Endpoints:
  GET /fast   - Fast request, no sampling, no warning
  GET /slow   - Slow request (0.6s), triggers sampling, PRODUCES WARNING
  GET /error  - Logs error, triggers sampling, PRODUCES WARNING
  GET /nested - Nested spans + slow, PRODUCES MULTIPLE WARNINGS

To reproduce:
  curl http://localhost:8000/slow

Watch the console for:
  WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
""")
    uvicorn.run(app, host="0.0.0.0", port=8000, log_level="debug")
Output of `python reproducer.py`:

Reproducer for "Setting attribute on ended span" warning.

Endpoints:
  GET /fast   - Fast request, no sampling, no warning
  GET /slow   - Slow request (0.6s), triggers sampling, PRODUCES WARNING
  GET /error  - Logs error, triggers sampling, PRODUCES WARNING
  GET /nested - Nested spans + slow, PRODUCES MULTIPLE WARNINGS

To reproduce:
  curl http://localhost:8000/slow

Watch the console for:
  WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.

INFO:     Started server process [3139581]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:42244 - "GET /slow HTTP/1.1" 200 OK
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
03:12:53.329 GET /error
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
03:12:53.436   Something went wrong!
INFO:     127.0.0.1:42254 - "GET /error HTTP/1.1" 200 OK
INFO:     127.0.0.1:38208 - "GET /nested HTTP/1.1" 200 OK
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
WARNING | opentelemetry.sdk.trace:set_attributes:850 | Setting attribute on ended span.
^CINFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.

@cyberksh cyberksh marked this pull request as ready for review January 16, 2026 21:55
@alexmojaki
Collaborator

Thanks for pointing this out. What we actually need is to call on_start immediately in most cases, even when tail sampling is active; looking at this, delaying it was a mistake. The one thing we must not do in tail sampling is call on_start on the pending span processor, because that doesn't just set attributes: it creates a span and then calls on_end with it. But many on_start implementations only set attributes (the baggage span processor is an important one), and that's not a big deal to do even if tail sampling later leaves out the spans.

@cyberksh
Contributor Author

Do you want to restructure TailSamplingProcessor to call on_start on wrapped processors immediately and only buffer on_end?

@alexmojaki
Collaborator

Yes, with an exception for the pending span processor, and ideally a way for advanced users to mark other processors as similar exceptions.

@cyberksh
Contributor Author

Appreciate any input/suggestions on the current implementation.

Redesign TailSamplingProcessor so that on_start is called immediately
on the main (wrapped) processor, and only on_end is buffered until the
sampling decision. This avoids "Setting attribute on ended span" warnings
from processors like DirectBaggageAttributesSpanProcessor that set
attributes in on_start.

A new deferred_processor parameter allows processors like
PendingSpanProcessor (which create spans in on_start) to have both
on_start and on_end deferred. Advanced users can mark their own
processors for deferral via tail_sampling_defer_on_start = True.
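The design described above can be sketched as follows (illustrative only; the real TailSamplingProcessor in logfire/sampling/_tail_sampling.py differs in detail, and the class and attribute names here are assumptions):

```python
class TailSamplingSketch:
    """Sketch: on_start goes to the main processor right away; on_start for
    the deferred processor and all on_end calls are buffered until the
    trace is chosen for sampling."""

    def __init__(self, main_processor, deferred_processor=None):
        self.main = main_processor          # e.g. sets attributes in on_start
        self.deferred = deferred_processor  # e.g. creates spans in on_start
        self.buffered_starts = []
        self.buffered_ends = []
        self.sampled = False

    def on_start(self, span):
        # Attribute-setting processors must see the span before it ends,
        # so the main processor is always called immediately.
        self.main.on_start(span)
        if self.sampled:
            if self.deferred is not None:
                self.deferred.on_start(span)
        elif self.deferred is not None:
            # Deferred processors (like a pending-span processor) would
            # emit a span from on_start, so hold this back.
            self.buffered_starts.append(span)

    def on_end(self, span):
        if self.sampled:
            self.main.on_end(span)
            if self.deferred is not None:
                self.deferred.on_end(span)
        else:
            # The real processor evaluates the sampling decision here.
            self.buffered_ends.append(span)

    def sample(self):
        """Replay buffered events once the trace is kept."""
        self.sampled = True
        if self.deferred is not None:
            for span in self.buffered_starts:
                self.deferred.on_start(span)
        for span in self.buffered_ends:
            self.main.on_end(span)
            if self.deferred is not None:
                self.deferred.on_end(span)
        self.buffered_starts.clear()
        self.buffered_ends.clear()
```

Because only the deferred processor's on_start is buffered, a baggage-style main processor never sees an ended span, while a pending-span-style processor never emits spans for traces that are ultimately dropped.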

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


@cyberksh cyberksh requested a review from alexmojaki April 3, 2026 18:59
cyberksh added 3 commits April 8, 2026 13:23
  optimize on_start

- Replace `first_span` cached_property with a plain field set at trace creation
- Only buffer started spans when a deferred_processor is present
- Wrap deferred_processor shutdown/force_flush in suppress_instrumentation
@cyberksh cyberksh requested a review from alexmojaki April 8, 2026 12:14
cyberksh added 2 commits April 9, 2026 15:16
…pling

Replace SimpleSpanProcessor(TestExporter) with a custom CollectorProcessor
(no span_exporter attribute) so has_pending=False and deferred_processors
stays empty, covering the if deferred_processors: False branch.
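A minimal collector along those lines might look like this (a sketch of the kind of test helper described; the names are guesses, not the repo's actual code):

```python
class CollectorProcessor:
    """Collects ended spans in memory. Deliberately has no `span_exporter`
    attribute, so tail sampling treats it as having no pending-span support."""

    def __init__(self):
        self.spans = []

    def on_start(self, span, parent_context=None):
        pass  # nothing to do at span start

    def on_end(self, span):
        self.spans.append(span)

    def shutdown(self):
        pass

    def force_flush(self, timeout_millis=30000):
        return True
```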
@cyberksh cyberksh requested a review from alexmojaki April 9, 2026 13:29
@alexmojaki alexmojaki enabled auto-merge (squash) April 9, 2026 14:02
@alexmojaki
Collaborator

Thanks!

@alexmojaki alexmojaki merged commit 955c32a into pydantic:main Apr 9, 2026
16 checks passed
