Filter bot traffic from Sentry spans using tracesSampler #16213
Add a tracesSampler function to drop spans from HeadlessChrome, bots, crawlers, and other automated traffic. This prevents bot-induced span throughput anomalies while maintaining 100% sampling for real users.

Fixes DOCS-A4C

Co-Authored-By: Claude <noreply@anthropic.com>
- Hoist bot patterns to module scope to avoid recreation on each trace
- Use single regex test instead of array iteration with includes()
- Add monitoring tool patterns: lighthouse, pagespeed, gtmetrix, pingdom, uptimerobot

Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
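A minimal sketch of the hoisting change, assuming the patterns live in a single case-insensitive regex. The names `MONITORING_PATTERN` and `isMonitoringTool` are hypothetical; only the five monitoring tool names come from the commit message.

```typescript
// Compiled once at module load, not on every sampling call.
// Hypothetical name; the pattern lists only the tools named in the commit.
const MONITORING_PATTERN = /lighthouse|pagespeed|gtmetrix|pingdom|uptimerobot/i;

// One regex test replaces a loop over an array with includes().
// The regex has no `g` flag, so test() keeps no state between calls.
function isMonitoringTool(userAgent: string): boolean {
  return MONITORING_PATTERN.test(userAgent);
}
```

Leaving off the `g` flag matters for a shared module-scope regex: a global regex advances `lastIndex` after each match, which can make repeated `test()` calls alternate between true and false.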
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Add allowlist for AI agents (ClaudeBot, GPTBot, Cursor, Codex, Copilot, etc.) to ensure we have tracing data for agentic tools consuming our markdown docs. These are checked before the bot filter so they won't be dropped by the generic 'bot' pattern.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
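The ordering matters here: "ClaudeBot" and "GPTBot" both contain "bot", so the allowlist has to run first. A rough sketch of that precedence follows. The function and constant names are hypothetical, and the `crawler` and `spider` entries are illustrative assumptions, not taken from the PR.

```typescript
// Allowlisted AI agents named in the commit message (the "etc." is not filled in).
const AI_AGENT_PATTERN = /claudebot|gptbot|cursor|codex|copilot/i;

// Generic filter as it stood at this commit; "crawler" and "spider" are assumptions.
const GENERIC_BOT_PATTERN = /bot|crawler|spider|headlesschrome/i;

type TrafficKind = "ai_agent" | "bot" | "other";

function classifyTraffic(userAgent: string): TrafficKind {
  // Allowlist first: otherwise "ClaudeBot" would fall through to the "bot" match.
  if (AI_AGENT_PATTERN.test(userAgent)) return "ai_agent";
  if (GENERIC_BOT_PATTERN.test(userAgent)) return "bot";
  return "other";
}
```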
The generic 'bot' pattern incorrectly matched Cubot phone user agents (e.g., "CUBOT GT99"), dropping traces for legitimate mobile users. Replace with explicit bot names: googlebot, bingbot, slackbot, etc.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
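The false positive is easy to reproduce. In the sketch below, the full user-agent string is an illustrative assumption built around the "CUBOT GT99" token from the commit message, and the explicit pattern lists only the three names the commit mentions.

```typescript
// Before: any user agent containing "bot" is treated as a crawler.
const genericBot = /bot/i;

// After: only explicitly named crawlers match (three of the names from the commit).
const explicitBots = /googlebot|bingbot|slackbot/i;

// Illustrative user agent for a Cubot phone; only "CUBOT GT99" comes from the PR.
const cubotUa =
  "Mozilla/5.0 (Linux; Android 11; CUBOT GT99) AppleWebKit/537.36 Chrome/96.0 Mobile Safari/537.36";

const droppedBefore = genericBot.test(cubotUa);   // substring "BOT" in "CUBOT" matches
const droppedAfter = explicitBots.test(cubotUa);  // no named crawler appears in the string
```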
4fc2e5c to b6c90b3
sergical left a comment
gonna unblock this, lgtm, but would be really cool to add Sentry.metrics.count to get insights into what we block
Add Sentry metrics to the tracesSampler to provide visibility into traffic patterns hitting the docs site. Each sampling decision now emits a `docs.trace.sampled` metric with attributes:

- `traffic_type`: ai_agent, bot, user, or unknown
- `agent_match` / `bot_match`: the specific pattern that matched (e.g., "claudebot", "googlebot")
- `sample_rate`: the sampling rate applied (0, 0.3, or 1)

This builds on #16213, which added the tracesSampler for bot filtering. With these metrics, we can now query in Sentry to understand volume breakdown by traffic type, which bots hit the site most, and which AI agents are consuming docs content.

Co-authored-by: Claude <noreply@anthropic.com>
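One way the attribute set could be assembled is sketched below. The counter is injected as a plain function so the example stays self-contained; in the real code it would presumably be a thin wrapper around `Sentry.metrics.count`, whose exact signature is an assumption here. `SamplingDecision`, `CountFn`, and `recordDecision` are hypothetical names.

```typescript
type TrafficType = "ai_agent" | "bot" | "user" | "unknown";

// Hypothetical shape for one sampling decision.
interface SamplingDecision {
  trafficType: TrafficType;
  sampleRate: number;
  matchedPattern?: string; // e.g. "claudebot" or "googlebot"
}

// Assumed counter signature; stands in for whatever the SDK exposes.
type CountFn = (
  name: string,
  value: number,
  options: { attributes: Record<string, string | number> },
) => void;

function recordDecision(decision: SamplingDecision, count: CountFn): void {
  const attributes: Record<string, string | number> = {
    traffic_type: decision.trafficType,
    sample_rate: decision.sampleRate,
  };
  // Only the attribute that applies to this traffic type is attached.
  if (decision.matchedPattern && decision.trafficType === "ai_agent") {
    attributes.agent_match = decision.matchedPattern;
  }
  if (decision.matchedPattern && decision.trafficType === "bot") {
    attributes.bot_match = decision.matchedPattern;
  }
  count("docs.trace.sampled", 1, { attributes });
}
```

Injecting the counter also keeps the sampler testable without initializing the SDK.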
Add tracesSampler with bot filtering and 30% user sampling
Filter out crawlers/bots while allowlisting AI agents at 100% for
docs consumption visibility. Real users sampled at 30% for high-traffic site.
Fixes DOCS-A4C
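Put together, the sampling rule described above might look like the following sketch. It is not the merged implementation: the names are hypothetical, the pattern lists contain only entries mentioned in this PR, and how the user agent is read from the sampling context (and what rate applies when it is missing) is left out because the PR text does not say.

```typescript
// AI agents are allowlisted and always traced.
const AI_AGENTS = /claudebot|gptbot|cursor|codex|copilot/i;

// Explicit crawler and monitoring names; no generic "bot" entry.
const BLOCKED_BOTS =
  /headlesschrome|googlebot|bingbot|slackbot|lighthouse|pagespeed|gtmetrix|pingdom|uptimerobot/i;

// Real users are sampled at 30%.
const USER_SAMPLE_RATE = 0.3;

// Returns the value a tracesSampler would return for this user agent.
function sampleRateFor(userAgent: string): number {
  if (AI_AGENTS.test(userAgent)) return 1; // checked first so allowlisted agents win
  if (BLOCKED_BOTS.test(userAgent)) return 0;
  return USER_SAMPLE_RATE;
}
```

A `tracesSampler` option would call `sampleRateFor` with the request's User-Agent header and return the result.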
LEGAL BOILERPLATE
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.