Replace Cloudwatch with Athena on the debug page #2762

Merged
dwwoelfel merged 3 commits into main from athena-debug on Jun 11, 2026

@dwwoelfel dwwoelfel commented Jun 11, 2026

Replaces the link to a CloudWatch search for the trace ID with a link to Athena.

Athena doesn't let you include a query in the URL, so I put the query in a select box beneath the link and also copy it to your clipboard when you click the link.

I also updated the querying-logs.md file with some improvements after going on a debugging session with Claude.

This is what it looks like when you're logged in as an admin:

[Image: Screenshot 2026-06-11 at 1 27 06 PM]

coderabbitai Bot commented Jun 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 707dc4f7-02e2-4946-b445-819442b5190a

📥 Commits

Reviewing files that changed from the base of the PR and between 86c7a4a and 813c5c7.

📒 Files selected for processing (1)
  • server/querying-logs.md

📝 Walkthrough

This PR adds Athena-based trace querying to replace CloudWatch URIs. The backend introduces Athena infrastructure with time-partitioned SQL generation, the debug route wires Athena queries into responses, the frontend adds a clipboard-friendly QueryBlock component, and documentation is updated with DuckDB/S3 querying guidance including parallelism tuning and simplified query patterns.

Changes

Athena Trace Querying

| Layer / File(s) | Summary |
| --- | --- |
| Athena query builder with time-window partitioning (server/src/instant/util/tracer.clj) | Imports OffsetDateTime and ZoneOffset for time calculations. Adds athena-log-table mapping environments to Athena table names. Introduces athena-window-clause to generate SQL partition filters covering ±10-minute windows around a trace timestamp. Adds athena-query to build Athena query URLs with parameterized trace-id filters constrained by the generated partition window, replacing the removed cloudwatch-uri helper. |
| Debug route wiring for Athena queries (server/src/instant/dash/routes.clj) | Admin debug endpoint now calls tracer/athena-query to include a "Search trace in Athena" link in the returned urls array. |
| Frontend QueryBlock UI and AdminInfo expansion (client/www/pages/debug-uri/[trace-id]/[span-id].tsx) | New QueryBlock component renders a monospace query display with a copy-to-clipboard button that toggles a success icon. Expands AdminInfo to accept traceId and spanId, render per-URL optional query fields via QueryBlock, and add an agent-prompt block. Page passes router trace/span IDs to AdminInfo. Updated imports to remove unused UI/router items and add heroicon and UI components for copy/prompt UI. |
| DuckDB and query documentation updates (server/querying-logs.md) | DuckDB setup now includes SET threads = 16 to control S3 fetch parallelism, with explanation. Bash one-liner includes the same threads setting. Major rewrite of S3 path guidance: removes brace-expansion patterns in favor of explicit per-minute paths, adds trace_id timestamp decoding for minute-window selection with hour/day rollover handling, reiterates union_by_name = true, adds workflow to save parquet subsets locally via COPY ... TO ... (FORMAT PARQUET), and updates query examples to use SELECT * instead of fixed column lists. Updated "Find every event for a trace" and "Errors" queries to use SELECT *, and replaced specific excluded column names with a placeholder some_noisy_column. |
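The ±10-minute partition window described for athena-window-clause can be sketched roughly as below. This is a hypothetical TypeScript illustration, not the Clojure code in tracer.clj. It assumes the partition columns are named year, month, day, hour, and minute, that they are stored as zero-padded strings, and that timestamps are UTC; none of that is confirmed by this PR.

```typescript
// Hypothetical sketch of a +/- N minute partition filter.
// ASSUMPTION: partition columns are zero-padded strings named
// year/month/day/hour/minute, and timestamps are in UTC.

function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

interface HourBucket {
  year: string;
  month: string;
  day: string;
  hour: string;
  minMinute: number;
  maxMinute: number;
}

function athenaWindowClause(center: Date, windowMinutes: number = 10): string {
  // Walk the window minute by minute and group by hour, so that a window
  // crossing an hour or day boundary yields one OR-branch per hour.
  const buckets = new Map<string, HourBucket>();
  for (let offset = -windowMinutes; offset <= windowMinutes; offset++) {
    const t = new Date(center.getTime() + offset * 60000);
    const year = String(t.getUTCFullYear());
    const month = pad2(t.getUTCMonth() + 1);
    const day = pad2(t.getUTCDate());
    const hour = pad2(t.getUTCHours());
    const minute = t.getUTCMinutes();
    const key = `${year}-${month}-${day}T${hour}`;
    const existing = buckets.get(key);
    if (existing) {
      // Minutes arrive in increasing order, so the latest one is the max.
      existing.maxMinute = minute;
    } else {
      buckets.set(key, { year, month, day, hour, minMinute: minute, maxMinute: minute });
    }
  }
  return Array.from(buckets.values())
    .map(
      (b) =>
        `("year" = '${b.year}' AND "month" = '${b.month}' AND "day" = '${b.day}'` +
        ` AND "hour" = '${b.hour}' AND "minute" BETWEEN '${pad2(b.minMinute)}' AND '${pad2(b.maxMinute)}')`,
    )
    .join(" OR ");
}
```

Grouping by hour keeps the filter short while still covering windows that straddle HH:00 or midnight; a window entirely inside one hour collapses to a single branch.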

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: replacing CloudWatch with Athena on the debug page, which is the primary objective of this pull request. |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the CloudWatch-to-Athena migration, the query display mechanism, and documentation improvements mentioned in the changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/querying-logs.md (1)

73-96: ⚠️ Potential issue | 🟡 Minor

Fix Athena reserved-word quoting in server/querying-logs.md

In the SQL example (lines 73-96), timestamp / year-minute are left unquoted, but the text advises backticks for reserved words. In Athena, reserved identifiers in SELECT/queries must be escaped with double quotes ("timestamp", "year", etc.); backticks are for DDL. Also, LIMIT/OFFSET/timeout in your example are query keywords (not column names), so the reserved-word backtick guidance should only apply if those are actual column identifiers.
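As a rough illustration of the double-quote convention for SELECT queries, the sketch below quotes every column identifier unconditionally, which sidesteps the question of which words are actually reserved. The helper names and the table name `logs` are hypothetical and do not come from this PR.

```typescript
// Hypothetical helpers: double-quote identifiers for an Athena SELECT query.
// Embedded double quotes are escaped by doubling them.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Builds a simple SELECT. LIMIT is a keyword, not an identifier, so it is
// emitted bare; only table and column names are quoted.
function selectColumns(table: string, columns: string[], limit: number): string {
  if (!Number.isInteger(limit) || limit < 0) {
    throw new Error("limit must be a non-negative integer");
  }
  const cols = columns.map(quoteIdent).join(", ");
  return `SELECT ${cols} FROM ${quoteIdent(table)} LIMIT ${limit}`;
}
```

For example, `selectColumns("logs", ["timestamp", "trace_id"], 100)` yields `SELECT "timestamp", "trace_id" FROM "logs" LIMIT 100`.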

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/querying-logs.md` around lines 73 - 96, Update the SQL example to use
Athena-compatible identifier quoting: replace backtick guidance with double
quotes for reserved identifiers (e.g., use "timestamp", "year", "month", "day",
"hour", "minute" in the SELECT and WHERE), and clarify that SQL keywords like
LIMIT, OFFSET, and timeout are not column identifiers and should not be quoted;
adjust the paragraph text to state double quotes are required for reserved
identifiers in Athena and that quoting guidance only applies to actual column
names such as timestamp/year-minute.
🧹 Nitpick comments (1)
client/www/pages/debug-uri/[trace-id]/[span-id].tsx (1)

20-26: 💤 Low value

Consider logging clipboard errors for debugging.

The .catch(() => {}) silently swallows clipboard write errors. While this prevents user-facing errors when clipboard access is denied, logging the error could help diagnose issues in development or production monitoring.

Optional logging enhancement
         navigator.clipboard
           .writeText(query)
           .then(() => {
             setCopied(true);
             setTimeout(() => setCopied(false), 1500);
           })
-          .catch(() => {});
+          .catch((err) => {
+            console.warn('Failed to copy to clipboard:', err);
+          });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@client/www/pages/debug-uri/[trace-id]/[span-id].tsx` around lines 20 - 26,
The clipboard write promise currently swallows errors in
navigator.clipboard.writeText(query).catch(() => {}); update the catch to accept
the error and log it (e.g., console.error or your app logger) with context
(include the query and identifiers if available) so failures are visible for
debugging, while still preventing user-facing errors; leave the existing
setCopied(true) and timeout behavior unchanged.
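One way to make the logging behaviour easy to exercise outside a browser is to pass the clipboard in as a parameter. The sketch below is an assumption-laden illustration, not the code in this PR: `ClipboardLike` and `copyQuery` are hypothetical names, and the boolean result stands in for the component's `setCopied` state.

```typescript
// Minimal structural type so the helper can run without a browser.
// navigator.clipboard satisfies this shape.
interface ClipboardLike {
  writeText(text: string): Promise<void>;
}

// Hypothetical helper: resolves to true when the copy succeeded and to
// false when it failed, logging the failure instead of swallowing it.
function copyQuery(
  clipboard: ClipboardLike,
  query: string,
  warn: (message: string, err: unknown) => void = console.warn,
): Promise<boolean> {
  return clipboard
    .writeText(query)
    .then(() => true)
    .catch((err: unknown) => {
      warn("Failed to copy to clipboard:", err);
      return false;
    });
}
```

In a component, the call would look like `copyQuery(navigator.clipboard, query).then((ok) => { if (ok) setCopied(true); })`, with the existing timeout resetting the icon.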
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/querying-logs.md`:
- Around line 153-170: Update the minute-window example around the read_parquet
snippet to explicitly handle hour/day rollovers: explain that when you list “5
before to 5 after” you must compute minute timestamps across hour and midnight
boundaries (referencing the "Trace ID → partition" decoding step) and include
paths for the previous/next hour/day as needed (e.g., generate explicit s3 paths
for minutes that fall in hour-1, hour, and hour+1 or across date change), and
add a short helper note or pseudocode to produce those explicit paths so readers
don’t miss files at HH:00 or midnight when using read_parquet and filtering by
trace_id.
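A helper along the lines the comment asks for might look like the sketch below. It is only an illustration: the bucket name and the hive-style `year=/month=/day=/hour=/minute=` layout are placeholders, since the real S3 layout is not shown in this thread, and the function takes an already-decoded UTC timestamp rather than a trace ID.

```typescript
// ASSUMPTION: placeholder bucket and partition layout, not the real log path.
const EXAMPLE_BASE = "s3://example-bucket/logs";

function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Lists one explicit path per minute from `before` minutes before the
// timestamp to `after` minutes after it. Doing the arithmetic on epoch
// milliseconds means hour, day, month, and year rollovers need no special cases.
function minutePaths(
  center: Date,
  before: number = 5,
  after: number = 5,
  base: string = EXAMPLE_BASE,
): string[] {
  const paths: string[] = [];
  for (let offset = -before; offset <= after; offset++) {
    const t = new Date(center.getTime() + offset * 60000);
    paths.push(
      `${base}/year=${t.getUTCFullYear()}/month=${pad2(t.getUTCMonth() + 1)}` +
        `/day=${pad2(t.getUTCDate())}/hour=${pad2(t.getUTCHours())}` +
        `/minute=${pad2(t.getUTCMinutes())}/*.parquet`,
    );
  }
  return paths;
}
```

For a trace at 00:02 UTC on June 12, the list starts at hour=23/minute=57 of June 11 and ends at hour=00/minute=07 of June 12, so the files on either side of midnight are both included.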

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: faa09b2e-3507-4471-a1c6-b8dbd5cd72b1

📥 Commits

Reviewing files that changed from the base of the PR and between 7b81389 and 86c7a4a.

📒 Files selected for processing (4)
  • client/www/pages/debug-uri/[trace-id]/[span-id].tsx
  • server/querying-logs.md
  • server/src/instant/dash/routes.clj
  • server/src/instant/util/tracer.clj

Comment thread server/querying-logs.md
@github-actions

View Vercel preview at instant-www-js-athena-debug-jsv.vercel.app.

@dwwoelfel dwwoelfel marked this pull request as ready for review June 11, 2026 20:31

@stopachka stopachka left a comment

Nice!

@dwwoelfel dwwoelfel merged commit e710176 into main Jun 11, 2026
34 checks passed
@dwwoelfel dwwoelfel deleted the athena-debug branch June 11, 2026 22:58