Introduces comprehensive OpenTelemetry instrumentation for distributed tracing, metrics, and logging across frontend, backend, and infrastructure. Integrates OTel Collector, Tempo, Loki, Prometheus, and Grafana for local dev.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 43e90e4bf5
infra/app/templates/backend.yaml
Outdated
    verbs: ["get"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "list"]
Grant get verb on jobs for datapuller status checks
The datapuller status endpoint calls readNamespacedJob in getDatapullerJobStatus (apps/backend/src/modules/datapuller/controller.ts), but this Role only grants create and list on batch/jobs. On clusters enforcing RBAC, every status poll will fail with 403, so the new Data tab cannot reliably show Running/Succeeded/Failed state in production. Add get to the jobs verbs for this Role.
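To illustrate why the status poll would be rejected, here is a simplified sketch of RBAC rule matching. It is not the Kubernetes authorizer: the `PolicyRule` type and the `isAllowed` helper are hypothetical names made up for this example, and the model ignores `resourceNames` and non-resource URLs. It relies only on the fact that reading a single Job by name is a `get` request.

```typescript
// Simplified model of a Kubernetes RBAC rule (hypothetical type, for illustration only).
interface PolicyRule {
  apiGroups: string[];
  resources: string[];
  verbs: string[];
}

// Returns true if any rule grants `verb` on `resource` in `apiGroup`.
// "*" is treated as a wildcard, as it is in real RBAC rules.
function isAllowed(
  rules: PolicyRule[],
  apiGroup: string,
  resource: string,
  verb: string
): boolean {
  const matches = (allowed: string[], value: string): boolean =>
    allowed.includes("*") || allowed.includes(value);
  return rules.some(
    (rule) =>
      matches(rule.apiGroups, apiGroup) &&
      matches(rule.resources, resource) &&
      matches(rule.verbs, verb)
  );
}

// The jobs rule as it appears in the reviewed diff.
const currentRules: PolicyRule[] = [
  { apiGroups: ["batch"], resources: ["jobs"], verbs: ["create", "list"] },
];

// The same rule with the suggested fix applied.
const proposedRules: PolicyRule[] = [
  { apiGroups: ["batch"], resources: ["jobs"], verbs: ["create", "get", "list"] },
];

console.log(isAllowed(currentRules, "batch", "jobs", "get")); // false
console.log(isAllowed(proposedRules, "batch", "jobs", "get")); // true
```

Under this model, `create` and `list` keep working with the current rule, which is why triggering the job could succeed while every status read fails.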
git-rebase-todo
Outdated
drop df5cf8460 # feat: Implement OpenTelemetry-based observability stack
pick 46aabe3c0 # datapuller button test
pick ea932f783 # added status for data puller
pick 6bef7b42f # fix: resolve circular data reference in useDatapullerJobStatus
pick 1343d049a # package updates
pick bf5aa310a # package changes
pick 27900b265 # changed to stable selectors
pick 8ca60d940 # revert: disable deploy_staff in dev workflow
pick ecb2c3d22 # staff dashboard test in dev
pick a5c2072bb # staff dashboard deployment in dev
pick 9d8a1e6d6 # don't deploy staff dashboard

# Rebase 57fb92c94..9d8a1e6d6 onto 57fb92c94 (11 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup [-C | -c] <commit> = like "squash" but keep only the previous
#                    commit's log message, unless -C is used, in which case
#                    keep only this commit's message; -c is same as -C but
#                    opens the editor
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
#         create a merge commit using the original merge commit's
#         message (or the oneline, if no original merge commit was
#         specified); use -c <commit> to reword the commit message
# u, update-ref <ref> = track a placeholder for the <ref> to be updated
#                       to this position in the new commits. The <ref> is
#                       updated at the end of the rebase
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
Looks fine, but @sashakmurray should do final approval. @MatthewZhu13 can you also put a screenshot of what the UI looks like?
…/berkeleytime into matthewz-observability
this is fire
but why have it in the staff dashboard vs github action where observability and debugging is better? @sashakmurray @MatthewZhu13
Added a button on the staff frontend under a new tab (Data) that can manually trigger the datapuller. Tricky to test in dev, so hopefully we can test in prod since it isn't user-facing.
Changes include: