[WIP] DB size tracking with less impact on database#2764
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
View Vercel preview at instant-www-js-new-pg-size-jsv.vercel.app. |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
server/resources/migrations/113_triples_size_updates_table.up.sql (1)
293-307: 📐 Maintainability & Code Quality | 💤 Low valueMisleading variable alias
nin delete trigger.The delete trigger uses
nas an alias foroldrows, which is confusing sincentypically implies "new" rows. The insert and update triggers correctly usenfornewrows. Consider renaming toofor consistency and clarity.Suggested rename
with by_etype as ( -- Forward entities - select a.etype, n.entity_id - from oldrows n + select a.etype, o.entity_id + from oldrows o join attrs a - on n.attr_id = a.id + on o.attr_id = a.id and a.app_id in (app_id_setting, 'a1111111-1111-1111-1111-111111111ca7') union -- Ref entities - select a.reverse_etype etype, json_uuid_to_uuid(n.value) entity_id - from oldrows n + select a.reverse_etype etype, json_uuid_to_uuid(o.value) entity_id + from oldrows o join attrs a - on n.attr_id = a.id + on o.attr_id = a.id and a.app_id in (app_id_setting, 'a1111111-1111-1111-1111-111111111ca7') - where n.vae + where o.vae🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/resources/migrations/113_triples_size_updates_table.up.sql` around lines 293 - 307, The delete trigger's CTE uses the alias n for oldrows which is misleading; rename that alias to o throughout the CTE and any subsequent references in the trigger (e.g., change "from oldrows n" to "from oldrows o" and update "n.entity_id", "n.value", "n.vae", etc.) so the delete trigger consistently uses o for "old" rows (matching insert/update using newrows as n) and avoid confusion in the by_etype CTE and any downstream clauses that reference that alias.server/src/instant/model/triples_size_updates.clj (1)
45-62: 📐 Maintainability & Code Quality | ⚡ Quick winMissing trace data when loop limit is reached.
When
loops == max-loops, the function sends a Discord alert but doesn't record any tracing attributes. The successful path at line 59-60 recordstotal-collectedandloops, but the backlog case returnsnilwithout this data, making it harder to diagnose issues.Suggested fix
(if (= loops max-loops) - (when (config/prod?) - (discord/send-error-async! (str (:instateam discord/mention-constants) - " collect triples size is backed up after " loops " iterations."))) + (do + (tracer/add-data! {:attributes {:total-collected total-collected + :loops loops + :backed-up? true}}) + (when (config/prod?) + (discord/send-error-async! (str (:instateam discord/mention-constants) + " collect triples size is backed up after " loops " iterations.")))) (let [update-count (:next.jdbc/update-count (first (collect-batch!)))]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/instant/model/triples_size_updates.clj` around lines 45 - 62, When the loop limit is reached in collect-batches! the code sends discord/send-error-async! but omits tracer/add-data! so traces lack context; update the branch where (= loops max-loops) to call tracer/add-data! with the same attributes used in the successful path (e.g. {:attributes {:total-collected total-collected :loops loops}}) before or alongside calling discord/send-error-async! (keep the config/prod? guard intact) so traces record total-collected and loops when the backlog alert is triggered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/src/instant/bootstrap_triples_size.clj`:
- Line 99: The printed instruction is wrong: it tells users to set
disable-triples-size-collection to true to start collection but the flag
(disable-triples-size-collection) defaults to true (disabled); update the
println message in bootstrap_triples_size.clj to instruct users to set
disable-triples-size-collection to false to enable/start the collection process
and reference the flag name disable-triples-size-collection in the corrected
message so it’s unambiguous.
In `@server/src/instant/dash/admin.clj`:
- Around line 112-135: The new-db-size? branch is aggregating from
triples-size-aggregate (and similarly attr-sketches) without excluding deleted
attributes; update the queries in the conditional around flags/new-db-size? to
join attrs (or the equivalent attrs table used elsewhere) and add the same
filter used in app.clj/org.clj (e.g., attrs.deletion_marked_at IS NULL) so
deleted attributes are excluded, applying the same join+where change to both the
triples-size-aggregate and attr-sketches branches to keep billing/admin
reporting consistent.
In `@server/src/instant/jdbc/failover.clj`:
- Around line 568-571: do-failover-to-new-db currently only advances
transactions_id_seq on the replica, missing the same triples_size_updates_id_seq
increment added in do-failover-to-new-db-single-instance-only; update the
multi-instance path (do-failover-to-new-db) to also execute a SQL setval for
triples_size_updates_id_seq (using the same pattern as the single-instance
implementation) so both sequences are advanced on next-pool via sql/execute!
(match the existing transactions_id_seq call and add the SELECT setval for
triples_size_updates_id_seq).
In `@server/src/instant/reactive/aggregator.clj`:
- Line 53: The decoder schema no longer includes :pg-size while triples-copy-sql
still selects pg_size, causing a row-shape mismatch for initial-sketch-seq; fix
by making them consistent: either re-add :pg-size into the decoder field vector
where the decoder fields are defined, or remove pg_size from the SELECT
projection in triples-copy-sql so the SQL output matches the decoder, then run a
quick parse test of initial-sketch-seq to confirm alignment.
---
Nitpick comments:
In `@server/resources/migrations/113_triples_size_updates_table.up.sql`:
- Around line 293-307: The delete trigger's CTE uses the alias n for oldrows
which is misleading; rename that alias to o throughout the CTE and any
subsequent references in the trigger (e.g., change "from oldrows n" to "from
oldrows o" and update "n.entity_id", "n.value", "n.vae", etc.) so the delete
trigger consistently uses o for "old" rows (matching insert/update using newrows
as n) and avoid confusion in the by_etype CTE and any downstream clauses that
reference that alias.
In `@server/src/instant/model/triples_size_updates.clj`:
- Around line 45-62: When the loop limit is reached in collect-batches! the code
sends discord/send-error-async! but omits tracer/add-data! so traces lack
context; update the branch where (= loops max-loops) to call tracer/add-data!
with the same attributes used in the successful path (e.g. {:attributes
{:total-collected total-collected :loops loops}}) before or alongside calling
discord/send-error-async! (keep the config/prod? guard intact) so traces record
total-collected and loops when the backlog alert is triggered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b5f28e7b-abea-4c59-975c-4c8e08584ec9
📒 Files selected for processing (16)
client/www/components/dash/Billing.tsxserver/resources/migrations/113_triples_size_updates_table.down.sqlserver/resources/migrations/113_triples_size_updates_table.up.sqlserver/src/instant/bootstrap_triples_size.cljserver/src/instant/core.cljserver/src/instant/dash/admin.cljserver/src/instant/db/attr_sketch.cljserver/src/instant/flags.cljserver/src/instant/jdbc/failover.cljserver/src/instant/model/app.cljserver/src/instant/model/org.cljserver/src/instant/model/triples_size_updates.cljserver/src/instant/reactive/aggregator.cljserver/src/instant/util/test.cljserver/test/instant/model/triples_size_updates_test.cljserver/test/instant/reactive/aggregator_test.clj
This PR changes how we keep track of the database size.
Before, we would update the
pg_sizecolumn on the triple in a database trigger. Then the aggregator would pick that up and add it to thetriples_pg_sizecolumn on the attr sketch.That caused a lot of extra write volume to the wal log because we get the full triple any time it is updated.
The new approach still uses the same database triggers, but it writes to a separate table called
triples_size_updates.The trigger will insert one row per attr that was touched in the transaction (unless the size didn't change). We keep the table as bare-minimum as we can to make the writes fast. None of our wal watchers subscribe to that table, so the impact of writing to the table is also lower.
Then there is a separate process that deletes the rows from
triples_size_updatesand adds their sizes totriples_size_aggregate. There is one row in the aggregate table for each attr. We use thepg_sizefield on the aggregate table to determine the db size the same way we're currently usetriples_pg_sizeonattr_sketches.Deploy Plan
There's a
bootstraphelper ininstant.bootstrap-triples-size.nreplinto once instance and call(bootstrap)disable-triples-size-collectionflag and trigger a collectionuse-new-db-sizeflagHow bootstrap works
We create a snapshot connection and get the list of
triples_size_updatesthat have been created since we ran the migration--call thembefore-updates. We dopg_column_size(triple)for every triple in the db and then write a singletriples_size_updatesrow for each attr.Those writes represent the full database size as of the snapshot. We don't want to double-count anything, so we delete all of the
before-updates.Then we start the process that aggregates the sizes. It will collect the
triples_size_updatesfrom our rollup of the triples and everything created while we were rolling up the triples. From there, it just runs as normal.Interesting tidbits
alter sequence triples_size_updates_id_seq cache 128. Postgres grabs a block of 128 ids instead of just one, letting it lock the sequence fewer times. We might want to do this for the transactions seq also once we no longer rely on tx-ids increasing (the instant ISN solves this problem).