Skip to content

[WIP] DB size tracking with less impact on database#2764

Draft
dwwoelfel wants to merge 7 commits into
mainfrom
new-pg-size
Draft

[WIP] DB size tracking with less impact on database#2764
dwwoelfel wants to merge 7 commits into
mainfrom
new-pg-size

Conversation

@dwwoelfel

@dwwoelfel dwwoelfel commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

This PR changes how we keep track of the database size.

Before, we would update the pg_size column on the triple in a database trigger. Then the aggregator would pick that up and add it to the triples_pg_size column on the attr sketch.

That caused a lot of extra write volume to the wal log because we get the full triple any time it is updated.

The new approach still uses the same database triggers, but it writes to a separate table called triples_size_updates.

The trigger will insert one row per attr that was touched in the transaction (unless the size didn't change). We keep the table as bare-minimum as we can to make the writes fast. None of our wal watchers subscribe to that table, so the impact of writing to the table is also lower.

Then there is a separate process that deletes the rows from triples_size_updates and adds their sizes to triples_size_aggregate. There is one row in the aggregate table for each attr. We use the pg_size field on the aggregate table to determine the db size the same way we're currently use triples_pg_size on attr_sketches.

Deploy Plan

There's a bootstrap helper in instant.bootstrap-triples-size.

  1. Merge PR
  2. Run the migration
  3. Wait for code to fully deploy
  4. nrepl into once instance and call (bootstrap)
  5. Flip the disable-triples-size-collection flag and trigger a collection
  6. Check that the sizes are correct
  7. Flip the use-new-db-size flag

How bootstrap works

We create a snapshot connection and get the list of triples_size_updates that have been created since we ran the migration--call them before-updates. We do pg_column_size(triple) for every triple in the db and then write a single triples_size_updates row for each attr.

Those writes represent the full database size as of the snapshot. We don't want to double-count anything, so we delete all of the before-updates.

Then we start the process that aggregates the sizes. It will collect the triples_size_updates from our rollup of the triples and everything created while we were rolling up the triples. From there, it just runs as normal.

Interesting tidbits

  1. We use alter sequence triples_size_updates_id_seq cache 128. Postgres grabs a block of 128 ids instead of just one, letting it lock the sequence fewer times. We might want to do this for the transactions seq also once we no longer rely on tx-ids increasing (the instant ISN solves this problem).

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d2cde6bb-64d2-4f1a-b47e-9d09767a4e95

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch new-pg-size

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

View Vercel preview at instant-www-js-new-pg-size-jsv.vercel.app.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
server/resources/migrations/113_triples_size_updates_table.up.sql (1)

293-307: 📐 Maintainability & Code Quality | 💤 Low value

Misleading variable alias n in delete trigger.

The delete trigger uses n as an alias for oldrows, which is confusing since n typically implies "new" rows. The insert and update triggers correctly use n for newrows. Consider renaming to o for consistency and clarity.

Suggested rename
     with by_etype as (
       -- Forward entities
-      select a.etype, n.entity_id
-        from oldrows n
+      select a.etype, o.entity_id
+        from oldrows o
         join attrs a
-          on n.attr_id = a.id
+          on o.attr_id = a.id
          and a.app_id in (app_id_setting, 'a1111111-1111-1111-1111-111111111ca7')
       union
       -- Ref entities
-      select a.reverse_etype etype, json_uuid_to_uuid(n.value) entity_id
-        from oldrows n
+      select a.reverse_etype etype, json_uuid_to_uuid(o.value) entity_id
+        from oldrows o
         join attrs a
-          on n.attr_id = a.id
+          on o.attr_id = a.id
          and a.app_id in (app_id_setting, 'a1111111-1111-1111-1111-111111111ca7')
-       where n.vae
+       where o.vae
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/resources/migrations/113_triples_size_updates_table.up.sql` around
lines 293 - 307, The delete trigger's CTE uses the alias n for oldrows which is
misleading; rename that alias to o throughout the CTE and any subsequent
references in the trigger (e.g., change "from oldrows n" to "from oldrows o" and
update "n.entity_id", "n.value", "n.vae", etc.) so the delete trigger
consistently uses o for "old" rows (matching insert/update using newrows as n)
and avoid confusion in the by_etype CTE and any downstream clauses that
reference that alias.
server/src/instant/model/triples_size_updates.clj (1)

45-62: 📐 Maintainability & Code Quality | ⚡ Quick win

Missing trace data when loop limit is reached.

When loops == max-loops, the function sends a Discord alert but doesn't record any tracing attributes. The successful path at line 59-60 records total-collected and loops, but the backlog case returns nil without this data, making it harder to diagnose issues.

Suggested fix
       (if (= loops max-loops)
-        (when (config/prod?)
-          (discord/send-error-async! (str (:instateam discord/mention-constants)
-                                          " collect triples size is backed up after " loops " iterations.")))
+        (do
+          (tracer/add-data! {:attributes {:total-collected total-collected
+                                          :loops loops
+                                          :backed-up? true}})
+          (when (config/prod?)
+            (discord/send-error-async! (str (:instateam discord/mention-constants)
+                                            " collect triples size is backed up after " loops " iterations."))))
         (let [update-count (:next.jdbc/update-count (first (collect-batch!)))]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/instant/model/triples_size_updates.clj` around lines 45 - 62, When
the loop limit is reached in collect-batches! the code sends
discord/send-error-async! but omits tracer/add-data! so traces lack context;
update the branch where (= loops max-loops) to call tracer/add-data! with the
same attributes used in the successful path (e.g. {:attributes {:total-collected
total-collected :loops loops}}) before or alongside calling
discord/send-error-async! (keep the config/prod? guard intact) so traces record
total-collected and loops when the backlog alert is triggered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/instant/bootstrap_triples_size.clj`:
- Line 99: The printed instruction is wrong: it tells users to set
disable-triples-size-collection to true to start collection but the flag
(disable-triples-size-collection) defaults to true (disabled); update the
println message in bootstrap_triples_size.clj to instruct users to set
disable-triples-size-collection to false to enable/start the collection process
and reference the flag name disable-triples-size-collection in the corrected
message so it’s unambiguous.

In `@server/src/instant/dash/admin.clj`:
- Around line 112-135: The new-db-size? branch is aggregating from
triples-size-aggregate (and similarly attr-sketches) without excluding deleted
attributes; update the queries in the conditional around flags/new-db-size? to
join attrs (or the equivalent attrs table used elsewhere) and add the same
filter used in app.clj/org.clj (e.g., attrs.deletion_marked_at IS NULL) so
deleted attributes are excluded, applying the same join+where change to both the
triples-size-aggregate and attr-sketches branches to keep billing/admin
reporting consistent.

In `@server/src/instant/jdbc/failover.clj`:
- Around line 568-571: do-failover-to-new-db currently only advances
transactions_id_seq on the replica, missing the same triples_size_updates_id_seq
increment added in do-failover-to-new-db-single-instance-only; update the
multi-instance path (do-failover-to-new-db) to also execute a SQL setval for
triples_size_updates_id_seq (using the same pattern as the single-instance
implementation) so both sequences are advanced on next-pool via sql/execute!
(match the existing transactions_id_seq call and add the SELECT setval for
triples_size_updates_id_seq).

In `@server/src/instant/reactive/aggregator.clj`:
- Line 53: The decoder schema no longer includes :pg-size while triples-copy-sql
still selects pg_size, causing a row-shape mismatch for initial-sketch-seq; fix
by making them consistent: either re-add :pg-size into the decoder field vector
where the decoder fields are defined, or remove pg_size from the SELECT
projection in triples-copy-sql so the SQL output matches the decoder, then run a
quick parse test of initial-sketch-seq to confirm alignment.

---

Nitpick comments:
In `@server/resources/migrations/113_triples_size_updates_table.up.sql`:
- Around line 293-307: The delete trigger's CTE uses the alias n for oldrows
which is misleading; rename that alias to o throughout the CTE and any
subsequent references in the trigger (e.g., change "from oldrows n" to "from
oldrows o" and update "n.entity_id", "n.value", "n.vae", etc.) so the delete
trigger consistently uses o for "old" rows (matching insert/update using newrows
as n) and avoid confusion in the by_etype CTE and any downstream clauses that
reference that alias.

In `@server/src/instant/model/triples_size_updates.clj`:
- Around line 45-62: When the loop limit is reached in collect-batches! the code
sends discord/send-error-async! but omits tracer/add-data! so traces lack
context; update the branch where (= loops max-loops) to call tracer/add-data!
with the same attributes used in the successful path (e.g. {:attributes
{:total-collected total-collected :loops loops}}) before or alongside calling
discord/send-error-async! (keep the config/prod? guard intact) so traces record
total-collected and loops when the backlog alert is triggered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b5f28e7b-abea-4c59-975c-4c8e08584ec9

📥 Commits

Reviewing files that changed from the base of the PR and between e710176 and 4accd4b.

📒 Files selected for processing (16)
  • client/www/components/dash/Billing.tsx
  • server/resources/migrations/113_triples_size_updates_table.down.sql
  • server/resources/migrations/113_triples_size_updates_table.up.sql
  • server/src/instant/bootstrap_triples_size.clj
  • server/src/instant/core.clj
  • server/src/instant/dash/admin.clj
  • server/src/instant/db/attr_sketch.clj
  • server/src/instant/flags.clj
  • server/src/instant/jdbc/failover.clj
  • server/src/instant/model/app.clj
  • server/src/instant/model/org.clj
  • server/src/instant/model/triples_size_updates.clj
  • server/src/instant/reactive/aggregator.clj
  • server/src/instant/util/test.clj
  • server/test/instant/model/triples_size_updates_test.clj
  • server/test/instant/reactive/aggregator_test.clj

Comment thread server/src/instant/bootstrap_triples_size.clj Outdated
Comment thread server/src/instant/dash/admin.clj Outdated
Comment thread server/src/instant/jdbc/failover.clj
Comment thread server/src/instant/reactive/aggregator.clj
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant