[WIP] DB size tracking with less impact on database by dwwoelfel · Pull Request #2764 · instantdb/instant

dwwoelfel · 2026-06-12T22:45:23Z

This PR changes how we keep track of the database size.

Before, we would update the pg_size column on the triple in a database trigger. Then the aggregator would pick that up and add it to the triples_pg_size column on the attr sketch.

That caused a lot of extra write volume to the wal log because we get the full triple any time it is updated.

The new approach still uses the same database triggers, but it writes to a separate table called triples_size_updates.

The trigger will insert one row per attr that was touched in the transaction (unless the size didn't change). We keep the table as bare-minimum as we can to make the writes fast. None of our wal watchers subscribe to that table, so the impact of writing to the table is also lower.

Then there is a separate process that deletes the rows from triples_size_updates and adds their sizes to triples_size_aggregate. There is one row in the aggregate table for each attr. We use the pg_size field on the aggregate table to determine the db size the same way we're currently use triples_pg_size on attr_sketches.

Deploy Plan

There's a bootstrap helper in instant.bootstrap-triples-size.

Merge PR
Run the migration
Wait for code to fully deploy
nrepl into once instance and call (bootstrap)
Flip the disable-triples-size-collection flag and trigger a collection
Check that the sizes are correct
Flip the use-new-db-size flag

How bootstrap works

We create a snapshot connection and get the list of triples_size_updates that have been created since we ran the migration--call them before-updates. We do pg_column_size(triple) for every triple in the db and then write a single triples_size_updates row for each attr.

Those writes represent the full database size as of the snapshot. We don't want to double-count anything, so we delete all of the before-updates.

Then we start the process that aggregates the sizes. It will collect the triples_size_updates from our rollup of the triples and everything created while we were rolling up the triples. From there, it just runs as normal.

Interesting tidbits

We use alter sequence triples_size_updates_id_seq cache 128. Postgres grabs a block of 128 ids instead of just one, letting it lock the sequence fewer times. We might want to do this for the transactions seq also once we no longer rely on tx-ids increasing (the instant ISN solves this problem).

coderabbitai · 2026-06-12T22:45:34Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d2cde6bb-64d2-4f1a-b47e-9d09767a4e95

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

✅ Review completed - (🔄 Check again to review again)

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch new-pg-size

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-12T22:50:52Z

View Vercel preview at instant-www-js-new-pg-size-jsv.vercel.app.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

server/resources/migrations/113_triples_size_updates_table.up.sql (1)

293-307: 📐 Maintainability & Code Quality | 💤 Low value

Misleading variable alias n in delete trigger.

The delete trigger uses n as an alias for oldrows, which is confusing since n typically implies "new" rows. The insert and update triggers correctly use n for newrows. Consider renaming to o for consistency and clarity.

Suggested rename

     with by_etype as (
       -- Forward entities
-      select a.etype, n.entity_id
-        from oldrows n
+      select a.etype, o.entity_id
+        from oldrows o
         join attrs a
-          on n.attr_id = a.id
+          on o.attr_id = a.id
          and a.app_id in (app_id_setting, 'a1111111-1111-1111-1111-111111111ca7')
       union
       -- Ref entities
-      select a.reverse_etype etype, json_uuid_to_uuid(n.value) entity_id
-        from oldrows n
+      select a.reverse_etype etype, json_uuid_to_uuid(o.value) entity_id
+        from oldrows o
         join attrs a
-          on n.attr_id = a.id
+          on o.attr_id = a.id
          and a.app_id in (app_id_setting, 'a1111111-1111-1111-1111-111111111ca7')
-       where n.vae
+       where o.vae

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/resources/migrations/113_triples_size_updates_table.up.sql` around
lines 293 - 307, The delete trigger's CTE uses the alias n for oldrows which is
misleading; rename that alias to o throughout the CTE and any subsequent
references in the trigger (e.g., change "from oldrows n" to "from oldrows o" and
update "n.entity_id", "n.value", "n.vae", etc.) so the delete trigger
consistently uses o for "old" rows (matching insert/update using newrows as n)
and avoid confusion in the by_etype CTE and any downstream clauses that
reference that alias.

server/src/instant/model/triples_size_updates.clj (1)

45-62: 📐 Maintainability & Code Quality | ⚡ Quick win

Missing trace data when loop limit is reached.

When loops == max-loops, the function sends a Discord alert but doesn't record any tracing attributes. The successful path at line 59-60 records total-collected and loops, but the backlog case returns nil without this data, making it harder to diagnose issues.

Suggested fix

       (if (= loops max-loops)
-        (when (config/prod?)
-          (discord/send-error-async! (str (:instateam discord/mention-constants)
-                                          " collect triples size is backed up after " loops " iterations.")))
+        (do
+          (tracer/add-data! {:attributes {:total-collected total-collected
+                                          :loops loops
+                                          :backed-up? true}})
+          (when (config/prod?)
+            (discord/send-error-async! (str (:instateam discord/mention-constants)
+                                            " collect triples size is backed up after " loops " iterations."))))
         (let [update-count (:next.jdbc/update-count (first (collect-batch!)))]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/instant/model/triples_size_updates.clj` around lines 45 - 62, When
the loop limit is reached in collect-batches! the code sends
discord/send-error-async! but omits tracer/add-data! so traces lack context;
update the branch where (= loops max-loops) to call tracer/add-data! with the
same attributes used in the successful path (e.g. {:attributes {:total-collected
total-collected :loops loops}}) before or alongside calling
discord/send-error-async! (keep the config/prod? guard intact) so traces record
total-collected and loops when the backlog alert is triggered.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/src/instant/bootstrap_triples_size.clj`:
- Line 99: The printed instruction is wrong: it tells users to set
disable-triples-size-collection to true to start collection but the flag
(disable-triples-size-collection) defaults to true (disabled); update the
println message in bootstrap_triples_size.clj to instruct users to set
disable-triples-size-collection to false to enable/start the collection process
and reference the flag name disable-triples-size-collection in the corrected
message so it’s unambiguous.

In `@server/src/instant/dash/admin.clj`:
- Around line 112-135: The new-db-size? branch is aggregating from
triples-size-aggregate (and similarly attr-sketches) without excluding deleted
attributes; update the queries in the conditional around flags/new-db-size? to
join attrs (or the equivalent attrs table used elsewhere) and add the same
filter used in app.clj/org.clj (e.g., attrs.deletion_marked_at IS NULL) so
deleted attributes are excluded, applying the same join+where change to both the
triples-size-aggregate and attr-sketches branches to keep billing/admin
reporting consistent.

In `@server/src/instant/jdbc/failover.clj`:
- Around line 568-571: do-failover-to-new-db currently only advances
transactions_id_seq on the replica, missing the same triples_size_updates_id_seq
increment added in do-failover-to-new-db-single-instance-only; update the
multi-instance path (do-failover-to-new-db) to also execute a SQL setval for
triples_size_updates_id_seq (using the same pattern as the single-instance
implementation) so both sequences are advanced on next-pool via sql/execute!
(match the existing transactions_id_seq call and add the SELECT setval for
triples_size_updates_id_seq).

In `@server/src/instant/reactive/aggregator.clj`:
- Line 53: The decoder schema no longer includes :pg-size while triples-copy-sql
still selects pg_size, causing a row-shape mismatch for initial-sketch-seq; fix
by making them consistent: either re-add :pg-size into the decoder field vector
where the decoder fields are defined, or remove pg_size from the SELECT
projection in triples-copy-sql so the SQL output matches the decoder, then run a
quick parse test of initial-sketch-seq to confirm alignment.

---

Nitpick comments:
In `@server/resources/migrations/113_triples_size_updates_table.up.sql`:
- Around line 293-307: The delete trigger's CTE uses the alias n for oldrows
which is misleading; rename that alias to o throughout the CTE and any
subsequent references in the trigger (e.g., change "from oldrows n" to "from
oldrows o" and update "n.entity_id", "n.value", "n.vae", etc.) so the delete
trigger consistently uses o for "old" rows (matching insert/update using newrows
as n) and avoid confusion in the by_etype CTE and any downstream clauses that
reference that alias.

In `@server/src/instant/model/triples_size_updates.clj`:
- Around line 45-62: When the loop limit is reached in collect-batches! the code
sends discord/send-error-async! but omits tracer/add-data! so traces lack
context; update the branch where (= loops max-loops) to call tracer/add-data!
with the same attributes used in the successful path (e.g. {:attributes
{:total-collected total-collected :loops loops}}) before or alongside calling
discord/send-error-async! (keep the config/prod? guard intact) so traces record
total-collected and loops when the backlog alert is triggered.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b5f28e7b-abea-4c59-975c-4c8e08584ec9

📥 Commits

Reviewing files that changed from the base of the PR and between e710176 and 4accd4b.

📒 Files selected for processing (16)

client/www/components/dash/Billing.tsx
server/resources/migrations/113_triples_size_updates_table.down.sql
server/resources/migrations/113_triples_size_updates_table.up.sql
server/src/instant/bootstrap_triples_size.clj
server/src/instant/core.clj
server/src/instant/dash/admin.clj
server/src/instant/db/attr_sketch.clj
server/src/instant/flags.clj
server/src/instant/jdbc/failover.clj
server/src/instant/model/app.clj
server/src/instant/model/org.clj
server/src/instant/model/triples_size_updates.clj
server/src/instant/reactive/aggregator.clj
server/src/instant/util/test.clj
server/test/instant/model/triples_size_updates_test.clj
server/test/instant/reactive/aggregator_test.clj

make pg_size update have less impact on the server

56a8a2f

dwwoelfel deployed to Preview June 12, 2026 22:45 — with GitHub Actions View deployment

fix lint

3a01a88

dwwoelfel added 4 commits June 12, 2026 15:53

don't test for pg size in the aggregator

933ba2a

fix with-sketches helper

559f492

add some tests

f854cc3

just use pg_column_size

4accd4b

coderabbitai Bot reviewed Jun 13, 2026

View reviewed changes

Comment thread server/src/instant/bootstrap_triples_size.clj Outdated

Comment thread server/src/instant/dash/admin.clj Outdated

Comment thread server/src/instant/jdbc/failover.clj

Comment thread server/src/instant/reactive/aggregator.clj

robot feedback

4e68223

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] DB size tracking with less impact on database#2764

[WIP] DB size tracking with less impact on database#2764
dwwoelfel wants to merge 7 commits into
mainfrom
new-pg-size

dwwoelfel commented Jun 12, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading

Review skipped

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dwwoelfel commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploy Plan

How bootstrap works

Interesting tidbits

Uh oh!

coderabbitai Bot commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

github-actions Bot commented Jun 12, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dwwoelfel commented Jun 12, 2026 •

edited

Loading

coderabbitai Bot commented Jun 12, 2026 •

edited

Loading