Status: Draft
Replaces (in scope): docs/postgres_reintegration_assessment.md Phase 3 framing
Complements: docs/hybrid_warehouses_supabase_bigquery.md (architectural patterns) — this doc is the implementation spec
The restored Postgres adapter must not be a BigQuery adapter with translated SQL. sqlanvil ships two warehouse variants — postgres (standard) and supabase (Postgres + Supabase platform features) — both generating idiomatic SQL with native action config blocks. A user who has never touched BigQuery must never see BigQuery quirks (NOT ENFORCED PKs, OPTIONS(...) table options, PARTITION BY DATE_TRUNC(...) clauses, MERGE dialect).
The Antigravity assessment frames reintegration as: implement IDbAdapter for Postgres, branch on projectConfig.warehouse, done. That is necessary but not sufficient. It leaves three structural problems:
- SQL generation is BQ-shaped.
core/compilation_sql/was written around BigQuery's dialect (MERGE,CREATE OR REPLACE TABLE ... OPTIONS(...), partitioning DSL). Acase warehousebranch inside each generator produces brittle, half-translated SQL. - Action config blocks are BQ-shaped.
table.bigquery = { partitionBy, clusterBy, requirePartitionFilter, ... }exposes BigQuery concepts. Postgres equivalents (tablespace,fillfactor, nativePARTITION BY RANGE/LIST/HASH, btree/gin/gist/hnsw indexes) don't map. Forcing them through BQ-shaped fields is leaky. - Supabase isn't just "Postgres on a host." RLS, Realtime publications,
auth.usersintegration, pgvector, pg_cron, Supabase Wrappers — none of these are addressable via a generic Postgres adapter, and all of them are why someone picks Supabase.
PostgresDbAdapter (postgres)
│
├── implements IDbAdapter
├── uses node-postgres (`pg`, `pg-query-stream`)
├── delegates SQL generation to PostgresSqlGenerator
└── connection: standard libpq DSN / JDBC-style credential
SupabaseDbAdapter extends PostgresDbAdapter (supabase)
│
├── inherits all Postgres behavior
├── adds: RLS introspection, Realtime publication management,
│ Supabase Wrapper foreign-server discovery
├── delegates SQL generation to SupabaseSqlGenerator
│ (extends PostgresSqlGenerator)
└── connection: Supabase project credentials
(project_ref + service_role_key OR direct DB url + service_role for RLS bypass)
Both adapters live under cli/api/dbadapters/.
As-built (supersedes the original plan): execution-time SQL generation (the
create table, indexes, partitioning, upserts, materialized views, etc. emitted at run time) lives incli/api/dbadapters/<warehouse>_execution_sql.ts—postgres_execution_sql.tsandsupabase_execution_sql.ts(Supabase extends Postgres) implementing theIExecutionSqlinterface, parallel to the existingbigquery_execution_sql.ts. They are not undercore/compilation_sql/postgres/.core/compilation_sql/keeps only compile-time, warehouse-agnostic SQL (target resolution, inline-assertion SQL). This placement keeps each warehouse's execution SQL co-located with the adapter that runs it and matches the established BigQuery layout.
Methods the BigQuery adapter exposes today (executeRaw, tables, deleteTable, prepareSchema, dryRun, ...) must all be implementable against pg without semantic distortion. Where BigQuery returns BQ-specific metadata (tableType: "VIEW" | "TABLE" | "MATERIALIZED_VIEW" | "EXTERNAL"), Postgres returns its equivalent set (tableType: "TABLE" | "VIEW" | "MATERIALIZED_VIEW" | "FOREIGN_TABLE" | "PARTITIONED_TABLE"). The ITableMetadata interface should be a union over warehouse-specific extensions, not a lowest-common-denominator struct.
Add two new message types alongside the existing BigQueryOptions:
message PostgresOptions {
// Physical storage
string tablespace = 1;
uint32 fillfactor = 2;
bool unlogged = 3;
// Partitioning (native Postgres declarative partitioning)
message Partition {
enum Kind { RANGE = 0; LIST = 1; HASH = 2; }
Kind kind = 1;
repeated string columns = 2;
}
Partition partition = 4;
// Indexes
message Index {
string name = 1;
repeated string columns = 2;
enum Method { BTREE = 0; HASH = 1; GIN = 2; GIST = 3; BRIN = 4; }
Method method = 3;
string where = 4; // partial index predicate
bool unique = 5;
repeated string include = 6; // INCLUDE columns
}
repeated Index indexes = 5;
// Materialized view options
bool with_data = 6; // WITH DATA / WITH NO DATA on initial creation
string refresh_policy = 7; // "manual" | "on_dependency_change"
}
message SupabaseOptions {
// Standard Postgres options apply
PostgresOptions postgres = 1;
// Supabase platform
bool publish_to_realtime = 2; // ALTER PUBLICATION supabase_realtime
bool enable_rls = 3; // ALTER TABLE ... ENABLE ROW LEVEL SECURITY
string owner_role = 4; // typically "postgres" or "service_role"
// pgvector convenience (otherwise expressible via PostgresOptions.indexes)
message VectorConfig {
string column = 1;
uint32 dimensions = 2;
enum IndexType { IVFFLAT = 0; HNSW = 1; }
IndexType index_type = 3;
map<string, string> params = 4; // ivfflat: lists; hnsw: m, ef_construction
}
repeated VectorConfig vectors = 5;
}publish("daily_orders", {
type: "incremental",
uniqueKey: ["order_id"],
postgres: {
partition: { kind: "range", columns: ["order_date"] },
indexes: [
{ name: "ix_daily_orders_customer", columns: ["customer_id"], method: "btree" },
{ name: "ix_daily_orders_search", columns: ["description"], method: "gin" }
]
}
}).query(ctx => `SELECT ... FROM ${ctx.ref("raw_orders")} WHERE order_date >= ${ctx.incremental() ? "(SELECT MAX(order_date) FROM ${ctx.self()})" : "'2020-01-01'"}`);Compare to existing BigQuery shape:
publish("daily_orders", {
type: "incremental",
bigquery: { partitionBy: "DATE(order_date)", clusterBy: ["customer_id"], requirePartitionFilter: true }
}).query(...);Each warehouse owns its own config namespace. Compilation errors if the wrong block is used against the wrong warehouse.
| Concern | BigQuery (existing) | Postgres (this spec) |
|---|---|---|
| Create table | CREATE OR REPLACE TABLE \x.y.z` OPTIONS(...) AS SELECT ...` |
CREATE TABLE schema.tbl (...); INSERT INTO ... (atomic via BEGIN; DROP IF EXISTS; CREATE; INSERT; COMMIT;) |
| Replace table | Single-statement CREATE OR REPLACE TABLE |
Transactional drop + create + populate |
| Incremental upsert | MERGE ... USING ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT |
INSERT ... ON CONFLICT (cols) DO UPDATE SET ... |
| View | CREATE OR REPLACE VIEW |
CREATE OR REPLACE VIEW (works in PG) |
| Materialized view | CREATE MATERIALIZED VIEW (auto-refresh) |
CREATE MATERIALIZED VIEW ... WITH [NO] DATA; + explicit REFRESH MATERIALIZED VIEW [CONCURRENTLY] |
| Partitioning | PARTITION BY DATE(col), PARTITION BY RANGE_BUCKET(col, ...) |
PARTITION BY RANGE/LIST/HASH (cols) + CREATE TABLE part PARTITION OF parent FOR VALUES ... |
| Clustering | CLUSTER BY col1, col2 (storage layout) |
No direct equivalent. Closest: CLUSTER table USING index (one-shot reorder) + appropriate btree index. Not silently translated. |
| Primary key | CREATE PRIMARY KEY ... NOT ENFORCED (informational only in BQ) |
PRIMARY KEY (...) — actually enforced |
| Assertions | SELECT ... FROM x WHERE failing_condition |
Same shape; can additionally compile to CHECK constraints when user opts in |
Rule of thumb: if BigQuery has a concept Postgres lacks (clustering, NOT ENFORCED PKs, OPTIONS(description = ...)), the Postgres generator either translates to the nearest meaningful equivalent and warns in compilation output, or refuses with a clear error pointing at postgres:-namespaced alternatives.
Add to core/actions/:
publish("orders_policy", {
type: "rlsPolicy",
table: "orders",
name: "users_see_own_orders",
command: "select", // "all" | "select" | "insert" | "update" | "delete"
roles: ["authenticated"],
using: "user_id = auth.uid()",
withCheck: "user_id = auth.uid()"
});Compiles to:
CREATE POLICY users_see_own_orders ON orders
FOR SELECT TO authenticated
USING (user_id = auth.uid())
WITH CHECK (user_id = auth.uid());Refs into tables work via the standard action graph — declaring an RLS policy creates a dependency edge to the underlying table.
publish("orders_realtime", {
type: "realtimePublication",
table: "orders",
events: ["insert", "update", "delete"]
});Compiles to ALTER PUBLICATION supabase_realtime ADD TABLE orders; (with replica identity adjustments).
Pairs with the hybrid-warehouse doc's Pattern C (BigQuery → Supabase):
publish("bq_churn_predictions", {
type: "wrapper",
wrapper: "bigquery",
server: "bq_analytics",
options: { project: "my-bq-project", dataset: "models", table: "churn_predictions" }
});Compiles to CREATE FOREIGN TABLE ... SERVER bq_analytics OPTIONS (...).
Convenience wrapper for pgvector — could also be expressed as a PostgresOptions.Index with method: HNSW, but a dedicated type makes RAG pipelines first-class.
Phases 1-2 (deps + relocation) from the Antigravity doc stand as written. The remaining work is re-scoped:
- Implement
PostgresDbAdapteragainstpgandpg-query-stream. - Map
IDbAdaptermethods to Postgres semantics. Where BigQuery returns warehouse-specific metadata, return Postgres equivalents — do not force into BQ shape. - Credential format: standard
{ host, port, database, user, password, ssl }(or DSN string). Separate from BigQuery's{ projectId, credentials, location }.
- As-built: implemented in
cli/api/dbadapters/postgres_execution_sql.ts(implementsIExecutionSql, parallel tobigquery_execution_sql.ts), not a newcore/compilation_sql/postgres/directory. See the as-built note in §2. - One code path per action type (table, view, incremental, materialized view, operation, assertion, declaration).
- Tests: unit in
cli/api/execution_sql_test.ts; integration intests/integration/postgres.spec.tsagainst live Postgres.
- Add
PostgresOptionsandSupabaseOptionstoprotos/configs.proto. - Surface in TypeScript action types alongside existing
bigquery:namespace. - Compilation errors for cross-warehouse misuse (
postgres:block againstwarehouse: "bigquery", etc.).
- Branch
cli/index.tsadapter instantiation onprojectConfig.warehouse ∈ {bigquery, postgres, supabase}. - Credential file format auto-detection.
dataform init(rename target:sqlanvil init) templates for both new warehouses.
SupabaseDbAdapter extends PostgresDbAdapter.- Supabase-specific SQL generator extensions.
- New action types:
rlsPolicy,realtimePublication,wrapper,vectorIndex. - Connection config supports both direct Postgres URL and Supabase project_ref + service_role_key.
- Extend
tools/postgres/postgres_fixture.tswith a paralleltools/supabase/fixture (Docker-compose'd Supabase stack —supabase/postgresimage + Realtime + PostgREST optional). - Test specs:
tests/integration/postgres.spec.ts(already restored), newtests/integration/supabase.spec.ts.
Revised total estimate: 7-10 engineering days. Antigravity's 1-2 day estimate covers Phase 1-2 + a minimal Phase 3a only.
| Risk | Mitigation |
|---|---|
| Postgres + Supabase generator divergence over time | Supabase generator inherits from Postgres via class extension; share a fixture suite where behaviors overlap. |
New action types (rlsPolicy, etc.) bloat the core graph |
Gate them behind the Supabase variant; standard Postgres users never see them. |
| Connection-string format proliferation | Document one canonical format per variant; provide a sqlanvil credentials check subcommand that validates and reports which warehouse it inferred. |
| BigQuery users expecting feature parity (clustering, GA partition pruning) on Postgres | Compilation warnings + docs page mapping BQ concepts → Postgres equivalents, with explicit "no equivalent" markers. |
The full dataform → sqlanvil rename happens before any public-facing artifact ships. Trademark risk from Google is the driver. Scope:
- Proto package names:
dataform.proto.*→sqlanvil.proto.*. Touches every.protofile'spackageline and every TS import of generated types. - npm packages:
@dataform/core,@dataform/cli, etc. →@sqlanvil/core,@sqlanvil/cli. Republish under new scope; old@dataform/*namespace was never Ivan's anyway. - CLI binary:
dataform→sqlanvil. Updatecli/BUILD,scripts/run, install docs. - Config files:
dataform.json→sqlanvil.json.workflow_settings.yamlkeys retained (already neutral). - Internal class names:
IDataformConfig→ISqlanvilConfig, etc. - Docs, error messages, telemetry user-agent strings.
Recommended sequencing: rename first, in a single sweep PR on the restore-postgres-adapter branch, then layer the adapter work on top. Rationale: a partial rename is worse than either state — grep ambiguity, broken imports, mixed branding. Get it done in one painful day.
sqlanvil-com/index.html should also gain a one-line legal notice acknowledging the fork's origin per Apache 2.0 license terms (Dataform OSS is Apache-2.0; attribution is required, derivative naming is not — but credit upstream cleanly).
Decision (revised — supersedes the original "nested" proposal shown below): FLAT.
workflow_settings.yaml carries only the non-secret warehouse: string plus defaults; the
connection (with secrets) lives in a gitignored .df-credentials.json (the strict
PostgresConnection shape):
# workflow_settings.yaml
warehouse: postgres # bigquery | postgres | supabase (validated; unknown values rejected)
defaultDataset: analytics # Postgres schema / BigQuery dataset
defaultAssertionDataset: analytics_assertions
sqlanvilCoreVersion: 3.0.59// .df-credentials.json (gitignored)
{ "host": "...", "port": 5432, "database": "...", "user": "...",
"password": "...", "sslMode": "require", "defaultSchema": "public" }Why flat + separate credentials wins (overturning the original nested decision):
- Secret hygiene. The nested form put
host/user/password(even env-interpolated) in the committedworkflow_settings.yaml. Keeping the connection in a gitignored.df-credentials.jsonkeeps secrets out of version control entirely. - Upstream convention. Dataform separates
workflow_settings.yaml(settings) from.df-credentials.json(connection); matching it keeps upstream merges clean. - Already shipped. The CLI parser,
init, integration tests, thepostgres_shopexample, and the agent docs all use the flat form — it's the de-facto contract. - Extensibility is unaffected. A new engine = a new
warehouse:enum value (validated inworkflowSettingsAsProjectConfig) plus a*Connectionproto. No top-level-key invention needed.
The naming-collision concern (flat warehouse: postgres vs a database literally named postgres)
is moot: warehouse is the engine kind in workflow_settings.yaml, while the database/schema live
in .df-credentials.json / defaultDataset — different files and keys.
Original proposal (superseded, kept for context) — connection nested under warehouse:. Shape:
# workflow_settings.yaml — BigQuery variant
warehouse:
kind: bigquery
project: my-bq-project
location: US
defaultDataset: analytics
# workflow_settings.yaml — Postgres variant
warehouse:
kind: postgres
host: db.example.com
port: 5432
database: analytics
user: sqlanvil_writer
password: ${PG_PASSWORD} # env interpolation
ssl: require
defaultSchema: public
# workflow_settings.yaml — Supabase variant
warehouse:
kind: supabase
projectRef: abcdefghijklmnop # from supabase dashboard
serviceRoleKey: ${SUPABASE_SERVICE_ROLE_KEY}
defaultSchema: public
# alternative: direct DB URL bypassing PostgREST
# connectionString: postgresql://postgres:${PASSWORD}@db.${PROJECT_REF}.supabase.co:5432/postgresOriginal rationale for nested (retained for context; see the FLAT decision above for why it was overturned — chiefly secret hygiene): extensibility via a kind discriminator, field grouping, and per-variant validation.
Proto representation (protos/configs.proto) — these *Connection messages define the
.df-credentials.json shape (validated via verifyObjectMatchesProto(PostgresConnection, ...)).
The WarehouseConfig oneof exists in the proto but workflow_settings.yaml uses the flat
warehouse: string, not this union:
message WarehouseConfig {
oneof connection {
BigQueryConnection bigquery = 1;
PostgresConnection postgres = 2;
SupabaseConnection supabase = 3;
}
}
message BigQueryConnection { string project = 1; string location = 2; string default_dataset = 3; }
message PostgresConnection { string host = 1; uint32 port = 2; string database = 3; string user = 4; string password = 5; string ssl_mode = 6; string default_schema = 7; }
message SupabaseConnection { string project_ref = 1; string service_role_key = 2; string default_schema = 3; string connection_string = 4; /* optional override */ }YAML parser uses the kind: tag to discriminate before unmarshalling into the appropriate oneof arm.
Acuantia (~/projects/acuantia-gcp-dataform/) and other BQ-only Dataform projects stay on Google Cloud / upstream Dataform. They are not migration targets. sqlanvil's audience is new personal/OSS projects, especially Supabase-backed ones like listanvil (in this monorepo at ../listanvil/).
This means: no need for dataform-compat translation layer, no need to accept dataform.json in addition to sqlanvil.json, no need to support @dataform/... action config blocks under the rename. Clean break.
Three sequential PRs on top of restore-postgres-adapter:
rename/dataform-to-sqlanvil— pure rename, no behavior change. Mechanical. Reviewable as a diff against upstream.adapter/postgres-first-class— Phases 1-4 of section 6. New proto messages, Postgres SQL generator,PostgresDbAdapter, CLI wiring. No Supabase code yet.adapter/supabase-variant— Phase 5-6.SupabaseDbAdapter, new action types (rlsPolicy,realtimePublication,wrapper,vectorIndex), Supabase integration fixture.
Each PR self-contained, mergeable independently. PR 1 unblocks any future public artifact; PRs 2-3 unblock listanvil-style projects using sqlanvil.