Post-V1 (BigQuery + Postgres/Supabase first-class) feature backlog. Captured as a checklist; each item has notes so it can be picked up later without re-deriving context.
- MySQL / MariaDB (committed — do first).
- Research the next tier, splitting operational (OLTP) vs warehouse (OLAP/analytical):
- Operational: SQLite (great for local/dev + tests), SQL Server (T-SQL), Oracle, CockroachDB (Postgres wire-compatible — may be cheap), DuckDB (embedded analytical, also a file-query engine — overlaps item 2).
- Warehouse: Snowflake, Amazon Redshift, ClickHouse, Databricks / Spark SQL, Trino/Presto, Amazon Athena, Microsoft Fabric.
- What each adapter costs (the pattern is established by
postgres/supabase): acli/api/dbadapters/<db>.ts+<db>_execution_sql.ts, a<Db>Connectionproto +credentials.tsvalidation, a<Db>Optionsaction config block inprotos/configs.proto, dialect-specific DDL/DML in the execution SQL generator, a CLI branch onwarehouse, and a docker integration fixture +tests/integration/<db>.spec.ts. - Decision to make: how much warehouse-agnostic SQL generation can be shared vs. per-dialect.
MySQL/MariaDB differ from Postgres on: identifier quoting (backticks vs
"), upsert (INSERT ... ON DUPLICATE KEY UPDATEvsON CONFLICT), no materialized views (emulate?), noCREATE INDEX ... INCLUDE, different partitioning syntax, no transactional DDL.
A project/run stays single-warehouse — multiple live warehouse connections in one project is explicitly not wanted (adapters stay isolated). Everything internally happens on one DB; this item is about the edges: getting data in (sources) and pushing results out (sinks), whether the other end is a file / object store or another database. The two transports share a type-translation layer.
- Model a file as a first-class source (a
declaration/source variant or a new action type that resolves to a queryable relation) and the reverse for export (sink). - Why it's uneven: BigQuery has it built in — external/federated tables,
LOAD DATA, wildcardgs://…URIs for read, andEXPORT DATA OPTIONS(uri = 'gs://…', format = 'JSON'|'CSV'|'PARQUET') AS SELECT …for write — all in a plain operation (working example: the acuantia project'sdefinitions/operations/gemstone_ai/op_export_gemstone_product_jsonl.sqlx). Postgres has no nativegs:///file I/O out of the box — options:file_fdw(server-side CSV),COPY … FROM/TO(incl.… PROGRAMto stream to/from object storage),parquet_fdw/pg_parquet, a Supabase Wrapper/FDW, or DuckDB (httpfs) as an embedded bridge for Parquet/CSV/JSON on S3/GCS/Azure. - V2 goal: a warehouse-agnostic
{ format, location }surface (gs:///s3:///local URIs) each adapter implements its own way — specifically a Postgres/Supabase equivalent of BigQuery's native object-storage read and write.
- An import/export action (or
declaration/sink variant) that reads from / writes to a secondary database — BigQuery ↔ Postgres ↔ Supabase. - Today (no dedicated feature): warehouse-native federation — BigQuery federated queries reading
Postgres/Supabase, or Supabase Wrappers / Postgres FDW reading BigQuery (see
hybrid_warehouses_supabase_bigquery.md, Patterns A/C) — plus sequential separate runs for multi-platform outputs (Pattern B). Works, but manual and warehouse-specific. Named connections (shipped in core 1.2.0) already cover the read direction for FDW-reachable sources; this item is the first-class, type-aware generalization (incl. the write/export direction). - Secondary-platform credentials live alongside the primary
.df-credentials.json(import/export only — not a second execution engine).
- Type translation (both transports): BigQuery
STRUCT/ARRAY↔ Postgresjsonb/arrays,NUMERIC/BIGNUMERIC↔numeric,TIMESTAMPsemantics, Supabasevector, etc. - Out of scope — consumer file-sync (Dropbox, Google Drive, …): not queryable object stores; supporting them is a separate fetch connector (API auth → download → load), not in-place access.
- Decisions: model the edges as new action types vs. an external
sqlanvil import/exportCLI; one warehouse-agnostic{ format, location }/connection surface vs. per-warehouse blocks; how far to push automatic type translation vs. requiring explicit casts.
- A migration tool that converts an existing Dataform project to SQLAnvil.
- Scope (mirrors the deltas in the main repo's
AGENTS.md/ thesqlanvil-engineering-fundamentalsskill — reuse that delta list as the conversion spec):dataform.json→workflow_settings.yaml(dropdefaultProject/defaultLocationfor non-BigQuery;dataformCoreVersion→sqlanvilCoreVersion).@dataform/core→@sqlanvil/core;.df-credentials.jsonshape mapping per target warehouse.- When targeting Postgres: translate
bigquery: { partitionBy, clusterBy }→postgres: { partition, indexes }where possible; flag what can't auto-translate (e.g. BQ-specificOPTIONS,bigqueryPolicyTags). - Leave BigQuery-targeted projects mostly intact (just the rename/core swap).
- Decision to make: lossy auto-translate with warnings vs. a report-only "what needs manual attention" mode (probably offer both).
- Replace/augment
sqlanvil initwith an interactive question/answer flow. - Flow: pick warehouse (bigquery/postgres/supabase/…) → collect connection params → generate a
valid
workflow_settings.yaml, a.df-credentials.json(strict JSON — no comment keys; the parser rejects them), a starterdefinitions/(a declaration/source + one model), and a.gitignorethat excludes credentials. - Lessons to bake in (learned building
examples/postgres_shop): setsqlanvilCoreVersioncorrectly (notdataformCoreVersion); for Postgres use the flatPostgresConnectionfield names (host/port/database/user/password/sslMode/defaultSchema); offer to test the connection (the CLI already has a credentials test path). - Ties into design doc Phase 4 ("
sqlanvil inittemplates for both new warehouses").
- Add a first-class
--environment <name>concept for dev/staging/prod isolation. - What already exists (the primitive, don't rebuild it):
--schema-suffix <name>(anddatasetSuffix:inworkflow_settings.yaml) already produce dynamic datasets. The suffix is applied in core at compile time —session.tsappends_<suffix>to every action'starget.schemabefore any adapter runs — so it already works for Postgres/Supabase (the adapter just readstarget.schema).--varsoverrides flow the same way, viaconstructProjectConfigOverride→projectConfigOverride(cli/index.ts). What's missing is the ergonomic, named layer on top. - The new feature:
--environmentis the name that actually makes sense (more intuitive than--schema-suffix). Two possible scopes:- A — thin alias:
--environment <name>simply setsschemaSuffix = <name>. A few lines in the CLI option layer; delivers exactly "append the environment to the schema". Good stopgap. - B — real environments (preferred): an
environments.json(orenvironments:block inworkflow_settings.yaml) defining named environments, each carrying its ownschemaSuffix,vars, and — the real reason environments exist — its own credentials / database (prod vs staging usually differ by more than a schema suffix); optionally a git ref.--environment stagingloads that env and applies its overrides (config override + credential selection).
- A — thin alias:
- Implementation pointers: CLI flag → merge env overrides into the existing
projectConfigOverrideplumbing; per-env credentials means selecting the credentials file/profile by environment (today--credentialsis a single path). Keep--schema-suffixas the documented low-level primitive (and possibly make--environmentan alias that also sets it). - Decision to make: ship A first as a stopgap, or go straight to B. Where do environments live —
separate
environments.json(Dataform's legacy location) or a block inworkflow_settings.yaml(more consistent with the rest of sqlanvil config)?