This guide outlines a sandboxed process for pulling, merging, and resolving conflicts when syncing changes from Google's upstream `dataform-co/dataform` releases (such as 3.0.58) into the SQLAnvil codebase.
Last synced: upstream 3.0.59 → main on 2026-06-03 (merge commit dfeb1e5d). The local dataform branch mirrors the upstream release tag; each sync advances main's merge-base with upstream, so the next merge only sees new commits.
The #1 lesson: the Bazel build, not grep, is the source of truth for rename leftovers. Git auto-merges most upstream changes cleanly, but those auto-merged regions carry upstream `dataform.` / `df/` / `@dataform/` / `__dataform_*` tokens that produce no conflict markers yet still break the renamed fork. `tsc` (via Bazel) flags every one. Grep is unreliable here: macOS BSD grep silently no-ops on `\bword\b` and on `-r` given an explicit file list. Finish every sync by building, fixing what `tsc` reports, and rebuilding. Never trust a clean grep alone.
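The build-fix-rebuild loop is easier to run if you can list the files `tsc` complained about. The minimal sketch below pulls them out of a saved Bazel log. Two assumptions apply. First, `tsc` uses its default non-pretty diagnostic format, `path/file.ts(line,col): error TSnnnn: message`. If your Bazel rules print the pretty `file.ts:line:col - error` form, adjust the pattern. Second, the function name `tsc_error_files` and the log path are hypothetical. The sketch uses only `grep -E` on a single file and avoids `\b` and `-r`, so it sidesteps the BSD grep pitfalls.

```bash
# Hypothetical helper: list the unique files tsc reported errors in, from a saved build log.
# Assumes tsc's default (non-pretty) format:  path/file.ts(12,5): error TS2304: ...
tsc_error_files() {
  local log="$1"
  # Keep only tsc error lines, cut everything from "(line,col): error" onward, de-duplicate.
  grep -E '\.tsx?\([0-9]+,[0-9]+\): error TS[0-9]+' "$log" \
    | sed -E 's/\([0-9]+,[0-9]+\): error TS[0-9]+.*$//' \
    | sort -u
}

# Usage: build, capture the log, fix each listed file, and rebuild until the list is empty.
#   ./scripts/docker-bazel build //core/... //cli/... //protos/... \
#     --jobs=2 --local_ram_resources=2048 2>&1 | tee /tmp/sync-build.log
#   tsc_error_files /tmp/sync-build.log
```

An empty list is necessary but not sufficient. The build must still finish green.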
To ensure your primary local main branch remains 100% stable during the merge, always perform the merge inside a temporary sandbox branch before merging back into main.
```
upstream/main (Google)
      │
      ├── (Tagged Release: e.g. 3.0.58)
      ▼
[1. Fetch tag over HTTPS]
      │
      ▼
[2. Create sandbox branch]
    upstream-sync/3.0.58
      │
      ▼
[3. Execute Merge & Resolve Conflicts]
    - Mechanical import renames (df/ → sa/)
    - Proto packages (dataform → sqlanvil)
      │
      ▼
[4. Run Bazel Verification]
    ./scripts/docker-bazel test //...
      │
      ▼
[5. Merge back to main]
```
Ensure your `upstream` remote is configured using HTTPS (to unblock sandbox egress) and fetch all tags:

```bash
# 1. Update upstream URL to HTTPS
git remote set-url upstream https://github.com/dataform-co/dataform.git
# 2. Fetch latest releases & tags
git fetch upstream --tags
```

Check out a fresh sandbox branch from your local stable `main` branch:

```bash
git checkout main
git checkout -b upstream-sync/3.0.58
```

Attempt to merge the targeted release tag (e.g. `3.0.58`) into the sandbox:

```bash
git merge 3.0.58
```

Since the Dataform
Conflict: Upstream adds new protobuf fields inside `package dataform;`, whereas SQLAnvil uses `package sqlanvil;`.
- Resolution:
  - Keep SQLAnvil's package declaration: `package sqlanvil;`.
  - Copy the new fields added by Google (e.g. `string jit_code = ...` inside the `Assertion` message) and insert them using SQLAnvil naming conventions.
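A quick post-merge guard for this rule is to flag any `.proto` file that no longer declares `package sqlanvil;`. The sketch rests on two assumptions: every proto touched by the merge should stay in the `sqlanvil` package, and the function name is hypothetical. It deliberately loops over an explicit file list with one `grep -q` per file rather than relying on `-r` or `\b`, given the BSD grep caveats noted at the top of this guide.

```bash
# Hypothetical helper: print each given .proto file that lacks "package sqlanvil;".
# Returns non-zero if any file is missing the declaration.
check_proto_packages() {
  local status=0 f
  for f in "$@"; do
    if ! grep -q '^package sqlanvil;' "$f"; then
      echo "$f"
      status=1
    fi
  done
  return "$status"
}

# Usage, from the sandbox branch (protos that differ from main, skipping deleted files):
#   check_proto_packages $(git diff --name-only --diff-filter=d main -- '*.proto')
```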
Conflict: Upstream imports use `df/`, e.g.:

```typescript
import { ActionBuilder } from "df/core/actions";
```

SQLAnvil uses `sa/` (and, in this case, a different module path):

```typescript
import { ActionBuilder } from "sa/core/actions/base";
```

- Resolution:
  - Standardize all new/merged imports to use the `sa/` prefix.
Conflict: Upstream TypeScript code references Google's generated proto namespace `dataform.Assertion`, while SQLAnvil uses `sqlanvil.Assertion`.
- Resolution:
  - Globally replace the merged references to use `sqlanvil.` instead of `dataform.`.
Not a git conflict: Upstream code that auto-merges cleanly arrives carrying `dataform.X`, `df/…` imports, `@dataform/*`, or `__dataform_current_file`. This includes new helper bodies, new test cases, and files upstream rewrote wholesale. There are no conflict markers, but it still breaks the fork's compile.
- Resolution:
  - After resolving the visible conflicts, build and rename every token `tsc` reports:
    ```bash
    ./scripts/docker-bazel build //core/... //cli/... //protos/... --jobs=2 --local_ram_resources=2048
    ```
  - When upstream rewrote a whole file's apparatus (e.g. the `cli/vm/compile.ts` caller-file machinery in 3.0.59), don't resolve hunk by hunk. Run `git checkout --theirs <file>`, re-apply the rename, then `git add <file>`. Piecemeal resolution leaves auto-merged code referencing variables that only the upstream side defines (e.g. `coreBundlePath`, `needsCallerFileShim`).
  - Caller-file global: the exposed sandbox global must stay `__sqlanvil_current_file` (read by `core/utils.ts`). Rename upstream's `__dataform_current_file` to it. `__df_enter`/`__df_exit`/`__df_current` are internal helper names and are fine to leave.
  - `dataform.json` clean break: never reintroduce the `hasDataformJson`/`global.dataformJson` handling that upstream adds. SQLAnvil reads only `workflow_settings.yaml`.
  - Extracted helpers: when upstream moves logic into helpers (e.g. `executionSql.createTableTasks/Operation/Assertion`), the rename must follow into the auto-merged helper bodies. Prior SQLAnvil behavior (e.g. disabled-action handling) is usually preserved inside them, so verify it rather than re-adding it.
  - CLI install-path tests: upstream tests that run `npm i @dataform/core@<ver>` become `@sqlanvil/core@<ver>`. That package is unpublished, so these tests compile but fail at runtime. Skip or adapt them; don't let them block the sync.
  - `df_` in generated SQL and test fixtures: watch for non-namespace leftovers like `df_osc_`, `_df_temp_`, and `df_integration_test`, and rename them to `sa_`. Watch for casing artifacts too (`readsqlanvil…` → `readSqlanvil…`).
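The token renames above are mechanical enough to script. The sketch below is hypothetical, both in its function names and in its assumption that these exact substrings never appear legitimately in the fork. It deliberately leaves `__df_enter`/`__df_exit`/`__df_current` alone. It also includes a guard that fails if `hasDataformJson`/`dataformJson` handling reappears. Run both on the merged files, then rebuild so `tsc` can confirm.

```bash
# Hypothetical helpers for the auto-merged leftovers listed above.
# Assumes these exact substrings only ever appear as upstream leftovers in the fork.
rename_leftover_tokens() {
  # With no files, perl -pi would read stdin, so bail out early.
  [ "$#" -gt 0 ] || return 0
  perl -pi -e '
    s/__dataform_current_file/__sqlanvil_current_file/g;  # caller-file global
    s{\@dataform/}{\@sqlanvil/}g;                         # npm scope (@ escaped for perl)
    s/df_osc_/sa_osc_/g;                                  # generated-SQL prefixes become sa_
    s/_df_temp_/_sa_temp_/g;
    s/df_integration_test/sa_integration_test/g;
    s/\breadsqlanvil/readSqlanvil/g;                      # casing artifact
  ' "$@"
  # __df_enter/__df_exit/__df_current are internal helper names and are left as-is.
}

# Guard: prints offending lines and returns non-zero if dataform.json handling came back.
check_no_dataform_json() {
  [ "$#" -gt 0 ] || return 0
  ! grep -nE 'hasDataformJson|dataformJson' "$@"
}

# Usage, from the sandbox branch:
#   files=$(git diff --name-only --diff-filter=d main -- '*.ts')
#   rename_leftover_tokens $files && check_no_dataform_json $files
```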
Once all conflicts are resolved, run the full validation suite inside your development container. Native macOS Bazel is broken (the `wrapped_clang` / dyld `LC_UUID` toolchain error when compiling protobuf C++), so all builds and tests go through `scripts/docker-bazel`. Always pass `--jobs=2 --local_ram_resources=2048`. Under default parallelism, the in-container Bazel JVM gets OOM-killed during webpack bundling (`Socket closed`, error 14).
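These resource flags are required on every invocation, so a thin wrapper can keep them from being forgotten. This is a sketch. The function name `dbz` and the `DOCKER_BAZEL` override variable are hypothetical. The wrapper simply appends the flags after whatever targets and options you pass, which matches the ordering used in the commands of this section. Because of that, don't route a `--` separator through it.

```bash
# Hypothetical wrapper: always run scripts/docker-bazel with the OOM-avoiding flags.
# DOCKER_BAZEL can point elsewhere (e.g. "echo" for a dry run); it defaults to the repo script.
dbz() {
  "${DOCKER_BAZEL:-./scripts/docker-bazel}" "$@" --jobs=2 --local_ram_resources=2048
}

# Usage:
#   dbz build //core/... //cli/... //protos/...
#   dbz test //core/...
#   DOCKER_BAZEL=echo dbz test //core/...   # prints the command instead of running it
```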
```bash
# 1. Build everything affected (THIS is what catches reintroduced dataform tokens)
./scripts/docker-bazel build //core/... //cli/... //protos/... --jobs=2 --local_ram_resources=2048
# 2. Run core compiler tests
./scripts/docker-bazel test //core/... --jobs=2 --local_ram_resources=2048
# 3. Run the Postgres integration tests
PG_HOST=host.docker.internal PG_PORT=5432 ./scripts/docker-bazel test //tests/integration:postgres.spec \
  --test_env=PG_HOST --test_env=PG_PORT --test_env=PG_USER --test_env=PG_PASSWORD --test_env=PG_DATABASE \
  --jobs=2 --local_ram_resources=2048
```

If the build completes and all tests pass:
```bash
# 1. Check out main and merge the verified sync branch
git checkout main
git merge upstream-sync/3.0.58
# 2. Clean up the sandbox branch
git branch -d upstream-sync/3.0.58
# 3. Push updated main to origin (GitHub)
git push origin main
```

Below are bugs we fixed locally that are also present upstream and that we've reported to
dataform-co/dataform. On each sync, check whether upstream adopted the fix — if so, take
their version and drop our local change so the file stops diverging (less future merge friction).
If not, keep ours and re-apply over the merge.
| File | Local fix | Upstream issue | Status |
|---|---|---|---|
| `common/flags/index.ts` | Lenient arg parser: ignore non-flag tokens instead of throwing `Arg neither flag name nor flag value` (which crashed the CLI when a positional followed a flag). Extracted `parseArgvFlags()` + test. | dataform-co/dataform#2198 | open (not yet adopted) |
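On each sync, plain `git log` and `git diff` between release tags answer two questions for a tracked file: did upstream touch it in the new release, and does our copy still diverge from theirs? The sketch below is hypothetical in its function name and argument order. It assumes both release tags have been fetched. Pass the local ref explicitly if the sync has not yet been merged into `main`.

```bash
# Hypothetical helper: report upstream activity and local divergence for one tracked file.
#   $1 = file path, $2 = previously synced tag, $3 = new tag, $4 = local ref (default: main)
upstream_fix_status() {
  local file="$1" prev="$2" new="$3" ours="${4:-main}"
  echo "== upstream commits touching $file in $prev..$new"
  git log --oneline "$prev..$new" -- "$file"
  echo "== diff between upstream $new and $ours"
  if git diff --quiet "$new" "$ours" -- "$file"; then
    echo "identical: no local divergence from upstream $new"
  else
    git diff --stat "$new" "$ours" -- "$file"
  fi
}

# Usage:
#   upstream_fix_status common/flags/index.ts 3.0.58 3.0.59
```

If upstream commits show up but the file still differs, read their change before deciding whether to take theirs or re-apply ours.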