refactor(db): composite PK on M2M association tables (sc-105349)#39859
mikebridge wants to merge 8 commits into apache:master
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #39859 +/- ##
=======================================
Coverage 63.87% 63.87%
=======================================
Files 2582 2582
Lines 136413 136413
Branches 31453 31453
=======================================
Hits 87136 87136
Misses 47764 47764
Partials 1513 1513
    f"Drop or migrate the referencing FK before applying this "
    f"migration."

This will likely cause lint errors (f-string without any interpolation).
| """ | ||
| # Identifiers come from the AFFECTED_TABLES whitelist, not user input. | ||
| sql = sa.text( | ||
| f"DELETE FROM {t.name} WHERE {t.fk1} IS NULL OR {t.fk2} IS NULL" # noqa: S608 |
There was a problem hiding this comment.
The combination of sa.text() and f" makes me nervous. Let's build a proper query here using SQLAlchemy core/ORM instead of building the SQL with string interpolation.
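An illustrative sketch (not the actual migration code) of what this suggestion could look like: build the NULL-FK DELETE with SQLAlchemy core so identifiers are quoted by the dialect instead of interpolated into a string. The helper and column names here are assumptions for the example.

```python
# Sketch: NULL-FK cleanup DELETE via SQLAlchemy core instead of sa.text(f"...").
import sqlalchemy as sa


def _table_clause(name, fk1, fk2):
    # Lightweight clause object: no reflection needed, just the columns
    # the pre-flight queries reference.
    return sa.table(name, sa.column("id"), sa.column(fk1), sa.column(fk2))


def null_fk_delete(t, fk1, fk2):
    # DELETE FROM <t> WHERE <fk1> IS NULL OR <fk2> IS NULL
    return sa.delete(t).where(
        sa.or_(t.c[fk1].is_(None), t.c[fk2].is_(None))
    )


t = _table_clause("dashboard_slices", "dashboard_id", "slice_id")
stmt = null_fk_delete(t, "dashboard_id", "slice_id")
print(stmt)
```

Because the identifiers come from column objects rather than string interpolation, no `# noqa: S608` suppression is needed.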
    sample_sql = sa.text(
        f"SELECT {t.fk1}, {t.fk2}, id FROM {t.name} WHERE id NOT IN ("  # noqa: S608
        f" SELECT keep_id FROM ("
        f" SELECT MIN(id) AS keep_id FROM {t.name} "
        f"GROUP BY {t.fk1}, {t.fk2}"
        f" ) AS s"
        f") LIMIT 10"
    )
    sample = list(conn.execute(sample_sql))
    sql = sa.text(
        f"DELETE FROM {t.name} WHERE id NOT IN ("  # noqa: S608
        f" SELECT keep_id FROM ("
        f" SELECT MIN(id) AS keep_id FROM {t.name} "
        f"GROUP BY {t.fk1}, {t.fk2}"
        f" ) AS s"
        f")"
    )

Same here, let's use a programmatic query instead of building the string.
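A sketch of the dedupe query in SQLAlchemy core (illustrative names, not the actual migration code). Wrapping the inner aggregate in a subquery is also what sidesteps MySQL ERROR 1093 (you cannot target the table being deleted from in the same statement's subquery without a derived table).

```python
# Sketch: keep MIN(id) per (fk1, fk2) group, delete the rest.
import sqlalchemy as sa

t = sa.table(
    "dashboard_slices",
    sa.column("id"),
    sa.column("dashboard_id"),
    sa.column("slice_id"),
)

# SELECT MIN(id) AS keep_id ... GROUP BY fk1, fk2 — as a derived table.
keep = (
    sa.select(sa.func.min(t.c.id).label("keep_id"))
    .group_by(t.c.dashboard_id, t.c.slice_id)
    .subquery("keep_min")
)

# Discard every row whose id is not the kept MIN(id) of its group.
delete_stmt = sa.delete(t).where(
    t.c.id.notin_(sa.select(keep.c.keep_id))
)
print(delete_stmt)
```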
| f"SELECT COUNT(*) FROM (" # noqa: S608 | ||
| f" SELECT 1 FROM {t.name} GROUP BY {t.fk1}, {t.fk2} HAVING COUNT(*) > 1" | ||
| f") AS s" |
There was a problem hiding this comment.
Same here. Also, not that it's usually much more legible to simply use triple quotes if we wanted to build the query as a string:
sql = f"""
SELECT COUNT(*) FROM (
SELECT 1 FROM {t.name}
GROUP BY {t.fk1}, {t.fk2}
HAVING COUNT(*) > 1
) AS s
"""There was a problem hiding this comment.
These are updated---I am also running the down migrations locally against sqlite, postgres and mysql to make sure they do something sensible.
Address Beto's review comments on apache#39859: replace ``sa.text(f"...")`` SQL construction in the three pre-flight helpers (``_delete_null_fk_rows``, ``_dedupe_by_min_id``, ``_assert_no_duplicates``) with SQLAlchemy core constructs (``sa.delete``, ``sa.select``, ``sa.func``, ``.subquery()``, ``.notin_()``). A small ``_table_clause()`` helper builds a lightweight ``TableClause`` exposing the columns the queries reference; the three helpers consume it. Removes all ``# noqa: S608`` comments — they are no longer needed because there is no string-interpolated SQL. Verified the compiled SQL is identical on Postgres, MySQL, and SQLite, including the MySQL ERROR 1093 workaround (the inner aggregation is wrapped in a derived table via ``.subquery()``, producing ``... NOT IN (SELECT keep_id FROM (SELECT min(id) ...) AS keep_min)``). Also drops the redundant ``f`` prefix on the two non-interpolating lines of the ``_check_no_external_fks_to_id`` error message. 44 migration tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from dd4d48d to 0e5380d.
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration.
Replace synthetic id INTEGER PRIMARY KEY with composite PRIMARY KEY (fk1, fk2) on the eight pure-junction tables: dashboard_roles, dashboard_slices, dashboard_user, report_schedule_user, rls_filter_roles, rls_filter_tables, slice_user, sqlatable_user. The redundant UNIQUE(fk1, fk2) on dashboard_slices and report_schedule_user is dropped (subsumed by the new PK). Migration handles dialect quirks: copy_from for tables with pre-existing UNIQUE (so SQLite's anonymous-constraint reflection doesn't matter), wrapped-subquery dedupe for MySQL (ERROR 1093), sa.Identity(always=False) on downgrade to backfill the restored id column without NOT NULL violations, and distinct PK names per direction (pk_<table> on upgrade, <table>_pkey on downgrade) to avoid round-trip index-name collisions on Postgres. ORM Table() definitions updated to match. UPDATING.md entry added with operator runbook (BI-tool impact, pre-flight inventory queries, dedupe-row-loss notice, pg_dump workaround, FK-NOT-NULL downgrade asymmetry note). Tests: 8 schema-shape assertions (post-upgrade), 8 duplicate-rejection unit tests, 8 distinct-pair sanity tests, 1 round-trip + idempotency test (in-memory SQLite via Alembic MigrationContext). Continuum-restore verification against the new shape is out of scope for this PR; it is the responsibility of the versioning epic (sc-103156). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cleanups from PR review: 1. ``dashboard_roles.dashboard_id`` was created nullable in revision e11ccdd12658 but was missing from ``TABLES_WITH_NULLABLE_FKS``. A production database with a stray NULL ``dashboard_id`` row would have failed the PK-add with a cryptic constraint violation. Fix by running the NULL-FK cleanup on every affected table — it is a no-op DELETE on tables whose FK columns are already NOT NULL, and it eliminates the risk of further drift in the hardcoded set. ``dashboard_roles`` is added to the documentation set; the runtime now does not consult it. 2. The unit-test parent-table name for ``rls_filter_roles`` and ``rls_filter_tables`` was ``rls_filter`` (does not exist) instead of the real parent ``row_level_security_filters``. Test passes either way (the in-memory FK is self-consistent), but the parameter is now accurate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five operator-experience improvements from the second review pass:

1. ``TABLES_WITH_NULLABLE_FKS`` is now explicitly documented as an informational set that is not consulted at runtime; the comment explains the previous ``dashboard_roles`` omission was the bug that motivated the always-run cleanup.
2. ``_delete_null_fk_rows`` docstring updated to match the "always run" semantics (was still claiming "called only on tables in TABLES_WITH_NULLABLE_FKS").
3. ``_check_no_external_fks_to_id`` now documents its scope limitation: ``Inspector.get_table_names()`` returns the default schema only, so cross-schema FKs in non-standard multi-schema PostgreSQL deployments would not be caught. The single-schema case (Superset's documented deployment) is fully covered.
4. ``_dedupe_by_min_id`` now logs a sample of up to 10 discarded ``(fk1, fk2, id)`` tuples at WARN before deletion, so operators can audit which rows the ``MIN(id)`` policy drops. The keep-original policy is correct in practice but discards later re-grants on ownership tables; the sample makes that visible.
5. ``UPDATING.md`` documents the upgrade/downgrade primary-key name divergence (``pk_<table>`` vs ``<table>_pkey``) so operators using schema-comparison tools don't mistake it for migration drift.

No schema or runtime-behaviour changes. All 44 migration tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Beto's review comments on apache#39859: replace ``sa.text(f"...")`` SQL construction in the three pre-flight helpers (``_delete_null_fk_rows``, ``_dedupe_by_min_id``, ``_assert_no_duplicates``) with SQLAlchemy core constructs (``sa.delete``, ``sa.select``, ``sa.func``, ``.subquery()``, ``.notin_()``). A small ``_table_clause()`` helper builds a lightweight ``TableClause`` exposing the columns the queries reference; the three helpers consume it. Removes all ``# noqa: S608`` comments — they are no longer needed because there is no string-interpolated SQL. Verified the compiled SQL is identical on Postgres, MySQL, and SQLite, including the MySQL ERROR 1093 workaround (the inner aggregation is wrapped in a derived table via ``.subquery()``, producing ``... NOT IN (SELECT keep_id FROM (SELECT min(id) ...) AS keep_min)``). Also drops the redundant ``f`` prefix on the two non-interpolating lines of the ``_check_no_external_fks_to_id`` error message. 44 migration tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI test-mysql failed with: MySQLdb.OperationalError: (1826, "Duplicate foreign key constraint name 'fk_dashboard_slices_slice_id_slices'") Root cause: MySQL scopes foreign-key constraint names per-database, not per-table (PostgreSQL and SQLite scope per-table). The ``batch_alter_table(... recreate="always", copy_from=...)`` path used for ``dashboard_slices`` and ``report_schedule_user`` builds ``_alembic_tmp_<table>`` carrying the original FK names from ``copy_from`` while the original table still holds those names — MySQL rejects the temp-table creation with ERROR 1826. Fix: on MySQL only, drop the original FK constraints by name before the ``batch_alter_table`` runs. The ``copy_from`` re-creates them on the rebuilt table with their original names, so the post-migration shape is unchanged. On PostgreSQL and SQLite the original code path still runs unchanged. Local SQLite tests (44 passed, 1 skipped) still pass; CI will validate on MySQL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two MySQL-only failures in the downgrade path, found by running the full migration history against a fresh MySQL 8 container: 1. ``MySQLdb.OperationalError: (1553, "Cannot drop index 'PRIMARY': needed in a foreign key constraint")``. InnoDB uses the composite PK index to back the FK on the leftmost column. The downgrade tried to drop the composite PK before dropping the FKs, orphaning the FK's backing index. PostgreSQL and SQLite create separate indexes for FK columns and don't trip on this. 2. ``Field 'id' doesn't have a default value`` on subsequent INSERT. ``sa.Identity(always=False)`` only emits ``AUTO_INCREMENT`` on MySQL when the column is created with ``primary_key=True`` — our portable path adds the column first then creates the PK separately, so MySQL leaves the column without auto-generation. Existing rows would all collide on id=0; future inserts fail because no default. Postgres' ``GENERATED BY DEFAULT AS IDENTITY`` and SQLite's ``INTEGER PRIMARY KEY`` rowid alias don't have this gap. Fix: extract ``_downgrade_mysql_table()`` that emits the canonical MySQL idiom — drop FKs, then a single ALTER combining ``DROP PRIMARY KEY, ADD COLUMN id INT NOT NULL AUTO_INCREMENT, ADD PRIMARY KEY (id)`` (which backfills existing rows with sequential ids and preserves AUTO_INCREMENT), restore the redundant UNIQUE on the 2 tables that originally had it, and re-add the FKs with their original names. Postgres and SQLite keep the existing portable ``batch_alter_table`` path. Raw SQL is unavoidable for the combined-ALTER form; per the constitution it's allowed for dialect-specific DDL with no SQLA equivalent, with triple-quoted strings for legibility. Verified end-to-end: upgrade → downgrade → upgrade against a fresh MySQL 8 container with INSERT-without-id sanity check showing the restored ``id`` column auto-increments correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
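The shape of the MySQL-only downgrade DDL described above can be sketched as follows. The FK constraint name for the slices side appears earlier in this thread; the dashboards-side name and the exact column type are assumptions for illustration, and the real migration derives both per table.

```python
# Sketch of the canonical MySQL downgrade idiom: drop the FKs that lean on
# the composite PK index first, then one combined ALTER that swaps the PK
# and backfills sequential ids via AUTO_INCREMENT.
DROP_FKS = """
ALTER TABLE dashboard_slices
    DROP FOREIGN KEY fk_dashboard_slices_dashboard_id_dashboards,
    DROP FOREIGN KEY fk_dashboard_slices_slice_id_slices
"""

RESTORE_ID_PK = """
ALTER TABLE dashboard_slices
    DROP PRIMARY KEY,
    ADD COLUMN id INT NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY (id)
"""

print(RESTORE_ID_PK.strip())
```

Combining the three clauses in one ALTER is what lets MySQL assign sequential ids to existing rows instead of leaving them all at 0.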
Found by running fresh-install + round-trip against a real SQLite DB: 6 of the 8 affected tables had FK columns that were originally declared nullable. PostgreSQL and MySQL implicitly promote the constituent columns of an ``ALTER TABLE ... ADD PRIMARY KEY`` to ``NOT NULL``; SQLite does not (it's a long-standing SQLite quirk — only ``INTEGER PRIMARY KEY`` enforces NOT NULL on a composite-PK column). Result: a fresh SQLite install would accept ``INSERT INTO dashboard_slices (NULL, 5)`` despite both columns being part of the composite PK. Our integration tests previously masked this: the test fixture seeds columns with ``nullable=False``, so the post-upgrade NOT NULL assertion passed regardless of whether the migration enforced it. Fix: add explicit ``batch_op.alter_column(fk, nullable=False)`` for both FK columns inside the per-table batch_alter_table block. On PostgreSQL and MySQL this is a no-op (PK already implies NOT NULL); on SQLite it adds the missing NOT NULL declaration so a fresh install matches the data-model.md "After" contract. Verified end-to-end: - Postgres + MySQL: column shape unchanged (still NOT NULL) - SQLite fresh install + round-trip: all 8 tables now have NOT NULL on FK columns, ``INSERT (NULL, 5)`` correctly rejected with IntegrityError on dashboard_slices, dashboard_user, sqlatable_user Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
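The SQLite quirk described above is easy to demonstrate with the standard library alone: composite PRIMARY KEY columns accept NULL unless the column is also declared NOT NULL.

```python
# Minimal demonstration of SQLite's composite-PK NULL quirk.
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite PK without explicit NOT NULL: SQLite accepts a NULL key part.
conn.execute("CREATE TABLE loose (a INTEGER, b INTEGER, PRIMARY KEY (a, b))")
conn.execute("INSERT INTO loose VALUES (NULL, 5)")  # silently accepted

# With explicit NOT NULL (what the migration adds), the insert is rejected.
conn.execute(
    "CREATE TABLE strict (a INTEGER NOT NULL, b INTEGER NOT NULL, "
    "PRIMARY KEY (a, b))"
)
try:
    conn.execute("INSERT INTO strict VALUES (NULL, 5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

loose_rows = conn.execute("SELECT COUNT(*) FROM loose").fetchone()[0]
print(loose_rows, rejected)
```

This is why the explicit `batch_op.alter_column(fk, nullable=False)` is needed on SQLite and is a no-op elsewhere.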
Force-pushed from 3870647 to 7ae35b4.
CI cypress + playwright shards were red with: ERROR [flask_migrate] Error: Multiple head revisions are present for given argument 'head' The recent rebase onto master pulled in ``33d7e0e21daa_add_semantic_layers_and_views.py`` (from PR apache#37815, "semantic layer extension"), which had been authored against ``ce6bd21901ab`` as its parent — the same parent our migration referenced. After the rebase both migrations point at ``ce6bd21901ab``, producing two heads and breaking ``flask db upgrade head`` for any downstream consumer (CI's Cypress / Playwright shards spin up a real Superset instance via ``superset db upgrade``, which is why those shards failed first; the integration shards run against a precomputed schema and didn't surface this). Fix: chain our migration after the semantic-layer migration by pointing ``down_revision`` at ``33d7e0e21daa``. The chain is now linear: ... → ce6bd21901ab → 33d7e0e21daa (semantic layers) → 2bee73611e32 (composite PK, this PR) Verified with ``superset db heads`` (returns single head ``2bee73611e32``) and the local migration test suite (44 passed, 1 skipped). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SUMMARY
*** This is a PR to aid in discussion about whether we should replace `id` fields in the m2m join tables ***

Replace synthetic `id INTEGER PRIMARY KEY` with composite `PRIMARY KEY (fk1, fk2)` on the eight pure-junction tables. The redundant `UNIQUE(fk1, fk2)` on the two tables that previously carried one is dropped (subsumed by the new PK). All eight tables are M:N association tables — there is no semantic load on the surrogate `id`.

Affected tables and their composite PK pair:

- `dashboard_roles` (dashboard_id, role_id)
- `dashboard_slices` (dashboard_id, slice_id)
- `dashboard_user` (user_id, dashboard_id)
- `report_schedule_user` (user_id, report_schedule_id)
- `rls_filter_roles` (role_id, rls_filter_id)
- `rls_filter_tables` (table_id, rls_filter_id)
- `slice_user` (user_id, slice_id)
- `sqlatable_user` (user_id, table_id)

Why now: 6 of the 8 tables had no UNIQUE constraint at all, so duplicate `(fk1, fk2)` rows could (and did) accumulate in production data. The migration deduplicates by `MIN(id)` before the PK promotion. This change also lifts a precondition for the entity-versioning epic (sc-103156) — SQLAlchemy-Continuum's M:N restore replays the version table; surrogate-PK junctions interact poorly with the bulk reassign-and-replace pattern (Continuum issue #129). Continuum verification against the new shape is not part of this PR — it is the responsibility of the versioning epic.

BEFORE/AFTER
`dashboard_slices` is the canonical example (the other seven follow the same pattern, with `dashboard_user`, `dashboard_roles`, `slice_user`, `sqlatable_user`, and `rls_filter_*` having no pre-existing UNIQUE):

Before:

After:
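The original Before/After DDL blocks did not survive extraction; a sketch of both shapes, reconstructed from the summary (SQLite-flavoured DDL, column types and FK targets are assumptions), with a quick check that the "after" shape rejects a duplicate pair:

```python
# Reconstructed sketch of the dashboard_slices shape change.
import sqlite3

BEFORE = """
CREATE TABLE dashboard_slices (
    id INTEGER PRIMARY KEY,
    dashboard_id INTEGER REFERENCES dashboards (id),
    slice_id INTEGER REFERENCES slices (id),
    UNIQUE (dashboard_id, slice_id)
)
"""

AFTER = """
CREATE TABLE dashboard_slices (
    dashboard_id INTEGER NOT NULL REFERENCES dashboards (id),
    slice_id INTEGER NOT NULL REFERENCES slices (id),
    PRIMARY KEY (dashboard_id, slice_id)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(AFTER)
conn.execute("INSERT INTO dashboard_slices VALUES (1, 5)")
try:
    conn.execute("INSERT INTO dashboard_slices VALUES (1, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```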
The redundant `UNIQUE (dashboard_id, slice_id)` is dropped (subsumed by the PK).

TESTING INSTRUCTIONS
Reproduces locally per quickstart.md steps 4–8:

1. `superset db upgrade`. Two tables with pre-existing UNIQUE (`dashboard_slices`, `report_schedule_user`) are batch-recreated via `copy_from`; the other six lose `id` and gain a composite PK directly. Migration logs the duplicate-row count and NULL-FK row count for each affected table.
2. `INSERT` a `(fk1, fk2)` pair, then `INSERT` the same pair a second time. The second insert must raise `IntegrityError` (PK violation). Same pair under different IDs is no longer possible.
3. `pytest tests/unit_tests/migrations/composite_pk_association_tables_test.py` (16 parametrized assertions: 8 duplicate-rejection, 8 distinct-pair).
4. `pytest tests/integration_tests/migrations/composite_pk_association_tables__tests.py tests/integration_tests/migrations/composite_pk_round_trip__tests.py` (28 schema-shape assertions plus a round-trip + idempotency test against in-memory SQLite via `MigrationContext`).
5. `superset db downgrade <prior-revision>`. The `id` column is restored (backfilled via `sa.Identity(always=False)` on Postgres/MySQL), and the original `UNIQUE(fk1, fk2)` is re-added on the two tables that originally had it. Intentional asymmetry: FK columns remain `NOT NULL` after downgrade — under SQLAlchemy `secondary=` semantics, NULL-FK junction rows are meaningless, so we don't restore the original nullable state.
6. `superset db upgrade` after downgrade returns the post-upgrade shape.

Cross-DB matrix (SC-005)
Verified end-to-end on all three backends locally — fresh full-history install (`superset db upgrade` from scratch) plus round-trip (upgrade → downgrade → upgrade) — and CI re-runs the migration during integration-test setup.

- Postgres: CI `test-postgres` (current/next/previous), `test-postgres-hive`, `test-postgres-presto`
- MySQL: `INSERT`-without-`id` sanity check; CI `test-mysql`
- SQLite: `INSERT (NULL, 5)`-rejection sanity check; CI `test-sqlite`; in-memory unit/integration tests (44 passed, 1 skip)

Bugs found and fixed during local cross-backend verification
The dedicated unit and integration tests run against in-memory SQLite, and CI's per-backend integration shards exercise the migration only as part of the suite's setup phase (which catches crashes but not subtle behavioural disparities). Doing fresh-install + round-trip locally on each real backend surfaced four bugs that were masked by both layers:

1. MySQL ERROR 1826 (`Duplicate foreign key constraint name`) during `recreate="always"` — InnoDB scopes FK constraint names per-database, not per-table. The temp table created by the recreate path collides with the original. CI's setup phase did catch this (the `test-mysql` shard turned red on first push); fix: drop FKs by name before the recreate on MySQL.
2. MySQL ERROR 1553 (`Cannot drop index 'PRIMARY': needed in a foreign key constraint`) on downgrade — InnoDB uses the composite PK index to back the FK on the leftmost column. Dropping the PK without first dropping the FKs orphans the index. Only manifests on a real round-trip — CI's setup-only check would never see it; fix: drop FKs before the PK swap on MySQL, re-add them after.
3. Missing `AUTO_INCREMENT` on the restored `id` column on MySQL — `sa.Identity(always=False)` only emits `AUTO_INCREMENT` when the column has `primary_key=True` at create time, but our portable path adds the column then creates the PK separately. Existing rows would all collide on `id=0`; subsequent `INSERT`s would fail with `Field 'id' doesn't have a default value`. Found by an explicit `INSERT`-without-`id` test against the post-downgrade DB; fix: combined `DROP PRIMARY KEY, ADD COLUMN AUTO_INCREMENT, ADD PRIMARY KEY` ALTER on MySQL.
4. SQLite does not enforce `NOT NULL` on constituent columns of a composite PK (long-standing SQLite quirk — only `INTEGER PRIMARY KEY` does). PostgreSQL and MySQL implicitly promote PK columns to `NOT NULL`; SQLite leaves them nullable for compound keys. A fresh SQLite install would have accepted `INSERT (NULL, 5)` despite both columns being part of the PK. The integration tests masked this because the test fixture seeds columns with `nullable=False`. Fix: explicit `batch_op.alter_column(fk, nullable=False)` for both FK columns — no-op on Postgres/MySQL where the PK already implies it.

The MySQL-specific fixes use raw triple-quoted SQL (no SQLA-core equivalent for the dialect-specific combined ALTER); the constitution allows raw SQL for dialect-specific DDL with no programmatic equivalent.
Operator runbook
Operators should run the pre-flight inventory queries from UPDATING.md ("Composite primary keys on many-to-many association tables") before applying — they show how many duplicate `(fk1, fk2)` rows and how many NULL-FK rows exist in their database. The migration deletes both classes (keeping `MIN(id)` per group for duplicates). The full operator-facing runbook is in UPDATING.md and includes:

- BI-tool impact for anything referencing the dropped `id` column
- the pre-flight inventory queries
- the dedupe row-loss notice
- a `pg_dump` workaround against the new schema
- the FK-`NOT NULL` downgrade asymmetry note

ADDITIONAL INFORMATION
The `copy_from` recreate path is an O(n) table-rewrite for `dashboard_slices` and `report_schedule_user`; the other six are direct ALTERs. Expected to complete in seconds to tens of seconds.

Continuum verification deferral
Verification of SQLAlchemy-Continuum's M:N restore behaviour against the new schema is the responsibility of the versioning epic (sc-103156) and is not part of this PR. This PR delivers the schema precondition; the versioning work picks up `sqlalchemy-continuum` as a dependency and verifies the restore behaviour there.