chore(weave): add migration-lock table for ch replicated migrations #6511

Open
gtarpenning wants to merge 10 commits into master from griffin/migration-lock-table

gtarpenning commented Mar 30, 2026

Adds a distributed migration lock so that only one replica runs ClickHouse migrations at a time in multi-replica deployments (1S2R, 2S2R, etc.).

https://coreweave.atlassian.net/browse/WB-32669

  • New migration_lock.py module: insert-then-verify advisory lock backed by a MergeTree table with TTL expiry
  • apply_migrations now acquires the lock before running, releases on exit
  • Lock table creation is engine-aware (plain MergeTree for Replicated DBs, explicit ReplicatedMergeTree with shared ZK path for Atomic DBs)
  • Tested manually on 1S1R, 1S2R, 2S2R topologies; unit + integration tests included
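
The insert-then-verify pattern described above can be sketched without a ClickHouse server. This is only an illustration under stated assumptions: `InMemoryLockTable`, the row layout, the "oldest live claim wins" rule, and the `migration_lock` context manager are hypothetical stand-ins, not the contents of migration_lock.py. Only the name `try_acquire` appears in this PR's commit messages, and its real signature may differ.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class InMemoryLockTable:
    """Hypothetical stand-in for the lock table; rows are only ever appended or deleted."""

    rows: list = field(default_factory=list)  # (holder_id, acquired_at, expires_at)

    def insert(self, holder_id, now, ttl_s):
        self.rows.append((holder_id, now, now + ttl_s))

    def live_rows(self, now):
        # Rows past their expiry are ignored, mimicking TTL cleanup.
        return [r for r in self.rows if r[2] > now]

    def delete(self, holder_id):
        self.rows = [r for r in self.rows if r[0] != holder_id]


def try_acquire(table, holder_id, ttl_s=60.0, now=None):
    """Insert our claim, then verify that ours is the oldest live claim."""
    now = time.time() if now is None else now
    table.insert(holder_id, now, ttl_s)
    # Assumed tie-break: earliest timestamp wins, then smallest holder id.
    winner = min(table.live_rows(now), key=lambda r: (r[1], r[0]))
    if winner[0] == holder_id:
        return True
    table.delete(holder_id)  # lost the race: withdraw our claim
    return False


@contextmanager
def migration_lock(table, holder_id=None, ttl_s=60.0):
    """Hold the lock for the duration of a block and release it on exit."""
    holder_id = holder_id or uuid.uuid4().hex
    if not try_acquire(table, holder_id, ttl_s):
        raise RuntimeError("migration lock is held by another replica")
    try:
        yield holder_id
    finally:
        table.delete(holder_id)
```

In this sketch a crashed holder never deletes its row, so the expiry check is what eventually lets another replica in. That is the role the PR description assigns to the TTL.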

gtarpenning changed the title from "Griffin/migration lock table" to "chore(weave): add migration-lock table for ch replicated migrations" Mar 30, 2026
gtarpenning changed the base branch from griffin/fix-altinity-replicated-operator-migration to master March 30, 2026 20:44
codecov Bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

wandbot-3000 Bot commented Mar 30, 2026

w-b-hivemind Bot commented Mar 30, 2026

HiveMind Sessions

8 sessions · 1h 56m · $26

| Session | Agent | Duration | Tokens | Cost | Lines |
| --- | --- | --- | --- | --- | --- |
| Pr Review And Sequential Consistency Migration (f97e6eda-cde4-4f70-94a2-3b3b96523729) | claude | 24m | 11.2K | $1.66 | +2 -11 |
| Pruning Docker, Downloads, And Cache Storage (cba5a381-5b20-4ff0-864f-fa3325a57b81) | claude | 13m | 6.8K | $1.61 | +0 -0 |
| Fixing Flaky Evaluation Test And Pr Update (803c495f-bd39-4257-956c-3ac974717ac0) | claude | 28m | 54.6K | $7.95 | +49 -37 |
| Refactor Pr Tests Toward Functional Style (f7506e7b-bec4-46f0-bf87-e3e4d54f9d59) | claude | 3m | 6.9K | $0.78 | +80 -144 |
| Fix Unnecessary Imports In Migrator Test File (b2e27cc1-4e67-4e2f-ae50-ced161eb215a) | claude | 10m | 33.8K | $2.98 | +332 -476 |
| Handling DB Migration Locks (fe22fbf7-e4ef-4b77-be50-7ce0be74a391) | claude | 11m | 26.1K | $3.17 | +225 -62 |
| Fix Merge Conflicts (962a95b8-05f4-4ffd-8c04-4eccc4ecd4f3) | claude | 1m | 3.2K | $0.36 | +1 -18 |
| Resolve Weave-Trace Init Race (07d79cf4-0ff8-45f2-a0e1-e3aae9cc7512) | claude | 24m | 40.2K | $7.31 | +458 -59 |
| Total | | 1h 56m | 182.8K | $26 | +1147 -807 |

Run `claude --resume f97e6eda-cde4-4f70-94a2-3b3b96523729` to pick up where you left off.

gtarpenning marked this pull request as ready for review March 31, 2026 00:27
gtarpenning requested a review from a team as a code owner March 31, 2026 00:27
gtarpenning and others added 7 commits April 13, 2026 15:32
…se assertions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sticky connection (same ch_client for insert + verify) already guarantees
we read our own writes. Idempotent DDL is the ultimate safety net.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
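
As a rough illustration of what "idempotent DDL" could look like for the engine-aware lock table, the sketch below builds a `CREATE TABLE IF NOT EXISTS` statement whose engine depends on the database engine. Everything specific here is an assumption for illustration: the helper name `lock_table_ddl`, the table and column names, the ZooKeeper path, and the default TTL are hypothetical and not taken from the PR.

```python
def lock_table_ddl(database, db_engine, ttl_seconds=300, table="migration_lock"):
    """Build an idempotent CREATE TABLE statement for a lock table.

    Column names, ZooKeeper path, and TTL default are illustrative assumptions.
    """
    if db_engine == "Replicated":
        # Per the PR description: plain MergeTree inside a Replicated database.
        engine = "MergeTree"
    elif db_engine == "Atomic":
        # Per the PR description: explicit ReplicatedMergeTree with a shared
        # ZooKeeper path, so every replica sees the same lock rows.
        zk_path = f"/clickhouse/tables/shared/{database}/{table}"
        engine = f"ReplicatedMergeTree('{zk_path}', '{{replica}}')"
    else:
        raise ValueError(f"unsupported database engine: {db_engine}")

    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} (\n"
        "    lock_name String,\n"
        "    holder_id String,\n"
        "    acquired_at DateTime\n"
        f") ENGINE = {engine}\n"
        "ORDER BY lock_name\n"
        f"TTL acquired_at + INTERVAL {ttl_seconds} SECOND"
    )
```

Because of `IF NOT EXISTS`, two replicas that both issue this statement should end up in the same state, which is the sense in which the commit message calls idempotent DDL a safety net.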
gtarpenning force-pushed the griffin/migration-lock-table branch from f5fb781 to 982f123 April 13, 2026 22:33
gtarpenning and others added 3 commits April 14, 2026 09:30
- Retry try_acquire on OperationalError during acquire_with_retry, so
  init-container startup races tolerate brief connection blips.
- Surface persistent transient failures as MigrationLockError (same code
  path as the "lock held too long" timeout) instead of raw driver errors.
- Fix docstring wording (typos hook was tripping on "INSERTs").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
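
The retry behaviour described in the commit message above might look roughly like the following. This is a hedged sketch, not the PR's code: the names `acquire_with_retry`, `try_acquire`, `OperationalError`, and `MigrationLockError` come from the commit message, but the exception classes defined here are local stand-ins and the signature, timeout, and interval parameters are assumptions.

```python
import time


class OperationalError(Exception):
    """Local stand-in for the driver's transient connection error."""


class MigrationLockError(Exception):
    """Raised when the lock cannot be acquired before the deadline."""


def acquire_with_retry(try_acquire, timeout_s=30.0, interval_s=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Call try_acquire() until it succeeds or the deadline passes.

    Transient OperationalErrors are retried; if they persist until the
    deadline they surface as MigrationLockError, the same error type used
    when another holder keeps the lock for too long.
    """
    deadline = clock() + timeout_s
    while True:
        last_error = None
        try:
            if try_acquire():
                return
        except OperationalError as exc:
            last_error = exc  # brief connection blip: retry below
        if clock() >= deadline:
            reason = ("transient errors persisted" if last_error
                      else "lock held too long")
            raise MigrationLockError(
                f"could not acquire migration lock: {reason}"
            ) from last_error
        sleep(interval_s)
```

Chaining the last driver error with `raise ... from` keeps the original failure visible in the traceback while callers only need to handle one exception type.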

3 participants