fix(server): migrate persistence from KV to native DO SQLite #51

Open
kavinsood wants to merge 1 commit into main from fix/sql-storage-migration

@kavinsood
Owner

Summary

Migrates YAOS server persistence from the legacy Durable Object KV-style checkpoint/journal store to native DO SQLite.

This fixes the compaction death-spiral class of bugs by replacing multi-key journal/checkpoint orchestration with SQL-backed snapshot chunks and append-only journal rows.

Major changes

  • Add SqlDocStore (native DO SQLite, two tables: snapshot_chunks + journal)
  • Auto-migrate legacy KV state to SQL on first room load
  • Preserve legacy KV data for rollback safety
  • Add _migration_meta marker to distinguish room states
  • Add oversized delta guard (appendUpdate returns null for >1.5MB, coordinator routes to checkpoint)
  • Add ownedBuffer() BLOB safety helper (prevents ArrayBuffer misuse)
  • Add SQL load fallback to legacy KV in non-durable degraded relay mode
  • Gate destructive admin routes behind YAOS_ENABLE_ADMIN_ROUTES env var
  • Add storage observability fields to /__yaos/debug response
  • Add release notes and rollback plan (engineering/sql-storage-migration.md)
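
For reviewers who want a picture of the snapshot_chunks idea, here is a minimal sketch of splitting a snapshot into fixed-size pieces and reassembling it. It is not the code in this PR: `splitIntoChunks`, `joinChunks`, and the 1 MiB `CHUNK_BYTES` value are hypothetical names and numbers chosen for illustration, and the real SqlDocStore may chunk differently.

```typescript
// Assumption: the chunk size is illustrative; the PR does not state the real value.
const CHUNK_BYTES = 1024 * 1024;

// Split a snapshot into consecutive chunks of at most `chunkBytes` bytes.
// `slice` copies, so each chunk owns its bytes independently of the input.
function splitIntoChunks(snapshot: Uint8Array, chunkBytes: number = CHUNK_BYTES): Uint8Array[] {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < snapshot.byteLength; offset += chunkBytes) {
    chunks.push(snapshot.slice(offset, offset + chunkBytes));
  }
  return chunks;
}

// Reassemble chunks, which must already be in their original order.
function joinChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const snapshot = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    snapshot.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return snapshot;
}
```

In a table-backed store, each chunk would presumably be written as one BLOB row keyed by its index, so a cold load reads the rows in index order and joins them.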

Validation

  • npm ci + npm ci --prefix server
  • npm run build — clean
  • npm --prefix server run typecheck — clean
  • npm run test:ci — pass
  • npm run test:regressions → 72 suites, 0 failures
  • Real Obsidian CDP smoke test → 13/13 passing (create, receipt, reconnect, edit convergence)

Rollback

Legacy KV data is preserved after migration. Reverting to the previous server version resumes from KV state with no data loss.

Phase A: Fix compaction death spiral
- Split rewriteCheckpoint into 3 phases: chunk writes (non-txn),
  pointer swap (atomic small txn), cleanup (non-txn best-effort)
- Add circuit breaker for consecutive compaction failures (max 3)
- Make compaction errors visible (console.error + trace)
- Add emergency /__yaos/compact admin endpoint
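
The circuit breaker above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `CompactionBreaker` and `tryCompact` are hypothetical names, the compaction callback is synchronous here for brevity (the real one is presumably async), and only the limit of 3 consecutive failures comes from the commit message.

```typescript
// Only the limit of 3 comes from the commit message; all names are hypothetical.
const MAX_CONSECUTIVE_COMPACTION_FAILURES = 3;

class CompactionBreaker {
  private consecutiveFailures = 0;

  // Compaction may run while the failure streak is below the limit.
  canAttempt(): boolean {
    return this.consecutiveFailures < MAX_CONSECUTIVE_COMPACTION_FAILURES;
  }

  // A successful compaction ends the failure streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // Each failure extends the streak; logging is left to the caller.
  recordFailure(): void {
    this.consecutiveFailures += 1;
  }
}

// Wraps one compaction attempt and reports the outcome so it can be traced.
function tryCompact(
  breaker: CompactionBreaker,
  compact: () => void,
): "ok" | "failed" | "skipped" {
  if (!breaker.canAttempt()) return "skipped";
  try {
    compact();
    breaker.recordSuccess();
    return "ok";
  } catch {
    breaker.recordFailure();
    return "failed";
  }
}
```

Once the breaker is open, further attempts are skipped instead of retried, which is what stops a failing compaction from being re-triggered on every write. How the real breaker is reset once open is not stated here.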

Phase B: Native DO SQLite storage
- New SqlDocStore using ctx.storage.sql with two tables:
  snapshot_chunks (chunked BLOB) and journal (append-only deltas)
- Automatic KV-to-SQL migration on first load after deploy
- Migration marker (_migration_meta table) for state disambiguation
- ownedBuffer() helper enforces safe ArrayBuffer semantics for BLOBs
- appendUpdate returns null for >1.5MB deltas (explicit size guard,
  not exception-driven), coordinator routes to checkpoint cleanly
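
To make the size guard and the BLOB helper concrete, here is a minimal in-memory sketch. It is not the SqlDocStore code. The body of `ownedBuffer` is one plausible reading (copy exactly the bytes a view covers, so a view into a larger buffer never writes the whole backing buffer), the exact byte value behind "1.5MB" is assumed to be 1.5 MiB, and `InMemoryJournal` and `persistUpdate` are hypothetical stand-ins for the journal table and the coordinator.

```typescript
// Assumption: "1.5MB" is taken as 1.5 MiB; the PR does not give the exact constant.
const MAX_DELTA_BYTES = 1.5 * 1024 * 1024;

// Copy exactly the bytes the view covers into a fresh ArrayBuffer.
// Passing `view.buffer` directly would expose the whole backing buffer
// whenever the view has a non-zero offset or a shorter length.
function ownedBuffer(view: Uint8Array): ArrayBuffer {
  const buffer = new ArrayBuffer(view.byteLength);
  new Uint8Array(buffer).set(view);
  return buffer;
}

// Hypothetical stand-in for the append-only journal table.
class InMemoryJournal {
  rows: ArrayBuffer[] = [];

  // Returns the new row count, or null when the delta is over the limit.
  // The guard is an explicit check, not a caught storage exception.
  appendUpdate(update: Uint8Array): number | null {
    if (update.byteLength > MAX_DELTA_BYTES) return null;
    this.rows.push(ownedBuffer(update));
    return this.rows.length;
  }
}

// Hypothetical coordinator step: fall back to a checkpoint on a null append.
function persistUpdate(
  journal: InMemoryJournal,
  update: Uint8Array,
  writeCheckpoint: () => void,
): "journal" | "checkpoint" {
  if (journal.appendUpdate(update) === null) {
    writeCheckpoint();
    return "checkpoint";
  }
  return "journal";
}
```

Returning null keeps the oversized case on the normal control path: the caller branches on a value instead of catching an error and guessing its cause.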

Hardening:
- KV fallback if SQL load fails (read-only degraded mode, no SV echoes,
  no persistence attempts — fail-closed, not data-loss waiting room)
- Admin routes (compact, cleanup-kv) gated behind YAOS_ENABLE_ADMIN_ROUTES
- Observability: storageMode, migrationStatus, migrationMeta, coldLoadDurationMs,
  oversizedDeltaCount in /__yaos/debug response
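
The admin gating can be pictured with a small sketch like the one below. Treat it as an assumption-laden illustration, not the handler in this PR: the `/__yaos/cleanup-kv` path, the accepted values `"1"` and `"true"`, and the names `adminRoutesEnabled` and `gateAdminRoute` are all hypothetical. Only `YAOS_ENABLE_ADMIN_ROUTES` and `/__yaos/compact` appear in the text above.

```typescript
interface Env {
  YAOS_ENABLE_ADMIN_ROUTES?: string;
}

// Assumption: the cleanup-kv path is guessed from the route name in the commit message.
const ADMIN_PATHS = new Set(["/__yaos/compact", "/__yaos/cleanup-kv"]);

// Assumption: "1" and "true" enable the routes; unset or anything else disables them.
function adminRoutesEnabled(env: Env): boolean {
  const value = (env.YAOS_ENABLE_ADMIN_ROUTES ?? "").trim().toLowerCase();
  return value === "1" || value === "true";
}

// Classify a request path so the caller can reject gated routes up front.
function gateAdminRoute(pathname: string, env: Env): "allow" | "deny" | "not-admin" {
  if (!ADMIN_PATHS.has(pathname)) return "not-admin";
  return adminRoutesEnabled(env) ? "allow" : "deny";
}
```

Defaulting to "deny" when the variable is unset means a fresh deploy exposes no destructive routes until an operator opts in.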

Tests: 145 assertions across 5 test files (126 new, 19 existing)
- sql-doc-store.ts: CRUD, compaction, chunking, size guard (28)
- sql-migration-edge-cases.ts: idempotence, equivalence, ArrayBuffer (56)
- sql-oversized-delta-e2e.ts: coordinator fallback path (25)
- admin-route-gating.ts: security surface validation (17)
- chunked-doc-store.ts: existing 19 still pass

Full CI: npm ci + build + test:ci + test:regressions + server typecheck = green