RavenDB-26046 - Add CDC Sink documentation#2387
Open
ayende wants to merge 15 commits into ravendb:main
Adds full CDC Sink ongoing task documentation in Docusaurus MDX format:
- 16 core pages: overview, how-it-works, schema-design, embedded-tables, linked-tables, column-mapping, patching, delete-strategies, property-retention, attachment-handling, configuration-reference, api-reference, monitoring, failover-and-consistency, troubleshooting, server-configuration
- 9 PostgreSQL pages: prerequisites-checklist, wal-configuration, permissions-and-roles, initial-setup, replica-identity, replica-identity-manual-setup, cleanup-and-maintenance, monitoring-postgres, studio-ui
- 4 PostgreSQL examples: simple-migration, denormalization, event-sourcing, complex-nesting
- 1 SQL Server stub: overview
- 4 _category_.json navigation files
…al features
- Replace ColumnsMapping (Dictionary) + AttachmentNameMapping (Dictionary) with
unified Columns list of CdcColumnMapping { Column, Name, Type } across all files
- Add CdcColumnType enum documentation (Default, Json, Attachment)
- Add REST API endpoints table to configuration-reference
- Add CdcSink.PollIntervalInSec to server-configuration
- Add error handling details to monitoring (threshold, fallback, exponential backoff)
- Add ALTER PUBLICATION auto-fix note to postgres/initial-setup
- Fix how-it-works: sequential scan description, Child Before Parent section
- Fix Startup and Verification: split into per-database subsections
- Update all prose references from ColumnsMapping to Columns list
…chment handling
- Replace all new() shorthand with new CdcColumnMapping() across all files
- attachment-handling: clarify that text columns (text, nvarchar, etc.) as well as binary columns can use Type = CdcColumnType.Attachment
… uses application/octet-stream
…$old documentation
- Add postgres/type-mapping.mdx: full reference table of PostgreSQL column types and their JavaScript/CLR equivalents (scalars, arrays, json/jsonb, bytea, pgvector)
- Add patching.mdx "$row and $old: Names and Types"
- Fix cleanup-and-maintenance.mdx: replace obsolete "Configuration Changes That Rename Slots" section (described hash-based naming, no longer accurate) with correct "Slot and Publication Names Are Immutable" section reflecting enforced immutability
- Name → CollectionName on CdcSinkTableConfig across all files
- Remove Type from CdcSinkLinkedTableConfig (linked tables have no relation type)
- Remove Disabled from CdcSinkEmbeddedTableConfig, add LinkedTables
- Add FactoryName table (Npgsql, SqlClient, MySql) to configuration-reference
- Add CdcColumnMapping and CdcColumnType reference sections
- Add put(id, document) and del(id) to patch capabilities
- Add JSON Columns section to column-mapping
- Remove non-existent Array References section from linked-tables
- Remove non-existent Disabling an Embedded Table section from embedded-tables
- Update server-configuration descriptions (MaxBatchSize, MaxFallbackTimeInSec, PollIntervalInSec applies to SQL Server only)
- Fix licensing link in overview
- Add SQL Server and MySQL/MariaDB as supported source databases
- postgres/overview.mdx: connection string, logical replication explanation, prerequisites summary, section index
- sql-server/overview.mdx: expand from stub to full page with connection string, CDC prerequisites, polling behavior, SourceTableSchema default
- mysql/overview.mdx + _category_.json: connection string, binlog prerequisites, streaming behavior, required privileges
…roubleshooting sections
- Source Schema Changes: how each database engine handles DDL changes on source tables while CDC Sink is running (adding/removing/renaming columns, SQL Server capture instance limitations)
- Partial Export/Import and State Loss: @cdc-states collection, recovery guidance, SkipInitialLoad workaround, LSN editing risks
ayende commented Apr 7, 2026
…update behavior
Broken links fixed:
- api-reference: send-multiple-operations → what-are-operations
- attachment-handling: what-are-attachments → attachments/overview
- column-mapping, patching: postgres/type-mapping.mdx (nonexistent) → cross-reference to patching.mdx#row-column-types
PR comment fixes:
- MySQL overview: rename MyMySqlConnection → MySqlConnection
- Postgres overview: slot/publication names are GUID-based on first use, not deterministic hash-based
New content:
- how-it-works: "Updating the Task Configuration" section explaining that config changes only apply to new CDC events going forward. Existing documents are not retroactively re-processed. To apply changes to all documents, delete and recreate the task.
Move schema change documentation from troubleshooting into its own page with per-engine detail:
- PostgreSQL: auto-detects via RelationMessage, most resilient
- MySQL: detects via TableMapEvent column types, auto-recovers
- SQL Server: requires explicit capture instance procedure (create new instance, drain old, then drop)
- Quick reference table, SQL examples, recovery mechanism
- Troubleshooting retains a short summary with link to the new page
MySQL CDC detects changes by column position. Compound ALTER TABLE statements (add + drop, ADD COLUMN ... AFTER ...) cause positional shifts that are hard to resolve. Apply one change at a time and let CDC Sink catch up between each.
26 tasks
Pull request overview
Adds a comprehensive new documentation section for the CDC Sink ongoing task (RavenDB 7.2), including core conceptual docs, configuration/API references, and engine-specific guides (PostgreSQL, MySQL, SQL Server) with examples.
Changes:
- Introduces new CDC Sink core documentation pages (overview, how-it-works, schema/mapping, patching, monitoring, troubleshooting, etc.).
- Adds PostgreSQL-specific operational docs (WAL, permissions, REPLICA IDENTITY, initial setup, monitoring, cleanup) and examples.
- Adds initial SQL Server overview (stub) and MySQL overview, plus sidebar category entries.
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| docs/server/ongoing-tasks/cdc-sink/_category_.json | Adds CDC Sink section to the docs sidebar. |
| docs/server/ongoing-tasks/cdc-sink/overview.mdx | CDC Sink high-level overview and positioning. |
| docs/server/ongoing-tasks/cdc-sink/how-it-works.mdx | Explains lifecycle, initial load, streaming, failover mechanics. |
| docs/server/ongoing-tasks/cdc-sink/schema-design.mdx | Describes mapping relational schema to documents (root/embedded/linked). |
| docs/server/ongoing-tasks/cdc-sink/embedded-tables.mdx | Embedded tables configuration and nesting constraints. |
| docs/server/ongoing-tasks/cdc-sink/linked-tables.mdx | Linked table references and composite FK behavior. |
| docs/server/ongoing-tasks/cdc-sink/column-mapping.mdx | Column mapping behavior, types, validation rules. |
| docs/server/ongoing-tasks/cdc-sink/patching.mdx | Patch scripting model, variables, examples, idempotency guidance. |
| docs/server/ongoing-tasks/cdc-sink/delete-strategies.mdx | DELETE handling options and patterns for root/embedded. |
| docs/server/ongoing-tasks/cdc-sink/property-retention.mdx | Explains merge semantics and preserved properties. |
| docs/server/ongoing-tasks/cdc-sink/attachment-handling.mdx | Attachment mapping behavior for root/embedded columns. |
| docs/server/ongoing-tasks/cdc-sink/configuration-reference.mdx | Reference for configuration types, enums, REST endpoints. |
| docs/server/ongoing-tasks/cdc-sink/api-reference.mdx | Client API operations for CDC Sink management. |
| docs/server/ongoing-tasks/cdc-sink/monitoring.mdx | Task states, fallback behavior, stats/alerts overview. |
| docs/server/ongoing-tasks/cdc-sink/source-schema-changes.mdx | Engine-specific schema change handling guidance. |
| docs/server/ongoing-tasks/cdc-sink/troubleshooting.mdx | Common failure modes and remediation steps. |
| docs/server/ongoing-tasks/cdc-sink/server-configuration.mdx | Documents server-wide keys for CDC Sink behavior. |
| docs/server/ongoing-tasks/cdc-sink/postgres/_category_.json | Adds PostgreSQL subsection to CDC Sink sidebar. |
| docs/server/ongoing-tasks/cdc-sink/postgres/overview.mdx | PostgreSQL-specific overview and entry point. |
| docs/server/ongoing-tasks/cdc-sink/postgres/prerequisites-checklist.mdx | Pre-flight checklist for PostgreSQL setup. |
| docs/server/ongoing-tasks/cdc-sink/postgres/wal-configuration.mdx | WAL/logical replication configuration steps. |
| docs/server/ongoing-tasks/cdc-sink/postgres/permissions-and-roles.mdx | Required permissions and recommended role setup. |
| docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity.mdx | REPLICA IDENTITY rationale and requirements. |
| docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity-manual-setup.mdx | DBA-focused manual REPLICA IDENTITY instructions. |
| docs/server/ongoing-tasks/cdc-sink/postgres/initial-setup.mdx | Slot/publication creation & verification workflow. |
| docs/server/ongoing-tasks/cdc-sink/postgres/monitoring-postgres.mdx | Postgres-side queries for lag/WAL/slot health. |
| docs/server/ongoing-tasks/cdc-sink/postgres/cleanup-and-maintenance.mdx | Cleanup guidance for slots/publications lifecycle. |
| docs/server/ongoing-tasks/cdc-sink/postgres/studio-ui.mdx | Studio UX guidance for creating/editing tasks. |
| docs/server/ongoing-tasks/cdc-sink/postgres/examples/_category_.json | Adds PostgreSQL examples subsection to sidebar. |
| docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-simple-migration.mdx | Minimal single-table migration example. |
| docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-denormalization.mdx | Denormalization example with embedded tables. |
| docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-event-sourcing.mdx | Event-sourcing/aggregation example with patches. |
| docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-complex-nesting.mdx | Deep nesting + linked references example. |
| docs/server/ongoing-tasks/cdc-sink/sql-server/_category_.json | Adds SQL Server subsection to CDC Sink sidebar. |
| docs/server/ongoing-tasks/cdc-sink/sql-server/overview.mdx | SQL Server CDC overview, prerequisites, polling config. |
| docs/server/ongoing-tasks/cdc-sink/mysql/_category_.json | Adds MySQL subsection to CDC Sink sidebar. |
| docs/server/ongoing-tasks/cdc-sink/mysql/overview.mdx | MySQL/MariaDB binlog-based overview and prerequisites. |
Comment on lines +96 to +98
* **Adding or dropping unmapped columns** — transparent. The unmapped column is ignored.
The CDC Sink may restart itself to re-resolve the current schema from `INFORMATION_SCHEMA`,
but normal operations will resume momentarily.
Comment on lines +158 to +159
in patch scripts via `$row.base_price` (for the current row's values) and
`$old?.base_price` (for the previous row's values on UPDATE events).
Comment on lines +182 to +188
* **PrimaryKeyColumns** — Used to match items within the parent's array for UPDATE and DELETE
* **JoinColumns** — Foreign key referencing the parent's `PrimaryKeyColumns`

The `JoinColumns` must exactly match the parent's `PrimaryKeyColumns`:

| Parent PK | Required JoinColumns | Valid? |
|-----------|---------------------|--------|
Comment on lines +62 to +64
**Document ID generation:** `\{CollectionName\}/\{pk1\}/\{pk2\}/...`
A row with `id = 42` and collection `Orders` becomes document `Orders/42`.
A composite PK `(region, id)` with values `(US, 42)` becomes `Orders/US/42`.
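
The ID pattern quoted above can be illustrated with a minimal sketch. The function name `document_id` is hypothetical and not part of the CDC Sink API. The sketch assumes primary key values are joined in declared order with `/` and converted to strings as-is; any escaping or normalization the real task performs is not covered here.

```python
def document_id(collection_name, pk_values):
    """Build an ID of the form {CollectionName}/{pk1}/{pk2}/...

    Hypothetical helper: it only mirrors the pattern described in the docs.
    """
    if not pk_values:
        raise ValueError("at least one primary key value is required")
    # Join the collection name and each PK value (as a string) with '/'.
    return "/".join([collection_name, *(str(v) for v in pk_values)])


print(document_id("Orders", [42]))        # single-column PK
print(document_id("Orders", ["US", 42]))  # composite PK (region, id)
```

With the values from the quoted lines, the two calls produce `Orders/42` and `Orders/US/42`.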
</TabItem>
</Tabs>

With `customer_id = 42`, the document gets `"Customer": "Customers/42"`.
Comment on lines +70 to +75
{`\{
    "Id": 1,
    "CustomerId": 42,
    "Customer": "Customers/42"
\}
`}
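
As a rough illustration of how a linked table yields the document shown above, the following sketch builds the same JSON from a source row. The helper `linked_reference` and the row's column names are assumptions made for this example. The sketch also assumes a linked reference is simply the target collection name joined with the foreign key value.

```python
import json


def linked_reference(target_collection, fk_values):
    """Return the ID of the referenced document, e.g. Customers/42 (hypothetical helper)."""
    return "/".join([target_collection, *(str(v) for v in fk_values)])


# Hypothetical source row; the column names are illustrative only.
row = {"id": 1, "customer_id": 42}

document = {
    "Id": row["id"],
    "CustomerId": row["customer_id"],
    # The linked table contributes a reference string, not an embedded copy.
    "Customer": linked_reference("Customers", [row["customer_id"]]),
}

print(json.dumps(document, indent=4))
```

The printed document has the same three properties as the quoted example, with `Customer` holding the reference `Customers/42`.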
Comment on lines +59 to +60
The maximum time the task will remain in fallback mode before reporting an error
is controlled by the `CdcSink.MaxFallbackTimeInSec` configuration key.
Comment on lines +78 to +80
* **Exceeded fallback timeout** — the source was unreachable for longer than
`CdcSink.MaxFallbackTimeInSec`. The task moves to error state after this timeout.
Restore connectivity and re-enable the task.
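
The fallback timeout described in these lines, together with the exponential backoff mentioned in the commit notes, might be modeled roughly as below. This is a sketch under stated assumptions, not the server's implementation. The state labels, the function names, the backoff base and cap, and the 900-second timeout in the usage lines are all hypothetical. Only the `CdcSink.MaxFallbackTimeInSec` key comes from the documentation.

```python
def backoff_delay(attempt, base_sec=1.0, cap_sec=60.0):
    """Exponential backoff: base * 2**attempt, capped at cap_sec (assumed values)."""
    return min(cap_sec, base_sec * (2 ** attempt))


def task_state(seconds_unreachable, max_fallback_time_in_sec):
    """Classify the task by how long the source has been unreachable.

    The labels are illustrative; the limit stands in for
    CdcSink.MaxFallbackTimeInSec.
    """
    if seconds_unreachable <= 0:
        return "Active"
    if seconds_unreachable > max_fallback_time_in_sec:
        return "Error"      # exceeded fallback timeout
    return "Fallback"       # still retrying with backoff


# Retry delays grow until they hit the cap.
print([backoff_delay(n) for n in range(8)])
# Hypothetical 900-second fallback limit.
print(task_state(100, 900), task_state(901, 900))
```

Once the task reaches the error state, the quoted text indicates that it does not resume on its own: connectivity has to be restored and the task re-enabled.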
Summary
Adds full documentation for the new CDC Sink ongoing task (RavenDB 7.2, RavenDB-26046).
Core pages (16): overview, how-it-works, schema-design, embedded-tables, linked-tables, column-mapping, patching, delete-strategies, property-retention, attachment-handling, configuration-reference, api-reference, monitoring, failover-and-consistency, troubleshooting, server-configuration
PostgreSQL pages (9): prerequisites-checklist, wal-configuration, permissions-and-roles, initial-setup, replica-identity, replica-identity-manual-setup, cleanup-and-maintenance, monitoring-postgres, studio-ui
PostgreSQL examples (4): simple-migration, denormalization, event-sourcing, complex-nesting
SQL Server (1): overview stub
Key topics covered:
- CdcColumnMapping with Column, Name, and CdcColumnType (Default, Json, Attachment)
- $row, $old, load(), OnDelete strategies
Test plan
- /server/ongoing-tasks/cdc-sink/overview and verify sidebar navigation