
RavenDB-26046 - Add CDC Sink documentation #2387

Open
ayende wants to merge 15 commits into ravendb:main from ayende:claude/cdc-sink-docs-main


ayende commented Apr 3, 2026

Summary

Adds full documentation for the new CDC Sink ongoing task (RavenDB 7.2, RavenDB-26046).

Core pages (16): overview, how-it-works, schema-design, embedded-tables, linked-tables, column-mapping, patching, delete-strategies, property-retention, attachment-handling, configuration-reference, api-reference, monitoring, failover-and-consistency, troubleshooting, server-configuration

PostgreSQL pages (9): prerequisites-checklist, wal-configuration, permissions-and-roles, initial-setup, replica-identity, replica-identity-manual-setup, cleanup-and-maintenance, monitoring-postgres, studio-ui

PostgreSQL examples (4): simple-migration, denormalization, event-sourcing, complex-nesting

SQL Server (1): overview stub

Key topics covered:

  • CdcColumnMapping with Column, Name, and CdcColumnType (Default, Json, Attachment)
  • Embedded tables, linked tables, multi-level nesting, relation types
  • JavaScript patches, $row, $old, load(), OnDelete strategies
  • GUID-based slot/publication naming, auto ALTER PUBLICATION
  • Initial load sequence, CDC streaming, failover behavior
  • Error threshold and exponential backoff
  • REST API endpoints, server configuration keys
  • PostgreSQL WAL setup, REPLICA IDENTITY, permissions
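To make the `Columns` model concrete, here is a minimal JavaScript sketch of how one source row could be turned into a document using a list of `{ Column, Name, Type }` entries. It assumes the row arrives as a plain object keyed by column name, that `Json` columns hold JSON text to be parsed, and that an `Attachment` column's `Name` becomes the attachment name; `mapRowToDocument` and the string values of `CdcColumnType` are hypothetical, not RavenDB's implementation.

```javascript
// Hypothetical string values; the real CdcColumnType is a C# enum.
const CdcColumnType = Object.freeze({
  Default: "Default",
  Json: "Json",
  Attachment: "Attachment",
});

// Hypothetical helper: map one source row using a Columns list of
// { Column, Name, Type } entries that mirrors CdcColumnMapping.
function mapRowToDocument(row, columns) {
  const document = {};
  const attachments = [];
  for (const { Column, Name, Type = CdcColumnType.Default } of columns) {
    if (!(Column in row)) continue; // mapped column absent from this row
    const value = row[Column];
    switch (Type) {
      case CdcColumnType.Json:
        // Assumption: JSON text becomes a nested object on the document.
        document[Name] = typeof value === "string" ? JSON.parse(value) : value;
        break;
      case CdcColumnType.Attachment:
        // Assumption: Name is the attachment name, not a document property.
        attachments.push({ name: Name, data: value });
        break;
      default:
        document[Name] = value;
    }
  }
  // Columns with no entry in the list are never copied.
  return { document, attachments };
}

// Usage: internal_note has no mapping, so it is dropped.
const example = mapRowToDocument(
  { id: 7, settings: '{"theme":"dark"}', avatar: "<bytes>", internal_note: "x" },
  [
    { Column: "id", Name: "Id" },
    { Column: "settings", Name: "Settings", Type: CdcColumnType.Json },
    { Column: "avatar", Name: "avatar.png", Type: CdcColumnType.Attachment },
  ]
);
// example.document    → { Id: 7, Settings: { theme: "dark" } }
// example.attachments → [{ name: "avatar.png", data: "<bytes>" }]
```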

Test plan

  • Browse to /server/ongoing-tasks/cdc-sink/overview and verify sidebar navigation
  • Spot-check code samples render correctly (C#, SQL, JavaScript)
  • Verify PostgreSQL and SQL Server subsections appear under CDC Sink in sidebar

ayende added 12 commits April 3, 2026 00:44
Adds full CDC Sink ongoing task documentation in Docusaurus MDX format:

- 16 core pages: overview, how-it-works, schema-design, embedded-tables,
  linked-tables, column-mapping, patching, delete-strategies,
  property-retention, attachment-handling, configuration-reference,
  api-reference, monitoring, failover-and-consistency, troubleshooting,
  server-configuration
- 9 PostgreSQL pages: prerequisites-checklist, wal-configuration,
  permissions-and-roles, initial-setup, replica-identity,
  replica-identity-manual-setup, cleanup-and-maintenance,
  monitoring-postgres, studio-ui
- 4 PostgreSQL examples: simple-migration, denormalization,
  event-sourcing, complex-nesting
- 1 SQL Server stub: overview
- 4 _category_.json navigation files
…al features

- Replace ColumnsMapping (Dictionary) + AttachmentNameMapping (Dictionary) with
  unified Columns list of CdcColumnMapping { Column, Name, Type } across all files
- Add CdcColumnType enum documentation (Default, Json, Attachment)
- Add REST API endpoints table to configuration-reference
- Add CdcSink.PollIntervalInSec to server-configuration
- Add error handling details to monitoring (threshold, fallback, exponential backoff)
- Add ALTER PUBLICATION auto-fix note to postgres/initial-setup
- Fix how-it-works: sequential scan description, Child Before Parent section
- Fix Startup and Verification: split into per-database subsections
- Update all prose references from ColumnsMapping to Columns list
…chment handling

- Replace all new() shorthand with new CdcColumnMapping() across all files
- attachment-handling: clarify that text columns (text, nvarchar, etc.) as well
  as binary columns can use Type = CdcColumnType.Attachment
…$old documentation

- Add postgres/type-mapping.mdx: full reference table of PostgreSQL column types
  and their JavaScript/CLR equivalents (scalars, arrays, json/jsonb, bytea, pgvector)

- Add patching.mdx "$row and $old: Names and Types"

- Fix cleanup-and-maintenance.mdx: replace obsolete "Configuration Changes That Rename
  Slots" section (described hash-based naming, no longer accurate) with correct
  "Slot and Publication Names Are Immutable" section reflecting enforced immutability
- Name → CollectionName on CdcSinkTableConfig across all files
- Remove Type from CdcSinkLinkedTableConfig (linked tables have no relation type)
- Remove Disabled from CdcSinkEmbeddedTableConfig, add LinkedTables
- Add FactoryName table (Npgsql, SqlClient, MySql) to configuration-reference
- Add CdcColumnMapping and CdcColumnType reference sections
- Add put(id, document) and del(id) to patch capabilities
- Add JSON Columns section to column-mapping
- Remove non-existent Array References section from linked-tables
- Remove non-existent Disabling an Embedded Table section from embedded-tables
- Update server-configuration descriptions (MaxBatchSize, MaxFallbackTimeInSec,
  PollIntervalInSec applies to SQL Server only)
- Fix licensing link in overview
- Add SQL Server and MySQL/MariaDB as supported source databases
- postgres/overview.mdx: connection string, logical replication explanation,
  prerequisites summary, section index
- sql-server/overview.mdx: expand from stub to full page with connection string,
  CDC prerequisites, polling behavior, SourceTableSchema default
- mysql/overview.mdx + _category_.json: connection string, binlog prerequisites,
  streaming behavior, required privileges
…roubleshooting sections

- Source Schema Changes: how each database engine handles DDL changes
  on source tables while CDC Sink is running (adding/removing/renaming
  columns, SQL Server capture instance limitations)
- Partial Export/Import and State Loss: @cdc-states collection, recovery
  guidance, SkipInitialLoad workaround, LSN editing risks
ayende added 3 commits April 7, 2026 14:21
…update behavior

Broken links fixed:
- api-reference: send-multiple-operations → what-are-operations
- attachment-handling: what-are-attachments → attachments/overview
- column-mapping, patching: postgres/type-mapping.mdx (nonexistent)
  → cross-reference to patching.mdx#row-column-types

PR comment fixes:
- MySQL overview: rename MyMySqlConnection → MySqlConnection
- Postgres overview: slot/publication names are GUID-based on first
  use, not deterministic hash-based

New content:
- how-it-works: "Updating the Task Configuration" section explaining
  that config changes only apply to new CDC events going forward.
  Existing documents are not retroactively re-processed. To apply
  changes to all documents, delete and recreate the task.
Move schema change documentation from troubleshooting into its own
page with per-engine detail:
- PostgreSQL: auto-detects via RelationMessage, most resilient
- MySQL: detects via TableMapEvent column types, auto-recovers
- SQL Server: requires explicit capture instance procedure (create
  new instance, drain old, then drop)
- Quick reference table, SQL examples, recovery mechanism
- Troubleshooting retains a short summary with link to the new page
MySQL CDC detects changes by column position. Compound ALTER TABLE
statements (add + drop, ADD COLUMN ... AFTER ...) cause positional
shifts that are hard to resolve. Apply one change at a time and let
CDC Sink catch up between each.
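To see why a compound `ALTER TABLE` is risky for position-based decoding, the sketch below decodes a row image by ordinal position against a cached column list. It only illustrates the failure mode; `decodeByPosition` and the `users` table are hypothetical and do not describe how CDC Sink's MySQL reader is implemented.

```javascript
// Hypothetical decoder: row images carry values by ordinal position,
// so a reader pairs them with a column-name list it cached earlier.
function decodeByPosition(cachedColumns, values) {
  const row = {};
  cachedColumns.forEach((name, i) => {
    row[name] = values[i];
  });
  return row;
}

// Column list cached before the change: (id, name, email)
const cached = ["id", "name", "email"];

// After one compound statement:
//   ALTER TABLE users DROP COLUMN name, ADD COLUMN phone VARCHAR(20) AFTER id;
// the physical column order becomes (id, phone, email).
const valuesAfterAlter = [1, "555-0100", "a@example.com"];

// Decoding with the stale list silently puts the phone number under "name".
const decoded = decodeByPosition(cached, valuesAfterAlter);
// decoded → { id: 1, name: "555-0100", email: "a@example.com" }
```

Because the add and the drop land together, the row image still has three values, so nothing in its shape reveals that the second position now holds a different column. Issuing the `DROP` and the `ADD` as separate statements changes the column count at each step.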

Copilot AI left a comment

Pull request overview

Adds a comprehensive new documentation section for the CDC Sink ongoing task (RavenDB 7.2), including core conceptual docs, configuration/API references, and engine-specific guides (PostgreSQL, MySQL, SQL Server) with examples.

Changes:

  • Introduces new CDC Sink core documentation pages (overview, how-it-works, schema/mapping, patching, monitoring, troubleshooting, etc.).
  • Adds PostgreSQL-specific operational docs (WAL, permissions, REPLICA IDENTITY, initial setup, monitoring, cleanup) and examples.
  • Adds initial SQL Server overview (stub) and MySQL overview, plus sidebar category entries.

Reviewed changes

Copilot reviewed 38 out of 38 changed files in this pull request and generated 9 comments.

| File | Description |
|------|-------------|
| `docs/server/ongoing-tasks/cdc-sink/_category_.json` | Adds CDC Sink section to the docs sidebar. |
| `docs/server/ongoing-tasks/cdc-sink/overview.mdx` | CDC Sink high-level overview and positioning. |
| `docs/server/ongoing-tasks/cdc-sink/how-it-works.mdx` | Explains lifecycle, initial load, streaming, failover mechanics. |
| `docs/server/ongoing-tasks/cdc-sink/schema-design.mdx` | Describes mapping relational schema to documents (root/embedded/linked). |
| `docs/server/ongoing-tasks/cdc-sink/embedded-tables.mdx` | Embedded tables configuration and nesting constraints. |
| `docs/server/ongoing-tasks/cdc-sink/linked-tables.mdx` | Linked table references and composite FK behavior. |
| `docs/server/ongoing-tasks/cdc-sink/column-mapping.mdx` | Column mapping behavior, types, validation rules. |
| `docs/server/ongoing-tasks/cdc-sink/patching.mdx` | Patch scripting model, variables, examples, idempotency guidance. |
| `docs/server/ongoing-tasks/cdc-sink/delete-strategies.mdx` | DELETE handling options and patterns for root/embedded. |
| `docs/server/ongoing-tasks/cdc-sink/property-retention.mdx` | Explains merge semantics and preserved properties. |
| `docs/server/ongoing-tasks/cdc-sink/attachment-handling.mdx` | Attachment mapping behavior for root/embedded columns. |
| `docs/server/ongoing-tasks/cdc-sink/configuration-reference.mdx` | Reference for configuration types, enums, REST endpoints. |
| `docs/server/ongoing-tasks/cdc-sink/api-reference.mdx` | Client API operations for CDC Sink management. |
| `docs/server/ongoing-tasks/cdc-sink/monitoring.mdx` | Task states, fallback behavior, stats/alerts overview. |
| `docs/server/ongoing-tasks/cdc-sink/source-schema-changes.mdx` | Engine-specific schema change handling guidance. |
| `docs/server/ongoing-tasks/cdc-sink/troubleshooting.mdx` | Common failure modes and remediation steps. |
| `docs/server/ongoing-tasks/cdc-sink/server-configuration.mdx` | Documents server-wide keys for CDC Sink behavior. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/_category_.json` | Adds PostgreSQL subsection to CDC Sink sidebar. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/overview.mdx` | PostgreSQL-specific overview and entry point. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/prerequisites-checklist.mdx` | Pre-flight checklist for PostgreSQL setup. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/wal-configuration.mdx` | WAL/logical replication configuration steps. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/permissions-and-roles.mdx` | Required permissions and recommended role setup. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity.mdx` | REPLICA IDENTITY rationale and requirements. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity-manual-setup.mdx` | DBA-focused manual REPLICA IDENTITY instructions. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/initial-setup.mdx` | Slot/publication creation & verification workflow. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/monitoring-postgres.mdx` | Postgres-side queries for lag/WAL/slot health. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/cleanup-and-maintenance.mdx` | Cleanup guidance for slots/publications lifecycle. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/studio-ui.mdx` | Studio UX guidance for creating/editing tasks. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/examples/_category_.json` | Adds PostgreSQL examples subsection to sidebar. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-simple-migration.mdx` | Minimal single-table migration example. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-denormalization.mdx` | Denormalization example with embedded tables. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-event-sourcing.mdx` | Event-sourcing/aggregation example with patches. |
| `docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-complex-nesting.mdx` | Deep nesting + linked references example. |
| `docs/server/ongoing-tasks/cdc-sink/sql-server/_category_.json` | Adds SQL Server subsection to CDC Sink sidebar. |
| `docs/server/ongoing-tasks/cdc-sink/sql-server/overview.mdx` | SQL Server CDC overview, prerequisites, polling config. |
| `docs/server/ongoing-tasks/cdc-sink/mysql/_category_.json` | Adds MySQL subsection to CDC Sink sidebar. |
| `docs/server/ongoing-tasks/cdc-sink/mysql/overview.mdx` | MySQL/MariaDB binlog-based overview and prerequisites. |


Comment on lines +96 to +98
* **Adding or dropping unmapped columns** — transparent. The unmapped column is ignored.
The CDC Sink may restart itself to re-resolve the current schema from `INFORMATION_SCHEMA`,
but normal operation resumes shortly afterward.
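Whether a DDL change touches the mapping comes down to a set difference between the old and new column lists. A minimal sketch, assuming the mapped source column names are available as a `Set`; `classifySchemaChange` is a hypothetical helper and does not reflect how CDC Sink re-resolves schemas internally.

```javascript
// Hypothetical helper: compare the column list before and after a DDL change
// and report whether any column the task maps was added or dropped.
function classifySchemaChange(oldColumns, newColumns, mappedColumns) {
  const before = new Set(oldColumns);
  const after = new Set(newColumns);
  const added = newColumns.filter((c) => !before.has(c));
  const dropped = oldColumns.filter((c) => !after.has(c));
  // Transparent only when every added or dropped column is unmapped.
  const touchesMapping = [...added, ...dropped].some((c) => mappedColumns.has(c));
  return { added, dropped, transparent: !touchesMapping };
}

const mapped = new Set(["id", "name"]);
classifySchemaChange(["id", "name"], ["id", "name", "nickname"], mapped);
// → { added: ["nickname"], dropped: [], transparent: true }
```

In this sketch a rename shows up as one dropped and one added column, so renaming a mapped column is never reported as transparent.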
Comment on lines +215 to +216


Comment on lines +158 to +159
in patch scripts via `$row.base_price` (for the current row's values) and
`$old?.base_price` (for the previous row's values on UPDATE events).
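As a sketch of how the two variables interact on UPDATE, the function below receives the document plus `$row` and `$old` and appends a price-history entry only when `base_price` actually changed. In a real patch script the document is `this` and the body runs inside RavenDB; wrapping it in a standalone function, and the `BasePrice`/`PriceHistory` property names, are assumptions made so the logic can run on its own.

```javascript
// Standalone stand-in for a patch script body. In RavenDB the document
// would be `this`; here it is passed in explicitly for illustration.
function applyPricePatch(doc, $row, $old) {
  doc.BasePrice = $row.base_price;

  // $old may be absent (e.g. on INSERT), hence the optional chaining.
  const previous = $old?.base_price;
  if (previous !== undefined && previous !== $row.base_price) {
    doc.PriceHistory = doc.PriceHistory ?? [];
    doc.PriceHistory.push({ from: previous, to: $row.base_price });
  }
  return doc;
}
```

Replaying the same UPDATE would append the entry twice, which is the kind of case the patching page's idempotency guidance is meant to address.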
Comment on lines +182 to +188
* **PrimaryKeyColumns** — Used to match items within the parent's array for UPDATE and DELETE
* **JoinColumns** — Foreign key referencing the parent's `PrimaryKeyColumns`

The `JoinColumns` must exactly match the parent's `PrimaryKeyColumns`:

| Parent PK | Required JoinColumns | Valid? |
|-----------|---------------------|--------|
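
The rule can be checked mechanically before a task starts. The sketch below assumes that "exactly match" means the same number of columns, paired by position with the parent's `PrimaryKeyColumns` (the child's column names may differ, e.g. `order_id` referencing a parent `id`); `validateJoinColumns` is a hypothetical helper, and the table on the page remains the authoritative statement of the rule.

```javascript
// Hypothetical pre-flight check for an embedded table's JoinColumns.
// Assumption: joinColumns[i] references parentPrimaryKey[i].
function validateJoinColumns(parentPrimaryKey, joinColumns) {
  if (joinColumns.length === 0) {
    return { valid: false, reason: "JoinColumns is empty" };
  }
  if (joinColumns.length !== parentPrimaryKey.length) {
    return {
      valid: false,
      reason: `expected ${parentPrimaryKey.length} join column(s), got ${joinColumns.length}`,
    };
  }
  // Pair each child column with the parent key column it references.
  const pairs = joinColumns.map((child, i) => [child, parentPrimaryKey[i]]);
  return { valid: true, pairs };
}
```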
Comment on lines +62 to +64
**Document ID generation:** `{CollectionName}/{pk1}/{pk2}/...`
A row with `id = 42` and collection `Orders` becomes document `Orders/42`.
A composite PK `(region, id)` with values `(US, 42)` becomes `Orders/US/42`.
</TabItem>
</Tabs>
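The ID format amounts to joining the collection name and the primary-key values, in key-column order, with `/`. A minimal sketch, assuming values use their default string form and that no escaping is applied to values that themselves contain `/` (the excerpt does not say how CDC Sink handles that case); `buildDocumentId` is a hypothetical name.

```javascript
// Hypothetical helper mirroring the {CollectionName}/{pk1}/{pk2}/... format.
// Primary-key values must be passed in key-column order.
function buildDocumentId(collectionName, primaryKeyValues) {
  return [collectionName, ...primaryKeyValues.map(String)].join("/");
}

console.log(buildDocumentId("Orders", [42]));       // Orders/42
console.log(buildDocumentId("Orders", ["US", 42])); // Orders/US/42
```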

With `customer_id = 42`, the document gets `"Customer": "Customers/42"`.
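The reference value uses the same `{Collection}/{key}` shape as a document ID, built from the foreign-key column rather than the primary key. The sketch below builds the whole document for a row with `id = 1` and `customer_id = 42`; the `linked` descriptor (`property`, `collection`, `columns`) is a hypothetical shape for illustration, not the `CdcSinkLinkedTableConfig` API, and the `Id`/`CustomerId` property names are assumed to come from column mappings.

```javascript
// Hypothetical sketch: build a document with a linked reference from one row.
function buildWithLink(row, columnMap, linked) {
  const doc = {};
  for (const [column, property] of Object.entries(columnMap)) {
    doc[property] = row[column];
  }
  // The reference is the target collection plus the FK values, joined by "/".
  const keyParts = linked.columns.map((c) => String(row[c]));
  doc[linked.property] = [linked.collection, ...keyParts].join("/");
  return doc;
}

const doc = buildWithLink(
  { id: 1, customer_id: 42 },
  { id: "Id", customer_id: "CustomerId" },
  { property: "Customer", collection: "Customers", columns: ["customer_id"] }
);
// doc → { Id: 1, CustomerId: 42, Customer: "Customers/42" }
```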
Comment on lines +70 to +75
{`\{
"Id": 1,
"CustomerId": 42,
"Customer": "Customers/42"
\}
`}
Comment on lines +59 to +60
The maximum time the task will remain in fallback mode before reporting an error
is controlled by the `CdcSink.MaxFallbackTimeInSec` configuration key.
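The interplay between retries and `CdcSink.MaxFallbackTimeInSec` can be sketched as a decision made after each failed attempt: while time remains in the fallback window, wait with exponential backoff; once the window is used up, report an error. The initial delay, the cap on a single wait, and the function name are assumptions for illustration; the actual values and backoff formula are not given in this excerpt.

```javascript
// Hypothetical decision function, evaluated after each failed attempt to
// reach the source. attempt counts consecutive failures (1 for the first).
function onSourceFailure(secondsInFallback, attempt, maxFallbackTimeInSec,
                         initialDelaySec = 1, maxDelaySec = 60) {
  if (secondsInFallback >= maxFallbackTimeInSec) {
    // Fallback window exhausted: stop retrying and surface an error.
    return { state: "error" };
  }
  // Exponential backoff: 1, 2, 4, 8, ... seconds, capped at maxDelaySec.
  const delay = Math.min(initialDelaySec * 2 ** (attempt - 1), maxDelaySec);
  // Never schedule a retry past the end of the fallback window.
  const retryInSec = Math.min(delay, maxFallbackTimeInSec - secondsInFallback);
  return { state: "fallback", retryInSec };
}
```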
Comment on lines +78 to +80
* **Exceeded fallback timeout** — the source was unreachable for longer than
`CdcSink.MaxFallbackTimeInSec`. The task moves to error state after this timeout.
Restore connectivity and re-enable the task.
