diff --git a/docs/server/ongoing-tasks/cdc-sink/_category_.json b/docs/server/ongoing-tasks/cdc-sink/_category_.json new file mode 100644 index 0000000000..9edf48317b --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/_category_.json @@ -0,0 +1 @@ +{"position": 2, "label": "CDC Sink"} diff --git a/docs/server/ongoing-tasks/cdc-sink/api-reference.mdx b/docs/server/ongoing-tasks/cdc-sink/api-reference.mdx new file mode 100644 index 0000000000..c5074617f6 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/api-reference.mdx @@ -0,0 +1,222 @@ +--- +title: "CDC Sink: API Reference" +sidebar_label: API Reference +description: "Documents the Client API operations for creating, updating, querying, toggling, and deleting CDC Sink tasks programmatically." +sidebar_position: 11 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: API Reference + + + +* CDC Sink tasks can be created, updated, and managed programmatically using + the RavenDB Client API. + +* In this page: + * [Add a CDC Sink Task](#add-a-cdc-sink-task) + * [Update a CDC Sink Task](#update-a-cdc-sink-task) + * [Get Task Info](#get-task-info) + * [Toggle Task State](#toggle-task-state) + * [Delete a Task](#delete-a-task) + + + +--- + +## Add a CDC Sink Task + +Use `AddCdcSinkOperation` to create a new CDC Sink task: + + + + +{`var config = new CdcSinkConfiguration +\{ + Name = "OrdersSync", + ConnectionStringName = "MyPostgresConnection", + Tables = new List + \{ + new CdcSinkTableConfig + \{ + CollectionName = "Orders", + SourceTableName = "orders", + PrimaryKeyColumns = new List \{ "id" \}, + Columns = + [ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "customer_name", Name = "CustomerName" \}, + new CdcColumnMapping() \{ Column = "total", Name = "Total" \}, + ] + \} + \} +\}; + +var result = await store.Maintenance.SendAsync( + new AddCdcSinkOperation(config)); + +long taskId = result.TaskId; +`} + + + + +`AddCdcSinkOperationResult`: + +| Property | Type | Description | +|----------|------|-------------| +| `TaskId` | `long` | Assigned task ID | +| `RaftCommandIndex` | `long` | Raft index of the command | + +--- + +## Update a CDC Sink Task + +Use `UpdateCdcSinkOperation` to modify an existing task. +Pass the full updated configuration including the `TaskId`: + + + + +{`config.TaskId = taskId; // Must be set +config.Tables.Add(new CdcSinkTableConfig +\{ + CollectionName = "Customers", + SourceTableName = "customers", + PrimaryKeyColumns = new List \{ "id" \}, + Columns = + [ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "name", Name = "Name" \}, + new CdcColumnMapping() \{ Column = "email", Name = "Email" \}, + ] +\}); + +await store.Maintenance.SendAsync( + new UpdateCdcSinkOperation(taskId, config)); +`} + + + + + +**PostgreSQL — table changes affect the publication:** + +The replication slot and publication names are fixed at task creation and do not +change when you update the task. However, the **publication** controls which tables +CDC Sink receives changes for, so table changes require database-level updates: + +* **Adding tables** — If CDC Sink has the necessary permissions, it will automatically + run `ALTER PUBLICATION ADD TABLE ;` when the task is updated. If it + does not have permissions, a database administrator must run this manually. +* **Removing tables** — `ALTER PUBLICATION DROP TABLE ;` must be + run manually by a database administrator. CDC Sink does not remove tables from the + publication automatically. + +If the slot and publication were auto-named, a database administrator can look up +the current names from the task error log or from PostgreSQL: + +```sql +SELECT slot_name FROM pg_replication_slots WHERE slot_name LIKE 'rvn_cdc_s_%'; +SELECT pubname FROM pg_publication WHERE pubname LIKE 'rvn_cdc_p_%'; +``` + +See [Initial Setup](./postgres/initial-setup.mdx) for +guidance on slot and publication management. + +Dropped or orphaned slots and publications must be cleaned up manually by the +database administrator. +See [Cleanup and Maintenance](./postgres/cleanup-and-maintenance.mdx). + + +--- + +## Get Task Info + +Use `GetOngoingTaskInfoOperation` to retrieve the current state of a CDC Sink task: + + + + +{`var taskInfo = await store.Maintenance.SendAsync( + new GetOngoingTaskInfoOperation(taskId, OngoingTaskType.CdcSink)); +`} + + + + +--- + +## Toggle Task State + +Pause or resume a CDC Sink task using `ToggleOngoingTaskStateOperation`: + + + + +{`// Pause the task +await store.Maintenance.SendAsync( + new ToggleOngoingTaskStateOperation(taskId, OngoingTaskType.CdcSink, disable: true)); + +// Resume the task +await store.Maintenance.SendAsync( + new ToggleOngoingTaskStateOperation(taskId, OngoingTaskType.CdcSink, disable: false)); +`} + + + + + +**PostgreSQL:** Pausing a CDC Sink task stops the replication slot from being consumed. +PostgreSQL retains WAL segments for unconsumed slots, so pausing for an extended period +causes WAL to accumulate on disk. Monitor disk usage if a task is paused for more than +a short time. +See [Monitoring PostgreSQL](./postgres/monitoring-postgres.mdx). + +**SQL Server:** Pausing a CDC Sink task for an extended period (days) may cause SQL Server +to trim the CDC change tables, removing rows that have not yet been consumed. A short +pause is safe. If the task is paused long enough that trimming occurs, CDC Sink will +report an error on resume — the task may need to be recreated rather than simply resumed. + + +--- + +## Delete a Task + +Use `DeleteOngoingTaskOperation` to delete a CDC Sink task: + + + + +{`await store.Maintenance.SendAsync( + new DeleteOngoingTaskOperation(taskId, OngoingTaskType.CdcSink)); +`} + + + + + +Deleting the task in RavenDB does **not** drop the replication slot or publication +in the source database. These must be cleaned up manually by the database administrator. +See [Cleanup and Maintenance](./postgres/cleanup-and-maintenance.mdx). + + +--- + +## Related Articles + +### CDC Sink + +- [Configuration Reference](./configuration-reference.mdx) +- [Overview](./overview.mdx) + +### Client API + +- [What are Operations](../../../client-api/operations/what-are-operations) +- [Ongoing Task Operations](../../../client-api/operations/maintenance/ongoing-tasks/ongoing-task-operations) diff --git a/docs/server/ongoing-tasks/cdc-sink/attachment-handling.mdx b/docs/server/ongoing-tasks/cdc-sink/attachment-handling.mdx new file mode 100644 index 0000000000..208e03d497 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/attachment-handling.mdx @@ -0,0 +1,199 @@ +--- +title: "CDC Sink: Attachment Handling" +sidebar_label: Attachment Handling +description: "Explains how to store binary or text SQL columns as RavenDB attachments using CdcColumnType.Attachment on root and embedded table configurations." +sidebar_position: 9 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Attachment Handling + + + +* SQL columns — binary (`bytea`, `varbinary`, etc.) or text (`text`, `nvarchar`, etc.) + — can be stored as RavenDB **attachments** by setting `Type = CdcColumnType.Attachment` + on the column mapping entry. + +* This applies to both root tables and embedded tables. + +* In this page: + * [Root Table Attachments](#root-table-attachments) + * [Embedded Table Attachments](#embedded-table-attachments) + * [Attachment Naming](#attachment-naming) + * [Attachment Lifecycle](#attachment-lifecycle) + + + +--- + +## Root Table Attachments + +Given a SQL table with a binary column: + + + + +{`CREATE TABLE files ( + id SERIAL PRIMARY KEY, + filename TEXT NOT NULL, + mime_type TEXT NOT NULL, + content BYTEA NOT NULL -- binary attachment +); +`} + + + + +Set `Type = CdcColumnType.Attachment` on the `content` column entry: + + + + +{`new CdcSinkTableConfig +\{ + CollectionName = "Files", + SourceTableName = "files", + PrimaryKeyColumns = ["id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "filename", Name = "Filename" \}, + new CdcColumnMapping() \{ Column = "mime_type", Name = "MimeType" \}, + new CdcColumnMapping() + \{ + Column = "content", Name = "file", + Type = CdcColumnType.Attachment + \}, + ] +\} +`} + + + + +The `content` column is stored as an attachment named `"file"` on the document. +The `Name` value becomes the attachment name. The content type is determined by the +SQL column type: `application/octet-stream` for binary columns, `text/plain` for +text columns. + +--- + +## Embedded Table Attachments + +Binary columns on embedded tables are stored as attachments on the **parent** document. +The attachment name is automatically prefixed to ensure uniqueness. + +Given a parent `products` table and a child `photos` table: + + + + +{`CREATE TABLE products ( + product_id SERIAL PRIMARY KEY, + name TEXT NOT NULL +); + +CREATE TABLE photos ( + product_id INT NOT NULL REFERENCES products(product_id), + photo_num INT NOT NULL, + caption TEXT, + thumbnail BYTEA, -- binary attachment + PRIMARY KEY (product_id, photo_num) +); +`} + + + + +Configure the embedded table with `Type = CdcColumnType.Attachment` on the binary column: + + + + +{`new CdcSinkEmbeddedTableConfig +\{ + SourceTableName = "photos", + PropertyName = "Photos", + PrimaryKeyColumns = ["photo_num"], + JoinColumns = ["product_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "photo_num", Name = "PhotoNum" \}, + new CdcColumnMapping() \{ Column = "caption", Name = "Caption" \}, + new CdcColumnMapping() + \{ + Column = "thumbnail", Name = "thumb", + Type = CdcColumnType.Attachment + \}, + ] +\} +`} + + + + +A photo with `photo_num = 1` creates an attachment named `"Photos/1/thumb"` on the +parent document. The prefix `"Photos/1/"` is generated from the `PropertyName` and +the primary key value. + +--- + +## Attachment Naming + +**Root table attachments:** + +The attachment name is exactly the `Name` value on the mapping entry with +`Type = CdcColumnType.Attachment`. + +``` +Column = "content", Name = "file", Type = CdcColumnType.Attachment +→ Attachment name: "file" +``` + +**Embedded table attachments:** + +The attachment name is prefixed with `{PropertyName}/{pkValue}/`: + +``` +PropertyName = "Photos" +PrimaryKeyColumns = ["photo_num"] → photo_num = 1 +Column = "thumbnail", Name = "thumb", Type = CdcColumnType.Attachment +→ Attachment name: "Photos/1/thumb" +``` + +For composite primary keys, all key values are joined: + +``` +PrimaryKeyColumns = ["date", "seq"] → date='2024-01', seq=3 +→ Attachment name: "Photos/2024-01/3/thumb" +``` + +--- + +## Attachment Lifecycle + +* **INSERT** — Attachment is created on the document +* **UPDATE** — Attachment is replaced with the new binary data +* **DELETE (embedded item)** — All attachments for that item are automatically removed + from the parent document +* **DELETE (root document)** — Document and all its attachments are deleted + +--- + +## Related Articles + +### CDC Sink + +- [Column Mapping](./column-mapping.mdx) +- [Embedded Tables](./embedded-tables.mdx) +- [Configuration Reference](./configuration-reference.mdx) + +### Document Extensions + +- [Attachments Overview](../../../document-extensions/attachments/overview) diff --git a/docs/server/ongoing-tasks/cdc-sink/column-mapping.mdx b/docs/server/ongoing-tasks/cdc-sink/column-mapping.mdx new file mode 100644 index 0000000000..33f36503d8 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/column-mapping.mdx @@ -0,0 +1,238 @@ +--- +title: "CDC Sink: Column Mapping" +sidebar_label: Column Mapping +description: "Explains how to control which SQL columns appear in RavenDB documents and under what property names using the Columns list of CdcColumnMapping entries." +sidebar_position: 5 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Column Mapping + + + +* Column mapping controls which SQL columns appear in the RavenDB document, + under what property names, and how they are stored. It uses a `Columns` list + of `CdcColumnMapping` objects (not a dictionary). + +* In this page: + * [Mapping Columns to Properties](#mapping-columns-to-properties) + * [Column Types](#column-types) + * [Unmapped Columns](#unmapped-columns) + * [JSON Columns](#json-columns) + * [Validation Rules](#validation-rules) + * [Schema (Source Table Schema)](#schema-source-table-schema) + + + +--- + +## Mapping Columns to Properties + +`Columns` is a `List` where each entry maps a SQL column +to a RavenDB document property: + + + + +{`Columns = +[ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "customer_name", Name = "CustomerName" \}, + new CdcColumnMapping() \{ Column = "order_date", Name = "OrderDate" \}, + new CdcColumnMapping() \{ Column = "total_amount", Name = "TotalAmount" \}, +] +`} + + + + +**`Column`:** SQL column name (case-insensitive match against the column names in CDC events) +**`Name`:** Property name in the RavenDB document + + +The primary key column(s) do not need to be mapped. When included in `Columns`, +they become a regular document property. When omitted, the PK values are still used +to build the document ID — they just won't appear as a named property. + +Including the PK in the mapping is generally useful so the document carries its own +identifier, but it is not required. + + +--- + +## Column Types + +Each `CdcColumnMapping` entry has an optional `Type` property of type `CdcColumnType` +that controls how the SQL value is stored in RavenDB: + + + + +{`Columns = +[ + new CdcColumnMapping() \{ Column = "order_id", Name = "OrderId" \}, + // Parsed as a native JSON value + new CdcColumnMapping() + \{ + Column = "metadata", Name = "Metadata", + Type = CdcColumnType.Json + \}, + // Stored as an attachment + new CdcColumnMapping() + \{ + Column = "receipt", Name = "receipt.pdf", + Type = CdcColumnType.Attachment + \}, +] +`} + + + + +**`CdcColumnType.Default`** — Standard SQL-to-JSON type conversion. This is the default when `Type` is omitted. +Type conversions are source-database-specific. For PostgreSQL: + +* `smallint` / `integer` / `oid` → `long` +* `bigint` → `long` +* `real` → `float` +* `double precision` → `double` +* `numeric` / `decimal` → `decimal` +* `boolean` → `bool` +* `date` → `DateOnly` +* `timestamp` / `timestamptz` → `DateTime` (UTC) +* `uuid` → `string` +* `varchar` / `text` / `char` → `string` +* `json` / `jsonb` → `string` (raw JSON text, not parsed) +* SQL arrays → JSON arrays +* `vector` (pgvector) → `float[]` + +See [Patching: $row Column Types](./patching.mdx#row-column-types) for the full reference table. + +**`CdcColumnType.Json`** — Parses the string value as a native JSON value (object, array, +string, number, boolean, etc.) in the document. Use for PostgreSQL `json`/`jsonb` or +SQL Server `nvarchar(max)` with JSON content. + +**`CdcColumnType.Attachment`** — Stores the raw column value as a RavenDB attachment. +`byte[]` → binary (`application/octet-stream`), `string` → UTF-8 text (`text/plain`), +`float[]`/`double[]` → raw vector data. The `Name` field becomes the attachment name. + +--- + +## Unmapped Columns + +Columns not listed in `Columns` are **not stored** in the document, but they +are available in patch scripts via `$row`. + +This allows you to use data for computations without permanently storing raw SQL values: + + + + +{`Columns = +[ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "name", Name = "Name" \}, + // base_price and tax_rate are NOT mapped — won't appear in document +], +Patch = "this.FinalPrice = $row.base_price * (1 + $row.tax_rate);" +`} + + + + +In this example, `base_price` and `tax_rate` are available during the patch but +not stored as document properties. Only the computed `FinalPrice` is stored. + + +**Naming context:** +Property names (the `Name` values in `Columns`) become properties on the RavenDB +document — accessible as `this.FinalPrice` inside a patch script. + +Column names (the `Column` values in `Columns`, plus any unmapped columns) are accessible +in patch scripts via `$row.base_price` (for the current row's values) and +`$old?.base_price` (for the previous row's values on UPDATE events). + + +--- + +## JSON Columns + +SQL columns that contain JSON data (e.g., PostgreSQL `jsonb`, SQL Server `nvarchar` +with JSON content) can be parsed into native JSON objects or arrays using +`CdcColumnType.Json`: + + + + +{`Columns = +[ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "name", Name = "Name" \}, + new CdcColumnMapping() + \{ + Column = "metadata", Name = "Metadata", + Type = CdcColumnType.Json + \}, +] +`} + + + + +Without `CdcColumnType.Json`, the JSON column value would be stored as a raw string. +With it, the value is parsed and stored as a nested object or array in the document. + +--- + +## Validation Rules + +The following rules apply to both root table and embedded table column mappings: + +- Each `Column` name must be unique within a table (no duplicate SQL column entries). +- Each `Name` must be unique within a table (no duplicate property or attachment names). +- Both `Column` and `Name` are required and must be non-empty strings. +- At least one entry is required — an empty `Columns` list is a validation error. + +--- + +## Schema (Source Table Schema) + +`SourceTableSchema` specifies the SQL schema containing the table. It defaults to +`"public"` for PostgreSQL if omitted. + + + + +{`new CdcSinkTableConfig +\{ + SourceTableSchema = "sales", // Table is in the "sales" schema + SourceTableName = "orders", + // ... +\} +`} + + + + +For most PostgreSQL setups using the default `public` schema, this can be omitted. + +--- + +## Related Articles + +### CDC Sink + +- [Schema Design](./schema-design.mdx) +- [Patching](./patching.mdx) +- [Attachment Handling](./attachment-handling.mdx) +- [Configuration Reference](./configuration-reference.mdx) + +### PostgreSQL + +- [Patching: $row Column Types](./patching.mdx#row-column-types) diff --git a/docs/server/ongoing-tasks/cdc-sink/configuration-reference.mdx b/docs/server/ongoing-tasks/cdc-sink/configuration-reference.mdx new file mode 100644 index 0000000000..ac255fa74f --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/configuration-reference.mdx @@ -0,0 +1,229 @@ +--- +title: "CDC Sink: Configuration Reference" +sidebar_label: Configuration Reference +description: "Documents all configuration classes used to define a CDC Sink task, including CdcSinkConfiguration, table configs, CdcColumnMapping, and related types." +sidebar_position: 10 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Configuration Reference + + + +* This page documents all configuration classes used to define a CDC Sink task. + +* In this page: + * [CdcSinkConfiguration](#cdcsinkconfiguration) + * [CdcSinkPostgresSettings](#cdcsinkpostgressettings) + * [CdcSinkTableConfig](#cdcsinktableconfig) + * [CdcSinkEmbeddedTableConfig](#cdcsinkembeddedtableconfig) + * [CdcSinkLinkedTableConfig](#cdcsinklinkedtableconfig) + * [CdcColumnMapping](#cdccolumnmapping) + * [CdcColumnType](#cdccolumntype) + * [CdcSinkOnDeleteConfig](#cdcsinkondeleteconfig) + * [CdcSinkRelationType](#cdcsinkrelationtype) + * [REST API Endpoints](#rest-api-endpoints) + + + +--- + +## CdcSinkConfiguration + +The top-level configuration object for a CDC Sink task. + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `Name` | `string` | ✓ | Unique task name | +| `ConnectionStringName` | `string` | ✓ | Name of the SQL connection string (uses the existing `SqlConnectionString` type — the same as SQL ETL). The connection string's `FactoryName` determines the source database engine. | +| `Tables` | `List` | ✓ | Root table configurations (at least one required) | +| `Postgres` | `CdcSinkPostgresSettings` | | PostgreSQL-specific settings (slot and publication names) | +| `SkipInitialLoad` | `bool` | | When `true`, skip the initial full-table scan and start streaming CDC changes immediately. Only applies on first startup — has no effect once the initial load has completed. Default: `false` | +| `Disabled` | `bool` | | Pause the task without deleting it. Default: `false` | +| `MentorNode` | `string` | | Preferred cluster node for execution | +| `PinToMentorNode` | `bool` | | Pin the task to the mentor node. Default: `false` | +| `TaskId` | `long` | | Set by the server on creation | + +**Supported `FactoryName` values (set on the connection string):** + +| FactoryName | Database Engine | +|-------------|-----------------| +| `Npgsql` | PostgreSQL | +| `System.Data.SqlClient` or `Microsoft.Data.SqlClient` | SQL Server | +| `MySql.Data.MySqlClient` or `MySqlConnector.MySqlConnectorFactory` | MySQL / MariaDB | + +--- + +## CdcSinkPostgresSettings + +PostgreSQL-specific settings. Assigned to `CdcSinkConfiguration.Postgres`. +Leave `null` for non-PostgreSQL connections. + +| Property | Type | Description | +|----------|------|-------------| +| `SlotName` | `string` | Name of the PostgreSQL logical replication slot. If omitted on creation, auto-generated as `rvn_cdc_s_{guid}` on task creation. Immutable once set. Max 63 characters, alphanumeric and underscores only. | +| `PublicationName` | `string` | Name of the PostgreSQL publication. Same auto-generated naming and immutability rules as `SlotName`. | + +Setting these explicitly is useful when: +- A database administrator pre-creates the slot and publication with human-readable names +- Migrating from a previous CDC Sink task and reusing an existing slot +- Running multiple environments (dev/staging/prod) with predictable names + +If tables are missing from the publication, CDC Sink attempts `ALTER PUBLICATION ... ADD TABLE` +automatically on startup (requires sufficient permissions). + +See [Initial Setup](./postgres/initial-setup.mdx) for details. + + +For embedded tables where the join columns are not part of the primary key, the +PostgreSQL table must have `REPLICA IDENTITY` configured so that DELETE events include +the join column values. See [REPLICA IDENTITY](./postgres/replica-identity.mdx). + + +--- + +## CdcSinkTableConfig + +Configures a root table — one SQL table mapped to one RavenDB collection. + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `CollectionName` | `string` | ✓ | RavenDB collection name (e.g., `"Orders"`) | +| `SourceTableName` | `string` | ✓ | SQL table name (e.g., `"orders"`) | +| `SourceTableSchema` | `string` | | SQL schema name. Default: `"public"` | +| `PrimaryKeyColumns` | `List` | ✓ | SQL columns used for document ID generation | +| `Columns` | `List` | ✓ | Column mappings — each entry defines a SQL column, its document property name, and how to store it | +| `Patch` | `string` | | JavaScript patch for INSERT and UPDATE | +| `OnDelete` | `CdcSinkOnDeleteConfig` | | Delete behavior. Default: delete document | +| `EmbeddedTables` | `List` | | Nested table configurations | +| `LinkedTables` | `List` | | Foreign key reference configurations | +| `Disabled` | `bool` | | Skip this table. Default: `false` | + +--- + +## CdcSinkEmbeddedTableConfig + +Configures a table whose rows are embedded as nested objects within a parent document. + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `SourceTableName` | `string` | ✓ | SQL table name | +| `SourceTableSchema` | `string` | | SQL schema name. Default: `"public"` | +| `PropertyName` | `string` | ✓ | Property name in the parent document (e.g., `"Lines"`) | +| `Type` | `CdcSinkRelationType` | ✓ | `Array`, `Map`, or `Value` | +| `JoinColumns` | `List` | ✓ | FK columns referencing parent's `PrimaryKeyColumns` | +| `PrimaryKeyColumns` | `List` | ✓ | PK columns for matching items on UPDATE/DELETE | +| `Columns` | `List` | ✓ | Column mappings — each entry defines a SQL column, its document property name, and how to store it | +| `Patch` | `string` | | JavaScript patch on **parent** document for INSERT/UPDATE | +| `OnDelete` | `CdcSinkOnDeleteConfig` | | Delete behavior for embedded items | +| `CaseSensitiveKeys` | `bool` | | Case-sensitive PK matching. Default: `false` | +| `EmbeddedTables` | `List` | | Nested embedded tables | +| `LinkedTables` | `List` | | Linked tables within embedded items | + +--- + +## CdcSinkLinkedTableConfig + +Configures a foreign key reference that becomes a document ID in the parent document. + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `SourceTableName` | `string` | ✓ | SQL table name of the referenced table | +| `SourceTableSchema` | `string` | | SQL schema name. Default: `"public"` | +| `PropertyName` | `string` | ✓ | Property name in the parent document (e.g., `"Customer"`) | +| `LinkedCollectionName` | `string` | ✓ | Target RavenDB collection for ID generation (e.g., `"Customers"`) | +| `JoinColumns` | `List` | ✓ | FK columns used to build the referenced document ID | + +--- + +## CdcColumnMapping + +A single column mapping entry within a `Columns` list. + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `Column` | `string` | ✓ | SQL column name (case-insensitive) | +| `Name` | `string` | ✓ | Document property name (or attachment name when `Type = Attachment`) | +| `Type` | `CdcColumnType` | | How to store the value. Default: `Default` | + +--- + +## CdcColumnType + +Controls how a SQL column value is stored in RavenDB. + +| Value | Behavior | +|-------|----------| +| `Default` | Standard type conversion: int→long, decimal→double, date→DateOnly, timestamp→DateTime, uuid→string, varchar/text→string. SQL arrays→JSON arrays. JSON/JSONB stored as plain string. | +| `Json` | Parses the string value as a native JSON value (object, array, string, number, boolean, etc.) in the document. Use for PostgreSQL `json`/`jsonb` or SQL Server `nvarchar(max)` with JSON content. | +| `Attachment` | Stores the raw column value as a RavenDB attachment. `byte[]`→binary (`application/octet-stream`), `string`→UTF-8 text (`text/plain`), `float[]`/`double[]`→raw vector data. The `Name` field becomes the attachment name. | + +--- + +## CdcSinkOnDeleteConfig + +Controls how DELETE events are handled for a table or embedded table. + +| Property | Type | Default | Description | +|----------|------|---------|-------------| +| `Patch` | `string` | null | JavaScript patch that runs when a DELETE event arrives | +| `IgnoreDeletes` | `bool` | `false` | When `true`, skip the delete — document/item is kept | + +**Available patch variables for OnDelete:** + +* `this` — the document (root or parent for embedded) +* `$row` — all SQL columns from the DELETE event +* `$old` — the embedded item's last known state (for embedded tables) + +**Behavior matrix:** + +| IgnoreDeletes | Patch | Result | +|---------------|-------|--------| +| `false` | null | Normal delete (default) | +| `false` | set | Patch runs, then delete proceeds | +| `true` | null | DELETE discarded silently | +| `true` | set | Patch runs, delete skipped | + +--- + +## CdcSinkRelationType + +Specifies the structure of embedded data in the document. + +| Value | Description | +|-------|-------------| +| `Array` | One-to-many: stored as a JSON array. Items matched by PK for UPDATE/DELETE | +| `Map` | One-to-many: stored as a JSON object keyed by PK value(s) | +| `Value` | Many-to-one: stored as a single embedded object | + +--- + +## REST API Endpoints + +| Method | Path | Auth | Description | +|--------|------|------|-------------| +| PUT | `/databases/{db}/admin/cdc-sink` | DatabaseAdmin | Create CDC Sink task | +| PUT | `/databases/{db}/admin/cdc-sink?id={taskId}` | DatabaseAdmin | Update CDC Sink task | +| POST | `/databases/{db}/admin/cdc-sink/test` | DatabaseAdmin | Test a patch script | +| POST | `/databases/{db}/admin/cdc-sink/verify` | DatabaseAdmin | Verify source database connectivity and CDC setup | +| GET | `/databases/{db}/cdc-sink/performance` | ValidUser | Get performance stats | +| GET | `/databases/{db}/cdc-sink/performance/live` | ValidUser | WebSocket for live performance stats | + +--- + +## Related Articles + +### CDC Sink + +- [Schema Design](./schema-design.mdx) +- [Embedded Tables](./embedded-tables.mdx) +- [Linked Tables](./linked-tables.mdx) +- [Patching](./patching.mdx) +- [Delete Strategies](./delete-strategies.mdx) +- [API Reference](./api-reference.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/delete-strategies.mdx b/docs/server/ongoing-tasks/cdc-sink/delete-strategies.mdx new file mode 100644 index 0000000000..c8beb3d9a2 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/delete-strategies.mdx @@ -0,0 +1,290 @@ +--- +title: "CDC Sink: Delete Strategies" +sidebar_label: Delete Strategies +description: "Documents configurable behavior for DELETE events in CDC Sink, including archive, audit trail, and silent ignore patterns." +sidebar_position: 7 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Delete Strategies + + + +* CDC Sink provides configurable behavior for DELETE events through the + `CdcSinkOnDeleteConfig` object on both root and embedded table configurations. + +* By default, a DELETE event deletes the corresponding RavenDB document or removes + the embedded item. The `OnDelete` configuration changes this. + +* In this page: + * [Default Behavior](#default-behavior) + * [OnDelete Configuration](#ondelete-configuration) + * [Pattern: Archive](#pattern-archive) + * [Pattern: Audit Trail](#pattern-audit-trail) + * [Pattern: Silent Ignore](#pattern-silent-ignore) + * [OnDelete for Embedded Tables](#ondelete-for-embedded-tables) + * [Behavior Summary](#behavior-summary) + * [DELETE Routing and REPLICA IDENTITY](#delete-routing-and-replica-identity) + + + +--- + +## Default Behavior + +When `OnDelete` is `null` (or not set): + +* **Root table DELETE** — The corresponding RavenDB document is deleted +* **Embedded table DELETE (Array/Map)** — The item is removed from the parent document's array or map +* **Embedded table DELETE (Value)** — The `Value`-type property on the parent document is set to `null` + +--- + +## OnDelete Configuration + +`CdcSinkOnDeleteConfig` has two properties: + + + + +{`public class CdcSinkOnDeleteConfig +\{ + // JavaScript patch that runs when a DELETE event arrives + // Available variables: this, $row, $old + public string Patch \{ get; set; \} + + // When true, the delete is not applied — document/item is kept + public bool IgnoreDeletes \{ get; set; \} +\} +`} + + + + +* If `Patch` is set, it runs **before** the delete decision is made +* If `IgnoreDeletes = true`, the deletion is skipped after the patch runs +* If `IgnoreDeletes = false` (default), the deletion proceeds after the patch runs + + +When `IgnoreDeletes = true`, CDC Sink skips the delete but still **saves any changes +made to `this`** in the patch. This means a patch can modify the document (e.g., +set an `Archived` flag) and those changes are written to RavenDB even though the +document is not deleted. + + +--- + +## Pattern: Archive + +Keep the document in RavenDB but mark it as deleted. The patch runs to mark it, +and `IgnoreDeletes = true` prevents the actual deletion: + + + + +{`OnDelete = new CdcSinkOnDeleteConfig +\{ + IgnoreDeletes = true, + Patch = @" + this.Archived = true; + this.ArchivedAt = new Date().toISOString(); + " +\} +`} + + + + +The document remains in RavenDB with `Archived = true`. Queries can filter on this +field to exclude archived records. + +--- + +## Pattern: Audit Trail (Root Document) + +When a root document is deleted, write an audit record to a separate RavenDB +collection using `put()` before the deletion proceeds. + +The document is deleted after the patch runs, so any writes to `this` are lost. +Use `put()` to create a document that survives the deletion: + + + + +{`OnDelete = new CdcSinkOnDeleteConfig +\{ + // IgnoreDeletes defaults to false — delete proceeds after patch + Patch = @" + // Create a permanent audit record in a separate collection + put('DeletedOrders/' + this.Id, \{ + OriginalId: this.Id, + Customer: this.Customer, + Total: this.Total, + DeletedAt: new Date().toISOString() + \}); + " +\} +`} + + + + +The patch creates a document in `DeletedOrders/` before `this` is deleted. +The audit record persists permanently. + +--- + +## Pattern: Silent Ignore + +Discard DELETE events without running any patch. Use this for append-only data +where deletes should never result in document removal: + + + + +{`OnDelete = new CdcSinkOnDeleteConfig +\{ + IgnoreDeletes = true + // No patch — DELETE events are silently discarded +\} +`} + + + + +--- + +## OnDelete for Embedded Tables + +The same `OnDelete` configuration works on embedded tables. For embedded tables: + +* `Patch` runs on the **parent document** (not the embedded item) +* `$old` contains the embedded item's last known state before deletion +* `IgnoreDeletes = true` prevents the item from being removed from the array/map + +**Example: Keep deleted items in an audit array** + +Rather than removing a deleted line item from the array, move it to a separate +`DeletedLines` property: + + + + +{`new CdcSinkEmbeddedTableConfig +\{ + SourceTableName = "order_lines", + PropertyName = "Lines", + // ... + OnDelete = new CdcSinkOnDeleteConfig + \{ + IgnoreDeletes = true, + Patch = @" + // Remove from active Lines + this.Lines = (this.Lines || []) + .filter(l => l.LineId !== $old?.LineId); + + // Add to DeletedLines audit array + this.DeletedLines = this.DeletedLines || []; + this.DeletedLines.push(\{ + LineId: $old?.LineId, + Product: $old?.Product, + Quantity: $old?.Quantity, + DeletedAt: new Date().toISOString() + \}); + " + \} +\} +`} + + + + +With `IgnoreDeletes = true`, CDC Sink does not automatically remove the item — +the patch takes full control of both the `Lines` array and the `DeletedLines` audit trail. + +**Example: Run a patch but still remove the item** + +To run some logic on DELETE while still removing the item from the array, +use `IgnoreDeletes = false` (default) — CDC Sink handles the removal, and the patch +runs before it: + + + + +{`new CdcSinkEmbeddedTableConfig +\{ + SourceTableName = "order_lines", + PropertyName = "Lines", + // ... + OnDelete = new CdcSinkOnDeleteConfig + \{ + // IgnoreDeletes = false (default) — CDC Sink removes the item after patch + Patch = @" + // Log total for the line being removed + this.RemovedTotal = (this.RemovedTotal || 0) + + ($old?.UnitPrice || 0) * ($old?.Qty || 0); + " + \} +\} +`} + + + + +The `Lines` item is removed by CDC Sink. The patch only needs to handle the +side-effect logic. + +--- + +## Behavior Summary + +| IgnoreDeletes | Patch | Behavior | +|---------------|-------|----------| +| `false` | null | Normal delete (default) | +| `false` | set | Patch runs, then delete proceeds | +| `true` | null | DELETE event discarded silently | +| `true` | set | Patch runs, then delete is skipped | + +--- + +## DELETE Routing and REPLICA IDENTITY + +For CDC Sink to route a DELETE event to the correct document or embedded item, +the source database must include the necessary column values in the DELETE event. + +For embedded tables where the join column is not in the primary key, the source +database may need additional configuration. + + +The REPLICA IDENTITY requirement described below is **PostgreSQL-specific**. +Other databases may have different requirements or may not need any extra +configuration for DELETE routing. + + +See: + +* [REPLICA IDENTITY](./postgres/replica-identity.mdx) (PostgreSQL) + +**Skipping REPLICA IDENTITY requirements:** Set `OnDelete.IgnoreDeletes = true` +to discard DELETE events for an embedded table entirely. This skips the REPLICA +IDENTITY check, since delete routing is no longer needed. + +--- + +## Related Articles + +### CDC Sink + +- [Patching](./patching.mdx) +- [Embedded Tables](./embedded-tables.mdx) +- [Configuration Reference](./configuration-reference.mdx) + +### PostgreSQL + +- [REPLICA IDENTITY](./postgres/replica-identity.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/embedded-tables.mdx b/docs/server/ongoing-tasks/cdc-sink/embedded-tables.mdx new file mode 100644 index 0000000000..dde63a2bef --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/embedded-tables.mdx @@ -0,0 +1,277 @@ +--- +title: "CDC Sink: Embedded Tables" +sidebar_label: Embedded Tables +description: "Covers configuration options, nesting constraints, and how embedded items are updated and deleted in CDC Sink." +sidebar_position: 3 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Embedded Tables + + + +* Embedded tables allow CDC Sink to nest SQL table data inside a parent document as + arrays, maps, or single objects. + +* This page covers configuration options, nesting constraints, and how embedded items + are updated and deleted. + +* In this page: + * [Basic Configuration](#basic-configuration) + * [Join Columns and Primary Key Interaction](#join-columns-and-primary-key-interaction) + * [Matching Items on Update and Delete](#matching-items-on-update-and-delete) + * [Deep Nesting](#deep-nesting) + * [Attachments on Embedded Items](#attachments-on-embedded-items) + + + +--- + +## Basic Configuration + +`CdcSinkEmbeddedTableConfig` is placed inside a root table's `EmbeddedTables` list: + + + + +{`new CdcSinkEmbeddedTableConfig +\{ + SourceTableSchema = "public", // SQL schema (optional) + SourceTableName = "order_lines", // SQL table name + PropertyName = "Lines", // Property name in parent document + Type = CdcSinkRelationType.Array, // Array, Map, or Value + JoinColumns = ["order_id"], // FK to parent's PrimaryKeyColumns + PrimaryKeyColumns = ["line_id"], // Used to match items on UPDATE/DELETE + Columns = + [ + new CdcColumnMapping() \{ Column = "line_id", Name = "LineId" \}, + new CdcColumnMapping() \{ Column = "product", Name = "Product" \}, + new CdcColumnMapping() \{ Column = "quantity", Name = "Quantity" \}, + ] +\} +`} + + + + +--- + +## Join Columns and Primary Key Interaction + +### Purpose of JoinColumns + +`JoinColumns` specifies the foreign key columns in the embedded table that reference +the parent's primary key. CDC Sink uses these columns to route each row to the correct +parent document. + +The `JoinColumns` values must exactly match the parent's `PrimaryKeyColumns`: + + + + +{`// Parent root table: +PrimaryKeyColumns = ["order_id"] + +// Embedded table - correct: +JoinColumns = ["order_id"] // References parent's PK column + +// Embedded table - INCORRECT: +JoinColumns = ["customer_id"] // Does not reference the parent's PK +`} + + + + +### DELETE Events + +For DELETE events, the source database must include the join column values so CDC Sink +can route the delete to the correct parent document. + +By default, many databases only include primary key columns in DELETE events. If the +join column is _not_ in the embedded table's primary key, additional source database +configuration is required. + +**PostgreSQL:** See [REPLICA IDENTITY](./postgres/replica-identity.mdx) +for how CDC Sink handles this automatically when it has sufficient permissions, or how +a DBA can configure it manually. + +### Avoiding REPLICA IDENTITY Requirements + +The cleanest solution is to include the join column in the embedded table's primary key: + + + + +{`-- SQL schema: +CREATE TABLE order_lines ( + order_id INT NOT NULL REFERENCES orders(id), + line_id INT NOT NULL, + product VARCHAR(200), + PRIMARY KEY (order_id, line_id) -- order_id in PK means DELETE events include it +); +`} + + + + + + + +{`// Configuration: +PrimaryKeyColumns = ["order_id", "line_id"] +JoinColumns = ["order_id"] +`} + + + + +With `order_id` in the primary key, DELETE events include it by default and no additional +source database configuration is needed. + +Alternatively, set `OnDelete.IgnoreDeletes = true` to skip delete routing entirely +if deletes on that embedded table don't need to be processed. +See [Delete Strategies](./delete-strategies.mdx). + +--- + +## Matching Items on Update and Delete + +When an UPDATE or DELETE arrives for an embedded row, CDC Sink must find the correct +item within the parent document's array or map. + +Items are matched by their full `PrimaryKeyColumns` composite key: + +* **INSERT** — New item appended to array / added to map +* **UPDATE** — Item found by PK match; mapped columns overwritten +* **DELETE** — Item found by PK match; removed from array/map (or OnDelete behavior applied) + +**Composite PKs** work the same way — all PK columns must match: + + + + +{`PrimaryKeyColumns = ["invoice_date", "invoice_seq"] + +// UPDATE event for (invoice_date='2024-01-15', invoice_seq=3) +// → Finds and updates the item where both columns match +`} + + + + +**Case sensitivity:** By default, PK matching is case-insensitive. Set +`CaseSensitiveKeys = true` on `CdcSinkEmbeddedTableConfig` if your keys are +case-sensitive. + +--- + +## Deep Nesting + +Embedded tables can contain their own `EmbeddedTables`, creating hierarchies +with multiple levels. + +**Key constraint:** Every descendant table must carry the **root table's primary key** +as a denormalized column. This is required because CDC Sink routes each row to its +root document in a single pass. + +**Example: Company → Departments → Employees** + +The `employees` table must have `company_id` (the root PK) even though it only +directly joins to `departments`: + + + + +{`CREATE TABLE employees ( + company_id INT NOT NULL, -- Denormalized root PK + dept_id INT NOT NULL, + emp_id INT NOT NULL, + PRIMARY KEY (company_id, dept_id, emp_id) +); +`} + + + + + + + +{`// Configuration for the employees embedded table: +JoinColumns = ["company_id", "dept_id"] // Root PK + parent PK +PrimaryKeyColumns = ["emp_id"] +`} + + + + +**Why is the root PK required?** + +When a CDC Sink event arrives for an `employees` row, CDC Sink needs to: + +1. Find the root document: `Companies/\{company_id\}` +2. Navigate to the correct `Departments` array item: `Departments.find(d => d.dept_id == dept_id)` +3. Add or update the `Employees` array item: `Employees.find(e => e.emp_id == emp_id)` + +Without `company_id` in the event, CDC Sink cannot identify the root document +without an additional lookup, which is not supported. + +--- + +## Attachments on Embedded Items + +Binary columns from embedded tables can be stored as RavenDB attachments by adding +them to `Columns` with `Type = CdcColumnType.Attachment`. + + + + +{`new CdcSinkEmbeddedTableConfig +\{ + SourceTableName = "photos", + PropertyName = "Photos", + PrimaryKeyColumns = ["photo_num"], + JoinColumns = ["product_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "photo_num", Name = "PhotoNum" \}, + new CdcColumnMapping() \{ Column = "caption", Name = "Caption" \}, + new CdcColumnMapping() + \{ + Column = "thumbnail", Name = "thumb", + Type = CdcColumnType.Attachment + \}, + ] +\} +`} + + + + +A photo with `photo_num = 1` creates an attachment named `"Photos/1/thumb"` on the +parent document. The attachment name is prefixed with the embedded path and PK to +ensure uniqueness within the document. + +When the embedded item is deleted, its attachments are automatically removed. + +--- + +## Related Articles + +### CDC Sink + +- [Schema Design](./schema-design.mdx) +- [Linked Tables](./linked-tables.mdx) +- [Delete Strategies](./delete-strategies.mdx) +- [Attachment Handling](./attachment-handling.mdx) +- [Configuration Reference](./configuration-reference.mdx) + +### PostgreSQL + +- [REPLICA IDENTITY](./postgres/replica-identity.mdx) +- [REPLICA IDENTITY Manual Setup](./postgres/replica-identity-manual-setup.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/failover-and-consistency.mdx b/docs/server/ongoing-tasks/cdc-sink/failover-and-consistency.mdx new file mode 100644 index 0000000000..5298f4d3a4 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/failover-and-consistency.mdx @@ -0,0 +1,172 @@ +--- +title: "CDC Sink: Failover and Consistency" +sidebar_label: Failover and Consistency +description: "Explains how CDC Sink handles node failover, what consistency guarantees it provides, and why patches must be designed to be idempotent." +sidebar_position: 13 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Failover and Consistency + + + +* This page explains how CDC Sink handles node failover, what consistency guarantees + it provides, and why patches must be designed to be idempotent. + +* In this page: + * [State Storage](#state-storage) + * [Failover Behavior](#failover-behavior) + * [Re-Reading Changes](#re-reading-changes) + * [Consistency Guarantee](#consistency-guarantee) + * [Idempotency Requirements](#idempotency-requirements) + + + +--- + +## State Storage + +CDC Sink persists its progress as a **document in RavenDB** alongside your data. + +This state document records: + +* The last acknowledged position in the source database's change log +* Per-table initial load progress (which tables have completed, and the last key scanned) + +Like any RavenDB document, this state document is subject to **normal replication behavior**. +Different nodes in a cluster may have different versions of this document at any given moment. + +--- + +## Failover Behavior + +When the cluster elects a new mentor node for a CDC Sink task, the new node reads +the **replicated state** — which may be older than the work the previous mentor +completed but had not yet replicated. + +**Example:** + + + + +{`Before failure: +├─ Node A (Mentor) — Processed up to position X, state document not yet replicated +├─ Node B (Replica) — State document shows position X-100 (older, replicated state) +└─ Node C (Replica) — State document shows position X-100 + +Node A crashes. +Node B is elected new mentor. +Node B reads its state document: position X-100. +Node B resumes CDC Sink from position X-100. +`} + + + + +Changes between positions X-100 and X may already be in RavenDB (written by Node A), +or they may not be (if Node A crashed before writing). Node B will re-read and re-apply +them either way. + +--- + +## Re-Reading Changes + +**CDC Sink guarantees that no changes are lost, but some changes may be re-read.** + +When the new mentor resumes from an older position: + +* Changes that were already processed and fully replicated — re-applied (idempotent merge) +* Changes that were processed but not yet replicated — applied for the "first time" on new node +* New changes not yet seen — applied normally + +Re-reading is normal and expected. The document merge strategy ensures that re-applying +an INSERT or UPDATE with the same values is safe — properties are simply overwritten +with the same values they already have. + + +Patches that are not idempotent can produce incorrect results when a change is +re-read after a failover. See [Idempotency Requirements](#idempotency-requirements) below. + + +--- + +## Consistency Guarantee + +CDC Sink provides **at-least-once delivery** with **eventual consistency**: + +* ✓ No changes are lost +* ✓ Changes are applied in source transaction order +* ✓ After failover, progress resumes from the last replicated state +* ✓ Idempotent patches ensure re-reads are safe +* ✗ No "exactly once" guarantee — the same change may be applied more than once + +This is the same model used by other ongoing tasks in RavenDB. + +--- + +## Idempotency Requirements + +A patch is **idempotent** if applying it multiple times produces the same result as +applying it once. + +**Column mapping is always idempotent** — overwriting a property with the same value +is a no-op. + +**Patches that use absolute values are idempotent:** + + + + +{`// Idempotent — always sets to the current SQL value +this.Status = $row.is_active ? 'Active' : 'Inactive'; +this.ViewCount = $row.view_count; +`} + + + + +**Patches that increment are NOT idempotent:** + + + + +{`// NOT idempotent — increments again on re-read +this.ViewCount = (this.ViewCount || 0) + 1; +`} + + + + +**Delta patches using $old are idempotent** because `$old` reflects the previous +embedded item state, not the document's accumulated value: + + + + +{`// Idempotent — delta is recomputed correctly on re-read +const oldAmount = $old?.Amount || 0; +const newAmount = $row.amount || 0; +this.RunningTotal = (this.RunningTotal || 0) + (newAmount - oldAmount); +`} + + + + +On re-read, `$old` still reflects the state before the update, and `$row` still +reflects the new state — so the delta is the same as on the first read. + +--- + +## Related Articles + +### CDC Sink + +- [How It Works](./how-it-works.mdx) +- [Patching](./patching.mdx) +- [Monitoring](./monitoring.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/how-it-works.mdx b/docs/server/ongoing-tasks/cdc-sink/how-it-works.mdx new file mode 100644 index 0000000000..94c3047800 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/how-it-works.mdx @@ -0,0 +1,245 @@ +--- +title: "CDC Sink: How It Works" +sidebar_label: How It Works +description: "Describes the internal operation of CDC Sink — how it connects to the source database, loads initial data, streams changes, and handles failover." +sidebar_position: 1 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: How It Works + + + +* This page describes the internal operation of CDC Sink — how it connects to the + source database, loads initial data, streams changes, and handles failover. + +* Understanding these mechanics is important when designing patches, planning for + failover, and setting up monitoring. + +* In this page: + * [Startup and Verification](#startup-and-verification) + * [PostgreSQL](#postgresql) + * [Initial Load](#initial-load) + * [Change Streaming](#change-streaming) + * [Transaction Ordering](#transaction-ordering) + * [State Persistence and Failover](#state-persistence-and-failover) + * [Updating the Task Configuration](#updating-the-task-configuration) + * [Child Before Parent](#child-before-parent) + + + +--- + +## Startup and Verification + +When a CDC Sink task starts, it verifies that the source database is properly configured +before doing anything else. If any check fails, CDC Sink reports the exact issue and the +SQL an administrator needs to run to fix it. The task does not start until all checks pass. + +After verification, CDC Sink creates the necessary change-tracking infrastructure +in the source database, then begins the initial load. + +### PostgreSQL + +CDC Sink checks: + +* WAL level is set to `logical` +* The connecting user has sufficient privileges +* REPLICA IDENTITY is configured correctly for embedded tables that need delete routing + +See the [PostgreSQL Prerequisites Checklist](./postgres/prerequisites-checklist.mdx) +for the full list of requirements. + +--- + +## Initial Load + +Before streaming live changes, CDC Sink performs a full scan of every configured table +(root, embedded, and linked) using keyset pagination ordered by primary key. + +**Progress tracking:** Initial load progress is persisted per-table as a document in +RavenDB. If the task is restarted, it resumes from the last processed key rather than +re-scanning the entire table. + +**Batch pipelining:** While one batch is being written to RavenDB, the next batch is +being read from the source database, keeping both systems busy. + +**Ordering:** Tables are scanned in dependency order. Root tables are loaded first, +then embedded tables. This minimises the number of stub documents created (see +[Child Before Parent](#child-before-parent) below). + + +**Change tracking is set up before the initial load begins — but CDC streaming does +not start until after the initial load completes.** The sequence is: + +1. Change-tracking infrastructure is created in the source database (e.g., a replication + slot and publication for PostgreSQL, or CDC capture for SQL Server) +2. The full initial table scan runs — changes made during this time are captured by the + source database but not yet consumed +3. Once the initial load is complete, CDC streaming starts from the position captured in step 1 + +This guarantees no changes are missed: anything that happened during the initial load +is retained by the source database and will be processed immediately after. + +For very large databases this means the source database must retain sufficient change +history for the duration of the initial load (e.g., enough WAL disk space on PostgreSQL, +or a long enough retention window on SQL Server CDC tables). Plan accordingly before +starting the initial load on large tables. + + +--- + +## Change Streaming + +After the initial load completes, CDC Sink begins consuming changes from the source +database's change log, starting from the position captured **before** the initial load began. + +Changes arrive grouped by source database transaction, preserving the exact order +of operations. A transaction is only applied to RavenDB after it is fully committed +in the source database — partial transactions are never written. + +**Document merging:** When an UPDATE arrives, CDC Sink merges the new column values +onto the existing RavenDB document. Properties that are not part of the column mapping +are preserved. This allows RavenDB-side annotations and computed fields to coexist with +CDC-managed properties. + +See [Property Retention](./property-retention.mdx) for details. + +--- + +## Transaction Ordering + +CDC Sink preserves the full order of operations within a source database transaction. +If a single transaction performs multiple operations on the same row, all operations +are applied in order. + +**Example:** A source transaction that does: + + + + +{`BEGIN; +INSERT INTO items (id, name) VALUES (1, 'Alpha'); +UPDATE items SET name = 'Beta' WHERE id = 1; +DELETE FROM items WHERE id = 1; +INSERT INTO items (id, name) VALUES (1, 'Gamma'); +UPDATE items SET name = 'Delta' WHERE id = 1; +COMMIT; +`} + + + + +CDC Sink applies all five operations in order. The final document state is `name = 'Delta'`. + +Multiple documents modified in the same transaction are also applied atomically within +a single RavenDB batch. + +--- + +## State Persistence and Failover + +### State Storage + +CDC Sink persists its progress as a **document in RavenDB**, alongside your data. +This document records: + +* The last acknowledged position in the source database's change log (LSN for PostgreSQL) +* Per-table initial load progress (which tables completed, and the last key scanned) + +Like any RavenDB document, this state document is subject to normal replication behavior. +Different nodes in a cluster may have different versions of this document at any point in time. + +### Failover Behavior + +When the cluster elects a new mentor node for the CDC Sink task, the new node reads +the **replicated** state document — which may be older than the work the previous +mentor had completed but not yet replicated. + +The new mentor resumes from that replicated state. This means: + +* **No data is lost** — CDC Sink resumes from a known position and the source database + retains all changes from that position onward +* **Some changes may be re-read** — Changes between the replicated state and the + previous mentor's actual progress will be processed again + +Re-reading is normal and expected. The document merge strategy means that re-applying +the same INSERT or UPDATE is safe — column values are simply overwritten with the same values. + + +Patches that are not idempotent can produce incorrect results if the same change is +re-read after a failover. Design patches to handle re-processing safely. +See [Patching](./patching.mdx) for guidance. + + +--- + +## Updating the Task Configuration + +When you update a CDC Sink task configuration — for example, adding or removing columns +from the `Columns` list, changing a `Patch` script, or modifying embedded table +configuration — the changes take effect **only for new CDC events going forward**. + +CDC Sink does not retroactively re-process existing documents. Documents that were +already created retain their current structure. This means: + +* **Adding a new column mapping** — Existing documents will not have the new property. + Only documents created or updated by a subsequent CDC event will include it. +* **Removing a column mapping** — Existing documents retain the property. It will no + longer be updated by CDC events, but it is not removed from existing documents. +* **Changing a patch script** — The new patch runs on future events only. Existing + documents reflect the results of the previous patch. +* **Adding an embedded or linked table** — Existing parent documents will not have the + new embedded data until a CDC event for the embedded table row arrives. + + + +To apply configuration changes to **all** documents (not just new events), delete +the CDC Sink task and recreate it. The new task will perform a fresh initial load, +re-processing all rows with the updated configuration. + +If you only need to backfill a specific property on existing documents, you can also +use a RavenDB [patch by query](../../../client-api/operations/patching/set-based) +to update existing documents independently of the CDC Sink task. + + + +--- + + + +Tables are scanned one at a time during the initial load. This means a child table can +contain rows that reference a parent row inserted *after* the parent table scan had +already completed. For example: Orders are fully scanned, then while scanning OrderLines, +an OrderLine is encountered that points to an Order that was inserted after the Orders +scan finished. + +In this case, CDC Sink creates a **stub document** for the parent containing only the +embedded child data. When CDC streaming begins after the initial load, it picks up the +missing parent row and merges its columns onto the stub document. + +The final document will contain both the parent fields and all embedded items. No data +is lost — the brief intermediate state is resolved automatically once streaming starts. + +--- + +## Related Articles + +### CDC Sink + +- [Overview](./overview.mdx) +- [Schema Design](./schema-design.mdx) +- [Patching](./patching.mdx) +- [Property Retention](./property-retention.mdx) +- [Failover and Consistency](./failover-and-consistency.mdx) + +### PostgreSQL + +- [Prerequisites Checklist](./postgres/prerequisites-checklist.mdx) +- [Initial Setup](./postgres/initial-setup.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/linked-tables.mdx b/docs/server/ongoing-tasks/cdc-sink/linked-tables.mdx new file mode 100644 index 0000000000..e0ecc863fa --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/linked-tables.mdx @@ -0,0 +1,143 @@ +--- +title: "CDC Sink: Linked Tables" +sidebar_label: Linked Tables +description: "Explains how linked tables create document ID references in parent documents by converting foreign key values into RavenDB document IDs." +sidebar_position: 4 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Linked Tables + + + +* A **linked table** creates a document ID reference in the parent document rather + than embedding data. A foreign key value becomes a RavenDB document ID pointing + to a related collection. + +* In this page: + * [Basic Configuration](#basic-configuration) + * [Composite Foreign Keys](#composite-foreign-keys) + * [Linked vs Embedded](#linked-vs-embedded) + + + +--- + +## Basic Configuration + +`CdcSinkLinkedTableConfig` is placed inside a root table's (or embedded table's) `LinkedTables` list: + + + + +{`new CdcSinkTableConfig +\{ + CollectionName = "Orders", + SourceTableName = "orders", + PrimaryKeyColumns = ["id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "customer_id", Name = "CustomerId" \}, + ], + LinkedTables = + [ + new CdcSinkLinkedTableConfig + \{ + SourceTableName = "customers", + PropertyName = "Customer", // Property name in document + LinkedCollectionName = "Customers", // Target collection for ID + JoinColumns = ["customer_id"] // FK used to build the document ID + \} + ] +\} +`} + + + + +With `customer_id = 42`, the Orders document gets: + + + + +{`\{ + "Id": 1, + "CustomerId": 42, + "Customer": "Customers/42" +\} +`} + + + + +The `Customer` property is a RavenDB document ID. Load the referenced document using +[includes](../../../client-api/session/loading-entities#load-with-includes) to avoid +a second network call. + +--- + +## Composite Foreign Keys + +When the target table has a composite primary key, the linked reference includes +all parts of that key: + + + + +{`new CdcSinkLinkedTableConfig +\{ + SourceTableName = "customers", + PropertyName = "Customer", + LinkedCollectionName = "Customers", + JoinColumns = ["customer_region", "customer_id"] // Must match Customers PK order +\} +`} + + + + +With `customer_region = 'US'` and `customer_id = 42`, the document gets: + +``` +"Customer": "Customers/US/42" +``` + +--- + +## Linked vs Embedded + +| Consideration | Embedded | Linked | +|--------------|----------|--------| +| Data stored | Full nested object/array inside document | Document ID reference only | +| Load pattern | Single document load | Load parent + include references | +| Document size | Grows with embedded items | Parent document stays small | +| Updates to referenced data | Reflected via CDC | Reflected via CDC on the referenced collection | +| Independence | Child has no meaning without parent | Referenced entity exists independently | +| Typical use | Orders own LineItems | Orders reference Customers | + +**Choose embedded** when the child data belongs to the parent and is always read +together with it. + +**Choose linked** when the referenced entity is independently meaningful and shared +across many parents, and you want to avoid duplicating data. + +--- + +## Related Articles + +### CDC Sink + +- [Schema Design](./schema-design.mdx) +- [Embedded Tables](./embedded-tables.mdx) +- [Configuration Reference](./configuration-reference.mdx) + +### Client API + +- [Load with Includes](../../../client-api/session/loading-entities#load-with-includes) diff --git a/docs/server/ongoing-tasks/cdc-sink/monitoring.mdx b/docs/server/ongoing-tasks/cdc-sink/monitoring.mdx new file mode 100644 index 0000000000..010c0f4381 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/monitoring.mdx @@ -0,0 +1,134 @@ +--- +title: "CDC Sink: Monitoring" +sidebar_label: Monitoring +description: "Explains how to monitor a CDC Sink task — its running state, progress, fallback behavior, and statistics available through Management Studio and the API." +sidebar_position: 12 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Monitoring + + + +* This page explains how to monitor a CDC Sink task — its running state, progress, + fallback behavior, and statistics available through the Management Studio and API. + +* In this page: + * [Task State](#task-state) + * [Fallback Mode](#fallback-mode) + * [Statistics](#statistics) + * [Notifications](#notifications) + + + +--- + +## Task State + +A CDC Sink task can be in one of the following states: + +| State | Description | +|-------|-------------| +| `Active` | Running normally — streaming changes or waiting for new changes | +| `ActiveByAnotherNode` | Another cluster node is the mentor; this node is a replica | +| `Disabled` | Manually disabled via Studio or API | +| `Error` | The task encountered an error and stopped | +| `FallbackMode` | Connection to the source database was lost; retrying | +| `NotOnThisNode` | This node does not hold the task | + +The current state is visible in the **Ongoing Tasks** view in the Management Studio. + +--- + +## Fallback Mode + +When CDC Sink cannot reach the source database, it enters **fallback mode** rather +than failing immediately. + +In fallback mode: +* The task continues retrying the connection at regular intervals +* No changes are applied while the connection is down +* The task automatically resumes streaming once the source is reachable again + +The maximum time the task will remain in fallback mode before reporting an error +is controlled by the `CdcSink.MaxFallbackTimeInSec` configuration key. +See [Server Configuration](./server-configuration.mdx). + +--- + +## Statistics + +CDC Sink exposes runtime statistics through the `GetOngoingTaskInfoOperation`: + + + + +{`var taskInfo = await store.Maintenance.SendAsync( + new GetOngoingTaskInfoOperation(taskId, OngoingTaskType.CdcSink)); +`} + + + + +The returned object includes: + +| Field | Description | +|-------|-------------| +| `TaskState` | Current state of the task | +| `MentorNode` | Configured preferred node | +| `ResponsibleNode` | Node currently running the task | +| `Error` | Error message if the task is in error state | + +Detailed per-table statistics — including row counts and last processed position — +are available through the Management Studio's ongoing tasks detail view. + +--- + +## Notifications + +CDC Sink participates in RavenDB's standard alert system. If the task enters an +error state or fallback mode, an alert is raised and visible in: + +* The **Notification Center** in the Management Studio (bell icon) +* The cluster's alert log + +Alerts include the error message and which table or operation triggered the failure, +making it straightforward to diagnose the root cause. + +--- + +## Error Handling + +CDC Sink tolerates individual document processing errors — each failure is logged +and recorded, but processing continues for the remaining documents in the batch. +The CDC position advances as long as the error ratio stays within acceptable bounds. + +**Error threshold:** +When cumulative errors reach **100** and also exceed the number of successes, +the batch throws and the task enters fallback mode. + +**Fallback mode on error:** +The task enters exponential backoff, starting at 5 seconds and doubling on each +retry, up to the limit set by `CdcSink.MaxFallbackTimeInSec` (default 15 minutes). +See [Server Configuration](./server-configuration.mdx) and [Fallback Mode](#fallback-mode). + +**Patch errors:** +Patch errors — including `MaxSteps exceeded` — fail only the affected document. +The rest of the batch continues normally. + +--- + +## Related Articles + +### CDC Sink + +- [How It Works](./how-it-works.mdx) +- [Failover and Consistency](./failover-and-consistency.mdx) +- [Server Configuration](./server-configuration.mdx) +- [Troubleshooting](./troubleshooting.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/mysql/_category_.json b/docs/server/ongoing-tasks/cdc-sink/mysql/_category_.json new file mode 100644 index 0000000000..8ae06a8e97 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/mysql/_category_.json @@ -0,0 +1,4 @@ +{ + "position": 18, + "label": "MySQL" +} diff --git a/docs/server/ongoing-tasks/cdc-sink/mysql/overview.mdx b/docs/server/ongoing-tasks/cdc-sink/mysql/overview.mdx new file mode 100644 index 0000000000..f0ae2aee1e --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/mysql/overview.mdx @@ -0,0 +1,110 @@ +--- +title: "CDC Sink for MySQL: Overview" +sidebar_label: Overview +description: "CDC Sink support for MySQL and MariaDB — using binlog replication to stream changes into RavenDB in real time." +sidebar_position: 0 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for MySQL: Overview + + + +* CDC Sink supports MySQL and MariaDB as source databases using + [binary log (binlog) replication](https://dev.mysql.com/doc/refman/8.0/en/replication.html). + +* MySQL streams row-level changes in real time through the binlog. CDC Sink consumes + this stream and applies INSERTs, UPDATEs, and DELETEs to RavenDB documents as they + occur. + + + +--- + +## Connection String + +Use a `SqlConnectionString` with `FactoryName` set to `"MySql.Data.MySqlClient"` or +`"MySqlConnector.MySqlConnectorFactory"`: + + + + +{`store.Maintenance.Send(new PutConnectionStringOperation( + new SqlConnectionString + \{ + Name = "MySqlConnection", + FactoryName = "MySqlConnector.MySqlConnectorFactory", + ConnectionString = "Server=myserver;Port=3306;Database=mydb;User=ravendb_cdc;Password=..." + \})); +`} + + + + +--- + +## Prerequisites + +MySQL must have binary logging enabled with row-based format: + + + + +{`# my.cnf / my.ini +[mysqld] +log-bin = mysql-bin +binlog_format = ROW +binlog_row_image = FULL +server-id = 1 +`} + + + + +The CDC Sink user needs the following privileges: + + + + +{`GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'ravendb_cdc'@'%'; +GRANT SELECT ON mydb.* TO 'ravendb_cdc'@'%'; +`} + + + + +* `REPLICATION SLAVE` — required to read the binlog stream +* `REPLICATION CLIENT` — required to query binlog status +* `SELECT` — required for the initial full-table load + +--- + +## Streaming Behavior + +Like PostgreSQL, MySQL CDC Sink uses streaming replication (via the binlog), not +polling. The `CdcSink.PollIntervalInSec` server configuration setting does not +apply to MySQL. + +--- + +## Configuration + +The CDC Sink task configuration is the same across all database engines. +See [Configuration Reference](../configuration-reference.mdx) for full details. + +The `SourceTableSchema` corresponds to the MySQL database name. + +--- + +## Related Articles + +### CDC Sink + +- [Overview](../overview.mdx) +- [Configuration Reference](../configuration-reference.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/overview.mdx b/docs/server/ongoing-tasks/cdc-sink/overview.mdx new file mode 100644 index 0000000000..7a2adb1909 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/overview.mdx @@ -0,0 +1,152 @@ +--- +title: "CDC Sink: Overview" +sidebar_label: Overview +description: "An introduction to CDC Sink, the RavenDB ongoing task that reads Change Data Capture streams from a relational database and writes documents into RavenDB." +sidebar_position: 0 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Overview + + + +* **CDC Sink** is a RavenDB ongoing task that reads **Change Data Capture (CDC)** + streams from a relational database and writes the resulting documents into RavenDB. + +* CDC Sink is the reverse of ETL: instead of pushing data _from_ RavenDB _to_ SQL, + CDC Sink _pulls_ data _from_ SQL _into_ RavenDB. + The relational database is the source of truth; RavenDB receives a continuously-updated + document model derived from it. + +* CDC Sink maps normalized relational tables into rich, nested RavenDB documents — + automatically and in real time. + +* Supported source databases: + * **PostgreSQL** — via logical replication + * **SQL Server** — via native CDC (polling change tables) + * **MySQL / MariaDB** — via binlog replication + +* In this page: + * [Why Use CDC Sink](#why-use-cdc-sink) + * [How It Works](#how-it-works) + * [Task Lifecycle](#task-lifecycle) + * [Licensing](#licensing) + + + +--- + +## Why Use CDC Sink + +CDC Sink solves the problem of moving data from a relational database into RavenDB +without requiring changes to the source system. + +* **Migrate from SQL to RavenDB** + Transform normalized SQL tables (orders, order_lines, customers) into rich RavenDB + documents where an Order contains embedded LineItems and a reference to the Customer — + automatically and continuously, without changing your SQL application. + +* **Build a read-optimized view** + Your transactional system uses a relational database, but your API layer needs + denormalized documents. CDC Sink creates and maintains those documents without + touching your existing application. + +* **Gradual migration** + Keep your SQL application running while RavenDB documents are built in the background. + Applications can start reading from RavenDB while writes still go to the relational database. + +* **Event-driven side effects** + Using JavaScript patches, every INSERT, UPDATE, or DELETE in the source database can + trigger custom logic in RavenDB — computing derived fields, maintaining running totals, + or writing custom transformations. + +--- + +## How It Works + +A CDC Sink task continuously reads changes from the source relational database and +applies them to RavenDB documents. + +### Initial Load + +When a CDC Sink task starts for the first time, it performs a full scan of every +configured table using keyset pagination. This populates RavenDB with the current +state of the data before streaming begins. + +Initial load progress is persisted per-table. If the task is restarted, it resumes +from where it left off rather than re-scanning. + +### Change Streaming + +After the initial load, CDC Sink switches to streaming changes in real time. +Changes are grouped into transactions, preserving the exact order of operations +from the source database. Partial transactions are never written to RavenDB — +all changes within a source database transaction are applied together. + +### Document Model + +The relational model is mapped to RavenDB documents through configuration: + +* **Root tables** map to RavenDB collections (one document per row) +* **Embedded tables** become nested arrays or objects within parent documents +* **Linked tables** become document ID references + +See [Schema Design](./schema-design.mdx) for details. + +--- + +## Task Lifecycle + +1. **Create** — Define the task in Studio or via the Client API + Specify the connection string, table mappings, and transformation options + +2. **Verify** — CDC Sink verifies the source database is properly configured + Checks permissions, replication prerequisites, and table configuration + +3. **Initial Load** — Full table scan populates RavenDB with current data + Progress is tracked per-table and persists across restarts + +4. **Stream** — Real-time change streaming begins + All INSERTs, UPDATEs, and DELETEs are applied to RavenDB documents as they occur + +5. **Monitor** — View statistics, errors, and progress in Studio + +6. **Retire** — Delete the task in RavenDB when no longer needed + PostgreSQL artifacts (replication slot, publication) must be cleaned up by + the database administrator separately + +--- + +## Licensing + + +CDC Sink is available on an **Enterprise** license. + + +Learn more about licensing [here](../../../licensing/overview). + +--- + +## Related Articles + +### CDC Sink + +- [How It Works](./how-it-works.mdx) +- [Schema Design](./schema-design.mdx) +- [Configuration Reference](./configuration-reference.mdx) + +### PostgreSQL + +- [Prerequisites Checklist](./postgres/prerequisites-checklist.mdx) +- [Initial Setup](./postgres/initial-setup.mdx) + +### Server + +- [ETL Basics](../etl/basics) +- [Queue Sink Overview](../queue-sink/overview) diff --git a/docs/server/ongoing-tasks/cdc-sink/patching.mdx b/docs/server/ongoing-tasks/cdc-sink/patching.mdx new file mode 100644 index 0000000000..2b7bf52eef --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/patching.mdx @@ -0,0 +1,505 @@ +--- +title: "CDC Sink: Patching" +sidebar_label: Patching +description: "Explains how JavaScript patch scripts work in CDC Sink, covering available variables, patch scope, common scenarios, idempotency, and limitations." +sidebar_position: 6 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Patching + + + +* A **patch** is a JavaScript snippet that runs on a document after column mapping is applied. + Patches let you transform data, compute derived fields, maintain aggregates, load related + documents, and control document metadata. + +* Patches are defined per-table (root or embedded) and per-operation (INSERT/UPDATE + or DELETE via `OnDelete.Patch`). + +* In this page: + * [When Patches Run](#when-patches-run) + * [Available Variables](#available-variables) + * [$row and $old: Names and Types](#row-and-old-names-and-types) + * [$row Column Types](#row-column-types) + * [Patch Scope: Root vs Embedded](#patch-scope-root-vs-embedded) + * [Scenario: Column Transformation](#scenario-column-transformation) + * [Scenario: Aggregation with Embedded Tables](#scenario-aggregation-with-embedded-tables) + * [Scenario: Delta Computations](#scenario-delta-computations) + * [Scenario: Computed Derived Fields](#scenario-computed-derived-fields) + * [Scenario: Document Metadata](#scenario-document-metadata) + * [Scenario: Loading Related Documents](#scenario-loading-related-documents) + * [Idempotency and Failover](#idempotency-and-failover) + * [Step Limit](#step-limit) + * [Capabilities and Limitations](#capabilities-and-limitations) + + + +--- + +## When Patches Run + +| Event | Patch property | Runs? | +|-------|---------------|-------| +| INSERT | `Patch` | ✓ After column mapping | +| UPDATE | `Patch` | ✓ After column mapping, on merged document | +| DELETE (root table) | `OnDelete.Patch` | ✓ Before document is deleted | +| DELETE (embedded table) | `OnDelete.Patch` | ✓ On parent document before item is removed | + +The `Patch` property on a table handles INSERT and UPDATE. +The `OnDelete.Patch` on `CdcSinkOnDeleteConfig` handles DELETE separately. + +--- + +## Available Variables + +| Variable | Available | Type | Description | +|----------|-----------|------|-------------| +| `this` | Always | object | The document being modified (root or parent for embedded) | +| `$row` | Always | object | All SQL columns from the CDC event (mapped and unmapped) | +| `$old` | Always | object \| null | Previous state of the document (root patches) or embedded item (embedded patches). Null on INSERT. | +| `load(id)` | Always | function | Load a RavenDB document by ID | + +--- + +## $row and $old: Names and Types + +`$row` and `$old` look similar but represent fundamentally different things: + +| | `$row` | `$old` | +|--|--------|--------| +| **Contains** | The incoming SQL row from the CDC event | The previous state of the RavenDB document (or embedded item) | +| **Property names** | Original SQL column names (`customer_name`) | Mapped RavenDB property names (`CustomerName`) | +| **Value types** | Raw SQL types (see [$row Column Types](#row-column-types)) | Stored JSON types (whatever was last written to RavenDB) | +| **Available on INSERT** | ✓ | ✗ (null) | +| **Available on UPDATE** | ✓ | ✓ | +| **Available on DELETE** | ✓ | ✓ | + +Given this column mapping: + + + + +{`Columns = +[ + new CdcColumnMapping() \{ Column = "customer_name", Name = "CustomerName" \}, + new CdcColumnMapping() \{ Column = "amount", Name = "Amount" \}, +] +`} + + + + +In the patch script: + + + + +{`// $row — SQL column names, raw SQL types +$row.customer_name // string +$row.amount // number (from numeric/decimal column) + +// $old — mapped property names, stored JSON types +$old.CustomerName // string (as stored in RavenDB) +$old.Amount // number (as stored in RavenDB) + +// this — the document after column mapping has been applied for this event +this.CustomerName // string (just set from $row.customer_name) +this.Amount // number (just set from $row.amount) +`} + + + + + +Unmapped columns — those not listed in `Columns` — appear in `$row` but are never +written to the document and therefore never appear in `$old`. + + +--- + +## $row Column Types + +`$row` exposes all SQL columns using their **original column names** (not the mapped +RavenDB property names). The JavaScript type of each value depends on the source +database type. + +### PostgreSQL + +| PostgreSQL Type | JavaScript Type in `$row` | Notes | +|----------------|--------------------------|-------| +| `integer` / `bigint` | `number` | | +| `real` / `double precision` | `number` | | +| `numeric` | `number` | | +| `boolean` | `boolean` | | +| `date` / `timestamp` / `timestamptz` | `string` | Formatted date string | +| `uuid` | `string` | | +| `varchar` / `text` / `char` | `string` | | +| `bytea` | `string` | Base64-encoded | +| `json` / `jsonb` | `string` | Raw JSON text — call `JSON.parse()` to access as an object / array | +| `text[]` / `varchar[]` / `int[]` / etc. | `Array` | Elements are always strings, even for numeric array types | +| `vector` (pgvector) | `Array` | Elements are JavaScript numbers (floats) | + +**JSON/JSONB columns require explicit parsing:** + + + + +{`var meta = JSON.parse($row.metadata); +this.Priority = meta.priority; +this.Label = meta.label; +`} + + + + +**Array columns:** + + + + +{`this.TagCount = $row.tags.length; +this.FirstTag = $row.tags[0]; +this.Tags = $row.tags; + +// INT[] elements are strings — parse explicitly if numeric arithmetic is needed +var first = parseInt($row.scores[0], 10); +`} + + + + +**pgvector columns (elements are numbers, not strings):** + + + + +{`this.EmbeddingSum = $row.embedding.reduce(function(a, b) \{ return a + b; \}, 0); +this.Embedding = $row.embedding; +`} + + + + +--- + +## Patch Scope: Root vs Embedded + +**Root table patches** operate on the root document: + + + + +{`Patch = "this.Status = $row.is_active ? 'Active' : 'Inactive';" +`} + + + + +`this` is the Orders (or whatever root collection) document. + +**Embedded table patches** operate on the **parent** document, not the embedded item: + + + + +{`// Patch on order_lines embedded table +Patch = "this.TotalQuantity = (this.Lines || []).reduce((s, l) => s + l.Quantity, 0);" +`} + + + + +`this` is the parent Orders document. This lets you recompute parent-level aggregates +whenever an embedded item changes. + +--- + +## Scenario: Column Transformation + +Transform or combine SQL columns into RavenDB properties: + + + + +{`this.FullName = ($row.first_name + ' ' + $row.last_name).trim(); +this.Status = $row.is_active ? 'Active' : 'Inactive'; +this.CreatedAt = new Date($row.created_unix * 1000).toISOString(); +`} + + + + +--- + +## Scenario: Aggregation with Embedded Tables + +Recompute parent-level totals whenever an embedded item is added or updated. + + +If you use a `Patch` to maintain an aggregate, you **must** also provide an +`OnDelete.Patch` that reverses the aggregate when an item is deleted. +Without it, deletes will leave the aggregate in an incorrect state. + + + + + +{`new CdcSinkEmbeddedTableConfig +\{ + SourceTableName = "order_lines", + PropertyName = "Lines", + PrimaryKeyColumns = ["line_id"], + JoinColumns = ["order_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "line_id", Name = "LineId" \}, + new CdcColumnMapping() \{ Column = "quantity", Name = "Quantity" \}, + ], + + // Runs on INSERT and UPDATE — recomputes total from current Lines array + Patch = @" + this.TotalQuantity = (this.Lines || []) + .reduce((sum, line) => sum + (line.Quantity || 0), 0); + ", + + // REQUIRED: Runs on DELETE — recomputes after item is removed from array + OnDelete = new CdcSinkOnDeleteConfig + \{ + Patch = @" + this.TotalQuantity = (this.Lines || []) + .reduce((sum, line) => sum + (line.Quantity || 0), 0); + " + \} +\} +`} + + + + +The OnDelete patch runs after the item has already been removed from the array, +so re-summing `this.Lines` gives the correct post-deletion total. + +--- + +## Scenario: Delta Computations + +Use `$old` to compute a delta when an embedded item changes, keeping a running total +without recomputing from scratch. + + +Running total patches using `$old` must have a matching `OnDelete.Patch` that subtracts +the deleted item's value. Without it, deletes leave the running total incorrect. + + + + + +{`new CdcSinkEmbeddedTableConfig +\{ + SourceTableName = "invoice_lines", + PropertyName = "Lines", + PrimaryKeyColumns = ["line_id"], + JoinColumns = ["invoice_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "line_id", Name = "LineId" \}, + new CdcColumnMapping() \{ Column = "amount", Name = "Amount" \}, + ], + + // INSERT: $old is null, so delta = new amount (0 → new) + // UPDATE: $old has previous Amount, delta = new - old + Patch = @" + const oldAmount = $old?.Amount || 0; + const newAmount = $row.amount || 0; + this.RunningTotal = (this.RunningTotal || 0) + (newAmount - oldAmount); + ", + + // REQUIRED: Subtract the deleted item's amount using $old + OnDelete = new CdcSinkOnDeleteConfig + \{ + Patch = @" + const deletedAmount = $old?.Amount || 0; + this.RunningTotal = (this.RunningTotal || 0) - deletedAmount; + " + \} +\} +`} + + + + +The `$old` variable in the OnDelete patch contains the item's last known state before deletion. + +--- + +## Scenario: Computed Derived Fields + +Compute fields from unmapped columns that you don't want to store directly: + + + + +{`Columns = +[ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "name", Name = "Name" \}, +], +// base_price and tax_rate are NOT in Columns +Patch = @" + this.FinalPrice = $row.base_price * (1 + $row.tax_rate); + this.Discount = $row.is_vip ? $row.base_price * 0.1 : 0; +" +`} + + + + +--- + +## Scenario: Document Metadata + +Patches can set RavenDB document metadata, including expiration: + + + + +{`this['@metadata'] = this['@metadata'] || \{\}; + +if ($row.expires_at) \{ + this['@metadata']['@expires'] = new Date($row.expires_at).toISOString(); +\} + +this['@metadata']['SourceTable'] = 'orders'; +`} + + + + +--- + +## Scenario: Loading Related Documents + +Use `load()` to load a related RavenDB document and denormalize its data: + + + + +{`const customer = load('Customers/' + $row.customer_id); + +if (customer) \{ + this.CustomerName = customer.Name; + this.CustomerEmail = customer.Email; + this.CustomerTier = customer.Tier; +\} else \{ + // Document doesn't exist yet — race condition or not yet synced + this.CustomerName = null; + this.CustomerEmail = null; +\} +`} + + + + +`load()` returns `null` if the document does not exist or has not yet been created +by CDC Sink (race condition when multiple tables are loading in parallel). +Always check for null before accessing properties. + +**When to use `load()`:** + +* Denormalizing slowly-changing reference data (customer name, category, region) +* Capturing a snapshot of related data at insert time + +**Prefer linked tables** for simple foreign key references — they are cleaner and +do not have the null-handling complexity of `load()`. + +--- + +## Idempotency and Failover + +After a failover, CDC Sink resumes from the replicated state, which may be older +than the last work the previous mentor node completed. Some changes may be re-read +and re-applied. + +**Patches that are not idempotent can produce incorrect results when re-applied.** + + + + +{`// NOT idempotent — increments again on re-read +this.ViewCount = (this.ViewCount || 0) + 1; + +// Idempotent — absolute value from SQL source +this.ViewCount = $row.view_count; + +// Idempotent — delta via $old (both first-read and re-read produce same result) +const oldVal = $old?.Amount || 0; +const newVal = $row.amount || 0; +this.RunningTotal = (this.RunningTotal || 0) + (newVal - oldVal); +`} + + + + +Column mapping itself is idempotent — overwriting a property with the same value +is always safe. + +--- + +## Step Limit + +Patch scripts are bounded by a **step quota** rather than a time limit. Each operation +in the script (assignment, loop iteration, function call) consumes steps. + +The limit is controlled by the `Patching.MaxStepsForScript` configuration setting. +See [Patching Configuration](../../../server/configuration/patching-configuration). + +If a patch exceeds the limit, the CDC operation fails and is retried. + +Keep patches focused and efficient — prefer `.filter()` + `.reduce()` over nested loops. + +--- + +## Capabilities and Limitations + +**What patches can do:** + +* Access `this`, `$row`, `$old` +* Load related documents with `load(id)` +* Create or replace documents with `put(id, document)` +* Delete documents with `del(id)` +* Compute and transform property values +* Set and modify document metadata +* Conditional logic, loops, array methods +* Use built-in JavaScript: `Date`, `Math`, `JSON`, `Array` + +**What patches cannot do:** + +* Make HTTP calls or API requests +* Access the file system +* Load documents from other RavenDB databases + +--- + +## Related Articles + +### CDC Sink + +- [Delete Strategies](./delete-strategies.mdx) +- [Column Mapping](./column-mapping.mdx) +- [Property Retention](./property-retention.mdx) +- [Failover and Consistency](./failover-and-consistency.mdx) + +### PostgreSQL + +- [Column Mapping](./column-mapping.mdx) + +### Server + +- [Patching Configuration](../../../server/configuration/patching-configuration) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/_category_.json b/docs/server/ongoing-tasks/cdc-sink/postgres/_category_.json new file mode 100644 index 0000000000..697a3e48e5 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/_category_.json @@ -0,0 +1 @@ +{"position": 16, "label": "PostgreSQL"} \ No newline at end of file diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/cleanup-and-maintenance.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/cleanup-and-maintenance.mdx new file mode 100644 index 0000000000..538fb59942 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/cleanup-and-maintenance.mdx @@ -0,0 +1,210 @@ +--- +title: "CDC Sink for PostgreSQL: Cleanup and Maintenance" +sidebar_label: Cleanup and Maintenance +description: "How to identify and remove orphaned PostgreSQL replication slots and publications after a CDC Sink task is deleted, and how to avoid slot accumulation." +sidebar_position: 6 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: Cleanup and Maintenance + + + +* When a CDC Sink task is deleted from RavenDB, the associated PostgreSQL objects + — the replication slot and publication — are **not** automatically removed. + +* These must be cleaned up manually by a database administrator. + +* In this page: + * [Why Manual Cleanup Is Required](#why-manual-cleanup-is-required) + * [Finding Orphaned Slots and Publications](#finding-orphaned-slots-and-publications) + * [Dropping a Replication Slot](#dropping-a-replication-slot) + * [Dropping a Publication](#dropping-a-publication) + * [Too Many Replication Slots](#too-many-replication-slots) + * [Configuration Changes That Rename Slots](#configuration-changes-that-rename-slots) + + + +--- + +## Why Manual Cleanup Is Required + +An active replication slot prevents PostgreSQL from discarding WAL segments that +have not yet been consumed. If a slot is not being consumed (because the CDC Sink +task was deleted), PostgreSQL will accumulate WAL on disk indefinitely. + +This can lead to: + +* Disk space exhaustion on the PostgreSQL server +* Degraded performance as old WAL segments pile up + +A database administrator must drop unused replication slots. + + +**Why RavenDB does not drop slots automatically:** + +There are valid reasons to keep a slot after a task is deleted. For example, you +may want to create a new CDC Sink task that resumes from the same position (by +reusing the existing slot), or you may want to review what changes are pending +before cleaning up. The CDC Sink user may also not have the permissions required +to drop replication slots, even if it had the permissions to create them. + +For these reasons, slot and publication lifecycle management is the responsibility +of the database administrator. + + +--- + +## Finding Orphaned Slots and Publications + +List all CDC Sink replication slots: + + + + +{`SELECT slot_name, active, confirmed_flush_lsn +FROM pg_replication_slots +WHERE slot_name LIKE 'rvn_cdc_s_%'; +`} + + + + +An `active = false` slot is not being consumed. Compare the list to your active CDC +Sink tasks in RavenDB — any slot whose corresponding task no longer exists is orphaned. + +List all CDC Sink publications: + + + + +{`SELECT p.pubname, c.relname AS table_name +FROM pg_publication p +JOIN pg_publication_rel pr ON pr.prpubid = p.oid +JOIN pg_class c ON c.oid = pr.prrelid +WHERE p.pubname LIKE 'rvn_cdc_p_%' +ORDER BY p.pubname, c.relname; +`} + + + + +--- + +## Dropping a Replication Slot + + + + +{`SELECT pg_drop_replication_slot('rvn_cdc_s_'); +`} + + + + + +You cannot drop an active replication slot (one with `active = true`). The CDC Sink +task must be stopped or deleted in RavenDB before the slot can be dropped. + + +--- + +## Dropping a Publication + + + + +{`DROP PUBLICATION IF EXISTS "rvn_cdc_p_"; +`} + + + + +Publications are not consumed like slots — they do not accumulate data or hold WAL +segments. However, they should still be dropped to keep the database clean. + +--- + +## Too Many Replication Slots + +PostgreSQL limits the total number of replication slots to `max_replication_slots`. +If you exceed this limit, no new CDC Sink tasks can start (they will fail with a +connection error). + +Check how many slots are in use: + + + + +{`SELECT count(*) FROM pg_replication_slots; +SHOW max_replication_slots; +`} + + + + +To resolve this: + +1. Identify inactive slots: + + + + +{`SELECT slot_name, active +FROM pg_replication_slots +WHERE active = false; +`} + + + + +2. Drop slots for tasks that no longer exist: + + + + +{`SELECT pg_drop_replication_slot('rvn_cdc_s_'); +`} + + + + +3. If needed, increase `max_replication_slots` in `postgresql.conf` and restart + PostgreSQL. + +--- + +## Slot and Publication Names Are Immutable + +The replication slot name and publication name are generated **once at task creation** +using a GUID and stored with the task. They do not change when you modify the task — +adding or removing tables, renaming the task, or changing any other configuration +always reuses the same slot and publication. + +Attempting to specify a different `SlotName` or `PublicationName` when updating a +task results in an error. + +If you need to use a different slot or publication, delete the task and create a new one. +The old slot and publication from the deleted task must then be cleaned up manually — +see [Dropping a Replication Slot](#dropping-a-replication-slot) and +[Dropping a Publication](#dropping-a-publication) below. + +--- + +## Related Articles + +### CDC Sink for PostgreSQL + +- [Initial Setup](./initial-setup.mdx) +- [Monitoring PostgreSQL](./monitoring-postgres.mdx) +- [WAL Configuration](./wal-configuration.mdx) + +### CDC Sink + +- [API Reference](../api-reference.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/examples/_category_.json b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/_category_.json new file mode 100644 index 0000000000..10c3f5c05e --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/_category_.json @@ -0,0 +1 @@ +{"position": 10, "label": "Examples"} diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-complex-nesting.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-complex-nesting.mdx new file mode 100644 index 0000000000..9520887f56 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-complex-nesting.mdx @@ -0,0 +1,230 @@ +--- +title: "CDC Sink Example: Complex Nesting with Linked Tables" +sidebar_label: Complex Nesting with Linked Tables +description: "Multi-level embedded table structure combined with linked table references, representing a product catalog with variants, attributes, and a category reference." +sidebar_position: 3 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink Example: Complex Nesting with Linked Tables + + + +* This example shows a multi-level embedded table structure combined with linked + table references, representing a product catalog with variants, attributes, and + a category reference. + +* In this page: + * [Source Schema](#source-schema) + * [REPLICA IDENTITY Setup](#replica-identity-setup) + * [Task Configuration](#task-configuration) + * [Resulting Documents](#resulting-documents) + + + +## Source Schema + + + + +{`CREATE TABLE categories ( + category_id SERIAL PRIMARY KEY, + name TEXT NOT NULL +); + +CREATE TABLE products ( + product_id SERIAL PRIMARY KEY, + name TEXT NOT NULL, + category_id INT REFERENCES categories(category_id) +); + +CREATE TABLE product_variants ( + variant_id SERIAL PRIMARY KEY, + product_id INT NOT NULL REFERENCES products(product_id), + sku TEXT NOT NULL, + price NUMERIC(10,2) +); + +CREATE TABLE variant_attributes ( + attr_id SERIAL PRIMARY KEY, + -- denormalized root PK (required for deep nesting) + product_id INT NOT NULL REFERENCES products(product_id), + variant_id INT NOT NULL REFERENCES product_variants(variant_id), + attr_name TEXT NOT NULL, + attr_value TEXT NOT NULL +); +`} + + + + +## REPLICA IDENTITY Setup + +Both `product_variants` and `variant_attributes` have surrogate PKs with join +columns that are not part of their primary keys. Rather than `REPLICA IDENTITY FULL` +(which includes all columns), we use targeted unique indexes covering just the join +and PK columns: + + + + +{`-- product_variants: join column is product_id, PK is variant_id +CREATE UNIQUE INDEX product_variants_replica_idx + ON product_variants (product_id, variant_id); +ALTER TABLE product_variants + REPLICA IDENTITY USING INDEX product_variants_replica_idx; + +-- variant_attributes: join columns are product_id + variant_id, PK is attr_id +CREATE UNIQUE INDEX variant_attributes_replica_idx + ON variant_attributes (product_id, variant_id, attr_id); +ALTER TABLE variant_attributes + REPLICA IDENTITY USING INDEX variant_attributes_replica_idx; +`} + + + + +See [REPLICA IDENTITY](../replica-identity.mdx) for more details. + +## Task Configuration + + + + +{`var config = new CdcSinkConfiguration +\{ + Name = "ProductCatalogSync", + ConnectionStringName = "MyPostgresConnection", + Tables = + [ + new CdcSinkTableConfig + \{ + CollectionName = "Products", + SourceTableName = "products", + PrimaryKeyColumns = ["product_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "product_id", Name = "ProductId" \}, + new CdcColumnMapping() \{ Column = "name", Name = "Name" \}, + ], + // Linked table: category_id FK → document ID in Categories collection + LinkedTables = + [ + new CdcSinkLinkedTableConfig + \{ + SourceTableName = "categories", + PropertyName = "Category", + LinkedCollectionName = "Categories", + JoinColumns = ["category_id"] + \} + ], + EmbeddedTables = + [ + new CdcSinkEmbeddedTableConfig + \{ + SourceTableName = "product_variants", + PropertyName = "Variants", + Type = CdcSinkRelationType.Array, + JoinColumns = ["product_id"], + PrimaryKeyColumns = ["variant_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "variant_id", Name = "VariantId" \}, + new CdcColumnMapping() \{ Column = "sku", Name = "Sku" \}, + new CdcColumnMapping() \{ Column = "price", Name = "Price" \}, + ], + // Deep-nested: attributes within each variant + EmbeddedTables = + [ + new CdcSinkEmbeddedTableConfig + \{ + SourceTableName = "variant_attributes", + PropertyName = "Attributes", + Type = CdcSinkRelationType.Array, + // JoinColumns must include the ROOT PK for deep nesting + JoinColumns = ["product_id", "variant_id"], + PrimaryKeyColumns = ["attr_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "attr_id", Name = "AttrId" \}, + new CdcColumnMapping() \{ Column = "attr_name", Name = "Name" \}, + new CdcColumnMapping() \{ Column = "attr_value", Name = "Value" \}, + ] + \} + ] + \} + ] + \} + ] +\}; + +await store.Maintenance.SendAsync(new AddCdcSinkOperation(config)); +`} + + + + +## Resulting Documents + + + + +{`\{ + "ProductId": 42, + "Name": "Hiking Boot", + "Category": "categories/3", + "Variants": [ + \{ + "VariantId": 101, + "Sku": "HB-BLK-10", + "Price": 89.99, + "Attributes": [ + \{ "AttrId": 1, "Name": "Color", "Value": "Black" \}, + \{ "AttrId": 2, "Name": "Size", "Value": "10" \} + ] + \}, + \{ + "VariantId": 102, + "Sku": "HB-BRN-11", + "Price": 89.99, + "Attributes": [ + \{ "AttrId": 3, "Name": "Color", "Value": "Brown" \}, + \{ "AttrId": 4, "Name": "Size", "Value": "11" \} + ] + \} + ], + "@metadata": \{ "@collection": "Products" \} +\} +`} + + + + + +The `Categories` collection is also synced by CDC Sink (it would be a separate +root table configuration). `"categories/3"` is a standard RavenDB document ID +that enables the use of RavenDB includes when querying products. + + +--- + +## Related Articles + +### CDC Sink Examples + +- [Simple Migration](./example-simple-migration.mdx) +- [Denormalization](./example-denormalization.mdx) +- [Event Sourcing](./example-event-sourcing.mdx) + +### CDC Sink + +- [Schema Design](../../schema-design.mdx) +- [Embedded Tables](../../embedded-tables.mdx) +- [Linked Tables](../../linked-tables.mdx) +- [REPLICA IDENTITY](../replica-identity.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-denormalization.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-denormalization.mdx new file mode 100644 index 0000000000..73cd1c7dfb --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-denormalization.mdx @@ -0,0 +1,202 @@ +--- +title: "CDC Sink Example: Denormalization with Embedded Tables" +sidebar_label: Denormalization with Embedded Tables +description: "How to merge a normalized SQL schema (orders + order_lines) into denormalized RavenDB documents with embedded arrays using CDC Sink." +sidebar_position: 1 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink Example: Denormalization with Embedded Tables + + + +* This example shows how to merge a normalized SQL schema (orders + order_lines) + into denormalized RavenDB documents with embedded arrays. + +* In this page: + * [Source Schema](#source-schema) + * [REPLICA IDENTITY Setup](#replica-identity-setup) + * [Task Configuration](#task-configuration) + * [Resulting Documents](#resulting-documents) + * [What Happens on Change Events](#what-happens-on-change-events) + + + +## Source Schema + + + + +{`CREATE TABLE orders ( + order_id SERIAL PRIMARY KEY, + customer TEXT NOT NULL, + status TEXT NOT NULL DEFAULT 'pending', + created_at TIMESTAMPTZ DEFAULT now() +); + +CREATE TABLE order_lines ( + line_id SERIAL PRIMARY KEY, + order_id INT NOT NULL REFERENCES orders(order_id), + product TEXT NOT NULL, + qty INT NOT NULL, + unit_price NUMERIC(10,2) NOT NULL +); +`} + + + + +## REPLICA IDENTITY Setup + +`order_lines` has a surrogate PK (`line_id`). The join column `order_id` is not +part of the primary key. Without `REPLICA IDENTITY FULL`, DELETE events for +`order_lines` rows would not include `order_id`, and CDC Sink could not find +the parent document. + + + + +{`ALTER TABLE order_lines REPLICA IDENTITY FULL; +`} + + + + +See [REPLICA IDENTITY](../replica-identity.mdx). + +## Task Configuration + + + + +{`var config = new CdcSinkConfiguration +\{ + Name = "OrdersSync", + ConnectionStringName = "MyPostgresConnection", + Tables = new List + \{ + new CdcSinkTableConfig + \{ + CollectionName = "Orders", + SourceTableName = "orders", + PrimaryKeyColumns = new List \{ "order_id" \}, + Columns = + [ + new CdcColumnMapping() \{ Column = "order_id", Name = "OrderId" \}, + new CdcColumnMapping() \{ Column = "customer", Name = "Customer" \}, + new CdcColumnMapping() \{ Column = "status", Name = "Status" \}, + new CdcColumnMapping() \{ Column = "created_at", Name = "CreatedAt" \}, + ], + EmbeddedTables = new List + \{ + new CdcSinkEmbeddedTableConfig + \{ + SourceTableName = "order_lines", + PropertyName = "Lines", + Type = CdcSinkRelationType.Array, + JoinColumns = new List \{ "order_id" \}, + PrimaryKeyColumns = new List \{ "line_id" \}, + Columns = + [ + new CdcColumnMapping() \{ Column = "line_id", Name = "LineId" \}, + new CdcColumnMapping() \{ Column = "product", Name = "Product" \}, + new CdcColumnMapping() \{ Column = "qty", Name = "Qty" \}, + new CdcColumnMapping() \{ Column = "unit_price", Name = "UnitPrice" \}, + ] + \} + \} + \} + \} +\}; + +await store.Maintenance.SendAsync(new AddCdcSinkOperation(config)); +`} + + + + +## Resulting Documents + +SQL rows: + + + + +{`orders: order_id=1, customer='Acme Corp', status='pending' +order_lines: line_id=1, order_id=1, product='Widget A', qty=3, unit_price=9.99 +order_lines: line_id=2, order_id=1, product='Widget B', qty=1, unit_price=24.99 +`} + + + + +RavenDB document `orders/1`: + + + + +{`\{ + "OrderId": 1, + "Customer": "Acme Corp", + "Status": "pending", + "CreatedAt": "2024-06-01T09:00:00+00:00", + "Lines": [ + \{ + "LineId": 1, + "Product": "Widget A", + "Qty": 3, + "UnitPrice": 9.99 + \}, + \{ + "LineId": 2, + "Product": "Widget B", + "Qty": 1, + "UnitPrice": 24.99 + \} + ], + "@metadata": \{ "@collection": "Orders" \} +\} +`} + + + + +## What Happens on Change Events + +**INSERT into `order_lines`:** +A new item is appended to the `Lines` array. + +**UPDATE to `order_lines`:** +CDC Sink finds the item by `line_id` within the `Lines` array and updates its properties. + +**DELETE from `order_lines`:** +CDC Sink finds the item by `line_id` and removes it from the array. +(Requires `REPLICA IDENTITY FULL` as configured above.) + +**UPDATE to `orders`:** +Only the root document properties (`Status`, etc.) are updated. The `Lines` array +is not affected. + +**DELETE from `orders`:** +The entire `orders/1` document is deleted, including all embedded lines. + +--- + +## Related Articles + +### CDC Sink Examples + +- [Simple Migration](./example-simple-migration.mdx) +- [Complex Nesting](./example-complex-nesting.mdx) + +### CDC Sink + +- [Embedded Tables](../../embedded-tables.mdx) +- [REPLICA IDENTITY](../replica-identity.mdx) +- [Delete Strategies](../../delete-strategies.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-event-sourcing.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-event-sourcing.mdx new file mode 100644 index 0000000000..a5c8304f71 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-event-sourcing.mdx @@ -0,0 +1,185 @@ +--- +title: "CDC Sink Example: Event Sourcing with Aggregation" +sidebar_label: Event Sourcing with Aggregation +description: "How to use CDC Sink patches to maintain a computed aggregate on a RavenDB document as individual event rows arrive from PostgreSQL." +sidebar_position: 2 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink Example: Event Sourcing with Aggregation + + + +* This example shows how to use CDC Sink patches to maintain a computed aggregate + on a RavenDB document as individual event rows arrive from PostgreSQL. + +* In this page: + * [Source Schema](#source-schema) + * [Goal](#goal) + * [Task Configuration](#task-configuration) + * [Resulting Documents](#resulting-documents) + * [Handling Deletes](#handling-deletes) + + + +## Source Schema + +An accounts table and a transactions table: + + + + +{`CREATE TABLE accounts ( + account_id SERIAL PRIMARY KEY, + owner TEXT NOT NULL, + currency TEXT NOT NULL DEFAULT 'USD' +); + +CREATE TABLE transactions ( + txn_id SERIAL PRIMARY KEY, + account_id INT NOT NULL REFERENCES accounts(account_id), + amount NUMERIC(12,2) NOT NULL, + type TEXT NOT NULL, -- 'credit' or 'debit' + created_at TIMESTAMPTZ DEFAULT now() +); +`} + + + + +## Goal + +Store each account as a RavenDB document with a `Balance` field that reflects the +running total of all transactions. Transaction rows are embedded as an array for +history, and `Balance` is maintained using patch logic. + +## Task Configuration + + + + +{`var config = new CdcSinkConfiguration +\{ + Name = "AccountsSync", + ConnectionStringName = "MyPostgresConnection", + Tables = + [ + new CdcSinkTableConfig + \{ + CollectionName = "Accounts", + SourceTableName = "accounts", + PrimaryKeyColumns = ["account_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "account_id", Name = "AccountId" \}, + new CdcColumnMapping() \{ Column = "owner", Name = "Owner" \}, + new CdcColumnMapping() \{ Column = "currency", Name = "Currency" \}, + ], + EmbeddedTables = + [ + new CdcSinkEmbeddedTableConfig + \{ + SourceTableName = "transactions", + PropertyName = "Transactions", + Type = CdcSinkRelationType.Array, + JoinColumns = ["account_id"], + PrimaryKeyColumns = ["txn_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "txn_id", Name = "TxnId" \}, + new CdcColumnMapping() \{ Column = "amount", Name = "Amount" \}, + new CdcColumnMapping() \{ Column = "type", Name = "Type" \}, + new CdcColumnMapping() \{ Column = "created_at", Name = "CreatedAt" \}, + ], + // Patch runs on the parent document for INSERT/UPDATE + Patch = """ + const oldAmount = $old?.Amount || 0; + const newAmount = $row.amount || 0; + const sign = $row.type === 'credit' ? 1 : -1; + const oldSign = $old?.Type === 'credit' ? 1 : -1; + this.Balance = (this.Balance || 0) + - (oldSign * oldAmount) + + (sign * newAmount); + """, + OnDelete = new CdcSinkOnDeleteConfig + \{ + Patch = """ + const deletedAmount = $old?.Amount || 0; + const sign = $old?.Type === 'credit' ? 1 : -1; + this.Balance = (this.Balance || 0) - (sign * deletedAmount); + """ + \} + \} + ] + \} + ] +\}; + +await store.Maintenance.SendAsync(new AddCdcSinkOperation(config)); +`} + + + + +## Resulting Documents + +After three transactions (credit 100, debit 30, credit 50): + + + + +{`\{ + "AccountId": 1, + "Owner": "Alice", + "Currency": "USD", + "Balance": 120.00, + "Transactions": [ + \{ "TxnId": 1, "Amount": 100.00, "Type": "credit", "CreatedAt": "..." \}, + \{ "TxnId": 2, "Amount": 30.00, "Type": "debit", "CreatedAt": "..." \}, + \{ "TxnId": 3, "Amount": 50.00, "Type": "credit", "CreatedAt": "..." \} + ], + "@metadata": \{ "@collection": "Accounts" \} +\} +`} + + + + +## Handling Deletes + +The `OnDelete.Patch` reverses the contribution of the deleted transaction to +`Balance`. This uses `$old` (the embedded item's last known state) rather than +`$row`, because for a DELETE event the embedded item's mapped values are in `$old`. + +Without the `OnDelete.Patch`, deleting a transaction row from SQL would remove +it from the `Transactions` array but leave `Balance` stale. The delete patch +keeps `Balance` consistent. + + +The patch uses delta logic (`$old` → `$row`) for idempotency. If a change is +re-applied after a failover, `$old` still reflects the state before the original +update, so the delta produces the same result. +See [Failover and Consistency](../../failover-and-consistency.mdx). + + +--- + +## Related Articles + +### CDC Sink Examples + +- [Simple Migration](./example-simple-migration.mdx) +- [Denormalization](./example-denormalization.mdx) +- [Complex Nesting](./example-complex-nesting.mdx) + +### CDC Sink + +- [Patching](../../patching.mdx) +- [Delete Strategies](../../delete-strategies.mdx) +- [Failover and Consistency](../../failover-and-consistency.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-simple-migration.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-simple-migration.mdx new file mode 100644 index 0000000000..b6634e057e --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/examples/example-simple-migration.mdx @@ -0,0 +1,128 @@ +--- +title: "CDC Sink Example: Simple Table Migration" +sidebar_label: Simple Table Migration +description: "Minimal setup to replicate a single SQL table into a RavenDB collection using CDC Sink." +sidebar_position: 0 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink Example: Simple Table Migration + + + +* This example shows the minimal setup to replicate a single SQL table into a + RavenDB collection. + +* In this page: + * [Source Schema](#source-schema) + * [Task Configuration](#task-configuration) + * [Resulting Documents](#resulting-documents) + + + +## Source Schema + +A simple customers table: + + + + +{`CREATE TABLE customers ( + id SERIAL PRIMARY KEY, + name TEXT NOT NULL, + email TEXT NOT NULL, + created_at TIMESTAMPTZ DEFAULT now() +); +`} + + + + +## Task Configuration + + + + +{`var config = new CdcSinkConfiguration +\{ + Name = "CustomersSync", + ConnectionStringName = "MyPostgresConnection", + Tables = new List + \{ + new CdcSinkTableConfig + \{ + CollectionName = "Customers", + SourceTableName = "customers", + PrimaryKeyColumns = new List \{ "id" \}, + Columns = + [ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "name", Name = "Name" \}, + new CdcColumnMapping() \{ Column = "email", Name = "Email" \}, + new CdcColumnMapping() \{ Column = "created_at", Name = "CreatedAt" \}, + ] + \} + \} +\}; + +await store.Maintenance.SendAsync(new AddCdcSinkOperation(config)); +`} + + + + +## Resulting Documents + +SQL row: + + + + +{`id=1, name='Alice', email='alice@example.com', created_at='2024-01-15 10:30:00+00' +`} + + + + +RavenDB document in collection `Customers` with ID `customers/1`: + + + + +{`\{ + "Id": 1, + "Name": "Alice", + "Email": "alice@example.com", + "CreatedAt": "2024-01-15T10:30:00+00:00", + "@metadata": \{ + "@collection": "Customers" + \} +\} +`} + + + + +Document IDs are generated as `{collection}/{pk}` — for example, `customers/1` +for a row with `id = 1`. + +--- + +## Related Articles + +### CDC Sink Examples + +- [Denormalization](./example-denormalization.mdx) +- [Event Sourcing](./example-event-sourcing.mdx) +- [Complex Nesting](./example-complex-nesting.mdx) + +### CDC Sink + +- [Schema Design](../../schema-design.mdx) +- [Column Mapping](../../column-mapping.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/initial-setup.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/initial-setup.mdx new file mode 100644 index 0000000000..d491506ee1 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/initial-setup.mdx @@ -0,0 +1,218 @@ +--- +title: "CDC Sink for PostgreSQL: Initial Setup" +sidebar_label: Initial Setup +description: "How CDC Sink creates and verifies the replication slot and publication on startup, with options for automatic or manual setup and custom naming." +sidebar_position: 3 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: Initial Setup + + + +* When CDC Sink starts, it verifies and (if it has the necessary permissions) creates + the **replication slot** and **publication** required for logical replication. + +* If the CDC Sink user does not have permission to create these objects, a database + administrator must create them manually before the task can start. + +* In this page: + * [Automatic Setup](#automatic-setup) + * [Manual Setup](#manual-setup) + * [Specifying Custom Slot and Publication Names](#specifying-custom-slot-and-publication-names) + * [Slot and Publication Naming (Auto-generated)](#slot-and-publication-naming-auto-generated) + * [Verifying Setup](#verifying-setup) + + + +--- + +## Automatic Setup + +If the CDC Sink user has the required permissions (see +[Permissions and Roles](./permissions-and-roles.mdx)), +CDC Sink will: + +1. Generate the replication slot and publication names (a GUID-based identifier) and store them with the task +2. Check whether they already exist +3. Create them if they do not exist +4. If configured tables are missing from the publication, CDC Sink attempts `ALTER PUBLICATION ... ADD TABLE` automatically (requires the user to have publication ownership or superuser access) +5. Begin the initial load + +No manual database administration is needed in this case. + +--- + +## Manual Setup + +If the CDC Sink user does not have permission to create replication slots or +publications, a database administrator must create them before the task is started. + +**Step 1: Determine the slot and publication names** + +The simplest approach is to specify the names explicitly in `CdcSinkPostgresSettings` +so both you and the database administrator know what names to use. +See [Specifying Custom Slot and Publication Names](#specifying-custom-slot-and-publication-names). + +If using auto-generated names (no `Postgres` settings), CDC Sink generates names +using the `rvn_cdc_s_` and `rvn_cdc_p_` prefixes followed by a GUID. +See [Slot and Publication Naming](#slot-and-publication-naming-auto-generated) +for the naming scheme. + +You can also find the names CDC Sink expects by creating the task (it will fail to start) +and reading the error message, which includes the expected names. + +**Step 2: Create the publication** + +Create a publication that includes all the tables CDC Sink will replicate: + + + + +{`CREATE PUBLICATION rvn_cdc_p_ +FOR TABLE orders, order_lines, customers; +`} + + + + +The publication must include all tables from the task's root and embedded table +configurations. + +**Step 3: Create the replication slot** + + + + +{`SELECT pg_create_logical_replication_slot( + 'rvn_cdc_s_', + 'pgoutput' +); +`} + + + + +**Step 4: Start the CDC Sink task** + +Once the slot and publication exist, CDC Sink will detect them on startup and +proceed with the initial load. + + +The slot and publication names are determined at task creation time and do not change +when you later modify the task. If you created these objects manually, the same names +will continue to be used regardless of subsequent configuration changes. + + +--- + +## Specifying Custom Slot and Publication Names + +You can explicitly specify the replication slot and publication names by setting +`CdcSinkPostgresSettings` on the task configuration: + + + + +{`var config = new CdcSinkConfiguration +\{ + Name = "OrdersSync", + ConnectionStringName = "MyPostgresConnection", + Postgres = new CdcSinkPostgresSettings + \{ + SlotName = "orders_sync_slot", + PublicationName = "orders_sync_pub" + \}, + Tables = [ ... ] +\}; +`} + + + + +Custom names are useful when: + +* A database administrator pre-creates the slot and publication with human-readable + names before starting the task +* You are migrating from a previous CDC Sink task and want to reuse an existing slot + to avoid re-reading history +* You need predictable names across environments (dev/staging/prod) + +**Immutability:** Once set, `SlotName` and `PublicationName` are fixed for the +lifetime of the task. If you need different names, delete the task and create a new one. + +--- + +## Slot and Publication Naming (Auto-generated) + +When `CdcSinkPostgresSettings` is not set (or `SlotName`/`PublicationName` are null), +CDC Sink generates unique names at task creation time using a GUID: + +* **Slot name**: `rvn_cdc_s_` +* **Publication name**: `rvn_cdc_p_` + +The GUID is generated once when the task is first created and stored with the task. + +**Example:** + + + + +{`rvn_cdc_s_a3f1b2c4d5e6f7a8b9c0d1e2f3a4b5c6 +rvn_cdc_p_a3f1b2c4d5e6f7a8b9c0d1e2f3a4b5c6 +`} + + + + +The slot and publication share the same identifier, making it easy to match them to +a specific task. + + +The slot and publication names are generated **once at task creation** and then stored +with the task. Subsequent modifications to the task (adding tables, changing mappings, +etc.) do not change the slot or publication names — the original names are always used. + + +--- + +## Verifying Setup + +To verify the slot and publication were created: + + + + +{`-- View CDC Sink replication slots +SELECT slot_name, plugin, slot_type, active +FROM pg_replication_slots +WHERE slot_name LIKE 'rvn_cdc_s_%'; + +-- View CDC Sink publications +SELECT p.pubname, c.relname AS table_name +FROM pg_publication p +JOIN pg_publication_rel pr ON pr.prpubid = p.oid +JOIN pg_class c ON c.oid = pr.prrelid +WHERE p.pubname LIKE 'rvn_cdc_p_%' +ORDER BY p.pubname, c.relname; +`} + + + + +--- + +## Related Articles + +### CDC Sink for PostgreSQL + +- [Prerequisites Checklist](./prerequisites-checklist.mdx) +- [Permissions and Roles](./permissions-and-roles.mdx) +- [REPLICA IDENTITY](./replica-identity.mdx) +- [Cleanup and Maintenance](./cleanup-and-maintenance.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/monitoring-postgres.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/monitoring-postgres.mdx new file mode 100644 index 0000000000..78390309b6 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/monitoring-postgres.mdx @@ -0,0 +1,120 @@ +--- +title: "CDC Sink for PostgreSQL: Monitoring PostgreSQL" +sidebar_label: Monitoring PostgreSQL +description: "PostgreSQL-side monitoring for CDC Sink, including replication slot health, replication lag, and WAL disk usage queries." +sidebar_position: 7 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: Monitoring PostgreSQL + + + +* This page covers PostgreSQL-side monitoring for CDC Sink — replication slot health, + lag, and WAL usage. + +* For RavenDB-side monitoring, see [Monitoring](../monitoring.mdx). + +* In this page: + * [Replication Slot Health](#replication-slot-health) + * [Replication Lag](#replication-lag) + * [WAL Disk Usage](#wal-disk-usage) + + + +--- + +## Replication Slot Health + +Check that CDC Sink replication slots are active and being consumed: + + + + +{`SELECT slot_name, active, confirmed_flush_lsn +FROM pg_replication_slots +WHERE slot_name LIKE 'rvn_cdc_s_%'; +`} + + + + +| Column | Meaning | +|--------|---------| +| `active` | `true` if CDC Sink is connected and consuming; `false` if the connection is down | +| `confirmed_flush_lsn` | The LSN up to which changes have been confirmed as consumed | + +A slot with `active = false` means CDC Sink is not currently connected. This is +expected during failover or when the task is paused. If the slot remains inactive +for an extended period, investigate the task state in RavenDB Studio. + +--- + +## Replication Lag + +Replication lag measures how far behind the slot is relative to the current WAL position: + + + + +{`SELECT slot_name, + pg_current_wal_lsn() - confirmed_flush_lsn AS lag_bytes +FROM pg_replication_slots +WHERE slot_name LIKE 'rvn_cdc_s_%'; +`} + + + + +A consistently growing `lag_bytes` means CDC Sink is not keeping up with the rate +of changes in the source database. Consider: + +* Increasing `CdcSink.MaxBatchSize` to process more changes per batch +* Reducing load on the source database +* Checking the per-table processing statistics in the Management Studio for slow scripts — + complex patch scripts are a common cause of processing slowdowns + +--- + +## WAL Disk Usage + +PostgreSQL retains WAL segments until all replication slots have consumed them. +An inactive or slow slot can cause WAL to accumulate on disk. + +Check approximate WAL retained per slot: + + + + +{`SELECT slot_name, active, + pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes +FROM pg_replication_slots +WHERE slot_name LIKE 'rvn_cdc_s_%'; +`} + + + + +High `retained_wal_bytes` on an inactive slot indicates the slot is not being +consumed and is holding WAL. If the slot corresponds to a deleted or abandoned +CDC Sink task, drop it. +See [Cleanup and Maintenance](./cleanup-and-maintenance.mdx). + +--- + +## Related Articles + +### CDC Sink for PostgreSQL + +- [Cleanup and Maintenance](./cleanup-and-maintenance.mdx) + +### CDC Sink + +- [Monitoring](../monitoring.mdx) +- [Failover and Consistency](../failover-and-consistency.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/overview.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/overview.mdx new file mode 100644 index 0000000000..f8aebd302e --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/overview.mdx @@ -0,0 +1,123 @@ +--- +title: "CDC Sink for PostgreSQL: Overview" +sidebar_label: Overview +description: "CDC Sink support for PostgreSQL — using logical replication to stream changes into RavenDB in real time." +sidebar_position: -1 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: Overview + + + +* CDC Sink supports PostgreSQL as a source database using + [logical replication](https://www.postgresql.org/docs/current/logical-replication.html). + +* PostgreSQL streams row-level changes in real time through a logical replication slot. + CDC Sink consumes this stream and applies INSERTs, UPDATEs, and DELETEs to RavenDB + documents as they occur. + + + +--- + +## Connection String + +Use a `SqlConnectionString` with `FactoryName` set to `"Npgsql"`: + + + + +{`store.Maintenance.Send(new PutConnectionStringOperation( + new SqlConnectionString + \{ + Name = "MyPostgresConnection", + FactoryName = "Npgsql", + ConnectionString = "Host=myserver;Port=5432;Database=mydb;Username=ravendb_cdc;Password=..." + \})); +`} + + + + +--- + +## How It Works + +PostgreSQL CDC Sink uses logical replication, which provides a continuous stream of +changes without polling. This gives low-latency, near-real-time replication. + +Key PostgreSQL concepts: + +* **Replication slot** — a server-side cursor that tracks how far a consumer has read. + CDC Sink creates and manages one automatically. +* **Publication** — defines which tables are included in the replication stream. + CDC Sink creates one automatically for the configured tables. +* **WAL (Write-Ahead Log)** — PostgreSQL retains WAL segments as long as a replication + slot has not consumed them. An inactive or paused task causes WAL to accumulate. + +--- + +## Prerequisites + +PostgreSQL requires several configuration settings and permissions before CDC Sink +can connect. See [Prerequisites Checklist](./prerequisites-checklist.mdx) for the +full list, including: + +* WAL level set to `logical` +* Sufficient `max_replication_slots` and `max_wal_senders` +* A database user with `REPLICATION` privilege +* `REPLICA IDENTITY` configuration for embedded tables with surrogate keys + +--- + +## PostgreSQL-Specific Settings + +The `CdcSinkPostgresSettings` object on `CdcSinkConfiguration` controls the replication +slot and publication names: + + + + +{`Postgres = new CdcSinkPostgresSettings +\{ + SlotName = "my_slot", + PublicationName = "my_publication" +\} +`} + + + + +If omitted, GUID-based names are generated automatically on first use. +See [Initial Setup](./initial-setup.mdx) for details. + +--- + +## Configuration + +The CDC Sink task configuration is the same across all database engines. +See [Configuration Reference](../configuration-reference.mdx) for full details. + +The `SourceTableSchema` defaults to `"public"` when using PostgreSQL. + +--- + +## In This Section + +* [Prerequisites Checklist](./prerequisites-checklist.mdx) — what to verify before creating a task +* [Initial Setup](./initial-setup.mdx) — creating the replication slot and publication +* [WAL Configuration](./wal-configuration.mdx) — PostgreSQL WAL settings +* [Permissions and Roles](./permissions-and-roles.mdx) — required database permissions +* [REPLICA IDENTITY](./replica-identity.mdx) — configuring DELETE event content +* [Monitoring PostgreSQL](./monitoring-postgres.mdx) — replication lag and WAL monitoring +* [Cleanup and Maintenance](./cleanup-and-maintenance.mdx) — dropping orphaned slots +* [Studio UI](./studio-ui.mdx) — creating tasks through the Studio interface + +--- diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/permissions-and-roles.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/permissions-and-roles.mdx new file mode 100644 index 0000000000..beb6b070d4 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/permissions-and-roles.mdx @@ -0,0 +1,149 @@ +--- +title: "CDC Sink for PostgreSQL: Permissions and Roles" +sidebar_label: Permissions and Roles +description: "PostgreSQL permissions required by the CDC Sink user, including minimum privileges, automatic setup permissions, and dedicated user creation examples." +sidebar_position: 2 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: Permissions and Roles + + + +* This page documents the PostgreSQL permissions required by the CDC Sink user, + and how to grant them. + +* In this page: + * [Minimum Permissions](#minimum-permissions) + * [Permissions for Automatic Setup](#permissions-for-automatic-setup) + * [Creating a Dedicated CDC User](#creating-a-dedicated-cdc-user) + + + +--- + +## Minimum Permissions + +At a minimum, the CDC Sink user needs: + +* **`REPLICATION`** attribute — allows the user to initiate logical replication +* **`SELECT`** on each table being replicated — required for the initial load phase + +Example: + + + + +{`-- Grant replication privilege +ALTER USER cdc_user REPLICATION; + +-- Grant SELECT on each table +GRANT SELECT ON TABLE orders TO cdc_user; +GRANT SELECT ON TABLE order_lines TO cdc_user; +GRANT SELECT ON TABLE customers TO cdc_user; +`} + + + + +With only these permissions, a database administrator must manually create the +replication slot and publication before starting the CDC Sink task. +See [Initial Setup](./initial-setup.mdx). + +--- + +## Permissions for Automatic Setup + +If you want CDC Sink to create and manage the replication slot and publication +automatically, the user needs additional permissions: + +**Create/drop replication slots:** + +PostgreSQL 14+: + + + + +{`GRANT pg_replication_slot_admin TO cdc_user; +`} + + + + +PostgreSQL 10–13: Replication slot management requires `SUPERUSER`. + +**Create/drop publications:** + +The user must either own the tables being published, or have `SUPERUSER`. + +Alternatively, you can grant `CREATE` on the database: + + + + +{`GRANT CREATE ON DATABASE mydb TO cdc_user; +`} + + + + + +Granting `SUPERUSER` gives the user unrestricted access to the database server. +For production environments, prefer granting only the specific privileges listed +above rather than `SUPERUSER`. + + + +For added security in production, consider having your database administrator create +the replication slot and publication manually with the minimal permissions shown above, +rather than granting CDC Sink the ability to manage them automatically. The CDC Sink +user then only needs `REPLICATION` privilege and `SELECT` on the relevant tables. +See [Initial Setup](./initial-setup.mdx). + + +--- + +## Creating a Dedicated CDC User + +It is recommended to use a dedicated database user for CDC Sink rather than an +application or admin user. + +Example setup: + + + + +{`-- Create the user +CREATE USER cdc_user WITH PASSWORD 'secure_password' REPLICATION; + +-- Grant SELECT on tables to replicate +GRANT SELECT ON TABLE orders TO cdc_user; +GRANT SELECT ON TABLE order_lines TO cdc_user; + +-- If using PostgreSQL 14+ and want automatic slot management: +GRANT pg_replication_slot_admin TO cdc_user; + +-- If user needs to create publications (requires table ownership or superuser): +-- Option A: Grant ownership of tables +ALTER TABLE orders OWNER TO cdc_user; +-- Option B: Create publications as a superuser before starting the task +`} + + + + +--- + +## Related Articles + +### CDC Sink for PostgreSQL + +- [Prerequisites Checklist](./prerequisites-checklist.mdx) +- [Initial Setup](./initial-setup.mdx) +- [Cleanup and Maintenance](./cleanup-and-maintenance.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/prerequisites-checklist.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/prerequisites-checklist.mdx new file mode 100644 index 0000000000..04f24fc05e --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/prerequisites-checklist.mdx @@ -0,0 +1,157 @@ +--- +title: "CDC Sink for PostgreSQL: Prerequisites Checklist" +sidebar_label: Prerequisites Checklist +description: "Verify all requirements before creating a CDC Sink task for PostgreSQL, including WAL configuration, user permissions, table requirements, and network access." +sidebar_position: 0 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: Prerequisites Checklist + + + +* Before creating a CDC Sink task for PostgreSQL, verify that each of these requirements + is in place. + +* In this page: + * [Source Database Requirements](#source-database-requirements) + * [User Permissions](#user-permissions) + * [Table Requirements](#table-requirements) + * [Network Access](#network-access) + + + +--- + +## Source Database Requirements + +* **PostgreSQL version**: 10 or later (logical replication introduced in version 10) + +* **WAL level**: `wal_level` must be set to `logical` + + Verify with: + + + + +{`SHOW wal_level; +`} + + + + + If the result is not `logical`, see [WAL Configuration](./wal-configuration.mdx). + +* **`max_replication_slots`**: Must be at least 1 (one slot per CDC Sink task) + + Verify with: + + + + +{`SHOW max_replication_slots; +`} + + + + +* **`max_wal_senders`**: Must be at least 1 + + Verify with: + + + + +{`SHOW max_wal_senders; +`} + + + + +--- + +## User Permissions + +The database user in the connection string must have sufficient permissions. + +**Minimum required permissions:** + + + + +{`-- Replication privilege (required for logical replication) +ALTER USER cdc_user REPLICATION; + +-- SELECT on each table CDC Sink will read +GRANT SELECT ON TABLE orders TO cdc_user; +GRANT SELECT ON TABLE order_lines TO cdc_user; +`} + + + + +**Optional (allows CDC Sink to configure replication automatically):** + + + + +{`-- Create/drop replication slots +-- This requires SUPERUSER or membership in pg_replication_slot_admin (PG 14+) + +-- Create/drop publications +-- This requires ownership of the tables being published, or SUPERUSER +`} + + + + +If the CDC Sink user does not have permission to create publications and replication +slots, a database administrator must set them up manually. +See [Initial Setup](./initial-setup.mdx). + +Full details: [Permissions and Roles](./permissions-and-roles.mdx). + +--- + +## Table Requirements + +* **Primary key**: Each root table and each embedded table must have a primary key + (or a unique index used as a replica identity). + +* **REPLICA IDENTITY**: For embedded tables where the join columns are not part of + the primary key, the table must have `REPLICA IDENTITY FULL` or a replica identity + index that includes the join columns. Without this, DELETE events will not include + the old row values, and CDC Sink cannot identify which embedded item to remove. + + See [REPLICA IDENTITY](./replica-identity.mdx). + +* **Published columns**: All columns referenced in `PrimaryKeyColumns`, `JoinColumns`, + and `Columns` entries must exist in the SQL table. + +--- + +## Network Access + +* The RavenDB server must be able to open a TCP connection to the PostgreSQL host + on the configured port (default: 5432). + +* The connection must remain open for the duration of the replication stream. + Ensure that firewalls, proxies, and load balancers do not terminate idle or + long-lived connections. + +--- + +## Related Articles + +### CDC Sink for PostgreSQL + +- [WAL Configuration](./wal-configuration.mdx) +- [Permissions and Roles](./permissions-and-roles.mdx) +- [Initial Setup](./initial-setup.mdx) +- [REPLICA IDENTITY](./replica-identity.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity-manual-setup.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity-manual-setup.mdx new file mode 100644 index 0000000000..d1172cad60 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity-manual-setup.mdx @@ -0,0 +1,141 @@ +--- +title: "CDC Sink for PostgreSQL: Manual REPLICA IDENTITY Setup" +sidebar_label: Manual REPLICA IDENTITY Setup +description: "Step-by-step instructions for a database administrator to configure REPLICA IDENTITY manually when CDC Sink lacks permission to alter tables." +sidebar_position: 5 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: Manual REPLICA IDENTITY Setup + + + +* When CDC Sink does not have permission to alter tables, a database administrator + must configure `REPLICA IDENTITY` manually before the task starts. + +* In this page: + * [When Manual Setup Is Required](#when-manual-setup-is-required) + * [Setting REPLICA IDENTITY FULL](#setting-replica-identity-full) + * [Using an Index Instead of FULL](#using-an-index-instead-of-full) + * [Verifying the Configuration](#verifying-the-configuration) + + + +--- + +## When Manual Setup Is Required + +CDC Sink attempts to set `REPLICA IDENTITY FULL` automatically on any embedded +table whose join columns are not part of the primary key. This requires: + +* The CDC Sink user owns the table, **or** +* The CDC Sink user has `SUPERUSER` + +If neither condition is met, CDC Sink will start but embedded table deletes may +not work correctly. A database administrator must configure `REPLICA IDENTITY` +manually. + +--- + +## Setting REPLICA IDENTITY FULL + +The simplest approach is to set `REPLICA IDENTITY FULL` on all embedded tables +that CDC Sink will replicate: + + + + +{`ALTER TABLE order_lines REPLICA IDENTITY FULL; +ALTER TABLE line_attributes REPLICA IDENTITY FULL; +`} + + + + +`REPLICA IDENTITY FULL` includes all column values in DELETE and UPDATE events. +This is the most compatible option but increases WAL volume for tables with many +or large columns. + +--- + +## Using an Index Instead of FULL + +If WAL size is a concern, you can use a specific unique index that covers both +the join columns and PK columns of the embedded table. + +**Step 1: Create a unique index covering the required columns** + + + + +{`-- For order_lines: join column is order_id, PK is line_id +CREATE UNIQUE INDEX order_lines_replica_idx + ON order_lines (order_id, line_id); +`} + + + + +**Step 2: Set REPLICA IDENTITY to use this index** + + + + +{`ALTER TABLE order_lines + REPLICA IDENTITY USING INDEX order_lines_replica_idx; +`} + + + + +This instructs PostgreSQL to include only those indexed columns in DELETE and +UPDATE events, rather than all columns. + + +The index must be `UNIQUE` and `NOT DEFERRABLE`. It cannot include expressions +or partial predicates. All columns in the index must be `NOT NULL`. + + +--- + +## Verifying the Configuration + +Confirm that the desired `REPLICA IDENTITY` mode is set: + + + + +{`SELECT c.relname, c.relreplident, + CASE c.relreplident + WHEN 'd' THEN 'DEFAULT' + WHEN 'f' THEN 'FULL' + WHEN 'i' THEN 'INDEX' + WHEN 'n' THEN 'NOTHING' + END AS mode +FROM pg_class c +JOIN pg_namespace n ON n.oid = c.relnamespace +WHERE c.relkind = 'r' + AND n.nspname = 'public' +ORDER BY c.relname; +`} + + + + +Tables configured with `FULL` or `INDEX` are ready for CDC Sink embedded table +delete processing. + +--- + +## Related Articles + +### CDC Sink for PostgreSQL + +- [REPLICA IDENTITY](./replica-identity.mdx) +- [Initial Setup](./initial-setup.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity.mdx new file mode 100644 index 0000000000..571d76a7fb --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/replica-identity.mdx @@ -0,0 +1,166 @@ +--- +title: "CDC Sink for PostgreSQL: REPLICA IDENTITY" +sidebar_label: REPLICA IDENTITY +description: "How PostgreSQL REPLICA IDENTITY affects CDC Sink's ability to process DELETE events for embedded tables, and when it must be configured." +sidebar_position: 4 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: REPLICA IDENTITY + + + +* **REPLICA IDENTITY** controls what column values PostgreSQL includes in DELETE + (and UPDATE) events in the logical replication stream. + +* CDC Sink requires old row values in DELETE events to identify which embedded + item to remove from a parent document. Without them, embedded table deletes + cannot be routed correctly. + +* In this page: + * [Why REPLICA IDENTITY Matters](#why-replica-identity-matters) + * [REPLICA IDENTITY Options](#replica-identity-options) + * [When Is REPLICA IDENTITY Required?](#when-is-replica-identity-required) + * [Checking Current Setting](#checking-current-setting) + * [Automatic vs Manual Configuration](#automatic-vs-manual-configuration) + + + +--- + +## Why REPLICA IDENTITY Matters + +When CDC Sink receives a DELETE event for an embedded table, it needs to: + +1. Find the parent document (using the join column values from the old row) +2. Find the matching item within the embedded array or map (using the PK column values) +3. Remove it + +If the DELETE event does not include old row values for the join columns and PK +columns, CDC Sink cannot perform steps 1 or 2. + +By default, PostgreSQL only includes the primary key columns in DELETE events. +If the embedded table's join columns are part of the primary key, this is sufficient. +If they are not — which is the common case when a table has a surrogate PK — the +DELETE event will be missing the columns needed to route it. + +--- + +## REPLICA IDENTITY Options + +PostgreSQL supports four REPLICA IDENTITY modes: + +| Mode | What's included in DELETE/UPDATE events | Notes | +|------|----------------------------------------|-------| +| `DEFAULT` | Primary key columns only | Default for tables with a PK | +| `FULL` | All columns | Works for any table structure; increases WAL size | +| `INDEX` | Columns covered by a specific unique index | More targeted than FULL | +| `NOTHING` | No old values | Insufficient for CDC Sink embedded table deletes | + +For CDC Sink, `FULL` is the simplest and most compatible choice. +`INDEX` works if the index covers both the join columns and PK columns. + +--- + +## When Is REPLICA IDENTITY Required? + +| Scenario | REPLICA IDENTITY needed? | +|----------|------------------------| +| Root table deletes | No — root documents are deleted by document ID | +| Embedded table deletes where join columns ARE part of the PK | No — `DEFAULT` is sufficient | +| Embedded table deletes where join columns are NOT part of the PK | **Yes** — `DEFAULT` is insufficient | +| Deep-nested embedded tables | Depends on the join column placement | + +**Example where REPLICA IDENTITY is needed:** + + + + +{`-- order_lines has a surrogate PK (line_id), with order_id as foreign key +-- order_id is NOT part of the primary key +CREATE TABLE order_lines ( + line_id SERIAL PRIMARY KEY, -- PK + order_id INT NOT NULL, -- FK to orders (not part of PK) + product TEXT, + qty INT +); + +-- Without REPLICA IDENTITY FULL, a DELETE event only contains line_id. +-- CDC Sink cannot determine which parent document (order_id) to update. +ALTER TABLE order_lines REPLICA IDENTITY FULL; +`} + + + + +**Example where REPLICA IDENTITY is NOT needed:** + + + + +{`-- If order_id is part of a composite PK, DEFAULT is sufficient +CREATE TABLE order_lines ( + order_id INT NOT NULL, + line_num INT NOT NULL, + product TEXT, + qty INT, + PRIMARY KEY (order_id, line_num) +); +`} + + + + +--- + +## Checking Current Setting + + + + +{`SELECT c.relname, c.relreplident +FROM pg_class c +JOIN pg_namespace n ON n.oid = c.relnamespace +WHERE c.relkind = 'r' + AND n.nspname = 'public' +ORDER BY c.relname; +`} + + + + +The `relreplident` column: + +| Value | Meaning | +|-------|---------| +| `d` | DEFAULT (primary key only) | +| `f` | FULL (all columns) | +| `i` | INDEX (specific unique index) | +| `n` | NOTHING | + +--- + +## Automatic vs Manual Configuration + +If the CDC Sink user has sufficient permissions (table ownership or `SUPERUSER`), +CDC Sink will automatically set `REPLICA IDENTITY FULL` on embedded tables that +require it. + +If the CDC Sink user does not have permission to alter tables, a database +administrator must configure `REPLICA IDENTITY` before starting the task. +See [Manual Setup](./replica-identity-manual-setup.mdx). + +--- + +## Related Articles + +### CDC Sink for PostgreSQL + +- [Manual REPLICA IDENTITY Setup](./replica-identity-manual-setup.mdx) +- [Initial Setup](./initial-setup.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/studio-ui.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/studio-ui.mdx new file mode 100644 index 0000000000..0cedc92e5f --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/studio-ui.mdx @@ -0,0 +1,96 @@ +--- +title: "CDC Sink for PostgreSQL: Studio UI" +sidebar_label: Studio UI +description: "How to create, monitor, and edit CDC Sink tasks for PostgreSQL through the RavenDB Management Studio Ongoing Tasks interface." +sidebar_position: 8 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: Studio UI + + + +* CDC Sink tasks are created and managed through the **Ongoing Tasks** section of the + Management Studio. + +* In this page: + * [Creating a CDC Sink Task](#creating-a-cdc-sink-task) + * [Monitoring Task State](#monitoring-task-state) + * [Editing a Task](#editing-a-task) + + + +--- + +## Creating a CDC Sink Task + +To create a CDC Sink task in the Studio: + +1. Navigate to **Databases** → your database → **Ongoing Tasks** +2. Click **Add Task** and select **CDC Sink** +3. Configure the connection string pointing to the PostgreSQL source +4. Add one or more root tables with their column mappings +5. (Optional) Add embedded tables and linked tables for each root table +6. (Optional) Configure patches for INSERT/UPDATE/DELETE events +7. Click **Save** + +--- + +## Monitoring Task State + +The task list shows: + +* **Task name** and **connection string name** +* **State** — Active, Disabled, Error, or FallbackMode +* **Responsible node** — the cluster node currently running the task +* **Progress** — during initial load, shows which tables have been scanned + +Clicking on a task opens the detail view with per-table statistics. + +--- + +## Editing a Task + +To edit a task, click its name in the Ongoing Tasks list. The same configuration +form used for creation opens in edit mode. + + +The replication slot and publication names are **fixed at task creation** and do not +change when you edit the task — regardless of whether names were auto-generated or +specified explicitly. + +When you add or remove tables, CDC Sink continues using the same slot and publication +names. The **publication** in PostgreSQL controls which tables are replicated and must +be kept in sync with the task configuration: + +* **Adding a table:** If CDC Sink has the necessary permissions, it will automatically + run `ALTER PUBLICATION ADD TABLE ;` when the task is saved. If it does + not have permissions, a database administrator must run this manually. +* **Removing a table:** `ALTER PUBLICATION DROP TABLE
;` must be run + manually by a database administrator — CDC Sink does not remove tables from the + publication automatically. + +Orphaned slots and publications must be dropped manually by the database administrator. +See [Cleanup and Maintenance](./cleanup-and-maintenance.mdx). + + +--- + +## Related Articles + +### CDC Sink for PostgreSQL + +- [Initial Setup](./initial-setup.mdx) +- [Cleanup and Maintenance](./cleanup-and-maintenance.mdx) + +### CDC Sink + +- [Monitoring](../monitoring.mdx) +- [API Reference](../api-reference.mdx) +- [Configuration Reference](../configuration-reference.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/postgres/wal-configuration.mdx b/docs/server/ongoing-tasks/cdc-sink/postgres/wal-configuration.mdx new file mode 100644 index 0000000000..e4263bfdf8 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/postgres/wal-configuration.mdx @@ -0,0 +1,128 @@ +--- +title: "CDC Sink for PostgreSQL: WAL Configuration" +sidebar_label: WAL Configuration +description: "How to verify and configure the PostgreSQL Write-Ahead Log level to enable logical replication required by CDC Sink." +sidebar_position: 1 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for PostgreSQL: WAL Configuration + + + +* CDC Sink uses PostgreSQL **logical replication**, which requires the Write-Ahead Log + (WAL) to be configured at the `logical` level. + +* This page explains how to verify the current setting and change it if needed. + +* In this page: + * [Check Current WAL Level](#check-current-wal-level) + * [Enable Logical Replication](#enable-logical-replication) + * [Other Required Settings](#other-required-settings) + + + +--- + +## Check Current WAL Level + +Connect to your PostgreSQL instance and run: + + + + +{`SHOW wal_level; +`} + + + + +If the result is `logical`, no changes are needed. + +If the result is `replica` or `minimal`, logical replication is not enabled and +must be configured before CDC Sink can run. + +--- + +## Enable Logical Replication + +Edit `postgresql.conf` and set the following: + + + + +{`wal_level = logical +`} + + + + +This change requires a **PostgreSQL restart**. + +After restarting, verify the change took effect: + + + + +{`SHOW wal_level; +-- Should return: logical +`} + + + + + +Changing `wal_level` requires a full server restart, not just a configuration reload. +Plan for a brief maintenance window. + + +--- + +## Other Required Settings + +CDC Sink uses one replication slot per task. Ensure the following settings are +sufficient for the number of CDC Sink tasks you plan to run: + + + + +{`max_replication_slots = 10 -- at least 1 per CDC Sink task +max_wal_senders = 10 -- at least 1 per active replication connection +`} + + + + +The defaults in a standard PostgreSQL installation are typically sufficient for +a small number of tasks, but you may need to increase them if you have many +concurrent CDC Sink tasks or other replication consumers. + +Check current values: + + + + +{`SHOW max_replication_slots; +SHOW max_wal_senders; +`} + + + + +These settings also require a **server restart** if changed. + +--- + +## Related Articles + +### CDC Sink for PostgreSQL + +- [Prerequisites Checklist](./prerequisites-checklist.mdx) +- [Permissions and Roles](./permissions-and-roles.mdx) +- [Cleanup and Maintenance](./cleanup-and-maintenance.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/property-retention.mdx b/docs/server/ongoing-tasks/cdc-sink/property-retention.mdx new file mode 100644 index 0000000000..8f64d7c68e --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/property-retention.mdx @@ -0,0 +1,155 @@ +--- +title: "CDC Sink: Property Retention" +sidebar_label: Property Retention +description: "Explains how CDC Sink merges UPDATE events onto existing RavenDB documents, preserving properties that are not part of the column mapping." +sidebar_position: 8 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Property Retention + + + +* When CDC Sink applies an UPDATE to an existing RavenDB document, it **merges** + the new values onto the existing document rather than replacing it entirely. + +* Properties that are not part of the column mapping are preserved across updates. + This allows RavenDB-side data to coexist safely with CDC-managed properties. + +* In this page: + * [How Merging Works](#how-merging-works) + * [What Is and Isn't Preserved](#what-is-and-isnt-preserved) + * [Editing Documents Directly in RavenDB](#editing-documents-directly-in-ravendb) + * [Implications for Patches](#implications-for-patches) + + + +--- + +## How Merging Works + +When a CDC UPDATE arrives for a document that already exists: + +1. The existing document is loaded +2. Mapped column values from the CDC event overwrite the corresponding properties +3. All other properties on the document are left unchanged +4. The merged document is written back + +**Example:** + +Initial SQL row → initial document: + + + + +{`\{ + "Id": 1, + "Name": "Alice", + "Email": "alice@example.com", + "InternalNotes": "VIP customer", + "@metadata": \{ "@collection": "Customers" \} +\} +`} + + + + +SQL UPDATE: `UPDATE customers SET email = 'alice.new@example.com' WHERE id = 1` + +Document after CDC UPDATE: + + + + +{`\{ + "Id": 1, + "Name": "Alice", + "Email": "alice.new@example.com", + "InternalNotes": "VIP customer", + "@metadata": \{ "@collection": "Customers" \} +\} +`} + + + + +`Email` is updated from SQL, while `InternalNotes` (not in the `Columns` list) is preserved. + +--- + +## What Is and Isn't Preserved + +**Preserved across CDC updates:** + +* Properties not listed in the `Columns` list +* Properties set in RavenDB directly (annotations, computed values, flags) +* Document metadata (unless the patch explicitly modifies it) + +**Overwritten on CDC update:** + +* Any property mapped via the `Columns` list — always updated to match the current SQL value + +If you manually edit a property that is part of the `Columns` list, the next CDC UPDATE +for that row will overwrite your edit with the SQL value. + +--- + +## Editing Documents Directly in RavenDB + +You can safely add properties to CDC-managed documents: + + + + +{`\{ + "Id": 1, + "Name": "Alice", + "Email": "alice@ex.com", + "InternalNotes": "...", + "ReviewedAt": "...", + "Tags": ["vip"] +\} +`} + + + + +Properties managed by CDC (those in the `Columns` list) will be overwritten on +the next UPDATE from the source database. Do not rely on manual edits to mapped +properties surviving future CDC updates. + +**CDC Sink does not detect or protect manual edits to mapped properties.** +If you need to preserve a value that comes from SQL, consider adding a separate +RavenDB-only property for your annotation and leaving the SQL-mapped property as-is. + +--- + +## Implications for Patches + +Patches run after column mapping and can set additional properties that are not +from the `Columns` list. These patch-computed properties follow the same merge rules: + +* If a patch sets `this.ComputedField = ...`, that value persists across future events + where the patch doesn't explicitly change it +* If a patch sets a property that is also in the `Columns` list, the column mapping + value takes precedence (mapping is applied before patching) + +For aggregates maintained via patches (e.g., `RunningTotal`), the patch itself +is responsible for keeping the value correct across INSERT, UPDATE, and DELETE events. +See [Patching](./patching.mdx). + +--- + +## Related Articles + +### CDC Sink + +- [Column Mapping](./column-mapping.mdx) +- [Patching](./patching.mdx) +- [How It Works](./how-it-works.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/schema-design.mdx b/docs/server/ongoing-tasks/cdc-sink/schema-design.mdx new file mode 100644 index 0000000000..3958b38dca --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/schema-design.mdx @@ -0,0 +1,347 @@ +--- +title: "CDC Sink: Schema Design" +sidebar_label: Schema Design +description: "Explains how CDC Sink maps a relational schema to a RavenDB document model using root tables, embedded tables, and linked tables." +sidebar_position: 2 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Schema Design + + + +* CDC Sink maps a relational schema to a RavenDB document model through configuration. + This page explains the three building blocks — root tables, embedded tables, and + linked tables — and how to combine them. + +* In this page: + * [Root Tables](#root-tables) + * [Embedded Tables](#embedded-tables) + * [Linked Tables](#linked-tables) + * [Primary Key and Join Column Requirements](#primary-key-and-join-column-requirements) + * [Multi-Level Nesting](#multi-level-nesting) + * [Relation Types](#relation-types) + * [Choosing Between Embedded and Linked](#choosing-between-embedded-and-linked) + + + +--- + +## Root Tables + +A **root table** maps a SQL table to a RavenDB collection. Each row in the SQL table +becomes one document. + + + + +{`new CdcSinkTableConfig +\{ + CollectionName = "Orders", // RavenDB collection name + SourceTableSchema = "public", // SQL schema (optional, default: "public") + SourceTableName = "orders", // SQL table name + PrimaryKeyColumns = ["id"], // Used for document ID generation + Columns = + [ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "customer_name", Name = "CustomerName" \}, + new CdcColumnMapping() \{ Column = "total", Name = "Total" \}, + ] +\} +`} + + + + +**Document ID generation:** `\{CollectionName\}/\{pk1\}/\{pk2\}/...` +A row with `id = 42` and collection `Orders` becomes document `Orders/42`. +A composite PK `(region, id)` with values `(US, 42)` becomes `Orders/US/42`. + +**Column mapping:** Only columns listed in `Columns` appear in the document. Unmapped columns are +still available in patch scripts via `$row` but are not stored in the document. + +--- + +## Embedded Tables + +An **embedded table** creates nested data within a parent document. For example, a +SQL `order_lines` table becomes an array inside each `Orders` document. + + + + +{`new CdcSinkTableConfig +\{ + CollectionName = "Orders", + SourceTableName = "orders", + PrimaryKeyColumns = ["id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "customer_name", Name = "CustomerName" \}, + ], + EmbeddedTables = + [ + new CdcSinkEmbeddedTableConfig + \{ + SourceTableName = "order_lines", + PropertyName = "Lines", // Property in parent document + Type = CdcSinkRelationType.Array, // Array of items + JoinColumns = ["order_id"], // FK referencing parent's PK + PrimaryKeyColumns = ["line_id"], // Used to match items on update/delete + Columns = + [ + new CdcColumnMapping() \{ Column = "line_id", Name = "LineId" \}, + new CdcColumnMapping() \{ Column = "product", Name = "Product" \}, + new CdcColumnMapping() \{ Column = "quantity", Name = "Quantity" \}, + ] + \} + ] +\} +`} + + + + +This produces documents like: + + + + +{`\{ + "Id": 1, + "CustomerName": "Alice", + "Lines": [ + \{ "LineId": 1, "Product": "Apples", "Quantity": 5 \}, + \{ "LineId": 2, "Product": "Bananas", "Quantity": 3 \} + ], + "@metadata": \{ "@collection": "Orders" \} +\} +`} + + + + +--- + +## Linked Tables + +A **linked table** creates a document ID reference rather than embedding data. +A foreign key in the source row becomes a RavenDB document ID. + + + + +{`new CdcSinkTableConfig +\{ + CollectionName = "Orders", + SourceTableName = "orders", + PrimaryKeyColumns = ["id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "id", Name = "Id" \}, + new CdcColumnMapping() \{ Column = "customer_id", Name = "CustomerId" \}, + ], + LinkedTables = + [ + new CdcSinkLinkedTableConfig + \{ + SourceTableName = "customers", + PropertyName = "Customer", // Property in parent document + LinkedCollectionName = "Customers", // Target collection for ID + JoinColumns = ["customer_id"] // FK used to build the ID + \} + ] +\} +`} + + + + +With `customer_id = 42`, the document gets `"Customer": "Customers/42"`. + +--- + +## Primary Key and Join Column Requirements + +### Root Tables + +The `PrimaryKeyColumns` list defines which SQL columns are used to generate the document ID. +All PK columns must be present in every INSERT, UPDATE, and DELETE event. + +### Embedded Tables (One Level) + +An embedded table needs: + +* **PrimaryKeyColumns** — Used to match items within the parent's array for UPDATE and DELETE +* **JoinColumns** — Foreign key referencing the parent's `PrimaryKeyColumns` + +The `JoinColumns` must exactly match the parent's `PrimaryKeyColumns`: + +| Parent PK | Required JoinColumns | Valid? | +|-----------|---------------------|--------| +| `[id]` | `[order_id]` where `order_id` = parent's `id` | ✓ | +| `[id, year]` | `[order_id, order_year]` | ✓ | +| `[id]` | `[customer_id, order_id]` | ✗ Extra column not from parent PK | +| `[id, year]` | `[order_id]` | ✗ Missing `order_year` | + +### DELETE Events and REPLICA IDENTITY + +For DELETE events, the source database must include the join column values so CDC Sink +can route the delete to the correct parent document. + +If the join column is not part of the SQL table's primary key, the source database may +need additional configuration to include it in DELETE events. + +See [REPLICA IDENTITY](./postgres/replica-identity.mdx) for +the PostgreSQL-specific requirement and how CDC Sink handles it automatically. + +--- + +## Multi-Level Nesting + +Embedded tables can themselves have embedded tables, creating arbitrarily deep hierarchies. + +**Example: Company → Departments → Employees** + + + + +{`new CdcSinkTableConfig +\{ + CollectionName = "Companies", + SourceTableName = "companies", + PrimaryKeyColumns = ["company_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "company_id", Name = "CompanyId" \}, + new CdcColumnMapping() \{ Column = "name", Name = "Name" \}, + ], + EmbeddedTables = + [ + new CdcSinkEmbeddedTableConfig + \{ + SourceTableName = "departments", + PropertyName = "Departments", + Type = CdcSinkRelationType.Array, + JoinColumns = ["company_id"], // Root FK + PrimaryKeyColumns = ["dept_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "dept_id", Name = "DeptId" \}, + new CdcColumnMapping() \{ Column = "dept_name", Name = "DeptName" \}, + ], + EmbeddedTables = + [ + new CdcSinkEmbeddedTableConfig + \{ + SourceTableName = "employees", + PropertyName = "Employees", + Type = CdcSinkRelationType.Array, + JoinColumns = ["company_id", "dept_id"], // Root FK + parent FK + PrimaryKeyColumns = ["emp_id"], + Columns = + [ + new CdcColumnMapping() \{ Column = "emp_id", Name = "EmpId" \}, + new CdcColumnMapping() \{ Column = "emp_name", Name = "EmpName" \}, + ] + \} + ] + \} + ] +\} +`} + + + + +**Critical requirement for deep nesting:** All descendant tables must carry the root +table's primary key as a denormalized column. The `employees` table must have +`company_id` even though it joins directly to `departments` via `dept_id`. + +This is required because CDC Sink needs to route every row to the correct root document +in a single pass, without additional lookups. + +**SQL schema to support this:** + + + + +{`CREATE TABLE employees ( + company_id INT NOT NULL, -- Denormalized root FK + dept_id INT NOT NULL, -- Parent FK + emp_id INT NOT NULL, -- Local PK + emp_name VARCHAR(200), + PRIMARY KEY (company_id, dept_id, emp_id) +); +`} + + + + +Including all routing columns in the primary key also avoids the need for REPLICA IDENTITY +configuration — the default DELETE events include all PK columns. + +--- + +## Relation Types + +The `Type` property on embedded tables controls the document structure: + +| Type | Use Case | Document Structure | +|------|----------|--------------------| +| `Array` | One-to-many: parent has many children | `"Lines": [\{ ... \}, \{ ... \}]` | +| `Map` | One-to-many with direct key lookup | `"Lines": \{ "1": \{ ... \}, "2": \{ ... \} \}` | +| `Value` | Many-to-one: parent has one child | `"Customer": \{ ... \}` | + +**Array** — Items are matched by `PrimaryKeyColumns` for UPDATE and DELETE. +Use when you need to iterate over all items. + +**Map** — Items are stored as a JSON object keyed by the primary key value(s). +Use when you need fast direct-key access within the document. + +**Value** — Stores a single embedded object. +Use for many-to-one relationships (many orders share one customer). + +--- + +## Choosing Between Embedded and Linked + +| Consideration | Embedded | Linked | +|--------------|----------|--------| +| Data location | Stored inside parent document | Stored in a separate document | +| Access pattern | Read parent to get all data | Load parent, then load referenced doc | +| Updates | Automatic via CDC | Automatic via CDC for each table | +| Document size | Grows with embedded items | Parent stays small | +| Use case | Parent owns child (orders own lines) | Independent entities (orders reference customers) | + +**Use embedded tables** when: +* The child entity has no meaning outside the parent (order lines without an order) +* You always read the parent and child together +* You want a single-document read + +**Use linked tables** when: +* The referenced entity is independently meaningful (customers exist without orders) +* The referenced entity is shared by many parents +* You want RavenDB's include loading to handle the join + +--- + +## Related Articles + +### CDC Sink + +- [Embedded Tables](./embedded-tables.mdx) +- [Linked Tables](./linked-tables.mdx) +- [Column Mapping](./column-mapping.mdx) +- [Configuration Reference](./configuration-reference.mdx) + +### PostgreSQL + +- [REPLICA IDENTITY](./postgres/replica-identity.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/server-configuration.mdx b/docs/server/ongoing-tasks/cdc-sink/server-configuration.mdx new file mode 100644 index 0000000000..fe680783b2 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/server-configuration.mdx @@ -0,0 +1,76 @@ +--- +title: "CDC Sink: Server Configuration" +sidebar_label: Server Configuration +description: "Documents the RavenDB server configuration keys that control CDC Sink task behavior, including batch size, fallback timeout, and poll interval." +sidebar_position: 15 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Server Configuration + + + +* This page documents the RavenDB server configuration keys that control CDC Sink + task behavior. + +* In this page: + * [Configuration Keys](#configuration-keys) + + + +--- + +## Configuration Keys + +These keys can be set in `settings.json` or passed as environment variables. +See [Configuration Overview](../../../server/configuration/configuration-options). + +--- + +#### `CdcSink.MaxBatchSize` + +**Default:** `1024` + +Target number of change rows processed in a single batch before writing to RavenDB. A batch may exceed this size when a source database transaction contains more rows, since transactions are never split across batches. + +Larger values increase throughput but also increase memory usage per batch. + +--- + +#### `CdcSink.MaxFallbackTimeInSec` + +**Default:** `900` (15 minutes) + +Maximum number of seconds the CDC Sink process will stay in fallback mode after a failure before retrying. Fallback duration doubles on each consecutive failure, up to this cap. + +Set to `0` to disable fallback mode entirely — the task will retry immediately on each failure without exponential backoff. + +--- + +#### `CdcSink.PollIntervalInSec` + +**Default:** `1` + +How frequently (in seconds) the CDC Sink polls the source database for new change rows. This setting applies to **SQL Server** only, which uses polling-based change capture. + +PostgreSQL and MySQL use streaming replication and ignore this setting. + +--- + +## Related Articles + +### CDC Sink + +- [Overview](./overview.mdx) +- [Monitoring](./monitoring.mdx) +- [Failover and Consistency](./failover-and-consistency.mdx) + +### Server Configuration + +- [Configuration Overview](../../../server/configuration/configuration-options) diff --git a/docs/server/ongoing-tasks/cdc-sink/source-schema-changes.mdx b/docs/server/ongoing-tasks/cdc-sink/source-schema-changes.mdx new file mode 100644 index 0000000000..cfd12c7fba --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/source-schema-changes.mdx @@ -0,0 +1,267 @@ +--- +title: "CDC Sink: Source Schema Changes" +sidebar_label: Source Schema Changes +description: "How to safely modify your source database schema while a CDC Sink task is running — behavior per database engine, recovery, and the SQL Server capture instance procedure." +sidebar_position: 13 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Source Schema Changes + + + +* This page explains how CDC Sink handles schema changes (columns added, removed, + renamed, or retyped) on the source database tables while the task is running. + +* The behavior differs by database engine. PostgreSQL is the most resilient, + MySQL detects most changes automatically, and SQL Server requires an explicit + procedure for every schema change. + +* In this page: + * [Quick Reference](#quick-reference) + * [PostgreSQL](#postgresql) + * [MySQL](#mysql) + * [SQL Server](#sql-server) + * [Recovery From Errors](#recovery-from-errors) + * [General Recommendation](#general-recommendation) + + + +--- + +## Quick Reference + +| Change | MySQL | PostgreSQL | SQL Server | +|--------|-------|------------|------------| +| Add column (unmapped) | Transparent | Transparent | Requires [procedure](#sql-server-schema-change-procedure) | +| Add column (mapped) | Update config + restart | Update config + restart | Requires [procedure](#sql-server-schema-change-procedure) + config update | +| Drop unmapped column | Transparent | Transparent | Requires [procedure](#sql-server-schema-change-procedure) | +| Drop mapped column | Enters fallback (config error) | Enters fallback (config error) | Requires [procedure](#sql-server-schema-change-procedure) | +| Change column type | Restarts, usually recovers | Transparent (auto-rebuild) | Requires [procedure](#sql-server-schema-change-procedure) | +| Rename mapped column | Enters fallback | Enters fallback | Requires [procedure](#sql-server-schema-change-procedure) | + +--- + +## PostgreSQL + +PostgreSQL logical replication sends `RelationMessage` events inline in the WAL stream +whenever the schema changes. CDC Sink detects these automatically and rebuilds its +column mapping on the fly. This makes PostgreSQL the **most resilient** to schema changes. + +* **Adding or dropping unmapped columns** — transparent. + +* **Changing a column type** — usually transparent. A new `RelationMessage` triggers + an update to the internal columns mapping inside the sink. + +* **Dropping or renaming a mapped column** — CDC Sink enters fallback because the + mapped column name is no longer found in the new schema. Update the task + configuration to match the new column name, and CDC Sink recovers on the next retry. + +### PostgreSQL Examples + + + + +{`-- Transparent: adding an unmapped column +ALTER TABLE orders ADD COLUMN internal_notes TEXT; +-- CDC Sink auto-rebuilds and continues + +-- Transparent: changing a column type +ALTER TABLE orders ALTER COLUMN status TYPE VARCHAR(100); +-- CDC Sink auto-rebuilds and continues + +-- Requires config update: renaming a mapped column +ALTER TABLE orders RENAME COLUMN name TO full_name; +-- CDC Sink enters fallback. Update column mapping from 'name' to 'full_name'. +`} + + + + +--- + +## MySQL + +MySQL CDC uses binlog replication. Each row event is preceded by a `TableMapEvent` +containing column type information. CDC Sink compares the column types at mapped +positions against the expected types from `INFORMATION_SCHEMA`. If they don't match, +the process detects a schema change. + +* **Adding or dropping unmapped columns** — transparent. The unmapped column is ignored. + The CDC Sink may restart itself to re-resovle the current schema from `INFORMATION_SCHEMA`, + but normal operations will resume momentarily. + +* **Changing a column type** — CDC Sink detects the type change via binlog column types, + restarts, re-resolves from `INFORMATION_SCHEMA`, and continues. + +* **Dropping or renaming a mapped column** — CDC Sink enters fallback because the + mapped column is no longer found. Update the task configuration to match the new + schema. + + + +**Apply schema changes one at a time.** MySQL CDC Sink detects schema changes by +comparing column types at specific positions. If multiple changes are applied in a +single `ALTER TABLE` statement (e.g., `ADD COLUMN ... AFTER ...`, or adding and +removing columns together), the positional shifts can compound and make it difficult +for CDC Sink to resolve the new schema correctly. + +Apply one schema change, let CDC Sink catch up and confirm it is processing normally, +then apply the next change. This ensures each change is simple and isolated. + + + +### MySQL Examples + + + + +{`-- Transparent: adding an unmapped column +ALTER TABLE orders ADD COLUMN internal_notes TEXT; +-- CDC Sink continues without interruption + +-- Auto-recovers: changing a column type +ALTER TABLE orders MODIFY COLUMN status VARCHAR(100); +-- CDC Sink detects change, restarts, and recovers automatically + +-- Requires config update: dropping a mapped column +ALTER TABLE orders DROP COLUMN legacy_field; +-- CDC Sink enters fallback. Update the config to remove 'legacy_field' from Columns. +`} + + + + +--- + +## SQL Server + +SQL Server CDC uses **capture instances** — immutable snapshots of the table schema +at the time `sp_cdc_enable_table` was executed. The CDC change tables reflect the +schema as it was when the capture instance was created, **not** the current table +schema. This means **any** schema change on a CDC-tracked table requires creating +a new capture instance. + +SQL Server supports at most **two** capture instances per table at a time. The CDC +Sink always reads from the oldest active instance. When the old instance is dropped, +CDC Sink detects this (via `fn_cdc_get_min_lsn` returning all zeros), restarts, and +picks up the new instance. + +### SQL Server Schema Change Procedure + +For **any** schema change on a SQL Server CDC-tracked table, follow this procedure: + + + + +{`-- Step 1: ALTER the table +ALTER TABLE dbo.orders ADD priority INT DEFAULT 0; + +-- Step 2: Create a NEW capture instance (the old one still exists) +EXEC sys.sp_cdc_enable_table + @source_schema = N'dbo', + @source_name = N'orders', + @capture_instance = N'dbo_orders_v2', + @role_name = NULL; + +-- Step 3: WAIT — let CDC Sink drain the OLD capture instance. +-- The sink reads from the oldest instance and processes all remaining changes. +-- Monitor the CDC Sink task status in Studio to confirm it has caught up. + +-- Step 4: Drop the OLD capture instance +EXEC sys.sp_cdc_disable_table + @source_schema = N'dbo', + @source_name = N'orders', + @capture_instance = N'dbo_orders'; + +-- CDC Sink detects the drop, restarts, and picks up dbo_orders_v2. +`} + + + + +**What happens at each step:** + +1. **After step 2** — Two capture instances exist. CDC Sink continues reading from the + oldest one (`dbo_orders`). New changes are captured by both instances. +2. **After step 3** — CDC Sink has processed all changes from the old instance. It's + safe to drop it. +3. **After step 4** — `fn_cdc_get_min_lsn('dbo_orders')` returns all zeros. CDC Sink + detects this on the next poll cycle, enters fallback, and restarts against the new + instance. + +If the schema change also affects column mappings, update the CDC Sink task +configuration after the new capture instance is active. + + + +Do not drop the old capture instance before CDC Sink has drained it — you may lose +changes that were captured in the old instance but not yet processed. + + + +### SQL Server Agent + +CDC requires SQL Server Agent to be running. The capture job (which populates change +tables) and cleanup job (which purges old changes) are Agent jobs. + + + + +{`-- Check Agent status +SELECT * FROM sys.dm_server_services WHERE servicename LIKE '%Agent%'; +`} + + + + +Without the Agent, changes are never captured and CDC Sink polls indefinitely without +finding data. + +--- + +## Recovery From Errors + +All three providers use the same recovery mechanism: + +1. **Error detected** — process enters fallback mode with exponential backoff (starting + at 5 seconds, up to `CdcSink.MaxFallbackTimeInSec`) +2. **On each retry** — process restarts: re-resolves schema, re-opens connections +3. **If the error persists** (e.g., mapped column still missing) — retries continue + with increasing backoff. A notification is raised for each error, visible in Studio. +4. **After fixing the configuration** — the process recovers automatically on the next + retry cycle. No manual restart is needed. + +--- + +## General Recommendation + +For schema changes that affect mapped columns on any database engine: + +1. Update the CDC Sink task configuration to match the new schema +2. Apply the schema change on the source database +3. For SQL Server: follow the [capture instance procedure](#sql-server-schema-change-procedure) above + +CDC Sink will detect the change and recover automatically. Disabling the task +beforehand is not required but can avoid a brief fallback cycle. + +--- + +## Related Articles + +### CDC Sink + +- [Configuration Reference](./configuration-reference.mdx) +- [Column Mapping](./column-mapping.mdx) +- [How It Works: Updating the Task Configuration](./how-it-works.mdx#updating-the-task-configuration) +- [Troubleshooting](./troubleshooting.mdx) + +### SQL Server + +- [SQL Server Overview](./sql-server/overview.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/sql-server/_category_.json b/docs/server/ongoing-tasks/cdc-sink/sql-server/_category_.json new file mode 100644 index 0000000000..53959810fc --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/sql-server/_category_.json @@ -0,0 +1 @@ +{"position": 17, "label": "SQL Server"} diff --git a/docs/server/ongoing-tasks/cdc-sink/sql-server/overview.mdx b/docs/server/ongoing-tasks/cdc-sink/sql-server/overview.mdx new file mode 100644 index 0000000000..8bb1add581 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/sql-server/overview.mdx @@ -0,0 +1,100 @@ +--- +title: "CDC Sink for SQL Server: Overview" +sidebar_label: Overview +description: "CDC Sink support for SQL Server — using native SQL Server Change Data Capture to stream changes into RavenDB." +sidebar_position: 0 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink for SQL Server: Overview + + + +* CDC Sink supports SQL Server as a source database using native + [SQL Server Change Data Capture](https://learn.microsoft.com/en-us/sql/relational-databases/track-changes/about-change-data-capture-sql-server). + +* SQL Server CDC captures row-level changes from the transaction log and stores them + in change tables. CDC Sink polls these change tables to stream INSERTs, UPDATEs, + and DELETEs into RavenDB documents. + + + +--- + +## Connection String + +Use a `SqlConnectionString` with `FactoryName` set to `"System.Data.SqlClient"` or +`"Microsoft.Data.SqlClient"`: + + + + +{`store.Maintenance.Send(new PutConnectionStringOperation( + new SqlConnectionString + \{ + Name = "MySqlServerConnection", + FactoryName = "Microsoft.Data.SqlClient", + ConnectionString = "Server=myserver;Database=mydb;Trusted_Connection=True;" + \})); +`} + + + + +--- + +## Prerequisites + +SQL Server CDC must be enabled on the source database and on each table you want +to capture: + + + + +{`-- Enable CDC on the database +EXEC sys.sp_cdc_enable_db; + +-- Enable CDC on a table +EXEC sys.sp_cdc_enable_table + @source_schema = N'dbo', + @source_name = N'orders', + @role_name = NULL; +`} + + + + +The SQL Server Agent service must be running for CDC to function. + +--- + +## Polling Behavior + +Unlike PostgreSQL (which uses streaming replication), SQL Server CDC Sink uses polling. +The `CdcSink.PollIntervalInSec` server configuration key controls how frequently +the task polls for new changes. See [Server Configuration](../server-configuration.mdx). + +--- + +## Configuration + +The CDC Sink task configuration is the same across all database engines. +See [Configuration Reference](../configuration-reference.mdx) for full details. + +The `SourceTableSchema` defaults to `"dbo"` when using SQL Server. + +--- + +## Related Articles + +### CDC Sink + +- [Overview](../overview.mdx) +- [Configuration Reference](../configuration-reference.mdx) +- [PostgreSQL: Prerequisites Checklist](../postgres/prerequisites-checklist.mdx) diff --git a/docs/server/ongoing-tasks/cdc-sink/troubleshooting.mdx b/docs/server/ongoing-tasks/cdc-sink/troubleshooting.mdx new file mode 100644 index 0000000000..5c8b275645 --- /dev/null +++ b/docs/server/ongoing-tasks/cdc-sink/troubleshooting.mdx @@ -0,0 +1,262 @@ +--- +title: "CDC Sink: Troubleshooting" +sidebar_label: Troubleshooting +description: "Covers common problems encountered when running CDC Sink tasks and how to resolve them, including startup failures, error states, and missing documents." +sidebar_position: 14 +--- + +import Admonition from '@theme/Admonition'; +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import CodeBlock from '@theme/CodeBlock'; +import LanguageSwitcher from "@site/src/components/LanguageSwitcher"; +import LanguageContent from "@site/src/components/LanguageContent"; + +# CDC Sink: Troubleshooting + + + +* This page covers common problems encountered when running CDC Sink tasks and how + to resolve them. + +* In this page: + * [Task Won't Start](#task-wont-start) + * [Task Enters Error State](#task-enters-error-state) + * [Documents Not Appearing](#documents-not-appearing) + * [Embedded Items Missing or Incorrect](#embedded-items-missing-or-incorrect) + * [DELETE Not Applied](#delete-not-applied) + * [Patch Errors](#patch-errors) + * [Source Schema Changes](#source-schema-changes) + * [Partial Export/Import and State Loss](#partial-exportimport-and-state-loss) + + + +--- + +## Task Won't Start + +**Symptom:** Task remains in `Error` state immediately after creation. + +**Common causes:** + +* **Invalid connection string** — verify the connection string name matches one defined + in the SQL connection strings section of the Management Studio. + +* **Connection refused** — the source database is unreachable from the RavenDB server. + Verify host, port, and firewall rules. + +* **Authentication failure** — the credentials in the connection string are incorrect + or the user does not have the required permissions. + See the PostgreSQL [Permissions and Roles](./postgres/permissions-and-roles.mdx) page. + +* **WAL level not set to `logical`** — CDC Sink requires logical replication enabled + on the source. + See [WAL Configuration](./postgres/wal-configuration.mdx). + +Check the error message in the Notification Center (bell icon in Studio) for the +specific failure reason. + +--- + +## Task Enters Error State + +**Symptom:** Task was running, then stopped with an error. + +**Common causes:** + +* **Replication slot dropped** — if the PostgreSQL replication slot was dropped + externally, CDC Sink cannot resume. The task must be deleted and recreated (a new + slot will be created on next start). + +* **Source table schema changed** — adding or removing columns from the SQL table + may cause mapping errors. See [Source Schema Changes](./source-schema-changes.mdx). + +* **Replication lag too large** — if CDC Sink falls behind significantly, PostgreSQL + may retain WAL segments that fill the disk. See your PostgreSQL documentation on + `wal_keep_size` and replication slot lag. + +* **Exceeded fallback timeout** — the source was unreachable for longer than + `CdcSink.MaxFallbackTimeInSec`. The task moves to error state after this timeout. + Restore connectivity and re-enable the task. + +--- + +## Documents Not Appearing + +**Symptom:** SQL rows exist but corresponding RavenDB documents are not created. + +**Check:** + +1. **Initial load in progress** — CDC Sink performs a full table scan before streaming. + For large tables this can take time. Check the task progress in Studio. + +2. **Table is disabled** — verify `Disabled` is not set to `true` on the `CdcSinkTableConfig`. + +3. **Primary key mismatch** — the columns listed in `PrimaryKeyColumns` must match the + actual SQL primary key. If they don't match, CDC Sink cannot generate a document ID. + +4. **Columns list is empty** — at least one column must be mapped. A table with no + column mappings will produce empty documents. + +5. **Task is paused** — check that the task state is `Active`, not `Disabled`. + +--- + +## Embedded Items Missing or Incorrect + +**Symptom:** Parent document exists but embedded array/map is empty or items are missing. + +**Check:** + +1. **JoinColumns mismatch** — the `JoinColumns` in `CdcSinkEmbeddedTableConfig` must + reference the same columns (by name) that the SQL foreign key uses. Verify these + match the parent table's `PrimaryKeyColumns`. + +2. **PrimaryKeyColumns wrong** — the embedded table's `PrimaryKeyColumns` are used to + match items on UPDATE and DELETE. If they're wrong, new items may be created as + duplicates or existing items may not be found. + +3. **DELETE not reflected** — if embedded item rows are being deleted in SQL but the + items remain in RavenDB, check whether `REPLICA IDENTITY` is configured for the + embedded table. Without it, PostgreSQL does not include the old row values in the + DELETE event, and CDC Sink cannot identify which item to remove. + See [REPLICA IDENTITY](./postgres/replica-identity.mdx). + +4. **Type mismatch (Array vs Map)** — if the `Type` was changed from `Array` to `Map` + or vice versa after documents were already created, existing documents retain the + old structure. Re-process from scratch or migrate existing documents. + +--- + +## DELETE Not Applied + +**Symptom:** A row was deleted from SQL but the RavenDB document still exists. + +**Check:** + +1. **`IgnoreDeletes = true`** — if `OnDelete.IgnoreDeletes` is set to `true`, the + delete is intentionally discarded. Verify this is the intended behavior. + +2. **Missing REPLICA IDENTITY (embedded tables)** — for embedded table items, + PostgreSQL must include old row values in DELETE events so CDC Sink knows which + parent document to update. If `REPLICA IDENTITY` is not set to `FULL` (or to + an index that includes join columns), the DELETE event may be missing the columns + needed to route it. + See [REPLICA IDENTITY](./postgres/replica-identity.mdx). + +3. **Custom patch returns without deleting** — if an `OnDelete.Patch` is configured + that does not allow the delete to proceed (combined with `IgnoreDeletes = true`), + the document is kept. Review the patch logic. + +--- + +## Patch Errors + +**Symptom:** Task enters error state with a message referencing a JavaScript patch failure. + +**Check:** + +1. **Script syntax error** — test the patch script in the Management Studio's patch + editor before using it in CDC Sink. + +2. **Null reference** — `$row` properties and `$old` may be `null` for certain event + types. Use optional chaining: `$old?.Amount || 0`. + +3. **`load()` returns null** — a document loaded with `load()` may not exist yet if CDC + Sink processes tables out of dependency order. Guard with a null check: + + + + +{`const related = load("Collection/123"); +if (related) \{ ... \} +`} + + + + +4. **Step limit exceeded** — patch scripts have a step quota. If the script is very + long-running or loops over large arrays, it may hit `Patching.MaxStepsForScript`. + See the [configuration reference](../../../server/configuration/patching-configuration) for this setting. + +--- + +## Source Schema Changes + +Adding, removing, renaming, or retyping columns on the source database while a CDC +Sink task is running requires different handling depending on the database engine. +PostgreSQL and MySQL detect most changes automatically; SQL Server requires an explicit +capture instance procedure for every schema change. + +See [Source Schema Changes](./source-schema-changes.mdx) for the full guide, including +per-engine behavior, examples, and the SQL Server procedure. + +--- + +## Partial Export/Import and State Loss + +**Symptom:** After importing a database export, CDC Sink re-reads all data from the +source database from the beginning, performing a full initial load even though the +task configuration is present and the data already exists in RavenDB. + +**Cause:** + +CDC Sink stores two separate pieces of information: + +* **Task configuration** — stored in the cluster-wide **database record**. This is + always included in any export/import because it is part of the database definition. +* **Processing state (last LSN)** — stored as a **document** in the `@cdc-states` + system collection. This tracks the last position read from the source database + (e.g., the PostgreSQL LSN or SQL Server change tracking version). It also tracks + per-table initial load progress. + +If you export and import the database but **skip importing documents** (or skip system +documents), the task configuration is restored from the database record — but the +processing state document is missing. CDC Sink sees a valid configuration with no +recorded position and starts from the beginning: performing a full initial load of +every configured table, then streaming all changes from the start. + +For large source databases, this can take a significant amount of time and place +heavy load on both the source database and RavenDB. + +**How to avoid this:** + +* When exporting a database that has CDC Sink tasks, always include documents in the + export — specifically the `@cdc-states` collection. +* If using Smuggler or the REST API for selective export/import, ensure system + documents are included. +* If state loss has already occurred and the data in RavenDB is already up to date, + you can delete and recreate the task with `SkipInitialLoad = true` to skip the full + table scan and begin streaming from the current position. Note that imported tasks + are restored in a **disabled** state by default — you must explicitly enable the task + after import. This gives you the opportunity to inspect and correct the configuration + before the task starts processing. + + + +There is no _supported_ way to manually set the last LSN position. While the state +document in `@cdc-states` can technically be edited, doing so is risky: setting an +incorrect LSN can cause CDC Sink to skip over changes (data loss) or land in the +middle of a transaction boundary, producing errors or inconsistent documents. If the +state document is lost, the safest options are to re-read from the beginning or to +recreate the task with `SkipInitialLoad` (which skips the initial load but may miss +changes that occurred between the export and the recreation). + + + +--- + +## Related Articles + +### CDC Sink + +- [How It Works](./how-it-works.mdx) +- [Monitoring](./monitoring.mdx) +- [Patching](./patching.mdx) +- [Failover and Consistency](./failover-and-consistency.mdx) + +### PostgreSQL + +- [WAL Configuration](./postgres/wal-configuration.mdx) +- [Permissions and Roles](./postgres/permissions-and-roles.mdx) +- [REPLICA IDENTITY](./postgres/replica-identity.mdx)