diff --git a/docs/compute-storage-decoupled/rw/read-write-separation.md b/docs/compute-storage-decoupled/rw/read-write-separation.md index 46459e40d51f7..683210f0f8b2b 100644 --- a/docs/compute-storage-decoupled/rw/read-write-separation.md +++ b/docs/compute-storage-decoupled/rw/read-write-separation.md @@ -3,186 +3,448 @@ "title": "Read/Write Separation and Primary/Standby Cluster File Cache Warm-Up Configuration Guide", "sidebar_label": "Read/Write Separation: File Cache Warm-Up", "language": "en", - "description": "Describes the Doris File Cache active incremental warm-up mechanism, supporting read/write separation and primary/standby cluster architectures, covering warm-up job creation, management, monitoring, and troubleshooting.", - "keywords": ["read/write separation", "primary/standby cluster", "File Cache warm-up", "compute group sync", "high availability", "cross availability zone"] + "description": "Describes the Doris File Cache active incremental warm-up mechanism for read/write separation and primary/standby architectures, including compute-group-level and table-level event-driven warm-up, job creation, status observation, metrics, and troubleshooting.", + "keywords": ["read/write separation", "primary/standby cluster", "File Cache warm-up", "active incremental warm-up", "table-level warm-up", "ON TABLES", "compute group sync", "high availability"] } --- - + ## Background and Applicable Scenarios -To address cache cold-start problems in cross-availability-zone (AZ) high-availability failover and read/write separation scenarios, Doris introduces the **File Cache active incremental warm-up mechanism**. This mechanism ensures that the cache data of a target cluster stays highly consistent with the source cluster, thereby improving query performance, reducing jitter, and accelerating failover response. +In the Apache Doris compute-storage decoupled architecture, multiple compute groups can share the same remote storage data. 
A write compute group handles ingestion, Compaction, or Schema Change, while query compute groups handle online queries. After a new Rowset is generated, if a query compute group has not loaded the corresponding files into its local File Cache, the first query must access object storage or HDFS, which can cause query latency jitter. -This feature applies to the following two typical scenarios: +File Cache active incremental warm-up preloads the related Segment and index files into the target compute group's local cache after new data is generated on the write side. It is mainly applicable to the following scenarios: -| Scenario | Description | Core Requirement | +| Scenario | Description | Benefit | |------|------|----------| -| **Primary/standby cluster high availability** | The standby cluster continuously syncs hot data from the primary cluster and takes over the load quickly when the primary cluster fails. | Minimize failover latency | -| **Read/write separation** | Newly written data in the write cluster is promptly warmed up to the read cluster to avoid queries hitting cold cache. | Reduce read cluster query jitter | +| Read/write separation | The write compute group continuously ingests data, and the query compute group only serves queries | Reduces Cache Miss when the query compute group reads new data | +| Primary/standby cluster high availability | The standby compute group continuously syncs hot data from the primary compute group | Shortens cold-cache recovery after failover | +| Multi-tenant or layered data warehouse | Different query compute groups access only part of the business tables | Uses table-level filtering to reduce unnecessary warm-up and cache usage | +| Cost optimization | The source compute group has many tables, but hot queries focus on a small subset | Reduces remote storage reads and network transfer | :::tip Version Information -The File Cache active incremental warm-up feature was introduced in Apache Doris **3.1.0**. 
+This document describes File Cache active incremental warm-up and its table-level `ON TABLES` filtering capability. For exact version support, refer to the release notes and SQL syntax documentation of the corresponding version. ::: ---- - ## Feature Overview - - -File Cache active warm-up supports the following two types of cache synchronization: - -1. **Event-triggered warm-up**: Automatically triggers synchronization after write operations such as Load, Compaction, and Schema Change complete, reducing query jitter. -2. **Periodic hot-data sync**: Continuously syncs hot-query data to the target cluster through periodic scanning, ensuring stable performance of the standby cluster during primary/standby switchover. - ---- +File Cache active warm-up supports three sync modes: -## Sync Mode Description +| Sync Mode | Property Value | Applicable Scenario | +|------|--------|----------| +| One-time sync | `once` | Manually triggers initial warm-up when a new compute group comes online | +| Periodic sync | `periodic` | Syncs hot data at a fixed interval for continuous warm-keeping | +| Event-driven sync | `event_driven` | Automatically warms up new Rowsets after write events, suitable for read/write separation | - - +Event-driven sync can be applied at two scopes: -The applicable scenarios for the three sync modes are as follows: +| Scope | Syntax Form | Description | +|------|----------|------| +| Compute-group-level event-driven warm-up | Without `ON TABLES` | New data generated by matching events on the source compute group triggers warm-up | +| Table-level event-driven warm-up | With `ON TABLES (...)` | Only new data from tables that match the rules triggers warm-up | -| Mode | Parameter Value | Applicable Scenario | -|------|--------|----------| -| One-time sync | `ONCE` | Manually triggered; suitable for initial warm-up when a new cluster comes online | -| Periodic sync | `PERIODIC` | Scheduled sync of hot data; suitable for continuous warm-keeping scenarios | -| 
Event-driven sync | `EVENT_DRIVEN` | Automatically triggered after Load, Compaction, or Schema Change operations | - ---- +Table-level event-driven warm-up is suitable when a query compute group cares only about a subset of core tables. Compared with compute-group-level warm-up, it reduces unnecessary remote reads, network transfer, and cache usage on the target compute group. ## Creating Warm-Up Jobs - - - ### One-Time Sync -Suitable for manually triggering initial warm-up when a new cluster comes online: +One-time sync is suitable for initial warm-up when a new compute group comes online: ```sql -WARM UP COMPUTE GROUP WITH COMPUTE GROUP ; +WARM UP COMPUTE GROUP WITH COMPUTE GROUP ; ``` ### Periodic Sync -Suitable for continuously maintaining hot-data synchronization: +Periodic sync is suitable for continuously maintaining hot-data synchronization: ```sql -WARM UP COMPUTE GROUP WITH COMPUTE GROUP +WARM UP COMPUTE GROUP WITH COMPUTE GROUP PROPERTIES ( "sync_mode" = "periodic", "sync_interval_sec" = "600" ); ``` -- `sync_interval_sec`: The sync interval in seconds, calculated from the last start time. The default value is 600 seconds. +`sync_interval_sec` specifies the sync interval in seconds. The default value is `600`. -### Event-Driven Sync +### Compute-Group-Level Event-Driven Warm-Up -Suitable for read/write separation scenarios, where new data is automatically warmed up to the read cluster after a write operation completes: +Compute-group-level event-driven warm-up listens for write events on the source compute group and preloads new data into the target compute group: ```sql -WARM UP COMPUTE GROUP WITH COMPUTE GROUP +WARM UP COMPUTE GROUP WITH COMPUTE GROUP PROPERTIES ( "sync_mode" = "event_driven", "sync_event" = "load" ); ``` -- `sync_event`: The type of event that triggers synchronization. Accepted values include `load`, `compaction`, and `schema_change`. +`sync_event` specifies the event type that triggers warm-up. 
Compute-group-level event-driven warm-up can be used for ingestion, Compaction, Schema Change, and similar scenarios. The available values depend on the `WARM UP` SQL syntax of the current version. ---- +### Table-Level Event-Driven Warm-Up + +Table-level event-driven warm-up adds the `ON TABLES` clause to compute-group-level event-driven warm-up to specify the table range to warm up: + +```sql +WARM UP COMPUTE GROUP WITH COMPUTE GROUP +ON TABLES ( + INCLUDE '.' + [, INCLUDE '.' ...] + [, EXCLUDE '.' ...] +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +Parameter descriptions: + +| Parameter | Required | Description | +|------|----------|------| +| `` | Yes | Name of the target compute group. New data from matched tables is warmed up into this compute group's local File Cache | +| `` | Yes | Name of the source compute group. Doris listens for write events on this compute group | +| `ON TABLES` | No | Table-level filter rules. If omitted, the job is a compute-group-level event-driven warm-up job | +| `INCLUDE` | Required when `ON TABLES` is used | Declares table patterns to include in the warm-up scope. At least one `INCLUDE` rule is required | +| `EXCLUDE` | No | Excludes table patterns from the `INCLUDE` result | +| `sync_mode` | Yes | Table-level event-driven warm-up uses `event_driven` | +| `sync_event` | Yes | Table-level event-driven warm-up currently uses the `load` event | + +:::caution Note +Do not configure compute-group-level `load` event-driven warm-up and table-level `ON TABLES` event-driven warm-up for the same source and target compute groups at the same time. Their semantics overlap, and Doris rejects conflicting configurations during job creation. To switch from compute-group-level warm-up to table-level warm-up, cancel the old job first, then create a new `ON TABLES` job. 
+::: + +## ON TABLES Matching Rules + +### Pattern Format + +Patterns in `ON TABLES` must use the `'database.table'` format and be enclosed in single quotes. The following wildcards are supported: + +| Wildcard | Meaning | Example | +|--------|------|------| +| `*` | Matches any number of arbitrary characters, including zero characters | `'ods.*'` matches all tables in the `ods` database | +| `?` | Matches exactly one arbitrary character | `'logs.access_202?'` matches `logs.access_2020` through `logs.access_2029` | + +Without wildcards, the pattern is an exact match. For example, `'sales.orders'` matches only the `orders` table in the `sales` database. + +Common pattern examples: + +| Pattern | Meaning | +|------|------| +| `'mydb.*'` | Matches all tables in the `mydb` database | +| `'*.orders'` | Matches tables named `orders` in all databases | +| `'dw.fact_*'` | Matches tables whose names start with `fact_` in the `dw` database | +| `'*.*_bak'` | Matches tables whose names end with `_bak` in all databases | +| `'sales.orders'` | Exactly matches `sales.orders` | + +### INCLUDE and EXCLUDE + +Doris computes the final warm-up scope as follows: + +```text +Final warm-up scope = tables matched by all INCLUDE rules - tables matched by all EXCLUDE rules +``` + +Rules: + +- The order of `INCLUDE` and `EXCLUDE` rules does not affect the final result. +- At least one `INCLUDE` rule is required. You cannot specify only `EXCLUDE`. +- Multiple `INCLUDE` rules are combined by union. +- Multiple `EXCLUDE` rules remove matching tables from the candidate set. +- Matching follows Doris database and table naming rules. Use the same letter case as the actual database and table names. 
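The matching rules above can be sketched in a few lines of Python. This is only an illustration of the documented semantics (`*` matches zero or more characters, `?` matches exactly one, the final scope is the union of `INCLUDE` matches minus all `EXCLUDE` matches), not Doris's actual implementation. The helper names `matches` and `warm_up_scope` are hypothetical, and splitting each pattern at its first `.` into a database part and a table part is an assumption made for this sketch.

```python
import re


def _glob_to_regex(glob: str) -> "re.Pattern[str]":
    """Translate a pattern that uses only `*` and `?` into a regex."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")   # zero or more arbitrary characters
        elif ch == "?":
            parts.append(".")    # exactly one arbitrary character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, db: str, table: str) -> bool:
    """Check one 'database.table' pattern against one table.

    Assumption: the pattern is split at its first dot, and matching is
    case-sensitive.
    """
    db_glob, _, table_glob = pattern.partition(".")
    return (
        _glob_to_regex(db_glob).fullmatch(db) is not None
        and _glob_to_regex(table_glob).fullmatch(table) is not None
    )


def warm_up_scope(tables, includes, excludes=()):
    """Return the tables matched by any INCLUDE and by no EXCLUDE."""
    if not includes:
        raise ValueError("at least one INCLUDE rule is required")
    return {
        (db, tbl)
        for db, tbl in tables
        if any(matches(p, db, tbl) for p in includes)
        and not any(matches(p, db, tbl) for p in excludes)
    }
```

Because the result is a set difference, reordering the `INCLUDE` and `EXCLUDE` lists passed to `warm_up_scope` does not change the outcome, which mirrors the first rule in the list above.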
+ +Example: + +```sql +WARM UP COMPUTE GROUP analytics_cg WITH COMPUTE GROUP write_cg +ON TABLES ( + INCLUDE 'ods.*', + INCLUDE 'dw.fact_*', + INCLUDE 'dw.dim_*', + EXCLUDE 'ods.tmp_*', + EXCLUDE '*.*_bak' +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +This example warms up: + +- Tables in the `ods` database except those whose names start with `tmp_`. +- Tables in the `dw` database whose names start with `fact_` or `dim_`. +- No backup tables whose names end with `_bak` in any database. + +### Materialized Views + +`ON TABLES` rules match both ordinary tables and asynchronous materialized views. An asynchronous materialized view exists as an independent table in the database namespace and is matched by name through `INCLUDE` and `EXCLUDE` rules. + +A synchronous materialized view (Rollup) is an internal index of the base table and is not an independent table. When the base table is warmed up, the data related to its synchronous materialized views is processed together with the base table and does not need a separate rule. 
+ +## Examples + +### Warm Up Specific Tables + +```sql +WARM UP COMPUTE GROUP report_cg WITH COMPUTE GROUP business_cg +ON TABLES ( + INCLUDE 'sales.orders', + INCLUDE 'sales.customers', + INCLUDE 'inventory.stock_level' +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +### Warm Up an Entire Database + +```sql +WARM UP COMPUTE GROUP analytics_cg WITH COMPUTE GROUP load_cg +ON TABLES ( + INCLUDE 'analytics_db.*' +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +### Warm Up Multiple Databases and Exclude Temporary Tables + +```sql +WARM UP COMPUTE GROUP query_cg WITH COMPUTE GROUP etl_cg +ON TABLES ( + INCLUDE 'ods.*', + INCLUDE 'dwd.*', + INCLUDE 'dws.*', + EXCLUDE '*.tmp_*', + EXCLUDE '*.*_backup' +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +### Warm Up the Same Table to Multiple Target Compute Groups + +If the same table must serve multiple query compute groups, create one job for each target compute group: + +```sql +WARM UP COMPUTE GROUP realtime_cg WITH COMPUTE GROUP write_cg +ON TABLES (INCLUDE 'sales.orders') +PROPERTIES ("sync_mode" = "event_driven", "sync_event" = "load"); + +WARM UP COMPUTE GROUP batch_cg WITH COMPUTE GROUP write_cg +ON TABLES (INCLUDE 'sales.orders') +PROPERTIES ("sync_mode" = "event_driven", "sync_event" = "load"); +``` ## Managing Warm-Up Jobs - - +### Viewing Jobs -### Viewing Job List +View all warm-up jobs: ```sql --- View all warm-up jobs SHOW WARM UP JOB; +``` + +View a specific job: --- View a specific job -SHOW WARM UP JOB WHERE ID = 12345; +```sql +SHOW WARM UP JOB WHERE id = ; ``` -Field descriptions for query results: +Field descriptions: | Field | Description | |--------|------| -| `JobId` | Unique ID of the sync job | -| `ComputeGroup` | Name of the target Compute Group | -| `SrcComputeGroup` | Name of the source Compute Group | -| `Type` | Sync type: `CLUSTER` (cluster-level) / `TABLE` (table-level) | -| `SyncMode` 
| Sync mode: `ONCE` / `PERIODIC(interval_sec)` / `EVENT_DRIVEN(event)` | -| `Status` | Job status: `PENDING` / `RUNNING` / `FINISHED` / `CANCELLED` / `DELETED` | -| `CreateTime` | Time when the job was created | -| `StartTime` | Time of the last start | -| `FinishTime` | Time of the last completion | -| `FinishBatch` | Number of batches completed | -| `AllBatch` | Total number of batches to sync | -| `ErrMsg` | Error message (empty if no error) | +| `JobId` | Unique ID of the warm-up job | +| `SrcComputeGroup` | Name of the source compute group | +| `DstComputeGroup` | Name of the target compute group | +| `Status` | Job status, such as `PENDING`, `RUNNING`, `FINISHED`, or `CANCELLED` | +| `Type` | Warm-up scope. `CLUSTER` means compute-group-level, `TABLE` means explicitly specified tables, and `TABLES` means matched by `ON TABLES` rules | +| `SyncMode` | Sync mode, such as `ONCE`, `PERIODIC(interval_sec)`, or `EVENT_DRIVEN(event)` | +| `CreateTime` | Job creation time | +| `StartTime` | Most recent start time | +| `FinishBatch` | Number of completed batches | +| `AllBatch` | Total number of batches | +| `FinishTime` | Most recent finish time. Event-driven jobs usually keep running | +| `ErrMsg` | Most recent error message. Empty if there is no error | +| `Tables` | Explicit table list, mainly used by one-time or periodic table-level warm-up | +| `TableFilter` | Canonical representation of `ON TABLES` rules. Empty for compute-group-level jobs | +| `MatchedTables` | Current list of matched table names. Periodic refresh reflects table creation, deletion, and rename | +| `SyncStats` | Sync progress of event-driven jobs. List query shows a summary; ID query shows detailed JSON | + +`SHOW WARM UP JOB` is suitable for daily inspection. 
To avoid an overly wide list, `SyncStats` shows a summary for the most recent 30 minutes: + +```json +{ + "window": "30m", + "src_size": "58.2mb", + "dst_size": "57.5mb", + "gap_size": "716kb", + "trigger_gap_ms": 1200 +} +``` + +When querying by Job ID, `SyncStats` shows detailed 5-minute, 30-minute, and 1-hour window metrics: + +```sql +SHOW WARM UP JOB WHERE id = ; +``` + +`SyncStats` example: + +```json +{ + "seg_num": { + "requested_5m": 42, + "finish_5m": 40, + "gap_5m": 2, + "fail_5m": 0, + "requested_30m": 180, + "finish_30m": 178, + "gap_30m": 2, + "fail_30m": 0, + "requested_1h": 320, + "finish_1h": 318, + "gap_1h": 2, + "fail_1h": 0 + }, + "seg_size": { + "requested_5m": "12.5mb", + "finish_5m": "11.8mb", + "gap_5m": "716kb", + "fail_5m": "0b", + "requested_30m": "58.2mb", + "finish_30m": "57.5mb", + "gap_30m": "716kb", + "fail_30m": "0b", + "requested_1h": "102.3mb", + "finish_1h": "101.6mb", + "gap_1h": "716kb", + "fail_1h": "0b" + }, + "idx_num": { + "requested_5m": 10, + "finish_5m": 10, + "gap_5m": 0, + "fail_5m": 0 + }, + "idx_size": { + "requested_5m": "2.1mb", + "finish_5m": "2.1mb", + "gap_5m": "0b", + "fail_5m": "0b" + }, + "last_trigger_ts": "14:32:15", + "last_finish_ts": "14:32:18", + "progress_trigger_ts": "14:32:14", + "trigger_gap_ms": 1000 +} +``` + +Pay attention to the following fields: + +| Field | Description | +|------|------| +| `requested_*` | Amount of warm-up requests submitted by the source compute group | +| `finish_*` | Amount of warm-up work completed by the target compute group | +| `gap_*` | Gap, indicating the amount that has not completed | +| `fail_*` | Amount of failed warm-up work on the target compute group | +| `last_trigger_ts` | Most recent warm-up trigger time | +| `progress_trigger_ts` | Upstream trigger time corresponding to the current progress on the target compute group | +| `last_finish_ts` | Most recent warm-up finish time | +| `trigger_gap_ms` | Time gap between the latest source trigger time and the 
target progress watermark, in milliseconds | ### Canceling a Job ```sql -CANCEL WARM UP JOB WHERE id = 12345; +CANCEL WARM UP JOB WHERE id = ; ``` -:::caution Note -The current version does not support using `ALTER` to modify an existing job configuration. To change parameters, cancel the job first and then create a new one. -::: +After cancellation, Doris stops listening for events and stops triggering warm-up for this job. Data already written into the target compute group's File Cache is not actively removed and is released by the normal cache eviction policy. ---- +The current version does not support directly modifying the `ON TABLES` rules of an existing job. To adjust the warm-up scope, cancel the old job first, then create a new one. -## How It Works +## Matching Refresh and Behavior Notes - +### Creating, Dropping, and Renaming Tables + +`ON TABLES` rules are evaluated when the job is created and are periodically re-evaluated while the job is running. The default refresh interval is 60 seconds. + +This means: + +- After the job is created, a newly created table is automatically included in the warm-up scope in a later refresh cycle if its name matches the rules. +- After a matched table is dropped, it is removed from `MatchedTables` in a later refresh cycle. +- After a matched table is renamed, whether it continues to be warmed up depends on whether the new name still matches the rules. + +There can be a delay window of up to 60 seconds between creating a new table and the next rule refresh. Writes to the new table during this delay window do not trigger this table-level job. Writes after the refresh trigger warm-up normally. + +### No Matched Table at Creation + +When creating a table-level event-driven warm-up job, the `ON TABLES` rules must match at least one existing table. If no table is matched, job creation fails. Check the database name, table name, and wildcard patterns. 
+ +If you want to configure the warm-up relationship in advance, create at least one table that matches the rules before creating the warm-up job. + +### Schema Change + +`ON TABLES` only determines the table set and does not change the trigger semantics of the event type itself. For event types that are configured in the current job and supported by the current version, newly generated data is processed according to the table matching result. If the job is configured with `sync_event = "load"`, it listens only for the corresponding load event. + +## How It Works ### Periodic Sync Execution Flow -1. FE registers the job and records the `sync_interval` configuration. -2. FE periodically checks whether the trigger time has been reached, calculated from the last start time. -3. The sync job is triggered, avoiding overlapping executions. -4. After sync completes, the status is recorded and the system waits for the next cycle. +1. FE registers the job and records the sync interval. +2. FE periodically checks whether the trigger time has been reached. +3. When the trigger time is reached, FE converts the target tables or partitions into corresponding Tablets and dispatches tasks. +4. BE reads files from remote storage and writes them into the target compute group's File Cache. ### Event-Driven Sync Execution Flow -1. The user creates an event-driven job, FE registers the job and pushes the configuration to the source cluster BE. -2. The source BE automatically triggers the warm-up logic after events such as Load and Compaction complete. -3. The source BE initiates a sync request to the target BE at the Rowset granularity. -4. After sync completes, the target BE reports the execution status to FE. +1. The user creates an event-driven job, and FE persists the sync relationship. +2. FE pushes the event-driven configuration to the source compute group BE. +3. The source compute group BE triggers warm-up after a write event is committed. +4. 
For table-level event-driven jobs, the source BE processes only Rowsets that belong to the current matched table set. +5. The target compute group BE downloads the corresponding Segment and index files and writes them into the local File Cache. +6. FE exposes job status and sync progress through `SHOW WARM UP JOB` and FE metrics. -### Scheduling and Storage Mechanism +## Metrics Monitoring -- Sync relationships are persistently stored by FE as `CloudWarmUpJob` objects, supporting concurrent management of multiple jobs. -- Multiple jobs in `PENDING` status are allowed for the same target cluster, but only one job is allowed to be in `RUNNING` status at a time; the remaining jobs queue and wait. -- Sync relationships can be managed by Compute Group name, compatible with cluster renaming and migration operations. +### SQL Observation ---- +The most direct way to observe warm-up progress is to use `SHOW WARM UP JOB`: -## Metrics Monitoring +```sql +SHOW WARM UP JOB; +SHOW WARM UP JOB WHERE id = ; +``` - - +Usage suggestions: -### Periodic Jobs - FE-Side Metrics +- `gap_size` or detailed `gap_*` continuously approaching 0 means the target compute group is generally keeping up with the source compute group's write speed. +- `trigger_gap_ms` approaching 0 means the target compute group has caught up with the latest trigger event from the source compute group. +- If `fail_*` is greater than 0, check BE logs for disk space issues, remote storage access failures, or network errors. +- The 5-minute window is useful for real-time fluctuations, while the 30-minute and 1-hour windows are useful for sustained trends. 
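As a rough illustration of the suggestions above, the following Python sketch inspects a detailed `SyncStats` JSON document and reports anything that looks unhealthy. It assumes the JSON has the shape shown in the `SyncStats` example in this document. The function name `check_sync_stats` and the 60-second `max_trigger_gap_ms` threshold are hypothetical choices for this sketch, not Doris defaults, and fetching the JSON from `SHOW WARM UP JOB` is left out.

```python
import json
from typing import List


def check_sync_stats(stats_json: str, max_trigger_gap_ms: int = 60000) -> List[str]:
    """Return human-readable findings for one job's detailed SyncStats.

    An empty list means no failure counter is above zero and the trigger
    gap is within the (hypothetical) threshold.
    """
    stats = json.loads(stats_json)
    findings = []

    # Any fail_* counter above zero is worth a look at the BE logs.
    for group in ("seg_num", "idx_num"):
        for key, value in sorted(stats.get(group, {}).items()):
            if key.startswith("fail_") and value > 0:
                findings.append(f"{group}.{key}={value}: check BE logs")

    # A large trigger gap means the target lags behind the source.
    gap_ms = stats.get("trigger_gap_ms", 0)
    if gap_ms > max_trigger_gap_ms:
        findings.append(
            f"trigger_gap_ms={gap_ms}: target is lagging behind the source"
        )
    return findings
```

A check like this is better suited to ad hoc inspection of a single job. For continuous alerting, the FE Prometheus metrics are likely the more appropriate source.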
-| Metric Name | Meaning | -|----------|------| -| `file_cache_warm_up_job_exec_count` | Number of scheduled executions | -| `file_cache_warm_up_job_requested_tablets` | Total number of tablets submitted | -| `file_cache_warm_up_job_finished_tablets` | Number of tablets that completed sync | -| `file_cache_warm_up_job_latest_start_time` | Time of the most recent job start | -| `file_cache_warm_up_job_last_finish_time` | Time of the most recent job completion | +### BE Bvar Metrics + +In addition to `SHOW WARM UP JOB` and FE `/metrics`, you can use the BE Bvar page to inspect warm-up execution metrics on a single BE: + +```bash +curl http://:/vars +``` -### Periodic Jobs - BE-Side Metrics +BE-side metrics for periodic jobs: | Metric Name | Meaning | |----------|------| @@ -191,7 +453,7 @@ The current version does not support using `ALTER` to modify an existing job con | `file_cache_once_or_periodic_warm_up_submitted_index_num` | Number of submitted indexes | | `file_cache_once_or_periodic_warm_up_finished_index_num` | Number of completed indexes | -### Event-Driven Jobs - Source BE Metrics +Source BE metrics for event-driven jobs: | Metric Name | Meaning | |----------|------| @@ -199,7 +461,7 @@ The current version does not support using `ALTER` to modify an existing job con | `file_cache_event_driven_warm_up_requested_index_num` | Number of indexes requested for sync | | `file_cache_warm_up_rowset_last_call_unix_ts` | Timestamp of the last sync request initiated | -### Event-Driven Jobs - Target BE Metrics +Target BE metrics for event-driven jobs: | Metric Name | Meaning | |----------|------| @@ -207,33 +469,145 @@ The current version does not support using `ALTER` to modify an existing job con | `file_cache_event_driven_warm_up_finished_segment_num` | Number of segments that completed warm-up | | `file_cache_warm_up_rowset_last_handle_unix_ts` | Timestamp of the last sync request handled | ---- +These metrics reflect execution on a single BE and are useful for 
checking whether a BE has received warm-up requests, completed downloads, and recently initiated or handled requests. For cross-BE job-level aggregation, prefer `SHOW WARM UP JOB WHERE id = ` or FE Prometheus metrics. -## FAQ +### FE Prometheus Metrics + +In cloud mode, FE periodically pulls and aggregates event-driven warm-up progress from BEs. The default refresh interval is 15 seconds. The interval is controlled by the FE configuration item `cloud_warm_up_sync_stats_refresh_interval_ms`, whose default value is `15000` milliseconds. + +You can collect the following metrics from FE `/metrics`: + +| Metric Name | Description | +|--------|------| +| `doris_fe_file_cache_warm_up_sync_job_info` | Job metadata. The value is always 1. Labels include `job_id`, `job_type`, `sync_mode`, `sync_event`, `job_state`, and source/target compute groups | +| `doris_fe_file_cache_warm_up_sync_job_size_bytes` | Total warm-up size submitted by the source side and completed by the target side, in bytes. Includes the `side` and `window` labels | +| `doris_fe_file_cache_warm_up_sync_job_trigger_gap_ms` | Time gap between the latest source trigger time and the target progress watermark, in milliseconds | + +Common PromQL examples: + +```promql +# Total warm-up size submitted by the source side in the last 5 minutes for each job +doris_fe_file_cache_warm_up_sync_job_size_bytes{side="src",window="5m"} + +# Total warm-up size completed by the target side in the last 5 minutes for each job +doris_fe_file_cache_warm_up_sync_job_size_bytes{side="dst",window="5m"} + +# Sync size gap in the last 5 minutes for each job +doris_fe_file_cache_warm_up_sync_job_size_bytes{side="src",window="5m"} + - ignoring(side) +doris_fe_file_cache_warm_up_sync_job_size_bytes{side="dst",window="5m"} + +# Trigger progress time gap for each job +doris_fe_file_cache_warm_up_sync_job_trigger_gap_ms +``` + +`ignoring(side)` tells Prometheus to ignore the `side` label when subtracting source and target size series, so 
that `src` and `dst` series with the same job and window can be matched. + +## End-to-End Procedure + +1. View current compute groups and confirm the source and target compute group names: + + ```sql + SHOW COMPUTE GROUPS; + ``` + +2. Confirm the table range that needs warm-up: + + ```sql + SHOW TABLES FROM ods; + SHOW TABLES FROM dw; + ``` + +3. Create a table-level event-driven warm-up job: + + ```sql + WARM UP COMPUTE GROUP read_cg WITH COMPUTE GROUP write_cg + ON TABLES ( + INCLUDE 'ods.*', + INCLUDE 'dw.fact_*', + EXCLUDE 'ods.tmp_*' + ) + PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" + ); + ``` - - +4. Check job status and matched tables: + + ```sql + SHOW WARM UP JOB; + ``` + +5. After writing data, observe sync progress: + + ```sql + SHOW WARM UP JOB WHERE id = ; + ``` + +6. To adjust rules, cancel the old job and create a new one: + + ```sql + CANCEL WARM UP JOB WHERE id = ; + ``` + +## Best Practices + +- If a query compute group accesses only a small set of core tables, prefer table-level event-driven warm-up to avoid excessive cache usage from compute-group-level warm-up. +- For data warehouses with clear table naming conventions, rules such as `INCLUDE 'dws.*'`, `INCLUDE 'ads.*'`, and `EXCLUDE '*.tmp_*'` are easier to maintain. +- Avoid having multiple jobs cover the same hot tables. Although the target side can avoid repeated downloads where possible, job management and metric interpretation become more complex. +- To change the warm-up scope, cancel and recreate the job. Do not rely on the old job to change its rules automatically. +- Use `SHOW WARM UP JOB` for single-job details, and use FE Prometheus metrics in Grafana for long-term trend monitoring. + +## FAQ **Q: Does a sync failure in one round cancel the entire job?** -No. A sync failure in the current round only skips the current execution; the job status remains unchanged and subsequent cycles continue to attempt execution. +No. 
A sync failure in the current round only skips that execution. The job status remains unchanged, and later cycles continue to retry. You can run `SHOW WARM UP JOB WHERE id = ` to inspect `ErrMsg` and failure counts in `SyncStats`, then check BE logs for the root cause. **Q: What happens when a periodic job execution times out?** -The system skips the current round after a timeout. The job itself is not deleted, and the next cycle triggers normally. +The system skips the current round after a timeout. The job itself is not deleted, and the next cycle triggers normally. You can inspect `StartTime`, `FinishTime`, `FinishBatch`, and `AllBatch` in `SHOW WARM UP JOB` to understand the most recent execution. + +**Q: Is it supported to sync from multiple source compute groups to the same target compute group?** + +Yes. For example, compute group A and compute group C can both sync to compute group B (A -> B and C -> B coexist). If multiple jobs cover the same tables, job management and metric interpretation become more complex. + +**Q: When should I use table-level event-driven warm-up?** + +Use table-level event-driven warm-up when the target compute group queries only part of the tables, or when the source compute group has many tables but only a small subset is hot. This reduces unnecessary warm-up and cache pollution. + +**Q: What happens if `ON TABLES` is not used?** + +Without `ON TABLES`, the job is a compute-group-level event-driven warm-up job. New data generated by matching events on the source compute group triggers warm-up. + +**Q: Does the order of `INCLUDE` and `EXCLUDE` matter?** + +No. Doris first computes the union of all `INCLUDE` rules and then removes all tables matched by `EXCLUDE` rules. + +**Q: If a table matching the rules is created after the job is created, will it be warmed up automatically?** + +Yes. Doris periodically re-evaluates the rules. A new table is included in the warm-up scope in a later refresh cycle if it matches the rules. 
The default refresh interval is 60 seconds. + +**Q: After a table is renamed, will it continue to be warmed up?** + +It depends on whether the new table name still matches the `ON TABLES` rules. If it matches, warm-up continues. If it does not match, warm-up stops in a later refresh cycle. -**Q: Is it supported to sync from multiple source clusters to the same target cluster?** +**Q: Can I warm up only one partition of a table?** -Yes. For example, cluster A and cluster C can both be configured to sync to cluster B simultaneously (A -> B and C -> B coexist). +Table-level event-driven warm-up filters at table granularity and does not support specifying partitions in `ON TABLES`. New data from a matched table is processed according to the table-level rules. -**Q: How do you verify that a warm-up job has taken effect?** +**Q: How do I verify that warm-up has taken effect?** -You can verify using the following methods: +You can verify it as follows: -1. Run `SHOW WARM UP JOB WHERE ID = ` to check whether `Status` is `RUNNING` or `FINISHED`. -2. Compare `FinishBatch` with `AllBatch` to confirm sync progress. -3. Observe the BE-side metrics of the target cluster and confirm that `finished_segment_num` continues to grow. +1. Run `SHOW WARM UP JOB WHERE id = ` and check whether `Status` is `RUNNING` or `FINISHED`. +2. For table-level event-driven warm-up, check whether `MatchedTables` matches your expectation. +3. Compare `FinishBatch` with `AllBatch` to confirm the progress of one-time or periodic jobs. +4. Check `SyncStats` and confirm that `gap_size` or detailed `gap_*` approaches 0 and `trigger_gap_ms` does not keep increasing. +5. Observe the target compute group's BE Bvar metrics and confirm that completion counters such as `file_cache_event_driven_warm_up_finished_segment_num` continue to increase. +6. Query the related tables on the target compute group, and use File Cache hit rate, FE metrics, and BE logs to confirm whether there are still many remote reads. 
-**Q: How do you modify the configuration of a sync job (such as adjusting the sync interval)?** +**Q: How do I modify a sync job configuration, such as changing the sync interval or `ON TABLES` rules?** Direct modification is not supported in the current version. You must first run `CANCEL WARM UP JOB WHERE id = ` to cancel the old job, then create a new job. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/rw/read-write-separation.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/rw/read-write-separation.md index f13d6385c7c3b..7d95530f354cf 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/rw/read-write-separation.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/compute-storage-decoupled/rw/read-write-separation.md @@ -3,186 +3,448 @@ "title": "读写分离与主备集群 File Cache 预热配置指南", "sidebar_label": "读写分离:File Cache 预热", "language": "zh-CN", - "description": "介绍 Doris File Cache 主动增量预热机制,支持读写分离和主备集群架构,涵盖预热任务创建、管理、监控及常见问题排查。", - "keywords": ["读写分离", "主备集群", "File Cache 预热", "compute group 同步", "高可用", "跨可用区"] + "description": "介绍 Doris File Cache 主动增量预热机制,支持读写分离和主备集群架构,涵盖计算组级和表级事件驱动预热、任务创建、状态观测、指标监控及常见问题排查。", + "keywords": ["读写分离", "主备集群", "File Cache 预热", "主动增量预热", "表级别预热", "ON TABLES", "compute group 同步", "高可用"] } --- - + ## 背景与适用场景 -为解决跨可用区(AZ)高可用切换和读写分离场景下的缓存冷启动问题,Doris 引入了 **File Cache 主动增量预热机制**。该机制确保目标集群的缓存数据与源集群保持高度一致,从而提升查询性能、减少抖动,并加快故障切换响应速度。 +在 Apache Doris 存算分离架构中,多个计算组(Compute Group)可以共享同一份远端存储数据。写入计算组负责导入、Compaction 或 Schema Change,查询计算组负责在线查询。新 Rowset 生成后,如果查询计算组尚未将对应文件加载到本地 File Cache,首次查询就需要访问对象存储或 HDFS,容易带来查询延迟抖动。 -该功能适用于以下两种典型场景: +File Cache 主动增量预热用于在写入侧产生新数据后,提前将相关 Segment 和索引文件预热到目标计算组的本地缓存中。它主要适用于以下场景: -| 场景 | 说明 | 核心需求 | +| 场景 | 说明 | 核心收益 | |------|------|----------| -| **主备集群高可用** | 备集群持续同步主集群热点数据,在主集群故障时快速接管负载 | 最小化切换延迟 | -| **读写分离** | 写集群的新增数据及时预热到读集群,避免查询命中冷缓存 | 降低读集群查询抖动 | +| 读写分离 | 写入计算组持续导入数据,查询计算组只负责查询 | 降低查询计算组读取新数据时的 Cache 
Miss | +| 主备集群高可用 | 备计算组持续同步主计算组热点数据 | 缩短故障切换后的冷缓存恢复时间 | +| 多租户或分层数仓 | 不同查询计算组只访问部分业务库表 | 通过表级过滤减少无效预热和缓存占用 | +| 成本优化 | 表数量较多,但热点查询集中在少量表 | 降低远端存储读取和网络传输开销 | :::tip 版本信息 -File Cache 主动增量预热功能已在 Apache Doris **3.1.0** 版本中引入。 +本文介绍 File Cache 主动增量预热及其表级 `ON TABLES` 过滤能力。具体支持版本请以对应版本的发行说明和 SQL 语法文档为准。 ::: ---- - ## 功能概览 - - -File Cache 主动预热支持以下两类缓存同步方式: - -1. **事件触发预热**:在 Load、Compaction、Schema Change 等写操作完成后自动触发同步,减少查询抖动。 -2. **热点周期同步**:通过周期性扫描,持续将热点查询数据同步到目标集群,保障主备切换时备集群性能稳定。 - ---- +File Cache 主动预热支持三类同步模式: -## 同步模式说明 +| 同步模式 | 参数值 | 适用场景 | +|------|--------|----------| +| 一次性同步 | `once` | 新计算组上线时手动触发初始预热 | +| 周期性同步 | `periodic` | 按固定间隔同步热点数据,适用于持续保温场景 | +| 事件驱动同步 | `event_driven` | 写入事件发生后自动预热新增 Rowset,适用于读写分离 | - - +事件驱动同步又可以分为两种范围: -三种同步模式的适用场景如下: +| 范围 | 语法形态 | 说明 | +|------|----------|------| +| 计算组级事件驱动预热 | 不带 `ON TABLES` | 源计算组上符合事件类型的新写入数据都会触发预热 | +| 表级事件驱动预热 | 带 `ON TABLES (...)` | 只有匹配规则的表产生新写入数据时才触发预热 | -| 模式 | 参数值 | 适用场景 | -|------|--------|----------| -| 一次性同步 | `ONCE` | 手动触发,适用于新集群上线时的初始预热 | -| 周期性同步 | `PERIODIC` | 定时同步热点数据,适用于持续保温场景 | -| 事件驱动同步 | `EVENT_DRIVEN` | 导入、Compaction、Schema Change 操作后自动触发 | - ---- +表级事件驱动预热适合只关心部分核心表的查询计算组。与计算组级预热相比,它可以减少不必要的远端读取、网络传输和目标计算组缓存占用。 ## 创建预热任务 - - - ### 一次性同步 -适用于新集群上线时手动触发初始预热: +一次性同步适用于新计算组上线时的初始预热: ```sql -WARM UP COMPUTE GROUP WITH COMPUTE GROUP ; +WARM UP COMPUTE GROUP WITH COMPUTE GROUP ; ``` ### 周期性同步 -适用于持续保持热点数据同步: +周期性同步适用于持续保持热点数据同步: ```sql -WARM UP COMPUTE GROUP WITH COMPUTE GROUP +WARM UP COMPUTE GROUP WITH COMPUTE GROUP PROPERTIES ( "sync_mode" = "periodic", "sync_interval_sec" = "600" ); ``` -- `sync_interval_sec`:同步间隔(秒),基于上次开始时间计算,默认值为 600 秒。 +`sync_interval_sec` 表示同步间隔,单位为秒,默认值为 `600`。 -### 事件驱动同步 +### 计算组级事件驱动预热 -适用于读写分离场景,在写操作完成后自动将新数据预热到读集群: +计算组级事件驱动预热会监听源计算组上的写入事件,并将新增数据预热到目标计算组: ```sql -WARM UP COMPUTE GROUP WITH COMPUTE GROUP +WARM UP COMPUTE GROUP WITH COMPUTE GROUP PROPERTIES ( "sync_mode" = "event_driven", "sync_event" = "load" ); ``` -- `sync_event`:触发事件类型,可选值包括 
`load`(导入)、`compaction`(合并)、`schema_change`(结构变更)。 +`sync_event` 表示触发事件类型。计算组级事件驱动预热可用于导入、Compaction、Schema Change 等场景,具体可用值以当前版本的 `WARM UP` SQL 语法为准。 ---- +### 表级事件驱动预热 + +表级事件驱动预热在计算组级事件驱动预热的基础上增加 `ON TABLES` 子句,用于指定需要预热的库表范围: + +```sql +WARM UP COMPUTE GROUP WITH COMPUTE GROUP +ON TABLES ( + INCLUDE '.' + [, INCLUDE '.' ...] + [, EXCLUDE '.' ...] +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +参数说明: + +| 参数 | 是否必选 | 说明 | +|------|----------|------| +| `` | 是 | 目标计算组名称。匹配表的新写入数据会被预热到该计算组的本地 File Cache | +| `` | 是 | 源计算组名称。系统监听该计算组上的写入事件 | +| `ON TABLES` | 否 | 表级过滤规则。省略时为计算组级事件驱动预热 | +| `INCLUDE` | 使用 `ON TABLES` 时必选 | 声明需要纳入预热范围的库表模式,至少需要一条 | +| `EXCLUDE` | 否 | 从 `INCLUDE` 结果中排除不需要预热的库表模式 | +| `sync_mode` | 是 | 表级事件驱动预热固定使用 `event_driven` | +| `sync_event` | 是 | 表级事件驱动预热当前用于 `load` 事件 | + +:::caution 注意 +同一对源计算组和目标计算组之间,不要同时配置计算组级 `load` 事件驱动预热和表级 `ON TABLES` 事件驱动预热。两者语义存在重叠,系统会在创建阶段拒绝冲突配置。需要从计算组级切换到表级时,先取消旧 Job,再创建新的 `ON TABLES` Job。 +::: + +## ON TABLES 匹配规则 + +### 模式格式 + +`ON TABLES` 中的模式必须使用 `'库名.表名'` 格式,并用单引号包裹。支持以下通配符: + +| 通配符 | 含义 | 示例 | +|--------|------|------| +| `*` | 匹配任意数量的任意字符,包括零个字符 | `'ods.*'` 匹配 `ods` 库下所有表 | +| `?` | 匹配恰好一个任意字符 | `'logs.access_202?'` 匹配 `logs.access_2020` 到 `logs.access_2029` | + +不使用通配符时为精确匹配,例如 `'sales.orders'` 只匹配 `sales` 库中的 `orders` 表。 + +常见模式示例: + +| 模式 | 含义 | +|------|------| +| `'mydb.*'` | 匹配 `mydb` 库下所有表 | +| `'*.orders'` | 匹配所有库中名为 `orders` 的表 | +| `'dw.fact_*'` | 匹配 `dw` 库下 `fact_` 开头的表 | +| `'*.*_bak'` | 匹配所有库中 `_bak` 结尾的表 | +| `'sales.orders'` | 精确匹配 `sales.orders` | + +### INCLUDE 和 EXCLUDE + +系统按以下逻辑计算最终预热范围: + +```text +最终预热范围 = 所有 INCLUDE 规则匹配到的表 - 所有 EXCLUDE 规则匹配到的表 +``` + +规则说明: + +- `INCLUDE` 和 `EXCLUDE` 的书写顺序不影响最终结果。 +- 至少需要一条 `INCLUDE` 规则,不能只写 `EXCLUDE`。 +- 多条 `INCLUDE` 规则之间是并集关系。 +- 多条 `EXCLUDE` 规则会从 `INCLUDE` 的候选集合中继续排除。 +- 匹配遵循 Doris 库名和表名规则,建议使用与实际库表完全一致的大小写。 + +示例: + +```sql +WARM UP COMPUTE GROUP analytics_cg WITH COMPUTE GROUP write_cg +ON TABLES ( + INCLUDE 
'ods.*', + INCLUDE 'dw.fact_*', + INCLUDE 'dw.dim_*', + EXCLUDE 'ods.tmp_*', + EXCLUDE '*.*_bak' +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +该示例会预热: + +- `ods` 库下除 `tmp_` 前缀以外的表; +- `dw` 库下 `fact_` 和 `dim_` 前缀的表; +- 排除所有库中 `_bak` 结尾的备份表。 + +### 物化视图 + +`ON TABLES` 规则同时匹配普通表和异步物化视图。异步物化视图在数据库目录中作为独立表存在,会按照名称被 `INCLUDE` 和 `EXCLUDE` 规则匹配。 + +同步物化视图(Rollup)是基表的内部索引,不是独立表。当基表被预热时,其同步物化视图相关数据会随基表一起处理,无需单独配置。 + +## 使用示例 + +### 只预热指定表 + +```sql +WARM UP COMPUTE GROUP report_cg WITH COMPUTE GROUP business_cg +ON TABLES ( + INCLUDE 'sales.orders', + INCLUDE 'sales.customers', + INCLUDE 'inventory.stock_level' +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +### 预热整个数据库 + +```sql +WARM UP COMPUTE GROUP analytics_cg WITH COMPUTE GROUP load_cg +ON TABLES ( + INCLUDE 'analytics_db.*' +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +### 预热多个库并排除临时表 + +```sql +WARM UP COMPUTE GROUP query_cg WITH COMPUTE GROUP etl_cg +ON TABLES ( + INCLUDE 'ods.*', + INCLUDE 'dwd.*', + INCLUDE 'dws.*', + EXCLUDE '*.tmp_*', + EXCLUDE '*.*_backup' +) +PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" +); +``` + +### 同一张表预热到多个目标计算组 + +如果同一张表需要服务多个查询计算组,需要为每个目标计算组分别创建 Job: + +```sql +WARM UP COMPUTE GROUP realtime_cg WITH COMPUTE GROUP write_cg +ON TABLES (INCLUDE 'sales.orders') +PROPERTIES ("sync_mode" = "event_driven", "sync_event" = "load"); + +WARM UP COMPUTE GROUP batch_cg WITH COMPUTE GROUP write_cg +ON TABLES (INCLUDE 'sales.orders') +PROPERTIES ("sync_mode" = "event_driven", "sync_event" = "load"); +``` ## 管理预热任务 - - +### 查看任务 -### 查看任务列表 +查看所有预热任务: ```sql --- 查看所有预热任务 SHOW WARM UP JOB; +``` --- 查看指定任务 -SHOW WARM UP JOB WHERE ID = 12345; +查看指定任务: + +```sql +SHOW WARM UP JOB WHERE id = ; ``` -查询结果字段说明: +结果字段说明: | 字段名 | 说明 | |--------|------| -| `JobId` | 同步任务唯一 ID | -| `ComputeGroup` | 目标 Compute Group 名称 | -| `SrcComputeGroup` | 源 Compute Group 名称 | -| `Type` | 
同步类型:`CLUSTER`(集群级)/ `TABLE`(表级) | -| `SyncMode` | 同步模式:`ONCE` / `PERIODIC(interval_sec)` / `EVENT_DRIVEN(event)` | -| `Status` | 任务状态:`PENDING` / `RUNNING` / `FINISHED` / `CANCELLED` / `DELETED` | +| `JobId` | 预热任务唯一 ID | +| `SrcComputeGroup` | 源计算组名称 | +| `DstComputeGroup` | 目标计算组名称 | +| `Status` | 任务状态,例如 `PENDING`、`RUNNING`、`FINISHED`、`CANCELLED` | +| `Type` | 预热范围。`CLUSTER` 表示计算组级,`TABLE` 表示指定表,`TABLES` 表示 `ON TABLES` 规则匹配 | +| `SyncMode` | 同步模式,例如 `ONCE`、`PERIODIC(interval_sec)`、`EVENT_DRIVEN(event)` | | `CreateTime` | 任务创建时间 | -| `StartTime` | 上一次开始时间 | -| `FinishTime` | 上一次完成时间 | +| `StartTime` | 最近一次开始时间 | | `FinishBatch` | 已完成的 batch 数量 | -| `AllBatch` | 总共需要同步的 batch 数量 | -| `ErrMsg` | 错误信息(无错误时为空) | +| `AllBatch` | 总 batch 数量 | +| `FinishTime` | 最近一次完成时间。事件驱动任务通常持续运行 | +| `ErrMsg` | 最近一次错误信息,无错误时为空 | +| `Tables` | 显式指定的表列表,主要用于一次性或周期性表级预热 | +| `TableFilter` | `ON TABLES` 规则的规范化展示。计算组级任务为空 | +| `MatchedTables` | 当前实际匹配到的表名列表,会随定期刷新反映表新增、删除或重命名 | +| `SyncStats` | 事件驱动任务的同步进度。列表查询展示摘要,按 ID 查询展示详细 JSON | + +`SHOW WARM UP JOB` 列表适合日常巡检。为了避免列表过宽,`SyncStats` 会展示最近 30 分钟的摘要: + +```json +{ + "window": "30m", + "src_size": "58.2mb", + "dst_size": "57.5mb", + "gap_size": "716kb", + "trigger_gap_ms": 1200 +} +``` + +按 Job ID 查询时,`SyncStats` 会展示更详细的 5 分钟、30 分钟和 1 小时窗口指标: + +```sql +SHOW WARM UP JOB WHERE id = ; +``` + +`SyncStats` 示例: + +```json +{ + "seg_num": { + "requested_5m": 42, + "finish_5m": 40, + "gap_5m": 2, + "fail_5m": 0, + "requested_30m": 180, + "finish_30m": 178, + "gap_30m": 2, + "fail_30m": 0, + "requested_1h": 320, + "finish_1h": 318, + "gap_1h": 2, + "fail_1h": 0 + }, + "seg_size": { + "requested_5m": "12.5mb", + "finish_5m": "11.8mb", + "gap_5m": "716kb", + "fail_5m": "0b", + "requested_30m": "58.2mb", + "finish_30m": "57.5mb", + "gap_30m": "716kb", + "fail_30m": "0b", + "requested_1h": "102.3mb", + "finish_1h": "101.6mb", + "gap_1h": "716kb", + "fail_1h": "0b" + }, + "idx_num": { + "requested_5m": 10, + "finish_5m": 10, + "gap_5m": 0, + 
"fail_5m": 0 + }, + "idx_size": { + "requested_5m": "2.1mb", + "finish_5m": "2.1mb", + "gap_5m": "0b", + "fail_5m": "0b" + }, + "last_trigger_ts": "14:32:15", + "last_finish_ts": "14:32:18", + "progress_trigger_ts": "14:32:14", + "trigger_gap_ms": 1000 +} +``` + +重点关注以下字段: + +| 字段 | 说明 | +|------|------| +| `requested_*` | 源计算组已提交的预热请求量 | +| `finish_*` | 目标计算组已完成的预热量 | +| `gap_*` | 缺口,表示尚未完成的量 | +| `fail_*` | 目标计算组预热失败量 | +| `last_trigger_ts` | 最近一次预热触发时间 | +| `progress_trigger_ts` | 目标计算组当前预热进度对应的上游触发时间 | +| `last_finish_ts` | 最近一次预热完成时间 | +| `trigger_gap_ms` | 源端最新触发时间与目标端进度水位之间的时间差,单位毫秒 | ### 取消任务 ```sql -CANCEL WARM UP JOB WHERE id = 12345; +CANCEL WARM UP JOB WHERE id = ; ``` -:::caution 注意 -当前版本不支持 `ALTER` 修改已有任务配置。如需变更参数,须先取消任务,再重新创建。 -::: +取消后,系统停止监听该 Job 对应的事件并停止继续触发预热。已经写入目标计算组 File Cache 的数据不会被主动清除,会按正常缓存淘汰策略释放。 ---- +当前版本不支持直接修改已有 Job 的 `ON TABLES` 规则。如需调整预热范围,需要先取消旧 Job,再创建新 Job。 -## 工作原理 +## 匹配刷新与行为说明 - +### 新建、删除和重命名表 + +`ON TABLES` 规则会在 Job 创建时执行,并在任务运行过程中定期重新评估。默认刷新周期为 60 秒。 + +这意味着: + +- 创建 Job 后新建的表,如果名称匹配规则,会在后续刷新周期中自动纳入预热范围。 +- 已匹配的表被删除后,会在后续刷新周期中从 `MatchedTables` 中移除。 +- 已匹配的表被重命名后,是否继续预热取决于新名称是否仍匹配规则。 + +在新表创建到下一次规则刷新之间,可能存在最长 60 秒的延迟窗口。延迟窗口内对新表写入的数据不会触发该表级 Job 的预热;刷新后发生的新写入会正常触发。 + +### 创建时无匹配表 + +创建表级事件驱动预热 Job 时,`ON TABLES` 规则需要至少匹配一张已存在的表。如果没有任何表匹配,创建会失败。请检查库名、表名和通配符是否正确。 + +如果希望提前配置预热关系,建议先创建至少一张符合规则的表,再创建预热 Job。 + +### Schema Change + +`ON TABLES` 只决定表集合,不改变事件类型本身的触发语义。对于当前 Job 已配置且当前版本支持的事件类型,产生的新数据会按所属表的匹配结果处理;如果 Job 配置为 `sync_event = "load"`,则只监听对应的导入事件。 + +## 工作原理 ### 周期性同步执行流程 -1. FE 注册任务,记录 `sync_interval` 配置。 -2. FE 周期性检查是否到达触发时间(基于上次开始时间计算)。 -3. 触发同步任务,避免任务重叠执行。 -4. 同步完成后记录状态,等待下一个周期。 +1. FE 注册任务并记录同步间隔。 +2. FE 周期性检查是否到达触发时间。 +3. 到达触发时间后,FE 将待预热的表或分区转换为对应 Tablet 并分发任务。 +4. BE 从远端存储读取文件并写入目标计算组的 File Cache。 ### 事件驱动同步执行流程 -1. 用户创建事件驱动任务,FE 注册任务并将配置下发至源集群 BE。 -2. 源 BE 在 Load、Compaction 等事件完成后自动触发预热逻辑。 -3. 源 BE 向目标 BE 发起同步请求(以 Rowset 为粒度)。 -4. 同步完成后,目标 BE 向 FE 汇报执行状态。 +1. 用户创建事件驱动 Job,FE 持久化该同步关系。 +2. FE 将事件驱动配置下发到源计算组 BE。 +3. 
源计算组 BE 在写入事件提交后触发预热逻辑。 +4. 对于表级事件驱动 Job,源 BE 只处理当前匹配表集合内的 Rowset。 +5. 目标计算组 BE 下载对应 Segment 和索引文件,写入本地 File Cache。 +6. FE 通过 `SHOW WARM UP JOB` 和 FE 指标暴露任务状态与同步进度。 -### 调度与存储机制 +## 指标监控 -- 同步关系由 FE 持久化存储为 `CloudWarmUpJob` 对象,支持多任务并发管理。 -- 同一目标集群允许存在多个 `PENDING` 状态的任务,但同一时间仅允许一个任务处于 `RUNNING` 状态,其余任务排队等候。 -- 支持通过 Compute Group 名称管理同步关系,兼容集群重命名和迁移操作。 +### SQL 观测 ---- +最直接的观测方式是使用 `SHOW WARM UP JOB`: -## 指标监控 +```sql +SHOW WARM UP JOB; +SHOW WARM UP JOB WHERE id = ; +``` - - +使用建议: -### 周期性任务 — FE 侧指标 +- `gap_size` 或详细 `gap_*` 持续趋近于 0,表示目标计算组基本跟上源计算组写入速度。 +- `trigger_gap_ms` 趋近于 0,表示目标计算组已经追上源计算组最新触发事件。 +- `fail_*` 大于 0 时,需要结合 BE 日志排查磁盘空间、远端存储访问或网络错误。 +- 5 分钟窗口适合看实时波动,30 分钟和 1 小时窗口适合看持续趋势。 -| 指标名称 | 含义 | -|----------|------| -| `file_cache_warm_up_job_exec_count` | 调度执行次数 | -| `file_cache_warm_up_job_requested_tablets` | 提交的 tablet 总数 | -| `file_cache_warm_up_job_finished_tablets` | 完成同步的 tablet 数量 | -| `file_cache_warm_up_job_latest_start_time` | 最近一次任务开始时间 | -| `file_cache_warm_up_job_last_finish_time` | 最近一次任务完成时间 | +### BE Bvar 指标 + +除 `SHOW WARM UP JOB` 和 FE `/metrics` 外,也可以通过 BE Bvar 页面查看单个 BE 上的预热执行指标: + +```bash +curl http://:/vars +``` -### 周期性任务 — BE 侧指标 +周期性任务的 BE 侧指标如下: | 指标名称 | 含义 | |----------|------| @@ -191,7 +453,7 @@ CANCEL WARM UP JOB WHERE id = 12345; | `file_cache_once_or_periodic_warm_up_submitted_index_num` | 已提交的 index 数量 | | `file_cache_once_or_periodic_warm_up_finished_index_num` | 已完成的 index 数量 | -### 事件驱动任务 — 源 BE 指标 +事件驱动任务的源 BE 指标如下: | 指标名称 | 含义 | |----------|------| @@ -199,7 +461,7 @@ CANCEL WARM UP JOB WHERE id = 12345; | `file_cache_event_driven_warm_up_requested_index_num` | 请求同步的 index 数量 | | `file_cache_warm_up_rowset_last_call_unix_ts` | 最后一次发起同步请求的时间戳 | -### 事件驱动任务 — 目标 BE 指标 +事件驱动任务的目标 BE 指标如下: | 指标名称 | 含义 | |----------|------| @@ -207,33 +469,145 @@ CANCEL WARM UP JOB WHERE id = 12345; | `file_cache_event_driven_warm_up_finished_segment_num` | 完成预热的 segment 数量 | | `file_cache_warm_up_rowset_last_handle_unix_ts` | 
最后一次处理同步请求的时间戳 | ---- +这些指标反映单个 BE 本地的执行情况,适合排查某个 BE 是否收到预热请求、是否完成下载以及最近一次调用或处理时间。跨 BE 的 Job 级汇总仍建议优先查看 `SHOW WARM UP JOB WHERE id = ` 或 FE Prometheus 指标。 -## 常见问题 +### FE Prometheus 指标 + +在 cloud 模式下,FE 会周期性从 BE 拉取并聚合事件驱动预热进度,默认每 15 秒刷新一次。刷新间隔由 FE 配置项 `cloud_warm_up_sync_stats_refresh_interval_ms` 控制,默认值为 `15000` 毫秒。 + +可以从 FE `/metrics` 采集以下指标: + +| 指标名 | 说明 | +|--------|------| +| `doris_fe_file_cache_warm_up_sync_job_info` | Job 元信息,值恒为 1。包含 `job_id`、`job_type`、`sync_mode`、`sync_event`、`job_state`、源/目标计算组等标签 | +| `doris_fe_file_cache_warm_up_sync_job_size_bytes` | 源端已提交和目标端已完成的预热总大小,单位为字节。包含 `side` 和 `window` 标签 | +| `doris_fe_file_cache_warm_up_sync_job_trigger_gap_ms` | 源端最新触发时间与目标端进度水位之间的时间差,单位为毫秒 | + +常用 PromQL 示例: + +```promql +# 每个 Job 最近 5 分钟源端已提交的预热总大小 +doris_fe_file_cache_warm_up_sync_job_size_bytes{side="src",window="5m"} + +# 每个 Job 最近 5 分钟目标端已完成的预热总大小 +doris_fe_file_cache_warm_up_sync_job_size_bytes{side="dst",window="5m"} + +# 每个 Job 最近 5 分钟的同步大小缺口 +doris_fe_file_cache_warm_up_sync_job_size_bytes{side="src",window="5m"} + - ignoring(side) +doris_fe_file_cache_warm_up_sync_job_size_bytes{side="dst",window="5m"} + +# 每个 Job 的触发进度时间差 +doris_fe_file_cache_warm_up_sync_job_trigger_gap_ms +``` + +`ignoring(side)` 表示在计算源端和目标端大小差值时忽略 `side` 标签,让 Prometheus 可以把同一个 Job、同一个窗口下的 `src` 和 `dst` 序列对齐相减。 + +## 完整操作流程 + +1. 查看当前计算组,确认源计算组和目标计算组名称: + + ```sql + SHOW COMPUTE GROUPS; + ``` + +2. 确认需要预热的库表范围: + + ```sql + SHOW TABLES FROM ods; + SHOW TABLES FROM dw; + ``` + +3. 创建表级事件驱动预热 Job: + + ```sql + WARM UP COMPUTE GROUP read_cg WITH COMPUTE GROUP write_cg + ON TABLES ( + INCLUDE 'ods.*', + INCLUDE 'dw.fact_*', + EXCLUDE 'ods.tmp_*' + ) + PROPERTIES ( + "sync_mode" = "event_driven", + "sync_event" = "load" + ); + ``` - - +4. 查看 Job 状态和匹配表: + + ```sql + SHOW WARM UP JOB; + ``` + +5. 写入数据后观察同步进度: + + ```sql + SHOW WARM UP JOB WHERE id = ; + ``` + +6. 
如需调整规则,取消旧 Job 后重新创建: + + ```sql + CANCEL WARM UP JOB WHERE id = ; + ``` + +## 最佳实践 + +- 查询计算组只访问少量核心表时,优先使用表级事件驱动预热,避免计算组级预热占用过多缓存空间。 +- 对于库表命名规范清晰的数仓,使用 `INCLUDE 'dws.*'`、`INCLUDE 'ads.*'`、`EXCLUDE '*.tmp_*'` 这类规则维护成本更低。 +- 避免让多个 Job 覆盖同一批热点表,否则虽然目标端会尽量避免重复下载,但任务管理和指标解释会变复杂。 +- 需要修改预热范围时,使用取消并重建的方式,不要依赖旧 Job 自动改变规则。 +- 通过 `SHOW WARM UP JOB` 观察单个 Job 详情,通过 FE Prometheus 指标接入 Grafana 做长期趋势监控。 + +## 常见问题 **Q:某次同步失败会导致整个任务被取消吗?** -不会。当前轮次同步失败仅跳过本次执行,任务状态保持不变,后续周期会继续尝试执行。 +不会。当前轮次同步失败仅跳过本次执行,任务状态保持不变,后续周期会继续尝试执行。可以通过 `SHOW WARM UP JOB WHERE id = ` 查看 `ErrMsg` 和 `SyncStats` 中的失败计数,并结合 BE 日志定位失败原因。 **Q:周期性任务执行超时会怎样?** -超时后系统会跳过本轮执行,任务本身不会被删除,下一个周期将正常触发。 +超时后系统会跳过本轮执行,任务本身不会被删除,下一个周期将正常触发。可以通过 `SHOW WARM UP JOB` 中的 `StartTime`、`FinishTime`、`FinishBatch` 和 `AllBatch` 观察最近一次执行情况。 + +**Q:是否支持多个源计算组同步到同一目标计算组?** + +支持。例如计算组 A 和计算组 C 可以分别配置向计算组 B 同步(A -> B 与 C -> B 并存)。如果多个 Job 覆盖同一批表,需要注意任务管理和指标解释会更复杂。 + +**Q:什么时候应该使用表级事件驱动预热?** + +当目标计算组只查询部分库表,或者源计算组表数量很多但热点表较少时,使用表级事件驱动预热可以减少无效预热和缓存污染。 + +**Q:不使用 `ON TABLES` 时是什么行为?** + +不使用 `ON TABLES` 时为计算组级事件驱动预热,源计算组上符合事件类型的新写入数据都会触发预热。 + +**Q:`INCLUDE` 和 `EXCLUDE` 的顺序有影响吗?** + +没有影响。系统先计算所有 `INCLUDE` 的并集,再从中移除所有 `EXCLUDE` 匹配的表。 + +**Q:创建 Job 后新建了符合规则的表,会自动预热吗?** + +会。系统会定期重新评估规则,新表会在后续刷新周期中纳入预热范围。默认刷新周期为 60 秒。 + +**Q:表被重命名后还会继续预热吗?** + +取决于新表名是否仍匹配 `ON TABLES` 规则。匹配则继续预热,不匹配则在后续刷新周期中停止预热。 -**Q:是否支持多个源集群同步到同一目标集群?** +**Q:可以只预热某个分区的新写入数据吗?** -支持。例如集群 A 和集群 C 可以同时配置向集群 B 同步(A → B 与 C → B 并存)。 +表级事件驱动预热的过滤粒度是表,不支持在 `ON TABLES` 中指定分区。被匹配表上的新写入数据都会按表级规则处理。 -**Q:如何验证预热任务是否生效?** +**Q:如何验证预热是否生效?** -可通过以下方式验证: +可以通过以下方式验证: -1. 执行 `SHOW WARM UP JOB WHERE ID = ` 查看 `Status` 是否为 `RUNNING` 或 `FINISHED`。 -2. 对比 `FinishBatch` 与 `AllBatch`,确认同步进度。 -3. 观察目标集群的 BE 侧指标,确认 `finished_segment_num` 持续增长。 +1. 执行 `SHOW WARM UP JOB WHERE id = `,查看 `Status` 是否为 `RUNNING` 或 `FINISHED`。 +2. 对于表级事件驱动预热,检查 `MatchedTables` 是否符合预期。 +3. 对比 `FinishBatch` 与 `AllBatch`,确认一次性或周期性任务的同步进度。 +4. 查看 `SyncStats`,确认 `gap_size` 或详细 `gap_*` 趋近于 0,`trigger_gap_ms` 没有持续增大。 +5. 
观察目标计算组的 BE Bvar 指标,确认 `file_cache_event_driven_warm_up_finished_segment_num` 等完成计数持续增长。 +6. 在目标计算组上查询相关表,结合 File Cache 命中率、FE 指标和 BE 日志确认是否仍存在大量远端读取。 -**Q:修改同步任务的配置(如调整同步间隔)需要怎么操作?** +**Q:修改同步任务的配置(如调整同步间隔或 `ON TABLES` 规则)需要怎么操作?** -当前版本不支持直接修改。需先执行 `CANCEL WARM UP JOB WHERE id = ` 取消旧任务,然后重新创建新任务。 +当前版本不支持直接修改已有任务配置。需要先执行 `CANCEL WARM UP JOB WHERE id = ` 取消旧任务,再创建新任务。
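The `INCLUDE`/`EXCLUDE` semantics described earlier (union of all `INCLUDE` matches, minus all `EXCLUDE` matches, independent of rule order) can be sketched in Python to preview which tables a rule set would select. This is an illustration only, not the Doris implementation. It assumes that the database and table parts are matched separately and case-sensitively, and it uses Python's `fnmatchcase`, which also interprets `[...]` character classes that the `ON TABLES` rules do not mention. The names `resolve_scope` and `_matches` are hypothetical.

```python
from fnmatch import fnmatchcase


def _matches(pattern, qualified_name):
    """Match a 'db.table' name against a 'db_pattern.table_pattern' rule."""
    db_pat, tbl_pat = pattern.split(".", 1)
    db, tbl = qualified_name.split(".", 1)
    # Assumption: '*' and '?' never cross the dot between database and table.
    return fnmatchcase(db, db_pat) and fnmatchcase(tbl, tbl_pat)


def resolve_scope(tables, includes, excludes=()):
    """Return the sorted tables selected by INCLUDE rules minus EXCLUDE rules."""
    if not includes:
        # Mirrors the documented rule that EXCLUDE alone is not allowed.
        raise ValueError("at least one INCLUDE rule is required")
    included = {t for t in tables if any(_matches(p, t) for p in includes)}
    excluded = {t for t in included if any(_matches(p, t) for p in excludes)}
    return sorted(included - excluded)


# Hypothetical catalog used only for this example.
tables = [
    "ods.orders",
    "ods.tmp_load",
    "dw.fact_sales",
    "dw.dim_user",
    "dw.fact_sales_bak",
    "app.users",
]
scope = resolve_scope(
    tables,
    includes=["ods.*", "dw.fact_*", "dw.dim_*"],
    excludes=["ods.tmp_*", "*.*_bak"],
)
print(scope)  # ['dw.dim_user', 'dw.fact_sales', 'ods.orders']
```

Because the result is computed with set operations, reordering the rules does not change it. The authoritative view of what a running job matches remains the `MatchedTables` column of `SHOW WARM UP JOB`.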