diff --git a/docs/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md b/docs/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md new file mode 100644 index 0000000000000..90eb8de35488a --- /dev/null +++ b/docs/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md @@ -0,0 +1,118 @@ +--- +{ +"title": "DATASKETCHES_HLL_UNION_AGG", +"language": "en", +"description": "The datasketches_hll_union_agg function is an aggregate function used to union multiple Apache DataSketches HLL sketches and return the estimated cardinality of the union as a DOUBLE value." +} +--- + +## Description + +`datasketches_hll_union_agg` is an aggregate function used to **union** multiple Apache DataSketches **HLL** (`hll_sketch`) serialized values and return the **estimated cardinality** (approximate distinct count / NDV) after union. + +This function expects the input to be **serialized bytes of a DataSketches HLL sketch** (for example, generated by `hll_sketch.serialize_compact()` in the DataSketches library). It does not accept arbitrary strings. + +Aliases: + +- `ds_hll_estimate` +- `datasketches_hll_estimate` + +## Syntax + +```sql +datasketches_hll_union_agg() +``` + +## Parameters + +| Parameter | Description | +| -- | -- | +| `` | The serialized bytes of an Apache DataSketches HLL sketch. Supported types: STRING / VARCHAR / VARBINARY. NULL values are ignored. Empty strings are treated as invalid input and will throw an error. | + +## Return Value + +Returns a DOUBLE (Float64) cardinality estimate value. +If there is no valid data in the group (for example, every value is NULL or the table is empty), returns 0. +If the input bytes cannot be deserialized as a valid DataSketches HLL sketch (including an empty string), an error is thrown (typically with error code `CORRUPTION`). 
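The three input rules in the Return Value section (NULL is skipped, an empty string is an error, no valid input yields 0) can be modelled with a small Python sketch. This is a toy model only, not how Doris implements the function: exact Python sets stand in for HLL sketches, and `union_agg`, `toy_deserialize`, and `CorruptionError` are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable, Optional, Set


class CorruptionError(ValueError):
    """Hypothetical stand-in for the CORRUPTION error described above."""


def union_agg(values: Iterable[Optional[bytes]],
              deserialize: Callable[[bytes], Set[str]]) -> float:
    """Toy model of the documented input rules; sets stand in for sketches."""
    union: Set[str] = set()
    for value in values:
        if value is None:
            continue  # NULL values are ignored
        if value == b"":
            raise CorruptionError("empty input")  # empty string is invalid
        union |= deserialize(value)  # may raise for undecodable bytes
    return float(len(union))  # 0.0 when no valid value was seen


def toy_deserialize(raw: bytes) -> Set[str]:
    """Hypothetical decoder: comma-separated ASCII items, anything else fails."""
    try:
        return set(raw.decode("ascii").split(","))
    except UnicodeDecodeError as exc:
        raise CorruptionError("cannot deserialize") from exc


print(union_agg([b"a,b", None, b"b,c"], toy_deserialize))  # 3.0
print(union_agg([None, None], toy_deserialize))            # 0.0
```

The real function differs in one important way: it merges HLL sketches, so its result is an approximate estimate rather than an exact count.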
+ +## Example + +```sql +-- setup +CREATE TABLE test_datasketches_hll_union_agg_tbl ( + id INT, + sk STRING +) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ("replication_num" = "1"); + +-- The sketch bytes are inserted via Base64 decoding. +INSERT INTO test_datasketches_hll_union_agg_tbl VALUES + (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')), + (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')), + (3, NULL); +``` + +```sql +-- The function returns DOUBLE, so use ROUND/CAST if you want an integer display. +SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) +FROM test_datasketches_hll_union_agg_tbl; +``` + +```text ++-------------------------------------------------------+ +| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | ++-------------------------------------------------------+ +| 17 | ++-------------------------------------------------------+ +``` + +```sql +-- aliases +SELECT + CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1, + CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2, + CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3 +FROM test_datasketches_hll_union_agg_tbl; +``` + +```text ++------+------+------+ +| v1 | v2 | v3 | ++------+------+------+ +| 17 | 17 | 17 | ++------+------+------+ +``` + +```sql +-- returns 0 when the group has no valid input (here every sk is NULL) +SELECT datasketches_hll_union_agg(sk) +FROM test_datasketches_hll_union_agg_tbl +WHERE sk IS NULL; +``` + +```text ++--------------------------------+ +| datasketches_hll_union_agg(sk) | ++--------------------------------+ +| 0 | ++--------------------------------+ +``` + +```sql +-- invalid sketch bytes will throw +SELECT datasketches_hll_union_agg(from_base64('AA==')); +``` + +```text +ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type +``` + +```sql +-- empty string is invalid and will throw +SELECT 
datasketches_hll_union_agg(''); +``` + +```text +ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input. +``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md new file mode 100644 index 0000000000000..b4e54b4817f0d --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md @@ -0,0 +1,118 @@ +--- +{ + "title": "DATASKETCHES_HLL_UNION_AGG", + "language": "zh-CN", + "description": "datasketches_hll_union_agg 函数是一种聚合函数,用于对多个 Apache DataSketches HLL sketch 的序列化结果进行 union 合并,并返回合并后基数的估算值(DOUBLE)。" +} +--- + +## 描述 + +`datasketches_hll_union_agg` 函数是一种聚合函数,用于对多个 **Apache DataSketches HLL sketch(hll_sketch)** 的序列化结果进行 **union 合并**,并返回合并后基数的**估算值**(近似去重数 / NDV)。 + +该函数的输入不是普通字符串,而是 **DataSketches HLL sketch 的序列化字节串**(例如由 DataSketches 的 `hll_sketch.serialize_compact()` 生成)。 + +别名: + +- `ds_hll_estimate` +- `datasketches_hll_estimate` + +## 语法 + +```sql +datasketches_hll_union_agg() +``` + +## 参数 + +| 参数 | 说明 | +| -- | -- | +| `` | DataSketches HLL sketch 的序列化字节串。支持类型:STRING / VARCHAR / VARBINARY。NULL 会被忽略;空字符串属于非法输入,将报错。 | + +## 返回值 + +返回 DOUBLE(Float64)类型的基数估算值。 +如果没有合法数据(例如全为 NULL,或表为空)则返回 0。 +若输入字节串无法反序列化为合法的 DataSketches HLL sketch(包括空字符串),将报错(通常错误码为 `CORRUPTION`)。 + +## 举例 + +```sql +-- setup +CREATE TABLE test_datasketches_hll_union_agg_tbl ( + id INT, + sk STRING +) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ("replication_num" = "1"); + +-- 通过 from_base64() 将 Base64 文本解码为 sketch 字节串后写入 +INSERT INTO test_datasketches_hll_union_agg_tbl VALUES + (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')), + (2, 
from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')), + (3, NULL); +``` + +```sql +-- 该函数返回 DOUBLE,如需以整数形式展示可配合 ROUND/CAST +SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) +FROM test_datasketches_hll_union_agg_tbl; +``` + +```text ++-------------------------------------------------------+ +| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | ++-------------------------------------------------------+ +| 17 | ++-------------------------------------------------------+ +``` + +```sql +-- 别名用法 +SELECT + CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1, + CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2, + CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3 +FROM test_datasketches_hll_union_agg_tbl; +``` + +```text ++------+------+------+ +| v1 | v2 | v3 | ++------+------+------+ +| 17 | 17 | 17 | ++------+------+------+ +``` + +```sql +-- 组内无合法数据返回 0 +SELECT datasketches_hll_union_agg(sk) +FROM test_datasketches_hll_union_agg_tbl +WHERE sk IS NULL; +``` + +```text ++--------------------------------+ +| datasketches_hll_union_agg(sk) | ++--------------------------------+ +| 0 | ++--------------------------------+ +``` + +```sql +-- 非法 sketch 字节串将报错 +SELECT datasketches_hll_union_agg(from_base64('AA==')); +``` + +```text +ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type +``` + +```sql +-- 空字符串属于非法输入,将报错 +SELECT datasketches_hll_union_agg(''); +``` + +```text +ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input. 
+``` \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md new file mode 100644 index 0000000000000..5d1960fa90781 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md @@ -0,0 +1,120 @@ +--- +{ + "title": "DATASKETCHES_HLL_UNION_AGG", + "language": "zh-CN", + "description": "datasketches_hll_union_agg 函数是一种聚合函数,用于对多个 Apache DataSketches HLL sketch 的序列化结果进行 union 合并,并返回合并后基数的估算值(DOUBLE)。" +} +--- + +> 从 4.1.2 版本开始支持。 + +## 描述 + +`datasketches_hll_union_agg` 函数是一种聚合函数,用于对多个 **Apache DataSketches HLL sketch(hll_sketch)** 的序列化结果进行 **union 合并**,并返回合并后基数的**估算值**(近似去重数 / NDV)。 + +该函数的输入不是普通字符串,而是 **DataSketches HLL sketch 的序列化字节串**(例如由 DataSketches 的 `hll_sketch.serialize_compact()` 生成)。 + +## 别名 + +- `ds_hll_estimate` +- `datasketches_hll_estimate` + +## 语法 + +```sql +datasketches_hll_union_agg() +``` + +## 参数 + +| 参数 | 说明 | +| -- | -- | +| `` | DataSketches HLL sketch 的序列化字节串。支持类型:STRING / VARCHAR / VARBINARY。NULL 会被忽略;空字符串属于非法输入,将报错。 | + +## 返回值 + +返回 DOUBLE(Float64)类型的基数估算值。 +如果没有合法数据(例如全为 NULL,或表为空)则返回 0。 +若输入字节串无法反序列化为合法的 DataSketches HLL sketch(包括空字符串),将报错(通常错误码为 `CORRUPTION`)。 + +## 举例 + +```sql +-- setup +CREATE TABLE test_datasketches_hll_union_agg_tbl ( + id INT, + sk STRING +) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ("replication_num" = "1"); + +-- 通过 from_base64() 将 Base64 文本解码为 sketch 字节串后写入 +INSERT INTO test_datasketches_hll_union_agg_tbl VALUES + (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')), + (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')), + (3, NULL); +``` + +```sql +-- 该函数返回 DOUBLE,如需以整数形式展示可配合 ROUND/CAST +SELECT 
CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) +FROM test_datasketches_hll_union_agg_tbl; +``` + +```text ++-------------------------------------------------------+ +| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | ++-------------------------------------------------------+ +| 17 | ++-------------------------------------------------------+ +``` + +```sql +-- 别名用法 +SELECT + CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1, + CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2, + CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3 +FROM test_datasketches_hll_union_agg_tbl; +``` + +```text ++------+------+------+ +| v1 | v2 | v3 | ++------+------+------+ +| 17 | 17 | 17 | ++------+------+------+ +``` + +```sql +-- 组内无合法数据返回 0 +SELECT datasketches_hll_union_agg(sk) +FROM test_datasketches_hll_union_agg_tbl +WHERE sk IS NULL; +``` + +```text ++--------------------------------+ +| datasketches_hll_union_agg(sk) | ++--------------------------------+ +| 0 | ++--------------------------------+ +``` + +```sql +-- 非法 sketch 字节串将报错 +SELECT datasketches_hll_union_agg(from_base64('AA==')); +``` + +```text +ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type +``` + +```sql +-- 空字符串属于非法输入,将报错 +SELECT datasketches_hll_union_agg(''); +``` + +```text +ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input. 
+``` \ No newline at end of file diff --git a/sidebars.ts b/sidebars.ts index 327afb8a2831c..8edfc219570bc 100644 --- a/sidebars.ts +++ b/sidebars.ts @@ -1997,6 +1997,7 @@ const sidebars: SidebarsConfig = { 'sql-manual/sql-functions/aggregate-functions/count-by-enum', 'sql-manual/sql-functions/aggregate-functions/covar', 'sql-manual/sql-functions/aggregate-functions/covar-samp', + 'sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg', 'sql-manual/sql-functions/aggregate-functions/exponential-moving-average', 'sql-manual/sql-functions/aggregate-functions/group-array-intersect', 'sql-manual/sql-functions/aggregate-functions/group-array-union', diff --git a/versioned_docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md b/versioned_docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md new file mode 100644 index 0000000000000..9c2d1dea1f059 --- /dev/null +++ b/versioned_docs/version-4.x/sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg.md @@ -0,0 +1,120 @@ +--- +{ + "title": "DATASKETCHES_HLL_UNION_AGG", + "language": "en", + "description": "The datasketches_hll_union_agg function is an aggregate function used to union multiple Apache DataSketches HLL sketches and return the estimated cardinality of the union as a DOUBLE value." +} +--- + +> Supported since version 4.1.2. + +## Description + +`datasketches_hll_union_agg` is an aggregate function used to **union** multiple Apache DataSketches **HLL** (`hll_sketch`) serialized values and return the **estimated cardinality** (approximate distinct count / NDV) after union. + +This function expects the input to be **serialized bytes of a DataSketches HLL sketch** (for example, generated by `hll_sketch.serialize_compact()` in the DataSketches library). It does not accept arbitrary strings. 
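To make "serialized bytes, not arbitrary strings" concrete, the following Python sketch decodes the Base64 literals used in this page's examples with the standard library only. It shows that the decoded values are short binary blobs and that `'AA=='` decodes to a single zero byte. The `describe` helper is a hypothetical name, and reading offset 2 as a "family" byte is an assumption about the DataSketches HLL preamble layout that this page does not confirm.

```python
import base64

# Base64 literals copied from the example INSERT statement on this page.
SKETCHES_B64 = [
    "AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK",
    "AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==",
]


def describe(b64_text: str) -> dict:
    """Decode a Base64 literal and summarise the raw bytes (hypothetical helper)."""
    raw = base64.b64decode(b64_text)
    return {
        "length": len(raw),
        "header_hex": raw[:8].hex(),
        # Assumption: offset 2 holds the family ID in the HLL preamble.
        "family_byte": raw[2] if len(raw) > 2 else None,
    }


for text in SKETCHES_B64:
    print(describe(text))

# The "invalid sketch bytes" example input: one zero byte, too short to be a sketch.
print(describe("AA=="))
```

The two valid literals decode to 36 and 52 bytes, and both carry the value 7 at offset 2, whereas `'AA=='` is a lone `0x00` byte with no header at all.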
+ +## Alias + +- `ds_hll_estimate` +- `datasketches_hll_estimate` + +## Syntax + +```sql +datasketches_hll_union_agg() +``` + +## Parameters + +| Parameter | Description | +| -- | -- | +| `` | The serialized bytes of an Apache DataSketches HLL sketch. Supported types: STRING / VARCHAR / VARBINARY. NULL values are ignored. Empty strings are treated as invalid input and will throw an error. | + +## Return Value + +Returns a DOUBLE (Float64) cardinality estimate value. +If there is no valid data in the group (for example, every value is NULL or the table is empty), returns 0. +If the input bytes cannot be deserialized as a valid DataSketches HLL sketch (including an empty string), an error is thrown (typically with error code `CORRUPTION`). + +## Example + +```sql +-- setup +CREATE TABLE test_datasketches_hll_union_agg_tbl ( + id INT, + sk STRING +) +DISTRIBUTED BY HASH(id) BUCKETS 1 +PROPERTIES ("replication_num" = "1"); + +-- The sketch bytes are inserted via Base64 decoding. +INSERT INTO test_datasketches_hll_union_agg_tbl VALUES + (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')), + (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')), + (3, NULL); +``` + +```sql +-- The function returns DOUBLE, so use ROUND/CAST if you want an integer display. 
+SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) +FROM test_datasketches_hll_union_agg_tbl; +``` + +```text ++-------------------------------------------------------+ +| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | ++-------------------------------------------------------+ +| 17 | ++-------------------------------------------------------+ +``` + +```sql +-- aliases +SELECT + CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1, + CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2, + CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3 +FROM test_datasketches_hll_union_agg_tbl; +``` + +```text ++------+------+------+ +| v1 | v2 | v3 | ++------+------+------+ +| 17 | 17 | 17 | ++------+------+------+ +``` + +```sql +-- returns 0 when the group has no valid input (here every sk is NULL) +SELECT datasketches_hll_union_agg(sk) +FROM test_datasketches_hll_union_agg_tbl +WHERE sk IS NULL; +``` + +```text ++--------------------------------+ +| datasketches_hll_union_agg(sk) | ++--------------------------------+ +| 0 | ++--------------------------------+ +``` + +```sql +-- invalid sketch bytes will throw +SELECT datasketches_hll_union_agg(from_base64('AA==')); +``` + +```text +ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type +``` + +```sql +-- empty string is invalid and will throw +SELECT datasketches_hll_union_agg(''); +``` + +```text +ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input. 
+``` \ No newline at end of file diff --git a/versioned_sidebars/version-4.x-sidebars.json b/versioned_sidebars/version-4.x-sidebars.json index a329bbc91ae9c..57c240ad928d0 100644 --- a/versioned_sidebars/version-4.x-sidebars.json +++ b/versioned_sidebars/version-4.x-sidebars.json @@ -2168,6 +2168,7 @@ "sql-manual/sql-functions/aggregate-functions/count-by-enum", "sql-manual/sql-functions/aggregate-functions/covar", "sql-manual/sql-functions/aggregate-functions/covar-samp", + "sql-manual/sql-functions/aggregate-functions/datasketches_hll_union_agg", "sql-manual/sql-functions/aggregate-functions/group-array-intersect", "sql-manual/sql-functions/aggregate-functions/group-array-union", "sql-manual/sql-functions/aggregate-functions/group-bit-and",