add docs for aggregation function datasketches_hll_union_agg #3711
Changes from all commits
507850b
9993279
989cfc9
1ae9dc0
a43aba8
06bda0d
e2cd04a
df94deb
2c4f346
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| --- | ||
| { | ||
| "title": "DATASKETCHES_HLL_UNION_AGG", | ||
| "language": "en", | ||
| "description": "The datasketches_hll_union_agg function is an aggregate function used to union multiple Apache DataSketches HLL sketches and return the estimated cardinality of the union as a DOUBLE value." | ||
| } | ||
| --- | ||
|
|
||
| ## Description | ||
|
|
||
| `datasketches_hll_union_agg` is an aggregate function used to **union** multiple Apache DataSketches **HLL** (`hll_sketch`) serialized values and return the **estimated cardinality** (approximate distinct count / NDV) after union. | ||
|
|
||
| This function expects the input to be **serialized bytes of a DataSketches HLL sketch** (for example, generated by `hll_sketch.serialize_compact()` in the DataSketches library). It does not accept arbitrary strings. | ||
|
|
||
| Aliases: | ||
|
|
||
| - `ds_hll_estimate` | ||
| - `datasketches_hll_estimate` | ||
|
|
||
| ## Syntax | ||
|
|
||
| ```sql | ||
| datasketches_hll_union_agg(<sketch>) | ||
| ``` | ||
|
|
||
| ## Parameters | ||
|
|
||
| | Parameter | Description | | ||
| | -- | -- | | ||
| | `<sketch>` | The serialized bytes of an Apache DataSketches HLL sketch. Supported types: STRING / VARCHAR / VARBINARY. NULL values are ignored. Empty strings are treated as invalid input and will throw an error. | | ||
|
|
||
| ## Return Value | ||
|
|
||
| Returns a DOUBLE (Float64) cardinality estimate value. | ||
| If the group contains no non-NULL input (for example, every value is NULL or the table is empty), returns 0. | ||
| If the input bytes cannot be deserialized as a valid DataSketches HLL sketch (including an empty string), an error is thrown (typically with error code `CORRUPTION`). | ||
|
|
||
| ## Example | ||
|
|
||
| ```sql | ||
| -- setup | ||
| CREATE TABLE test_datasketches_hll_union_agg_tbl ( | ||
| id INT, | ||
| sk STRING | ||
| ) | ||
| DISTRIBUTED BY HASH(id) BUCKETS 1 | ||
| PROPERTIES ("replication_num" = "1"); | ||
|
|
||
| -- The sketch bytes are inserted via Base64 decoding. | ||
| INSERT INTO test_datasketches_hll_union_agg_tbl VALUES | ||
| (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')), | ||
| (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')), | ||
| (3, NULL); | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- The function returns DOUBLE, so use ROUND/CAST if you want an integer display. | ||
| SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | ||
| FROM test_datasketches_hll_union_agg_tbl; | ||
| ``` | ||
|
|
||
| ```text | ||
| +-------------------------------------------------------+ | ||
| | CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | | ||
| +-------------------------------------------------------+ | ||
| | 17 | | ||
| +-------------------------------------------------------+ | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- aliases | ||
| SELECT | ||
| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1, | ||
| CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2, | ||
| CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3 | ||
| FROM test_datasketches_hll_union_agg_tbl; | ||
| ``` | ||
|
|
||
| ```text | ||
| +------+------+------+ | ||
| | v1 | v2 | v3 | | ||
| +------+------+------+ | ||
| | 17 | 17 | 17 | | ||
| +------+------+------+ | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- a group with no non-NULL input returns 0 | ||
| SELECT datasketches_hll_union_agg(sk) | ||
| FROM test_datasketches_hll_union_agg_tbl | ||
| WHERE sk IS NULL; | ||
| ``` | ||
|
|
||
| ```text | ||
| +--------------------------------+ | ||
| | datasketches_hll_union_agg(sk) | | ||
| +--------------------------------+ | ||
| | 0 | | ||
| +--------------------------------+ | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- invalid sketch bytes will throw | ||
| SELECT datasketches_hll_union_agg(from_base64('AA==')); | ||
| ``` | ||
|
|
||
| ```text | ||
| ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- empty string is invalid and will throw | ||
| SELECT datasketches_hll_union_agg(''); | ||
| ``` | ||
|
|
||
| ```text | ||
| ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input. | ||
| ``` | ||
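
The page above says the function accepts only serialized DataSketches HLL bytes and rejects arbitrary strings. A cheap client-side sanity check before inserting Base64 text could look like the sketch below. It assumes the usual DataSketches HLL preamble layout (byte 1 is the serialization version, expected to be 1; byte 2 is the family ID, expected to be 7 for HLL; at least 8 bytes in total). Those constants and the helper name `looks_like_hll_sketch` are illustrative assumptions and are not part of this PR or of Doris. The check does not replace server-side deserialization.

```python
import base64

# Assumed preamble constants for a DataSketches HLL image (not taken from this PR).
HLL_SER_VER = 1      # expected value of byte 1 (serialization version)
HLL_FAMILY_ID = 7    # expected value of byte 2 (family ID for HLL)
MIN_PREAMBLE_BYTES = 8


def looks_like_hll_sketch(b64_text: str) -> bool:
    """Return True if the Base64 text decodes to bytes that start like an HLL sketch.

    This is a hypothetical pre-check only; a True result does not guarantee
    that the server can deserialize the sketch.
    """
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        raw = base64.b64decode(b64_text, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    if len(raw) < MIN_PREAMBLE_BYTES:
        return False
    return raw[1] == HLL_SER_VER and raw[2] == HLL_FAMILY_ID


if __name__ == "__main__":
    samples = [
        "AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK",  # row 1 in the example
        "AA==",                                              # the invalid example
        "",                                                  # empty string
    ]
    for text in samples:
        print(repr(text[:12]), looks_like_hll_sketch(text))
```

Under these assumptions, the two sketches inserted in the example pass the check, while `'AA=='` and the empty string fail it, which matches the two error cases documented above.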
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| --- | ||
| { | ||
| "title": "DATASKETCHES_HLL_UNION_AGG", | ||
| "language": "zh-CN", | ||
| "description": "The datasketches_hll_union_agg function is an aggregate function that unions the serialized forms of multiple Apache DataSketches HLL sketches and returns the estimated cardinality of the union (DOUBLE)." | ||
| } | ||
| --- | ||
|
|
||
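| ## Description | ||
|
|
||
||
| `datasketches_hll_union_agg` is an aggregate function that **unions** the serialized forms of multiple **Apache DataSketches HLL sketches (hll_sketch)** and returns the **estimated** cardinality of the union (approximate distinct count / NDV). | ||
|
|
||
||
| The input to this function is not an ordinary string but the **serialized bytes of a DataSketches HLL sketch** (for example, as produced by `hll_sketch.serialize_compact()` in DataSketches). | ||
|
|
||
||
| Aliases: | ||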
|
|
||
| - `ds_hll_estimate` | ||
| - `datasketches_hll_estimate` | ||
|
|
||
| ## Syntax | ||
|
|
||
| ```sql | ||
| datasketches_hll_union_agg(<sketch>) | ||
| ``` | ||
|
|
||
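| ## Parameters | ||
|
|
||
||
| | Parameter | Description | | ||
| | -- | -- | | ||
| | `<sketch>` | The serialized bytes of a DataSketches HLL sketch. Supported types: STRING / VARCHAR / VARBINARY. NULL values are ignored; an empty string is invalid input and raises an error. | | ||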
|
|
||
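| ## Return Value | ||
|
|
||
||
| Returns the cardinality estimate as a DOUBLE (Float64). | ||
| If there is no valid data (for example, all values are NULL or the table is empty), returns 0. | ||
| If the input bytes cannot be deserialized as a valid DataSketches HLL sketch (including an empty string), an error is raised (typically with error code `CORRUPTION`). | ||
|
|
||
||
| ## Example | ||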
|
|
||
| ```sql | ||
| -- setup | ||
| CREATE TABLE test_datasketches_hll_union_agg_tbl ( | ||
| id INT, | ||
| sk STRING | ||
| ) | ||
| DISTRIBUTED BY HASH(id) BUCKETS 1 | ||
| PROPERTIES ("replication_num" = "1"); | ||
|
|
||
| -- Decode the Base64 text into sketch bytes with from_base64() before inserting | ||
| INSERT INTO test_datasketches_hll_union_agg_tbl VALUES | ||
| (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')), | ||
| (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')), | ||
| (3, NULL); | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- The function returns DOUBLE; combine it with ROUND/CAST to display an integer | ||
| SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | ||
| FROM test_datasketches_hll_union_agg_tbl; | ||
| ``` | ||
|
|
||
| ```text | ||
| +-------------------------------------------------------+ | ||
| | CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | | ||
| +-------------------------------------------------------+ | ||
| | 17 | | ||
| +-------------------------------------------------------+ | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- alias usage | ||
| SELECT | ||
| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1, | ||
| CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2, | ||
| CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3 | ||
| FROM test_datasketches_hll_union_agg_tbl; | ||
| ``` | ||
|
|
||
| ```text | ||
| +------+------+------+ | ||
| | v1 | v2 | v3 | | ||
| +------+------+------+ | ||
| | 17 | 17 | 17 | | ||
| +------+------+------+ | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- returns 0 when the group has no valid data | ||
| SELECT datasketches_hll_union_agg(sk) | ||
| FROM test_datasketches_hll_union_agg_tbl | ||
| WHERE sk IS NULL; | ||
| ``` | ||
|
|
||
| ```text | ||
| +--------------------------------+ | ||
| | datasketches_hll_union_agg(sk) | | ||
| +--------------------------------+ | ||
| | 0 | | ||
| +--------------------------------+ | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- invalid sketch bytes raise an error | ||
| SELECT datasketches_hll_union_agg(from_base64('AA==')); | ||
| ``` | ||
|
|
||
| ```text | ||
| ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- an empty string is invalid input and raises an error | ||
| SELECT datasketches_hll_union_agg(''); | ||
| ``` | ||
|
|
||
| ```text | ||
| ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input. | ||
| ``` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,120 @@ | ||
| --- | ||
| { | ||
| "title": "DATASKETCHES_HLL_UNION_AGG", | ||
| "language": "zh-CN", | ||
| "description": "The datasketches_hll_union_agg function is an aggregate function that unions the serialized forms of multiple Apache DataSketches HLL sketches and returns the estimated cardinality of the union (DOUBLE)." | ||
| } | ||
| --- | ||
|
|
||
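| > Supported since version 4.1.2. | ||
|
|
||
||
| ## Description | ||
|
|
||
||
| `datasketches_hll_union_agg` is an aggregate function that **unions** the serialized forms of multiple **Apache DataSketches HLL sketches (hll_sketch)** and returns the **estimated** cardinality of the union (approximate distinct count / NDV). | ||
|
|
||
||
| The input to this function is not an ordinary string but the **serialized bytes of a DataSketches HLL sketch** (for example, as produced by `hll_sketch.serialize_compact()` in DataSketches). | ||
|
|
||
||
| ## Aliases | ||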
|
|
||
| - `ds_hll_estimate` | ||
| - `datasketches_hll_estimate` | ||
|
|
||
| ## Syntax | ||
|
|
||
| ```sql | ||
| datasketches_hll_union_agg(<sketch>) | ||
| ``` | ||
|
|
||
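| ## Parameters | ||
|
|
||
||
| | Parameter | Description | | ||
| | -- | -- | | ||
| | `<sketch>` | The serialized bytes of a DataSketches HLL sketch. Supported types: STRING / VARCHAR / VARBINARY. NULL values are ignored; an empty string is invalid input and raises an error. | | ||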
|
|
||
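| ## Return Value | ||
|
|
||
||
| Returns the cardinality estimate as a DOUBLE (Float64). | ||
| If there is no valid data (for example, all values are NULL or the table is empty), returns 0. | ||
| If the input bytes cannot be deserialized as a valid DataSketches HLL sketch (including an empty string), an error is raised (typically with error code `CORRUPTION`). | ||
|
|
||
||
| ## Example | ||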
|
|
||
| ```sql | ||
| -- setup | ||
| CREATE TABLE test_datasketches_hll_union_agg_tbl ( | ||
| id INT, | ||
| sk STRING | ||
| ) | ||
| DISTRIBUTED BY HASH(id) BUCKETS 1 | ||
| PROPERTIES ("replication_num" = "1"); | ||
|
|
||
| -- Decode the Base64 text into sketch bytes with from_base64() before inserting | ||
| INSERT INTO test_datasketches_hll_union_agg_tbl VALUES | ||
| (1, from_base64('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK')), | ||
| (2, from_base64('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==')), | ||
| (3, NULL); | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- The function returns DOUBLE; combine it with ROUND/CAST to display an integer | ||
| SELECT CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | ||
| FROM test_datasketches_hll_union_agg_tbl; | ||
| ``` | ||
|
|
||
| ```text | ||
| +-------------------------------------------------------+ | ||
| | CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) | | ||
| +-------------------------------------------------------+ | ||
| | 17 | | ||
| +-------------------------------------------------------+ | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- alias usage | ||
| SELECT | ||
| CAST(ROUND(datasketches_hll_union_agg(sk)) AS BIGINT) AS v1, | ||
| CAST(ROUND(ds_hll_estimate(sk)) AS BIGINT) AS v2, | ||
| CAST(ROUND(datasketches_hll_estimate(sk)) AS BIGINT) AS v3 | ||
| FROM test_datasketches_hll_union_agg_tbl; | ||
| ``` | ||
|
|
||
| ```text | ||
| +------+------+------+ | ||
| | v1 | v2 | v3 | | ||
| +------+------+------+ | ||
| | 17 | 17 | 17 | | ||
| +------+------+------+ | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- returns 0 when the group has no valid data | ||
| SELECT datasketches_hll_union_agg(sk) | ||
| FROM test_datasketches_hll_union_agg_tbl | ||
| WHERE sk IS NULL; | ||
| ``` | ||
|
|
||
| ```text | ||
| +--------------------------------+ | ||
| | datasketches_hll_union_agg(sk) | | ||
| +--------------------------------+ | ||
| | 0 | | ||
| +--------------------------------+ | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- invalid sketch bytes raise an error | ||
| SELECT datasketches_hll_union_agg(from_base64('AA==')); | ||
| ``` | ||
|
|
||
| ```text | ||
| ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: Attempt to deserialize unknown object type | ||
| ``` | ||
|
|
||
| ```sql | ||
| -- an empty string is invalid input and raises an error | ||
| SELECT datasketches_hll_union_agg(''); | ||
| ``` | ||
|
|
||
| ```text | ||
| ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[CORRUPTION]HLL sketch data corrupted when add: empty input. | ||
| ``` | ||