-
Notifications
You must be signed in to change notification settings - Fork 29.2k
[SPARK-56944] [SQL] Trim aliases from ListAgg expression subtree #55984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -61,3 +61,15 @@ SELECT listagg(DISTINCT col1) WITHIN GROUP (ORDER BY col1, col2) FROM df; | |
| SELECT listagg(DISTINCT col, ',') WITHIN GROUP (ORDER BY col) FROM VALUES (cast(1.1 as double)), (cast(2.2 as double)), (cast(2.2 as double)), (cast(3.3 as double)) AS t(col); | ||
| SELECT listagg(DISTINCT col, ',') WITHIN GROUP (ORDER BY col) FROM VALUES (cast(1.0 as float)), (cast(2.0 as float)), (cast(2.0 as float)) AS t(col); | ||
| SELECT listagg(DISTINCT col, ',') WITHIN GROUP (ORDER BY col) FROM VALUES (TIMESTAMP'2024-01-01 10:00:00'), (TIMESTAMP'2024-01-02 12:00:00'), (TIMESTAMP'2024-01-01 10:00:00') AS t(col); | ||
|
|
||
| -- LISTAGG with semi-structured extract (parser wraps v:a in Alias with fresh ExprId) | ||
| -- Tests that isOrderCompatible strips Alias wrappers before comparing via semanticEquals | ||
| SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string) FROM (SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a": "y"}') UNION ALL SELECT parse_json('{"a": "x"}')); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Consider adding a targeted single-pass unit test (e.g. in |
||
| -- Semi-structured extract without DISTINCT | ||
| SELECT listagg(v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string) FROM (SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a": "y"}') UNION ALL SELECT parse_json('{"a": "x"}')); | ||
| -- Semi-structured extract with DESC ordering | ||
| SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string DESC) FROM (SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a": "y"}') UNION ALL SELECT parse_json('{"a": "x"}')); | ||
| -- Semi-structured extract with nested path | ||
| SELECT listagg(DISTINCT v:a.b::string, ',') WITHIN GROUP (ORDER BY v:a.b::string) FROM (SELECT parse_json('{"a": {"b": "x"}}') v UNION ALL SELECT parse_json('{"a": {"b": "y"}}') UNION ALL SELECT parse_json('{"a": {"b": "x"}}')); | ||
| -- Semi-structured extract with GROUP BY | ||
| SELECT grp, listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string) FROM (SELECT 1 grp, parse_json('{"a": "x"}') v UNION ALL SELECT 1, parse_json('{"a": "y"}') UNION ALL SELECT 2, parse_json('{"a": "x"}') UNION ALL SELECT 2, parse_json('{"a": "x"}') UNION ALL SELECT 1, parse_json('{"a": "x"}')) GROUP BY grp; | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR description says "trim aliases from
ListAggexpression subtree", butcheckOrderValueDeterminism(line 684) still doesorderExpressions.head.child.semanticEquals(castChild)without trimming. Withspark.sql.listagg.allowDistinctCastWithOrder.enabled=true(default), a query likereaches that path, and the two analyzers still produce different errors:
Alias-vs-AliassemanticEqualsis false because the ExprIds differ, so we returnNonDeterministicMismatchand throwfunctionAndOrderExpressionMismatchError.CleanupAliases):semanticEqualsis true, thenisCastEqualityPreserving(VariantType)is false, so we throwfunctionAndOrderExpressionUnsafeCastError.Should this comparison also wrap both sides in
trimAliases? Otherwise the single-pass/fixed-point divergence is only partly closed.