Skip to content

[SPARK-56944] [SQL] Trim aliases from ListAgg expression subtree#55984

Open
mihailoale-db wants to merge 1 commit into
apache:masterfrom
mihailoale-db:fixlistagg
Open

[SPARK-56944] [SQL] Trim aliases from ListAgg expression subtree#55984
mihailoale-db wants to merge 1 commit into
apache:masterfrom
mihailoale-db:fixlistagg

Conversation

@mihailoale-db
Copy link
Copy Markdown
Contributor

@mihailoale-db mihailoale-db commented May 19, 2026

What changes were proposed in this pull request?

In this PR I propose to trim aliases from ListAgg expression subtree in order to fix a discrepancy between single-pass and fixed-point analyzers.
It is a safe change since it would otherwise be removed in CleanupAliases.

Why are the changes needed?

To fix a dual-run issue.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added + existing tests.

Was this patch authored or co-authored using generative AI tooling?

Yes.

@mihailoale-db mihailoale-db changed the title [SPARK-56944] [SQL] Trim alises from ListAgg expression subtree [SPARK-56944] [SQL] Trim aliases from ListAgg expression subtree May 20, 2026
@mihailoale-db
Copy link
Copy Markdown
Contributor Author

@cloud-fan PTAL when you find time. Thanks!

Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mihailo, the fix is reasonable and follows the pattern from SPARK-54871.

One question about completeness, and one about test coverage — see inline. Otherwise the change is straightforward.

}
if (someOrder.size == 1 && someOrder.head.child.semanticEquals(child)) {
if (someOrder.size == 1 &&
trimAliases(someOrder.head.child).semanticEquals(trimAliases(child))) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says "trim aliases from ListAgg expression subtree", but checkOrderValueDeterminism (line 684) still does orderExpressions.head.child.semanticEquals(castChild) without trimming. With spark.sql.listagg.allowDistinctCastWithOrder.enabled=true (default), a query like

SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a) FROM ...

reaches that path, and the two analyzers still produce different errors:

  • single-pass (aliases still present): the Alias-vs-Alias semanticEquals is false because the ExprIds differ, so we return NonDeterministicMismatch and throw functionAndOrderExpressionMismatchError.
  • fixed-point (post-CleanupAliases): semanticEquals is true, then isCastEqualityPreserving(VariantType) is false, so we throw functionAndOrderExpressionUnsafeCastError.

Should this comparison also wrap both sides in trimAliases? Otherwise the single-pass/fixed-point divergence is only partly closed.


-- LISTAGG with semi-structured extract (parser wraps v:a in Alias with fresh ExprId)
-- Tests that isOrderCompatible strips Alias wrappers before comparing via semanticEquals
SELECT listagg(DISTINCT v:a::string, ',') WITHIN GROUP (ORDER BY v:a::string) FROM (SELECT parse_json('{"a": "x"}') v UNION ALL SELECT parse_json('{"a": "y"}') UNION ALL SELECT parse_json('{"a": "x"}'));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLQueryTestSuite only runs the legacy analyzer here, where CleanupAliases has already stripped the parser-introduced Alias before CheckAnalysis calls isOrderCompatible. So these golden files pass on master without the fix too — they don't actually exercise the single-pass path the PR is fixing.

Consider adding a targeted single-pass unit test (e.g. in AggregateExpressionResolverSuite or alongside ResolverRunner) that constructs LISTAGG(DISTINCT v:a::string) WITHIN GROUP (ORDER BY v:a::string) and goes through Resolver directly, so the regression is pinned at the layer where it actually manifested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants