Skip to content

[Analytics Engine] Skip partial/final split on non-prefix groupSet type mismatch (exact SqlTypeName)#21888

Closed
ahkcs wants to merge 1 commit into
opensearch-project:mainfrom
ahkcs:pr/aggsplit-exact-type
Closed

[Analytics Engine] Skip partial/final split on non-prefix groupSet type mismatch (exact SqlTypeName)#21888
ahkcs wants to merge 1 commit into
opensearch-project:mainfrom
ahkcs:pr/aggsplit-exact-type

Conversation

@ahkcs
Copy link
Copy Markdown
Contributor

@ahkcs ahkcs commented May 29, 2026

Description

OpenSearchAggregateSplitRule.shouldSkipPartialFinalSplit() skips the PARTIAL/FINAL split for non-prefix group sets — those where a group-key ordinal k >= groupCount lands on a PARTIAL agg-output slot. The structural FINAL reuses the original ordinals over the keys-first PARTIAL output, so such a group key would take the agg-output column's type, which must equal the original input-field type or Volcano's Aggregate.typeMatchesInferred / row-type-equivalence check throws.

The collision check compared SqlTypeFamily, which is too coarse: BIGINT and INTEGER both belong to the NUMERIC family, so a span group key (INTEGER) paired with a measure declared BIGINT slipped through, the split was attempted, and planning crashed with either:

AssertionError: type mismatch: aggCall type: BIGINT inferred type: INTEGER

or

IllegalArgumentException: Type mismatch: rel rowtype ... $f2: BIGINT -> INTEGER

Fix

Compare the exact SqlTypeName instead of the type family, so these cases fall back to a SINGLE coordinator aggregate (always correct) rather than an unsound partial/final split. One-line semantics change; no behavior change for prefix group sets (handled by the earlier early-return) or for non-prefix sets whose collision-slot types already match.

How it surfaced / verification

Surfaced by PPL chart ... by <field> span=... over an INTEGER measure on the analytics-engine route (e.g. chart max(balance) by age span=10, chart usenull=... avg(balance) over gender by age span=10). The equivalent stats ... by <field> span=... shape hits the same path.

Verified against CalciteChartCommandIT (opensearch-project/sql) force-routed through the analytics-engine path. The two span tests that crashed on planning go red → green:

Test (CalciteChartCommandIT) Before After
testChartMaxBalanceByAgeSpan aggCall type: BIGINT inferred type: INTEGER
testChartUseNullTrueWithNullStr Type mismatch … $f2: BIGINT -> INTEGER

Combined with the partition-boundary coercion fix (#21878), chart on the analytics-engine route goes from 8/15 → 10/15; the remaining 5 are out of scope here (datetime wire-format sql#5420 ×2, and the attributes.client.ip Binary/BinaryView scan mismatch ×3).

Follow-up

A self-contained QA IT under sandbox/qa/analytics-engine-rest exercising stats max(<int>) by <field> span=N would regression-cover this in OpenSearch CI (no existing Span/Stats QA IT covers the stats max(int-measure) by … span non-prefix shape). Not included here.

Check List

  • Commits are signed per the DCO using --signoff.

…pe mismatch (exact SqlTypeName)

OpenSearchAggregateSplitRule.shouldSkipPartialFinalSplit() skips the
PARTIAL/FINAL split for non-prefix group sets whose group-key ordinal would
land on a PARTIAL agg-output slot — the structural FINAL reuses the original
ordinals over the keys-first PARTIAL output, so a non-prefix group key would
take the agg-output column's type, which must equal the original input field
type or Volcano's Aggregate.typeMatchesInferred / row-type equivalence check
throws.

The collision check compared SqlTypeFamily, which is too coarse: BIGINT and
INTEGER both belong to the NUMERIC family, so a span group key (INTEGER) over
a measure declared BIGINT slipped through and the split was attempted,
crashing with either:

    AssertionError: type mismatch: aggCall type: BIGINT inferred type: INTEGER
or
    IllegalArgumentException: Type mismatch: rel rowtype ... $f2: BIGINT -> INTEGER

Compare the exact SqlTypeName instead, so these cases fall back to a SINGLE
coordinator aggregate (always correct) rather than an unsound split.

Surfaced by PPL `chart ... by <field> span=...` over an INTEGER measure on the
analytics-engine route (e.g. `chart max(balance) by age span=10`); also covers
the equivalent `stats ... by <field> span=...` shape.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs requested a review from a team as a code owner May 29, 2026 18:38
@github-actions
Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add negative index validation

The bounds check for aggIdx should occur before using it to access aggCalls.
Consider reordering the condition to check aggIdx >= aggCalls.size() first, or
validate aggIdx >= 0 to prevent potential negative index access if k < groupCount
logic fails.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/planner/rules/OpenSearchAggregateSplitRule.java [74-77]

 int aggIdx = k - groupCount;
-if (aggIdx >= aggCalls.size() || k >= inputFields.size()) {
+if (aggIdx < 0 || aggIdx >= aggCalls.size() || k >= inputFields.size()) {
     return true;
 }
Suggestion importance[1-10]: 2

__

Why: The suggestion to add aggIdx < 0 check is unnecessary because the loop iterates over aggregate.getGroupSet().toArray() and explicitly continues when k < groupCount (line 71-73), ensuring aggIdx = k - groupCount is always non-negative when the check is reached. The existing bounds checks are sufficient.

Low

@ahkcs
Copy link
Copy Markdown
Contributor Author

ahkcs commented May 29, 2026

Folded into #21878 — combining the partition-boundary coercion and the non-prefix agg-split type fix into a single PR, since together they're what makes PPL chart work end-to-end on the analytics-engine route.

@ahkcs ahkcs closed this May 29, 2026
@ahkcs ahkcs deleted the pr/aggsplit-exact-type branch May 29, 2026 18:45
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 05562f7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant