[analytics-engine] Fix planner gaps surfacing as 'JSONObject["datarows"] not found'#21876
mengweieric wants to merge 2 commits into
PPL `mvcombine` lowers to Calcite SqlLibraryOperators.ARRAY_AGG (name
"ARRAY_AGG", SqlKind.ARRAY_AGG) in CalciteRelNodeVisitor#performArrayAggAggregation.
The analytics-engine HEP marking phase, OpenSearchAggregateRule#resolveViableBackendsForCall,
calls AggregateFunction.fromSqlKind first and falls back to fromNameOrError
on null. Neither path previously recognized ARRAY_AGG: no enum constant
carried that SqlKind, and `valueOf("ARRAY_AGG")` threw IllegalArgumentException,
which was wrapped as IllegalStateException and surfaced through the PPL
error renderer. As a result, every mvcombine query on the analytics-engine
path failed.
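The two-step lookup described above can be sketched in isolation. This is a minimal illustration, not the engine's code: `Kind`, `AggFn`, `fromKind`, and `resolve` are hypothetical stand-ins for Calcite's `SqlKind` and the engine's `AggregateFunction`, and the enum is shown without an `ARRAY_AGG` constant, as it was before this change.

```java
import java.util.Locale;

public class AggregateLookupSketch {
    // Hypothetical stand-in for Calcite's SqlKind; only the values needed here.
    enum Kind { SUM, COUNT, ARRAY_AGG, OTHER_FUNCTION }

    // Hypothetical stand-in for the engine's aggregate enum, shown WITHOUT
    // an ARRAY_AGG constant to reproduce the pre-fix behavior.
    enum AggFn {
        SUM(Kind.SUM), COUNT(Kind.COUNT), LIST(null);

        final Kind kind;

        AggFn(Kind kind) { this.kind = kind; }

        // First path: match on kind; returns null when no constant carries it.
        static AggFn fromKind(Kind kind) {
            for (AggFn fn : values()) {
                if (fn.kind == kind) {
                    return fn;
                }
            }
            return null;
        }

        // Fallback path: match on name. valueOf throws IllegalArgumentException
        // for an unknown name; it is rethrown here as IllegalStateException.
        static AggFn fromNameOrError(String name) {
            try {
                return valueOf(name.toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                throw new IllegalStateException("Unsupported aggregate: " + name, e);
            }
        }
    }

    // Kind first, name second, mirroring the order described in the text.
    static AggFn resolve(Kind kind, String name) {
        AggFn fn = AggFn.fromKind(kind);
        return fn != null ? fn : AggFn.fromNameOrError(name);
    }
}
```

In this sketch, `resolve(Kind.ARRAY_AGG, "ARRAY_AGG")` misses on both paths and throws, while adding a single enum constant that carries `Kind.ARRAY_AGG` would make the first path succeed.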
Adds ARRAY_AGG as a STATE_EXPANDING aggregate with the same intermediate
shape as LIST: per-shard `array_agg` produces ARRAY<arg0>, cross-shard
`list_merge` un-nests. Routes through the same PplAggregateCallRewriter
LIST/VALUES branch (PARTIAL → LOCAL_ARRAY_AGG_OP, FINAL → LOCAL_LIST_MERGE_OP),
distinguished by the arg0-is-list check that already exists. Also adds
ARRAY_AGG to DataFusion's supported AGG_FUNCTIONS so the backend declares
capability.
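As a rough picture of the intermediate shape, the sketch below models the PARTIAL and FINAL phases over plain Java lists. The method names `partialArrayAgg` and `finalListMerge` are illustrative assumptions; they do not correspond to the actual `LOCAL_ARRAY_AGG_OP` or `LOCAL_LIST_MERGE_OP` implementations.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseArrayAggSketch {
    // PARTIAL phase (hypothetical): each shard collects its own input values
    // into one array, preserving the order in which it saw them.
    static <T> List<T> partialArrayAgg(List<T> shardValues) {
        return new ArrayList<>(shardValues);
    }

    // FINAL phase (hypothetical): the coordinator receives one array per shard
    // and un-nests them into a single flat array.
    static <T> List<T> finalListMerge(List<List<T>> partials) {
        List<T> merged = new ArrayList<>();
        for (List<T> partial : partials) {
            merged.addAll(partial);
        }
        return merged;
    }
}
```

The point of the shape is that the FINAL input is an array of arrays, so a merge step that flattens is needed rather than a second `array_agg`, which would nest one level deeper.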
Three-line edit: enum constant, capability set entry, switch case
extension.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…hemaBuilder
Treat OpenSearch `nested` fields the same as `object` fields: walk their
`properties` sub-map and emit dotted leaves into the Calcite row type.
The parquet storage path (ArrowSchemaBuilder) already iterates the document
mapper's leaf mappers, so a nested field's children are written as flat
top-level columns (e.g. `skills.name`, `skills.level`). The Calcite row
type was the only side that hid them, leaving every PPL query referencing
a nested leaf to fail validation with "column not found" — which the SQL
plugin's PPL frontend renders as an error envelope without `datarows`,
producing the "JSONObject[\"datarows\"] not found" failures across the
mvexpand-edge-cases / graph-employees ITs on the force-routed AE path.
The pre-existing `nested`-skip branch was a placeholder ("a different
beast — deferred"); that deferral now bites every nested mapping. The SQL
plugin's own OpenSearchDataType collapses Object and Nested into the
same recursion branch (data/type/OpenSearchDataType.java:147-157), and
this change matches that contract.
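The recursion can be illustrated on a plain mapping map. This is a simplified sketch under stated assumptions: `DottedLeavesSketch` and its methods are hypothetical, the mapping is modeled as nested `Map<String, Object>` values with `type` and `properties` keys, and the real `OpenSearchSchemaBuilder` builds Calcite types rather than a list of names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DottedLeavesSketch {
    // Returns the dotted path of every leaf field under the given properties map.
    static List<String> leaves(Map<String, Object> properties) {
        List<String> out = new ArrayList<>();
        collect("", properties, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(String prefix, Map<String, Object> properties, List<String> out) {
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            Map<String, Object> field = (Map<String, Object>) entry.getValue();
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object type = field.get("type");
            Object children = field.get("properties");
            // object and nested share one branch; a missing type is treated as
            // object here (an assumption of this sketch).
            boolean container = type == null || "object".equals(type) || "nested".equals(type);
            if (container) {
                if (children instanceof Map) {
                    collect(path, (Map<String, Object>) children, out);
                }
                // A container without properties contributes no columns.
            } else {
                out.add(path);
            }
        }
    }
}
```

For a mapping with a `nested` field `skills` whose properties are `name` and `level`, this sketch yields `skills.name` and `skills.level`, the same flat names the text says the parquet path writes.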
Renames the existing unit test that pinned the old skip-everything
behavior to reflect what it actually covers (object/nested without
properties is a no-op), and adds a positive test that asserts a nested
field with sub-properties surfaces its leaves as dotted columns.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
b494dfb to 5e7037c
❌ Gradle check result for 5e7037c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Problem
PPL queries on the analytics-engine force-routed path fail with `JSONObject["datarows"] not found`. Two distinct upstream causes:
- `mvcombine` planning fails: PPL `mvcombine` lowers to Calcite `ARRAY_AGG`, but `AggregateFunction` did not register it. The HEP marker rule throws `IllegalStateException`, which the PPL renderer surfaces as an error envelope.
- `OpenSearchSchemaBuilder` skipped `nested`-typed mappings entirely, so the Calcite row type never exposed their leaves. Parquet already writes them as flat dotted columns (e.g. `skills.name`), so every reference to such a leaf hits a Calcite "column not found".
Solution
- Add `ARRAY_AGG` as a `STATE_EXPANDING` aggregate (same shape as `LIST`); add it to DataFusion's capability set; route it through the existing PARTIAL/FINAL branch in `PplAggregateCallRewriter`.
- Treat `nested` like `object` in `OpenSearchSchemaBuilder`: recurse `properties` and emit dotted leaves. This matches the SQL plugin's own `OpenSearchDataType`.
Test plan
- `CalciteMvCombineCommandIT` on a force-routed AE cluster
- `CalciteNewAddedCommandsIT` mvexpand-edge-cases / graph-employees on the same