Skip to content

[analytics-engine] Fix planner gaps surfacing as 'JSONObject["datarows"] not found'#21876

Draft
mengweieric wants to merge 2 commits into
opensearch-project:mainfrom
mengweieric:ae-datarows-not-found
Draft

[analytics-engine] Fix planner gaps surfacing as 'JSONObject["datarows"] not found'#21876
mengweieric wants to merge 2 commits into
opensearch-project:mainfrom
mengweieric:ae-datarows-not-found

Conversation

@mengweieric
Copy link
Copy Markdown
Contributor

@mengweieric mengweieric commented May 28, 2026

Problem

PPL queries on the analytics-engine force-routed path fail with JSONObject["datarows"] not found. Two distinct upstream causes:

  1. mvcombine planning fails — PPL mvcombine lowers to Calcite ARRAY_AGG, but AggregateFunction did not register it. The HEP marker rule throws IllegalStateException, which the PPL renderer surfaces as an error envelope.
  2. Nested-field queries fail validationOpenSearchSchemaBuilder skipped nested-typed mappings entirely, so the Calcite row type never exposed their leaves. Parquet already writes them as flat dotted columns (e.g. skills.name), so every reference to such a leaf hits a Calcite "column not found".

Solution

  • Register ARRAY_AGG as a STATE_EXPANDING aggregate (same shape as LIST); add it to DataFusion's capability set; route it through the existing PARTIAL/FINAL branch in PplAggregateCallRewriter.
  • Treat nested like object in OpenSearchSchemaBuilder: recurse properties and emit dotted leaves. Matches the SQL plugin's own OpenSearchDataType.

Test plan

  • Unit test added for nested-with-properties exposing dotted leaves
  • Compile + spotless clean on edited modules
  • CalciteMvCombineCommandIT on a force-routed AE cluster
  • CalciteNewAddedCommandsIT mvexpand-edge-cases / graph-employees on the same

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 28, 2026

PR Reviewer Guide 🔍

(Review updated until commit 5e7037c)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

PPL `mvcombine` lowers to Calcite SqlLibraryOperators.ARRAY_AGG (name
"ARRAY_AGG", SqlKind.ARRAY_AGG) in CalciteRelNodeVisitor#performArrayAggAggregation.
The analytics-engine HEP marking phase, OpenSearchAggregateRule#resolveViableBackendsForCall,
calls AggregateFunction.fromSqlKind first and falls back to fromNameOrError
on null. Neither path previously recognized ARRAY_AGG: no enum constant
carried that SqlKind, and `valueOf("ARRAY_AGG")` threw IllegalArgumentException
which was wrapped as IllegalStateException and surfaced through the PPL
error renderer, taking down every mvcombine query on the analytics-engine
path.

Adds ARRAY_AGG as a STATE_EXPANDING aggregate with the same intermediate
shape as LIST: per-shard `array_agg` produces ARRAY<arg0>, cross-shard
`list_merge` un-nests. Routes through the same PplAggregateCallRewriter
LIST/VALUES branch (PARTIAL → LOCAL_ARRAY_AGG_OP, FINAL → LOCAL_LIST_MERGE_OP),
distinguished by the arg0-is-list check that already exists. Also adds
ARRAY_AGG to DataFusion's supported AGG_FUNCTIONS so the backend declares
capability.

Three-line edit: enum constant, capability set entry, switch case
extension.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…hemaBuilder

Treat OpenSearch `nested` fields the same as `object` fields: walk their
`properties` sub-map and emit dotted leaves into the Calcite row type.

The parquet storage path (ArrowSchemaBuilder) already iterates the document
mapper's leaf mappers, so a nested field's children are written as flat
top-level columns (e.g. `skills.name`, `skills.level`). The Calcite row
type was the only side that hid them, leaving every PPL query referencing
a nested leaf to fail validation with "column not found" — which the SQL
plugin's PPL frontend renders as an error envelope without `datarows`,
producing the "JSONObject[\"datarows\"] not found" failures across the
mvexpand-edge-cases / graph-employees ITs on the force-routed AE path.

The pre-existing `nested`-skip branch was a placeholder ("a different
beast — deferred"); that deferral now bites every nested mapping. Sql
plugin's own OpenSearchDataType collapses Object and Nested into the
same recursion branch (data/type/OpenSearchDataType.java:147-157), and
this change matches that contract.

Renames the existing unit test that pinned the old skip-everything
behavior to reflect what it actually covers (object/nested without
properties is a no-op), and adds a positive test that asserts a nested
field with sub-properties surfaces its leaves as dotted columns.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric force-pushed the ae-datarows-not-found branch from b494dfb to 5e7037c Compare May 28, 2026 21:46
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 5e7037c

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null check for fieldProps

Add null-safety check for fieldProps before accessing it. If fieldEntry.getValue()
returns null, the cast to Map<String, Object> will succeed but subsequent operations
will fail with NullPointerException.

sandbox/libs/analytics-api/src/main/java/org/opensearch/analytics/schema/OpenSearchSchemaBuilder.java [289-295]

+if (fieldProps == null) {
+    continue;
+}
 if (fieldType == null || "object".equals(fieldType) || "nested".equals(fieldType)) {
     Map<String, Object> nested = (Map<String, Object>) fieldProps.get("properties");
     if (nested != null) {
         addLeafFields(builder, typeFactory, nested, fieldName);
     }
     continue;
 }
Suggestion importance[1-10]: 7

__

Why: Valid defensive programming suggestion. If fieldEntry.getValue() returns null, the cast on line 279 succeeds but accessing fieldProps.get("type") on line 280 would throw NullPointerException. However, this appears to be an edge case that may not occur in practice with valid OpenSearch mappings.

Medium
General
Handle ARRAY_AGG distinct behavior explicitly

The new ARRAY_AGG case shares logic with LIST and VALUES, but the isValues boolean
only checks for "VALUES". If ARRAY_AGG requires distinct handling similar to VALUES,
this logic may be incomplete. Verify that ARRAY_AGG doesn't need its own distinct
flag.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/PplAggregateCallRewriter.java [79-88]

 case "LIST", "VALUES", "ARRAY_AGG" -> {
     // arg0 type tells us PARTIAL (raw element) vs FINAL (already array):
     // PARTIAL → array_agg (with INVOCATION_DISTINCT for VALUES); rebuild
     // return type as ARRAY<arg0> to repair PPL's lossy STRING_ARRAY.
     // FINAL → list_merge / list_merge_distinct un-nests per-shard arrays.
     if (call.getArgList().isEmpty()) {
         return call;
     }
     boolean isValues = "VALUES".equalsIgnoreCase(aggregation.getName());
+    boolean isArrayAgg = "ARRAY_AGG".equalsIgnoreCase(aggregation.getName());
Suggestion importance[1-10]: 5

__

Why: The suggestion asks to verify if ARRAY_AGG needs distinct handling. While adding an isArrayAgg boolean could be useful, the comment in the code states that ARRAY_AGG has the "same PARTIAL/FINAL state shape as LIST", suggesting it follows LIST's behavior rather than VALUES' distinct behavior. The suggestion is reasonable but may not be necessary based on the documented behavior.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 5e7037c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mengweieric mengweieric changed the title [analytics-engine] Fix two PPL-on-AE planner gaps surfacing as 'JSONObject["datarows"] not found' [analytics-engine] Fix planner gaps surfacing as 'JSONObject["datarows"] not found' May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant