Skip to content

Fix for text functions with non-text filter condition#21882

Open
nssuresh2007 wants to merge 1 commit into
opensearch-project:mainfrom
nssuresh2007:filterrulefix
Open

Fix for text functions with non-text filter condition#21882
nssuresh2007 wants to merge 1 commit into
opensearch-project:mainfrom
nssuresh2007:filterrulefix

Conversation

@nssuresh2007
Copy link
Copy Markdown
Contributor

Description

Reject text-relevance functions when applied to non-text/non-keyword fields, with a clear error at planning time.

Problem

query_string(['severityNumber'], 'severityNumber:>15') on a long field was being routed to Lucene, even though Lucene's QUERY_STRING capability is only declared for text and keyword types. The multi-field path in OpenSearchFilterRule extracted literal field names but never validated their types against what text-relevance functions support — leaving the user with a generic "no backend can evaluate" error downstream (or worse, a wrong result).

Fix

Added a type check in OpenSearchFilterRule.resolveViableBackends: for any text-relevance function with explicit fields=[...], validate each resolved field is text or keyword. If not, throw IllegalArgumentException naming the function, field, and mapping type, plus a hint to use a typed comparison. Unknown fields still fall through to existing behavior.

Example error:

Text-relevance function [QUERY_STRING] cannot be applied to field [severityNumber] of type [long]. Only text and keyword fields are supported. Use a typed comparison (e.g. [severityNumber > value]) for non-text fields.

Out of scope

  • Inline field references in the query expression (e.g. query_string(['title'], 'severity:>15') where severity appears inline) — needs Lucene query parsing.
  • Splitting mixed-type predicates across Lucene + DataFusion. For example, query_string(['title', 'severity'], 'login AND severity:>15') could in principle be split into a Lucene clause query_string(['title'], 'login') and a DataFusion clause severity > 15, but the rewrite is non-trivial (OR/NOT, score preservation, default-field semantics), so this PR rejects the predicate instead.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Suresh N S <nssuresh@amazon.com>
@nssuresh2007 nssuresh2007 requested a review from a team as a code owner May 29, 2026 06:52
@github-actions
Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The loop at line 302 increments by 2 (i += 2) assuming field-name/boost pairs, but if nestedOperands.size() is odd, the last iteration will access index i (the field name) but skip the boost at i+1 that doesn't exist. While this may not cause an immediate exception if only the field name is read, the assumption that pairs exist is fragile. If the MAP structure is malformed or has an odd number of elements, this could lead to unexpected behavior or accessing out-of-bounds indices in future modifications.

for (int i = 0; i < nestedOperands.size(); i += 2) {
    RexNode fieldNode = nestedOperands.get(i);
    if (fieldNode instanceof RexLiteral fieldLit) {
        String fieldName = fieldLit.getValueAs(String.class);
        if (fieldName != null && !fieldName.isEmpty()) {
            names.add(fieldName);
        }
    }
}

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent index out of bounds

Add bounds checking before accessing nestedOperands.get(i) to prevent
IndexOutOfBoundsException when the nested MAP has an odd number of operands.
Malformed or unexpected MAP structures could cause the loop to access an index
beyond the list size.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/planner/rules/OpenSearchFilterRule.java [302-310]

-for (int i = 0; i < nestedOperands.size(); i += 2) {
+for (int i = 0; i + 1 < nestedOperands.size(); i += 2) {
     RexNode fieldNode = nestedOperands.get(i);
     if (fieldNode instanceof RexLiteral fieldLit) {
         String fieldName = fieldLit.getValueAs(String.class);
         if (fieldName != null && !fieldName.isEmpty()) {
             names.add(fieldName);
         }
     }
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential IndexOutOfBoundsException when the nested MAP has an odd number of operands. The improved loop condition i + 1 < nestedOperands.size() ensures safe access to paired elements, preventing runtime errors from malformed MAP structures.

Medium
General
Handle null mapping type gracefully

Check if storageInfo.getMappingType() returns null before using it in the error
message. If the mapping type is unavailable, the error message could display "null",
reducing clarity for users debugging the issue.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/planner/rules/OpenSearchFilterRule.java [361-374]

 if (type != null && allowed.contains(type) == false) {
+    String mappingType = storageInfo.getMappingType() != null ? storageInfo.getMappingType() : "unknown";
     throw new IllegalArgumentException(
         "Text-relevance function ["
             + functionName
             + "] cannot be applied to field ["
             + fieldName
             + "] of type ["
-            + storageInfo.getMappingType()
+            + mappingType
             + "]. Only text and keyword fields are supported. "
             + "Use a typed comparison (e.g. ["
             + fieldName
             + " > value]) for non-text fields."
     );
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion improves error message clarity by handling potential null values from storageInfo.getMappingType(). While this is a defensive programming practice that enhances user experience, the impact is moderate since the condition already checks type != null and getMappingType() returning null in this context may be unlikely.

Low

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for f8d5381: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.46%. Comparing base (e0404f4) to head (f8d5381).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21882      +/-   ##
============================================
+ Coverage     73.36%   73.46%   +0.10%     
- Complexity    75430    75487      +57     
============================================
  Files          6034     6034              
  Lines        342604   342604              
  Branches      49279    49279              
============================================
+ Hits         251357   251704     +347     
+ Misses        71220    70873     -347     
  Partials      20027    20027              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant