Skip to content

[Feature Request] gRPC indexing: add support for ExtraFieldsValues#21907

Open
laminelam wants to merge 2 commits into
opensearch-project:mainfrom
laminelam:feature/grpc_proto_release
Open

[Feature Request] gRPC indexing: add support for ExtraFieldsValues#21907
laminelam wants to merge 2 commits into
opensearch-project:mainfrom
laminelam:feature/grpc_proto_release

Conversation

@laminelam
Copy link
Copy Markdown
Contributor

Resolves #21892

Is your feature request related to a problem? Please describe

Adds support for extra_field_values in gRPC bulk API requests, enabling callers to attach binary or float array fields outside of _source for index, create, and update operations.

Related component

Indexing
Indexing:Performance

Additional context

Add support of "extra fields" outside _source indexing
[gRPC] Move primitive array types out of source

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Lamine Idjeraoui added 2 commits May 29, 2026 13:21
@laminelam laminelam requested a review from a team as a code owner May 30, 2026 16:39
@github-actions github-actions Bot added enhancement Enhancement or improvement to existing feature or request missing-component labels May 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

HashMap initial capacity calculation may cause excessive memory allocation. When m.size() is 1, Math.max(16, m.size() * 2) returns 16, but when m.size() is 8 or more, it returns m.size() * 2. This means for 8 entries, it allocates capacity for 16 entries, which is correct. However, for 1-7 entries, it always allocates 16, which may be wasteful for small maps. More critically, the formula m.size() * 2 does not account for HashMap's load factor (0.75 by default), so the map may still resize. For example, with 10 entries and capacity 20, the threshold is 15, causing a resize when the 16th entry is added during iteration.

Map<String, ExtraFieldValue> out = new HashMap<>(Math.max(16, m.size() * 2));
Possible Issue

The check extraFieldValues.isEmpty() == false may throw NullPointerException if extraFieldValues is null. While fromProto returns ExtraFieldValues.EMPTY for empty maps, if fromProto were to return null in any edge case or future change, this would fail. The pattern is inconsistent with the null-safe check in applyUpdateExtraFieldValues at line 465.

if (extraFieldValues.isEmpty() == false) {
    throw new IllegalArgumentException("extra_field_values are not supported for delete operations");

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Fix HashMap initial capacity calculation

The initial capacity calculation Math.max(16, m.size() * 2) is incorrect for HashMap
sizing. HashMap requires capacity to account for load factor (default 0.75). Use
(int) ((m.size() / 0.75f) + 1) to avoid rehashing, or simply use new
HashMap<>(m.size()) for reasonable performance.

modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/document/bulk/ExtraFieldValuesProtoUtils.java [36]

-Map<String, ExtraFieldValue> out = new HashMap<>(Math.max(16, m.size() * 2));
+Map<String, ExtraFieldValue> out = new HashMap<>((int) ((m.size() / 0.75f) + 1));
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies a suboptimal HashMap initialization. Using Math.max(16, m.size() * 2) doesn't account for the load factor, which could cause unnecessary rehashing. However, the impact is minor as it only affects performance slightly, not correctness.

Low
Improve error message for empty bytes

When dimension == 0 and byteLength == 0, the method returns 0, which is valid.
However, when dimension > 0 but byteLength == 0, the validation correctly fails.
Consider adding an explicit check for empty bytes when dimension is specified to
provide a clearer error message.

modules/transport-grpc/src/main/java/org/opensearch/transport/grpc/proto/request/document/bulk/ExtraFieldValuesProtoUtils.java [84-111]

 private static int resolvePackedDimension(ByteString bytes, int dimension, int bytesPerElement, String valueType) {
     if (dimension < 0) {
         throw new IllegalArgumentException(valueType + " dimension must be >= 0 but was " + dimension);
     }
 
     int byteLength = bytes.size();
     if (dimension == 0) {
         if (byteLength % bytesPerElement != 0) {
             throw new IllegalArgumentException(
                 valueType + " packed_le byte length must be multiple of " + bytesPerElement + " but was " + byteLength
             );
         }
         return byteLength / bytesPerElement;
+    }
+
+    if (byteLength == 0) {
+        throw new IllegalArgumentException(valueType + " packed_le bytes cannot be empty when dimension is specified");
     }
 
     final int expectedByteLength;
     try {
         expectedByteLength = Math.multiplyExact(dimension, bytesPerElement);
     } catch (ArithmeticException e) {
         throw new IllegalArgumentException(valueType + " dimension too large: " + dimension, e);
     }
     if (byteLength != expectedByteLength) {
         throw new IllegalArgumentException(
             "Bad packed " + valueType + " length=" + byteLength + " expected=" + expectedByteLength + " (dim=" + dimension + ")"
         );
     }
     return dimension;
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion adds a clearer error message for the edge case where dimension > 0 but byteLength == 0. While this improves error messaging slightly, the existing validation already catches this case with "Bad packed ... length=0 expected=X", so the improvement is marginal.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 6b4cc7d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request missing-component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] gRPC indexing: add support for ExtraFieldsValues

1 participant