Skip to content

fix(search): make bitmap scorer repeatable#21815

Open
gingeekrishna wants to merge 27 commits into
opensearch-project:mainfrom
gingeekrishna:fix/21781-bitmap-terms-lookup
Open

fix(search): make bitmap scorer repeatable#21815
gingeekrishna wants to merge 27 commits into
opensearch-project:mainfrom
gingeekrishna:fix/21781-bitmap-terms-lookup

Conversation

@gingeekrishna
Copy link
Copy Markdown

@gingeekrishna gingeekrishna commented May 23, 2026

Description

BitmapIndexQuery and Bitmap64IndexQuery had their DocIdSetBuilder and MergePointVisitor declared as fields of the ScorerSupplier anonymous class, causing all calls to ScorerSupplier.get() to share mutable state. This violates the Lucene contract that each get() call must produce an independent scorer, and caused incorrect empty results in complex bool queries using bitmap terms lookup on multi-segment indices.

Root causes fixed:

  • Shared mutable state in get()DocIdSetBuilder and MergePointVisitor were constructed once and reused across multiple get() calls. Moved both into the body of get() so each invocation builds a fresh, independent DocIdSet.
  • BytesRef aliasing in bitmapEncodedIterator.next() — the returned BytesRef shared the same backing array as the internal buffer, so subsequent iteration silently overwrote previously returned values. Fixed by returning BytesRef.deepCopyOf(encoded).
  • Stale hasBuffered flag in Bitmap64IndexQuery.bitmapEncodedIterator.advance() — when the iterator was exhausted without finding a value >= target, hasBuffered was left in whatever state it had before, allowing a stale buffered value to surface on the next next() call. Fixed by explicitly setting hasBuffered = false when the loop exits without a match.

Testing

Added regression tests to both BitmapIndexQueryTests and Bitmap64IndexQueryTests:

  • testScorerSupplierGetIsRepeatable — obtains two independent ScorerSupplier instances per leaf, calls get() on each, and asserts both return identical match sets.
  • testScoreNoMatchingDocs — verifies correct empty-result handling when the bitmap contains no values present in the index.
  • testRewrite — verifies that an empty bitmap rewrites to MatchNoDocsQuery.

Related Issues

Resolves #21781

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable. Not applicable — no public API contract change.
  • Public documentation issue/PR created, if applicable. Not applicable — internal query execution correctness fix.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 23, 2026

PR Reviewer Guide 🔍

(Review updated until commit 3323ef2)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 23, 2026

PR Code Suggestions ✨

Latest suggestions up to 3323ef2

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Ensure buffered value is copied

The advance method should also copy the matched value to encoded.bytes when a match
is found, similar to how next() uses currentBytes. This ensures consistency when the
buffered value is later returned by next().

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [89-100]

 public void advance(byte[] target) {
     while (it.hasNext()) {
         long v = it.next();
         LongPoint.encodeDimension(v, currentBytes, 0);
 
         if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
             hasBuffered = true;
+            System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
             return;
         }
     }
     hasBuffered = false; // Iterator exhausted, no match found
 }
Suggestion importance[1-10]: 9

__

Why: The suggestion identifies a critical bug where advance() sets hasBuffered = true but doesn't copy the matched value to encoded.bytes. When next() is called with hasBuffered = true, it copies from currentBytes to encoded.bytes (line 77), but currentBytes may have been modified by subsequent operations. The suggested fix ensures the matched value is preserved correctly by copying it immediately when found.

High

Previous suggestions

Suggestions up to commit b007896
CategorySuggestion                                                                                                                                    Impact
Possible issue
Reset state at method start

The advance method should initialize hasBuffered to false at the start to ensure
correct state when called multiple times. Without this, if advance is called after a
previous successful match, hasBuffered may remain true even when the iterator is
exhausted, leading to incorrect behavior.

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [89-100]

 public void advance(byte[] target) {
+    hasBuffered = false;
     while (it.hasNext()) {
         long v = it.next();
         LongPoint.encodeDimension(v, currentBytes, 0);
 
         if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
             hasBuffered = true;
             return;
         }
     }
-    hasBuffered = false; // Iterator exhausted, no match found
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential state management issue. Setting hasBuffered = false at the start of advance ensures the method is idempotent and prevents stale state from previous invocations. This is a critical correctness fix that prevents incorrect behavior when advance is called multiple times.

Medium
Suggestions up to commit a1de5ff
CategorySuggestion                                                                                                                                    Impact
Possible issue
Copy matched value to encoded buffer

The advance method should also copy the matching value to encoded.bytes when a match
is found, similar to how next() uses System.arraycopy. Without this, the buffered
value in currentBytes won't be properly transferred when next() is called after
advance().

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [89-100]

 public void advance(byte[] target) {
     while (it.hasNext()) {
         long v = it.next();
         LongPoint.encodeDimension(v, currentBytes, 0);
 
         if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
             hasBuffered = true;
+            System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
             return;
         }
     }
     hasBuffered = false; // Iterator exhausted, no match found
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical bug fix. The advance method finds a matching value and sets hasBuffered = true, but doesn't copy currentBytes to encoded.bytes. When next() is subsequently called, it uses System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES) to transfer the buffered value, but currentBytes may have been overwritten by subsequent operations. The suggestion correctly identifies that the copy should happen in advance() when the match is found, ensuring data integrity.

High
Suggestions up to commit e2a1964
CategorySuggestion                                                                                                                                    Impact
Possible issue
Copy matched value to encoded buffer

The advance method should also copy the matched value to encoded.bytes when a match
is found, similar to how next() uses System.arraycopy. Without this, the buffered
value in currentBytes won't be properly transferred when next() is called after
advance().

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [89-100]

 public void advance(byte[] target) {
     while (it.hasNext()) {
         long v = it.next();
         LongPoint.encodeDimension(v, currentBytes, 0);
 
         if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
             hasBuffered = true;
+            System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
             return;
         }
     }
     hasBuffered = false; // Iterator exhausted, no match found
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical bug fix. The advance method finds a matching value and sets hasBuffered = true, but doesn't copy the value from currentBytes to encoded.bytes. When next() is subsequently called, it uses System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES) to transfer the buffered value, but currentBytes may have been overwritten by then, leading to incorrect results.

High
Suggestions up to commit 25d07f7
CategorySuggestion                                                                                                                                    Impact
Possible issue
Copy matched value to encoded buffer

The advance method should also copy the matched value to encoded.bytes when a match
is found, similar to how next() uses System.arraycopy. Without this, the buffered
value in currentBytes won't be properly transferred when next() is called after
advance().

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [89-100]

 public void advance(byte[] target) {
     while (it.hasNext()) {
         long v = it.next();
         LongPoint.encodeDimension(v, currentBytes, 0);
 
         if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
             hasBuffered = true;
+            System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
             return;
         }
     }
     hasBuffered = false; // Iterator exhausted, no match found
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical bug fix. The advance method sets hasBuffered = true but doesn't copy currentBytes to encoded.bytes. When next() is subsequently called, it uses System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES) expecting the buffered value to be in currentBytes, but currentBytes may have been overwritten. The suggestion correctly identifies that the copy should happen in advance when a match is found, ensuring data integrity.

High
Suggestions up to commit 7d96516
CategorySuggestion                                                                                                                                    Impact
Possible issue
Copy matched value to buffer

The advance() method should copy currentBytes to encoded.bytes when a match is
found, ensuring the buffered value is properly stored for the next next() call.
Without this copy, next() may return stale data from encoded.bytes.

server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java [95-98]

 if (Arrays.compareUnsigned(currentBytes, target) >= 0) {
+    System.arraycopy(currentBytes, 0, encoded.bytes, 0, Long.BYTES);
     hasBuffered = true;
     return;
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical bug fix. The advance() method sets hasBuffered = true but doesn't copy currentBytes to encoded.bytes. When next() is called, it copies from encoded.bytes which contains stale data, not the buffered value. This would cause incorrect query results.

High

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a correctness bug in bitmap index queries by making ScorerSupplier#get() repeatable: it rebuilds mutable intersection state on each get() call so repeated scorer creation doesn’t consume shared iterator/visitor state and lead to empty results in complex queries.

Changes:

  • Rebuild DocIdSetBuilder and MergePointVisitor inside ScorerSupplier#get() for both 32-bit and 64-bit bitmap index queries.
  • Add regression tests for both query types that call ScorerSupplier#get() twice and compare matches.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
server/src/main/java/org/opensearch/search/query/BitmapIndexQuery.java Moves mutable intersection state into ScorerSupplier#get() to ensure repeatable scorer creation.
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java Same repeatability fix as 32-bit variant for 64-bit bitmap queries.
server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java Adds a repeatability regression test and helper to extract matches from a Scorer.
server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java Adds a repeatability regression test and helper to extract matches from a Scorer.
Comments suppressed due to low confidence (1)

server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java:266

  • The repeatability check only inspects reader.leaves().get(0). With randomized IndexWriter settings it’s possible for leaf 0 to contain none of the matching docs, in which case both get() calls return an empty list and the regression test would pass even if the scorer is broken. Consider aggregating matches across all leaves (or asserting the first scorer returns the expected non-empty matches) before comparing the second scorer’s results.
        LeafReaderContext leaf = reader.leaves().get(0);
        ScorerSupplier supplier = weight.scorerSupplier(leaf);
        assertNotNull(supplier);

        Scorer scorer1 = supplier.get(Long.MAX_VALUE);
        Scorer scorer2 = supplier.get(Long.MAX_VALUE);

        assertEquals(getMatchingValues(scorer1, leaf), getMatchingValues(scorer2, leaf));

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java Outdated
Comment thread server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit ac01cdf

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 91c3d85

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment on lines 83 to 86
w.commit();
reader.close();
reader = DirectoryReader.open(w);
searcher = newSearcher(reader);
Comment on lines +261 to +269

LeafReaderContext leaf = reader.leaves().get(0);
ScorerSupplier supplier = weight.scorerSupplier(leaf);
assertNotNull(supplier);

Scorer scorer1 = supplier.get(Long.MAX_VALUE);
Scorer scorer2 = supplier.get(Long.MAX_VALUE);

assertEquals(getMatchingValues(scorer1, leaf), getMatchingValues(scorer2, leaf));
Comment thread server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java Outdated
Use supplier.cost() as the leadCost when invoking ScorerSupplier#get in
bitmap repeatability tests. Passing Long.MAX_VALUE violates Lucene's
asserting scorer supplier checks and causes assertion failures unrelated
to scorer repeatability behavior.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…per get() call

Lucene's AssertingWeight (used by newSearcher() in tests) wraps
ScorerSupplier and asserts get() is called at most once per instance.
Calling get() twice on the same supplier triggers AssertionError.

Fix both BitmapIndexQueryTests and Bitmap64IndexQueryTests to obtain
a fresh ScorerSupplier from the weight for each scorer retrieval,
which is also the semantically correct way to test that the weight
produces repeatable results across independent scorer requests.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
DocIdSetBuilder.build() never returns null — it returns DocIdSet.EMPTY
when no documents are added. Remove the dead null check and inline
the iterator() call to eliminate the uncovered branch.

Add testScoreNoMatchingDocs to both BitmapIndexQueryTests and
Bitmap64IndexQueryTests to cover the iterator-is-null path (reached
when the field has point values but no documents match the bitmap).
This brings patch coverage above the 73.33% threshold required by
Codecov.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…tion

In Lucene 10.4.0, DocIdSetBuilder.build() never returns DocIdSet.EMPTY;
when no documents match it returns an IntArrayDocIdSet with length 0
whose iterator() is non-null (yields NO_MORE_DOCS immediately).

The 'if (iterator == null)' guard added in the previous commit was
therefore dead code and the assertNull(weight.scorer(leaf)) assertion
in testScoreNoMatchingDocs was wrong — the scorer is a valid empty
ConstantScoreScorer, not null.

Simplify get() back to the single-line return and keep only the
meaningful assertion (result list is empty) in testScoreNoMatchingDocs.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…4IndexQueryTests

Add testRewrite to verify empty bitmap rewrites to MatchNoDocsQuery,
mirroring the existing test in BitmapIndexQueryTests. Also initialize
searcher in @before for symmetry with BitmapIndexQueryTests.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
@gingeekrishna gingeekrishna force-pushed the fix/21781-bitmap-terms-lookup branch from 5597c11 to 1f3fb32 Compare May 25, 2026 17:06
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 1f3fb32

@gingeekrishna gingeekrishna force-pushed the fix/21781-bitmap-terms-lookup branch from 1f3fb32 to 7850d8e Compare May 25, 2026 17:09
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a1de5ff

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for a1de5ff: SUCCESS

…s helper

SortedNumericDocValues.advanceExact() returns false when the document
has no doc values for the field; calling docValueCount()/nextValue()
on a miss is an API violation. Guard the loop body with the return value,
consistent with the Bitmap64IndexQueryTests helper.

Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit b007896

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for b007896: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit cc8cfea

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for cc8cfea: SUCCESS

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

Persistent review updated to latest commit 889c5a7

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

✅ Gradle check result for 889c5a7: SUCCESS

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

Persistent review updated to latest commit db0efbd

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

❌ Gradle check result for db0efbd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

Persistent review updated to latest commit 3323ef2

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

✅ Gradle check result for 3323ef2: SUCCESS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working missing-component

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The combination of bitmap terms lookup + additional filter + text match produces wrong results, and force merge fixes it.

2 participants