fix(search): make bitmap scorer repeatable#21815
Conversation
PR Reviewer Guide 🔍(Review updated until commit 3323ef2)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 3323ef2 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit b007896
Suggestions up to commit a1de5ff
Suggestions up to commit e2a1964
Suggestions up to commit 25d07f7
Suggestions up to commit 7d96516
|
There was a problem hiding this comment.
Pull request overview
This PR fixes a correctness bug in bitmap index queries by making ScorerSupplier#get() repeatable: it rebuilds mutable intersection state on each get() call so repeated scorer creation doesn’t consume shared iterator/visitor state and lead to empty results in complex queries.
Changes:
- Rebuild
DocIdSetBuilderandMergePointVisitorinsideScorerSupplier#get()for both 32-bit and 64-bit bitmap index queries. - Add regression tests for both query types that call
ScorerSupplier#get()twice and compare matches.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
server/src/main/java/org/opensearch/search/query/BitmapIndexQuery.java |
Moves mutable intersection state into ScorerSupplier#get() to ensure repeatable scorer creation. |
server/src/main/java/org/opensearch/search/query/Bitmap64IndexQuery.java |
Same repeatability fix as 32-bit variant for 64-bit bitmap queries. |
server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java |
Adds a repeatability regression test and helper to extract matches from a Scorer. |
server/src/test/java/org/opensearch/search/query/Bitmap64IndexQueryTests.java |
Adds a repeatability regression test and helper to extract matches from a Scorer. |
Comments suppressed due to low confidence (1)
server/src/test/java/org/opensearch/search/query/BitmapIndexQueryTests.java:266
- The repeatability check only inspects
reader.leaves().get(0). With randomized IndexWriter settings it’s possible for leaf 0 to contain none of the matching docs, in which case bothget()calls return an empty list and the regression test would pass even if the scorer is broken. Consider aggregating matches across all leaves (or asserting the first scorer returns the expected non-empty matches) before comparing the second scorer’s results.
LeafReaderContext leaf = reader.leaves().get(0);
ScorerSupplier supplier = weight.scorerSupplier(leaf);
assertNotNull(supplier);
Scorer scorer1 = supplier.get(Long.MAX_VALUE);
Scorer scorer2 = supplier.get(Long.MAX_VALUE);
assertEquals(getMatchingValues(scorer1, leaf), getMatchingValues(scorer2, leaf));
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Persistent review updated to latest commit ac01cdf |
|
Persistent review updated to latest commit 91c3d85 |
| w.commit(); | ||
| reader.close(); | ||
| reader = DirectoryReader.open(w); | ||
| searcher = newSearcher(reader); |
|
|
||
| LeafReaderContext leaf = reader.leaves().get(0); | ||
| ScorerSupplier supplier = weight.scorerSupplier(leaf); | ||
| assertNotNull(supplier); | ||
|
|
||
| Scorer scorer1 = supplier.get(Long.MAX_VALUE); | ||
| Scorer scorer2 = supplier.get(Long.MAX_VALUE); | ||
|
|
||
| assertEquals(getMatchingValues(scorer1, leaf), getMatchingValues(scorer2, leaf)); |
Use supplier.cost() as the leadCost when invoking ScorerSupplier#get in bitmap repeatability tests. Passing Long.MAX_VALUE violates Lucene's asserting scorer supplier checks and causes assertion failures unrelated to scorer repeatability behavior. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…per get() call Lucene's AssertingWeight (used by newSearcher() in tests) wraps ScorerSupplier and asserts get() is called at most once per instance. Calling get() twice on the same supplier triggers AssertionError. Fix both BitmapIndexQueryTests and Bitmap64IndexQueryTests to obtain a fresh ScorerSupplier from the weight for each scorer retrieval, which is also the semantically correct way to test that the weight produces repeatable results across independent scorer requests. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
DocIdSetBuilder.build() never returns null — it returns DocIdSet.EMPTY when no documents are added. Remove the dead null check and inline the iterator() call to eliminate the uncovered branch. Add testScoreNoMatchingDocs to both BitmapIndexQueryTests and Bitmap64IndexQueryTests to cover the iterator-is-null path (reached when the field has point values but no documents match the bitmap). This brings patch coverage above the 73.33% threshold required by Codecov. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…tion In Lucene 10.4.0, DocIdSetBuilder.build() never returns DocIdSet.EMPTY; when no documents match it returns an IntArrayDocIdSet with length 0 whose iterator() is non-null (yields NO_MORE_DOCS immediately). The 'if (iterator == null)' guard added in the previous commit was therefore dead code and the assertNull(weight.scorer(leaf)) assertion in testScoreNoMatchingDocs was wrong — the scorer is a valid empty ConstantScoreScorer, not null. Simplify get() back to the single-line return and keep only the meaningful assertion (result list is empty) in testScoreNoMatchingDocs. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…4IndexQueryTests Add testRewrite to verify empty bitmap rewrites to MatchNoDocsQuery, mirroring the existing test in BitmapIndexQueryTests. Also initialize searcher in @before for symmetry with BitmapIndexQueryTests. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
5597c11 to
1f3fb32
Compare
|
Persistent review updated to latest commit 1f3fb32 |
1f3fb32 to
7850d8e
Compare
|
Persistent review updated to latest commit a1de5ff |
…s helper SortedNumericDocValues.advanceExact() returns false when the document has no doc values for the field; calling docValueCount()/nextValue() on a miss is an API violation. Guard the loop body with the return value, consistent with the Bitmap64IndexQueryTests helper. Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
|
Persistent review updated to latest commit b007896 |
|
❌ Gradle check result for b007896: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Persistent review updated to latest commit cc8cfea |
|
Persistent review updated to latest commit 889c5a7 |
|
Persistent review updated to latest commit db0efbd |
|
❌ Gradle check result for db0efbd: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Persistent review updated to latest commit 3323ef2 |
Description
BitmapIndexQueryandBitmap64IndexQueryhad theirDocIdSetBuilderandMergePointVisitordeclared as fields of theScorerSupplieranonymous class, causing all calls toScorerSupplier.get()to share mutable state. This violates the Lucene contract that eachget()call must produce an independent scorer, and caused incorrect empty results in complex bool queries using bitmap terms lookup on multi-segment indices.Root causes fixed:
get()—DocIdSetBuilderandMergePointVisitorwere constructed once and reused across multipleget()calls. Moved both into the body ofget()so each invocation builds a fresh, independentDocIdSet.BytesRefaliasing inbitmapEncodedIterator.next()— the returnedBytesRefshared the same backing array as the internal buffer, so subsequent iteration silently overwrote previously returned values. Fixed by returningBytesRef.deepCopyOf(encoded).hasBufferedflag inBitmap64IndexQuery.bitmapEncodedIterator.advance()— when the iterator was exhausted without finding a value>= target,hasBufferedwas left in whatever state it had before, allowing a stale buffered value to surface on the nextnext()call. Fixed by explicitly settinghasBuffered = falsewhen the loop exits without a match.Testing
Added regression tests to both
BitmapIndexQueryTestsandBitmap64IndexQueryTests:testScorerSupplierGetIsRepeatable— obtains two independentScorerSupplierinstances per leaf, callsget()on each, and asserts both return identical match sets.testScoreNoMatchingDocs— verifies correct empty-result handling when the bitmap contains no values present in the index.testRewrite— verifies that an empty bitmap rewrites toMatchNoDocsQuery.Related Issues
Resolves #21781
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.