Bulk score hnsw neighbor array by leng25 · Pull Request #15958 · apache/lucene

leng25 · 2026-04-14T23:40:03Z

Refactor NeighborArray#isWorstNonDiverse to use bulkScore instead of score, enabling multiple nodes to be evaluated at a time when searching for the worst neighbor to remove.

This change follows the approach taken in #15607. Unlike that case, this function is called less frequently, so the direct performance gains are modest and within the margin of error.

The primary motivation is consistency: aligning both call sites to use bulkScore means this code automatically benefits from any future optimizations made to implementations of that function.
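To make the refactor concrete, here is a minimal sketch of the two styles side by side. `ToyScorer` and `BulkScoreSketch` are hypothetical stand-ins written for illustration, not Lucene's actual classes; the only assumption taken from this PR's diff is that `bulkScore(nodes, scores, numNodes)` returns a value that can be compared directly against `minAcceptedSimilarity`, which the sketch models as the maximum score.

```java
/** Hypothetical stand-in for a vector scorer; not Lucene's actual interface. */
interface ToyScorer {
  float score(int node);

  /** Scores nodes[0..numNodes) into scores and returns the maximum score seen (assumed shape). */
  default float bulkScore(int[] nodes, float[] scores, int numNodes) {
    float max = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < numNodes; i++) {
      scores[i] = score(nodes[i]);
      max = Math.max(max, scores[i]);
    }
    return max;
  }
}

final class BulkScoreSketch {
  /** Per-node style: one score() call per neighbor, with an early exit on the first hit. */
  static boolean anyTooSimilarLoop(ToyScorer scorer, int[] nodes, int numNodes, float minAcceptedSimilarity) {
    for (int i = 0; i < numNodes; i++) {
      if (scorer.score(nodes[i]) >= minAcceptedSimilarity) {
        return true;
      }
    }
    return false;
  }

  /** Bulk style: a single call scores every neighbor, then the maximum is compared once. */
  static boolean anyTooSimilarBulk(ToyScorer scorer, int[] nodes, int numNodes, float minAcceptedSimilarity) {
    if (numNodes == 0) {
      return false; // nothing to compare against
    }
    float[] scratch = new float[numNodes];
    return scorer.bulkScore(nodes, scratch, numNodes) >= minAcceptedSimilarity;
  }
}
```

Both methods return the same answer. The bulk form gives up the early exit in exchange for handing the scorer a whole batch, which is where an implementation-specific optimization could apply.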

Benchmarks were run using luceneutil knnPerfTest on a ThinkPad T14 Gen 1 (Intel i5-10210U, 8 logical cores, 16GB RAM, Linux).
Dataset: Cohere v3 Wikipedia 1024d, 50K docs, dot_product metric, maxConn=64, beamWidthIndex=250, quantizeBits=8.

| Version | recall | index(s) | index_docs/s | force_merge(s) | latency(ms) | QPS |
|---|---|---|---|---|---|---|
| baseline | 0.985 | 74.35 | 672.54 | 0.02 | 8.233 | 121 |
| candidate | 0.985 | 73.83 | 677.26 | 0.02 | 8.282 | 120 |


leng25 · 2026-05-09T01:26:41Z

Hi @benwtrent , would appreciate a look when you have time. Happy to make any changes if needed. Thanks!

github-actions · 2026-05-24T00:58:39Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

benwtrent · 2026-05-26T18:07:15Z

-          return true;
-        }
-      }
+      return scorer.bulkScore(nodes, bulkScores, candidateIndex) >= minAcceptedSimilarity;


It seems like we only want to score uncheckedIndexes, right?

leng25 · 2026-05-28

Hi @benwtrent, thanks for the review!
Not exactly: it depends on whether the candidate itself is checked or unchecked.

Case 1 (candidate is unchecked): it is a brand-new node that has never been diversity-checked against anyone, so we score it against all better neighbors (0..candidateIndex-1), not just the unchecked ones.
Case 2 (candidate is checked): it has already survived diversity checks against all checked nodes, so we only score it against the newly added unchecked nodes.

This is the same logic as before my change; I only replaced the per-node scorer.score() loop with bulkScore.
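The two cases can be sketched as a single selection rule over the better-ranked neighbors. This is an illustrative sketch only: `DiversityCheckSketch`, its `Similarity` interface, and the `unchecked` boolean array are hypothetical names chosen for the example and do not mirror the fields or signatures in Lucene's NeighborArray.

```java
final class DiversityCheckSketch {
  /** Hypothetical pairwise similarity lookup; stands in for a real vector scorer. */
  interface Similarity {
    float between(int a, int b);
  }

  /**
   * nodes is sorted best-first by similarity to the base node, and scores[i] is that
   * similarity. unchecked[i] marks neighbors that have not been diversity-checked yet.
   */
  static boolean isNonDiverse(
      int candidateIndex, int[] nodes, float[] scores, boolean[] unchecked, Similarity sim) {
    float minAcceptedSimilarity = scores[candidateIndex];
    boolean candidateUnchecked = unchecked[candidateIndex];

    // Case 1 (candidate unchecked): compare against every better neighbor.
    // Case 2 (candidate checked): compare only against unchecked better neighbors.
    int[] toScore = new int[candidateIndex];
    int count = 0;
    for (int i = 0; i < candidateIndex; i++) {
      if (candidateUnchecked || unchecked[i]) {
        toScore[count++] = nodes[i];
      }
    }

    // Stand-in for one bulk call: score the whole batch and keep the maximum.
    float max = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < count; i++) {
      max = Math.max(max, sim.between(nodes[candidateIndex], toScore[i]));
    }
    return count > 0 && max >= minAcceptedSimilarity;
  }
}
```

A checked candidate with no unchecked better neighbors has nothing left to compare against, so the sketch returns false without scoring anything.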

That said, looking at it again, I made two small improvements: I renamed bulkScoreNodes to uncheckedNodes to make it clearer what it actually contains, and I moved the population of uncheckedNodes up into findWorstNonDiverse so that it is built once and reused across calls to isWorstNonDiverse rather than being recreated every time.
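One way to picture the "built once and reused" change: if the unchecked indexes are kept in ascending order (an assumption of this sketch), the unchecked neighbors ranked better than any candidate form a prefix of one shared array, so each call only needs a prefix length. `UncheckedPrefixSketch` and its method names are hypothetical and are not taken from the PR's code.

```java
final class UncheckedPrefixSketch {
  /** Gathers the node ids at the given ascending indexes once, for reuse by every check. */
  static int[] gatherUnchecked(int[] nodes, int[] uncheckedIndexes, int uncheckedCount) {
    int[] uncheckedNodes = new int[uncheckedCount];
    for (int k = 0; k < uncheckedCount; k++) {
      uncheckedNodes[k] = nodes[uncheckedIndexes[k]];
    }
    return uncheckedNodes;
  }

  /** Counts the unchecked entries ranked strictly better than (index below) the candidate. */
  static int prefixLength(int[] uncheckedIndexes, int uncheckedCount, int candidateIndex) {
    int n = 0;
    while (n < uncheckedCount && uncheckedIndexes[n] < candidateIndex) {
      n++;
    }
    return n;
  }
}
```

With the array gathered up front, a per-candidate check can pass the shared array together with `prefixLength(...)` as the batch size instead of allocating and refilling a new array on every call.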

Luis Negrin and others added 3 commits April 13, 2026 16:50

Utilize bulk scoring interface during HNSW graph builder NeighborArra…

72fddd8

…y#isWorstNonDiverse

Merge branch 'apache:main' into bulk-score-hnsw-neighbor-array

9b80a3c

Merge branch 'apache:main' into bulk-score-hnsw-neighbor-array

4dec1ee

github-actions Bot added the module:core/hnsw label Apr 14, 2026

github-actions Bot added this to the 11.0.0 milestone Apr 14, 2026

leng25 mentioned this pull request Apr 14, 2026

Use bulk scoring more places for HNSW graphs #15606 (Open, 8 tasks)

leng25 added 5 commits April 16, 2026 10:24

Merge branch 'main' into bulk-score-hnsw-neighbor-array

cc51f88

Merge branch 'main' into bulk-score-hnsw-neighbor-array

b12cce5

Merge branch 'main' into bulk-score-hnsw-neighbor-array

fc10f97

Merge branch 'main' into bulk-score-hnsw-neighbor-array

f7e3c22

Merge branch 'main' into bulk-score-hnsw-neighbor-array

0d90048

github-actions Bot added the Stale label May 24, 2026

benwtrent reviewed May 26, 2026


github-actions Bot removed the Stale label May 27, 2026

Refactor: populate uncheckedNodes upfront in findWorstNonDiverse

e3e5261

leng25 wants to merge 9 commits into apache:main from leng25:bulk-score-hnsw-neighbor-array
