Bulk score hnsw neighbor array #15958

Open
leng25 wants to merge 9 commits into apache:main from leng25:bulk-score-hnsw-neighbor-array

leng25 commented Apr 14, 2026

issue #15606

Refactor NeighborArray#isWorstNonDiverse to use bulkScore instead of score, enabling multiple nodes to be evaluated at a time when searching for the worst neighbor to remove.

This change follows the approach taken in #15607. Unlike that case, this function is called less frequently, so the direct performance gains are modest and within the margin of error.

The primary motivation is consistency: aligning both call sites to use bulkScore ensures this code automatically benefits from any future optimizations made to implementations of that function.
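
To illustrate the shape of the refactor, here is a minimal, self-contained sketch. It is not the code in this PR: `SketchScorer`, `BulkScoreSketch`, and both method names are hypothetical stand-ins, and it assumes (as the reviewed line `scorer.bulkScore(...) >= minAcceptedSimilarity` suggests) that `bulkScore` fills a scratch array and returns the highest score it computed.

```java
// Hypothetical stand-in for a vector scorer; NOT Lucene's actual interface.
interface SketchScorer {
  float score(int node);

  // Assumed contract: scores nodes[0..numNodes) into scores[] and returns the maximum.
  // A real implementation could vectorize or batch this; the default just loops.
  default float bulkScore(int[] nodes, float[] scores, int numNodes) {
    float max = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < numNodes; i++) {
      scores[i] = score(nodes[i]);
      max = Math.max(max, scores[i]);
    }
    return max;
  }
}

final class BulkScoreSketch {
  // Before: one score() call per neighbor, returning at the first hit.
  static boolean anyAtLeastPerNode(
      SketchScorer scorer, int[] nodes, int numNodes, float threshold) {
    for (int i = 0; i < numNodes; i++) {
      if (scorer.score(nodes[i]) >= threshold) {
        return true;
      }
    }
    return false;
  }

  // After: a single bulkScore() call; comparing the returned maximum
  // against the threshold answers the same yes/no question.
  static boolean anyAtLeastBulk(
      SketchScorer scorer, int[] nodes, float[] scratch, int numNodes, float threshold) {
    if (numNodes == 0) {
      return false; // nothing to compare against
    }
    return scorer.bulkScore(nodes, scratch, numNodes) >= threshold;
  }
}
```

The bulk variant gives up the early exit of the per-node loop, which is one plausible reason the measured difference stays within noise for a rarely called function; the benefit only shows up if the scorer's `bulkScore` is faster than repeated `score` calls.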

Benchmarks were run using luceneutil knnPerfTest on a ThinkPad T14 Gen 1 (Intel i5-10210U, 8 logical cores, 16GB RAM, Linux).
Dataset: Cohere v3 Wikipedia 1024d, 50K docs, dot_product metric, maxConn=64, beamWidthIndex=250, quantizeBits=8.

| Version | recall | index(s) | index_docs/s | force_merge(s) | latency(ms) | QPS |
|---|---|---|---|---|---|---|
| baseline | 0.985 | 74.35 | 672.54 | 0.02 | 8.233 | 121 |
| candidate | 0.985 | 73.83 | 677.26 | 0.02 | 8.282 | 120 |


leng25 commented May 9, 2026

Hi @benwtrent, I would appreciate a look when you have time. Happy to make any changes if needed. Thanks!

@github-actions

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

github-actions bot added the Stale label May 24, 2026
        return true;
      }
    }
    return scorer.bulkScore(nodes, bulkScores, candidateIndex) >= minAcceptedSimilarity;
Member

It seems like we only want to score uncheckedIndexes, right?

Contributor Author

Hi @benwtrent, thanks for the review!
Not exactly; it depends on whether the candidate itself is checked or unchecked.

Case 1 (candidate is unchecked): it is a brand-new node that has never been diversity-checked against anyone, so we score it against all better neighbors (0..candidateIndex-1), not just the unchecked ones.
Case 2 (candidate is checked): it has already survived diversity checks against all checked nodes, so we only score it against the newly added unchecked nodes.
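
The two cases can be sketched as a small helper that picks which neighbors to hand to a bulk-scoring call. This is an illustration only, not the PR's code: `DiversityTargetsSketch`, `collectTargets`, and the `candidateIsUnchecked` flag are hypothetical names, and restricting Case 2 to unchecked entries ranked ahead of the candidate is an assumption about what "better neighbors" means here.

```java
final class DiversityTargetsSketch {
  /**
   * Hypothetical helper: copies into out[] the neighbor ids the candidate
   * must be compared against and returns how many were written.
   *
   * nodes            neighbor ids, ordered best-first
   * candidateIndex   position of the candidate within nodes
   * uncheckedIndexes positions (into nodes) that are not yet diversity-checked
   */
  static int collectTargets(
      int[] nodes,
      int candidateIndex,
      boolean candidateIsUnchecked,
      int[] uncheckedIndexes,
      int uncheckedCount,
      int[] out) {
    int n = 0;
    if (candidateIsUnchecked) {
      // Case 1: compare against every better neighbor, 0..candidateIndex-1.
      for (int i = 0; i < candidateIndex; i++) {
        out[n++] = nodes[i];
      }
    } else {
      // Case 2: the candidate already passed checks against checked nodes,
      // so only unchecked neighbors ranked ahead of it are compared
      // (the "ranked ahead" filter is an assumption of this sketch).
      for (int i = 0; i < uncheckedCount; i++) {
        int idx = uncheckedIndexes[i];
        if (idx < candidateIndex) {
          out[n++] = nodes[idx];
        }
      }
    }
    return n;
  }
}
```

The returned count and the filled prefix of `out` are exactly the `numNodes` and `nodes` arguments a bulk-scoring call would need, which is why both cases can share a single scoring path.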

This is the same logic as before my change; I only replaced the per-node scorer.score() loop with bulkScore.

That being said, looking at it again I made two small improvements: renamed bulkScoreNodes to uncheckedNodes to make it clearer what it actually contains, and moved the population of uncheckedNodes up into findWorstNonDiverse so it's built once and reused across calls to isWorstNonDiverse rather than being recreated every time.

github-actions bot removed the Stale label May 27, 2026