Use bulk scoring more places for HNSW graphs #15606

@benwtrent

Description

While we have added bulk scoring, which can provide a substantial performance boost, we still don't use it everywhere. I happened to notice it's missing from filter search and higher level searches (fixed for 10.4), but there are other places too.

Looking at the code, there are various places where I still see RandomVectorScorer.score that might benefit from the bulk scorer API.

  • HnswGraphBuilder#diversityCheck (I am looking at this one now)
  • NeighborArray#isWorstNonDiverse
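
To make the idea concrete, here is a minimal sketch of what a bulk-scored diversity check could look like. It is not the actual Lucene code: `SketchScorer`, `DiversityCheckSketch`, the `chunk` parameter, and the dot-product scorer are all hypothetical stand-ins, and the `bulkScore(nodes, scores, numNodes)` shape is only assumed to resemble the real bulk API. The check treats a candidate as diverse when none of the already-selected neighbors scores at or above the candidate's own score.

```java
/** Hypothetical stand-in for a per-node scorer; NOT Lucene's RandomVectorScorer. */
interface SketchScorer {
  float score(int node);

  /**
   * Scores nodes[0..numNodes) into scores[0..numNodes). This default just loops;
   * a real implementation could share work across nodes (prefetching, SIMD).
   */
  default void bulkScore(int[] nodes, float[] scores, int numNodes) {
    for (int i = 0; i < numNodes; i++) {
      scores[i] = score(nodes[i]);
    }
  }
}

final class DiversityCheckSketch {
  /** Baseline: one score() call per neighbor, exiting on the first violation. */
  static boolean diverseOneByOne(
      float candidateScore, int[] neighbors, int size, SketchScorer scorer) {
    for (int i = 0; i < size; i++) {
      if (scorer.score(neighbors[i]) >= candidateScore) {
        return false;
      }
    }
    return true;
  }

  /** Bulk variant: score neighbors in chunks, checking for a violation after each chunk. */
  static boolean diverseBulk(
      float candidateScore, int[] neighbors, int size, SketchScorer scorer, int chunk) {
    int[] batch = new int[chunk];
    float[] scores = new float[chunk];
    for (int start = 0; start < size; start += chunk) {
      int n = Math.min(chunk, size - start);
      System.arraycopy(neighbors, start, batch, 0, n);
      scorer.bulkScore(batch, scores, n);
      for (int i = 0; i < n; i++) {
        if (scores[i] >= candidateScore) {
          return false;
        }
      }
    }
    return true;
  }

  /** Toy dot-product scorer over in-memory vectors, for illustration only. */
  static SketchScorer dotProductScorer(float[][] vectors, float[] query) {
    return node -> {
      float sum = 0f;
      for (int d = 0; d < query.length; d++) {
        sum += vectors[node][d] * query[d];
      }
      return sum;
    };
  }
}
```

The trade-off is that the one-by-one loop can stop at the first non-diverse neighbor, while the chunked version may score up to `chunk - 1` neighbors it did not need. Whether that pays off depends on how often the check exits early, which would need benchmarking.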

There are also places that use VectorScorer.score() that could benefit from bulk scoring:

  • DiversifyingChildrenVectorScorer#nextParent (bulk score the children?)
  • VectorSimilarityScorerSupplier should maybe satisfy the BulkScorer interface and delegate correctly?
  • FullPrecisionFloatVectorSimilarityValuesSource satisfies the DoubleValues interface, which...doesn't have any bulk interfaces :(. But might benefit from them.
  • Same for VectorSimilarityValuesSource

Places that use VectorUtil directly, that might be harder to refactor but could benefit from bulk scoring:

  • KMeans
  • BpVectorReorderer
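
For the KMeans case, the sketch below illustrates the kind of refactor this would involve: replacing one distance call per centroid with a single call that fills a distance array for all centroids. Everything here is hypothetical (`CentroidAssignmentSketch` and its methods are illustration names, not Lucene's KMeans or VectorUtil), and it assumes squared Euclidean distance. The bulk kernel loops over dimensions on the outside so each component of the input vector is read once and reused for every centroid.

```java
import java.util.Arrays;

final class CentroidAssignmentSketch {
  /** Plain squared Euclidean distance between two vectors of equal length. */
  static float squareDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int d = 0; d < a.length; d++) {
      float diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }

  /** Baseline: one distance call per centroid. */
  static int nearestOneByOne(float[] vector, float[][] centroids) {
    int best = -1;
    float bestDist = Float.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.length; c++) {
      float dist = squareDistance(vector, centroids[c]);
      if (dist < bestDist) {
        bestDist = dist;
        best = c;
      }
    }
    return best;
  }

  /** Bulk kernel: fills out[c] with the squared distance from vector to centroids[c]. */
  static void bulkSquareDistance(float[] vector, float[][] centroids, float[] out) {
    Arrays.fill(out, 0, centroids.length, 0f);
    for (int d = 0; d < vector.length; d++) {
      float v = vector[d]; // read once, reused for every centroid
      for (int c = 0; c < centroids.length; c++) {
        float diff = v - centroids[c][d];
        out[c] += diff * diff;
      }
    }
  }

  /** Bulk variant of the assignment step; scratch must hold at least centroids.length floats. */
  static int nearestBulk(float[] vector, float[][] centroids, float[] scratch) {
    bulkSquareDistance(vector, centroids, scratch);
    int best = -1;
    float bestDist = Float.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.length; c++) {
      if (scratch[c] < bestDist) {
        bestDist = scratch[c];
        best = c;
      }
    }
    return best;
  }
}
```

Passing a reusable `scratch` array keeps the assignment loop allocation-free. Whether this particular loop order is actually faster than per-centroid calls is an open question and would depend on the vectorized kernels available.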
