Skip to content

feat: add hnsw-rabitq support#69

Open
egolearner wants to merge 56 commits intomainfrom
feat/rabitq
Open

feat: add hnsw-rabitq support#69
egolearner wants to merge 56 commits intomainfrom
feat/rabitq

Conversation

@egolearner
Copy link
Collaborator

@egolearner egolearner commented Feb 4, 2026

resolve #42

Greptile Summary

This PR adds a new HNSW-RaBitQ index type to zvec, integrating scalar quantization (RaBitQ) into the HNSW graph structure to reduce memory usage and accelerate approximate nearest-neighbour search. The change is large (~14,500 lines across 96 files) and touches the full index lifecycle: builder, searcher, streamer, Python bindings, DB integration, and tests.

Key observations:

  • The overall architecture is well-structured: the RaBitQ quantization layer (RabitqConverterRabitqReformer) cleanly separates training/conversion from the HNSW graph code.
  • Previously flagged VLA and search_bf_by_p_keys_impl stub issues have been addressed.
  • cluster->mount() return value is silently discarded in rabitq_converter.cc:173 — if mounting the sampler fails, training continues and produces misleading downstream errors rather than an actionable failure.
  • rand() is used for MemoryDumper file-ID generation in both hnsw_rabitq_builder.cc:293 and rabitq_converter.cc:261. rand() relies on global state and is not thread-safe; while training is currently single-threaded, this is a latent hazard.
  • Dead null-check after std::make_shared in hnsw_rabitq_builder.cc:174–178: std::make_shared throws on failure and never returns null; the guard is misleading.
  • RABITQ_CONVERER_SEG_ID constant name typo in rabitq_utils.h:24 (missing T in CONVERTER). The value string is correct so there is no runtime impact, but the misspelling propagates across many files.
  • Two // TODO: release memory of memory_storage comments remain unresolved in hnsw_rabitq_builder.cc:329 and rabitq_converter.cc:298; these should be tracked or resolved before a stable release.

Confidence Score: 3/5

  • The PR is functional but has a silent error-swallowing bug in training and a handful of cleanup items that should be addressed before production use.
  • The core logic is sound and the previously flagged issues have been handled. However, the silently ignored cluster->mount() return value can mask training failures, and the two unresolved memory-release TODOs represent acknowledged but unaddressed resource management concerns. These lower confidence from a clean merge.
  • src/core/algorithm/hnsw_rabitq/rabitq_converter.cc (ignored mount() return value) and src/core/algorithm/hnsw_rabitq/hnsw_rabitq_builder.cc (TODO memory release, rand() usage).

Important Files Changed

Filename Overview
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_builder.cc Implements the offline build pipeline for the HNSW-RaBitQ index. Contains a dead null-check after std::make_shared and uses the non-thread-safe rand() for MemoryDumper file ID generation. Also carries an unresolved // TODO: release memory of memory_storage at line 329.
src/core/algorithm/hnsw_rabitq/rabitq_converter.cc Trains RaBitQ centroids via KMeans. The return value of cluster->mount(sampler) is silently ignored (line 173), which can cause misleading downstream errors. Also uses rand() (not thread-safe) for MemoryDumper file IDs and carries a // TODO: release memory of memory_storage comment.
src/core/algorithm/hnsw_rabitq/rabitq_reformer.cc Pimpl-based implementation of the RaBitQ reformer. Handles loading centroids and the rotator from storage, building a LinearSeeker for centroid search, and transforming queries into RaBitQ query entities. Logic is sound; no critical issues found.
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_query_algorithm.cc Implements HNSW graph traversal during query time using RaBitQ approximate distances. The expand_neighbors_by_group function's inner-loop break on group_num does not terminate the outer while loop (outer loop continues consuming candidates). This could cause extra work but was flagged in previous review threads.
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_searcher.cc Implements offline searcher for the HNSW-RaBitQ index. All three search paths (HNSW, brute-force, and p_keys brute-force) are now fully implemented. No new critical issues found.
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_algorithm.cc Core HNSW build-time graph insertion logic. Mutex acquisition/release pattern for concurrent level updates is correctly scoped. Previously flagged VLA issues have been addressed using std::vector. No new critical issues.
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_streamer.cc Implements the online streaming indexer. Open/close/dump lifecycle, concurrent add-with-id, and group-by search all look structurally correct. Reformer is loaded from storage on first open or written if not present.
src/core/algorithm/hnsw_rabitq/rabitq_utils.h Defines the shared segment ID constant and RabitqConverterHeader struct. The constant name RABITQ_CONVERER_SEG_ID is misspelled (missing 'T') but the string value is correct, so no runtime impact. The static_assert ensuring 32-byte alignment is a good practice.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Builder as HnswRabitqBuilder
    participant Converter as RabitqConverter
    participant Reformer as RabitqReformer
    participant Entity as HnswRabitqEntity
    participant Alg as HnswRabitqAlgorithm

    Caller->>Builder: init(meta, params)
    Builder->>Converter: init(meta, params)
    Builder->>Entity: init()
    Builder->>Alg: init()

    Caller->>Builder: train(holder)
    Builder->>Converter: train(holder) [KMeans clustering]
    Converter-->>Builder: centroids ready
    Builder->>Converter: dump(MemoryDumper)
    Builder->>Reformer: init(params)
    Builder->>Reformer: load(MemoryReadStorage)
    Reformer-->>Builder: reformer ready

    Caller->>Builder: build(holder)
    loop for each vector in holder
        Builder->>Reformer: convert(raw_vec) → quantized_vec
        Builder->>Entity: add_vector(level, key, quantized_vec)
    end
    loop parallel threads
        Builder->>Alg: add_node(id, level, ctx)
        Alg->>Entity: get/update neighbors
    end

    Caller->>Builder: dump(dumper)
    Builder->>Converter: dump(dumper) [centroids]
    Builder->>Entity: dump(dumper) [graph data]

    note over Caller,Alg: Search path (HnswRabitqSearcher)
    Caller->>Reformer: transform_to_entity(query) → QueryEntity
    Caller->>Alg: search(QueryEntity, ctx)
    Alg->>Entity: get_neighbors / get_vector
    Alg-->>Caller: topk results
Loading

Last reviewed commit: ad5a807

Greptile also left 3 inline comments on this PR.

@egolearner
Copy link
Collaborator Author

Depends on VectorDB-NTU/RaBitQ-Library#36

@egolearner egolearner marked this pull request as ready for review February 10, 2026 13:31
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

93 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Copy link
Collaborator

@richyreachy richyreachy Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this class can be moved to quantizer folder with rabitq_reformer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rabitq_converter/reformer is specialized for rabitq. It's better to put them inside hnsw_rabitq for now.


void HnswRabitqQueryAlgorithm::expand_neighbors_by_group(
TopkHeap &topk, HnswRabitqContext *ctx) const {
// if (!ctx->group_by().is_valid()) {
Copy link
Collaborator

@richyreachy richyreachy Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to return with some message when group's setup?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented group by search, will test it after framework support group by .

continue;
}
} else {
// Candidate cand{ResultRecord(candest.est_dist, candest.low_dist),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to check ex bits here? can it be checked earlier?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary to check ex_bits here.

  1. first get_bin_est use single bit to estimate score
  2. ex_bits > 0
    1. yes. check if need to do full estimate with extra bits
    2. no. Nothing to be done since we already used all the information.


// TODO: check loop type

// if ((!topk.full()) || cur_dist < topk[0].second) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lagacy codes here and elsewhere can be cleaned up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

query_wrapper.set_g_add(q_to_centroids[cluster_id]);
float est_dist = rabitqlib::split_distance_boosting(
ex_data, ip_func_, query_wrapper, padded_dim_, ex_bits_, res.ip_x0_qr);
float low_dist = est_dist - (res.est_dist - res.low_dist) / (1 << ex_bits_);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can predefine param to convert to multiply manipulation:
double inv_divisor = 1.0 / (1 << ex_bits_);
auto result = (res.est_dist - res.low_dist) * inv_divisor;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_ex_est is not used, deleted.

@egolearner
Copy link
Collaborator Author

@greptile

@egolearner
Copy link
Collaborator Author

@Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Integrate RaBitQ Quantization into HNSW Index

6 participants