feat(proxy): versioned /v2/embedding route + server v2 client #232
Merged
Adds /v2/embedding (+ /v2/embedding/check) to the proxy: serves a
config-selected embedding model at a fixed dimensionality, returns fp32
vectors, and advertises the model in every response. Additive — /embed
(v1) and its cache format are untouched, so existing clients are
unaffected. The v2 cache key folds in dimensions ({model}:{dims}:{hash});
v1 keys are unchanged.
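
A minimal sketch of the v2 key shape described above, showing why the same text cached at two dimensionalities cannot collide. The function name `v2_cache_key` and the use of SHA-256 over the input text are assumptions for illustration; the PR only states the `{model}:{dims}:{hash}` layout.

```python
import hashlib


def v2_cache_key(model: str, dims: int, text: str) -> str:
    """Build a {model}:{dims}:{hash} cache key.

    Assumption: the hash is SHA-256 over the UTF-8 input text. The PR does
    not say which hash the proxy actually uses.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{model}:{dims}:{digest}"


# Same model and text, different dimensionality: the keys differ only in
# the dims segment, so a 768-dim vector is never served for a 1536-dim request.
key_1536 = v2_cache_key("gemini-embedding-2", 1536, "hello")
key_768 = v2_cache_key("gemini-embedding-2", 768, "hello")
```

Folding the dimensionality into the key means a config change to the dimension starts a fresh namespace rather than returning stale vectors of the wrong length.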
The server search runtime prefers /v2/embedding (probed at startup) and
falls back to the legacy /embed routes when the proxy lacks v2, so a new
server still works against an older or self-hosted proxy.
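
The probe-then-fall-back behaviour could look roughly like the sketch below. `choose_embedding_route` and the injected `probe_v2` callable are hypothetical names; how the real client performs the probe (which endpoint, which status codes) is not shown in this PR, so the probe is passed in rather than implemented.

```python
from typing import Callable

V2_ROUTE = "/v2/embedding"
V1_ROUTE = "/embed"


def choose_embedding_route(probe_v2: Callable[[], bool]) -> str:
    """Pick the route once at startup.

    probe_v2 is a hypothetical callable that returns True when the proxy
    answers the v2 route. Any exception is treated as "proxy lacks v2".
    """
    try:
        has_v2 = probe_v2()
    except Exception:
        has_v2 = False
    return V2_ROUTE if has_v2 else V1_ROUTE


def _older_proxy() -> bool:
    # Stand-in for a probe against a proxy that predates the v2 route.
    raise ConnectionError("proxy has no v2 route")


route_new = choose_embedding_route(lambda: True)
route_old = choose_embedding_route(_older_proxy)
```

Treating a failed probe as "no v2" rather than as a fatal error is what lets a new server keep working against an older or self-hosted proxy.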
This is the cutover mechanism for switching the embedding model (e.g.
text-embedding-3-large -> gemini-embedding-2 @ 1536): point embedding_v2
at the new model. A separate route + cache namespace means v1 clients
keep using the old model and are never switched to the new one.
…gModel
A proxy configured with only embedding_v2 (no v1 embedding) advertised embedding_available=false, so the search runtime's guard tripped before the v2 probe ran and silently disabled all search. Both accessors now consider embeddingServiceV2: the proxy reports available when either service exists, and model prefers v1 (what /embed serves), falling through to the v2 model so that a v2-only proxy advertises a non-empty model.
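
As a rough illustration of the fixed accessor logic, here is a sketch with hypothetical class and method names (`Proxy`, `embedding_available`, `embedding_model`); only the rule itself (either service counts, v1 model preferred, v2 as fallback) comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbeddingService:
    model: str


@dataclass
class Proxy:
    # Hypothetical field names mirroring the v1 and v2 services.
    embedding_service: Optional[EmbeddingService] = None     # serves /embed
    embedding_service_v2: Optional[EmbeddingService] = None  # serves /v2/embedding

    def embedding_available(self) -> bool:
        # Available when either service exists, not only v1.
        return (self.embedding_service is not None
                or self.embedding_service_v2 is not None)

    def embedding_model(self) -> str:
        # Prefer the v1 model (what /embed serves), then fall through to v2.
        if self.embedding_service is not None:
            return self.embedding_service.model
        if self.embedding_service_v2 is not None:
            return self.embedding_service_v2.model
        return ""


v2_only = Proxy(embedding_service_v2=EmbeddingService("gemini-embedding-2"))
both = Proxy(
    embedding_service=EmbeddingService("text-embedding-3-large"),
    embedding_service_v2=EmbeddingService("gemini-embedding-2"),
)
```

With the pre-fix behaviour, `v2_only.embedding_available()` would have been false, which is the case that disabled search before the v2 probe ran.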
Fixed in 7ad275f:
Lfg @qu0b
🐼 Smoke eval —
| question | result | tokens | tools |
|---|---|---|---|
| forky_node_coverage | ✅ | 12,513 | 3 |
| tracoor_node_coverage | ✅ | 14,146 | 4 |
| mainnet_block_arrival_p50 | ✅ | 14,359 | 9 |
| list_datasources | ✅ | 11,468 | 1 |
| block_count_24h | ✅ | 18,866 | 10 |
| missed_slots_24h | ✅ | 15,385 | 6 |
🔭 Langfuse traces (6 runs; ⚠️ = failed)
The report walks this branch's commits against the master baseline and the most recent release. A self-contained copy is in the run's eval-smoke-* artifact.
samcm added a commit that referenced this pull request on Jun 18, 2026
* feat(search): re-index on proxy embedding model change

  A background poll now detects when the proxy's served embedding model changes and re-indexes the whole corpus under the new model: it parks every index not-ready first (via atomic-swap wrappers) so no search dot-products a new-model query against an old-model index, rebuilds from the retained registries, then swaps the fresh indices in. This delivers the 'update the model in panda-proxy and clients self-heal' guarantee; a swap would otherwise silently mix embedding spaces on running servers until restart. Follow-up to the v2 embedding cutover (#232).

* fix(search): retry reindex on partial rebuild failure

  reindex advanced builtModel unconditionally, so if any single index rebuild failed (e.g. a transient proxy timeout) that index stayed parked not-ready forever: the model-change guard was satisfied, so the next tick never re-entered reindex. Now builtModel only advances on full success; a partial failure leaves the guard unsatisfied so the next tick retries, and the failed index stays not-ready (never mixed) until it rebuilds.
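
The park, rebuild, swap sequence and the retry fix can be sketched as follows. This is a simplified model under stated assumptions, not the server's code: `SwappableIndex`, `Reindexer`, `tick`, and the index names `examples` and `runbooks` are hypothetical, and the sketch re-parks and rebuilds every index on a retry rather than only the one that failed.

```python
from typing import Callable, Dict, List, Optional

Vectors = List[List[float]]


class SwappableIndex:
    """Wrapper that can be parked not-ready and later swapped to fresh data."""

    def __init__(self, vectors: Optional[Vectors] = None) -> None:
        self._vectors = vectors

    def ready(self) -> bool:
        return self._vectors is not None

    def park(self) -> None:
        self._vectors = None

    def swap(self, vectors: Vectors) -> None:
        self._vectors = vectors


class Reindexer:
    def __init__(self, indices: Dict[str, SwappableIndex],
                 rebuild: Callable[[str, str], Vectors],
                 built_model: str) -> None:
        self.indices = indices
        self.rebuild = rebuild          # (index name, model) -> vectors; may raise
        self.built_model = built_model  # model the current indices were built with

    def tick(self, served_model: str) -> bool:
        """One poll tick. Returns True when all indices match served_model."""
        if served_model == self.built_model:
            return True  # guard satisfied, nothing to do
        # Park everything first so no query hits an old-model index.
        for index in self.indices.values():
            index.park()
        ok = True
        for name, index in self.indices.items():
            try:
                index.swap(self.rebuild(name, served_model))
            except Exception:
                ok = False  # leave this index parked; retry on the next tick
        if ok:
            # Advance only on full success so a partial failure is retried.
            self.built_model = served_model
        return ok


failing = {"examples"}  # simulate a transient failure for one index


def rebuild(name: str, model: str) -> Vectors:
    if name in failing:
        raise TimeoutError(f"proxy timeout while rebuilding {name}")
    return [[1.0, 0.0]]


indices = {"examples": SwappableIndex([[0.0, 1.0]]),
           "runbooks": SwappableIndex([[0.0, 1.0]])}
reindexer = Reindexer(indices, rebuild, built_model="text-embedding-3-large")

first = reindexer.tick("gemini-embedding-2")  # partial failure
ready_after_first = {n: i.ready() for n, i in indices.items()}
model_after_first = reindexer.built_model

failing.clear()                                # the transient error goes away
second = reindexer.tick("gemini-embedding-2")  # retry succeeds
```

After the first tick the failed index is still parked and `built_model` is unchanged, so the guard stays unsatisfied and the second tick re-enters the rebuild, which is the behaviour the fix commit describes.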
Adds an additive, versioned /v2/embedding (+ /v2/embedding/check) proxy route that serves a config-selected embedding model (the embedding_v2: block) at a fixed dimensionality, returns L2-normalized fp32 vectors, and advertises the model in every response; the server search runtime prefers it (probed at startup) and falls back to the legacy /embed routes when the proxy lacks v2. This is the safe cutover mechanism for changing the embedding model (e.g. text-embedding-3-large → gemini-embedding-2 @ 1536): a separate route + cache namespace ({model}:{dims}:{hash}, v1 keys unchanged) lets the new model roll out to new clients while v1 clients keep using the old model untouched. Swapping the model in place would silently roll running clients (new-model queries vs old-model corpus). Re-index-on-model-change and on-disk cache orphan-purge are deferred as scoped follow-ups.
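
For readers unfamiliar with the L2-normalization mentioned above, a small sketch of what it means for a returned vector. The helper names are illustrative, and plain Python floats (64-bit) stand in for the fp32 values the route returns.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale vec to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


unit = l2_normalize([3.0, 4.0])  # length 5 before normalization
self_similarity = dot(unit, unit)
```

Once every vector has unit length, a plain dot product between a query and a corpus vector equals their cosine similarity, which only holds when both come from the same embedding model.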