
feat(proxy): versioned /v2/embedding route + server v2 client #232

Merged
samcm merged 2 commits into master from worktree-proxy-v2-embedding
Jun 18, 2026

samcm commented Jun 18, 2026 (Member)

Adds a versioned, additive /v2/embedding (+ /v2/embedding/check) proxy route that serves a config-selected embedding model (embedding_v2: block) at a fixed dimensionality, returns L2-normalized fp32 vectors, and advertises the model in every response. The server search runtime prefers this route (probed at startup) and falls back to the legacy /embed routes when the proxy lacks v2. This is the safe cutover mechanism for changing the embedding model (e.g. text-embedding-3-large → gemini-embedding-2 @ 1536): a separate route and cache namespace ({model}:{dims}:{hash}; v1 keys unchanged) lets the new model roll out to new clients while v1 clients keep using the old model untouched. Swapping the model in place would instead silently move running clients onto the new model, so their new-model queries would be scored against an old-model corpus. Re-indexing on model change and purging orphaned on-disk cache entries are deferred as scoped follow-ups.
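
To make the cache-namespace and normalization points concrete, here is a minimal Python sketch. It assumes SHA-256 over the UTF-8 input as the `{hash}` part (the PR only specifies the `{model}:{dims}:{hash}` shape), and the helper names `v2_cache_key` and `l2_normalize` are hypothetical, not the proxy's actual functions. It shows why folding `dims` into the key keeps vectors of different dimensionality from ever sharing a cache entry.

```python
import hashlib
import math
from array import array


def v2_cache_key(model: str, dims: int, text: str) -> str:
    """Hypothetical v2 cache key of the form {model}:{dims}:{hash}.

    Assumption: the hash is SHA-256 of the UTF-8 text; the real proxy may differ.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{model}:{dims}:{digest}"


def l2_normalize(vec: list[float]) -> array:
    """Scale a vector to unit L2 norm and store it as fp32 ('f' = C float)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # Leave the zero vector as-is rather than dividing by zero.
        return array("f", vec)
    return array("f", [x / norm for x in vec])
```

With unit-length vectors, cosine similarity reduces to a plain dot product. Because the dimensionality is part of the key, re-pointing `embedding_v2` at a different `dims` value (768 below is purely illustrative) lands in a fresh namespace instead of reusing stale entries.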

samcm added 2 commits June 18, 2026 11:43
Adds /v2/embedding (+ /v2/embedding/check) to the proxy: serves a
config-selected embedding model at a fixed dimensionality, returns fp32
vectors, and advertises the model in every response. Additive — /embed
(v1) and its cache format are untouched, so existing clients are
unaffected. The v2 cache key folds in dimensions ({model}:{dims}:{hash});
v1 keys are unchanged.

The server search runtime prefers /v2/embedding (probed at startup) and
falls back to the legacy /embed routes when the proxy lacks v2, so a new
server still works against an older or self-hosted proxy.

This is the cutover mechanism for switching the embedding model (e.g.
text-embedding-3-large -> gemini-embedding-2 @ 1536): point embedding_v2
at the new model. A separate route + cache namespace means v1 clients
keep using the old model and are never rolled.
…gModel

A proxy configured with only embedding_v2 (no v1 embedding) advertised
embedding_available=false, so the search runtime's guard tripped before
the v2 probe ran and silently disabled all search. Both accessors now
consider embeddingServiceV2: available when either service exists; model
prefers v1 (what /embed serves) and falls through to the v2 model so a
v2-only proxy advertises a non-empty model.

samcm commented Jun 18, 2026 (Member, Author)

Fixed in 7ad275f: EmbeddingAvailable() now returns true when either the v1 or v2 service is configured, and EmbeddingModel() prefers the v1 model (what /embed serves) but falls through to the v2 model. An embedding_v2-only proxy therefore advertises a non-empty model, and the runtime guard passes before the v2 probe runs. Good catch: a v2-only proxy would otherwise have had a working /v2/embedding endpoint but no search at all.
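
The ordering that caused the bug (guard first, probe second) can be sketched as a small startup routine. The function name `choose_embedding_route`, the probe callable, and treating any non-200 status or connection error as "no v2" are all assumptions for illustration, not the server's actual code.

```python
from typing import Callable, Optional

# Hypothetical probe: returns the HTTP status of a GET to <path> on the proxy.
Probe = Callable[[str], int]


def choose_embedding_route(embedding_available: bool, probe: Probe) -> Optional[str]:
    """Return the route the search runtime should use, or None to disable search."""
    # The guard runs BEFORE the v2 probe, so it must be True for a v2-only proxy too.
    if not embedding_available:
        return None
    try:
        v2_ok = probe("/v2/embedding/check") == 200
    except OSError:
        v2_ok = False  # unreachable or refused: behave like a proxy without v2
    # Prefer v2; fall back to legacy /embed for older or self-hosted proxies.
    return "/v2/embedding" if v2_ok else "/embed"
```

Calling this with `embedding_available=False` reproduces the original failure: search is disabled even though the probe would have succeeded. Treating every non-200 status as "no v2" is a simplification. A transient 5xx at startup would pin a server to /embed, so a real implementation might distinguish 404 from other errors.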

samcm commented Jun 18, 2026 (Member, Author)

Lfg @qu0b

github-actions (bot) commented Jun 18, 2026 (Contributor)

🐼 Smoke eval — 7ad275f: ✅ 6/6 pass

📊 Interactive report — tokens p50 14,252 · tokens/solve 14,456.

Reference points: v0.34.0 100% · master@10a72f3 0%.

question                    result  tokens  tools
forky_node_coverage         pass    12,513  3
tracoor_node_coverage       pass    14,146  4
mainnet_block_arrival_p50   pass    14,359  9
list_datasources            pass    11,468  1
block_count_24h             pass    18,866  10
missed_slots_24h            pass    15,385  6
🔭 Langfuse traces (6 runs; ⚠️ = failed)

The report walks this branch's commits against the master baseline and the most recent release. A self-contained copy is in the run's eval-smoke-* artifact.

samcm merged commit 1671f0b into master Jun 18, 2026
12 of 14 checks passed
samcm added a commit that referenced this pull request Jun 18, 2026
* feat(search): re-index on proxy embedding model change

A background poll now detects when the proxy's served embedding model
changes and re-indexes the whole corpus under the new model: it parks
every index not-ready first (via atomic-swap wrappers) so no search
dot-products a new-model query against an old-model index, rebuilds from
the retained registries, then swaps the fresh indices in. This delivers the
'update the model in panda-proxy and clients self-heal' guarantee; without it,
a model swap would silently mix embedding spaces on running servers until
restart. Follow-up to the v2 embedding cutover (#232).

* fix(search): retry reindex on partial rebuild failure

reindex advanced builtModel unconditionally, so if any single index
rebuild failed (e.g. a transient proxy timeout), that index stayed parked
not-ready forever: the model-change guard was already satisfied, so the next
tick never re-entered reindex. Now builtModel only advances on full
success; a partial failure leaves the guard unsatisfied so the next tick
retries, and the failed index stays not-ready (never mixed) until it
rebuilds.
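
Both follow-up commits above (park-then-rebuild on model change, and advancing `builtModel` only on full success) can be combined into one Python sketch. `Reindexer`, `tick`, `search_index`, and the `build(name, model)` callback are hypothetical. A lock stands in for the atomic-swap wrappers. One further assumption is that a retry only rebuilds the indices not yet on the served model; the commit does not say whether successful indices are rebuilt again.

```python
import threading
from typing import Callable, Optional


class Reindexer:
    """Hypothetical sketch of the poll-driven re-index on embedding model change.

    Each slot holds (model, index) or None; None means parked / not ready, so a
    search fails closed instead of mixing embedding spaces.
    """

    def __init__(self, names: list[str], build: Callable[[str, str], object]) -> None:
        self._build = build  # build(name, model) -> index; may raise (e.g. proxy timeout)
        self._lock = threading.Lock()  # stands in for the atomic-swap wrappers
        self._slots: dict[str, Optional[tuple[str, object]]] = {n: None for n in names}
        self.built_model: Optional[str] = None

    def search_index(self, name: str) -> Optional[object]:
        """What a search sees: a fully built index, or None while parked."""
        with self._lock:
            slot = self._slots[name]
        return None if slot is None else slot[1]

    def tick(self, served_model: str) -> None:
        """Called by a single background poller with the proxy's served model."""
        if served_model == self.built_model:
            return  # guard satisfied: the corpus already matches the served model
        with self._lock:
            stale = [n for n, s in self._slots.items() if s is None or s[0] != served_model]
            for n in stale:
                self._slots[n] = None  # park every stale index before rebuilding any
        ok = True
        for n in stale:
            try:
                fresh = self._build(n, served_model)  # rebuilt outside the lock
            except Exception:
                ok = False  # this index stays parked; the next tick retries it
                continue
            with self._lock:
                self._slots[n] = (served_model, fresh)  # swap the fresh index in
        if ok:
            self.built_model = served_model  # advance the guard only on full success
```

If one rebuild raises, `built_model` keeps its old value, so the next `tick` re-enters and retries the index that is still parked. Searches never see an index built under the previous model.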