fix(server): wrap sync blocking calls in asyncio.to_thread for search/recall path by mobilebarn · Pull Request #1068 · volcengine/OpenViking

mobilebarn · 2026-03-29T04:06:02Z

Problem

Under single-worker uvicorn, the OpenViking server becomes unresponsive within 10-40 minutes of normal operation: TCP connections are still accepted, but HTTP requests never receive a response. This happens when auto-recall search and auto-capture commit operations overlap.

Root Cause

Several synchronous blocking calls are made from inside async def handlers:

embedder.embed() in hierarchical_retriever.py — synchronous HTTP call to OpenAI embedding API
_adapter.query() in viking_vector_index_backend.py — synchronous storage query
rerank_batch() in hierarchical_retriever.py — synchronous HTTP call via requests.request()
agfs.stat/read in viking_fs.py — synchronous file I/O in abstract(), overview(), _read_relation_table()

Each call blocks the event loop for 100-500 ms or more. Under concurrent load, these stalls run back to back, so the health endpoint's handler is never scheduled and the server appears hung.
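The starvation described above can be reproduced in isolation. The following is a minimal sketch, not code from this PR: `blocking_call`, `bad_handler`, and `measure_stall` are hypothetical names, and `time.sleep` stands in for a synchronous HTTP or storage call. It measures how late a 10 ms timer fires while a handler blocks the loop.

```python
import asyncio
import time


def blocking_call(seconds: float) -> str:
    """Hypothetical stand-in for a synchronous HTTP or storage call."""
    time.sleep(seconds)
    return "done"


async def bad_handler() -> str:
    # Runs on the event loop thread, so no other coroutine can run meanwhile.
    return blocking_call(0.2)


async def measure_stall(handler) -> float:
    """Return how late (in seconds) a 10 ms timer fires while `handler` runs."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    task = asyncio.create_task(handler())
    # Yield to the loop; the handler starts and blocks before the timer fires.
    await asyncio.sleep(0.01)
    lateness = loop.time() - start - 0.01
    await task
    return lateness


if __name__ == "__main__":
    print(f"timer fired {asyncio.run(measure_stall(bad_handler)):.3f}s late")
```

A health check queued behind several such handlers waits for the sum of their blocking times, which matches the "appears hung" symptom.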

Fix

Wrap all sync blocking calls in asyncio.to_thread() so they run in the default thread pool executor without blocking the event loop.
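As a minimal sketch of this pattern (assuming Python 3.9+, where `asyncio.to_thread` was added), the example below wraps a blocking function and checks that a heartbeat coroutine keeps ticking while it runs. `embed_sync`, `embed`, and `main` are hypothetical names for illustration, not the actual OpenViking functions.

```python
import asyncio
import time


def embed_sync(text: str) -> list[float]:
    """Hypothetical stand-in for a blocking embedder call."""
    time.sleep(0.2)
    return [float(len(text))]


async def embed(text: str) -> list[float]:
    # The blocking call runs in the default thread pool executor;
    # the event loop stays free to serve other coroutines.
    return await asyncio.to_thread(embed_sync, text)


async def main() -> tuple[list[float], int]:
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    vector = await embed("hello")
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return vector, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note that the default executor has a bounded number of worker threads, so a sufficiently large burst of wrapped calls queues inside the pool. The event loop itself stays responsive in that case.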

Testing

Server previously hung within 10-40 minutes under normal auto-recall + auto-capture load
With patches applied, server remains responsive under sustained load
Root cause identified by SENTINEL agent (Paperclip QA team) via systematic code-path audit

Files Changed

openviking/retrieve/hierarchical_retriever.py — embed + rerank → to_thread
openviking/storage/viking_vector_index_backend.py — query → to_thread
openviking/storage/viking_fs.py — agfs.stat/read → to_thread

…/recall path

Under single-worker uvicorn, synchronous blocking calls in async handlers starve the event loop and cause the server to become unresponsive (TCP accepts but HTTP never responds).

Changes:
- retrieve/hierarchical_retriever.py: Wrap embedder.embed() and rerank_batch() in asyncio.to_thread(); convert _rerank_scores to async
- storage/viking_vector_index_backend.py: Wrap _adapter.query() in asyncio.to_thread()
- storage/viking_fs.py: Wrap agfs.stat/read calls in abstract(), overview(), and _read_relation_table() with asyncio.to_thread()

These calls make synchronous HTTP requests (OpenAI embedding API), file I/O (AGFS), and database queries that block the event loop for 100-500ms+ per call. Under concurrent auto-recall + auto-capture load, this reliably deadlocks the server within 10-40 minutes.

Tested: Server remains responsive under sustained auto-recall load with these patches applied (previously hung within 10-40 minutes).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
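Converting a helper such as _rerank_scores to async means every caller must now await it. The sketch below illustrates that shape under stated assumptions: `rerank_batch_sync`, `rerank_scores`, and `search` are hypothetical names, the scoring logic is invented for the example, and `asyncio.gather` is used only to show that wrapped calls can overlap, not because the PR is known to do so.

```python
import asyncio
import time


def rerank_batch_sync(query: str, docs: list[str]) -> list[float]:
    """Hypothetical blocking reranker: scores each doc by shared word count."""
    time.sleep(0.1)  # simulates the synchronous HTTP round trip
    words = set(query.split())
    return [float(len(words & set(doc.split()))) for doc in docs]


async def rerank_scores(query: str, docs: list[str]) -> list[float]:
    # Formerly a plain `def` calling the blocking function directly;
    # as `async def`, it offloads the call and must be awaited by callers.
    return await asyncio.to_thread(rerank_batch_sync, query, docs)


async def search(query: str, batches: list[list[str]]) -> list[list[float]]:
    # Each batch runs in its own worker thread, so the waits overlap.
    return await asyncio.gather(*(rerank_scores(query, b) for b in batches))


if __name__ == "__main__":
    print(asyncio.run(search("a b", [["a", "a b"], ["c"]])))
```

A caller that forgets the `await` gets a coroutine object instead of scores, so this kind of conversion needs every call site updated in the same change.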

github-actions · 2026-03-29T04:06:47Z

Failed to generate code suggestions for PR

github-project-automation bot added this to OpenViking project Mar 29, 2026

github-project-automation bot moved this to Backlog in OpenViking project Mar 29, 2026

plhys mentioned this pull request Mar 29, 2026

[Bug] RocksDB lock contention: multiple _SingleAccountBackend instances open same DB path (local backend) #1072

Open

mobilebarn wants to merge 1 commit into volcengine:main from mobilebarn:fix/async-blocking-in-search-path
