fix(server): wrap sync blocking calls in asyncio.to_thread for search/recall path#1068

Open
mobilebarn wants to merge 1 commit into volcengine:main from
mobilebarn:fix/async-blocking-in-search-path

@mobilebarn

Problem

Under single-worker uvicorn, the OpenViking server becomes unresponsive (TCP connections are accepted, but HTTP requests never receive a response) within 10-40 minutes of normal operation. This happens when auto-recall search and auto-capture commit operations overlap.

Root Cause

Several synchronous blocking calls are made from inside async def handlers:

  1. embedder.embed() in hierarchical_retriever.py — synchronous HTTP call to OpenAI embedding API
  2. _adapter.query() in viking_vector_index_backend.py — synchronous storage query
  3. rerank_batch() in hierarchical_retriever.py — synchronous HTTP call via requests.request()
  4. agfs.stat/read in viking_fs.py — synchronous file I/O in abstract(), overview(), _read_relation_table()

Each call blocks the event loop for 100-500ms+. Under concurrent load, the health endpoint never gets a timeslot and the server appears hung.
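
The starvation described above can be reproduced in isolation. The following is a minimal sketch, not code from the OpenViking repository: `blocking_embed`, `search_handler`, and `measure_health_delay` are hypothetical names, and `time.sleep(0.2)` stands in for a synchronous HTTP call. It measures how late a 10 ms timer (playing the role of the health endpoint) fires while a handler blocks the loop.

```python
import asyncio
import time


def blocking_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a synchronous call such as embedder.embed()."""
    time.sleep(0.2)  # simulates ~200 ms of blocking network I/O
    return [float(len(text))]


async def search_handler(query: str) -> list[float]:
    # Calling the sync function directly holds the event loop for the full 200 ms.
    return blocking_embed(query)


async def measure_health_delay() -> float:
    """Return how long a 10 ms 'health check' sleep actually takes while a search runs."""
    loop = asyncio.get_running_loop()
    search = asyncio.create_task(search_handler("hello"))
    start = loop.time()
    await asyncio.sleep(0.01)  # the health check wants to resume after 10 ms
    delay = loop.time() - start
    await search
    return delay


if __name__ == "__main__":
    delay = asyncio.run(measure_health_delay())
    print(f"health check resumed after {delay:.3f}s (wanted 0.010s)")
```

In this sketch the timer cannot fire until the blocking call returns, so the measured delay is roughly the full duration of the blocking call rather than 10 ms.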

Fix

Wrap all sync blocking calls in asyncio.to_thread() so they run in the default thread pool executor without blocking the event loop.
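
As an illustration of the pattern, here is a minimal sketch assuming Python 3.9+ (where `asyncio.to_thread` is available). `blocking_query`, `search_handler`, and `run_concurrently` are hypothetical names, not the functions touched by this PR, and `time.sleep(0.2)` stands in for the real blocking work.

```python
import asyncio
import time


def blocking_query(query: str) -> str:
    """Hypothetical stand-in for a sync call such as _adapter.query()."""
    time.sleep(0.2)  # simulates ~200 ms of blocking I/O
    return f"results for {query}"


async def search_handler(query: str) -> str:
    # asyncio.to_thread runs the sync function in the default executor and
    # suspends only this coroutine, so the event loop keeps serving other tasks.
    return await asyncio.to_thread(blocking_query, query)


async def run_concurrently() -> tuple[list[str], float]:
    """Run three handlers at once and return their results plus elapsed seconds."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    results = await asyncio.gather(*(search_handler(f"q{i}") for i in range(3)))
    return list(results), loop.time() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_concurrently())
    print(results, f"{elapsed:.3f}s")
```

Because each blocking call runs on its own worker thread, the three handlers overlap instead of running back to back. Note that the default executor has a bounded number of workers, so very high concurrency can still queue calls behind one another.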

Testing

  • Server previously hung within 10-40 minutes under normal auto-recall + auto-capture load
  • With patches applied, server remains responsive under sustained load
  • Root cause diagnosed by the SENTINEL agent (Paperclip QA team) via a systematic code-path audit

Files Changed

  • openviking/retrieve/hierarchical_retriever.py — embed + rerank → to_thread
  • openviking/storage/viking_vector_index_backend.py — query → to_thread
  • openviking/storage/viking_fs.py — agfs.stat/read → to_thread

…/recall path

Under single-worker uvicorn, synchronous blocking calls in async handlers
starve the event loop and cause the server to become unresponsive (TCP
accepts but HTTP never responds).

Changes:
- retrieve/hierarchical_retriever.py: Wrap embedder.embed() and
  rerank_batch() in asyncio.to_thread(); convert _rerank_scores to async
- storage/viking_vector_index_backend.py: Wrap _adapter.query() in
  asyncio.to_thread()
- storage/viking_fs.py: Wrap agfs.stat/read calls in abstract(),
  overview(), and _read_relation_table() with asyncio.to_thread()
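
Converting a helper such as _rerank_scores to async has a knock-on effect: every caller must now await it. A minimal sketch of that shape follows. `FakeReranker`, `Retriever`, and the toy scoring logic are hypothetical and only illustrate the conversion; the real signatures in hierarchical_retriever.py may differ.

```python
import asyncio
from typing import Optional


class FakeReranker:
    """Hypothetical sync reranker standing in for a client that blocks on HTTP."""

    def rerank_batch(self, query: str, documents: list[str]) -> list[float]:
        # Toy scoring: number of query words that appear in each document.
        words = query.lower().split()
        return [float(sum(w in d.lower() for w in words)) for d in documents]


class Retriever:
    def __init__(self, reranker: Optional[FakeReranker]) -> None:
        self._reranker = reranker

    async def _rerank_scores(self, query: str, documents: list[str]) -> list[float]:
        # Formerly a plain `def` in this sketch; now a coroutine so it can
        # await the worker thread instead of blocking the event loop.
        if self._reranker is None or not documents:
            return [0.0] * len(documents)
        return await asyncio.to_thread(self._reranker.rerank_batch, query, documents)

    async def retrieve(self, query: str, documents: list[str]) -> list[str]:
        # Callers must await the helper; calling it without `await` would
        # yield a coroutine object rather than a list of scores.
        scores = await self._rerank_scores(query, documents)
        ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in ranked]
```

A missed `await` at a call site is the main risk of this kind of conversion, so it is worth checking every caller of the converted helper.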

These calls make synchronous HTTP requests (OpenAI embedding API),
file I/O (AGFS), and database queries that block the event loop for
100-500ms+ per call. Under concurrent auto-recall + auto-capture load,
this reliably starves the event loop and hangs the server within
10-40 minutes.

Tested: Server remains responsive under sustained auto-recall load
with these patches applied (previously hung within 10-40 minutes).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
@github-actions

Failed to generate code suggestions for PR

