fix(milvus): raise gRPC channel-ready timeout to 60 s#443
Conversation
📝 WalkthroughWalkthroughMilvusVectorStore.init sets self._timeout = 60 and supplies it as the ChangesMilvus Client Timeout Configuration
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
openrag/services/storage/milvus_store.py (1)
147-147: ⚡ Quick winConsider making the timeout configurable via VectorDBConfig.
The timeout is currently hardcoded to 30 seconds. While this value addresses the immediate issue on low-resource machines, making it configurable would provide flexibility for different deployment environments without requiring code changes.
🔧 Proposed refactor to add timeout configuration
Add a timeout field to
VectorDBConfig:# In openrag/core/config/infrastructure.py (or wherever VectorDBConfig is defined) class VectorDBConfig: # ... existing fields ... timeout: int = 30 # gRPC connection timeout in secondsThen use it in the constructor:
def __init__(self, config: VectorDBConfig) -> None: self._config = config self._collection_name = config.collection_name self._hybrid = config.hybrid_search self._uri = f"http://{config.host}:{config.port}" - self._timeout = 30 + self._timeout = config.timeout try: self._client = MilvusClient(uri=self._uri, timeout=self._timeout) self._async_client = AsyncMilvusClient(uri=self._uri, timeout=self._timeout)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openrag/services/storage/milvus_store.py` at line 147, The hardcoded 30s gRPC timeout in MilvusStore should be made configurable: add a timeout: int = 30 field to VectorDBConfig (or the existing config class used to construct MilvusStore) and then read that value in MilvusStore.__init__ to set self._timeout from the passed VectorDBConfig instance (instead of the literal 30). Update any callers/constructors that build VectorDBConfig to preserve default behavior when not provided.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@openrag/services/storage/milvus_store.py`:
- Line 147: The hardcoded 30s gRPC timeout in MilvusStore should be made
configurable: add a timeout: int = 30 field to VectorDBConfig (or the existing
config class used to construct MilvusStore) and then read that value in
MilvusStore.__init__ to set self._timeout from the passed VectorDBConfig
instance (instead of the literal 30). Update any callers/constructors that build
VectorDBConfig to preserve default behavior when not provided.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4d139065-8d4f-4f8d-906f-7ce9a2a92471
📒 Files selected for processing (1)
openrag/services/storage/milvus_store.py
034ba93 to
5676d8a
Compare
On low-resource machines the Milvus 2.6 gRPC handshake can exceed the default 10-second channel-ready wait, causing pymilvus to throw MilvusException (code=2, "Fail connecting to server … illegal connection params or server unavailable"). The underlying TCP port is open and the HTTP health endpoint responds OK — the channel just needs more time to complete the gRPC negotiation. - Pass `timeout=60` to both `MilvusClient` and `AsyncMilvusClient` at construction time in `MilvusVectorStore.__init__`. - Store the value in `self._timeout` so any future callers have a single source of truth. Root-cause confirmed by reproducing the error with the default timeout and a successful connect with timeout=60 against the same Milvus 2.6.11 container.
5676d8a to
3b57447
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@openrag/services/storage/milvus_store.py`:
- Around line 147-150: The timeout is hard-coded to 60 but must match the PR
acceptance criteria of 30; update the initialization so self._timeout is set to
30 and ensure that same value is passed to both MilvusClient(...) and
AsyncMilvusClient(...), i.e., change self._timeout = 60 to self._timeout = 30 so
the constructors for MilvusClient and AsyncMilvusClient use the validated
30-second timeout.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 071a5744-7ffd-4aaa-a972-c17507cf3770
📒 Files selected for processing (1)
openrag/services/storage/milvus_store.py
|
These modifs are directly integrated in this PR #444 |
Problem
On low-resource machines the Milvus 2.6 gRPC channel handshake takes longer than pymilvus's default 10-second `channel_ready_future` timeout, producing:
```
VDBInsertError: Milvus insert failed: <MilvusException: (code=2, message=Fail connecting to
server on milvus:19530, illegal connection params or server unavailable)>
```
The error is misleading — DNS resolves correctly, TCP port 19530 is open, and the HTTP health endpoint (port 9091) responds OK. The gRPC handshake simply needs more time under resource pressure. Root-cause confirmed: fails with default 10 s timeout, connects cleanly with `timeout=60` against the same `milvus:2.6.11` container.
Fix
Test plan
🤖 Generated with Claude Code