fix(milvus): raise gRPC channel-ready timeout to 60 s #443

Closed
Ahmath-Gadji wants to merge 1 commit into `refactor/hexagonal` from `fix/milvus-grpc-connection-timeout`

@Ahmath-Gadji
Collaborator

@Ahmath-Gadji Ahmath-Gadji commented Jun 3, 2026

Problem

On low-resource machines the Milvus 2.6 gRPC channel handshake takes longer than pymilvus's default 10-second `channel_ready_future` timeout, producing:

```
VDBInsertError: Milvus insert failed: <MilvusException: (code=2, message=Fail connecting to
server on milvus:19530, illegal connection params or server unavailable)>
```

The error is misleading: DNS resolves correctly, TCP port 19530 is open, and the HTTP health endpoint (port 9091) responds OK. The gRPC handshake simply needs more time under resource pressure. Root cause confirmed: the connection fails with the default 10 s timeout and succeeds with `timeout=60` against the same `milvus:2.6.11` container.
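
The distinction between an open TCP port and a ready gRPC channel can be checked from the client side. The following is a minimal sketch using only the Python standard library; the helper name `tcp_port_open` is hypothetical and not part of this PR. It only confirms that a TCP connect succeeds, which is the check that passes here even while the gRPC channel is not yet ready.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    This says nothing about gRPC readiness; it only rules out DNS and
    plain connectivity problems as the cause of the failure.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `tcp_port_open("milvus", 19530)` from the application container would be one way to reproduce the "port is open" observation before looking at the gRPC timeout.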

Fix

  • Pass `timeout=60` to both `MilvusClient` and `AsyncMilvusClient` in `MilvusVectorStore.__init__`
  • Store the value in `self._timeout` as a single source of truth
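
The two bullets above amount to a small piece of constructor wiring. The sketch below reduces it to that wiring alone so it runs without a Milvus server: `TimeoutWiringSketch` and `RecordingClient` are hypothetical stand-ins, not the project's `MilvusVectorStore` or the pymilvus clients, and the factory parameters are an assumption made purely for illustration.

```python
from typing import Any, Callable


class RecordingClient:
    """Stand-in for a Milvus client; it only records its constructor kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class TimeoutWiringSketch:
    """Hypothetical reduction of the store constructor to the timeout wiring."""

    def __init__(
        self,
        uri: str,
        client_factory: Callable[..., Any],
        async_client_factory: Callable[..., Any],
        timeout: float = 60,
    ) -> None:
        self._uri = uri
        # Single source of truth: both clients read the same attribute.
        self._timeout = timeout
        self._client = client_factory(uri=self._uri, timeout=self._timeout)
        self._async_client = async_client_factory(
            uri=self._uri, timeout=self._timeout
        )


# Both stand-in clients receive the same 60-second timeout.
store = TimeoutWiringSketch("http://milvus:19530", RecordingClient, RecordingClient)
```

Keeping the value in one attribute means a later change to the timeout cannot leave the sync and async clients out of step.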

Test plan

  • Index a document on a low-resource machine — no `VDBInsertError` on startup or during insert
  • `MilvusVectorStore` constructs successfully with `timeout=60` against a freshly started Milvus 2.6 container

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Walkthrough

`MilvusVectorStore.__init__` sets `self._timeout = 60` and supplies it as the `timeout` argument to both `MilvusClient` and `AsyncMilvusClient`; no other changes.

Changes

Milvus Client Timeout Configuration

| Layer / File(s) | Summary |
|---|---|
| Client initialization with explicit timeout (`openrag/services/storage/milvus_store.py`) | `MilvusVectorStore` sets `self._timeout = 60` and passes this value as `timeout` to both `MilvusClient` and `AsyncMilvusClient` during instantiation. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

I nudge the clocks to sixty beats,
Sync and async tap their feets.
Clients sip the seconds fine,
Timeout set, the stars align. 🐇⏱️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title accurately describes the main change: setting a gRPC channel-ready timeout to 60 seconds for Milvus connections, which matches the objective of fixing connection timeout issues on low-resource machines. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
openrag/services/storage/milvus_store.py (1)

147-147: ⚡ Quick win

Consider making the timeout configurable via VectorDBConfig.

The timeout is currently hardcoded to 30 seconds. While this value addresses the immediate issue on low-resource machines, making it configurable would provide flexibility for different deployment environments without requiring code changes.

🔧 Proposed refactor to add timeout configuration

Add a timeout field to VectorDBConfig:

# In openrag/core/config/infrastructure.py (or wherever VectorDBConfig is defined)
class VectorDBConfig:
    # ... existing fields ...
    timeout: int = 30  # gRPC connection timeout in seconds

Then use it in the constructor:

     def __init__(self, config: VectorDBConfig) -> None:
         self._config = config
         self._collection_name = config.collection_name
         self._hybrid = config.hybrid_search
         self._uri = f"http://{config.host}:{config.port}"
-        self._timeout = 30
+        self._timeout = config.timeout
         try:
             self._client = MilvusClient(uri=self._uri, timeout=self._timeout)
             self._async_client = AsyncMilvusClient(uri=self._uri, timeout=self._timeout)
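
A self-contained sketch of the configurable-timeout idea follows. It is not the project's `VectorDBConfig`: the class name `VectorDBConfigSketch`, the field defaults, the positive-value check, and the `client_kwargs` helper are all assumptions for illustration. The default of 60 mirrors the PR title, whereas the proposal above used 30.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class VectorDBConfigSketch:
    """Hypothetical config; field names follow the diff, defaults are placeholders."""

    host: str = "milvus"
    port: int = 19530
    collection_name: str = "documents"  # placeholder value
    hybrid_search: bool = False
    timeout: float = 60.0  # gRPC connection timeout in seconds

    def __post_init__(self) -> None:
        # Assumed validation: reject values that could never allow a connection.
        if self.timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds")


def client_kwargs(config: VectorDBConfigSketch) -> Dict[str, Any]:
    """Build the keyword arguments shared by the sync and async clients."""
    return {"uri": f"http://{config.host}:{config.port}", "timeout": config.timeout}
```

With this shape, a deployment on slower hardware could construct `VectorDBConfigSketch(timeout=120)` without a code change, and both clients would pick up the same value from `client_kwargs`.
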
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@openrag/services/storage/milvus_store.py` at line 147, The hardcoded 30s gRPC
timeout in MilvusStore should be made configurable: add a timeout: int = 30
field to VectorDBConfig (or the existing config class used to construct
MilvusStore) and then read that value in MilvusStore.__init__ to set
self._timeout from the passed VectorDBConfig instance (instead of the literal
30). Update any callers/constructors that build VectorDBConfig to preserve
default behavior when not provided.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4d139065-8d4f-4f8d-906f-7ce9a2a92471

📥 Commits

Reviewing files that changed from the base of the PR and between e207d64 and 034ba93.

📒 Files selected for processing (1)
  • openrag/services/storage/milvus_store.py

@Ahmath-Gadji Ahmath-Gadji force-pushed the fix/milvus-grpc-connection-timeout branch from 034ba93 to 5676d8a Compare June 4, 2026 00:19
On low-resource machines the Milvus 2.6 gRPC handshake can exceed the
default 10-second channel-ready wait, causing pymilvus to throw
MilvusException (code=2, "Fail connecting to server … illegal connection
params or server unavailable").  The underlying TCP port is open and the
HTTP health endpoint responds OK — the channel just needs more time to
complete the gRPC negotiation.

- Pass `timeout=60` to both `MilvusClient` and `AsyncMilvusClient` at
  construction time in `MilvusVectorStore.__init__`.
- Store the value in `self._timeout` so any future callers have a single
  source of truth.

Root-cause confirmed by reproducing the error with the default timeout and
a successful connect with timeout=60 against the same Milvus 2.6.11
container.
@Ahmath-Gadji Ahmath-Gadji force-pushed the fix/milvus-grpc-connection-timeout branch from 5676d8a to 3b57447 Compare June 4, 2026 08:54
@Ahmath-Gadji Ahmath-Gadji changed the title from "fix(milvus): raise gRPC channel-ready timeout to 30 s" to "fix(milvus): raise gRPC channel-ready timeout to 60 s" Jun 4, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@openrag/services/storage/milvus_store.py`:
- Around line 147-150: The timeout is hard-coded to 60 but must match the PR
acceptance criteria of 30; update the initialization so self._timeout is set to
30 and ensure that same value is passed to both MilvusClient(...) and
AsyncMilvusClient(...), i.e., change self._timeout = 60 to self._timeout = 30 so
the constructors for MilvusClient and AsyncMilvusClient use the validated
30-second timeout.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 071a5744-7ffd-4aaa-a972-c17507cf3770

📥 Commits

Reviewing files that changed from the base of the PR and between 5676d8a and 3b57447.

📒 Files selected for processing (1)
  • openrag/services/storage/milvus_store.py

Comment thread openrag/services/storage/milvus_store.py
@Ahmath-Gadji
Collaborator Author

These changes have been integrated directly into PR #444.
