fix(embedding): chunk OpenAI embedBatch to avoid runner input-count limits#960
fix(embedding): chunk OpenAI embedBatch to avoid runner input-count limits#960shgew wants to merge 2 commits into
Conversation
…imits
Some self-hosted OpenAI-compatible runners (Ollama in particular)
crash the model subprocess when one /embeddings request carries too
many inputs (observed ~600 inputs returning HTTP 400 "tokenize: EOF"),
regardless of total token volume. migrateVectorIndex passes the full
set of session observations and memories to embedBatch on bulk
reindex, so the largest sessions failed against such runners.
OpenAIEmbeddingProvider.embedBatch now splits into sequential
sub-batches of at most OPENAI_EMBEDDING_MAX_BATCH inputs (default 256).
Sequential, not parallel, to respect single-slot runners. Input order
is preserved across chunks; errors propagate. Invalid env values
(non-numeric, negative, zero) are rejected at construction.
Tests:
- test/embedding-provider.test.ts: under-max single POST, over-max
chunking, env override, order preservation, sequential ordering,
error propagation, invalid env rejection.
- .env.example and README.md document the env knob.
Signed-off-by: Hleb Shauchenka <me@marleb.org>
|
@shgew is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds ChangesOpenAI Embedding Batch Size Limit
Sequence DiagramsequenceDiagram
participant Caller
participant embedBatch
participant embedChunk
Caller->>embedBatch: texts[]
alt texts.length <= maxBatch
embedBatch->>embedChunk: texts (full slice)
embedChunk-->>embedBatch: Float32Array[]
else texts.length > maxBatch
loop each maxBatch-sized slice
embedBatch->>embedChunk: slice
embedChunk-->>embedBatch: Float32Array[]
end
embedBatch->>embedBatch: concatenate results
end
embedBatch-->>Caller: Float32Array[]
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/providers/embedding/openai.ts`:
- Around line 46-55: The `parseInt` function in the `resolveMaxBatch` function
accepts strings with numeric prefixes followed by non-numeric characters (e.g.,
"100abc" parses to 100), bypassing the intended validation. To fix this,
validate that the trimmed override string contains only numeric characters
before parsing, or alternatively verify that converting the parsed number back
to a string matches the original trimmed input to ensure no trailing non-numeric
characters exist. This will properly reject mixed alphanumeric strings as
invalid input.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c96be495-7791-45ce-9744-4884e9774955
📒 Files selected for processing (4)
.env.exampleREADME.mdsrc/providers/embedding/openai.tstest/embedding-provider.test.ts
parseInt('100abc', 10) returns 100, so the previous validation accepted
strings with numeric prefixes followed by junk. Use a positive-integer
regex to reject anything outside /^[1-9]\d*$/.
What
Chunk
OpenAIEmbeddingProvider.embedBatchinto sequential sub-batches sized by a new envOPENAI_EMBEDDING_MAX_BATCH(default 256). Each sub-batch issues a single/embeddingsPOST. Input order is preserved across chunks; errors propagate; invalid env values (non-numeric, negative, zero) are rejected at construction.Why
Some self-hosted OpenAI-compatible runners (Ollama in particular) crash the model subprocess when one
/embeddingsrequest carries too many inputs (observed ~600 inputs returning HTTP 400tokenize: EOF), regardless of total token volume.migrateVectorIndexhandsembedBatcha whole session's observations and all memories at once, so bulk reindex against such runners failed on the largest sessions.How
Sequential, not parallel, to respect single-slot runners.
Tests
7 new cases in
test/embedding-provider.test.ts: under-max single POST, over-max chunking, env override, order preservation, sequential ordering, error propagation, invalid env rejection. Targeted suite: 24/24 pass.Compatibility
Unset env = previous behavior (single POST). No regression for hosted OpenAI.
.env.exampleand the README env vars section document the knob.Summary by CodeRabbit
Release Notes
New Features
OPENAI_EMBEDDING_MAX_BATCHto control the maximum number of inputs sent per/embeddingsrequest. When exceeded, requests are split into sequential sub-batches while preserving input order.Documentation
OPENAI_EMBEDDING_MAX_BATCH=256in bothREADME.mdand.env.example, including guidance to lower the value for self-hosted setups if needed.Tests