fix(embedding): chunk OpenAI embedBatch to avoid runner input-count limits#960

Open
shgew wants to merge 2 commits into rohitg00:main from shgew:fix/embedding-openai-batch-chunking

Conversation

@shgew

@shgew shgew commented Jun 20, 2026


What

Chunk OpenAIEmbeddingProvider.embedBatch into sequential sub-batches sized by a new env OPENAI_EMBEDDING_MAX_BATCH (default 256). Each sub-batch issues a single /embeddings POST. Input order is preserved across chunks; errors propagate; invalid env values (non-numeric, negative, zero) are rejected at construction.

Why

Some self-hosted OpenAI-compatible runners (Ollama in particular) crash the model subprocess when one /embeddings request carries too many inputs (observed ~600 inputs returning HTTP 400 tokenize: EOF), regardless of total token volume. migrateVectorIndex hands embedBatch a whole session's observations and all memories at once, so bulk reindex against such runners failed on the largest sessions.

How

Sequential, not parallel, to respect single-slot runners.
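
A minimal sketch of the sequential chunking described above, not the code from this PR. `embedBatchChunked` and the injected `embedChunk` callback are hypothetical names; `embedChunk` stands in for whatever issues one /embeddings POST. Calling `embedChunk` once for an empty input is also an assumption of this sketch.

```typescript
// Hypothetical stand-in for a single /embeddings POST.
type EmbedChunk = (texts: string[]) => Promise<Float32Array[]>;

async function embedBatchChunked(
  texts: string[],
  maxBatch: number,
  embedChunk: EmbedChunk,
): Promise<Float32Array[]> {
  // At or below the limit: exactly one request, same as before chunking.
  if (texts.length <= maxBatch) return embedChunk(texts);

  const results: Float32Array[] = [];
  for (let start = 0; start < texts.length; start += maxBatch) {
    // Await each slice before starting the next, so only one request is in flight.
    const vectors = await embedChunk(texts.slice(start, start + maxBatch));
    // Appending in slice order keeps output order aligned with input order.
    results.push(...vectors);
  }
  return results;
}
```

With 600 inputs and a limit of 256, this would issue three requests carrying 256, 256, and 88 inputs. A rejected `embedChunk` promise aborts the loop and surfaces to the caller unchanged.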

Tests

7 new cases in test/embedding-provider.test.ts: under-max single POST, over-max chunking, env override, order preservation, sequential ordering, error propagation, invalid env rejection. Targeted suite: 24/24 pass.
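
One way the "sequential ordering" case could be checked is with a test double that records peak concurrency. The sketch below is illustrative only and is not taken from test/embedding-provider.test.ts; `makeRecordingChunk` and `embedSequentially` are hypothetical names, and the driver is a stand-in for the chunked `embedBatch`.

```typescript
type EmbedChunk = (texts: string[]) => Promise<Float32Array[]>;

// Hypothetical test double: counts calls and tracks the peak number in flight.
function makeRecordingChunk() {
  let inFlight = 0;
  let peak = 0;
  let calls = 0;
  const embedChunk: EmbedChunk = async (texts) => {
    calls += 1;
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    // Yield once so that overlapping calls, if any, would be observable.
    await Promise.resolve();
    inFlight -= 1;
    return texts.map((t) => new Float32Array([t.length]));
  };
  return { embedChunk, peak: () => peak, calls: () => calls };
}

// Minimal sequential driver standing in for the chunked embedBatch.
async function embedSequentially(
  texts: string[],
  maxBatch: number,
  embedChunk: EmbedChunk,
): Promise<Float32Array[]> {
  const out: Float32Array[] = [];
  for (let i = 0; i < texts.length; i += maxBatch) {
    out.push(...(await embedChunk(texts.slice(i, i + maxBatch))));
  }
  return out;
}
```

A sequential driver should leave the recorded peak at 1 regardless of how many sub-batches are sent, whereas dispatching all slices through `Promise.all` would push the peak up to the number of slices.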

Compatibility

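With the env var unset, the default of 256 applies: batches of up to 256 inputs still go out as a single POST (previous behavior), and only larger batches are split. No regression for hosted OpenAI.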

.env.example and the README env vars section document the knob.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added OPENAI_EMBEDDING_MAX_BATCH to control the maximum number of inputs sent per /embeddings request. When exceeded, requests are split into sequential sub-batches while preserving input order.
  • Documentation

    • Documented OPENAI_EMBEDDING_MAX_BATCH=256 in both README.md and .env.example, including guidance to lower the value for self-hosted setups if needed.
  • Tests

    • Expanded coverage for batching, overrides, ordering, sequential (non-parallel) behavior, error propagation, and validation of invalid environment values.

…imits

Some self-hosted OpenAI-compatible runners (Ollama in particular)
crash the model subprocess when one /embeddings request carries too
many inputs (observed ~600 inputs returning HTTP 400 "tokenize: EOF"),
regardless of total token volume. migrateVectorIndex passes the full
set of session observations and memories to embedBatch on bulk
reindex, so the largest sessions failed against such runners.

OpenAIEmbeddingProvider.embedBatch now splits into sequential
sub-batches of at most OPENAI_EMBEDDING_MAX_BATCH inputs (default 256).
Sequential, not parallel, to respect single-slot runners. Input order
is preserved across chunks; errors propagate. Invalid env values
(non-numeric, negative, zero) are rejected at construction.

Tests:
  - test/embedding-provider.test.ts: under-max single POST, over-max
    chunking, env override, order preservation, sequential ordering,
    error propagation, invalid env rejection.
  - .env.example and README.md document the env knob.

Signed-off-by: Hleb Shauchenka <me@marleb.org>
@vercel

vercel Bot commented Jun 20, 2026


@shgew is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8764f02a-a806-422e-9cd4-2d082603d315

📥 Commits

Reviewing files that changed from the base of the PR and between 5517e6a and 17422fc.

📒 Files selected for processing (1)
  • src/providers/embedding/openai.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/providers/embedding/openai.ts

📝 Walkthrough

Walkthrough

Adds OPENAI_EMBEDDING_MAX_BATCH environment variable support to OpenAIEmbeddingProvider. A resolveMaxBatch() helper parses and validates the env var; embedBatch() splits inputs exceeding the limit into sequential sub-batches. Tests cover chunking, ordering, concurrency, errors, and validation. Documentation added to .env.example and README.md.

Changes

OpenAI Embedding Batch Size Limit

| Layer / File(s) | Summary |
|---|---|
| Batch limit resolver and embedBatch chunking (src/providers/embedding/openai.ts) | Adds DEFAULT_MAX_BATCH (256) and resolveMaxBatch() with positive-integer validation of OPENAI_EMBEDDING_MAX_BATCH; adds private maxBatch field set in the constructor; updates embedBatch() to iterate through maxBatch-sized slices sequentially when input exceeds the limit, concatenating results. |
| Tests, env example, and README docs (test/embedding-provider.test.ts, .env.example, README.md) | Clears OPENAI_EMBEDDING_MAX_BATCH in beforeEach; adds tests for single-request path at/below 256, sequential splitting above 256, env override, order preservation, max-concurrency-of-1 guarantee, sub-batch error propagation, and constructor throws on invalid values. Documents the variable in .env.example and README.md. |

Sequence Diagram

sequenceDiagram
  participant Caller
  participant embedBatch
  participant embedChunk
  Caller->>embedBatch: texts[]
  alt texts.length <= maxBatch
    embedBatch->>embedChunk: texts (full slice)
    embedChunk-->>embedBatch: Float32Array[]
  else texts.length > maxBatch
    loop each maxBatch-sized slice
      embedBatch->>embedChunk: slice
      embedChunk-->>embedBatch: Float32Array[]
    end
    embedBatch->>embedBatch: concatenate results
  end
  embedBatch-->>Caller: Float32Array[]

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • rohitg00

Poem

🐇 A rabbit hops through batches wide,
No more than 256 inputs per ride!
If Ollama stumbles on too big a pile,
We slice and sequence, file by file.
Each chunk sent neat, results combined—
Order preserved, no embeddings left behind! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding chunking to OpenAI embedBatch to prevent input-count limit failures in self-hosted runners. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/providers/embedding/openai.ts`:
- Around line 46-55: The `parseInt` function in the `resolveMaxBatch` function
accepts strings with numeric prefixes followed by non-numeric characters (e.g.,
"100abc" parses to 100), bypassing the intended validation. To fix this,
validate that the trimmed override string contains only numeric characters
before parsing, or alternatively verify that converting the parsed number back
to a string matches the original trimmed input to ensure no trailing non-numeric
characters exist. This will properly reject mixed alphanumeric strings as
invalid input.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c96be495-7791-45ce-9744-4884e9774955

📥 Commits

Reviewing files that changed from the base of the PR and between f6f9e3c and 5517e6a.

📒 Files selected for processing (4)
  • .env.example
  • README.md
  • src/providers/embedding/openai.ts
  • test/embedding-provider.test.ts

Comment thread src/providers/embedding/openai.ts
parseInt('100abc', 10) returns 100, so the previous validation accepted
strings with numeric prefixes followed by junk. Use a positive-integer
regex to reject anything outside /^[1-9]\d*$/.
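
A rough sketch of the stricter validation this commit message describes, under stated assumptions: the real `resolveMaxBatch` in src/providers/embedding/openai.ts is not shown on this page, so the signature below (taking the raw value as an argument instead of reading the environment) and the treatment of an empty string as unset are illustrative choices, not the PR's code.

```typescript
const DEFAULT_MAX_BATCH = 256;
// Digits only, no leading zero, so "0", "-5", "1.5" and "100abc" all fail.
const POSITIVE_INTEGER = /^[1-9]\d*$/;

// Assumption: the caller passes in the raw OPENAI_EMBEDDING_MAX_BATCH value.
function resolveMaxBatch(raw: string | undefined): number {
  // Assumption: unset or blank falls back to the default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_MAX_BATCH;

  const trimmed = raw.trim();
  if (!POSITIVE_INTEGER.test(trimmed)) {
    throw new Error(
      `OPENAI_EMBEDDING_MAX_BATCH must be a positive integer, got "${raw}"`,
    );
  }
  // Safe to parse now: the whole string is known to be decimal digits.
  return Number.parseInt(trimmed, 10);
}
```

Checking the full string before parsing matters because `parseInt` stops at the first non-digit character and returns whatever prefix it managed to read, so a range check on the parsed number alone cannot detect trailing junk.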