feat(embeddings): add native Ollama provider for local embeddings #73

rothnic wants to merge 4 commits into PatrickSys:master
Conversation
Add support for custom OpenAI-compatible API endpoints via the OPENAI_BASE_URL environment variable. This enables using:

- Ollama for local LLM inference
- LiteLLM Proxy for unified model access
- Groq, OpenRouter, and other OpenAI-compatible providers
- Self-hosted models (vLLM, text-generation-inference)

Changes:

- Read OPENAI_BASE_URL from environment in DEFAULT_EMBEDDING_CONFIG
- Update README.md with configuration documentation
- Update CHANGELOG.md with feature entry

Fixes PatrickSys#70
Add full support for Ollama as an embedding provider, enabling local embeddings without cloud dependencies.

New Features:

- New OllamaEmbeddingProvider class (src/embeddings/ollama.ts)
- EMBEDDING_PROVIDER=ollama option
- OLLAMA_HOST environment variable (default: http://localhost:11434)
- Automatic dimension detection for common Ollama models:
  - nomic-embed-text: 768 dimensions (default)
  - mxbai-embed-large: 1024 dimensions
  - all-minilm: 384 dimensions
- Also adds OPENAI_BASE_URL for custom OpenAI-compatible endpoints

Files Changed:

- src/embeddings/ollama.ts: New Ollama provider implementation
- src/embeddings/index.ts: Add Ollama provider integration
- src/embeddings/types.ts: Add OLLAMA_HOST support, dynamic apiEndpoint
- README.md: Document Ollama configuration options
- CHANGELOG.md: Update with feature details

Tested with nomic-embed-text generating 768-dimensional embeddings.

Closes PatrickSys#70
Related to PatrickSys#68
Greptile Summary

This PR adds a native Ollama embedding provider to codebase-context. Key changes and issues:
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| src/embeddings/ollama.ts | New Ollama embedding provider with sequential batch processing, text truncation, and model dimension detection. Missing embeddinggemma from model lookup tables despite being the primary tested model. |
| src/embeddings/index.ts | Lazy loading fix is undermined by the static re-export of TransformersEmbeddingProvider/MODEL_CONFIGS at line 98; also, OLLAMA_HOST env var is bypassed when provider is passed programmatically without EMBEDDING_PROVIDER being set. |
| src/embeddings/types.ts | Adds OLLAMA_HOST and OPENAI_BASE_URL support via a getter on DEFAULT_EMBEDDING_CONFIG; getter-based approach has a subtle spread-evaluation pitfall that contributes to the OLLAMA_HOST bypass issue. |
| README.md | Documentation updated with new EMBEDDING_PROVIDER options, OPENAI_BASE_URL, and OLLAMA_HOST env vars — accurate and complete. |
| CHANGELOG.md | Standard changelog entry for the new Ollama provider feature under [Unreleased]. |
| OLLAMA_TEST_RESULTS.md | New test results document; useful internal reference but likely not intended for permanent inclusion in the repository. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller
    participant index.ts
    participant types.ts
    participant ollama.ts
    participant OllamaServer
    Caller->>index.ts: getEmbeddingProvider({ provider: 'ollama' })
    index.ts->>types.ts: spread DEFAULT_EMBEDDING_CONFIG (getter evaluates apiEndpoint)
    types.ts-->>index.ts: apiEndpoint (may be undefined if EMBEDDING_PROVIDER≠ollama)
    index.ts->>ollama.ts: dynamic import OllamaEmbeddingProvider
    index.ts->>ollama.ts: new OllamaEmbeddingProvider(model, apiEndpoint || 'localhost:11434')
    ollama.ts-->>index.ts: provider instance
    index.ts->>ollama.ts: provider.initialize()
    ollama.ts-->>index.ts: ready
    Caller->>index.ts: provider.embedBatch(texts)
    loop for each text (sequential)
        index.ts->>ollama.ts: truncateText(text)
        ollama.ts->>OllamaServer: POST /api/embeddings { model, prompt }
        OllamaServer-->>ollama.ts: { embedding: number[] }
        ollama.ts-->>index.ts: embeddings.push(embedding)
    end
    index.ts-->>Caller: number[][]
```
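The per-text loop in the diagram maps directly onto Ollama's `POST /api/embeddings` endpoint. The following is a minimal, self-contained sketch of that loop; the function signature and parameter names here are illustrative, not necessarily the PR's exact code:

```typescript
// Sequentially embed each text via Ollama's /api/embeddings endpoint,
// mirroring the loop in the sequence diagram above.
async function embedBatch(
  texts: string[],
  model: string,
  host = 'http://localhost:11434'
): Promise<number[][]> {
  const embeddings: number[][] = [];
  for (const text of texts) {
    const res = await fetch(`${host}/api/embeddings`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt: text }),
    });
    if (!res.ok) throw new Error(`Ollama embedding failed: ${res.status}`);
    const { embedding } = (await res.json()) as { embedding: number[] };
    embeddings.push(embedding);
  }
  return embeddings;
}
```

Requests are issued one at a time rather than in parallel, which matches the diagram's sequential loop and avoids overloading a single local Ollama instance.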
Last reviewed commit: 0986581
src/embeddings/index.ts (outdated)

```typescript
// Re-export TransformersEmbeddingProvider and MODEL_CONFIGS for consumers who need them
// These will trigger transformers loading, but only when explicitly imported
export { TransformersEmbeddingProvider, MODEL_CONFIGS } from './transformers.js';
```
**Static re-export defeats the lazy loading fix**

The stated purpose of this PR's index.ts change is to prevent transformers.js from being eagerly loaded when using non-transformers providers. The old `export * from './transformers.js'` was removed for exactly this reason. However, line 98 introduces the same problem via a named re-export:

```typescript
export { TransformersEmbeddingProvider, MODEL_CONFIGS } from './transformers.js';
```

In ES modules, re-exports are resolved statically at module load time, not lazily. Any code that does `import { ... } from './index.js'` (which includes virtually everything that uses this package) will trigger transformers.js to be loaded immediately, regardless of whether getEmbeddingProvider is ever called with a transformers config. This means the hang observed when using the Ollama provider will still occur for any consumer that imports from this index module.

The getEmbeddingProvider function already correctly uses `await import('./transformers.js')` for lazy loading. If TransformersEmbeddingProvider and MODEL_CONFIGS must remain re-exported for downstream consumers, those consumers should import directly from './transformers.js' rather than via the index, or this line should be removed and documented as a breaking change for consumers relying on the re-export.
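If the re-export must stay available from the index, one possible pattern is an async accessor with a cache instead of a static re-export. This is an illustrative sketch only, not the project's code; `node:zlib` stands in here for the heavy `./transformers.js` module:

```typescript
// Defer loading a heavy module to first use and cache the result,
// rather than re-exporting it statically from the index module.
let heavy: any = null;

async function getHeavyModule(): Promise<any> {
  if (!heavy) {
    // This cost is paid on first call, not at index import time,
    // unlike a static `export ... from` re-export.
    heavy = await import('node:zlib');
  }
  return heavy;
}
```

Consumers would call `await getHeavyModule()` instead of importing the names directly, which keeps the index module cheap to import.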
```typescript
if (mergedConfig.provider === 'ollama') {
  const { OllamaEmbeddingProvider } = await import('./ollama.js');
  const provider = new OllamaEmbeddingProvider(
    mergedConfig.model || 'nomic-embed-text',
    mergedConfig.apiEndpoint || 'http://localhost:11434'
  );
  await provider.initialize();
  cachedProvider = provider;
  cachedProviderType = providerKey;
  return provider;
}
```
**OLLAMA_HOST ignored when provider is passed programmatically**

DEFAULT_EMBEDDING_CONFIG defines apiEndpoint as a getter that calls getDefaultApiEndpoint(this.provider). When getEmbeddingProvider spreads this config:

```typescript
const mergedConfig = { ...DEFAULT_EMBEDDING_CONFIG, ...config };
```

The spread operator evaluates the getter at spread time, with `this` bound to DEFAULT_EMBEDDING_CONFIG. So this.provider equals DEFAULT_EMBEDDING_CONFIG.provider, which is derived from `process.env.EMBEDDING_PROVIDER ?? 'transformers'`.

If EMBEDDING_PROVIDER is not set (defaults to 'transformers'), the getter returns undefined for apiEndpoint. This means calling getEmbeddingProvider({ provider: 'ollama' }) programmatically will always fall back to the hardcoded 'http://localhost:11434' on line 78, silently ignoring the OLLAMA_HOST environment variable.
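The pitfall generalizes beyond this config object. A minimal standalone repro, with illustrative names:

```typescript
// The getter runs once at spread time, with `this` bound to DEFAULTS,
// so later overrides in the spread cannot influence its result.
const DEFAULTS = {
  provider: 'transformers',
  get apiEndpoint(): string | undefined {
    return this.provider === 'ollama' ? 'http://localhost:11434' : undefined;
  },
};

const merged = { ...DEFAULTS, provider: 'ollama' };
// The getter saw provider === 'transformers', so apiEndpoint is undefined
// even though merged.provider is now 'ollama'.
console.log(merged.apiEndpoint); // undefined
```

Spreading copies plain values, not accessor definitions, so `merged.apiEndpoint` is a frozen snapshot rather than a live computed property.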
A straightforward fix is to read OLLAMA_HOST directly within the provider branch:

```typescript
const endpoint =
  mergedConfig.apiEndpoint ||
  process.env.OLLAMA_HOST ||
  'http://localhost:11434';
const provider = new OllamaEmbeddingProvider(
  mergedConfig.model || 'nomic-embed-text',
  endpoint
);
```

src/embeddings/ollama.ts

```typescript
const MODEL_CONTEXT_WINDOWS: Record<string, number> = {
  'nomic-embed-text': 2048,
  'nomic-embed-text:latest': 2048,
  'mxbai-embed-large': 512,
  'mxbai-embed-large:latest': 512,
  'all-minilm': 512,
  'all-minilm:latest': 512
};

// Conservative character limit (approx 2 chars per token for code)
// Code has more tokens per character due to punctuation and symbols
function getMaxChars(modelName: string): number {
  const tokens = MODEL_CONTEXT_WINDOWS[modelName] || 2048;
  return tokens * 2; // Very conservative: 2 chars per token
}

/**
 * Ollama Embedding Provider
 * Supports local embedding models via Ollama API.
 * API endpoint: POST /api/embeddings
 */
export class OllamaEmbeddingProvider implements EmbeddingProvider {
  readonly name = 'ollama';
  private maxChars: number;

  // Default dimensions for nomic-embed-text (768)
  // Override via EMBEDDING_MODEL env var for other models
  get dimensions(): number {
    // Common Ollama embedding model dimensions
    const modelDimensions: Record<string, number> = {
      'nomic-embed-text': 768,
      'nomic-embed-text:latest': 768,
      'mxbai-embed-large': 1024,
      'mxbai-embed-large:latest': 1024,
      'all-minilm': 384,
      'all-minilm:latest': 384
    };
    return modelDimensions[this.modelName] || 768;
```
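The sequence diagram shows a truncateText step before each request. A hypothetical sketch of how that truncation could work (the helper name matches the diagram; the actual implementation may differ), where the character budget comes from getMaxChars, i.e. context-window tokens times the conservative 2 chars/token ratio:

```typescript
// Truncate a text to a character budget before sending it to the
// embedding endpoint, preventing context-window overflow errors.
function truncateText(text: string, maxChars: number): string {
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}

// e.g. nomic-embed-text: 2048 tokens * 2 chars/token = 4096 chars
const budget = 2048 * 2;
```

Truncating by characters rather than tokens is lossy but cheap; the 2 chars/token ratio errs on the safe side for punctuation-heavy source code.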
**embeddinggemma missing from model lookup tables**

The PR description and OLLAMA_TEST_RESULTS.md both highlight embeddinggemma as a first-class supported and tested model. However, it is absent from both MODEL_CONTEXT_WINDOWS and the modelDimensions map in the dimensions getter. Unknown models silently fall back to 768 dimensions and a 2048-token context. If embeddinggemma's actual values differ from these defaults in a future Ollama version, users will get silent LanceDB schema mismatches during re-indexing.

The same gap exists in getConfiguredDimensions in index.ts (lines 34-42). Consider adding an explicit entry:

```typescript
const MODEL_CONTEXT_WINDOWS: Record<string, number> = {
  'nomic-embed-text': 2048,
  'nomic-embed-text:latest': 2048,
  'embeddinggemma': 2048,        // add
  'embeddinggemma:latest': 2048, // add
  'mxbai-embed-large': 512,
  ...
};
```
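An alternative worth considering, sketched here as a suggestion rather than anything in the PR, is to avoid silent fallbacks entirely by probing the model at startup and using the actual vector length the server returns:

```typescript
// Detect a model's true embedding dimensionality by embedding a probe
// string and measuring the returned vector, instead of trusting a
// hardcoded lookup table.
async function detectDimensions(
  model: string,
  host = 'http://localhost:11434'
): Promise<number> {
  const res = await fetch(`${host}/api/embeddings`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt: 'dimension probe' }),
  });
  if (!res.ok) throw new Error(`Ollama probe failed: ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding.length;
}
```

This costs one extra request during initialize() but guarantees the LanceDB schema matches whatever the model actually produces.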
Going to clean this up a bit still and remove the testing doc, etc.
…provider

- Add context window-aware text truncation to prevent API errors
- Implement conservative 2 chars/token ratio for code truncation
- Fix eager transformers loading that caused hangs with Ollama
- Move MODEL_CONFIGS inline to avoid importing heavy transformers module
- Add support for model-specific context windows (nomic-embed-text, mxbai, etc.)
Force-pushed from 0986581 to 170758f
**Test Results - Ollama server with P40 video card**

Tested both embedding models on the same project (60 files, 188 chunks):

Performance Comparison

Search Quality Examples

nomic-embed-text results:

Both models produce good results, but nomic-embed-text is significantly faster on the same hardware. This aligns with its design as a dedicated embedding model vs embeddinggemma's general-purpose architecture.

Configuration Used

```
EMBEDDING_PROVIDER=ollama
OLLAMA_HOST=http://<ollama host>:11434
EMBEDDING_MODEL=nomic-embed-text  # or embeddinggemma
```

The OLLAMA_HOST fix from the code review is working correctly - the environment variable is properly respected when set.
…fix OLLAMA_HOST, add embeddinggemma
Force-pushed from 170758f to 8ecc514
**Update: EMBEDDING_DIMENSIONS Support**

Added support for an EMBEDDING_DIMENSIONS env var, for models not in the lookup tables.

Usage

```
# Use a custom model with explicit dimensions
EMBEDDING_PROVIDER=ollama \
EMBEDDING_MODEL=my-custom-model \
EMBEDDING_DIMENSIONS=1024 \
npx codebase-context reindex
```

This addresses the review feedback about unknown models falling back to 768 dimensions silently. Now users can override the dimensions explicitly for models not in the lookup tables.

The env var is checked in both the Ollama provider's dimensions getter and getConfiguredDimensions in index.ts; both check EMBEDDING_DIMENSIONS before falling back to the model lookup tables.
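The described precedence can be sketched as a small resolver. Names here are illustrative and the real getConfiguredDimensions may differ:

```typescript
// Resolve embedding dimensions: explicit EMBEDDING_DIMENSIONS wins,
// then the known-model lookup table, then the 768 default.
const KNOWN_DIMENSIONS: Record<string, number> = {
  'nomic-embed-text': 768,
  'mxbai-embed-large': 1024,
  'all-minilm': 384,
};

function resolveDimensions(
  modelName: string,
  env: Record<string, string | undefined> = process.env
): number {
  const explicit = Number(env.EMBEDDING_DIMENSIONS);
  if (Number.isFinite(explicit) && explicit > 0) return explicit;
  return KNOWN_DIMENSIONS[modelName] ?? 768;
}
```

Centralizing the rule in one function would also keep the provider and index.ts from drifting apart.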
Summary
This PR adds native Ollama support for codebase-context, enabling privacy-first local or self-hosted embedding generation as an alternative to OpenAI cloud embeddings. It addresses Issue #70 for custom OpenAI-compatible API endpoints.
What Changed
New Features
- New OllamaEmbeddingProvider class (src/embeddings/ollama.ts): full integration with Ollama's /api/embeddings endpoint
- EMBEDDING_DIMENSIONS env var for models not in lookup tables
- EMBEDDING_PROVIDER=ollama
- OLLAMA_HOST=http://localhost:11434 (or remote server)
- EMBEDDING_MODEL=nomic-embed-text
- EMBEDDING_DIMENSIONS=768 (optional override)

Bug Fixes
Configuration Examples
Files Changed
- src/embeddings/ollama.ts (new)
- src/embeddings/index.ts (lazy loading fix, dimension lookup)
- src/embeddings/types.ts (OLLAMA_HOST support)
- README.md (documentation)
- CHANGELOG.md (feature entry)

Testing
All tests pass. Provider tested with nomic-embed-text and embeddinggemma models on remote Ollama server.
Closes #70