
feat(embeddings): add native Ollama provider for local embeddings #73

Draft
rothnic wants to merge 4 commits into PatrickSys:master from rothnic:feature/openai-base-url-support

Conversation


@rothnic rothnic commented Mar 11, 2026

Summary

This PR adds native Ollama support for codebase-context, enabling privacy-first local or self-hosted embedding generation as an alternative to OpenAI cloud embeddings. It addresses Issue #70 for custom OpenAI-compatible API endpoints.

What Changed

New Features

  • Native Ollama Provider (src/embeddings/ollama.ts): Full integration with Ollama's /api/embeddings endpoint
  • Multi-model support: nomic-embed-text, embeddinggemma, mxbai-embed-large, all-minilm
  • Custom model support: EMBEDDING_DIMENSIONS env var for models not in lookup tables
  • Environment variables:
    • EMBEDDING_PROVIDER=ollama
    • OLLAMA_HOST=http://localhost:11434 (or remote server)
    • EMBEDDING_MODEL=nomic-embed-text
    • EMBEDDING_DIMENSIONS=768 (optional override)
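
As a rough illustration of how these variables might combine, here is a sketch of env-var resolution; the EmbeddingConfig shape below is an assumption for illustration, not the package's actual interface:

```typescript
// Hypothetical sketch: mapping the documented env vars onto a config object.
// The EmbeddingConfig interface is assumed, not the package's real type.
interface EmbeddingConfig {
  provider: string;
  model: string;
  apiEndpoint?: string;
  dimensions?: number;
}

function configFromEnv(env: Record<string, string | undefined>): EmbeddingConfig {
  const provider = env.EMBEDDING_PROVIDER ?? 'transformers';
  return {
    provider,
    model: env.EMBEDDING_MODEL ?? 'nomic-embed-text',
    // The Ollama provider reads OLLAMA_HOST; OpenAI-compatible providers
    // would read OPENAI_BASE_URL instead
    apiEndpoint: provider === 'ollama'
      ? (env.OLLAMA_HOST ?? 'http://localhost:11434')
      : env.OPENAI_BASE_URL,
    // Optional override for models missing from the lookup tables
    dimensions: env.EMBEDDING_DIMENSIONS
      ? Number(env.EMBEDDING_DIMENSIONS)
      : undefined,
  };
}
```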

Bug Fixes

  • Fixed eager transformers loading: Removed static re-export that caused hangs
  • Fixed OLLAMA_HOST bypass: Properly respects env var in programmatic usage

Configuration Examples

# Local Ollama
EMBEDDING_PROVIDER=ollama EMBEDDING_MODEL=nomic-embed-text npx codebase-context reindex

# Remote Ollama server
EMBEDDING_PROVIDER=ollama OLLAMA_HOST=http://server:11434 EMBEDDING_MODEL=nomic-embed-text npx codebase-context reindex

# Custom model with explicit dimensions
EMBEDDING_PROVIDER=ollama EMBEDDING_MODEL=my-model EMBEDDING_DIMENSIONS=1024 npx codebase-context reindex

Files Changed

  • src/embeddings/ollama.ts (new)
  • src/embeddings/index.ts (lazy loading fix, dimension lookup)
  • src/embeddings/types.ts (OLLAMA_HOST support)
  • README.md (documentation)
  • CHANGELOG.md (feature entry)

Testing

All tests pass. Provider tested with nomic-embed-text and embeddinggemma models on remote Ollama server.

Closes #70

rothnic added 2 commits March 11, 2026 10:16
Add support for custom OpenAI-compatible API endpoints via OPENAI_BASE_URL
environment variable. This enables using:
- Ollama for local LLM inference
- LiteLLM Proxy for unified model access
- Groq, OpenRouter, and other OpenAI-compatible providers
- Self-hosted models (vLLM, text-generation-inference)

Changes:
- Read OPENAI_BASE_URL from environment in DEFAULT_EMBEDDING_CONFIG
- Update README.md with configuration documentation
- Update CHANGELOG.md with feature entry

Fixes PatrickSys#70
Add full support for Ollama as an embedding provider, enabling local
embeddings without cloud dependencies.

New Features:
- New OllamaEmbeddingProvider class (src/embeddings/ollama.ts)
- EMBEDDING_PROVIDER=ollama option
- OLLAMA_HOST environment variable (default: http://localhost:11434)
- Automatic dimension detection for common Ollama models:
  - nomic-embed-text: 768 dimensions (default)
  - mxbai-embed-large: 1024 dimensions
  - all-minilm: 384 dimensions
- Also adds OPENAI_BASE_URL for custom OpenAI-compatible endpoints

Files Changed:
- src/embeddings/ollama.ts: New Ollama provider implementation
- src/embeddings/index.ts: Add Ollama provider integration
- src/embeddings/types.ts: Add OLLAMA_HOST support, dynamic apiEndpoint
- README.md: Document Ollama configuration options
- CHANGELOG.md: Update with feature details

Tested with nomic-embed-text generating 768-dimensional embeddings.

Closes PatrickSys#70
Related to PatrickSys#68

greptile-apps bot commented Mar 11, 2026

Greptile Summary

This PR adds a native Ollama embedding provider to codebase-context, enabling privacy-first local or self-hosted embedding generation as an alternative to OpenAI cloud embeddings. It also fixes a module-hang bug caused by eager loading of the heavy transformers.js module, and extends the EmbeddingConfig with OLLAMA_HOST and OPENAI_BASE_URL support.

Key changes and issues:

  • New src/embeddings/ollama.ts: Clean implementation of the EmbeddingProvider interface using Ollama's /api/embeddings endpoint, with sequential batch processing and text truncation to respect model context windows.
  • src/embeddings/index.ts — lazy loading partially broken: The eager-loading fix removes the old export * from './transformers.js' but then re-adds export { TransformersEmbeddingProvider, MODEL_CONFIGS } from './transformers.js' at line 98. Static re-exports are resolved at module load time in ES modules, so this line will still cause transformers.js to be loaded eagerly whenever index.ts is imported, defeating the stated fix.
  • src/embeddings/index.ts — OLLAMA_HOST bypassed programmatically: The apiEndpoint getter on DEFAULT_EMBEDDING_CONFIG is evaluated during object spread, using the default provider (usually 'transformers'). If getEmbeddingProvider({ provider: 'ollama' }) is called programmatically while EMBEDDING_PROVIDER is not set to 'ollama', OLLAMA_HOST is silently ignored and the hardcoded fallback 'http://localhost:11434' is used instead.
  • embeddinggemma missing from model maps: Despite being the primary tested model, embeddinggemma is absent from the dimension and context-window lookup tables in both ollama.ts and index.ts. It currently works by falling through to the 768-dimension default, but this is fragile.
  • OLLAMA_TEST_RESULTS.md appears to be a developer scratch file; consider whether it belongs in the repository long-term.

Confidence Score: 2/5

  • Not safe to merge as-is — the lazy loading fix is broken by a static re-export, and OLLAMA_HOST can be silently ignored in programmatic usage.
  • Two logic-level issues need resolution: (1) the re-added static re-export at index.ts line 98 effectively reverts the core bug fix this PR claims to make, and (2) the OLLAMA_HOST env var is bypassed when the provider is specified programmatically. These are not edge-case concerns — the first affects every consumer of the package when using the Ollama provider, and the second affects any programmatic API user. The provider implementation itself (ollama.ts) is clean and well-structured, which keeps the score above 1.
  • Pay close attention to src/embeddings/index.ts — specifically line 98 (static re-export defeating lazy loading) and lines 74-84 (OLLAMA_HOST fallback logic).

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/embeddings/ollama.ts | New Ollama embedding provider with sequential batch processing, text truncation, and model dimension detection. Missing embeddinggemma from model lookup tables despite being the primary tested model. |
| src/embeddings/index.ts | Lazy loading fix is undermined by the static re-export of TransformersEmbeddingProvider/MODEL_CONFIGS at line 98; also, OLLAMA_HOST env var is bypassed when provider is passed programmatically without EMBEDDING_PROVIDER being set. |
| src/embeddings/types.ts | Adds OLLAMA_HOST and OPENAI_BASE_URL support via a getter on DEFAULT_EMBEDDING_CONFIG; getter-based approach has a subtle spread-evaluation pitfall that contributes to the OLLAMA_HOST bypass issue. |
| README.md | Documentation updated with new EMBEDDING_PROVIDER options, OPENAI_BASE_URL, and OLLAMA_HOST env vars — accurate and complete. |
| CHANGELOG.md | Standard changelog entry for the new Ollama provider feature under [Unreleased]. |
| OLLAMA_TEST_RESULTS.md | New test results document; useful internal reference but likely not intended for permanent inclusion in the repository. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant index.ts
    participant types.ts
    participant ollama.ts
    participant OllamaServer

    Caller->>index.ts: getEmbeddingProvider({ provider: 'ollama' })
    index.ts->>types.ts: spread DEFAULT_EMBEDDING_CONFIG (getter evaluates apiEndpoint)
    types.ts-->>index.ts: apiEndpoint (may be undefined if EMBEDDING_PROVIDER≠ollama)
    index.ts->>ollama.ts: dynamic import OllamaEmbeddingProvider
    index.ts->>ollama.ts: new OllamaEmbeddingProvider(model, apiEndpoint || 'localhost:11434')
    ollama.ts-->>index.ts: provider instance
    index.ts->>ollama.ts: provider.initialize()
    ollama.ts-->>index.ts: ready

    Caller->>index.ts: provider.embedBatch(texts)
    loop for each text (sequential)
        index.ts->>ollama.ts: truncateText(text)
        ollama.ts->>OllamaServer: POST /api/embeddings { model, prompt }
        OllamaServer-->>ollama.ts: { embedding: number[] }
        ollama.ts-->>index.ts: embeddings.push(embedding)
    end
    index.ts-->>Caller: number[][]

Last reviewed commit: 0986581


// Re-export TransformersEmbeddingProvider and MODEL_CONFIGS for consumers who need them
// These will trigger transformers loading, but only when explicitly imported
export { TransformersEmbeddingProvider, MODEL_CONFIGS } from './transformers.js';

Static re-export defeats the lazy loading fix

The stated purpose of this PR's index.ts change is to prevent transformers.js from being eagerly loaded when using non-transformers providers. The old export * from './transformers.js' was removed for exactly this reason. However, line 98 introduces the same problem via a named re-export:

export { TransformersEmbeddingProvider, MODEL_CONFIGS } from './transformers.js';

In ES modules, re-exports are resolved statically at module load time — not lazily. Any code that does import { ... } from './index.js' (which includes virtually everything that uses this package) will trigger transformers.js to be loaded immediately, regardless of whether getEmbeddingProvider is ever called with a transformers config. This means the hang observed when using the Ollama provider will still occur for any consumer that imports from this index module.

The getEmbeddingProvider function already correctly uses await import('./transformers.js') for lazy loading. If TransformersEmbeddingProvider and MODEL_CONFIGS must remain re-exported for downstream consumers, those consumers should import directly from './transformers.js' rather than via the index, or this line should be removed and documented as a breaking change for consumers relying on the re-export.
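A minimal sketch of the lazy pattern the review recommends (names assumed, not the PR's actual code): expose an async factory instead of a static re-export, so the heavy module is only evaluated on first use. Here a boolean stands in for the side effect of evaluating transformers.js:

```typescript
// Sketch: lazy factory vs. static re-export. In real code the factory body
// would be `return await import('./transformers.js')`; here a flag stands in
// for the module-evaluation side effect so the lazy behavior is observable.
let loaded = false;

async function loadTransformers() {
  loaded = true; // real code: return await import('./transformers.js');
  return { TransformersEmbeddingProvider: class {} };
}

// Consumers call this instead of importing the symbol from index.ts.
// Merely importing the module containing this function does NOT set `loaded`;
// a static `export { X } from './transformers.js'` would.
async function getTransformersExports() {
  return loadTransformers();
}
```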

Comment on lines 74 to 84
if (mergedConfig.provider === 'ollama') {
  console.warn('Ollama provider not yet implemented, falling back to Transformers.js');
  const { OllamaEmbeddingProvider } = await import('./ollama.js');
  const provider = new OllamaEmbeddingProvider(
    mergedConfig.model || 'nomic-embed-text',
    mergedConfig.apiEndpoint || 'http://localhost:11434'
  );
  await provider.initialize();
  cachedProvider = provider;
  cachedProviderType = providerKey;
  return provider;
}

OLLAMA_HOST ignored when provider is passed programmatically

DEFAULT_EMBEDDING_CONFIG defines apiEndpoint as a getter that calls getDefaultApiEndpoint(this.provider). When getEmbeddingProvider spreads this config:

const mergedConfig = { ...DEFAULT_EMBEDDING_CONFIG, ...config };

The spread operator evaluates the getter at spread time, with this bound to DEFAULT_EMBEDDING_CONFIG. So this.provider equals DEFAULT_EMBEDDING_CONFIG.provider, which is derived from process.env.EMBEDDING_PROVIDER ?? 'transformers'.

If EMBEDDING_PROVIDER is not set (defaults to 'transformers'), the getter returns undefined for apiEndpoint. This means calling getEmbeddingProvider({ provider: 'ollama' }) programmatically will always fall back to the hardcoded 'http://localhost:11434' on line 78, silently ignoring the OLLAMA_HOST environment variable.

A straightforward fix is to read OLLAMA_HOST directly within the provider branch:

const endpoint =
  mergedConfig.apiEndpoint ||
  process.env.OLLAMA_HOST ||
  'http://localhost:11434';
const provider = new OllamaEmbeddingProvider(
  mergedConfig.model || 'nomic-embed-text',
  endpoint
);
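The fallback chain above can be isolated into a pure helper, which makes the precedence (explicit config, then OLLAMA_HOST, then the localhost default) easy to unit-test. The helper name is assumed for illustration:

```typescript
// Hypothetical helper capturing the suggested fallback chain:
// explicit config endpoint > OLLAMA_HOST env var > localhost default.
// Taking `env` as a parameter keeps the function pure and testable.
function resolveOllamaEndpoint(
  configEndpoint: string | undefined,
  env: Record<string, string | undefined>
): string {
  return configEndpoint || env.OLLAMA_HOST || 'http://localhost:11434';
}
```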

Comment on lines +8 to +45
const MODEL_CONTEXT_WINDOWS: Record<string, number> = {
  'nomic-embed-text': 2048,
  'nomic-embed-text:latest': 2048,
  'mxbai-embed-large': 512,
  'mxbai-embed-large:latest': 512,
  'all-minilm': 512,
  'all-minilm:latest': 512
};

// Conservative character limit (approx 2 chars per token for code)
// Code has more tokens per character due to punctuation and symbols
function getMaxChars(modelName: string): number {
  const tokens = MODEL_CONTEXT_WINDOWS[modelName] || 2048;
  return tokens * 2; // Very conservative: 2 chars per token
}

/**
 * Ollama Embedding Provider
 * Supports local embedding models via Ollama API.
 * API endpoint: POST /api/embeddings
 */
export class OllamaEmbeddingProvider implements EmbeddingProvider {
  readonly name = 'ollama';
  private maxChars: number;

  // Default dimensions for nomic-embed-text (768)
  // Override via EMBEDDING_MODEL env var for other models
  get dimensions(): number {
    // Common Ollama embedding model dimensions
    const modelDimensions: Record<string, number> = {
      'nomic-embed-text': 768,
      'nomic-embed-text:latest': 768,
      'mxbai-embed-large': 1024,
      'mxbai-embed-large:latest': 1024,
      'all-minilm': 384,
      'all-minilm:latest': 384
    };
    return modelDimensions[this.modelName] || 768;

embeddinggemma missing from model lookup tables

The PR description and OLLAMA_TEST_RESULTS.md both highlight embeddinggemma as a first-class supported and tested model. However, it is absent from both MODEL_CONTEXT_WINDOWS and the modelDimensions map in the dimensions getter. Unknown models silently fall back to 768 dimensions and 2048 token context. If embeddinggemma's actual values differ from these defaults in a future Ollama version, users will get silent LanceDB schema mismatches during re-indexing.

The same gap exists in getConfiguredDimensions in index.ts (line 34–42). Consider adding an explicit entry:

const MODEL_CONTEXT_WINDOWS: Record<string, number> = {
  'nomic-embed-text': 2048,
  'nomic-embed-text:latest': 2048,
  'embeddinggemma': 2048,       // add
  'embeddinggemma:latest': 2048, // add
  'mxbai-embed-large': 512,
  ...
};
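Combining that suggestion with the PR's conservative ~2 chars/token heuristic, truncation might look like the sketch below. The embeddinggemma window of 2048 tokens is an assumption that should be verified against the model card:

```typescript
// Sketch of context-window-aware truncation (names assumed).
// The embeddinggemma entries are assumptions to verify against the model card.
const MODEL_CONTEXT_WINDOWS: Record<string, number> = {
  'nomic-embed-text': 2048,
  'nomic-embed-text:latest': 2048,
  'embeddinggemma': 2048,
  'embeddinggemma:latest': 2048,
  'mxbai-embed-large': 512,
  'mxbai-embed-large:latest': 512,
  'all-minilm': 512,
  'all-minilm:latest': 512,
};

function truncateForModel(text: string, model: string): string {
  const tokens = MODEL_CONTEXT_WINDOWS[model] ?? 2048; // default window
  const maxChars = tokens * 2; // conservative ~2 chars per token for code
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}
```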

@rothnic rothnic marked this pull request as draft March 11, 2026 22:34

rothnic commented Mar 11, 2026

Going to clean this up a bit still and remove the testing doc, etc.

…provider

- Add context window-aware text truncation to prevent API errors
- Implement conservative 2 chars/token ratio for code truncation
- Fix eager transformers loading that caused hangs with Ollama
- Move MODEL_CONFIGS inline to avoid importing heavy transformers module
- Add support for model-specific context windows (nomic-embed-text, mxbai, etc.)
@rothnic rothnic force-pushed the feature/openai-base-url-support branch from 0986581 to 170758f on March 11, 2026 at 22:42

rothnic commented Mar 11, 2026

Test Results - Ollama server with P40 video card

Tested both embedding models on the same project (60 files, 188 chunks):

Performance Comparison

| Model | Indexing Time | Throughput | Notes |
| --- | --- | --- | --- |
| nomic-embed-text | 19 seconds | ~9.8 chunks/sec | Fast, optimized for embeddings |
| embeddinggemma | 199 seconds | ~0.9 chunks/sec | 10x slower, general-purpose |

Search Quality Examples

nomic-embed-text results:

  • "scrape website" → Found Firecrawl scraping components (confidence: 0.75)
  • "fetch data" → Found API testing code (confidence: 0.99)
  • "error handling" → Found try/catch blocks (confidence: 1.00)
  • "authentication" → Found auth components (confidence: 0.98)

Both models produce good results, but nomic-embed-text is significantly faster on the same hardware. This aligns with its design as a dedicated embedding model vs embeddinggemma's general-purpose architecture.

Configuration Used

EMBEDDING_PROVIDER=ollama
OLLAMA_HOST=http://<ollama host>:11434
EMBEDDING_MODEL=nomic-embed-text  # or embeddinggemma

The OLLAMA_HOST fix from the code review is working correctly - the environment variable is properly respected when set.

@rothnic rothnic force-pushed the feature/openai-base-url-support branch from 170758f to 8ecc514 on March 11, 2026 at 22:54

rothnic commented Mar 11, 2026

Update: EMBEDDING_DIMENSIONS Support

Added support for EMBEDDING_DIMENSIONS environment variable to handle custom models not in the hardcoded lookup tables.

Usage

# Use a custom model with explicit dimensions
EMBEDDING_PROVIDER=ollama \
EMBEDDING_MODEL=my-custom-model \
EMBEDDING_DIMENSIONS=1024 \
npx codebase-context reindex

This addresses the review feedback about unknown models falling back to 768 dimensions silently. Now users can:

  1. Use models not explicitly listed in the code
  2. Override dimensions if Ollama updates a model's output size
  3. Prevent LanceDB schema mismatches during re-indexing

The env var is checked in both:

  • OllamaEmbeddingProvider.dimensions getter (for the provider instance)
  • getConfiguredDimensions() (for LanceDB validation)

Both locations check process.env.EMBEDDING_DIMENSIONS first before falling back to lookup tables or defaults.
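That resolution order (env override first, then lookup table, then the 768 default) can be sketched as a small pure helper. The function name is assumed; the dimension values come from the PR's lookup tables:

```typescript
// Sketch of the EMBEDDING_DIMENSIONS resolution order (helper name assumed):
// 1. explicit EMBEDDING_DIMENSIONS env override, if it parses to a positive int
// 2. the known-model lookup table
// 3. the 768 default (nomic-embed-text)
const MODEL_DIMENSIONS: Record<string, number> = {
  'nomic-embed-text': 768,
  'mxbai-embed-large': 1024,
  'all-minilm': 384,
};

function resolveDimensions(
  model: string,
  env: Record<string, string | undefined>
): number {
  const override = env.EMBEDDING_DIMENSIONS
    ? parseInt(env.EMBEDDING_DIMENSIONS, 10)
    : NaN;
  if (!Number.isNaN(override) && override > 0) return override;
  return MODEL_DIMENSIONS[model] ?? 768;
}
```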


Development

Successfully merging this pull request may close these issues.

Feature Request: Support custom OpenAI-compatible API endpoints (OPENAI_BASE_URL)