Upgrade embedding model to bge-small-en-v1.5 #9

Closed
devwhodevs wants to merge 1 commit into main from feature/upgrade-embedding-model

@devwhodevs (Owner)

Summary

  • Replaces all-MiniLM-L6-v2 (2021, 256-token context, ranked ~60th on MTEB) with bge-small-en-v1.5 (2023, 512-token context, ranked ~30th on MTEB)
  • Same 384 dimensions, so no schema migration is needed
  • Fixes silent truncation of chunks longer than 256 tokens
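
To make the truncation fix concrete, here is a minimal sketch of what a context limit does to a long chunk. The helper `truncate_to_context` and both constants are hypothetical and are not taken from the engraph codebase. The sketch also ignores the special tokens a real tokenizer reserves inside the limit.

```rust
/// Context limits quoted in this PR, in tokens.
const OLD_MAX_TOKENS: usize = 256; // all-MiniLM-L6-v2
const NEW_MAX_TOKENS: usize = 512; // bge-small-en-v1.5

/// Hypothetical helper (not engraph's API): keeps at most `max_tokens`
/// token ids and reports how many trailing tokens were dropped.
fn truncate_to_context(token_ids: &[u32], max_tokens: usize) -> (&[u32], usize) {
    let kept = token_ids.len().min(max_tokens);
    (&token_ids[..kept], token_ids.len() - kept)
}
```

Under these assumptions, a 400-token chunk loses its last 144 tokens at the old limit and none at the new one. Anything dropped this way never reaches the embedding, so search cannot match on it.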

Breaking change

Users must delete ~/.engraph/models/ and run engraph index --rebuild after upgrading. The old model files are incompatible.
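
Because both models produce 384-dimensional vectors, a dimension check alone cannot tell an old index from a new one, yet vectors from different models are not comparable. The PR does not describe how engraph detects a stale index, so this is only a sketch of one possible guard, assuming the index stores the model identifier next to the vectors. `IndexMeta`, `Compat`, and `check_compat` are hypothetical names.

```rust
/// Hypothetical metadata stored alongside an index (not engraph's real schema).
#[derive(Debug, PartialEq)]
struct IndexMeta {
    model_id: String,
    dimensions: usize,
}

/// Result of comparing an on-disk index with the currently active model.
#[derive(Debug, PartialEq)]
enum Compat {
    Ok,
    NeedsRebuild,
}

/// Requires a rebuild when either the dimensions or the model id differ.
/// Matching dimensions alone are not enough: two models can share a
/// vector size while embedding text into unrelated spaces.
fn check_compat(index: &IndexMeta, active: &IndexMeta) -> Compat {
    if index.dimensions != active.dimensions || index.model_id != active.model_id {
        Compat::NeedsRebuild
    } else {
        Compat::Ok
    }
}
```

With an index built by all-MiniLM-L6-v2 and bge-small-en-v1.5 as the active model, this returns `Compat::NeedsRebuild` even though both sides report 384 dimensions. That is the case the rebuild step above covers.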

Trade-off

Model download grows from 23MB to 127MB (one-time).

Test plan

  • 225 unit tests passing
  • Clippy clean
  • Integration test with model download (cargo test --test integration -- --ignored)
  • Manual: engraph index with new model, verify search quality improvement on long notes

🤖 Generated with Claude Code

Replaces all-MiniLM-L6-v2 (256 token context, MTEB ~60th) with
bge-small-en-v1.5 (512 token context, MTEB ~30th). Same 384
dimensions — no migration or re-index schema change needed.

Key improvement: chunks up to 512 tokens are now fully embedded
instead of silently truncated at 256. This fixes a quality issue
where the second half of larger chunks was invisible to semantic
search.

Trade-off: model download grows from 23MB to 127MB (one-time).

Users must delete ~/.engraph/models/ and run engraph index --rebuild
after upgrading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@devwhodevs devwhodevs closed this Mar 25, 2026
@devwhodevs devwhodevs deleted the feature/upgrade-embedding-model branch March 25, 2026 15:30