@naknomum naknomum commented Dec 10, 2025

Provide a foundation for future (10.10) usage of embedding-based matching of annotations.

Overview

  • Vector support: in postgres and java (PGVector)
  • Embedding java class created with 1-to-many relationship to Annotations
  • Embedding objects contain the vector plus metadata: "method" (algorithm name), method version, and timestamp.
  • Embeddings searchable (as true vector) in OpenSearch annotation document
  • getMatches() on Annotation combines the matching-set query and the embedding search into a single OpenSearch query that returns matching annotations. [work-in-progress for usage in 10.10]
  • MLService java class created for handling queries to (external) ml-service.
  • Vector creation is now part of the IA pipeline [see note below]
  • sendMediaAssetsForceId() and sendAnnotationsForceId() send these objects to wbia while forcing wbia to use our ids
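
To make the Embedding/Annotation relationship above concrete, here is a minimal sketch of the data shape. It is not the PR's actual code: the class names EmbeddingSketch and AnnotationSketch, the accessors, and latestEmbedding() are all hypothetical, chosen only to illustrate "a vector plus method, method version, and timestamp" and the 1-to-many link.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch: names are illustrative, not the PR's Embedding class.
final class EmbeddingSketch {
    private final float[] vector;
    private final String method;        // algorithm name
    private final String methodVersion;
    private final Instant created;      // timestamp

    EmbeddingSketch(float[] vector, String method, String methodVersion, Instant created) {
        if (vector == null || vector.length == 0) {
            throw new IllegalArgumentException("vector must be non-empty");
        }
        this.vector = vector.clone(); // defensive copy so callers cannot mutate it
        this.method = Objects.requireNonNull(method, "method");
        this.methodVersion = Objects.requireNonNull(methodVersion, "methodVersion");
        this.created = Objects.requireNonNull(created, "created");
    }

    float[] vector() { return vector.clone(); }
    int length() { return vector.length; }
    String method() { return method; }
    String methodVersion() { return methodVersion; }
    Instant created() { return created; }
}

// Hypothetical stand-in for the "one Annotation, many Embeddings" side.
final class AnnotationSketch {
    private final List<EmbeddingSketch> embeddings = new ArrayList<>();

    void addEmbedding(EmbeddingSketch e) {
        embeddings.add(Objects.requireNonNull(e, "embedding"));
    }

    // Most recently created embedding for the given method and version, if any.
    Optional<EmbeddingSketch> latestEmbedding(String method, String methodVersion) {
        return embeddings.stream()
            .filter(e -> e.method().equals(method) && e.methodVersion().equals(methodVersion))
            .max(Comparator.comparing(EmbeddingSketch::created));
    }
}
```

Keeping method and version on each embedding is what lets one annotation carry vectors from several algorithms without mixing them at match time.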

PR fixes #1329

Notes on vectors as part of IA pipeline

  • Originally the plan was to have an "all-in-one" query to wbia during the detection phase, which would return the annotations (per usual) plus the embedding(s); however, this was deemed too difficult for current work
  • MLService provides a separate API endpoint that returns a vector for an annotation, so embedding creation is implemented as a post-detection pipeline step that runs when a new Annotation is created
  • For backward compatibility, this embedding creation step only runs when the IA.json file is set up with MLService parameters
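
The backward-compatibility rule above can be sketched as a simple guard. This is an illustration only: the key names "mlService" and "url", the class EmbeddingStepSketch, and the fetchVector callback are assumptions and do not reflect the real IA.json layout or the MLService API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of the "only when configured" guard.
final class EmbeddingStepSketch {

    // Returns the configured endpoint, or empty when no usable MLService section exists.
    // "mlService" and "url" are assumed key names, not the real IA.json schema.
    static Optional<String> mlServiceUrl(Map<String, Object> iaConfig) {
        Object section = iaConfig.get("mlService");
        if (!(section instanceof Map)) {
            return Optional.empty();
        }
        Object url = ((Map<?, ?>) section).get("url");
        if (url instanceof String && !((String) url).trim().isEmpty()) {
            return Optional.of((String) url);
        }
        return Optional.empty();
    }

    // Post-detection step: skipped entirely on installs without MLService parameters.
    // fetchVector stands in for the call to the external ml-service.
    static Optional<float[]> maybeCreateEmbedding(Map<String, Object> iaConfig,
                                                  String annotationId,
                                                  Function<String, float[]> fetchVector) {
        if (!mlServiceUrl(iaConfig).isPresent()) {
            return Optional.empty(); // legacy configuration: behave as before
        }
        return Optional.ofNullable(fetchVector.apply(annotationId));
    }
}
```

With an empty configuration map the step returns Optional.empty() without calling fetchVector, which is the behavior an existing deployment would expect.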

Notes on java/db vector support

  • PGVector class is used in java, but not actually used when persisting to postgresql due to limitations of DataNucleus
  • Currently there is no real need to use vectors in postgresql (e.g. writing sql to query on them), as vector space is queried in OpenSearch instead; but when sql vectors are needed, the above point will need to be addressed
  • Although no vectors are used in postgresql, the docker image in devops/develop/ has been updated to include vector support anyway; technically this is not necessary, but it seemed to be the right time to do it
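
For context on what non-native persistence can look like, pgvector reads and writes vectors in a bracketed text form such as [1,2,3]. The sketch below converts between float[] and that text form. It is an assumption-laden illustration, not a description of how this PR stores embeddings; VectorLiteralSketch and its methods are hypothetical names.

```java
// Hypothetical sketch: round-trips a float[] through pgvector's "[a,b,c]" text form.
final class VectorLiteralSketch {

    static String toLiteral(float[] v) {
        if (v == null || v.length == 0) {
            throw new IllegalArgumentException("vector must be non-empty");
        }
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < v.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(v[i]); // same text as Float.toString, e.g. 1.0
        }
        return sb.append(']').toString();
    }

    static float[] fromLiteral(String s) {
        String t = s.trim();
        if (t.length() < 3 || t.charAt(0) != '[' || t.charAt(t.length() - 1) != ']') {
            throw new IllegalArgumentException("not a vector literal: " + s);
        }
        String[] parts = t.substring(1, t.length() - 1).split(",");
        float[] out = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Float.parseFloat(parts[i].trim());
        }
        return out;
    }
}
```

A text round trip like this avoids needing a custom type mapping in the persistence layer, at the cost of not being queryable as a vector from sql until the column is a real vector type.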

OpenSearch and vector length

  • OpenSearch requires that the vector size be declared in the mapping that defines the index. This means vectors of varying lengths will each need their own mapping. Presently only length 2152 (based on current ml-service embeddings) is supported, but this concept will need to be expanded to support additional lengths as they arise.
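
The fixed-size constraint above can be sketched as follows. In OpenSearch, a knn_vector field declares its dimension in the mapping, and a knn query names that field. Everything else here is an assumption for illustration: the class name, the one-field-per-length naming scheme ("embedding_" plus the length), and building the JSON by hand rather than through whatever client code the PR uses.

```java
// Hypothetical sketch: one knn_vector field per vector length.
final class VectorMappingSketch {

    // Assumed naming scheme: a separate field for each supported length.
    static String fieldName(int dimension) {
        if (dimension < 1) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        return "embedding_" + dimension;
    }

    // Index body with k-NN enabled and a knn_vector field of fixed dimension.
    static String mappingJson(int dimension) {
        return "{\"settings\":{\"index\":{\"knn\":true}},"
            + "\"mappings\":{\"properties\":{\"" + fieldName(dimension) + "\":"
            + "{\"type\":\"knn_vector\",\"dimension\":" + dimension + "}}}}";
    }

    // k-NN query body; the target field is chosen from the query vector's length,
    // so a vector can only be searched against embeddings of the same size.
    static String knnQueryJson(float[] queryVector, int k) {
        if (queryVector == null || queryVector.length == 0) {
            throw new IllegalArgumentException("query vector must be non-empty");
        }
        if (k < 1) {
            throw new IllegalArgumentException("k must be positive");
        }
        StringBuilder values = new StringBuilder();
        for (int i = 0; i < queryVector.length; i++) {
            if (Float.isNaN(queryVector[i]) || Float.isInfinite(queryVector[i])) {
                throw new IllegalArgumentException("non-finite value at index " + i);
            }
            if (i > 0) values.append(',');
            values.append(queryVector[i]);
        }
        return "{\"size\":" + k + ",\"query\":{\"knn\":{\"" + fieldName(queryVector.length)
            + "\":{\"vector\":[" + values + "],\"k\":" + k + "}}}}";
    }
}
```

Deriving the field name from the length is one way to support additional lengths later: a new embedding size would add a new field and mapping rather than changing the existing one.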

naknomum added 30 commits July 25, 2025 12:31

codecov-commenter commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 9.78261% with 581 lines in your changes missing coverage. Please review.
✅ Project coverage is 11.47%. Comparing base (ef61621) to head (9e0ffa2).

Files with missing lines Patch % Lines
src/main/java/org/ecocean/ia/MLService.java 0.00% 235 Missing ⚠️
...c/main/java/org/ecocean/ia/plugin/WildbookIAM.java 0.00% 117 Missing ⚠️
src/main/java/org/ecocean/Annotation.java 18.11% 87 Missing and 26 partials ⚠️
src/main/java/org/ecocean/identity/IBEISIA.java 0.00% 32 Missing ⚠️
src/main/java/org/ecocean/Embedding.java 58.06% 20 Missing and 6 partials ⚠️
src/main/java/org/ecocean/Util.java 0.00% 18 Missing ⚠️
src/main/java/org/ecocean/ia/IAException.java 0.00% 13 Missing ⚠️
src/main/java/org/ecocean/media/MediaAsset.java 0.00% 13 Missing ⚠️
src/main/java/org/ecocean/servlet/IAGateway.java 0.00% 5 Missing ⚠️
.../main/java/org/ecocean/shepherd/core/Shepherd.java 0.00% 4 Missing ⚠️
... and 2 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1328      +/-   ##
============================================
+ Coverage     11.43%   11.47%   +0.03%     
- Complexity     1266     1318      +52     
============================================
  Files           710      713       +3     
  Lines         72845    73477     +632     
  Branches      14014    14080      +66     
============================================
+ Hits           8331     8428      +97     
- Misses        63606    64103     +497     
- Partials        908      946      +38     
Flag Coverage Δ
backend 11.47% <9.78%> (+0.03%) ⬆️
frontend 11.47% <9.78%> (+0.03%) ⬆️

@vkirkl vkirkl requested a review from holmbergius December 12, 2025 01:16
@vkirkl vkirkl added this to the 10.9.5 milestone Jan 5, 2026

Development

Successfully merging this pull request may close these issues.

Groundwork for Vector and Embedding support in Wildbook

4 participants