feat: add TwelveLabs Marengo embeddings and Pegasus video loader by mohit-twelvelabs · Pull Request #227 · llm-tools/embedJs

mohit-twelvelabs · 2026-06-25T21:02:34Z

Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).

Description

This PR adds a new opt-in package, @llm-tools/embedjs-twelvelabs, that brings TwelveLabs video understanding to embedJs:

MarengoEmbeddings — a drop-in embedding model backed by the Marengo (marengo3.0) multimodal model. Marengo embeds text, image, audio and video into a single shared 512-dimensional latent space, which is a strong fit for RAG over video and mixed-media libraries.
TwelveLabsVideoLoader — a data source that analyses a video with the Pegasus video understanding model (it watches the visuals, motion and audio) and loads the resulting natural-language description into the RAG pipeline. This lets you index video content directly from a public URL, with no separate transcription step.
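To illustrate why a single shared latent space helps retrieval, here is a minimal sketch of cross-modal ranking by cosine similarity. The vectors are synthetic 512-dimensional placeholders, not real Marengo output, and `IndexedItem`, `rankBySimilarity` and `axisVector` are hypothetical names used only for illustration; none of them are part of this package.

```typescript
// Sketch only: synthetic vectors stand in for real embeddings.
type Modality = 'text' | 'image' | 'audio' | 'video';

interface IndexedItem {
    id: string;
    modality: Modality;
    vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) {
        throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank items of any modality against one query vector, best match first.
function rankBySimilarity(query: number[], items: IndexedItem[], topK: number) {
    return items
        .map((item) => ({
            id: item.id,
            modality: item.modality,
            score: cosineSimilarity(query, item.vector),
        }))
        .sort((x, y) => y.score - x.score)
        .slice(0, topK);
}

// Build a placeholder vector with the given weights on selected axes.
function axisVector(weights: Record<number, number>, dims = 512): number[] {
    const v = new Array<number>(dims).fill(0);
    for (const [axis, w] of Object.entries(weights)) v[Number(axis)] = w;
    return v;
}

const items: IndexedItem[] = [
    { id: 'clip-1', modality: 'video', vector: axisVector({ 0: 1 }) },
    { id: 'note-1', modality: 'text', vector: axisVector({ 1: 1 }) },
];
// A query that lies close to the video item in the shared space.
const query = axisVector({ 0: 0.9, 1: 0.1 });
const ranked = rankBySimilarity(query, items, 2);
console.log(ranked.map((r) => `${r.id} (${r.modality}): ${r.score.toFixed(3)}`));
```

Because every modality lands in the same space, one similarity function can rank a video clip and a text note against the same query without per-modality handling.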

It mirrors the existing provider conventions (e.g. embedjs-openai, embedjs-cohere, embedjs-loader-youtube): same package scaffolding, Nx project.json, tsconfig/eslint config, BaseEmbeddings / BaseLoader extension, and docs.
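For reviewers less familiar with the loader convention, the general shape can be sketched as a class exposing a `uniqueId` and an async chunk generator. This is a rough illustration under assumptions, not the code in this PR: `Chunk`, `AnalyseVideo`, `chunkDescription` and `VideoDescriptionLoaderSketch` are hypothetical local stand-ins, and the real `BaseLoader` contract and TwelveLabs client may differ.

```typescript
// Sketch only: local stand-ins, not the real embedJs BaseLoader or TwelveLabs client.
interface Chunk {
    pageContent: string;
    metadata: { source: string; type: string };
}

// Hypothetical analyser: resolves a video URL to a natural-language description.
type AnalyseVideo = (url: string) => Promise<string>;

// Split a description into fixed-size chunks tagged with their source URL.
function* chunkDescription(description: string, source: string, chunkSize: number): Generator<Chunk> {
    if (chunkSize <= 0) throw new Error('chunkSize must be positive');
    for (let start = 0; start < description.length; start += chunkSize) {
        yield {
            pageContent: description.slice(start, start + chunkSize),
            metadata: { source, type: 'VideoDescriptionSketch' },
        };
    }
}

class VideoDescriptionLoaderSketch {
    private readonly url: string;
    private readonly analyse: AnalyseVideo;
    private readonly chunkSize: number;

    constructor(url: string, analyse: AnalyseVideo, chunkSize = 500) {
        this.url = url;
        this.analyse = analyse;
        this.chunkSize = chunkSize;
    }

    // Stable identifier so the same video is not indexed twice (format is an assumption).
    get uniqueId(): string {
        return `VideoDescriptionLoaderSketch_${this.url}`;
    }

    // Analyse the video once, then stream the description out chunk by chunk.
    async *getChunks(): AsyncGenerator<Chunk> {
        const description = await this.analyse(this.url);
        yield* chunkDescription(description, this.url, this.chunkSize);
    }
}

// Usage with a stub analyser, so no network call or API key is needed.
const stubAnalyse: AnalyseVideo = async () => 'A person walks a dog through a park. '.repeat(3);
const loader = new VideoDescriptionLoaderSketch('https://example.com/video.mp4', stubAnalyse, 40);

async function printChunks(): Promise<void> {
    for await (const chunk of loader.getChunks()) console.log(chunk.pageContent);
}
printChunks().catch(console.error);
```

Injecting the analyser as a function keeps the slow remote call out of the chunking logic, which makes the wiring testable without running a full analysis.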

Why it helps this project: it's the first multimodal/video-native option in embedJs — users can now build RAG apps over video without leaving the framework. It's fully opt-in and non-breaking — a new workspace package with its own dependency; no existing defaults or behaviour change.

Fixes # (no issue)

Type of change

New feature (non-breaking change which adds functionality)
Documentation update

How Has This Been Tested?

Built the package with Nx (nx build embedjs-twelvelabs) and exercised it against the live TwelveLabs API with a real key:

MarengoEmbeddings.getDimensions() returns 512; embedQuery and embedDocuments return real 512-dim vectors.
TwelveLabsVideoLoader construction, uniqueId and the chunk generator wiring are verified. Full Pegasus analysis of a public URL works server-side but is slow, so only the request wiring was verified here and a full end-to-end run is left to the reviewer.
Both classes throw a clear error when no API key is supplied (via constructor or TWELVELABS_API_KEY).
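The key lookup described in the last item can be sketched roughly as follows. This is an assumed shape, not the PR's actual code: `resolveApiKey` and `Env` are hypothetical names, the error wording is illustrative, and the environment is passed in as a plain object so the function can be tested without touching `process.env`.

```typescript
// Sketch only: hypothetical helper, not the implementation in this PR.
type Env = Record<string, string | undefined>;

// Prefer an explicitly passed key, then fall back to TWELVELABS_API_KEY.
function resolveApiKey(explicitKey: string | undefined, env: Env): string {
    const key = explicitKey?.trim() || env['TWELVELABS_API_KEY']?.trim();
    if (!key) {
        throw new Error(
            'TwelveLabs API key not found. Pass it to the constructor or set TWELVELABS_API_KEY.',
        );
    }
    return key;
}

// Usage: in Node.js the second argument would typically be process.env.
const fromConstructor = resolveApiKey('key-from-constructor', {});
const fromEnv = resolveApiKey(undefined, { TWELVELABS_API_KEY: 'key-from-env' });
console.log(fromConstructor, fromEnv);
```

Failing at construction time, rather than on the first request, surfaces a missing key before any video is submitted for analysis.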

You can grab a free API key at https://twelvelabs.io — there's a generous free tier.

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new tsc or eslint warnings
I have checked my code and corrected any misspellings

Adds a new opt-in @llm-tools/embedjs-twelvelabs package providing:
- MarengoEmbeddings: 512-dim multimodal embeddings via the Marengo model
- TwelveLabsVideoLoader: a data source that analyses a video with the Pegasus model and loads the description into the RAG pipeline

Includes documentation and navigation entries. No existing behaviour is changed.

sonarqubecloud · 2026-06-25T21:03:19Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

mohit-twelvelabs wants to merge 1 commit into llm-tools:main from mohit-twelvelabs:feat/twelvelabs-integration
