Text/Image Embeddings | Vector Search | Relevance Ranking
Supercharge your retrieval system with a deep understanding of natural language and imagery.
This is a high-performance semantic search module designed for the News Video Retrieval task. It provides a FastAPI backend API that lets users search a large video archive (~300 hours) with complex natural-language queries or example images.
This project was built to address two primary challenges in the field of video retrieval:
- Handling Massive Data Volumes: With up to 300 hours of video content, the number of frames to be analyzed and indexed can run into the millions, demanding a scalable and computationally efficient solution.
- Understanding Complex and Nuanced Queries: Users often describe events using abstract language, involving multiple objects and actions. Traditional keyword-based search systems fail to grasp the context and intent behind these queries.
To overcome these challenges, we implemented an intelligent and efficient processing pipeline:
- Semantic Vector Encoding: We use state-of-the-art Transformer models (such as CLIP variants) to map both queries (text or image) and database frames to numerical vectors in a shared high-dimensional embedding space.
- Intelligent Keyframe Filtering: To keep the data volume manageable, we add a pre-processing step that combines vector embeddings with a Sequential Filter algorithm. It removes redundant frames and keeps only the most semantically informative ones, reducing the number of keyframes to index from 1.2 million to 700,000.
- Similarity Search: In this vector space, semantically similar items (e.g., a photo of a dog and the text "a dog playing in the park") lie close together, so a search returns the database frames nearest to the query vector.
- High-Speed Retrieval: To keep queries fast, the system uses FAISS (Facebook AI Similarity Search) built with the NVIDIA cuVS backend (Faiss-cuVS), which delivers search speeds up to 10x faster than the standard faiss-gpu build. This allows near-instantaneous result retrieval.
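The README does not spell out how the Sequential Filter or the search step work internally. The sketch below is a hypothetical, NumPy-only illustration under two assumptions: the filter scans keyframes in temporal order and drops a frame whose cosine similarity to the last kept frame is at or above a threshold (`threshold=0.9` is an arbitrary example value), and search is an exhaustive top-k over L2-normalized embeddings, which is what a flat FAISS index computes. The names `sequential_filter` and `search` are illustrative, not the project's API.

```python
from __future__ import annotations

import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each vector (last axis) to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def sequential_filter(frame_embs: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Hypothetical redundancy filter: scan keyframes in temporal order and keep
    a frame only if its cosine similarity to the last *kept* frame is below
    `threshold`, i.e. it shows something new."""
    embs = l2_normalize(frame_embs)
    kept = [0]  # the first frame is always kept
    for i in range(1, len(embs)):
        if float(embs[i] @ embs[kept[-1]]) < threshold:
            kept.append(i)
    return kept


def search(query_emb: np.ndarray, db_embs: np.ndarray, k: int = 5) -> tuple[list[int], list[float]]:
    """Exhaustive top-k search by cosine similarity (the ranking a flat index returns)."""
    q = l2_normalize(query_emb)
    db = l2_normalize(db_embs)
    scores = db @ q                  # cosine similarity of the query to every frame
    top = np.argsort(-scores)[:k]    # indices of the k highest scores, best first
    return top.tolist(), scores[top].tolist()


# Toy demo: 3 orthogonal "shots", each captured as 3 near-identical keyframes.
rng = np.random.default_rng(0)
shots = np.eye(3, 8)
frames = np.repeat(shots, 3, axis=0) + 0.01 * rng.normal(size=(9, 8))

kept = sequential_filter(frames, threshold=0.9)
print(kept)  # [0, 3, 6]: one keyframe per shot survives

ids, scores = search(frames[4], frames, k=3)  # query with a frame from the second shot
print(ids)   # the second shot's frames (3, 4, 5), with frame 4 itself ranked first
```

On unit vectors the squared L2 distance equals 2 - 2·cos(a, b), so, provided embeddings are normalized before indexing, an L2 flat index (as the `faiss_gpu_index_flat_l2.py` filename suggests) ranks results exactly like this cosine search; FAISS's `IndexFlatL2` and `IndexFlatIP` run the same exhaustive scan on the GPU at scale.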
The results below were benchmarked after applying the Intelligent Keyframe Filtering (Sequential Filter), which reduced the number of indexed frames to ~266,000. The evaluation was performed on a set of manually crafted queries.
| Model Name | Response Time (s) | Recall@1 | Recall@5 | Recall@10 | Recall@20 | Recall@50 |
|---|---|---|---|---|---|---|
| align | 0.704 | 14.47% | 37.95% | 44.27% | 49.59% | 51.35% |
| coca-clip | 0.943 | 4.39% | 30.35% | 43.25% | 52.37% | 53.42% |
| apple-clip-384 | 0.943 | 15.53% | 47.46% | 50.88% | 54.74% | 57.89% |
| beit3 | 0.997 | 5.26% | 35.26% | 47.11% | 49.47% | 53.33% |
Observation: apple-clip-384 achieves the highest recall at every cutoff, most notably Recall@5 (47.46% vs. 37.95% for the next-best model, align), indicating a strong ability to rank the most relevant results near the top. align has the fastest response time (0.704 s).
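The README does not define how Recall@K was computed. A common convention, assumed here, counts a query as a hit when at least one of its ground-truth frames appears in the top K results, then averages over queries. A minimal standard-library sketch (the `recall_at_k` name and the set-of-frame-ids annotation format are hypothetical):

```python
from __future__ import annotations

from collections.abc import Sequence


def recall_at_k(ranked: Sequence[Sequence[int]],
                relevant: Sequence[set[int]],
                k: int) -> float:
    """Fraction of queries with at least one relevant frame in the top-k results.

    ranked[i]   -- frame ids returned for query i, best first
    relevant[i] -- ground-truth frame ids for query i (assumed annotation format)
    """
    if len(ranked) != len(relevant):
        raise ValueError("ranked and relevant must have one entry per query")
    if not ranked:
        return 0.0
    hits = sum(1 for res, gt in zip(ranked, relevant) if gt.intersection(res[:k]))
    return hits / len(ranked)


# Three toy queries with their ranked results and ground truth.
ranked = [[7, 3, 9, 1], [4, 8, 2, 6], [5, 0, 11, 12]]
relevant = [{3}, {6}, {10}]
for k in (1, 2, 4):
    print(f"Recall@{k}: {recall_at_k(ranked, relevant, k):.2%}")
# Recall@1: 0.00%
# Recall@2: 33.33%
# Recall@4: 66.67%
```

Under this definition a hit at K is also a hit at every larger K, so Recall@K never decreases as K grows, consistent with each row of the table above.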
Prerequisites: conda and pip.
Install in 2 steps:
```bash
# 1) Create conda environment from environment.yml
conda env create --file=environment.yml
# 2) Install project wheel
pip install ./lib/dist/uniml-0.1-py3-none-any.whl
```

Below is a quick demo screenshot and a short example showing image-based retrieval.
Below is a concise, copy-paste friendly project tree:
```text
📦semantic
┣ 📂data
┣ 📂lib
┣ 📂notebook
┃ ┣ 📜algo.ipynb
┃ ┣ 📜frames_info.txt
┃ ┣ ...
┣ 📂src
┃ ┣ 📂app
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┣ 📜api.py
┃ ┃ ┗ 📜schema.py
┃ ┣ 📂common
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┣ 📜path_loader.py
┃ ┃ ┣ 📜registry.py
┃ ┃ ┗ 📜utils.py
┃ ┣ 📂indexer
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┣ 📜base.py
┃ ┃ ┣ 📜faiss_gpu_index_flat_l2.py
┃ ┃ ┗ 📜rmm_manager.py
┃ ┣ 📂searcher
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┣ 📜base.py
┃ ┃ ┣ 📜fusion_sematic_searcher.py
┃ ┃ ┗ 📜single_semantic_searcher.py
┃ ┣ 📂semantic_extractor
┃ ┃ ┣ 📜__init__.py
┃ ┃ ┣ 📜align_extractor.py
┃ ┃ ┣ 📜apple_clip_384_extractor.py
┃ ┃ ┣ ...
┃ ┣ 📜__init__.py
┃ ┣ 📜__main__.py
┃ ┣ 📜concat_npy.py
┃ ┣ 📜create_mapping_file.py
┃ ┣ 📜demo.py
┃ ┣ 📜evaluating.py
┃ ┣ 📜indexing.py
┃ ┗ 📜remove_duplicate_frames.py
┣ 📜.env
┣ 📜.gitignore
┣ 📜README.md
┣ 📜environment.yml
┣ 📜evaluating.sh
┣ 📜evaluation.txt
┣ 📜evaluation_after_rm_duplicate_frames.txt
┣ 📜evaluation_after_rm_outlier.txt
┣ 📜indexing.sh
┣ 📜log.txt
┣ 📜rmm_log.txt
┗ 📜run.sh
```
Short notes:
- Place model wrappers in `src/semantic_extractor` for a common interface.
- Keep FAISS index code under `src/indexer` and high-level logic in `src/indexing.py`.
- Store mapping JSONs and indices under `data/` for reproducibility.
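One way to realize that common interface, sketched here under assumptions since the README does not show the code: each wrapper subclasses an abstract base with `encode_text`/`encode_image` methods and registers itself by model name (the role `src/common/registry.py` appears to play). All names below (`BaseExtractor`, `register`, `build_extractor`, `ToyExtractor`, `EXTRACTORS`) are hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Callable

# Hypothetical registry mapping a model name (e.g. "apple-clip-384") to its wrapper class.
EXTRACTORS: dict[str, type[BaseExtractor]] = {}


def register(name: str) -> Callable[[type[BaseExtractor]], type[BaseExtractor]]:
    """Class decorator that records an extractor class under `name`."""
    def wrap(cls: type[BaseExtractor]) -> type[BaseExtractor]:
        if name in EXTRACTORS:
            raise KeyError(f"extractor {name!r} is already registered")
        EXTRACTORS[name] = cls
        return cls
    return wrap


class BaseExtractor(ABC):
    """Common interface every model wrapper implements."""

    dim: int  # embedding dimensionality

    @abstractmethod
    def encode_text(self, texts: list[str]) -> list[list[float]]:
        """Return one embedding per input string."""

    @abstractmethod
    def encode_image(self, image_paths: list[str]) -> list[list[float]]:
        """Return one embedding per image file."""


def build_extractor(name: str) -> BaseExtractor:
    """Instantiate a registered extractor by name (e.g. from a CLI flag or .env value)."""
    if name not in EXTRACTORS:
        raise ValueError(f"unknown model {name!r}; choose from {sorted(EXTRACTORS)}")
    return EXTRACTORS[name]()


@register("toy")
class ToyExtractor(BaseExtractor):
    """Placeholder wrapper for wiring tests; a real one would load a model checkpoint."""

    dim = 4

    def encode_text(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] * self.dim for t in texts]

    def encode_image(self, image_paths: list[str]) -> list[list[float]]:
        return [[0.0] * self.dim for _ in image_paths]


ext = build_extractor("toy")
print(ext.encode_text(["ab", "abc"]))  # [[2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0]]
```

With this layout, adding a model would mean dropping one file into `src/semantic_extractor/` and decorating its class, while indexing and search code only call `build_extractor(name)`.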

