tamchamchi/semantic-search

✨ Semantic Search Module ✨

Text/Image Embeddings | Vector Search | Relevance Ranking
Supercharge your retrieval system with a deep understanding of natural language and imagery.


📖 Description

This is a high-performance semantic search module, specifically designed for the News Video Retrieval task. The project provides a powerful backend API, built on FastAPI, allowing users to search through a massive video archive (~300 hours) using complex natural language queries or sample images.

The Challenge

This project was built to address two primary challenges in the field of video retrieval:

  1. Handling Massive Data Volumes: With up to 300 hours of video content, the number of frames to be analyzed and indexed can run into the millions, demanding a scalable and computationally efficient solution.
  2. Understanding Complex and Nuanced Queries: Users often describe events using abstract language, involving multiple objects and actions. Traditional keyword-based search systems fail to grasp the context and intent behind these queries.

Our Approach

To overcome these challenges, we implemented an intelligent and efficient processing pipeline:

  1. Semantic Vector Encoding: We leverage state-of-the-art Transformer models (such as CLIP variants) to convert both queries (text or image) and database frames into numerical vectors within a high-dimensional space.

  2. Intelligent Keyframe Filtering: To manage the data volume, we apply a unique pre-processing step. By combining vector embeddings with a Sequential Filter algorithm, we successfully reduced the number of keyframes for indexing from 1.2 million down to 700,000, eliminating redundant frames and retaining only those with the highest semantic value.

  3. Similarity Search: In this vector space, semantically similar items (e.g., a photo of a dog and the text "a dog playing in the park") are positioned closely together. The system performs searches based on geometric proximity within this space.

  4. High-Speed Retrieval: To ensure query speed, the system utilizes FAISS (Facebook AI Similarity Search). Notably, we integrated Faiss-cuVS, a build of Faiss backed by NVIDIA's cuVS library, which delivered search speeds up to 10x faster than standard faiss-gpu in our tests. This allows for near-instantaneous result retrieval.
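The encode-then-search idea in the steps above can be sketched with a small brute-force stand-in (NumPy in place of FAISS, random vectors in place of real CLIP embeddings — an illustration of the principle, not the project's actual pipeline):

```python
import numpy as np

def normalize(v):
    # L2-normalize rows so that inner product equals cosine similarity
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
db = normalize(rng.normal(size=(1000, 512)).astype("float32"))  # frame embeddings
query = db[42] + 0.01 * rng.normal(size=512).astype("float32")  # query near frame 42
query = normalize(query)

scores = db @ query             # cosine similarity against every frame
top5 = np.argsort(-scores)[:5]  # indices of the 5 geometrically closest frames
assert top5[0] == 42            # the semantically nearest frame ranks first
```

At ~700,000 indexed frames this brute-force scan is replaced by a FAISS index, which performs the same nearest-neighbor lookup on the GPU.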

📊 Benchmarks

The results below were benchmarked after applying the Intelligent Keyframe Filtering (Sequential Filter), which reduced the number of indexed frames to ~266,000. The evaluation was performed on a set of manually crafted queries.

| Model Name     | Response Time (s) | Recall@1 | Recall@5 | Recall@10 | Recall@20 | Recall@50 |
|----------------|-------------------|----------|----------|-----------|-----------|-----------|
| align          | 0.704             | 14.47%   | 37.95%   | 44.27%    | 49.59%    | 51.35%    |
| coca-clip      | 0.943             | 4.39%    | 30.35%   | 43.25%    | 52.37%    | 53.42%    |
| apple-clip-384 | 0.943             | 15.53%   | 47.46%   | 50.88%    | 54.74%    | 57.89%    |
| beit3          | 0.997             | 5.26%    | 35.26%   | 47.11%    | 49.47%    | 53.33%    |

Observation: The apple-clip-384 model demonstrates superior performance across most Recall metrics, especially at Recall@5 and Recall@10, indicating its strong ability to rank the most relevant results at the top.
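For reference, Recall@k counts a query as a hit when its ground-truth frame appears among the top-k returned results. A minimal sketch of the computation, using made-up result lists rather than real evaluation data:

```python
def recall_at_k(ranked_ids, ground_truth, k):
    """Fraction of queries whose ground-truth frame appears in the top-k results."""
    hits = sum(gt in ids[:k] for ids, gt in zip(ranked_ids, ground_truth))
    return hits / len(ground_truth)

# Hypothetical ranked result lists for three queries
ranked = [[7, 3, 9], [1, 5, 2], [4, 8, 6]]
truth = [3, 2, 0]  # correct frame id per query

print(recall_at_k(ranked, truth, 1))  # 0.0  — no query's answer is ranked first
print(recall_at_k(ranked, truth, 3))  # ~0.67 — two of three answers are in the top-3
```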

📦 Installation

Prerequisites: conda and pip.

Install in 2 steps:

```shell
# 1) Create the conda environment from environment.yml
conda env create --file=environment.yml

# 2) Install the project wheel
pip install ./lib/dist/uniml-0.1-py3-none-any.whl
```

🖼️ Demo

Below is a quick demo screenshot and a short example showing image-based retrieval.

Demo screenshot
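An image-based query against the backend might look like the following. The endpoint path and payload fields here are illustrative assumptions, not the actual schema defined in src/app/schema.py:

```python
import base64
import json
import urllib.request

# Hypothetical client call to the FastAPI backend (assumed to run on port 8000).
image_bytes = b"\x89PNG..."  # placeholder for the raw bytes of the query image
payload = {
    "image": base64.b64encode(image_bytes).decode("ascii"),
    "top_k": 10,
}
req = urllib.request.Request(
    "http://localhost:8000/search/image",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would return the ranked keyframes once the API is up
```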

📂 Project Structure

Below is a concise, copy-paste friendly project tree:

📦semantic
 ┣ 📂data
 ┣ 📂lib
 ┣ 📂notebook
 ┃ ┣ 📜algo.ipynb
 ┃ ┣ 📜frames_info.txt
 ┃ ┣ ...
 ┣ 📂src
 ┃ ┣ 📂app
 ┃ ┃ ┣ 📜__init__.py
 ┃ ┃ ┣ 📜api.py
 ┃ ┃ ┗ 📜schema.py
 ┃ ┣ 📂common
 ┃ ┃ ┣ 📜__init__.py
 ┃ ┃ ┣ 📜path_loader.py
 ┃ ┃ ┣ 📜registry.py
 ┃ ┃ ┗ 📜utils.py
 ┃ ┣ 📂indexer
 ┃ ┃ ┣ 📜__init__.py
 ┃ ┃ ┣ 📜base.py
 ┃ ┃ ┣ 📜faiss_gpu_index_flat_l2.py
 ┃ ┃ ┗ 📜rmm_manager.py
 ┃ ┣ 📂searcher
 ┃ ┃ ┣ 📜__init__.py
 ┃ ┃ ┣ 📜base.py
 ┃ ┃ ┣ 📜fusion_sematic_searcher.py
 ┃ ┃ ┗ 📜single_semantic_searcher.py
 ┃ ┣ 📂semantic_extractor
 ┃ ┃ ┣ 📜__init__.py
 ┃ ┃ ┣ 📜align_extractor.py
 ┃ ┃ ┣ 📜apple_clip_384_extractor.py
 ┃ ┃ ┣ ...
 ┃ ┣ 📜__init__.py
 ┃ ┣ 📜__main__.py
 ┃ ┣ 📜concat_npy.py
 ┃ ┣ 📜create_mapping_file.py
 ┃ ┣ 📜demo.py
 ┃ ┣ 📜evaluating.py
 ┃ ┣ 📜indexing.py
 ┃ ┗ 📜remove_duplicate_frames.py
 ┣ 📜.env
 ┣ 📜.gitignore
 ┣ 📜README.md
 ┣ 📜environment.yml
 ┣ 📜evaluating.sh
 ┣ 📜evaluation.txt
 ┣ 📜evaluation_after_rm_duplicate_frames.txt
 ┣ 📜evaluation_after_rm_outlier.txt
 ┣ 📜indexing.sh
 ┣ 📜log.txt
 ┣ 📜rmm_log.txt
 ┗ 📜run.sh

Short notes:

  • Place model wrappers in src/semantic_extractor for a common interface.
  • Keep FAISS index code under src/indexer and high-level logic in src/indexing.py.
  • Store mapping JSONs and indices under data/ for reproducibility.
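A minimal shape for that common extractor interface might look like this. The class names and the dummy implementation are hypothetical, written only to illustrate the wrapper pattern:

```python
from abc import ABC, abstractmethod

import numpy as np

class BaseExtractor(ABC):
    """Hypothetical shared interface for the wrappers in src/semantic_extractor."""

    @abstractmethod
    def encode_text(self, texts):
        """Return an (n, dim) float32 array of text embeddings."""

    @abstractmethod
    def encode_image(self, images):
        """Return an (n, dim) float32 array of image embeddings."""

class DummyExtractor(BaseExtractor):
    # Toy stand-in for a real CLIP wrapper: reproducible random vectors.
    dim = 8

    def encode_text(self, texts):
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(texts), self.dim)).astype("float32")

    def encode_image(self, images):
        rng = np.random.default_rng(1)
        return rng.normal(size=(len(images), self.dim)).astype("float32")

vecs = DummyExtractor().encode_text(["a dog", "a cat"])
assert vecs.shape == (2, 8)
```

With every model behind the same two methods, the indexer and searcher can swap between align, apple-clip-384, beit3, etc. without code changes.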

About

Semantic video retrieval system using CLIP-based embeddings and FAISS vector search for fast text or image queries over large video archives.
