Janadasroor/viora-ai

Multi-Modal Embedding Service

Python service for generating multi-modal embeddings for social media content.

Features

  • CLIP Visual Embeddings (512-dim)
  • Text Embeddings (768-dim) from captions and OCR
  • OCR Extraction using EasyOCR
  • NSFW Classification and Content Type Detection
  • Video Support with frame extraction
  • Batch Processing

Quick Start

Local Development

pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000

API Endpoints

Health Check

GET /health
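A minimal sketch of probing the health endpoint from Python using only the standard library. The base URL `http://localhost:8000` follows the Quick Start command; the helper names `health_url` and `is_healthy` are hypothetical, and the response body format is not documented here, so only the HTTP status is checked.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL, tolerating a trailing slash on the base."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise.

    Assumption: a 200 status means the service is up. The body is ignored
    because its schema is not specified in this README.
    """
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


# Example (requires the service to be running):
# print(is_healthy())
```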

Multi-Modal Extraction

POST /extract-multimodal

  • files: One or more image or video files
  • caption: Optional caption text
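The request above could be assembled roughly as follows. This is a sketch under assumptions: it supposes the endpoint accepts `multipart/form-data` with the field names `files` and `caption` listed above, which the README does not state explicitly. The helpers `encode_multipart` and `build_extract_request` are hypothetical and use only the standard library.

```python
import mimetypes
import urllib.request
import uuid
from typing import Iterable, Optional, Tuple


def encode_multipart(
    files: Iterable[Tuple[str, bytes]],
    caption: Optional[str] = None,
    boundary: Optional[str] = None,
) -> Tuple[bytes, str]:
    """Encode (filename, content) pairs and an optional caption as form data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    if caption is not None:
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="caption"\r\n\r\n'
                f"{caption}\r\n"
            ).encode("utf-8")
        )
    for filename, data in files:
        # Guess the MIME type from the file extension; fall back to raw bytes.
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_extract_request(
    base_url: str,
    files: Iterable[Tuple[str, bytes]],
    caption: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /extract-multimodal request."""
    body, content_type = encode_multipart(files, caption)
    return urllib.request.Request(
        base_url.rstrip("/") + "/extract-multimodal",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Example (requires the service to be running):
# req = build_extract_request("http://localhost:8000", [("cat.jpg", open("cat.jpg", "rb").read())], "my cat")
# print(urllib.request.urlopen(req).read())
```

Passing several `(filename, bytes)` pairs repeats the `files` field once per file, which is the usual way to submit a batch in a single form.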

Legacy Endpoints

  • POST /extract-features (Images)
  • POST /extract-features-video (Videos)
  • POST /extract-features-text (Text only)
  • POST /extract-ocr (OCR only)
  • POST /classify-nsfw (NSFW only)
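For clients that still call the legacy routes, a small lookup like the one below can choose the feature-extraction endpoint from a file's extension. The paths come from the list above; the helper `legacy_endpoint_for` and the idea of dispatching on MIME type are illustrative assumptions, not part of the service.

```python
import mimetypes
from typing import Optional

# Legacy feature-extraction routes, keyed by the top-level MIME type they accept.
LEGACY_FEATURE_ENDPOINTS = {
    "image": "/extract-features",
    "video": "/extract-features-video",
}


def legacy_endpoint_for(filename: str) -> Optional[str]:
    """Return the legacy feature endpoint for a media file, or None if unsupported."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        return None
    # "image/png" -> "image", "video/mp4" -> "video"
    return LEGACY_FEATURE_ENDPOINTS.get(mime.split("/", 1)[0])
```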

Configuration

Environment variables:

  • QDRANT_URL: Qdrant connection string
  • MEDIA_STORAGE_PATH: Path to media files
  • USE_GPU: Enable GPU acceleration (default: false)
  • PORT: Service port (default: 8000)
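One way these variables might be read at startup is sketched below. The `Settings` class and `load_settings` function are hypothetical names, and the set of strings accepted as true for `USE_GPU` is an assumption; the service's own parsing may differ. Only the defaults for `USE_GPU` and `PORT` come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    qdrant_url: Optional[str]          # QDRANT_URL, no documented default
    media_storage_path: Optional[str]  # MEDIA_STORAGE_PATH, no documented default
    use_gpu: bool                      # USE_GPU, default false
    port: int                          # PORT, default 8000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented environment variables, applying the stated defaults."""
    return Settings(
        qdrant_url=env.get("QDRANT_URL"),
        media_storage_path=env.get("MEDIA_STORAGE_PATH"),
        # Assumption: "1", "true", and "yes" (case-insensitive) enable the GPU.
        use_gpu=env.get("USE_GPU", "false").strip().lower() in {"1", "true", "yes"},
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.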

Models

  • CLIP: openai/clip-vit-base-patch32
  • Text: sentence-transformers/all-mpnet-base-v2
  • NSFW: JanadaSroor/vit-nsfw-classifier

Performance (CPU)

  • Image: ~500 ms
  • OCR: ~1-2 s
  • Text: ~50 ms
  • Video: ~5-10 s (10 frames)

License

Apache License 2.0.

About

Python-based multi-modal embedding service for social media content, providing image, video, and text embeddings with OCR, NSFW detection, and batch processing support for AI-driven feeds and recommendations.