PlanetRead · mishradev1 · May 4, 2026 · May 4, 2026
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,70 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Virtual environments
+venv/
+env/
+.env/
+.venv/
+
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+.tox/
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Model cache
+models/cache/
+*.h5
+*.tflite
+
+# Media files (test inputs/outputs)
+*.mp4
+*.avi
+*.mkv
+*.mov
+*.wav
+*.mp3
+*.srt
+!tests/fixtures/*.srt
+
+# Logs
+*.log
+logs/
+
+# Jupyter
+.ipynb_checkpoints/
diff --git a/README.md b/README.md
@@ -0,0 +1,150 @@
+# Intelligent Closed Caption (CC) Suggestion Tool
+
+An AI-powered tool that identifies moments in a video where a Closed Caption (CC) annotation is genuinely necessary, such as when a non-speech audio event meaningfully affects the speakers or the scene, and suggests contextually relevant CC text without over-captioning routine or low-impact sounds.
+
+## Architecture
+
+```
+┌──────────────┐    ┌──────────────────┐    ┌──────────────────────┐
+│  Video File  │───▶│ Audio Extractor  │───▶│ Sound Event Detector │
+│  (input)     │    │ (ffmpeg/moviepy) │    │ (YAMNet)             │
+└──────┬───────┘    └──────────────────┘    └──────────┬───────────┘
+       │                                               │
+       │            ┌──────────────────┐               │
+       └───────────▶│ Frame Extractor  │               │
+                    │ (OpenCV)         │               │
+                    └────────┬─────────┘               │
+                             │                         │
+                    ┌────────▼─────────┐               │
+                    │ Reaction Detector│               │
+                    │ (MediaPipe)      │               │
+                    └────────┬─────────┘               │
+                             │                         │
+                    ┌────────▼─────────────────────────▼─┐
+                    │      CC Decision Engine            │
+                    │  Combines audio + visual signals   │
+                    └────────────────┬───────────────────┘
+                                     │
+                            ┌────────▼────────┐
+                            │  SRT Generator  │
+                            └────────┬────────┘
+                                     │
+                            ┌────────▼────────┐
+                            │  output.srt     │
+                            └─────────────────┘
+```
+
+## Features
+
+- **Sound Event Detection** — Automatically detects and classifies non-speech audio events (honking, explosions, laughter, music, alarms, applause, etc.) with confidence scores and timestamps using YAMNet.
+- **Speaker Reaction Detection** — Analyzes video frames at detected event timestamps using MediaPipe to identify visible reactions (head turns, startled body language, facial expressions).
+- **Intelligent CC Decisions** — Combines audio and visual signals to determine whether a CC annotation is truly warranted, avoiding over-captioning of ambient sounds.
+- **SRT Output** — Generates standard SRT subtitle files with properly formatted timestamps and descriptive CC labels like `[honking]`, `[crowd cheering]`, `[gunshot]`.
+
+## Prerequisites
+
+- **Python 3.9+**
+- **FFmpeg** — Must be installed and available on your system PATH
+  - Windows: `choco install ffmpeg` or download from [ffmpeg.org](https://ffmpeg.org/download.html)
+  - macOS: `brew install ffmpeg`
+  - Linux: `sudo apt install ffmpeg`
+
+## Installation
+
+1. **Clone the repository**
+   ```bash
+   git clone https://github.com/PlanetRead/Intelligent-cc-generation.git
+   cd Intelligent-cc-generation
+   ```
+
+2. **Create a virtual environment**
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # Linux/macOS
+   venv\Scripts\activate     # Windows
+   ```
+
+3. **Install dependencies**
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+4. **Install in development mode** (optional)
+   ```bash
+   pip install -e .
+   ```
+
+## Usage
+
+### Extract audio from a video file
+```python
+from src.utils.audio_extractor import AudioExtractor
+
+extractor = AudioExtractor()
+audio_path = extractor.extract("input_video.mp4")
+print(f"Audio saved to: {audio_path}")
+```
+
+### Full pipeline (coming soon)
+```bash
+python -m src.cli --input video.mp4 --output captions.srt
+```
+
+## Running Tests
+
+```bash
+pytest tests/ -v
+```
+
+## Project Structure
+
+```
+Intelligent-cc-generation/
+├── src/
+│   ├── __init__.py
+│   ├── cli.py                     # CLI entry point
+│   ├── utils/
+│   │   ├── __init__.py
+│   │   └── audio_extractor.py     # Video → Audio extraction
+│   ├── detectors/
+│   │   ├── __init__.py
+│   │   ├── sound_event_detector.py  # YAMNet-based audio analysis
+│   │   └── reaction_detector.py     # MediaPipe-based visual analysis
+│   ├── models/
+│   │   ├── __init__.py
+│   │   ├── event.py               # SoundEvent dataclass
+│   │   ├── reaction.py            # ReactionEvent dataclass
+│   │   └── cc_suggestion.py       # CCSuggestion dataclass
+│   ├── engine/
+│   │   ├── __init__.py
+│   │   └── decision_engine.py     # CC decision combiner
+│   └── output/
+│       ├── __init__.py
+│       └── srt_generator.py       # SRT file writer
+├── config/
+│   └── settings.py                # Configuration defaults
+├── tests/
+│   ├── __init__.py
+│   ├── test_audio_extractor.py
+│   └── fixtures/
+├── requirements.txt
+├── setup.py
+├── .gitignore
+└── README.md
+```
+
+## Tech Stack
+
+| Component | Technology |
+|-----------|-----------|
+| Language | Python 3.9+ |
+| Audio Event Detection | [YAMNet](https://tfhub.dev/google/yamnet/1) (TensorFlow Hub) |
+| Frame Extraction | [OpenCV](https://opencv.org/) |
+| Pose & Expression Analysis | [MediaPipe](https://mediapipe.dev/) |
+| Audio Extraction | [FFmpeg](https://ffmpeg.org/) via moviepy |
+| Output Format | SRT (SubRip Subtitle) |
+
+
+## License
+
+This project is part of the [PlanetRead](https://www.planetread.org/) initiative under the DMP 2026 program.
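Review note on the "SRT Output" feature in the README: `src/output/srt_generator.py` is not part of this diff, so the following is only a minimal sketch of what "properly formatted timestamps" means in SRT (`HH:MM:SS,mmm`, with a comma before the milliseconds). The function names `format_srt_timestamp` and `format_srt_cue` are hypothetical and do not come from the repository.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    # Work in integer milliseconds to avoid float rounding artifacts.
    total_ms = max(0, int(round(seconds * 1000)))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_srt_cue(index: int, start: float, end: float, label: str) -> str:
    """Build one SRT cue: index line, time range line, bracketed CC label."""
    time_range = f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}"
    return f"{index}\n{time_range}\n[{label}]\n"


if __name__ == "__main__":
    cues = [
        format_srt_cue(1, 12.0, 14.0, "honking"),
        format_srt_cue(2, 3661.5, 3663.5, "crowd cheering"),
    ]
    # Cues in an SRT file are separated by a blank line.
    print("\n".join(cues))
```

Working in integer milliseconds keeps values such as `59.9996` from rendering as `59,1000`; the real generator may handle this differently.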
diff --git a/config/__init__.py b/config/__init__.py
@@ -0,0 +1 @@
+"""Configuration package."""
diff --git a/config/settings.py b/config/settings.py
@@ -0,0 +1,115 @@
+"""Configuration settings for the Intelligent CC Suggestion Tool."""
+
+import os
+
+
+# =============================================================================
+# Audio Extraction Settings
+# =============================================================================
+
+# Default audio sample rate for extracted audio (Hz)
+AUDIO_SAMPLE_RATE = 16000
+
+# Default audio format for extracted files
+AUDIO_FORMAT = "wav"
+
+# Default output directory for extracted audio files
+AUDIO_OUTPUT_DIR = os.path.join(os.getcwd(), "output", "audio")
+
+
+# =============================================================================
+# Sound Event Detection Settings
+# =============================================================================
+
+# Minimum confidence threshold for a sound event to be considered
+SOUND_CONFIDENCE_THRESHOLD = 0.3
+
+# Analysis window size in seconds for the sound event detector
+ANALYSIS_WINDOW_SIZE = 0.96  # YAMNet default patch size
+
+# Hop length between analysis windows in seconds
+ANALYSIS_HOP_LENGTH = 0.48
+
+# Non-speech event categories to detect (YAMNet class names)
+# Full list: https://github.com/tensorflow/models/blob/master/research/audioset/yamnet/yamnet_class_map.csv
+TARGET_SOUND_EVENTS = [
+    "Gunshot, gunfire",
+    "Explosion",
+    "Glass",
+    "Breaking",
+    "Siren",
+    "Car alarm",
+    "Vehicle horn, car horn, honking",
+    "Screaming",
+    "Crying, sobbing",
+    "Laughter",
+    "Applause",
+    "Cheering",
+    "Crowd",
+    "Dog",
+    "Thunder",
+    "Alarm",
+    "Bell",
+    "Door",
+    "Knock",
+    "Telephone",
+    "Music",
+    "Singing",
+    "Drum",
+    "Fire",
+    "Water",
+    "Rain",
+    "Wind",
+]
+
+
+# =============================================================================
+# Reaction Detection Settings
+# =============================================================================
+
+# Number of frames to extract around each event timestamp
+REACTION_FRAME_COUNT = 10
+
+# Time window (seconds) before and after event to look for reactions
+REACTION_TIME_WINDOW = 1.5
+
+# Minimum confidence for a reaction to be considered significant
+REACTION_CONFIDENCE_THRESHOLD = 0.4
+
+# Head turn angle threshold (degrees) to consider as a reaction
+HEAD_TURN_THRESHOLD = 15.0
+
+# Pose change threshold (normalized) for startled body language
+POSE_CHANGE_THRESHOLD = 0.1
+
+
+# =============================================================================
+# CC Decision Engine Settings
+# =============================================================================
+
+# Weight for audio event confidence in the final decision
+AUDIO_WEIGHT = 0.6
+
+# Weight for visual reaction confidence in the final decision
+VISUAL_WEIGHT = 0.4
+
+# Combined confidence threshold for generating a CC annotation
+CC_DECISION_THRESHOLD = 0.5
+
+# Minimum duration (seconds) between consecutive CC annotations
+# to avoid overwhelming the viewer
+MIN_CC_GAP = 2.0
+
+
+# =============================================================================
+# Output Settings
+# =============================================================================
+
+# Default output format
+OUTPUT_FORMAT = "srt"
+
+# Default output directory for generated subtitle files
+OUTPUT_DIR = os.path.join(os.getcwd(), "output")
+
+# Default CC display duration (seconds) if not determined by event duration
+DEFAULT_CC_DURATION = 2.0
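Review note on the CC Decision Engine settings: `src/engine/decision_engine.py` is not included in this diff, so how the weights are actually combined is unknown. The sketch below assumes a plain weighted sum of audio and visual confidence, followed by the `MIN_CC_GAP` spacing rule. The `Candidate` class and both function names are hypothetical; only the four constants are taken from `config/settings.py`.

```python
from dataclasses import dataclass
from typing import List

# Values copied from config/settings.py in this diff.
AUDIO_WEIGHT = 0.6
VISUAL_WEIGHT = 0.4
CC_DECISION_THRESHOLD = 0.5
MIN_CC_GAP = 2.0


@dataclass
class Candidate:
    """Hypothetical stand-in for a sound event paired with its reaction score."""
    label: str
    start: float              # event start time in seconds
    audio_confidence: float   # 0..1, e.g. from the sound event detector
    visual_confidence: float  # 0..1, e.g. from the reaction detector


def combined_score(c: Candidate) -> float:
    """Assumed combination rule: weighted sum of the two confidences."""
    return AUDIO_WEIGHT * c.audio_confidence + VISUAL_WEIGHT * c.visual_confidence


def select_captions(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates that pass the threshold and respect the minimum gap."""
    kept: List[Candidate] = []
    for c in sorted(candidates, key=lambda item: item.start):
        if combined_score(c) < CC_DECISION_THRESHOLD:
            continue  # too weak overall
        if kept and c.start - kept[-1].start < MIN_CC_GAP:
            continue  # too close to the previously kept caption
        kept.append(c)
    return kept


if __name__ == "__main__":
    demo = [
        Candidate("honking", 1.0, 0.9, 0.5),
        Candidate("wind", 1.5, 0.4, 0.0),
        Candidate("laughter", 2.0, 0.8, 0.6),
        Candidate("applause", 5.0, 0.7, 0.3),
    ]
    for c in select_captions(demo):
        print(f"{c.start:.1f}s [{c.label}] score={combined_score(c):.2f}")
```

Under this assumed rule, an event with no visible reaction passes the 0.5 threshold only when its audio confidence exceeds roughly 0.83 (0.5 / 0.6), which is worth keeping in mind when tuning `SOUND_CONFIDENCE_THRESHOLD` and `CC_DECISION_THRESHOLD` together.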
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,20 @@
+# Core dependencies
+moviepy>=1.0.3
+numpy>=1.24.0
+
+# Audio/Video processing
+librosa>=0.10.0
+soundfile>=0.12.0
+opencv-python>=4.8.0
+
+# ML Models
+tensorflow>=2.13.0
+tensorflow-hub>=0.14.0
+mediapipe>=0.10.0
+
+# Testing
+pytest>=7.4.0
+pytest-cov>=4.1.0
+
+# Utilities
+pydub>=0.25.1