
Add offline speech-to-text transcription using faster-whisper#20

Open
jonocodes wants to merge 4 commits into main from claude/add-speech-to-text-e61Q7

Conversation

@jonocodes (Owner)

Summary

Adds optional offline speech-to-text (STT) transcription that automatically generates VTT subtitles for media items that don't already have them. Uses the faster-whisper library to run Whisper models locally, with no external API calls.

Key Changes

  • New transcription service (media/service/transcribe.py):

    • Core transcribe() function that loads a Whisper model, transcribes audio/video to VTT format, and explicitly frees model memory after use
    • _pick_device_and_compute() for auto-detecting GPU availability and optimal compute type
    • _write_vtt() to format transcription segments as valid WebVTT files
    • _format_timestamp() for VTT-compliant timestamp formatting (HH:MM:SS.mmm)
    • TranscriptionResult dataclass to return transcription metadata (language, duration, output path)
  • New Huey task (media/tasks.py):

    • transcribe_media() task that runs after media download if subtitles are missing and STT is enabled
    • Chains to summary generation if configured
    • Logs timing information for cost tracking
    • Gracefully handles errors without failing the media item
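The task's control flow (skip when subtitles exist, never fail the item, log timing, chain on success) could look roughly like this sketch. The lookup and service call are abstracted as parameters; `run_transcription`, `transcribe_fn`, and `on_success` are stand-ins for illustration, not the PR's actual signatures:

```python
# Hedged sketch of the Huey task's error handling and chaining.
import logging
import time

logger = logging.getLogger(__name__)


def run_transcription(item: dict, transcribe_fn, on_success=None) -> str:
    """Transcribe one item; a failure leaves it READY without subtitles."""
    if item.get("subtitle_path"):
        return "skipped"  # subtitles already present, nothing to do
    start = time.monotonic()
    try:
        subtitle_path = transcribe_fn(item["media_path"])
    except Exception:
        logger.exception("transcription failed for %s", item.get("guid"))
        return "failed"  # graceful degradation: item stays READY
    item["subtitle_path"] = subtitle_path
    logger.info("transcribed %s in %.1fs", item.get("guid"),
                time.monotonic() - start)  # timing for cost tracking
    if on_success:
        on_success(item)  # e.g. chain to summary generation
    return "ok"
```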
  • Integration with media processing:

    • Modified process_media() to enqueue transcription when subtitles are absent and STASHCAST_STT_MODEL is configured
    • Modified process_media_batch() to apply the same logic for batch operations
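The enqueue condition shared by both code paths reduces to a small predicate; this helper name is hypothetical, but the logic matches the description above:

```python
# Hedged sketch: enqueue STT only when a model is configured and no
# subtitles exist yet (mirrors the process_media / batch condition).
from typing import Optional


def should_enqueue_transcription(stt_model: str,
                                 subtitle_path: Optional[str]) -> bool:
    """True iff STASHCAST_STT_MODEL is non-empty and subtitles are absent."""
    return bool(stt_model) and not subtitle_path
```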
  • Management command (media/management/commands/transcribe.py):

    • Standalone CLI tool to manually transcribe files with configurable model, language, device, and compute type
  • Configuration (stashcast/settings.py):

    • STASHCAST_STT_MODEL: Model size selection (tiny, base, small, medium, large-v3); empty to disable
    • STASHCAST_STT_LANGUAGE: Language code or None for auto-detection
    • STASHCAST_STT_DEVICE: Device selection (auto, cpu, cuda)
    • STASHCAST_STT_COMPUTE_TYPE: Compute precision (auto, int8, float16, float32)
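Read from the environment, the settings block might look like this sketch (the setting names are from the PR; the defaults and env-var sourcing are assumptions):

```python
# stashcast/settings.py -- hedged sketch of the STT settings.
import os

STASHCAST_STT_MODEL = os.environ.get("STASHCAST_STT_MODEL", "")  # empty disables STT
STASHCAST_STT_LANGUAGE = os.environ.get("STASHCAST_STT_LANGUAGE") or None  # None = auto-detect
STASHCAST_STT_DEVICE = os.environ.get("STASHCAST_STT_DEVICE", "auto")  # auto | cpu | cuda
STASHCAST_STT_COMPUTE_TYPE = os.environ.get("STASHCAST_STT_COMPUTE_TYPE", "auto")  # auto | int8 | float16 | float32
```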
  • Comprehensive test coverage (media/test_service/test_transcribe.py and media/tests/test_unit.py):

    • Unit tests for timestamp formatting, device/compute auto-detection, VTT writing
    • Integration tests for the full transcription pipeline
    • Task tests covering skip conditions, success paths, error handling, and chaining to summary generation

Notable Implementation Details

  • Memory management: Model is explicitly deleted and garbage collected after each transcription, with CUDA cache clearing if available. This prevents memory bloat when processing multiple items.
  • Graceful degradation: If transcription fails, the media item remains in READY status without subtitles rather than failing the entire item.
  • Flexible language handling: Supports explicit language specification or auto-detection via Whisper's built-in language detection.
  • Device auto-detection: Automatically selects CUDA with float16 if GPU is available, otherwise falls back to CPU with int8 quantization.
  • Logging integration: All operations log to the media item's log file for operator visibility into transcription cost and performance.

https://claude.ai/code/session_01KgoueHATfKr7RgnXULcPcB

Adds offline STT for media without existing subtitles. Transcription
runs as a Huey background task after download completes, then chains
to summary generation. The model is loaded per-job and freed after,
so memory is not held between runs.

Enabled by setting STASHCAST_STT_MODEL (e.g. 'base', 'large-v3').
Supports language auto-detection or explicit language via
STASHCAST_STT_LANGUAGE. Device/compute auto-detection picks GPU
when available, falls back to CPU with int8 quantization.

https://claude.ai/code/session_01KgoueHATfKr7RgnXULcPcB
Service layer tests (19): dataclass, VTT timestamp formatting,
device/compute auto-detection, VTT writing, transcribe function
with mocked WhisperModel (success, auto-language, error handling,
memory cleanup, directory creation).

Task layer tests (10): settings existence, skip when disabled,
skip when subtitles exist, skip when no content, nonexistent GUID,
success updates subtitle_path, chains to summary generation,
skips summary when zero sentences, failure keeps item READY,
timing logged.

https://claude.ai/code/session_01KgoueHATfKr7RgnXULcPcB
@coolify-dgtis

coolify-dgtis bot commented Feb 13, 2026

The preview deployment for jonocodes/stashcast:main-cco8kcsg4swogwsksw0kow0s is ready. 🟢

Open caddy | Open Build Logs | Open Application Logs

Last updated at: 2026-02-13 17:40:16 CET

@coolify-dgtis

coolify-dgtis bot commented Feb 18, 2026

The preview deployment for jonocodes/stashcast:main-bsok8ook4c4kggco40wkcg48 is ready. 🟢

Open caddy | Open Build Logs | Open Application Logs

Last updated at: 2026-02-18 21:34:07 CET

