Add offline speech-to-text transcription using faster-whisper #20
Adds offline STT for media without existing subtitles. Transcription runs as a Huey background task after the download completes, then chains to summary generation. The model is loaded per job and freed afterwards, so memory is not held between runs. Enable it by setting STASHCAST_STT_MODEL (e.g. 'base', 'large-v3'). Supports language auto-detection or an explicit language via STASHCAST_STT_LANGUAGE. Device/compute auto-detection picks the GPU when available and falls back to CPU with int8 quantization. https://claude.ai/code/session_01KgoueHATfKr7RgnXULcPcB
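The device/compute auto-detection described above might look roughly like the following. This is a hedged sketch, not the PR's actual code: the helper name and the `ctranslate2` GPU probe are illustrative assumptions (faster-whisper uses CTranslate2 as its backend, so probing it for CUDA devices is one plausible approach).

```python
def pick_device_and_compute(device: str = "auto", compute_type: str = "auto"):
    """Resolve 'auto' settings to concrete device/compute values (sketch)."""
    if device == "auto":
        try:
            import ctranslate2  # faster-whisper's inference backend
            device = "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
        except Exception:
            device = "cpu"  # no backend or no GPU: fall back to CPU
    if compute_type == "auto":
        # float16 is the usual GPU choice; int8 keeps CPU memory modest.
        compute_type = "float16" if device == "cuda" else "int8"
    return device, compute_type
```

Resolving both values up front keeps the model-loading call a single, explicit `WhisperModel(model_size, device=..., compute_type=...)` invocation.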
Service layer tests (19): the dataclass, VTT timestamp formatting, device/compute auto-detection, VTT writing, and the transcribe function with a mocked WhisperModel (success, auto-language, error handling, memory cleanup, directory creation). Task layer tests (10): settings existence, skip when disabled, skip when subtitles exist, skip when there is no content, nonexistent GUID, success updates subtitle_path, chaining to summary generation, skipping the summary on zero sentences, failure keeping the item READY, and timing logged. https://claude.ai/code/session_01KgoueHATfKr7RgnXULcPcB
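The mocked-WhisperModel pattern mentioned above can be illustrated with a self-contained sketch. The `transcribe` function and test below are hypothetical stand-ins, not the PR's code; they show the per-job load/free shape and how a fake model can drive the success path without loading real weights.

```python
from unittest import mock

def transcribe(path, model_factory):
    """Load a model per job, transcribe, and release it afterwards (sketch)."""
    model = model_factory()
    try:
        segments, info = model.transcribe(path)
        return list(segments), info.language
    finally:
        del model  # explicit release so memory is not held between runs

def test_transcribe_uses_detected_language():
    fake_model = mock.Mock()
    # Simulate faster-whisper's (segments_iterator, info) return shape.
    fake_model.transcribe.return_value = (iter([]), mock.Mock(language="en"))
    segments, language = transcribe("episode.mp3", lambda: fake_model)
    assert segments == []
    assert language == "en"
    fake_model.transcribe.assert_called_once_with("episode.mp3")
```

Injecting a model factory (or patching the model class) keeps the tests fast and deterministic while still exercising the cleanup path.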
The preview deployment for jonocodes/stashcast:main-cco8kcsg4swogwsksw0kow0s is ready. 🟢 Last updated at: 2026-02-13 17:40:16 CET
The preview deployment for jonocodes/stashcast:main-bsok8ook4c4kggco40wkcg48 is ready. 🟢 Last updated at: 2026-02-18 21:34:07 CET
Summary
Adds optional offline speech-to-text (STT) transcription capability to automatically generate VTT subtitles for media items that don't already have them. Uses the faster-whisper library to run Whisper models locally without external API calls.
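The settings listed under Key Changes could be wired up as environment-driven Django settings along these lines. This is a hedged configuration sketch using the setting names from the PR text; the exact defaults and parsing are assumptions.

```python
import os

# Model size ('tiny', 'base', 'small', 'medium', 'large-v3'); empty disables STT.
STASHCAST_STT_MODEL = os.environ.get("STASHCAST_STT_MODEL", "")
# Language code (e.g. 'en'); None enables auto-detection.
STASHCAST_STT_LANGUAGE = os.environ.get("STASHCAST_STT_LANGUAGE") or None
# 'auto' picks GPU when available, otherwise CPU.
STASHCAST_STT_DEVICE = os.environ.get("STASHCAST_STT_DEVICE", "auto")
# 'auto' resolves to float16 on GPU, int8 on CPU.
STASHCAST_STT_COMPUTE_TYPE = os.environ.get("STASHCAST_STT_COMPUTE_TYPE", "auto")
```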
Key Changes
New transcription service (`media/service/transcribe.py`):

- `transcribe()` function that loads a Whisper model, transcribes audio/video to VTT format, and explicitly frees model memory after use
- `_pick_device_and_compute()` for auto-detecting GPU availability and optimal compute type
- `_write_vtt()` to format transcription segments as valid WebVTT files
- `_format_timestamp()` for VTT-compliant timestamp formatting (HH:MM:SS.mmm)
- `TranscriptionResult` dataclass to return transcription metadata (language, duration, output path)

New Huey task (`media/tasks.py`):

- `transcribe_media()` task that runs after media download if subtitles are missing and STT is enabled

Integration with media processing:

- `process_media()` to enqueue transcription when subtitles are absent and `STASHCAST_STT_MODEL` is configured
- `process_media_batch()` to apply the same logic for batch operations

Management command (`media/management/commands/transcribe.py`)

Configuration (`stashcast/settings.py`):

- `STASHCAST_STT_MODEL`: Model size selection (tiny, base, small, medium, large-v3); empty to disable
- `STASHCAST_STT_LANGUAGE`: Language code or None for auto-detection
- `STASHCAST_STT_DEVICE`: Device selection (auto, cpu, cuda)
- `STASHCAST_STT_COMPUTE_TYPE`: Compute precision (auto, int8, float16, float32)

Comprehensive test coverage (`media/test_service/test_transcribe.py` and `media/tests/test_unit.py`)

Notable Implementation Details
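One concrete detail worth illustrating is the VTT timestamp formatting. A `_format_timestamp()`-style helper might look like the sketch below (an assumption about the implementation, not the PR's code); WebVTT timestamps use HH:MM:SS.mmm with zero-padded fields and millisecond precision.

```python
def format_timestamp(seconds: float) -> str:
    """Format a duration in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

# e.g. format_timestamp(3661.5) -> "01:01:01.500"
```

Working in integer milliseconds avoids floating-point drift that would otherwise produce timestamps like "00:00:01.4999".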
https://claude.ai/code/session_01KgoueHATfKr7RgnXULcPcB