
Add offline speech-to-text transcription using faster-whisper#20

Open
jonocodes wants to merge 4 commits into main from claude/add-speech-to-text-e61Q7

Conversation

@jonocodes (Owner)

Summary

Adds optional offline speech-to-text (STT) transcription that automatically generates VTT subtitles for media items that don't already have them. Uses the faster-whisper library to run Whisper models locally, with no external API calls.

Key Changes

  • New transcription service (media/service/transcribe.py):

    • Core transcribe() function that loads a Whisper model, transcribes audio/video to VTT format, and explicitly frees model memory after use
    • _pick_device_and_compute() for auto-detecting GPU availability and optimal compute type
    • _write_vtt() to format transcription segments as valid WebVTT files
    • _format_timestamp() for VTT-compliant timestamp formatting (HH:MM:SS.mmm)
    • TranscriptionResult dataclass to return transcription metadata (language, duration, output path)
  • New Huey task (media/tasks.py):

    • transcribe_media() task that runs after media download if subtitles are missing and STT is enabled
    • Chains to summary generation if configured
    • Logs timing information for cost tracking
    • Gracefully handles errors without failing the media item
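The task's control flow (skip when subtitles exist, never fail the item, log timing, chain on success) could look roughly like this sketch. The lookup and service call are abstracted as parameters; `run_transcription`, `transcribe_fn`, and `on_success` are stand-ins for illustration, not the PR's actual signatures:

```python
# Hedged sketch of the Huey task's error handling and chaining.
import logging
import time

logger = logging.getLogger(__name__)


def run_transcription(item: dict, transcribe_fn, on_success=None) -> str:
    """Transcribe one item; a failure leaves it READY without subtitles."""
    if item.get("subtitle_path"):
        return "skipped"  # subtitles already present, nothing to do
    start = time.monotonic()
    try:
        subtitle_path = transcribe_fn(item["media_path"])
    except Exception:
        logger.exception("transcription failed for %s", item.get("guid"))
        return "failed"  # graceful degradation: item stays READY
    item["subtitle_path"] = subtitle_path
    logger.info("transcribed %s in %.1fs", item.get("guid"),
                time.monotonic() - start)  # timing for cost tracking
    if on_success:
        on_success(item)  # e.g. chain to summary generation
    return "ok"
```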
  • Integration with media processing:

    • Modified process_media() to enqueue transcription when subtitles are absent and STASHCAST_STT_MODEL is configured
    • Modified process_media_batch() to apply the same logic for batch operations
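The enqueue condition shared by both code paths reduces to a small predicate; this helper name is hypothetical, but the logic matches the description above:

```python
# Hedged sketch: enqueue STT only when a model is configured and no
# subtitles exist yet (mirrors the process_media / batch condition).
from typing import Optional


def should_enqueue_transcription(stt_model: str,
                                 subtitle_path: Optional[str]) -> bool:
    """True iff STASHCAST_STT_MODEL is non-empty and subtitles are absent."""
    return bool(stt_model) and not subtitle_path
```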
  • Management command (media/management/commands/transcribe.py):

    • Standalone CLI tool to manually transcribe files with configurable model, language, device, and compute type
  • Configuration (stashcast/settings.py):

    • STASHCAST_STT_MODEL: Model size selection (tiny, base, small, medium, large-v3); empty to disable
    • STASHCAST_STT_LANGUAGE: Language code or None for auto-detection
    • STASHCAST_STT_DEVICE: Device selection (auto, cpu, cuda)
    • STASHCAST_STT_COMPUTE_TYPE: Compute precision (auto, int8, float16, float32)
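Read from the environment, the settings block might look like this sketch (the setting names are from the PR; the defaults and env-var sourcing are assumptions):

```python
# stashcast/settings.py -- hedged sketch of the STT settings.
import os

STASHCAST_STT_MODEL = os.environ.get("STASHCAST_STT_MODEL", "")  # empty disables STT
STASHCAST_STT_LANGUAGE = os.environ.get("STASHCAST_STT_LANGUAGE") or None  # None = auto-detect
STASHCAST_STT_DEVICE = os.environ.get("STASHCAST_STT_DEVICE", "auto")  # auto | cpu | cuda
STASHCAST_STT_COMPUTE_TYPE = os.environ.get("STASHCAST_STT_COMPUTE_TYPE", "auto")  # auto | int8 | float16 | float32
```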
  • Comprehensive test coverage (media/test_service/test_transcribe.py and media/tests/test_unit.py):

    • Unit tests for timestamp formatting, device/compute auto-detection, VTT writing
    • Integration tests for the full transcription pipeline
    • Task tests covering skip conditions, success paths, error handling, and chaining to summary generation

Notable Implementation Details

  • Memory management: Model is explicitly deleted and garbage collected after each transcription, with CUDA cache clearing if available. This prevents memory bloat when processing multiple items.
  • Graceful degradation: If transcription fails, the media item remains in READY status without subtitles rather than failing the entire item.
  • Flexible language handling: Supports explicit language specification or auto-detection via Whisper's built-in language detection.
  • Device auto-detection: Automatically selects CUDA with float16 if GPU is available, otherwise falls back to CPU with int8 quantization.
  • Logging integration: All operations log to the media item's log file for operator visibility into transcription cost and performance.

https://claude.ai/code/session_01KgoueHATfKr7RgnXULcPcB

Adds offline STT for media without existing subtitles. Transcription
runs as a Huey background task after download completes, then chains
to summary generation. The model is loaded per-job and freed after,
so memory is not held between runs.

Enabled by setting STASHCAST_STT_MODEL (e.g. 'base', 'large-v3').
Supports language auto-detection or explicit language via
STASHCAST_STT_LANGUAGE. Device/compute auto-detection picks GPU
when available, falls back to CPU with int8 quantization.

https://claude.ai/code/session_01KgoueHATfKr7RgnXULcPcB
Service layer tests (19): dataclass, VTT timestamp formatting,
device/compute auto-detection, VTT writing, transcribe function
with mocked WhisperModel (success, auto-language, error handling,
memory cleanup, directory creation).

Task layer tests (10): settings existence, skip when disabled,
skip when subtitles exist, skip when no content, nonexistent GUID,
success updates subtitle_path, chains to summary generation,
skips summary when zero sentences, failure keeps item READY,
timing logged.

https://claude.ai/code/session_01KgoueHATfKr7RgnXULcPcB
@coolify-dgtis

coolify-dgtis bot commented Feb 13, 2026

The preview deployment for jonocodes/stashcast:main-cco8kcsg4swogwsksw0kow0s is ready. 🟢

Open caddy | Open Build Logs | Open Application Logs

Last updated at: 2026-02-13 17:40:16 CET

@coolify-dgtis

coolify-dgtis bot commented Feb 18, 2026

The preview deployment for jonocodes/stashcast:main-bsok8ook4c4kggco40wkcg48 is ready. 🟢

Open caddy | Open Build Logs | Open Application Logs

Last updated at: 2026-02-18 21:34:07 CET

