Local audio/video transcription using Whisper. Produces TXT, SRT, VTT, or JSON and runs fully offline. Use the CLI for scripting or the web GUI for a point-and-click experience.
Requirements:

- Python 3.10+
- ffmpeg on PATH
Install the Python dependencies:

```
pip install -r requirements.txt
```

Install ffmpeg:

- Windows (winget): `winget install --id Gyan.FFmpeg -e`
- macOS (brew): `brew install ffmpeg`
- Linux (apt): `sudo apt-get install ffmpeg`
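Whichever install path you take, the tool only needs `ffmpeg` resolvable on PATH. A quick Python check (an illustrative sketch, not part of the project):

```python
import shutil

def ffmpeg_on_path() -> bool:
    # shutil.which mirrors the shell's PATH lookup for executables.
    return shutil.which("ffmpeg") is not None
```

If this returns `False`, re-run the platform install command above or add ffmpeg's directory to PATH.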
Run the CLI:

```
python transcribe.py INPUT [--model MODEL] [--language LANG]
                     [--format {txt,srt,vtt,json}] [--output PATH]
```

Launch the web interface:

```
python app.py
```

This opens a browser tab at http://localhost:7860.
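The CLI surface in the usage line above can be sketched with `argparse`; this is an illustrative reconstruction of the documented flags and defaults, not necessarily how `transcribe.py` is implemented:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the documented CLI: positional input plus four optional flags.
    p = argparse.ArgumentParser(description="Transcribe audio/video with Whisper")
    p.add_argument("input", help="Path to an audio or video file")
    p.add_argument("--model", default="small",
                   help="Whisper model name (tiny, base, small, medium, large-v3)")
    p.add_argument("--language", default=None,
                   help="Language code like en or es; omit to auto-detect")
    p.add_argument("--format", choices=["txt", "srt", "vtt", "json"],
                   default="txt", help="Output format")
    p.add_argument("--output", default=None,
                   help="Output path; defaults to input name with chosen extension")
    return p

args = build_parser().parse_args(["sample.mp4", "--language", "en", "--format", "srt"])
```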
- Upload an audio or video file
- Pick a quality level: Fast (`tiny`), Balanced (`small`), or Best (`large-v3`)
- Click Transcribe
Advanced options (output format, language) are available under the expandable accordion. The transcript appears in a preview pane with a copy button, and a download link is provided for the output file.
CLI options:

- `--model`: Whisper model name. Examples: `tiny`, `base`, `small`, `medium`, `large-v3`. Default: `small`.
- `--language`: Language code like `en` or `es`. Omit to auto-detect.
- `--format`: Output format: `txt`, `srt`, `vtt`, or `json`. Default: `txt`.
- `--output`: Output path. Defaults to the input name with the chosen extension.
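To illustrate the `srt` output format, here is a minimal helper that turns a segment's start/end seconds into the `HH:MM:SS,mmm` timestamps SRT requires. This is a hypothetical sketch; the project's actual writer may differ:

```python
def srt_timestamp(seconds: float) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before milliseconds (VTT uses a dot).
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    # One numbered SRT cue: index line, time range, text, blank separator.
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

For example, `srt_timestamp(3661.25)` yields `01:01:01,250`.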
Examples:

```
# Basic transcription to TXT
python transcribe.py sample.mp3

# Force language and output SRT
python transcribe.py sample.mp4 --language en --format srt

# Use a larger model and custom output file
python transcribe.py sample.wav --model medium --output out/transcript.txt
```

Notes:

- Both the CLI and the GUI prefer `faster-whisper` and fall back to `openai-whisper` if needed.
- CLI output is written next to the input unless `--output` is provided; GUI output is available via the download link.
- For long audio files or limited CPU, use the `tiny` (Fast) model.
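The backend preference in the first note can be sketched as a simple availability probe. This is a hypothetical helper, not the project's actual fallback logic; note the `openai-whisper` package installs under the module name `whisper`:

```python
import importlib.util

def pick_backend():
    # Try faster-whisper first, then fall back to openai-whisper.
    for module, label in (("faster_whisper", "faster-whisper"),
                          ("whisper", "openai-whisper")):
        if importlib.util.find_spec(module) is not None:
            return label
    return None  # neither backend is installed
```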