Windows-first, double-click transcription app using WhisperX for ASR + alignment and Pyannote (via WhisperX diarize utilities) for speaker diarization.
- Simple Tkinter GUI with folder picker, Start/Stop, progress, and live log
- WhisperX transcription with alignment and diarization
- Portable offline cache in the app folder
- Batch mode with skip logic, done/failed moves, and summary output
- Install Python 3.10+
- Open PowerShell and create a virtual environment:
  ```powershell
  py -3.10 -m venv venv
  .\venv\Scripts\Activate.ps1
  ```

- Install dependencies:

  ```powershell
  pip install -U whisperx python-dotenv omegaconf pyannote.audio
  ```

- Install FFmpeg:

  ```powershell
  winget install ffmpeg
  ```

- Create and activate a venv:
  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -U whisperx python-dotenv omegaconf pyannote.audio
  ```

- Install FFmpeg:

  ```bash
  sudo apt-get update
  sudo apt-get install -y ffmpeg
  ```

- Install Tkinter (GUI dependency):

  ```bash
  sudo apt-get update
  sudo apt-get install -y python3-tk
  ```

Create a token at:
https://huggingface.co/settings/tokens
Create a .env file in the app folder:
```
HF_TOKEN=your_token_here
```
The token is only used to authenticate and download the diarization model weights from HuggingFace. It does not incur usage-based charges by itself. Any compute cost comes from running the models on your own machine. If you use private or paid HuggingFace model repositories, you may still need an appropriate subscription or access plan.
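The app reads the token from .env via python-dotenv. As a rough sketch of what that loading step does (a minimal stdlib re-implementation for illustration, not the actual app code):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader; the app itself relies on python-dotenv."""
    p = Path(path)
    if not p.exists():
        return
    for line in p.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without KEY=VALUE shape
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Do not overwrite variables already set in the real environment
        os.environ.setdefault(key.strip(), value.strip())
```

After this runs, the diarization code can pick the token up with `os.getenv("HF_TOKEN")`.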
Transcripto uses a portable cache in the app folder:
```
_hf_cache
_torch_cache
```
On first run, it downloads and stores all required models. After that, it can run fully offline with the cached models.
To force a clean offline-ready cache, open the app and click Prepare Offline Models.
If the diarization models are not present and HF_TOKEN is missing, you will see a clear error in the app. The token is only required to download gated models the first time.
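The portable-cache behaviour can be reproduced by pointing the standard cache environment variables at app-local folders before the ML libraries are imported. A sketch, assuming the documented `HF_HOME` and `TORCH_HOME` variables are the knobs in play (the app's exact mechanism may differ):

```python
import os
from pathlib import Path

# Resolve the app folder (falls back to the working directory).
app_dir = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()

# Redirect model caches into the app folder; this must happen
# before importing whisperx / pyannote so they pick the paths up.
os.environ["HF_HOME"] = str(app_dir / "_hf_cache")
os.environ["TORCH_HOME"] = str(app_dir / "_torch_cache")
```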
Double-click transcripto_gui.py (or run it from a terminal):
```
python transcripto_gui.py
```

By default, Transcripto processes all supported files in the selected folder.
If you want to process a subset, switch the selection mode to Selected files and use the checkbox list.
You can populate the list in two ways:
- Load from folder (scans the current folder)
- Pick files... (multi-select individual files)
If you select files from multiple folders, summary.txt is written to the app folder.
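Load from folder amounts to a simple extension scan. A sketch, where the supported-extension set is an assumption for illustration (the app defines its own list):

```python
from pathlib import Path

# Assumed extension set for illustration; the app defines the real one.
SUPPORTED = {".mp3", ".mp4", ".wav", ".m4a", ".mkv"}

def find_inputs(folder: str) -> list:
    """Return supported media files in a folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```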
For each input file example.mp4, Transcripto writes:
- `example.txt` with timestamped transcript lines
- `example.json` with structured segments
After each batch run it also writes:
- `summary.txt` with NotebookLM-ready concatenated transcripts
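The two per-file outputs can be illustrated like this; the segment keys and timestamp format are assumptions for the sketch, not the app's exact schema:

```python
import json

def format_ts(seconds: float) -> str:
    """Render seconds as HH:MM:SS."""
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def write_outputs(stem: str, segments: list) -> None:
    # Timestamped plain-text transcript (example.txt)
    lines = [
        f"[{format_ts(seg['start'])}] {seg.get('speaker', 'UNKNOWN')}: {seg['text']}"
        for seg in segments
    ]
    with open(f"{stem}.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
    # Structured segments (example.json)
    with open(f"{stem}.json", "w", encoding="utf-8") as f:
        json.dump({"segments": segments}, f, ensure_ascii=False, indent=2)
```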
If you see an error mentioning “Unsupported global” or “safe globals”, the app logs a hint about extending the allowlist in transcripto_gui.py.
If diarization fails, confirm your token has access to the required models and that it is set in .env or pasted into the app.
If you see HuggingFace cache symlink warnings, you can ignore them or run the app with Developer Mode enabled in Windows.
- The app sets `TRANSFORMERS_NO_TORCHVISION=1` to avoid torchvision import issues.
- On GPU systems, PyTorch must be installed with CUDA support.
Install PyInstaller:
```
pip install -U pyinstaller
```

Build a Windows GUI EXE:

```
pyinstaller --onefile --windowed --name Transcripto transcripto_gui.py
```

The EXE will be created at `dist\Transcripto.exe`.
When the EXE runs, it creates portable caches next to the EXE:
```
_hf_cache
_torch_cache
```
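Locating "next to the EXE" is typically done with PyInstaller's frozen flag; a sketch of the assumed lookup:

```python
import sys
from pathlib import Path

def app_base_dir() -> Path:
    """Folder containing the EXE when frozen by PyInstaller, else the script's folder."""
    if getattr(sys, "frozen", False):
        # PyInstaller sets sys.frozen; sys.executable is then the EXE itself.
        return Path(sys.executable).resolve().parent
    if "__file__" in globals():
        return Path(__file__).resolve().parent
    return Path.cwd()
```

The caches would then live under `app_base_dir() / "_hf_cache"` and `app_base_dir() / "_torch_cache"`.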
If you want to bundle ffmpeg.exe, place it at `bin\ffmpeg.exe` in the same folder as the EXE.
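A bundled-first lookup like the following is the assumed behaviour (the bundled copy is preferred, then whatever is on PATH):

```python
import shutil
from pathlib import Path
from typing import Optional

def find_ffmpeg(app_dir: Path) -> Optional[str]:
    """Prefer a bundled bin/ffmpeg.exe next to the app, else fall back to PATH."""
    bundled = app_dir / "bin" / "ffmpeg.exe"
    if bundled.exists():
        return str(bundled)
    return shutil.which("ffmpeg")  # None if ffmpeg is not installed
```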
Note: If you build inside WSL, PyInstaller produces a Linux binary. To generate a Windows .exe, run the build on Windows (PowerShell or CMD) using a Windows Python environment.
👉 ⚡Created by a neurodivergent mind — building tools that respect different brains. 🧠