The bash script (run_pipeline.sh, sketched below) does the following:
- Converts each audio file to 16 kHz mono WAV
- Runs whisper.cpp to generate a plain-text transcript
- Runs pyannote-whisper to add speaker labels to each segment
For every audio/<name>.wav (or .WAV) you end up with:
- audio/<name>.txt (plain transcript)
- audio/<name>_diarized.txt (speaker-labeled transcript)
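The repository's run_pipeline.sh is the authoritative implementation; the loop below is only a minimal sketch of that flow. The ffmpeg flags, the whisper.cpp binary and model paths, and the language-code argument are assumptions based on the setup steps that follow, and the pyannote-whisper call is left as a placeholder rather than a real invocation.

```bash
#!/bin/bash
# Minimal sketch of the per-file loop in run_pipeline.sh (not the actual script).
LANG_CODE="${1:-en}"   # language code argument, e.g. en (assumed)

for input in audio/*.wav audio/*.WAV; do
  [ -e "$input" ] || continue
  base="${input%.*}"

  # 1. Convert to 16 kHz mono 16-bit PCM WAV, the format whisper.cpp expects
  ffmpeg -y -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "${base}_16k.wav"

  # 2. Transcribe with whisper.cpp; -otxt/-of write audio/<name>.txt
  #    (older whisper.cpp builds name the binary "main", newer ones "whisper-cli")
  ./whisper.cpp/main -m whisper.cpp/models/ggml-large-v1.bin \
    -l "$LANG_CODE" -f "${base}_16k.wav" -otxt -of "$base"

  # 3. Run pyannote-whisper on the same file to produce audio/<name>_diarized.txt
  #    (invocation omitted here; see the real run_pipeline.sh)
done
```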
- Clone the repo:

```bash
#!/bin/bash
git clone https://github.com/your-org/whisper-transcription.git
cd whisper-transcription
```
- Build whisper.cpp and download the large-v1 model:

```bash
#!/bin/bash
cd whisper.cpp
make
./models/download-ggml-model.sh large-v1
cd ..
```
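Optionally, you can sanity-check the build on the sample clip that ships with whisper.cpp. The binary name and paths below are assumptions about a stock whisper.cpp checkout, not part of the original steps:

```bash
# Optional: transcribe the bundled sample to confirm the build and model download worked
./whisper.cpp/main -m whisper.cpp/models/ggml-large-v1.bin -f whisper.cpp/samples/jfk.wav
```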
- Install the Python dependencies:

```bash
#!/bin/bash
python3 -m venv venv
source venv/bin/activate
pip install pywhispercpp pyannote-audio pyannote-whisper
```
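As a quick optional check that the environment resolved correctly (not part of the original steps):

```bash
# Should print "ok" if the core packages import cleanly inside the venv
python -c "import pywhispercpp, pyannote.audio; print('ok')"
```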
- Copy your audio files (.wav or .WAV) into the audio directory, e.g. with cp as shown below.
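The source path here is only a placeholder for wherever your recordings live:

```bash
# Copy your recordings into the repo's audio directory (source path is a placeholder)
cp /path/to/recordings/*.wav audio/
```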
- Run the pipeline:

```bash
#!/bin/bash
chmod +x run_pipeline.sh
./run_pipeline.sh en
```
Each WAV file under the audio directory should now have a corresponding .txt transcript and a _diarized.txt file next to it.
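For example, for a hypothetical recording named meeting.wav:

```bash
ls audio/
# meeting.wav
# meeting.txt             <- plain transcript from whisper.cpp
# meeting_diarized.txt    <- speaker-labeled transcript from pyannote-whisper
```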