A high-performance batch transcription service using OpenAI's Whisper models with flexible configuration, optimized for GPU acceleration and designed for processing thousands of audio files efficiently.
```bash
# 1. Clone and setup
git clone https://github.com/Aeraxon/whisper-batch-api.git
cd whisper-batch-api
python3.10 -m venv venv && source venv/bin/activate

# 2. Install PyTorch (choose your GPU)
# Standard GPUs (RTX 40XX and older):
pip install torch==2.6.0+cu118 torchaudio==2.6.0+cu118 --index-url https://download.pytorch.org/whl/cu118
# Blackwell GPUs (RTX 50XX):
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

# 3. Install dependencies and start
pip install -r requirements.txt

# Apply container restart fix (important for LXC/Docker)
echo 'VENV_SITE_PACKAGES="$VIRTUAL_ENV/lib/python3.10/site-packages"' >> venv/bin/activate
echo 'export LD_LIBRARY_PATH="${VENV_SITE_PACKAGES}/nvidia/cublas/lib:${VENV_SITE_PACKAGES}/nvidia/cudnn/lib:${VENV_SITE_PACKAGES}/ctranslate2.libs:/usr/local/cuda/lib64"' >> venv/bin/activate

python whisper_api.py

# 4. Test transcription
curl -X POST "http://localhost:8000/transcribe/batch" -F "files=@your_audio.wav"
```

- High Performance: Support for all Whisper models with automatic GPU optimization
- Smart Language Detection: Automatic recognition with manual override option
- Batch Processing: Process up to 100 files per batch
- Performance Metrics: Automatic tracking with CSV export
- Flexible Configuration: Central YAML configuration for all settings
- Lazy Loading: Models loaded on demand and auto-unloaded after inactivity
- Automatic Archiving: All transcripts saved with metadata
- GPU Monitoring: VRAM tracking and utilization metrics
- GPU: NVIDIA GPU with 8GB+ VRAM (tested on RTX A2000 12GB, RTX 5080)
- RAM: Minimum 16GB system RAM
- Storage: SSD recommended
- CUDA: Compatible GPU with CUDA Compute Capability 6.0+
The system automatically tracks comprehensive performance metrics for every batch and saves them to ./output/metrics.csv. View real-time metrics:
```bash
# Real-time metrics via API
curl http://localhost:8000/metrics

# Check saved metrics
cat output/metrics.csv
```

Blackwell GPU Performance: RTX 50XX users automatically get mixed-precision (int8_float16) inference for optimal speed.
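For offline analysis, the saved CSV can be read with Python's standard `csv` module. A minimal sketch — note that the column name `real_time_factor` is an assumption here; check the header row of your `metrics.csv` for the actual names:

```python
# Sketch: summarize saved batch metrics from metrics.csv.
# The column name used below ("real_time_factor") is an assumption --
# adjust it to match the actual header of your metrics file.
import csv

def average_rtf(path="output/metrics.csv", column="real_time_factor"):
    """Return the mean of a numeric column across all recorded batches."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    values = [float(r[column]) for r in rows if r.get(column)]
    return sum(values) / len(values) if values else 0.0
```

This gives a quick sanity check of throughput over time without standing up any dashboard.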
- `large-v3` - Latest and most accurate
- `large-v3-turbo` - Faster variant of v3
- `large-v3-german` - Optimized for German language
- `large-v2` - Proven and stable
- `medium`, `small`, `base`, `tiny` - Various speed/quality trade-offs
```bash
# Update system
sudo apt update && sudo apt upgrade -y

# Install development tools
sudo apt install -y build-essential cmake git wget curl

# Install Python 3.10 and venv
sudo apt install -y python3.10 python3.10-venv python3.10-dev python3.10-distutils

# Install audio system dependencies
sudo apt install -y ffmpeg
```

Ensure your NVIDIA driver and CUDA toolkit are properly installed:

```bash
# Check NVIDIA driver
nvidia-smi

# Check CUDA toolkit
nvcc --version
```

Both commands should work without errors before proceeding.
```bash
# Clone the repository
git clone https://github.com/Aeraxon/whisper-batch-api.git
cd whisper-batch-api

# Create Python virtual environment
python3.10 -m venv venv
source venv/bin/activate

# Upgrade pip
pip install --upgrade pip

# Install PyTorch with CUDA 11.8 support FIRST
pip install torch==2.6.0+cu118 torchaudio==2.6.0+cu118 --index-url https://download.pytorch.org/whl/cu118

# Install remaining dependencies
pip install -r requirements.txt
```

For RTX 5090, 5080, 5070, and other Blackwell-architecture GPUs:
```bash
# Clone the repository
git clone https://github.com/Aeraxon/whisper-batch-api.git
cd whisper-batch-api

# Create Python virtual environment
python3.10 -m venv venv
source venv/bin/activate

# Upgrade pip
pip install --upgrade pip

# Install PyTorch with CUDA 12.8 support (native for Blackwell GPUs)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu128

# Install remaining dependencies
pip install -r requirements.txt
```

This step is essential for the system to work properly:
```bash
# Add CUDA libraries to LD_LIBRARY_PATH permanently
echo '' >> ~/.bashrc
echo '# Whisper CUDA Libraries' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$(python -c "import site; print(site.getsitepackages()[0])" 2>/dev/null)/nvidia/cublas/lib:$(python -c "import site; print(site.getsitepackages()[0])" 2>/dev/null)/nvidia/cudnn/lib:$(python -c "import site; print(site.getsitepackages()[0])" 2>/dev/null)/ctranslate2.libs:/usr/local/cuda/lib64:$LD_LIBRARY_PATH"' >> ~/.bashrc

# Reload bash configuration
source ~/.bashrc
```

For CUDA 12.8+ with Blackwell GPUs:
```bash
# Add CUDA 12.8+ libraries to LD_LIBRARY_PATH
echo '' >> ~/.bashrc
echo '# Whisper CUDA Libraries for Blackwell GPUs' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$(python -c "import site; print(site.getsitepackages()[0])" 2>/dev/null)/nvidia/cublas/lib:$(python -c "import site; print(site.getsitepackages()[0])" 2>/dev/null)/nvidia/cudnn/lib:$(python -c "import site; print(site.getsitepackages()[0])" 2>/dev/null)/ctranslate2.libs:/usr/local/cuda-12.8/lib64:/usr/local/cuda/lib64:$LD_LIBRARY_PATH"' >> ~/.bashrc

# Reload bash configuration
source ~/.bashrc
```

Important for LXC/Docker users: after container restarts, you may encounter `libcudnn_ops.so` errors. Here is the permanent fix:
```bash
# Add CUDA paths directly to venv activation (recommended)
echo '' >> venv/bin/activate
echo '# Whisper CUDA Libraries (Container Restart Fix)' >> venv/bin/activate
echo 'VENV_SITE_PACKAGES="$VIRTUAL_ENV/lib/python3.10/site-packages"' >> venv/bin/activate
echo 'export LD_LIBRARY_PATH="${VENV_SITE_PACKAGES}/nvidia/cublas/lib:${VENV_SITE_PACKAGES}/nvidia/cudnn/lib:${VENV_SITE_PACKAGES}/ctranslate2.libs:/usr/local/cuda/lib64"' >> venv/bin/activate
```

Test the fix:

```bash
deactivate
source venv/bin/activate
python -c "from faster_whisper import WhisperModel; print('CUDA fix working!')"
```

Create a convenient start script:
```bash
# Create start script in your home directory
cat > ~/start_whisper.sh << 'EOF'
#!/bin/bash
echo "Starting Whisper Batch API..."
cd /path/to/your/whisper-batch-api
source venv/bin/activate
echo "CUDA paths set automatically"
python whisper_api.py
EOF
chmod +x ~/start_whisper.sh
```

Note: Replace `/path/to/your/whisper-batch-api` with your actual project path.
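If the machine should bring the API up at boot, a systemd unit along these lines is one option. This unit is not part of the project; the user name and both paths are placeholders you must adjust to your setup:

```ini
# /etc/systemd/system/whisper-api.service (illustrative sketch, not shipped
# with the project; adjust User, WorkingDirectory, and ExecStart paths)
[Unit]
Description=Whisper Batch API
After=network.target

[Service]
User=youruser
WorkingDirectory=/path/to/your/whisper-batch-api
ExecStart=/path/to/your/whisper-batch-api/venv/bin/python whisper_api.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Because `ExecStart` points at the venv's own interpreter, the CUDA paths appended to `venv/bin/activate` are not sourced here; if you rely on that fix, set the same `LD_LIBRARY_PATH` via an `Environment=` line in the unit.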
```bash
# Test CUDA availability
python -c "
import torch
from faster_whisper import WhisperModel
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"N/A\"}')
print(f'CUDA version: {torch.version.cuda}')
print(f'GPU Compute capability: {torch.cuda.get_device_capability(0) if torch.cuda.is_available() else \"N/A\"}')
print('Faster-Whisper import successful!')
"
```

For Blackwell GPUs, you should see:

```
CUDA available: True
GPU: NVIDIA GeForce RTX 5080     # or your specific Blackwell GPU
CUDA version: 12.8               # Native PyTorch CUDA 12.8 support
GPU Compute capability: (12, 0)  # Blackwell architecture
Faster-Whisper import successful!
```
```bash
# Create directory for transcripts and metrics
mkdir -p output
```

The system uses a central YAML configuration file (`whisper_config.yaml`) for all settings. Key options:
- Model: `default_model: "large-v2"` (switch between models)
- Language: `auto_detect: true` (automatic language detection)
- Output: `directory: "./output"` (where transcripts are saved)
- Batch size: `batch_size: 4` (adjust for your GPU)
See whisper_config.yaml for all configuration options.
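Putting the options above together, the file might look roughly like this. The nesting and section names are assumptions — only the four keys listed above come from this README, so treat the shipped `whisper_config.yaml` as the authoritative schema:

```yaml
# Illustrative sketch only -- section names are assumptions
model:
  default_model: "large-v2"   # any Whisper model listed above
language:
  auto_detect: true           # disable to force a specific language
output:
  directory: "./output"       # where transcripts and metrics.csv land
batch:
  batch_size: 4               # raise or lower to fit your GPU's VRAM
```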
```
fastapi==0.115.5
uvicorn==0.32.1
python-multipart==0.0.12
aiofiles==24.1.0
faster-whisper==1.1.0
pynvml==11.5.3
pyyaml==6.0.2
soundfile==0.12.1
```
Quick Start (with start script):

```bash
# If you created the start script during installation:
~/start_whisper.sh
```

Manual Start:

```bash
# Always activate venv first (sets CUDA paths automatically)
source venv/bin/activate
python whisper_api.py
```

The API will be available at http://localhost:8000

Batch transcription:

```bash
curl -X POST "http://localhost:8000/transcribe/batch" \
  -F "files=@audio1.wav" \
  -F "files=@audio2.mp3"
```

With a fixed language:

```bash
curl -X POST "http://localhost:8000/transcribe/batch" \
  -F "files=@english_audio.wav" \
  -F "language=en"
```

Health check and metrics:

```bash
curl http://localhost:8000/health
curl http://localhost:8000/metrics
```

Supported formats: WAV, MP3, M4A, OGG, FLAC, AAC
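The curl calls above translate directly into a small Python client. A sketch using the `requests` library — the response schema is not documented in this README, so the JSON is returned as-is rather than unpacked into named fields:

```python
# Minimal client sketch for the batch endpoint (assumes the `requests`
# library is installed; the response shape is not specified here, so
# the raw parsed JSON is returned).
from pathlib import Path
import requests

API_URL = "http://localhost:8000"

def build_batch_files(paths):
    """Build the multipart 'files' payload that requests expects."""
    return [
        ("files", (Path(p).name, open(p, "rb"), "application/octet-stream"))
        for p in paths
    ]

def transcribe_batch(paths, language=None):
    """POST up to 100 audio files; optionally force a language code."""
    data = {"language": language} if language else {}
    resp = requests.post(
        f"{API_URL}/transcribe/batch",
        files=build_batch_files(paths),
        data=data,
    )
    resp.raise_for_status()
    return resp.json()
```

Usage would be e.g. `transcribe_batch(["audio1.wav", "audio2.mp3"])` or `transcribe_batch(["english_audio.wav"], language="en")`.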
Transcripts are automatically saved to `./output/transcript_XXXXXX.txt` with metadata:

```
# Whisper Transcription #000001
# Original file: audio.wav
# Audio length: 6.25 minutes
# Processing time: 54.30s
# Real-time factor: 6.9x
# Model: large-v2
# Language mode: auto-detect
# Detected language: de (confidence: 0.98)
# =====================================

[Transcript content here]
```
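The `# Key: value` header shown above is easy to read back programmatically. A sketch that splits an archived transcript into its metadata and body — it assumes the separator is the run of `=` characters, as in the example:

```python
# Sketch: parse the metadata header of an archived transcript.
# Assumes "# Key: value" header lines terminated by a "# ====..." separator,
# matching the example format above.
def parse_transcript(text):
    """Split an archived transcript into (metadata dict, body text)."""
    meta, body_lines, in_header = {}, [], True
    for line in text.splitlines():
        if in_header and line.startswith("#"):
            stripped = line.lstrip("#").strip()
            if stripped and set(stripped) == {"="}:  # separator ends the header
                in_header = False
            elif ":" in stripped:
                key, _, value = stripped.partition(":")
                meta[key.strip()] = value.strip()
        else:
            in_header = False
            body_lines.append(line)
    return meta, "\n".join(body_lines).strip()
```

This is handy when post-processing thousands of archived transcripts, e.g. to filter by detected language or model.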
Saved to ./output/metrics.csv with processing times, throughput, GPU utilization, and capacity planning data.
Problem: after a container restart: `Unable to load libcudnn_ops.so.9`

Solution:

```bash
# Check if CUDA libraries are found
source venv/bin/activate
echo $LD_LIBRARY_PATH
ls -la $(python -c "import site; print(site.getsitepackages()[0])")/nvidia/cudnn/lib/

# If empty or missing, apply the Container Restart Fix from the installation section
```

Verify the fix works:
```bash
python -c "
import torch
from faster_whisper import WhisperModel
print(f'CUDA: {torch.cuda.is_available()}')
print(f'GPU: {torch.cuda.get_device_name(0)}')
model = WhisperModel('tiny', device='cuda')
print('All working!')
"
```

```bash
# Check GPU model
nvidia-smi --query-gpu=name --format=csv,noheader,nounits

# Check API health
curl http://localhost:8000/health

# Validate configuration
python -c "from config_manager import WhisperConfigManager; WhisperConfigManager().validate_config()"

# Verify LD_LIBRARY_PATH after venv activation
source venv/bin/activate
echo $LD_LIBRARY_PATH
ldconfig -p | grep cuda
```

If your RTX 5080/5090 is slower than expected, the system should automatically use optimized mixed precision (int8_float16). Check the startup logs for "Blackwell optimization" messages.
- Check GPU utilization: run `nvidia-smi` during transcription
- Adjust batch size: increase `batch_size` in the config if you have spare VRAM
- Use smaller models: `medium` or `small` for faster processing
USE AT YOUR OWN RISK
This software is provided "as is" without warranty of any kind. The author(s) assume NO LIABILITY for any damages, losses, or issues that may occur from using this software, including but not limited to: data loss, hardware damage, system failures, network issues, natural disasters, cosmic events, or any other consequences whatsoever.
By using this software, you acknowledge that you use it entirely at your own risk and take full responsibility for any outcomes.
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
- Permitted: personal use, education, research, open source projects
For commercial use, open an issue in this repository or contact via GitHub.
- OpenAI for the Whisper model
- SYSTRAN for the Faster-Whisper implementation
- FastAPI team for the excellent web framework