A FastAPI server that exposes WhisperX as an OpenAI-compatible audio transcription API. Supports both a simple single-server mode and a horizontally scalable distributed mode backed by Kafka and S3.
- OpenAI-compatible — drop-in replacement for `/v1/audio/transcriptions` and `/v1/audio/translations`
- Alignment & diarization — word-level timestamps and speaker labels out of the box
- Multiple output formats — `json`, `verbose_json`, `vtt_json`, `srt`, `vtt`, `aud`, `text`
- Distributed mode — offload GPU work to dedicated workers via Kafka + S3 (MinIO)
- Pluggable backends — swap transcription, alignment, and diarization implementations per stage
- API key auth — single key or a JSON key-map for multi-client setups
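For multi-client setups, `API_KEYS_FILE` points at a JSON file mapping each key to a client name. The exact schema isn't reproduced in this README; an illustrative file (keys and names invented) might look like:

```json
{
  "sk-client-a-key": "client-a",
  "sk-client-b-key": "client-b"
}
```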
```shell
# GPU (CUDA)
docker compose --profile cuda up

# CPU
docker compose --profile cpu up
```

The API is available at http://localhost:8000.
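Once the server is up, a request can be made against the OpenAI-compatible endpoint. A minimal sketch, assuming the third-party `requests` package, a local `audio.mp3`, and an API key if auth is configured:

```python
# Sketch: POST an audio file for transcription.
# Requires a running server on localhost:8000; file name and key are illustrative.
import requests

with open("audio.mp3", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={"model": "whisper-1", "response_format": "json"},
    )
resp.raise_for_status()
print(resp.json()["text"])
```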
```shell
# Copy and edit credentials before first run
cp .env.example .env

# CUDA workers
docker compose -f compose-kafka.yaml --profile cuda up

# CPU workers
docker compose -f compose-kafka.yaml --profile cpu up

# Both worker types simultaneously
docker compose -f compose-kafka.yaml --profile cuda --profile cpu up
```

Workers process one job at a time per container. Scale horizontally by running multiple worker replicas.
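The contents of `.env.example` aren't reproduced here; at a minimum you would override the MinIO credentials named in the configuration section below (variable names from this README, values illustrative):

```
# Never ship the minioadmin defaults
MINIO_ROOT_USER=change-me
MINIO_ROOT_PASSWORD=a-long-random-secret
```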
All settings are environment variables. Nested fields use `__` as a delimiter (e.g. `WHISPER__MODEL=large-v3`).

All available settings are defined in `config.py`. Variables you'll most likely need to set:
| Variable | Default | Description |
|---|---|---|
| `WHISPER__MODEL` | `large-v3` | Transcription model name |
| `WHISPER__COMPUTE_TYPE` | `default` | Quantization — `float16` for GPU, `float32` for CPU |
| `WHISPER__INFERENCE_DEVICE` | `auto` | `cpu`, `cuda`, or `auto` |
| `HF_TOKEN` | — | Hugging Face token (required for pyannote diarization) |
| `API_KEY` | — | Single API key for all requests |
| `API_KEYS_FILE` | — | Path to JSON file mapping key → client name |
| `MODE` | `direct` | `direct` or `kafka` |
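The real loader lives in `config.py`; conceptually, the `__` delimiter folds flat variables into nested settings, roughly like this sketch (illustrative, not the actual implementation):

```python
def nest_env(env: dict) -> dict:
    """Fold flat VAR__SUBVAR names into nested dicts (illustrative sketch)."""
    config: dict = {}
    for key, value in env.items():
        parts = key.lower().split("__")
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config

print(nest_env({"WHISPER__MODEL": "large-v3", "WHISPER__COMPUTE_TYPE": "float16", "MODE": "direct"}))
# {'whisper': {'model': 'large-v3', 'compute_type': 'float16'}, 'mode': 'direct'}
```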
Additional variables for Kafka mode:
| Variable | Default | Description |
|---|---|---|
| `KAFKA__BOOTSTRAP_SERVERS` | `localhost:9092` | Kafka broker address |
| `S3__ENDPOINT_URL` | `http://localhost:9000` | S3 / MinIO endpoint |
| `S3__BUCKET` | `whisperx-audio` | Bucket for audio uploads |
| `MINIO_ROOT_USER` | `minioadmin` | MinIO root user — change before deploying |
| `MINIO_ROOT_PASSWORD` | `minioadmin` | MinIO root password — change before deploying |
Transcribe an audio file. Compatible with the OpenAI transcription API.
Form parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `file` | file | — | Audio file (required) |
| `model` | string | config default | Model name. `whisper-1` is aliased to the configured default. |
| `language` | string | config default | ISO-639-1 language code. Auto-detected if omitted. |
| `prompt` | string | — | Optional context/hotwords hint |
| `response_format` | string | `json` | `text`, `json`, `verbose_json`, `vtt_json`, `srt`, `vtt`, `aud` |
| `temperature` | float | `0.0` | Sampling temperature |
| `timestamp_granularities[]` | list | `["segment"]` | `segment`, `word` |
| `align` | bool | `true` | Enable word-level alignment (required for subtitle formats) |
| `diarize` | bool | `false` | Enable speaker diarization (requires `align=true`) |
| `speaker_embeddings` | bool | `false` | Include speaker embeddings in diarization output |
| `highlight_words` | bool | `false` | Highlight words in `vtt`/`srt` output |
| `suppress_numerals` | bool | `true` | Spell out numbers |
| `hotwords` | string | — | Comma-separated hotwords to bias toward |
| `batch_size` | int | config default | Inference batch size |
| `chunk_size` | int | config default | VAD chunk size in seconds |
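These parameters compose: diarized subtitles need `align=true` plus `diarize=true` with a subtitle `response_format`. A hedged sketch (assumes the third-party `requests` package and a running server; URL, key, and file name are illustrative):

```python
# Sketch: request diarized WebVTT subtitles.
# Parameter names come from the table above.
import requests

with open("meeting.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"file": f},
        data={
            "response_format": "vtt",
            "align": "true",         # required for subtitle formats
            "diarize": "true",       # requires align=true
            "highlight_words": "true",
        },
    )
print(resp.text)
```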
Response formats
| Format | Content-Type | Body |
|---|---|---|
| `json` | `application/json` | `{"text": "..."}` |
| `verbose_json` | `application/json` | Full transcript with segments and timestamps |
| `vtt_json` | `application/json` | `verbose_json` + `"vtt_text"` field |
| `text` / `srt` / `aud` | `text/plain` | Raw text / subtitle file |
| `vtt` | `text/vtt` | WebVTT subtitle file |
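A `verbose_json` response carries segments with timestamps; with `align=true` and `timestamp_granularities[]=word`, segments also carry per-word timing. The exact schema isn't reproduced in this README, so the shape below (segments containing word lists, with speaker labels when diarization is on) is an assumption based on typical WhisperX output:

```python
# Sketch: pull word-level timestamps out of a verbose_json-style payload.
# `sample` is synthetic; field names are assumed, not taken from this README.
sample = {
    "text": "hello world",
    "segments": [
        {
            "start": 0.0,
            "end": 1.2,
            "text": "hello world",
            "words": [
                {"word": "hello", "start": 0.0, "end": 0.5, "speaker": "SPEAKER_00"},
                {"word": "world", "start": 0.6, "end": 1.2, "speaker": "SPEAKER_00"},
            ],
        }
    ],
}

def word_timeline(payload: dict) -> list:
    """Flatten (word, start, end) tuples across all segments."""
    return [
        (w["word"], w["start"], w["end"])
        for seg in payload.get("segments", [])
        for w in seg.get("words", [])
    ]

print(word_timeline(sample))
# [('hello', 0.0, 0.5), ('world', 0.6, 1.2)]
```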
Translate audio to English. Same parameters as `/v1/audio/transcriptions`, minus `language`, `align`, `diarize`, and diarization-related fields.
Returns `{"status": "healthy"}`. Not protected by API key auth.
| Endpoint | Description |
|---|---|
| `GET /models/list` | List loaded transcription models |
| `POST /models/load` | Load a model (`model` param) |
| `POST /models/unload` | Unload a model (`model` param) |
| `GET /align_models/list` | List loaded alignment models |
| `POST /align_models/load` | Load an alignment model (`language` param) |
| `POST /align_models/unload` | Unload an alignment model (`language` param) |
| `GET /diarize_models/list` | List loaded diarization models |
| `POST /diarize_models/load` | Load a diarization model (`model` param) |
| `POST /diarize_models/unload` | Unload a diarization model (`model` param) |
Each pipeline stage (transcription, alignment, diarization) can use a different backend. Set the active backend via environment variables:
```shell
BACKENDS__TRANSCRIPTION=whisperx
BACKENDS__ALIGNMENT=whisperx
BACKENDS__DIARIZATION=whisperx
```

Only the `whisperx` backend ships by default. Custom backends can be registered via the backend registry at `src/whisperx_api_server/backends/`.
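The registry's actual API isn't documented in this README; a decorator-based registry along these lines is one plausible shape (all names here are illustrative, not the project's real identifiers):

```python
# Sketch of a decorator-based backend registry.
TRANSCRIPTION_BACKENDS: dict = {}

def register_backend(name: str):
    """Register a backend class under a name usable in BACKENDS__* settings."""
    def decorator(cls):
        TRANSCRIPTION_BACKENDS[name] = cls
        return cls
    return decorator

@register_backend("whisperx")
class WhisperXBackend:
    def transcribe(self, audio_path: str) -> dict:
        raise NotImplementedError  # a real backend would wrap whisperx here

# Settings like BACKENDS__TRANSCRIPTION=whisperx would then resolve via lookup:
backend_cls = TRANSCRIPTION_BACKENDS["whisperx"]
print(backend_cls.__name__)
# WhisperXBackend
```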
| File | Purpose |
|---|---|
| `compose.yaml` | Standalone server — use `--profile cuda` or `--profile cpu` |
| `compose-kafka.yaml` | Distributed stack — API server + Kafka + MinIO + workers via `--profile cuda` / `--profile cpu` |
Workers are opt-in via profiles so `docker compose up` never accidentally starts a GPU process on a machine that doesn't have one.
Issues, forks, and pull requests are welcome.
GNU General Public License v3.0 — see LICENSE for details.