echoline is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper, and Text-to-Speech uses piper and Kokoro. This project aims to be Ollama, but for TTS/STT models.
- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with echoline.
- Audio generation (chat completions endpoint) | OpenAI Documentation
  - Generate a spoken audio summary of a body of text (text in, audio out)
  - Perform sentiment analysis on a recording (audio in, text out)
  - Async speech-to-speech interactions with a model (audio in, audio out)
- Streaming support (transcription is sent via SSE as the audio is transcribed; you don't need to wait for the audio to be fully transcribed before receiving results).
- Dynamic model loading/offloading. Just specify which model you want to use in the request and it will be loaded automatically, then unloaded after a period of inactivity.
- Text-to-Speech via `kokoro` and `piper` models.
- GPU and CPU support.
- Deployable via Docker Compose / Docker.
- Realtime API for interactive voice conversations.
- Voice Activity Detection (batch and streaming).
- Highly configurable via environment variables.
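Streaming transcription arrives as Server-Sent Events. As a rough sketch of consuming such a stream (the event payload shape here is illustrative, not echoline's actual schema), each `data:` line carries one JSON event:

```python
import json


def parse_sse(lines):
    """Yield the JSON payload of each `data:` line in an SSE stream.

    Assumes one `data: {...}` line per event, as is typical for
    OpenAI-style streaming; a literal `data: [DONE]` ends the stream.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives, comments, and event names
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


# Fabricated stream -- the "text" field is illustrative only
stream = [
    'data: {"text": "Hello"}',
    "",
    'data: {"text": " world"}',
    "data: [DONE]",
]
chunks = [event["text"] for event in parse_sse(stream)]
print("".join(chunks))  # prints "Hello world"
```

In practice you would read the lines from the HTTP response as they arrive rather than from a list.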
Please create an issue if you find a bug, have a question, or a feature suggestion.
Echoline is published as a container image to GitHub Container Registry:
`ghcr.io/vowel/echoline`
Available tags:
- `latest-cuda` - CUDA-enabled image (recommended for GPU)
- `latest-cpu` - CPU-only image
- `latest-cuda-12.6.3` - Specific CUDA 12.6.3 version
- `latest-cuda-12.4.1` - Specific CUDA 12.4.1 version
- Versioned tags (e.g., `v0.9.0-cuda`, `v0.9.0-cpu`)
Download the necessary Docker Compose files:
CUDA:

```bash
curl --silent --remote-name https://raw.githubusercontent.com/vowel/echoline/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/vowel/echoline/master/compose.cuda.yaml
export COMPOSE_FILE=compose.cuda.yaml
```

CPU:

```bash
curl --silent --remote-name https://raw.githubusercontent.com/vowel/echoline/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/vowel/echoline/master/compose.cpu.yaml
export COMPOSE_FILE=compose.cpu.yaml
```

Start the service:

```bash
docker compose up --detach
```

CUDA:
```bash
docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name echoline \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  --gpus=all \
  ghcr.io/vowel/echoline:latest-cuda
```

CPU:
```bash
docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name echoline \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  ghcr.io/vowel/echoline:latest-cpu
```

To run from source:

```bash
git clone https://github.com/vowel/echoline.git
cd echoline
uv python install
uv venv
source .venv/bin/activate
uv sync
uvicorn --factory --host 0.0.0.0 echoline.main:create_app
```

Before using Echoline, you need to download a model for your specific task.
List available models:
```bash
export ECHOLINE_BASE_URL="http://localhost:8000"

# List all available models
uvx echoline-cli registry ls

# Filter by task (e.g., automatic-speech-recognition, text-to-speech)
uvx echoline-cli registry ls --task automatic-speech-recognition
```

Download a model:

```bash
# Download a speech-to-text model
uvx echoline-cli model download Systran/faster-distil-whisper-small.en

# Download a text-to-speech model
uvx echoline-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
```

Using cURL:
```bash
export ECHOLINE_BASE_URL="http://localhost:8000"
export TRANSCRIPTION_MODEL_ID="Systran/faster-distil-whisper-small.en"

curl -s "$ECHOLINE_BASE_URL/v1/audio/transcriptions" \
  -F "file=@audio.wav" \
  -F "model=$TRANSCRIPTION_MODEL_ID"
```

Using Python with OpenAI SDK:
```python
from pathlib import Path

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

with Path("audio.wav").open("rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="Systran/faster-distil-whisper-small.en",
        file=audio_file,
    )

print(transcription.text)
```

Using cURL:
```bash
export ECHOLINE_BASE_URL="http://localhost:8000"
export SPEECH_MODEL_ID="speaches-ai/Kokoro-82M-v1.0-ONNX"
export VOICE_ID="af_heart"

curl "$ECHOLINE_BASE_URL/v1/audio/speech" \
  -s \
  -H "Content-Type: application/json" \
  --output audio.mp3 \
  --data @- << EOF
{
  "input": "Hello World!",
  "model": "$SPEECH_MODEL_ID",
  "voice": "$VOICE_ID"
}
EOF
```

Using Python with OpenAI SDK:
```python
from pathlib import Path

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

model_id = "speaches-ai/Kokoro-82M-v1.0-ONNX"
voice_id = "af_heart"

response = client.audio.speech.create(
    model=model_id,
    voice=voice_id,
    input="Hello, world!",
    response_format="mp3",
)

with Path("output.mp3").open("wb") as f:
    f.write(response.content)
```

Echoline implements the OpenAI Realtime API for interactive voice conversations. See the docs/usage/realtime-api.md file for full details.
Prerequisites:
- Set `CHAT_COMPLETION_BASE_URL` to an OpenAI-compatible endpoint (e.g., Ollama, OpenAI)
- Download both an STT model and a TTS model
Example WebSocket connection:
```javascript
const ws = new WebSocket(
  "ws://localhost:8000/v1/realtime?model=your-model&intent=conversation"
);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log("Received:", data);
};
```

Echoline provides VAD capabilities via REST API and WebSocket streaming.
Batch API:
```bash
curl "$ECHOLINE_BASE_URL/v1/audio/speech/timestamps" \
  -F "file=@audio.wav"
```

WebSocket Streaming:
```python
import asyncio
import base64
import json

import websockets


async def test_vad_stream():
    uri = "ws://localhost:8000/v1/vad/stream?session_id=test-123"
    async with websockets.connect(uri) as ws:
        # Send an audio chunk (PCM16, 16 kHz, mono)
        audio = get_audio_chunk()
        await ws.send(json.dumps({
            "type": "audio",
            "audio": base64.b64encode(audio).decode(),
            "timestamp_ms": 1000,
        }))

        # Receive VAD events
        async for msg in ws:
            event = json.loads(msg)
            print(f"VAD Event: {event}")


asyncio.run(test_vad_stream())
```

Echoline is highly configurable via environment variables. See the docs/configuration.md file for all available options.
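The streaming VAD example above sends base64-encoded PCM16 16 kHz mono audio. A small helper for framing raw samples into timestamped messages can be sketched as follows (the message shape mirrors the example; the 100 ms chunk size is a free choice, not a server requirement):

```python
import base64
import json

SAMPLE_RATE = 16_000  # Hz
BYTES_PER_SAMPLE = 2  # PCM16


def frame_audio(pcm: bytes, chunk_ms: int = 100):
    """Split raw PCM16 mono audio into timestamped VAD messages."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for i, start in enumerate(range(0, len(pcm), chunk_bytes)):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "type": "audio",
            "audio": base64.b64encode(chunk).decode(),
            "timestamp_ms": i * chunk_ms,
        })


# One second of silence -> ten 100 ms messages
messages = list(frame_audio(b"\x00" * SAMPLE_RATE * BYTES_PER_SAMPLE))
print(len(messages))  # prints 10
```

Each yielded string can be passed directly to `ws.send(...)` in the streaming example.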
Key configuration options:
- `ECHOLINE_API_KEY` - API key for authentication (optional)
- `CHAT_COMPLETION_BASE_URL` - Base URL for chat completion proxy
- `CHAT_COMPLETION_API_KEY` - API key for chat completion provider
- `HF_TOKEN` - Hugging Face token for accessing private models
- `ECHOLINE_LOG_LEVEL` - Log level (DEBUG, INFO, WARNING, ERROR)
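These can also be collected in an env file for Docker Compose. A sketch with placeholder values (the base URL here assumes a local Ollama; adjust it to your chat completion provider):

```shell
# .env -- example values only
ECHOLINE_API_KEY=change-me
CHAT_COMPLETION_BASE_URL=http://localhost:11434/v1
CHAT_COMPLETION_API_KEY=ollama
ECHOLINE_LOG_LEVEL=INFO
```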
Comprehensive documentation is available in the docs/ directory of this repository. To view the documentation locally with MkDocs:
```bash
# Install docs dependencies
uv sync --group docs

# Serve documentation locally
mkdocs serve

# Or build static site
mkdocs build
```

Documentation includes:
- Detailed installation guides
- API reference
- Configuration options
- Integration guides (Open WebUI, etc.)
- Troubleshooting
Known limitations and planned work:
- Realtime STT API: Currently experimental. The real-time streaming transcription API is under active development and may change.
- Voxtral Real-time Transcription: Integration of Mistral AI's Voxtral models for improved real-time streaming transcription. Voxtral Realtime supports 13 languages with ultra-low latency (sub-200ms), and Voxtral Mini Transcribe V2 offers batch transcription with speaker diarization.
- NVIDIA Parakeet: Adding support for NVIDIA's Parakeet ASR models (TDT 0.6B v2/v3, RNNT 1.1B) for enhanced speech-to-text accuracy. Parakeet TDT 0.6B v2 ranks #1 on the Hugging Face ASR Leaderboard with 6.05% WER and 50x faster inference.
- Moonshine STT: Integration of UsefulSensors' Moonshine model for efficient on-device ASR. Moonshine Tiny offers 5x compute reduction vs Whisper tiny-en with no WER increase, optimized for live transcription and voice commands.
- Omni TTS: Integration of OmniVoice, a multilingual zero-shot TTS model supporting 600+ languages with voice cloning and voice design capabilities. Features include speaker attribute control (gender, age, pitch, dialect) and ultra-fast inference (0.025 RTF - 40x faster than real-time).
See the docs/usage/realtime-api.md file for current limitations and next steps regarding the Realtime API.
MIT License - See LICENSE for details.
