Echoline

Echoline is a continuation of speaches sponsored by Vowel.

Echoline is an OpenAI API-compatible server that supports streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper; Text-to-Speech uses piper and Kokoro. The project aims to be Ollama, but for TTS/STT models.

Features

OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with Echoline.
Audio generation through the chat completions endpoint (see the OpenAI documentation):
- Generate a spoken audio summary of a body of text (text in, audio out)
- Perform sentiment analysis on a recording (audio in, text out)
- Async speech to speech interactions with a model (audio in, audio out)
Streaming support: transcription is sent via SSE as the audio is transcribed, so you don't need to wait for the whole recording to be processed before receiving results.
Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
Text-to-Speech via Kokoro and piper models.
GPU and CPU support.
Deployable via Docker Compose / Docker
Realtime API for interactive voice conversations
Voice Activity Detection (batch and streaming)
Highly configurable via environment variables
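
The SSE streaming mentioned in the feature list can be consumed with standard-library code. The sketch below parses `data:` fields from a stream of lines and joins the pieces. The helper name `iter_sse_data` and the JSON payload shape (`{"text": ...}`) are assumptions for illustration, not Echoline's documented event format.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in a line stream."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and comments are ignored here.
    if buffer:
        # Lenient: flush a trailing event that was not closed by a blank line.
        yield "\n".join(buffer)


# Hypothetical stream; the payload shape is an assumption for illustration.
sample = [
    'data: {"text": "Hello"}\n',
    "\n",
    'data: {"text": " world"}\n',
    "\n",
]
segments = [json.loads(payload)["text"] for payload in iter_sse_data(sample)]
print("".join(segments))
```

In practice the `lines` argument would be the decoded line iterator of an HTTP response, so partial transcripts can be handled as they arrive.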

Please create an issue if you find a bug, have a question, or want to suggest a feature.

Container Image

Echoline is published as a container image to GitHub Container Registry:

ghcr.io/vowel/echoline

Available tags:

latest-cuda - CUDA-enabled image (recommended for GPU)
latest-cpu - CPU-only image
latest-cuda-12.6.3 - Specific CUDA 12.6.3 version
latest-cuda-12.4.1 - Specific CUDA 12.4.1 version
Versioned tags (e.g., v0.9.0-cuda, v0.9.0-cpu)
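
The tag scheme above is regular enough to build image references programmatically, for example in a deployment script. This is a minimal sketch; the helper `image_ref` is hypothetical and only covers the tag shapes listed above.

```python
from typing import Optional

REGISTRY_IMAGE = "ghcr.io/vowel/echoline"


def image_ref(variant: str, version: Optional[str] = None,
              cuda_version: Optional[str] = None) -> str:
    """Build an image reference following the tag scheme listed above."""
    if variant not in ("cuda", "cpu"):
        raise ValueError(f"unknown variant: {variant!r}")
    if cuda_version is not None and (variant != "cuda" or version is not None):
        # Only latest-cuda-<cuda version> tags appear in the list above.
        raise ValueError("cuda_version is only combined with latest-cuda here")
    tag = f"{version or 'latest'}-{variant}"
    if cuda_version is not None:
        tag = f"{tag}-{cuda_version}"
    return f"{REGISTRY_IMAGE}:{tag}"


print(image_ref("cuda"))
print(image_ref("cpu", version="v0.9.0"))
print(image_ref("cuda", cuda_version="12.6.3"))
```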

Installation

Docker Compose (Recommended)

Download the necessary Docker Compose files:

CUDA:

curl --silent --remote-name https://raw.githubusercontent.com/vowel/echoline/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/vowel/echoline/master/compose.cuda.yaml
export COMPOSE_FILE=compose.cuda.yaml

CPU:

curl --silent --remote-name https://raw.githubusercontent.com/vowel/echoline/master/compose.yaml
curl --silent --remote-name https://raw.githubusercontent.com/vowel/echoline/master/compose.cpu.yaml
export COMPOSE_FILE=compose.cpu.yaml

Start the service:

docker compose up --detach

Docker

CUDA:

docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name echoline \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  --gpus=all \
  ghcr.io/vowel/echoline:latest-cuda

CPU:

docker run \
  --rm \
  --detach \
  --publish 8000:8000 \
  --name echoline \
  --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub \
  ghcr.io/vowel/echoline:latest-cpu

Python (requires `uv` package manager)

git clone https://github.com/vowel/echoline.git
cd echoline
uv python install
uv venv
source .venv/bin/activate
uv sync
uvicorn --factory --host 0.0.0.0 echoline.main:create_app

Usage

Model Discovery and Download

Before using Echoline, you need to download a model for your specific task.

List available models:

export ECHOLINE_BASE_URL="http://localhost:8000"

# List all available models
uvx echoline-cli registry ls

# Filter by task (e.g., automatic-speech-recognition, text-to-speech)
uvx echoline-cli registry ls --task automatic-speech-recognition

Download a model:

# Download a speech-to-text model
uvx echoline-cli model download Systran/faster-distil-whisper-small.en

# Download a text-to-speech model
uvx echoline-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
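
After downloading, it can be useful to confirm from code which models the server reports. The sketch below assumes Echoline exposes the standard OpenAI-style `GET /v1/models` endpoint with the usual `{"data": [{"id": ...}]}` response shape; the function names are hypothetical and only the standard library is used.

```python
import json
import urllib.request
from typing import List


def models_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a request for the model list."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/v1/models", method="GET")


def model_ids(payload: dict) -> List[str]:
    """Extract model IDs from an OpenAI-style list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str) -> List[str]:
    """Query a running server; requires network access to base_url."""
    with urllib.request.urlopen(models_request(base_url), timeout=10) as response:
        return model_ids(json.load(response))


# Offline demonstration with a response of the assumed shape.
example = {"object": "list", "data": [{"id": "Systran/faster-distil-whisper-small.en"}]}
print(model_ids(example))
```

With a server running, `fetch_model_ids("http://localhost:8000")` would perform the actual request.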

Speech-to-Text (Transcription)

Using cURL:

export ECHOLINE_BASE_URL="http://localhost:8000"
export TRANSCRIPTION_MODEL_ID="Systran/faster-distil-whisper-small.en"

curl -s "$ECHOLINE_BASE_URL/v1/audio/transcriptions" \
  -F "file=@audio.wav" \
  -F "model=$TRANSCRIPTION_MODEL_ID"

Using Python with OpenAI SDK:

from pathlib import Path
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

with Path("audio.wav").open("rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="Systran/faster-distil-whisper-small.en",
        file=audio_file
    )

print(transcription.text)

Text-to-Speech

Using cURL:

export ECHOLINE_BASE_URL="http://localhost:8000"
export SPEECH_MODEL_ID="speaches-ai/Kokoro-82M-v1.0-ONNX"
export VOICE_ID="af_heart"

curl "$ECHOLINE_BASE_URL/v1/audio/speech" \
  -s \
  -H "Content-Type: application/json" \
  --output audio.mp3 \
  --data @- << EOF
{
  "input": "Hello World!",
  "model": "$SPEECH_MODEL_ID",
  "voice": "$VOICE_ID"
}
EOF

Using Python with OpenAI SDK:

from pathlib import Path
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

model_id = "speaches-ai/Kokoro-82M-v1.0-ONNX"
voice_id = "af_heart"

response = client.audio.speech.create(
    model=model_id,
    voice=voice_id,
    input="Hello, world!",
    response_format="mp3"
)

with Path("output.mp3").open("wb") as f:
    f.write(response.content)
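
If the OpenAI SDK is not available, the same request as the cURL example can be sent with the standard library. This is a sketch rather than a recommended client: the helper names are hypothetical, and the request body simply mirrors the fields used in the examples above.

```python
import json
import shutil
import urllib.request


def build_speech_request(base_url: str, model: str, voice: str, text: str,
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build a POST request for /v1/audio/speech without sending it."""
    body = json.dumps({
        "input": text,
        "model": model,
        "voice": voice,
        "response_format": response_format,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str) -> None:
    """Send the request and stream the audio bytes to a file."""
    with urllib.request.urlopen(request, timeout=60) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)


request = build_speech_request(
    "http://localhost:8000",
    model="speaches-ai/Kokoro-82M-v1.0-ONNX",
    voice="af_heart",
    text="Hello, world!",
)
print(request.get_method(), request.full_url)
```

Calling `synthesize_to_file(request, "output.mp3")` against a running server would write the generated audio to disk.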

Realtime API (Voice Chat)

Echoline implements the OpenAI Realtime API for interactive voice conversations. See the docs/usage/realtime-api.md file for full details.

Prerequisites:

Set CHAT_COMPLETION_BASE_URL to an OpenAI-compatible endpoint (e.g., Ollama, OpenAI)
Download both an STT model and a TTS model

Example WebSocket connection:

const ws = new WebSocket(
  "ws://localhost:8000/v1/realtime?model=your-model&intent=conversation"
);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Received:', data);
};
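
Model IDs often contain characters such as `/` that need escaping in a query string. The sketch below derives the WebSocket URL from the HTTP base URL and encodes the parameters shown in the example above; the helper `realtime_url` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def realtime_url(base_url: str, model: str, intent: str = "conversation") -> str:
    """Turn an HTTP base URL into the Realtime API WebSocket URL."""
    parts = urlsplit(base_url)
    # Map http -> ws and https -> wss; leave other schemes untouched.
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    query = urlencode({"model": model, "intent": intent})
    return urlunsplit((scheme, parts.netloc, "/v1/realtime", query, ""))


print(realtime_url("http://localhost:8000", "your-model"))
```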

Voice Activity Detection (VAD)

Echoline provides VAD capabilities via REST API and WebSocket streaming.

Batch API:

curl "$ECHOLINE_BASE_URL/v1/audio/speech/timestamps" \
  -F "file=@audio.wav"

WebSocket Streaming:

import asyncio
import base64
import json

import websockets


def get_audio_chunk() -> bytes:
    # Placeholder: 100 ms of silence (PCM16, 16 kHz, mono). Replace with real audio.
    return b"\x00" * 3200


async def test_vad_stream():
    uri = "ws://localhost:8000/v1/vad/stream?session_id=test-123"
    async with websockets.connect(uri) as ws:
        # Send audio chunk (PCM16 16kHz mono)
        audio = get_audio_chunk()
        await ws.send(json.dumps({
            "type": "audio",
            "audio": base64.b64encode(audio).decode(),
            "timestamp_ms": 1000
        }))

        # Receive VAD events
        async for msg in ws:
            event = json.loads(msg)
            print(f"VAD Event: {event}")


asyncio.run(test_vad_stream())
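
A longer recording has to be split into chunks before it is streamed. The sketch below slices raw PCM16 16 kHz mono audio into fixed-duration chunks and wraps each one in the message shape used above. The helper name, the 100 ms chunk size, and the interpretation of `timestamp_ms` as the start offset of each chunk are assumptions.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # matches the PCM16 16 kHz mono format noted above
BYTES_PER_SAMPLE = 2     # 16-bit samples


def iter_vad_messages(pcm: bytes, chunk_ms: int = 100) -> Iterator[str]:
    """Yield JSON messages, one per chunk of raw PCM16 mono audio."""
    bytes_per_chunk = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for index, offset in enumerate(range(0, len(pcm), bytes_per_chunk)):
        chunk = pcm[offset:offset + bytes_per_chunk]
        yield json.dumps({
            "type": "audio",
            "audio": base64.b64encode(chunk).decode("ascii"),
            # Assumption: timestamp_ms marks the start of the chunk.
            "timestamp_ms": index * chunk_ms,
        })


one_second_of_silence = b"\x00" * (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
messages = list(iter_vad_messages(one_second_of_silence))
print(len(messages))
```

Each yielded string can be passed directly to `ws.send(...)` inside the WebSocket session.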

Configuration

Echoline is highly configurable via environment variables. See the docs/configuration.md file for all available options.

Key configuration options:

ECHOLINE_API_KEY - API key for authentication (optional)
CHAT_COMPLETION_BASE_URL - Base URL for chat completion proxy
CHAT_COMPLETION_API_KEY - API key for chat completion provider
HF_TOKEN - Hugging Face token for accessing private models
ECHOLINE_LOG_LEVEL - Log level (DEBUG, INFO, WARNING, ERROR)
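
When `ECHOLINE_API_KEY` is set on the server, clients need to send the same key. The sketch below builds request headers from the environment, assuming the key is passed as an OpenAI-style `Authorization: Bearer` header; that header format and the helper `auth_headers` are assumptions, so check docs/configuration.md for the authoritative behavior.

```python
import os
from typing import Dict, Mapping


def auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return an Authorization header if ECHOLINE_API_KEY is set, else nothing."""
    api_key = env.get("ECHOLINE_API_KEY", "").strip()
    # Assumption: the key is sent as a bearer token, as with the OpenAI API.
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}


print(auth_headers({"ECHOLINE_API_KEY": "example-key"}))
print(auth_headers({}))
```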

Documentation

Comprehensive documentation is available in the docs/ directory of this repository. To view the documentation locally with MkDocs:

# Install docs dependencies
uv sync --group docs

# Serve documentation locally
mkdocs serve

# Or build static site
mkdocs build

Documentation includes:

Detailed installation guides
API reference
Configuration options
Integration guides (Open WebUI, etc.)
Troubleshooting

Roadmap

Current Status

Realtime STT API: Currently experimental. The real-time streaming transcription API is under active development and may change.

Planned Features

Voxtral Real-time Transcription: Integration of Mistral AI's Voxtral models for improved real-time streaming transcription. Voxtral Realtime supports 13 languages with ultra-low latency (sub-200ms), and Voxtral Mini Transcribe V2 offers batch transcription with speaker diarization.
NVIDIA Parakeet: Adding support for NVIDIA's Parakeet ASR models (TDT 0.6B v2/v3, RNNT 1.1B) for enhanced speech-to-text accuracy. Parakeet TDT 0.6B v2 ranks #1 on the Hugging Face ASR Leaderboard with 6.05% WER and 50x faster inference.
Moonshine STT: Integration of UsefulSensors' Moonshine model for efficient on-device ASR. Moonshine Tiny offers 5x compute reduction vs Whisper tiny-en with no WER increase, optimized for live transcription and voice commands.
Omni TTS: Integration of OmniVoice, a multilingual zero-shot TTS model supporting 600+ languages with voice cloning and voice design capabilities. Features include speaker attribute control (gender, age, pitch, dialect) and ultra-fast inference (0.025 RTF - 40x faster than real-time).

See the docs/usage/realtime-api.md file for current limitations and next steps regarding the Realtime API.

License

MIT License - See LICENSE for details.
