Realtime Transcriptor

A real-time speech-to-text transcription system using a Python WebSocket server powered by Vosk, and a modern browser-based client using Vite and vanilla JavaScript.

Supports multiple languages, depending on the Vosk model you load. Designed for low latency and efficient bandwidth usage, with silence detection on the client side.

🗂 Project Structure


./
├── client/                     # Vite-based client
│   ├── index.html              # Entry HTML file
│   ├── package.json            # Vite + dependencies
│   ├── package-lock.json
│   └── src/
│       ├── main.js             # App entrypoint
│       └── style.css           # App styling
├── main.py                     # Python WebSocket server (Vosk-based)
├── pyproject.toml              # Python dependency spec
├── uv.lock                     # Lock file generated by uv
└── README.md

⚙️ Installation Guide

1. Clone the Repo

git clone https://github.com/Abdulkhalek-1/realtime-transcriptor.git
cd realtime-transcriptor

2. Install Python Dependencies

It's recommended to use uv:

pip install uv
uv sync

3. Download and Prepare Vosk Model

Models are not included due to their size. Download a suitable model for your language from the official Vosk models page:

Arabic: https://alphacephei.com/vosk/models/vosk-model-ar-0.22-linto-1.1.0.zip
English (small): https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
English (large, more accurate): https://alphacephei.com/vosk/models/vosk-model-en-us-0.42-gigaspeech.zip
All Models: https://alphacephei.com/vosk/models

After downloading, extract the model zip to a folder named model inside your project root:

unzip ./<your-model-name>.zip
mv ./<your-model-name> ./model

💻 Running the Server

uv run main.py

The server will start listening on WebSocket ws://localhost:8765.

🖥️ Running the Client (Vite App)

cd client
npm install
npm run dev

Open your browser at the URL printed by Vite (usually http://localhost:5173).

🔍 Why This Design?

❓ Why EOF Handling in Server?

The server calls recognizer.FinalResult() when a client disconnects to finalize the last transcribed sentence. This ensures no spoken words are lost due to streaming buffering.
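
The flush-on-disconnect idea can be sketched independently of the WebSocket layer. This is a minimal illustration, not the code in main.py: `transcribe_stream` is a hypothetical helper, and `recognizer` is assumed to behave like Vosk's `KaldiRecognizer`, whose `AcceptWaveform`, `Result`, and `FinalResult` methods return their results as JSON strings.

```python
import json
from typing import Iterable, Iterator


def transcribe_stream(recognizer, chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield finalized text segments for a stream of PCM audio chunks.

    `recognizer` is assumed to expose the KaldiRecognizer-style methods
    AcceptWaveform(bytes), Result() and FinalResult().
    """
    for chunk in chunks:
        # AcceptWaveform is truthy once the recognizer has closed an utterance.
        if recognizer.AcceptWaveform(chunk):
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield text

    # End of stream (for example, the client disconnected):
    # flush whatever audio is still buffered in the recognizer.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield text
```

Without the final flush, any words spoken after the last completed utterance would stay in the recognizer's buffer and never reach the client.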

❓ Why Silence Detection in Client?

To reduce bandwidth and improve transcription segmentation:

The client detects silence by monitoring audio levels.
When silence is detected for more than 1 second, the client disconnects and reconnects the WebSocket.
This triggers the server to finalize and send the last transcription segment.
This approach keeps server logic simple and offloads smart behavior to the client.
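
The silence check itself is simple enough to sketch. The real client is JavaScript and its implementation is not shown here, so the following is only an illustration of the logic, written in Python to match the server side. It assumes 16-bit little-endian mono PCM at 16 kHz; the `SilenceDetector` class, the `rms` helper, and the threshold of 500 are hypothetical. Only the 1-second window comes from the description above.

```python
import math
import struct

SILENCE_RMS = 500       # hypothetical loudness threshold for 16-bit samples
SILENCE_SECONDS = 1.0   # reconnect after more than 1 second of silence


def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit little-endian PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class SilenceDetector:
    """Tracks how long the input has stayed below the loudness threshold."""

    def __init__(self, sample_rate=16000, threshold=SILENCE_RMS,
                 max_silence=SILENCE_SECONDS):
        self.sample_rate = sample_rate
        self.threshold = threshold
        self.max_silence = max_silence
        self.silent_for = 0.0

    def feed(self, frame: bytes) -> bool:
        """Return True when silence has just exceeded max_silence."""
        if rms(frame) >= self.threshold:
            self.silent_for = 0.0  # speech resets the silence timer
            return False
        self.silent_for += (len(frame) // 2) / self.sample_rate
        if self.silent_for > self.max_silence:
            self.silent_for = 0.0  # start a fresh window after signalling
            return True
        return False
```

A client following this pattern would close and reopen its WebSocket each time `feed` returns True.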

📄 License

MIT License — feel free to use and modify.
