Voice-to-text dictation for Linux, Windows, and macOS. Press (or hold) a global hotkey, speak, release/press again — your words are typed into whatever app is focused. Runs silently in the system tray with a transparent floating overlay that shows recording state.
Built with Tauri 2 + Rust + Svelte.
Pre-built installers for every release are published on the GitHub Releases page:
| Platform | File |
|---|---|
| Linux (Debian/Ubuntu) | .deb |
| Linux (Fedora/RHEL) | .rpm |
| Linux (any distro) | .AppImage |
| Windows | .msi or .exe (NSIS installer) |
macOS: not currently distributed via Releases. Build from source (see Setup & run).
Windows first launch: SmartScreen may show "Windows protected your PC" — click More info → Run anyway.
If you'd rather build from source, see Setup & run below.

    Hotkey trigger                 Hotkey trigger
    (tap or hold-down)             (tap again, or release)
          ↓                              ↓
    start recording                stop recording
          ↓                              ↓
    cpal captures mic              noise gate applied
    at 16 kHz mono                       ↓
                                   VAD trims silence
                                         ↓
                                   resample → 16 kHz
                                         ↓
                            ┌────────────┴────────────┐
                       Groq backend             Local backend
                         (cloud)                  (offline)
                            ↓                         ↓
                    POST WAV to Groq           whisper-rs runs
                 whisper-large-v3-turbo      selected GGML model
                            └────────────┬────────────┘
                                         ↓
                               transcript → clipboard
                                         ↓
                              xdotool / ydotool / enigo
                           simulates Ctrl+V (Cmd+V on Mac)

The pill overlay (transparent, always-on-top) updates in real time via Tauri events from the main app window.
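The final paste step in the flow above depends on the platform. As a rough sketch (not the app's actual code), the choice between xdotool, ydotool, and enigo could look like the following. The names `PasteTool`, `pick_paste_tool`, and `paste_chord` are hypothetical, and treating every non-Wayland Linux session as X11 is an assumption.

```rust
/// Which tool injects the paste keystroke. Variants mirror the tools named above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum PasteTool {
    Xdotool, // Linux, X11
    Ydotool, // Linux, Wayland (needs the ydotoold daemon)
    Enigo,   // Windows and macOS
}

/// Hypothetical helper: choose a paste tool from the OS name and, on Linux,
/// the session type (for example the value of XDG_SESSION_TYPE, if set).
pub fn pick_paste_tool(os: &str, session_type: Option<&str>) -> PasteTool {
    match os {
        "linux" => match session_type {
            Some(s) if s.eq_ignore_ascii_case("wayland") => PasteTool::Ydotool,
            // Assumption: anything else (x11, tty, unset) is treated as X11.
            _ => PasteTool::Xdotool,
        },
        _ => PasteTool::Enigo,
    }
}

/// The paste chord: Cmd+V on macOS, Ctrl+V everywhere else.
pub fn paste_chord(os: &str) -> &'static str {
    if os == "macos" {
        "Cmd+V"
    } else {
        "Ctrl+V"
    }
}
```

At runtime the inputs could come from `std::env::consts::OS` and `std::env::var("XDG_SESSION_TYPE").ok()`; keeping them as parameters makes the decision easy to test.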
sudo apt-get install -y \
pkg-config \
libwebkit2gtk-4.1-dev \
libappindicator3-dev \
librsvg2-dev \
libgtk-3-dev \
libssl-dev \
libasound2-dev \
cmake \
libclang-dev \
xdotool
# or: ydotool # Wayland paste (also needs ydotoold daemon)
cmake and libclang-dev are required to compile whisper.cpp (the local Whisper backend) from source. xdotool is for X11 paste injection.
Install the Visual Studio Build Tools with the Desktop development with C++ workload — this provides MSVC, CMake, and the Windows SDK.
# Or install via winget:
winget install Microsoft.VisualStudio.2022.BuildTools
winget install Kitware.CMake
winget install LLVM.LLVM # provides libclang for bindgen
After installing LLVM, set the env var so bindgen can find it:
[System.Environment]::SetEnvironmentVariable("LIBCLANG_PATH", "C:\Program Files\LLVM\bin", "User")
xcode-select --install # Xcode command-line tools (clang, make)
brew install cmake # CMake for building whisper.cpp
# 1. Clone
git clone https://github.com/xeven777/yap
cd yap
# 2. Install JS dependencies
pnpm install
# 3. Dev mode — hot-reload on both Rust and Svelte changes
pnpm tauri dev
The app starts in the system tray. Click the tray icon to open the settings window.
- Get a free API key at console.groq.com
- Open Settings (tray icon or ⚙ button) → paste key → Save
- Open Settings → switch backend to Local Whisper
- Pick a model and click Download — models are fetched from HuggingFace
| Model | Size | Notes |
|---|---|---|
| Tiny (Q5) | 31 MB | Fastest; good for quick phrases |
| Base | 142 MB | Balanced speed and accuracy |
| Small (Q5) | 181 MB | Better accuracy, still fast |
| Small | 466 MB | Full-precision small model |
| Large Turbo (Q5) | 547 MB | Recommended — same arch as Groq, quantized |
| Large Turbo | 874 MB | Full-precision, highest quality |
- Once downloaded, select the model and click Save
- The model is cached in memory after first use so subsequent transcriptions are fast
Model files are stored in the app data directory:
- Linux: ~/.local/share/com.yap.app/models/
- Windows: %APPDATA%\com.yap.app\models\
- macOS: ~/Library/Application Support/com.yap.app/models/
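For illustration, the per-OS model locations above can be expressed as one small function. This is a sketch only: `models_dir` is a hypothetical helper, it ignores overrides such as XDG_DATA_HOME, and a Tauri app would normally ask the framework's path resolver for this directory instead of building it by hand.

```rust
use std::path::PathBuf;

/// Hypothetical helper: build the models directory from the documented layout.
/// `home` is the user's home directory; `appdata` is the value of %APPDATA%
/// and is only consulted on Windows.
pub fn models_dir(os: &str, home: &str, appdata: Option<&str>) -> Option<PathBuf> {
    const APP_ID: &str = "com.yap.app";
    let base = match os {
        "linux" => PathBuf::from(home).join(".local").join("share"),
        "macos" => PathBuf::from(home).join("Library").join("Application Support"),
        // Returns None if %APPDATA% is not available.
        "windows" => PathBuf::from(appdata?),
        // Unknown platform: no documented location.
        _ => return None,
    };
    Some(base.join(APP_ID).join("models"))
}
```

Passing the OS name and base directories as arguments keeps the function pure, so it behaves the same regardless of the machine it runs on.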
| Action | Result |
|---|---|
| Hotkey (default: Ctrl+Shift+Space) | Trigger recording (behaviour depends on mode) |
| Tap mode — press once | Starts recording — pill overlay appears |
| Tap mode — press again | Stops recording, transcribes, pastes |
| Hold mode — press and hold | Records while held (push-to-talk) |
| Hold mode — release | Stops recording, transcribes, pastes |
| Click tray icon | Show settings window |
| Click − in settings window | Hide window back to tray |
| Click × on settings window | Also hides to tray (does not quit) |
| Tray right-click → Quit | Exit the app |
Pick whichever feels natural in Settings → Hotkey:
- Tap — press the chord once to start, press again to stop. Best for longer dictations where you don't want to keep a finger down.
- Hold — push-to-talk. Press the chord, talk while holding, release to transcribe. Best for quick phrases (Slack messages, short replies) where you don't want to think about toggling.
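The two modes differ only in which key events start and stop a recording. A minimal sketch of that logic as a state machine follows; the type names (`HotkeyMode`, `KeyEvent`, `Action`, `Recorder`) are hypothetical and not taken from the app's source.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HotkeyMode {
    Tap,
    Hold,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyEvent {
    Pressed,
    Released,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    StartRecording,
    StopAndTranscribe,
    Nothing,
}

/// Tracks whether a recording is in progress for the configured mode.
pub struct Recorder {
    mode: HotkeyMode,
    recording: bool,
}

impl Recorder {
    pub fn new(mode: HotkeyMode) -> Self {
        Self { mode, recording: false }
    }

    /// Feed one hotkey event and get back what the app should do.
    pub fn on_hotkey(&mut self, event: KeyEvent) -> Action {
        let action = match (self.mode, event, self.recording) {
            // Tap: every press toggles; releases are ignored.
            (HotkeyMode::Tap, KeyEvent::Pressed, false) => Action::StartRecording,
            (HotkeyMode::Tap, KeyEvent::Pressed, true) => Action::StopAndTranscribe,
            // Hold: press starts, release stops.
            (HotkeyMode::Hold, KeyEvent::Pressed, false) => Action::StartRecording,
            (HotkeyMode::Hold, KeyEvent::Released, true) => Action::StopAndTranscribe,
            // Key-repeat presses while held, stray releases, and so on.
            _ => Action::Nothing,
        };
        match action {
            Action::StartRecording => self.recording = true,
            Action::StopAndTranscribe => self.recording = false,
            Action::Nothing => {}
        }
        action
    }
}
```

In Hold mode, repeated `Pressed` events while the chord is down map to `Action::Nothing`, so OS key-repeat does not restart the recording.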
The pill overlay (bottom-centre of screen, transparent) shows:
- Red + pulsing dot — Recording
- Blue — Transcribing
- Green — Done
- Disappears when idle
Open via tray icon or the ⚙ button in the main window.
| Setting | Description |
|---|---|
| Backend | Groq (cloud, needs API key) or Local Whisper (offline, needs a downloaded model) |
| Groq API Key | Your gsk_... key from console.groq.com |
| Model | Which downloaded GGML model to use for local transcription |
| Hotkey | Click the field, then press your shortcut to capture it (e.g. Ctrl+Shift+D, Alt+Space). Any combo your OS recognises will work. |
| Hotkey mode | Tap (press to toggle) or Hold (push-to-talk — record while held) |
| Language | ISO code (en, fr, de…) — skips auto-detect, saves ~100 ms. Leave blank for auto. |
Settings are saved to the OS app-config directory:
- Linux: ~/.config/com.yap.app/
- Windows: %APPDATA%\com.yap.app\
- macOS: ~/Library/Application Support/com.yap.app/
pnpm tauri build
Output is in src-tauri/target/release/bundle/:
| Platform | Output |
|---|---|
| Linux | .deb + .AppImage |
| Windows | .msi + .exe (NSIS installer) |
| macOS | .dmg + .app |
The release profile uses LTO, size optimization (opt-level = "z"), and symbol stripping (strip = true) to keep the binary small.
Tauri does not support cross-compilation out of the box — build on the target OS. For CI, use:
- Linux: Ubuntu 22.04+ runner
- Windows: windows-latest runner
- macOS: macos-latest runner (arm64) or macos-13 (x86_64)
A GitHub Actions workflow at .github/workflows/release.yml builds installers for Linux and Windows in parallel and uploads them to a GitHub Release. (macOS builds are currently disabled — build locally if needed.)
To publish a new version:
# 1. Bump the version in package.json AND src-tauri/tauri.conf.json AND src-tauri/Cargo.toml
# 2. Commit and tag
git commit -am "Release v0.2.0"
git tag v0.2.0
git push origin main --tags
The workflow runs automatically on any tag matching v*. It creates a draft release. Review the artifacts on GitHub, then click Publish release to make it visible.
You can also trigger the workflow manually via Actions → Release → Run workflow (useful for testing without cutting a real version).
Builds are unsigned. macOS / Windows users will see a Gatekeeper / SmartScreen warning the first time they run the app. To distribute signed builds, add APPLE_* and WINDOWS_CERTIFICATE secrets per the tauri-action docs.
yap/
├── index.html # Main window entry
├── pill.html # Overlay window entry
├── src/
│ ├── App.svelte # Main UI — settings, hotkey, backend/model selector
│ ├── Pill.svelte # Floating overlay — listens for yap://state events
│ ├── main.ts # Svelte mount for main window
│ ├── pill.ts # Svelte mount for pill window
│ └── app.css # Global reset
├── src-tauri/
│ ├── src/
│ │ ├── lib.rs # Tauri builder, tray setup, close-to-tray handler
│ │ └── commands.rs # All Tauri commands + audio pipeline + transcription
│ ├── Cargo.toml # Rust deps + release profile
│ ├── tauri.conf.json # Window config (main + pill), bundle targets
│ └── capabilities/
│ └── default.json # Tauri 2 permission grants for both windows
├── vite.config.ts # Multi-page build (main + pill entry points)
├── svelte.config.js
└── package.json
| Crate / Package | Purpose |
|---|---|
| tauri 2 | Native window, IPC bridge, system tray |
| cpal 0.15 | Cross-platform audio capture (ALSA / PipeWire / WASAPI / CoreAudio) |
| hound 3 | Encode raw PCM → WAV (Groq path) |
| reqwest 0.12 | HTTP client for Groq API + model downloads |
| whisper-rs 0.14 | Rust bindings to whisper.cpp for local offline transcription |
| tauri-plugin-global-shortcut | System-wide configurable hotkey |
| tauri-plugin-clipboard-manager | Write transcript to clipboard |
| enigo 0.2 | Keyboard injection on Windows + macOS |
| @tauri-apps/api | JS ↔ Rust IPC + event bus |
| svelte 5 | Reactive UI (runes mode) |
- Capture: cpal opens the default input device at 16 kHz mono. Falls back to native rate + resamples afterward.
- Noise gate: Samples below 0.5% amplitude are zeroed during capture to cut background hiss.
- VAD trim: After recording stops, silence is trimmed from both ends (threshold 1%, 150 ms padding).
- Resample: Linear interpolation to 16 kHz if the device captured at a different rate.
- Transcribe (Groq): hound writes a 16-bit PCM WAV to /tmp/yap_recording.wav; reqwest POSTs it to Groq's /v1/audio/transcriptions.
- Transcribe (Local): Raw f32 samples are passed directly to whisper-rs (no WAV write). The WhisperContext is cached in memory so the model loads only once per session. Inference runs in a spawn_blocking thread to avoid blocking the async runtime.
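The three signal-processing steps (noise gate, VAD trim, resample) are simple enough to sketch with the standard library alone. The code below illustrates the techniques described above and is not the app's actual implementation. The function names are hypothetical, samples are assumed to be f32 in the range -1.0 to 1.0, and the "VAD" here is plain amplitude thresholding.

```rust
/// Zero every sample whose magnitude is below `threshold` (0.005 = 0.5%).
pub fn noise_gate(samples: &mut [f32], threshold: f32) {
    for s in samples.iter_mut() {
        if s.abs() < threshold {
            *s = 0.0;
        }
    }
}

/// Trim leading and trailing silence, keeping up to `pad` samples of context
/// on each side. Returns an empty slice if no sample reaches `threshold`.
pub fn vad_trim(samples: &[f32], threshold: f32, pad: usize) -> &[f32] {
    let first = samples.iter().position(|s| s.abs() >= threshold);
    let last = samples.iter().rposition(|s| s.abs() >= threshold);
    match (first, last) {
        (Some(f), Some(l)) => {
            let start = f.saturating_sub(pad);
            let end = (l + 1 + pad).min(samples.len());
            &samples[start..end]
        }
        _ => &samples[..0],
    }
}

/// Resample by linear interpolation between neighbouring input samples.
/// Input is returned unchanged if it is empty or no conversion is possible.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step; // fractional position in the input
            let idx = (pos as usize).min(last);
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = input[(idx + 1).min(last)]; // clamp at the final sample
            a + (b - a) * frac
        })
        .collect()
}
```

With the values quoted above, the gate threshold would be 0.005, the trim threshold 0.01, and 150 ms of padding corresponds to 2400 samples at 16 kHz (more if trimming runs at a higher native rate). Linear interpolation is cheap but does no low-pass filtering, so some aliasing is possible when downsampling.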
App doesn't appear after launching
It starts in the system tray. Look for the Yap icon in your taskbar/tray area.
Hotkey not registering
Another app may have claimed the combo. Open Settings → click the hotkey field → press a different chord (e.g. Ctrl+Shift+D or Alt+F9) → Save. If the chord is rejected, the error from the OS will be shown below the hotkey field.
Hold mode releases too early / cuts off audio
Some keyboards repeat-fire modifier releases when other keys are pressed. If push-to-talk feels flaky, switch to Tap mode in Settings → Hotkey, or pick a chord that doesn't share a modifier with keys you're typing.
Pill overlay not showing
Make sure the main window is running. Check the terminal for [yap] log lines.
No audio / "No input device"
Run arecord -l (Linux) to list capture devices. Ensure PipeWire or PulseAudio is running: systemctl --user status pipewire.
xdotool paste not working (X11)
sudo apt-get install xdotool
Wayland paste not working
sudo apt-get install ydotool
sudo ydotoold &
Build fails: Unable to find libclang
# Linux
sudo apt-get install libclang-dev
# macOS
brew install llvm
export LIBCLANG_PATH="$(brew --prefix llvm)/lib"
# Windows — install LLVM from https://releases.llvm.org and set:
# LIBCLANG_PATH=C:\Program Files\LLVM\bin
Build fails: cmake not found
# Linux
sudo apt-get install cmake
# macOS
brew install cmake
# Windows
winget install Kitware.CMake
Local model download stuck / slow
HuggingFace occasionally rate-limits. Click Cancel and retry. Large models (547 MB+) take a few minutes on a typical connection.
Local transcription is slow on first use
The model is being loaded from disk into memory. This is a one-time cost per session. Subsequent transcriptions use the cached context and are much faster.
Groq 401 error
API key is wrong or expired. Generate a new one at console.groq.com.
Groq 413 error (payload too large)
Recording was very long. The free tier accepts files up to 25 MB, which is roughly 13 minutes of 16-bit mono WAV audio at 16 kHz (32,000 bytes per second). VAD trimming keeps typical short dictations far below that limit.
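To see how recording length relates to upload size, a back-of-the-envelope calculation helps. The sketch below assumes 16-bit mono PCM with a canonical 44-byte WAV header (the exact header size written by the encoder is an assumption), and the function names are hypothetical.

```rust
/// Size in bytes of the canonical WAV header (assumed, not measured).
const WAV_HEADER_BYTES: u64 = 44;

/// Bytes per second of 16-bit mono PCM at the given sample rate.
pub fn pcm16_mono_bytes_per_sec(sample_rate_hz: u64) -> u64 {
    sample_rate_hz * 2 // one channel, two bytes per sample
}

/// Approximate WAV file size for a recording of `seconds` seconds.
pub fn wav_size_bytes(seconds: u64, sample_rate_hz: u64) -> u64 {
    WAV_HEADER_BYTES + seconds * pcm16_mono_bytes_per_sec(sample_rate_hz)
}

/// Longest whole number of seconds that fits under `limit_bytes`.
pub fn max_seconds(limit_bytes: u64, sample_rate_hz: u64) -> u64 {
    limit_bytes.saturating_sub(WAV_HEADER_BYTES) / pcm16_mono_bytes_per_sec(sample_rate_hz)
}
```

Calling `max_seconds(25_000_000, 16_000)` gives the longest recording, in seconds, that stays under a 25 MB limit when MB is read as 10^6 bytes; pass `25 * 1024 * 1024` for the binary interpretation.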
- Streaming transcription: pipe audio chunks to a WebSocket ASR service; partial transcript visible while speaking
- OS keychain for API key: use the keyring crate instead of a plaintext config file
- Auto-start on login: tauri-plugin-autostart
- Transcript history: store last N transcripts with timestamps via tauri-plugin-store
- Configurable VAD threshold: expose noise gate and silence threshold in settings