Yap

Voice-to-text dictation for Linux, Windows, and macOS. Press (or hold) a global hotkey, speak, release/press again — your words are typed into whatever app is focused. Runs silently in the system tray with a transparent floating overlay that shows recording state.

Built with Tauri 2 + Rust + Svelte.

Download

Pre-built installers for every release are published on the GitHub Releases page:

Platform	File
Linux (Debian/Ubuntu)	`.deb`
Linux (Fedora/RHEL)	`.rpm`
Linux (any distro)	`.AppImage`
Windows	`.msi` or `.exe` (NSIS installer)

macOS: not currently distributed via Releases. Build from source (see Setup & run).

Windows first launch: SmartScreen may show "Windows protected your PC" — click More info → Run anyway.

If you'd rather build from source, see Setup & run below.

How it works

Hotkey trigger              Hotkey trigger
(tap or hold-down)          (tap again, or release)
      ↓                           ↓
 start recording             stop recording
      ↓                           ↓
 cpal captures mic         noise gate applied
 at 16 kHz mono                   ↓
                           VAD trims silence
                                  ↓
                           resample → 16 kHz
                                  ↓
                     ┌────────────┴────────────┐
                  Groq backend           Local backend
                  (cloud)                (offline)
                     ↓                        ↓
              POST WAV to Groq        whisper-rs runs
              whisper-large-v3-turbo  selected GGML model
                     └────────────┬────────────┘
                                  ↓
                           transcript → clipboard
                                  ↓
                           xdotool / ydotool / enigo
                           simulates Ctrl+V (Cmd+V on Mac)

The pill overlay (transparent, always-on-top) updates in real time via Tauri events from the main app window.

Prerequisites

All platforms

Rust stable toolchain — install with rustup
Node.js v18+
pnpm — npm i -g pnpm

Linux (Debian / Ubuntu / MX Linux)

sudo apt-get install -y \
  pkg-config \
  libwebkit2gtk-4.1-dev \
  libappindicator3-dev \
  librsvg2-dev \
  libgtk-3-dev \
  libssl-dev \
  libasound2-dev \
  cmake \
  libclang-dev \
  xdotool
  # or: ydotool   # Wayland paste (also needs ydotoold daemon)

cmake and libclang-dev are required to compile whisper.cpp (the local Whisper backend) from source. xdotool is for X11 paste injection.

Windows

Install the Visual Studio Build Tools with the Desktop development with C++ workload — this provides MSVC, CMake, and the Windows SDK.

# Or install via winget:
winget install Microsoft.VisualStudio.2022.BuildTools
winget install Kitware.CMake
winget install LLVM.LLVM   # provides libclang for bindgen

After installing LLVM, set the env var so bindgen can find it:

[System.Environment]::SetEnvironmentVariable("LIBCLANG_PATH", "C:\Program Files\LLVM\bin", "User")

macOS

xcode-select --install        # Xcode command-line tools (clang, make)
brew install cmake            # CMake for building whisper.cpp

Setup & run

# 1. Clone
git clone https://github.com/xeven777/yap
cd yap

# 2. Install JS dependencies
pnpm install

# 3. Dev mode — hot-reload on both Rust and Svelte changes
pnpm tauri dev

The app starts in the system tray. Click the tray icon to open the settings window.

First launch

Using Groq (cloud, default)

Get a free API key at console.groq.com
Open Settings (tray icon or ⚙ button) → paste key → Save

Using Local Whisper (offline)

Open Settings → switch backend to Local Whisper
Pick a model and click Download — models are fetched from HuggingFace

Model	Size	Notes
Tiny (Q5)	31 MB	Fastest; good for quick phrases
Base	142 MB	Balanced speed and accuracy
Small (Q5)	181 MB	Better accuracy, still fast
Small	466 MB	Full-precision small model
Large Turbo (Q5)	547 MB	Recommended — same arch as Groq, quantized
Large Turbo	874 MB	Full-precision, highest quality

Once downloaded, select the model and click Save
The model is cached in memory after first use so subsequent transcriptions are fast

Model files are stored in the app data directory:

Linux: ~/.local/share/com.yap.app/models/
Windows: %APPDATA%\com.yap.app\models\
macOS: ~/Library/Application Support/com.yap.app/models/

Usage

Action	Result
Hotkey (default: `Ctrl+Shift+Space`)	Trigger recording — behaviour depends on mode
Tap mode — press once	Starts recording — pill overlay appears
Tap mode — press again	Stops recording, transcribes, pastes
Hold mode — press and hold	Records while held (push-to-talk)
Hold mode — release	Stops recording, transcribes, pastes
Click tray icon	Show settings window
Click − in settings window	Hide window back to tray
Click × on settings window	Also hides to tray (does not quit)
Tray right-click → Quit	Exit the app

Tap vs Hold

Pick whichever feels natural in Settings → Hotkey:

Tap — press the chord once to start, press again to stop. Best for longer dictations where you don't want to keep a finger down.
Hold — push-to-talk. Press the chord, talk while holding, release to transcribe. Best for quick phrases (Slack messages, short replies) where you don't want to think about toggling.

The pill overlay (bottom-centre of screen, transparent) shows:

Red + pulsing dot — Recording
Blue — Transcribing
Green — Done
Disappears when idle

Settings

Open via tray icon or the ⚙ button in the main window.

Setting	Description
Backend	Groq (cloud, needs API key) or Local Whisper (offline, needs a downloaded model)
Groq API Key	Your `gsk_...` key from console.groq.com
Model	Which downloaded GGML model to use for local transcription
Hotkey	Click the field, then press your shortcut to capture it (e.g. `Ctrl+Shift+D`, `Alt+Space`). Any combo your OS recognises will work.
Hotkey mode	Tap (press to toggle) or Hold (push-to-talk — record while held)
Language	ISO code (`en`, `fr`, `de`…) — skips auto-detect, saves ~100 ms. Leave blank for auto.

Settings are saved to the OS app-config directory:

Linux: ~/.config/com.yap.app/
Windows: %APPDATA%\com.yap.app\
macOS: ~/Library/Application Support/com.yap.app/

Build for distribution

pnpm tauri build

Output is in src-tauri/target/release/bundle/:

Platform	Output
Linux	`.deb` + `.AppImage`
Windows	`.msi` + `.exe` (NSIS installer)
macOS	`.dmg` + `.app`

The release profile uses LTO + dead-code stripping (opt-level = "z", strip = true) to keep the binary small.

Cross-compilation notes

Tauri does not support cross-compilation out of the box — build on the target OS. For CI, use:

Linux: Ubuntu 22.04+ runner
Windows: windows-latest runner
macOS: macos-latest runner (arm64) or macos-13 (x86_64)

Cutting a release

A GitHub Actions workflow at .github/workflows/release.yml builds installers for Linux and Windows in parallel and uploads them to a GitHub Release. (macOS builds are currently disabled — build locally if needed.)

To publish a new version:

# 1. Bump the version in package.json AND src-tauri/tauri.conf.json AND src-tauri/Cargo.toml
# 2. Commit and tag
git commit -am "Release v0.2.0"
git tag v0.2.0
git push origin main --tags

The workflow runs automatically on any tag matching v*. It creates a draft release — review the artifacts on GitHub, then click Publish release to make it visible.

You can also trigger the workflow manually via Actions → Release → Run workflow (useful for testing without cutting a real version).

Builds are unsigned. macOS / Windows users will see a Gatekeeper / SmartScreen warning the first time they run the app. To distribute signed builds, add APPLE_* and WINDOWS_CERTIFICATE secrets per the tauri-action docs.

Project structure

yap/
├── index.html                  # Main window entry
├── pill.html                   # Overlay window entry
├── src/
│   ├── App.svelte              # Main UI — settings, hotkey, backend/model selector
│   ├── Pill.svelte             # Floating overlay — listens for yap://state events
│   ├── main.ts                 # Svelte mount for main window
│   ├── pill.ts                 # Svelte mount for pill window
│   └── app.css                 # Global reset
├── src-tauri/
│   ├── src/
│   │   ├── lib.rs              # Tauri builder, tray setup, close-to-tray handler
│   │   └── commands.rs         # All Tauri commands + audio pipeline + transcription
│   ├── Cargo.toml              # Rust deps + release profile
│   ├── tauri.conf.json         # Window config (main + pill), bundle targets
│   └── capabilities/
│       └── default.json        # Tauri 2 permission grants for both windows
├── vite.config.ts              # Multi-page build (main + pill entry points)
├── svelte.config.js
└── package.json

Key dependencies

Crate / Package	Purpose
`tauri 2`	Native window, IPC bridge, system tray
`cpal 0.15`	Cross-platform audio capture (ALSA / PipeWire / WASAPI / CoreAudio)
`hound 3`	Encode raw PCM → WAV (Groq path)
`reqwest 0.12`	HTTP client for Groq API + model downloads
`whisper-rs 0.14`	Rust bindings to whisper.cpp for local offline transcription
`tauri-plugin-global-shortcut`	System-wide configurable hotkey
`tauri-plugin-clipboard-manager`	Write transcript to clipboard
`enigo 0.2`	Keyboard injection on Windows + macOS
`@tauri-apps/api`	JS ↔ Rust IPC + event bus
`svelte 5`	Reactive UI (runes mode)

Audio pipeline detail

Capture — cpal opens the default input device at 16 kHz mono. Falls back to native rate + resamples afterward.
Noise gate — Samples below 0.5% amplitude are zeroed during capture to cut background hiss.
VAD trim — After recording stops, silence is trimmed from both ends (threshold 1%, 150 ms padding).
Resample — Linear interpolation to 16 kHz if the device captured at a different rate.
Transcribe (Groq) — hound writes a 16-bit PCM WAV to /tmp/yap_recording.wav; reqwest POSTs it to Groq's /v1/audio/transcriptions.
Transcribe (Local) — Raw f32 samples are passed directly to whisper-rs (no WAV write). The WhisperContext is cached in memory so the model loads only once per session. Inference runs in a spawn_blocking thread to avoid blocking the async runtime.

Troubleshooting

App doesn't appear after launching It starts in the system tray. Look for the Yap icon in your taskbar/tray area.

Hotkey not registering Another app may have claimed the combo. Open Settings → click the hotkey field → press a different chord (e.g. Ctrl+Shift+D or Alt+F9) → Save. If the chord is rejected, the error from the OS will be shown below the hotkey field.

Hold mode releases too early / cuts off audio Some keyboards repeat-fire modifier releases when other keys are pressed. If push-to-talk feels flaky, switch to Tap mode in Settings → Hotkey, or pick a chord that doesn't share a modifier with keys you're typing.

Pill overlay not showing Make sure the main window is running. Check the terminal for [yap] log lines.

No audio / "No input device" Run arecord -l (Linux) to list capture devices. Ensure PipeWire or PulseAudio is running: systemctl --user status pipewire.

xdotool paste not working (X11)

sudo apt-get install xdotool

Wayland paste not working

sudo apt-get install ydotool
sudo ydotoold &

Build fails: Unable to find libclang

# Linux
sudo apt-get install libclang-dev

# macOS
brew install llvm
export LIBCLANG_PATH="$(brew --prefix llvm)/lib"

# Windows — install LLVM from https://releases.llvm.org and set:
# LIBCLANG_PATH=C:\Program Files\LLVM\bin

Build fails: cmake not found

# Linux
sudo apt-get install cmake

# macOS
brew install cmake

# Windows
winget install Kitware.CMake

Local model download stuck / slow HuggingFace occasionally rate-limits. Click Cancel and retry. Large models (547 MB+) take a few minutes on a typical connection.

Local transcription is slow on first use The model is being loaded from disk into memory — this is a one-time cost per session. Subsequent transcriptions use the cached context and are much faster.

Groq 401 error API key is wrong or expired. Generate a new one at console.groq.com.

Groq 413 error (payload too large) Recording was very long. The free tier accepts up to 25 MB (~25 min at 16 kHz). VAD trimming reduces this automatically for typical short dictations.

Remaining optimisations

Streaming transcription — pipe audio chunks to a WebSocket ASR service; partial transcript visible while speaking
OS keychain for API key — use the keyring crate instead of a plaintext config file
Auto-start on login — tauri-plugin-autostart
Transcript history — store last N transcripts with timestamps via tauri-plugin-store
Configurable VAD threshold — expose noise gate and silence threshold in settings

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Yap

Download

How it works

Prerequisites

All platforms

Linux (Debian / Ubuntu / MX Linux)

Windows

macOS

Setup & run

First launch

Using Groq (cloud, default)

Using Local Whisper (offline)

Usage

Tap vs Hold

Settings

Build for distribution

Cross-compilation notes

Cutting a release

Project structure

Key dependencies

Audio pipeline detail

Troubleshooting

Remaining optimisations

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.github/workflows		.github/workflows
public/favicon_io		public/favicon_io
src-tauri		src-tauri
src		src
web		web
.gitignore		.gitignore
README.md		README.md
index.html		index.html
package.json		package.json
pill.html		pill.html
pnpm-lock.yaml		pnpm-lock.yaml
svelte.config.js		svelte.config.js
tsconfig.json		tsconfig.json
vite.config.ts		vite.config.ts
yap.md		yap.md

Folders and files

Latest commit

History

Repository files navigation

Yap

Download

How it works

Prerequisites

All platforms

Linux (Debian / Ubuntu / MX Linux)

Windows

macOS

Setup & run

First launch

Using Groq (cloud, default)

Using Local Whisper (offline)

Usage

Tap vs Hold

Settings

Build for distribution

Cross-compilation notes

Cutting a release

Project structure

Key dependencies

Audio pipeline detail

Troubleshooting

Remaining optimisations

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages