VoiceTerm is a voice-first terminal overlay for Codex and Claude.
Gemini preset support remains experimental and is currently nonfunctional.
It runs Whisper on your machine and types what you say into your existing CLI.
Your tools still run in a normal PTY; VoiceTerm just adds a HUD on top.
Use push-to-talk or wake phrases (hey codex, hey claude), then say
send / submit for hands-free delivery.
Whisper runs locally by default. No cloud API keys required. Release history: dev/CHANGELOG.md.
- Hands-Free Quick Start
- Install and Start
- Requirements
- Features
- Supported Backends
- IDE Support
- Controls
- Guides Index
- Documentation
- Support
Install one supported AI CLI first:
Codex:
npm install -g @openai/codex
Claude Code:
curl -fsSL https://claude.ai/install.sh | bash
Then choose one VoiceTerm setup path:
Homebrew (recommended)
brew tap jguida941/voiceterm
brew install voiceterm
cd ~/your-project
voiceterm
If needed, authenticate once:
voiceterm --login --codex
voiceterm --login --claude
PyPI (pipx / pip)
pipx install voiceterm
# or: python3 -m pip install --user voiceterm
cd ~/your-project
voiceterm
If needed, authenticate once:
voiceterm --login --codex
voiceterm --login --claude
From source
Requires the Rust toolchain. See the Install Guide for details.
git clone https://github.com/jguida941/voiceterm.git
cd voiceterm
./scripts/install.sh
If you are running from source while developing, run:
python3 dev/scripts/devctl.py check --profile ci
macOS App
Double-click app/macos/VoiceTerm.app, pick a folder, and it opens Terminal
with VoiceTerm running.
For model options and startup/IDE tuning:
VoiceTerm listens to your mic, converts speech to text on your machine, and types the result into your AI CLI input.
- macOS or Linux (Windows needs WSL2)
- Microphone access
- ~0.5 GB disk for the default small model (base is ~142 MB, medium is ~1.5 GB)
| Feature | What it does |
|---|---|
| Local speech-to-text | Whisper runs on your machine (no cloud calls) |
| Fast voice-to-text | Local Whisper turns speech into text quickly |
| Keep your CLI as-is | Your backend CLI layout and behavior stay the same |
| Auto voice mode | Keep listening on so you can talk instead of typing |
| Wake mode + voice send | Say hey codex/hey claude, then say send/submit in insert mode |
| Image prompts | Use Ctrl+X for one-shot screenshot prompts, or enable persistent image mode for HUD [rec] (IMG badge) |
| Transcript queue | If the CLI is busy, VoiceTerm waits and sends text when ready |
| Codex + Claude support | Primary support for Codex and Claude Code |
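The transcript queue row above can be pictured with a small sketch. This is not VoiceTerm's implementation; the `TranscriptQueue` class, its method names, and the busy flag are hypothetical names chosen to illustrate "hold text while the CLI is busy, deliver it in order when ready".

```python
from collections import deque


class TranscriptQueue:
    """Hypothetical sketch: hold transcripts while the CLI is busy, flush in order when ready."""

    def __init__(self, send):
        self._send = send        # callable that writes text into the CLI input
        self._pending = deque()  # transcripts waiting for the CLI to become ready
        self._busy = False

    def set_busy(self, busy):
        """Record whether the wrapped CLI is busy; flush the backlog when it frees up."""
        self._busy = busy
        if not busy:
            self._flush()

    def submit(self, text):
        """Send immediately if the CLI is ready, otherwise queue the transcript."""
        if self._busy:
            self._pending.append(text)
        else:
            self._send(text)

    def _flush(self):
        # Deliver queued transcripts oldest-first.
        while self._pending and not self._busy:
            self._send(self._pending.popleft())


sent = []
queue = TranscriptQueue(sent.append)
queue.set_busy(True)
queue.submit("run the tests")
queue.submit("then summarize failures")
queue.set_busy(False)  # both transcripts are delivered here, in order
```

A FIFO queue is used in the sketch so that spoken prompts arrive in the order they were said.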
- Voice macros: expand phrases from .voiceterm/macros.yaml (toggle in Settings)
- Voice navigation: spoken scroll, send, show last error, copy last error, explain last error
- Dev mode tools: launch with --dev first (look for DEV badge), then use Ctrl+D for Dev panel tools; add --dev-log for JSONL diagnostics
- Prompt-safe HUD: VoiceTerm suppresses HUD rows for high-confidence Codex/Claude approval prompts and fences PTY scrolling above the HUD so the active input row stays visible
- Latency clarity: latency badges show completed-turn STT timing and hide while actively recording/processing
- Transcript history: Ctrl+H to search and replay past text
- Notification history: Ctrl+N to review recent status messages
- Saved settings: stored in ~/.config/voiceterm/config.toml
- Built-in themes: 11 themes including ChatGPT, Catppuccin, Dracula, Nord, Tokyo Night, and Gruvbox
- Style-pack border settings: VOICETERM_STYLE_PACK_JSON supports components.overlay_border and components.hud_border (HUD applies when border mode is theme)
For full behavior details and controls, see guides/USAGE.md.
Important: if you did not launch with --dev, Ctrl+D is forwarded to the
wrapped CLI as EOF (0x04) and can close/exit that CLI session.
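The 0x04 value follows from the ASCII convention that Ctrl plus a letter sends the letter's code with the upper bits cleared. The helper below is only an illustration of that arithmetic (`control_byte` is a name made up for this example); it says nothing about how VoiceTerm itself decides whether to intercept or forward the key.

```python
def control_byte(letter: str) -> int:
    """Return the ASCII control code a terminal sends for Ctrl+<letter>."""
    # Masking with 0x1F keeps the low five bits: 'D' (0x44) becomes 0x04.
    return ord(letter.upper()) & 0x1F


EOT = control_byte("D")
print(hex(EOT))  # prints 0x4
```

Because the wrapped CLI reads this byte as end-of-input, an unintercepted Ctrl+D can end the session.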
Dev panel usage guide: guides/DEV_MODE.md
VoiceTerm is optimized for Codex and Claude Code. For full backend status and setup details, see Usage Guide -> Backend Support.
Use the same workflow and controls documented for backend support in guides/USAGE.md.
Active verified hosts are Cursor terminal and JetBrains terminals. AntiGravity is deferred and not supported in current releases.
| IDE host | Codex | Claude Code | Status |
|---|---|---|---|
| Cursor terminal | Fully supported | Fully supported | Recommended primary host |
| JetBrains terminals (IntelliJ, PyCharm, WebStorm, CLion) | Fully supported | Fully supported | Supported on current release; see troubleshooting for rare host-specific edge cases |
| AntiGravity | Not supported | Not supported | Deferred until runtime fingerprint evidence exists (not supported in current releases) |
| Other IDE terminals | Unverified | Unverified | Treat as experimental until listed here |
JetBrains + Claude rare edge case (long parallel turns): after very long parallel tool calls or parallel web-search turns, HUD/transcript overlap can appear at turn completion. Quick workaround: resize the terminal once (even by 1 row/column) to force layout recalculation. During these high-churn turns, VoiceTerm already applies a single-line full-HUD fallback for JetBrains+Claude to keep controls reachable while redraw settles. Details: Troubleshooting -> JetBrains + Claude overlay overlap after long parallel output.
Canonical matrix: Usage Guide -> IDE Compatibility.
voiceterm --auto-voice --wake-word --voice-send-mode insert
Think of this like Alexa for your terminal:
- Say the wake phrase (hey codex or hey claude)
- Speak your prompt
- Say send (or submit)
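The three steps above behave like a small state machine: idle until a wake phrase, then staging speech, then delivering on a send word. The sketch below is a rough model under that assumption; `HandsFreeSession` and its fields are hypothetical, and details such as returning to idle after delivery are guesses, not documented VoiceTerm behavior.

```python
WAKE_PHRASES = ("hey codex", "hey claude")
SEND_WORDS = ("send", "submit")


class HandsFreeSession:
    """Hypothetical model of the wake phrase -> prompt -> send flow."""

    def __init__(self):
        self.awake = False   # True once a wake phrase has been heard
        self.draft = []      # utterances staged in the CLI input
        self.delivered = []  # prompts that were submitted

    def hear(self, utterance):
        text = utterance.strip().lower()
        if not self.awake:
            # Everything except a wake phrase is ignored while idle.
            if text in WAKE_PHRASES:
                self.awake = True
            return
        if text in SEND_WORDS:
            # Deliver the staged prompt (if any) and go back to idle (assumed).
            if self.draft:
                self.delivered.append(" ".join(self.draft))
                self.draft = []
            self.awake = False
            return
        self.draft.append(utterance.strip())


session = HandsFreeSession()
for spoken in ["background chatter", "hey codex", "list the failing tests", "send"]:
    session.hear(spoken)
```

After the loop, `session.delivered` holds the single prompt "list the failing tests" and the chatter before the wake phrase has been dropped.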
Press Ctrl+Y to open Theme Studio and choose Theme picker.
Use Ctrl+G to cycle themes quickly.
Use Tab / Shift+Tab to move between Theme Studio pages (Home, Colors,
Borders, Components, Preview, Export).
For editor details, see Themes.
For theme-file flags/env vars, see CLI Flags.
Mouse control is enabled by default. Open Settings with Ctrl+O.
Cursor note: when Mouse is ON, wheel/touchpad scrolling may not move chat
history, but the scrollbar can still be dragged. If you prefer touchpad/wheel
scrolling, set Mouse to OFF and use keyboard focus (Tab/arrows) + Enter
for HUD buttons.
For details, use:
Use Ctrl+H to open transcript history, type to filter, and press Enter to
replay into the active CLI input. Mouse click selection is also supported.
History rows are labeled by source (mic, you, ai); only mic and you
rows are replayable, and ai rows are output-only.
Detailed behavior: Transcript History.
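As a rough illustration of the filter-and-replay rule, the sketch below models history rows with a source label and allows replay only for mic and you rows. The `HistoryRow` type and both helper functions are hypothetical, and the case-insensitive substring match is an assumption about how type-to-filter might work.

```python
from dataclasses import dataclass

# Per the text above, only these sources can be replayed; ai rows are output-only.
REPLAYABLE_SOURCES = {"mic", "you"}


@dataclass
class HistoryRow:
    source: str  # "mic", "you", or "ai"
    text: str


def filter_rows(rows, query):
    """Keep rows whose text contains the query (case-insensitive; assumed matching rule)."""
    needle = query.lower()
    return [row for row in rows if needle in row.text.lower()]


def can_replay(row):
    """Return True if the row may be replayed into the active CLI input."""
    return row.source in REPLAYABLE_SOURCES


history = [
    HistoryRow("mic", "run the test suite"),
    HistoryRow("ai", "The test suite passed."),
    HistoryRow("you", "open the config file"),
]
matches = filter_rows(history, "test")
replayable = [row.text for row in matches if can_replay(row)]
```

Here `matches` contains the mic and ai rows, while `replayable` keeps only the mic row.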
Press ? to open grouped shortcuts (Recording, Mode, Appearance,
Sensitivity, Navigation) with clickable Docs/Troubleshooting links on
terminals that support clickable links. Details: Core Controls.
For shortcuts and behavior, see:
For CLI flags and command-line options:
- voiceterm --help (or voiceterm -h)
- CLI Flags
Voice macros are project-local shortcuts in .voiceterm/macros.yaml.
Turn macros on in Settings when you want phrase expansion.
Setup and examples: Project Voice Macros.
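Conceptually, phrase expansion maps a spoken phrase to the text that should be typed. The sketch below shows that idea with a plain dictionary; it does not reflect the actual macros.yaml schema or VoiceTerm's matching rules, and the `expand_macro` function, the example phrases, and the exact-match behavior are all assumptions for illustration.

```python
def expand_macro(transcript, macros, enabled=True):
    """Return the macro expansion for a transcript, or the transcript unchanged.

    `macros` maps lowercase spoken phrases to replacement text (hypothetical shape).
    `enabled` stands in for the Settings toggle mentioned above.
    """
    if not enabled:
        return transcript
    key = transcript.strip().lower()
    return macros.get(key, transcript)


# Hypothetical project-local macros, not taken from the VoiceTerm docs.
example_macros = {
    "run tests": "cargo test --all",
    "show status": "git status --short",
}

expanded = expand_macro("Run tests", example_macros)
untouched = expand_macro("refactor the parser", example_macros)
disabled = expand_macro("run tests", example_macros, enabled=False)
```

With macros enabled, "Run tests" expands to the mapped command, while unmatched speech and the disabled case pass through unchanged.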
| Audience | Document |
|---|---|
| User | Quick Start |
| User | Guides Index |
| User | Install Guide |
| User | Usage Guide |
| User | CLI Flags |
| User | Troubleshooting |
| Developer | Developer Index |
| Developer | Project Integrations Playbook |
| Developer | Engineering History |
- Troubleshooting: guides/TROUBLESHOOTING.md
- Bug reports and feature requests: GitHub Issues
- Security concerns: .github/SECURITY.md
PRs welcome. See CONTRIBUTING.md. Before opening a PR, run:
python3 dev/scripts/devctl.py check --profile prepush
python3 dev/scripts/devctl.py hygiene
MIT - LICENSE




