A Mac-native voice-to-text application with high-accuracy transcription powered by OpenAI Whisper.
- Menu Bar Application: Resides in the menu bar for quick access
- Push-to-Talk Hotkey: Hold your configured hotkey to record, release to transcribe (default: ⌘+⌥)
- Recording Status Indicator: The indicator clearly shows recording/processing/warning states
- Session-Safe Finalization: If you start another recording before a previous one finishes, each recording finalizes independently in stop order (no mixed chunks)
- High-Accuracy Transcription: Powered by OpenAI Whisper with optimized settings
- Automatic Text Input: Automatically types transcribed text at the cursor position
- Audio Preprocessing: Noise reduction with spectral subtraction and normalization
- History Management: Access past transcriptions from the history menu
- Audio File Import: Import wav/mp3 files for transcription
- Auto-Launch: Toggle auto-start at login
- First-Time Setup: Comprehensive setup wizard for permissions and dependencies
- Open Source: Fully open-source under MIT License
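The Audio Preprocessing bullet above mentions normalization, and this README later documents the KOTOTYPE_AUTO_GAIN_* environment variables. The following is a minimal Python sketch of how such settings could interact. Only the variable names and the example values (-18, -10, 18) come from this README; the gain rule and the helper names (env_float, peak_dbfs, auto_gain_db, apply_gain) are assumptions for illustration, not KotoType's actual implementation.

```python
import math
import os


def env_float(name: str, fallback: float) -> float:
    """Read a float from the environment, using `fallback` if unset or invalid."""
    try:
        return float(os.environ[name])
    except (KeyError, ValueError):
        return fallback


def peak_dbfs(samples: list[float]) -> float:
    """Peak level in dBFS for samples in [-1.0, 1.0]; -inf for pure silence."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def auto_gain_db(samples: list[float]) -> float:
    """Hypothetical rule: boost only weak audio toward the target peak, capped."""
    if os.environ.get("KOTOTYPE_AUTO_GAIN_ENABLED", "1") == "0":
        return 0.0
    weak = env_float("KOTOTYPE_AUTO_GAIN_WEAK_THRESHOLD_DBFS", -18.0)
    target = env_float("KOTOTYPE_AUTO_GAIN_TARGET_PEAK_DBFS", -10.0)
    max_db = env_float("KOTOTYPE_AUTO_GAIN_MAX_DB", 18.0)
    peak = peak_dbfs(samples)
    if peak == float("-inf") or peak >= weak:
        return 0.0  # silence, or already loud enough: leave untouched
    return min(target - peak, max_db)


def apply_gain(samples: list[float], gain_db: float) -> list[float]:
    """Scale samples by a gain in dB, clipping to the valid range."""
    factor = 10.0 ** (gain_db / 20.0)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]
```

Under this assumed rule, audio whose peak is already at or above the weak threshold is left unchanged, and quieter audio is raised toward the target peak without exceeding the maximum gain.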
- Install KotoType and FFmpeg (brew install ffmpeg)
- Complete all checks in the Initial Setup window
- Click a text field, hold hotkey to record, release to transcribe
- Download the latest KotoType.dmg
- Double-click the downloaded DMG file
- Drag KotoType.app to your Applications folder
- On first launch, click "Open" when prompted by the security warning
Update note: After upgrading KotoType to a newer version, macOS permissions must be granted again. Re-open KotoType and re-allow Accessibility, Microphone, and Screen Recording in System Settings > Privacy & Security if prompted.
KotoType validates ffmpeg during the initial setup and cannot proceed without it.
# Homebrew (recommended)
brew install ffmpeg
# Confirm availability
ffmpeg -version
Alternative package manager:
# MacPorts
sudo port install ffmpeg
- macOS 13.0 or later
- Xcode 15.0 or later
- Python 3.13
- uv package manager
- Clone the repository
git clone https://github.com/ymuichiro/koto-type.git
cd koto-type
- Install dependencies (including dev dependencies)
make install-deps
- Build the application (Python + Swift)
make build-all
- Create the .app bundle
cd KotoType
./scripts/create_app.sh
- (Optional) Create the .dmg disk image
./scripts/create_dmg.sh
All operations can be executed through the Makefile. To see available commands:
make help
- make run-app - Launch the Swift application
- make run-server - Start the Python server (for testing)
- make test-transcription - Run audio preprocessing and transcription-related unit tests
- make test-user-dictionary - Run user dictionary unit tests
- make test-smoke-server - Run packaged server smoke tests
- make test-benchmark - Legacy alias for make test-smoke-server
- make test-all - Run all tests
- make build-server - Build Python server binary (PyInstaller)
- make build-app - Build Swift application
- make build-all - Build both Python and Swift
- make install-deps - Install Python dependencies (including dev)
- make clean - Remove temporary files
- make view-log - View server logs
- make capture-artifacts - Collect crash-investigation artifacts into artifacts/runtime/
When validating production-like behavior (especially recording start/stop), preserve evidence before and after each run:
# 1) Capture baseline
make capture-artifacts
# 2) Execute scenario (launch app, start recording, stop recording)
# 3) Capture post-run evidence
make capture-artifacts
If the machine crashes/reboots, run make capture-artifacts immediately after login.
Then append a summary entry to:
- docs/testing/test-ledger.md
- docs/operations/restart-storm-runbook.md
If installed from DMG:
- Launch KotoType from Launchpad
- Or run:
open /Applications/KotoType.app
If built from source:
# Using Makefile (recommended)
make run-app
# Or run directly
cd KotoType
swift run
On first launch, the "Initial Setup" screen will appear to verify the following:
- Accessibility Permissions (for keyboard input simulation)
- Microphone Permissions (for recording)
- Screen Recording Permissions (for screen context support)
- FFmpeg Command Availability
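The last item in the list above, FFmpeg command availability, can be approximated from a script before launching the app. This README does not describe how KotoType performs the check internally, so the following Python sketch is only an assumption that mirrors the documented manual check (ffmpeg -version); the function name ffmpeg_available is hypothetical.

```python
import shutil
import subprocess


def ffmpeg_available(command: str = "ffmpeg") -> bool:
    """Return True if `command` is on PATH and `command -version` exits with 0."""
    path = shutil.which(command)
    if path is None:
        return False  # not installed, or not on PATH
    try:
        result = subprocess.run(
            [path, "-version"],
            capture_output=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("ffmpeg found" if ffmpeg_available() else "ffmpeg missing: brew install ffmpeg")
```

Looking the command up on PATH first keeps the failure case cheap and avoids spawning a process when FFmpeg is simply not installed.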
- Launch KotoType. The Initial Setup window opens automatically.
- Click Grant Accessibility.
- In System Settings > Privacy & Security > Accessibility, enable KotoType.
- Return to KotoType and click Re-check.
- If Accessibility does not switch to passed, wait a few seconds and click Re-check again. If still not reflected, click Restart App in the setup window and reopen.
- Click Grant Microphone, then choose Allow when macOS asks for permission.
- Click Grant Screen Recording, enable KotoType in System Settings > Privacy & Security > Screen Recording, then return to the app.
- If FFmpeg is not detected, install it:
brew install ffmpeg
ffmpeg -version
- Click Re-check until all required items become passed.
- Click Finish setup and start to enter normal menu bar mode.
- Click the text field where you want output.
- Hold your hotkey while speaking.
- Release the hotkey and wait for automatic text insertion.
If you update KotoType, macOS permissions need to be assigned again for the new app version.
- Reopen KotoType after the update finishes.
- Re-check Accessibility, Microphone, and Screen Recording in System Settings > Privacy & Security.
- If KotoType shows the setup window again, complete the permission checks before using dictation.
- App blocked and won’t open: System Settings > Privacy & Security > Open Anyway
- Hotkey does not start recording: Re-enable KotoType in Accessibility
- Recorded but text not inserted: Confirm editable cursor focus + Accessibility permission
- FFmpeg check failed: run brew install ffmpeg, then ffmpeg -version, then reopen the app
If the FFmpeg check fails, install or reinstall FFmpeg and restart KotoType:
brew install ffmpeg
ffmpeg -version
Note: Due to licensing considerations, FFmpeg is not bundled with the distribution. You must have ffmpeg installed on your system (e.g., brew install ffmpeg).
The distributed .app/.dmg includes the whisper_server binary, so Python and uv are not required for end users. During development only, if the bundled binary is not found, the app falls back to uv run/.venv execution.
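The fallback described above can be pictured as a small lookup. The real resolution logic lives in the Swift app and is not shown in this README, so this Python sketch is an illustration under assumptions: the function name resolve_server_command, the order of the two development fallbacks (.venv before uv run), and the exact uv command line are all hypothetical. The paths python/whisper_server.py and whisper_server come from the project layout and packaging notes in this README.

```python
from pathlib import Path


def resolve_server_command(resources_dir: Path, repo_root: Path) -> list[str]:
    """Pick a command line for starting the transcription server.

    Assumed order: bundled binary, then the repo's .venv interpreter, then `uv run`.
    """
    bundled = resources_dir / "whisper_server"
    if bundled.is_file():
        return [str(bundled)]  # distributed builds: no Python or uv needed

    script = repo_root / "python" / "whisper_server.py"
    venv_python = repo_root / ".venv" / "bin" / "python"
    if venv_python.is_file():
        return [str(venv_python), str(script)]

    # Last resort during development: let uv provide the environment.
    return ["uv", "run", "python", str(script)]
```

Checking the bundled binary first means a distributed build never touches the development toolchain, which matches the statement that end users do not need Python or uv.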
- Confirm setup is complete and the KotoType icon is visible in the menu bar.
- Open the app where you want text input (Notes, Slack, browser form, etc.).
- Click the target text field so the cursor is active.
- Hold the configured hotkey (default: ⌘+⌥) to begin recording.
- Keep holding the hotkey while speaking.
- Release the hotkey to stop recording and trigger finalization.
- After you release the hotkey, KotoType finalizes transcription and inserts only finalized text.
- The live-dictation processing timer starts only after you release the hotkey. It covers queue wait, transcription, and final text insertion. Time while you are still holding the hotkey and recording is not counted.
- The default post-recording finalize timeout is 10 minutes, and values above 10 minutes are not available. You can change it in "Settings... > Batch > Post-recording finalize timeout".
- If you start another recording before a previous finalize completes, KotoType queues both and finalizes each recording independently in stop order.
- If no text is inserted, re-check Accessibility permission and confirm the cursor is focused in an editable field.
- (Optional) Change hotkey: "Settings... > Hotkey"
- (Optional) Transcribe files: "Import Audio File..." (wav/mp3)
- (Optional) Review past results: "History..."
- Quit app: "Quit" or Cmd+Q
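The stop-order behavior described in the steps above (overlapping recordings are queued and each one is finalized independently, in the order the recordings were stopped) can be sketched as a small ordering buffer. This is an illustrative Python model, not KotoType's code: the class StopOrderFinalizer, its method names, and the session ids are hypothetical.

```python
from collections import deque
from typing import Callable


class StopOrderFinalizer:
    """Emit finished transcriptions in the order their recordings were stopped."""

    def __init__(self, insert_text: Callable[[str], None]) -> None:
        self._insert_text = insert_text
        self._pending: deque[str] = deque()  # session ids, in stop order
        self._results: dict[str, str] = {}   # finished text per session id

    def recording_stopped(self, session_id: str) -> None:
        """Register a session at the moment its hotkey is released."""
        self._pending.append(session_id)

    def transcription_finished(self, session_id: str, text: str) -> None:
        """Store a result, then flush every session at the head that is ready."""
        self._results[session_id] = text
        while self._pending and self._pending[0] in self._results:
            head = self._pending.popleft()
            self._insert_text(self._results.pop(head))


if __name__ == "__main__":
    inserted: list[str] = []
    finalizer = StopOrderFinalizer(inserted.append)
    finalizer.recording_stopped("first")
    finalizer.recording_stopped("second")
    finalizer.transcription_finished("second", "world")  # held back for now
    finalizer.transcription_finished("first", "hello")   # releases both
    print(inserted)  # ['hello', 'world']
```

Keeping results keyed by session is what prevents text from two recordings from being mixed: a later recording that finishes early simply waits until every earlier one has been inserted.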
This app is currently distributed without Apple Developer ID signing or notarization, so you may see a Gatekeeper warning on first launch.
Use the steps for your macOS version when KotoType is blocked.
| macOS version | What to do |
|---|---|
| 26 | 1) Try to open KotoType once from Finder. 2) Open System Settings > Privacy & Security. 3) In Security, click Open Anyway and confirm. |
| 15 / 14 | 1) Try to open KotoType once from Finder. 2) Open System Settings > Privacy & Security. 3) In Security, click Open Anyway and confirm. |
Apple references:
Open Anyway is a Gatekeeper setting in Privacy & Security. Firewall settings are not part of this launch flow in Apple’s documented steps.
Method 1: Right-click and "Open"
- Right-click or Ctrl+click on the app
- Select "Open"
Method 2: Allow in Privacy & Security
- Go to System Settings, then open Privacy & Security
- Click "Open Anyway"
This is normal behavior for apps that are not Developer ID signed and notarized. After approval, the app will launch normally.
koto-type/
├── python/
│ └── whisper_server.py # Whisper server
├── tests/
│ └── python/ # Python tests
│ ├── smoke_whisper_server_binary.py
│ ├── test_audio_preprocess.py
│ └── test_user_dictionary.py
├── KotoType/ # Swift app
│ ├── Sources/KotoType/
│ │ ├── App/ # Entry point and path resolution
│ │ ├── Audio/ # Recording
│ │ ├── Input/ # Hotkey/input handling
│ │ ├── Transcription/ # Python process communication & batch control
│ │ ├── UI/ # Menu bar/settings UI
│ │ └── Support/ # Logger/settings/permissions
│ ├── Package.swift
│ └── scripts/
│ ├── create_app.sh # App creation script
│ └── create_dmg.sh # DMG creation script
├── pyproject.toml # Python dependencies
├── LICENSE # MIT License
└── README.md # This file
# Transcription tests
make test-transcription
# Packaged server smoke tests
make test-smoke-server
# Compatibility alias for the packaged server smoke test
make test-benchmark
# Run all tests
make test-all
# Audio preprocessing / transcription-related tests
uv run python tests/python/test_audio_preprocess.py
# User dictionary tests
uv run python tests/python/test_user_dictionary.py
# Packaged server smoke test (after `make build-server`)
uv run python tests/python/smoke_whisper_server_binary.py dist/whisper_server
# Using Makefile
make view-log
# Or directly
tail -100 ~/Library/Application\ Support/koto-type/server.log
Noise reduction is enabled by default in audio preprocessing. To disable it for compatibility reasons:
export KOTOTYPE_ENABLE_NOISE_REDUCTION=0
Auto gain is enabled by default and automatically amplifies quiet audio before transcription.
export KOTOTYPE_AUTO_GAIN_ENABLED=1
Adjust threshold and amplification limits as needed:
export KOTOTYPE_AUTO_GAIN_WEAK_THRESHOLD_DBFS=-18
export KOTOTYPE_AUTO_GAIN_TARGET_PEAK_DBFS=-10
export KOTOTYPE_AUTO_GAIN_MAX_DB=18
By default, VAD is set slightly stricter for noisy environments. To revert to traditional settings:
export KOTOTYPE_VAD_STRICT=0
# Type checking (ty)
.venv/bin/ty check python/
# Linting (ruff)
.venv/bin/ruff check python tests/python
# Formatting
.venv/bin/ruff format python tests/python
# Using Makefile (recommended)
make build-all # Build both Python and Swift
make build-server # Build Python server binary only
make build-app # Build Swift app only
# Or run directly
cd KotoType
swift build
# Using Makefile (recommended)
make run-app
# Or run directly
cd KotoType
swift run
# Using Makefile (recommended)
make install-deps
# Or run directly
uv sync --extra dev
# Using Makefile
make clean
# Complete build process
make install-deps # Install dependencies
make build-all # Build Python + Swift
cd KotoType
./scripts/create_app.sh # Create .app bundle
./scripts/create_update_zip.sh # Create Sparkle update .zip
./scripts/create_dmg.sh # Create .dmg disk image (optional)
- Pushing to the main branch triggers GitHub Actions to create a tag in the format v<VERSION>.<run_number>
- Tag push triggers .github/workflows/release.yml to build .dmg/.zip/appcast.xml
- Generated artifacts are automatically attached to the GitHub Release
- appcast.xml must be regenerated for every release (Sparkle update feed)
- Required GitHub secrets for Sparkle signing:
  - SPARKLE_PUBLIC_ED_KEY (embedded in app bundle)
  - SPARKLE_PRIVATE_ED_KEY (used only in CI for appcast/update signing)
- Release DMG does not include FFmpeg, so the initial setup requires system ffmpeg
Update the VERSION file to change the version used in future tags and distribution builds.
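To make the tag format v<VERSION>.<run_number> concrete, here is a small Python sketch that composes such a tag from the contents of the VERSION file and a CI run number. The actual tagging happens in GitHub Actions and its implementation is not shown in this README; the function names release_tag and release_tag_from_file and the validation rules are assumptions for illustration.

```python
from pathlib import Path


def release_tag(version: str, run_number: int) -> str:
    """Build a tag of the form v<VERSION>.<run_number>."""
    cleaned = version.strip()  # VERSION files usually end with a newline
    if not cleaned:
        raise ValueError("VERSION is empty")
    if run_number < 0:
        raise ValueError("run_number must be non-negative")
    return f"v{cleaned}.{run_number}"


def release_tag_from_file(version_file: Path, run_number: int) -> str:
    """Read the version string from a file such as VERSION and build the tag."""
    return release_tag(version_file.read_text(encoding="utf-8"), run_number)
```

For example, a VERSION file containing 1.2.3 combined with run number 45 yields the tag v1.2.3.45.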
The Python server is packaged as a single executable using PyInstaller:
- Command: uv run --extra dev pyinstaller --onefile --name whisper_server ...
- Output: dist/whisper_server
- Embedded at: .app/Contents/Resources/whisper_server
- C Extensions: faster-whisper and ctranslate2 are automatically collected
Allow microphone access on first launch when prompted.
The Whisper model (large-v3-turbo) is downloaded on first launch. Expect a several-GB initial download.
Enable KotoType in System Settings > Privacy & Security > Accessibility, and confirm you are holding the configured hotkey (default: ⌘+⌥).
When creating distribution .app/.dmg, run make build-server to create dist/whisper_server before running ./scripts/create_app.sh.
Ensure dev dependencies are properly installed:
make install-deps
Release binaries are available on the Releases page.
We welcome contributions from the community! Please see CONTRIBUTING.md for guidelines on how to contribute.
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/koto-type.git
cd koto-type
# Install dependencies
make install-deps
# Run tests
make test-all
# Start development
make run-app
See CONTRIBUTING.md for detailed information on:
- Development environment setup
- Code style guidelines
- Testing requirements
- Pull request process
- Reporting bugs and feature requests
MIT License © 2025 KotoType Contributors
- CHANGELOG.md: Version history and release notes
- DESIGN.md: Technical design documentation
- SECURITY.md: Security policy and reporting
- SUPPORT.md: Getting help and troubleshooting
- docs/operations/sparkle-release-runbook.md: Sparkle release and key-management runbook
- OpenAI Whisper for the speech recognition model
- faster-whisper for the optimized Whisper implementation
- The open-source community for various tools and libraries
Made with ❤️ by KotoType Contributors
- Microphone Permission: Must be granted in System Settings
- Accessibility Permission: Required for hotkeys and keyboard simulation
- Whisper Model: large-v3-turbo is downloaded on first launch (several GB)
- Multiple transcription language options
- Customizable keyboard shortcuts
- Manual update checks (Sparkle + appcast feed)
- Automatic update checks and installation
- Enhanced settings UI
