A voice-controlled desktop application for Windows, macOS, and Linux — open apps, close browser tabs, run scripts, and ask AI questions, all by speaking
"The best interface is no interface."
Desktop launchers have existed for decades. Every major operating system ships one. Not one of them solves the problem correctly. Spotlight requires typing. PowerToys Run requires typing. Albert and Rofi require typing. They reduce mouse dependency but do not eliminate it, and none of them can close a specific browser tab, execute a developer script, or answer a question — all of which are things developers do dozens of times a day.
Universal AI Launcher is a voice-controlled desktop application built natively for Windows, macOS, and Linux. The user presses a single hotkey, speaks a command, and the application executes it — opening applications, closing individual browser tabs without touching the browser itself, running custom scripts, or querying an AI assistant — then confirms the result aloud. The entire critical path works completely offline. AI features are additive, not required.
This is not an Electron application wrapped in a shell. It is a native Python application compiled to a standalone executable for each platform, with no runtime dependencies required from the end user.
Three decisions separate this project from every other launcher in the space. First, the entire interaction model is voice-first — not voice-added. Every component was designed for spoken commands from the ground up. Second, the application is tab-aware: saying "close YouTube" closes the YouTube tab, not the browser. This requires connecting to the browser's internal debug protocol, and no other open-source launcher implements it. Third, each platform — Windows, macOS, Linux — has its own dedicated codebase using the correct native mechanisms for that platform. Nothing is cross-platform by assumption.
Universal AI Launcher is built for developers. Specifically, developers who spend their working day switching between tools, running scripts, and managing browser sessions — and who want to do all of it without touching a mouse or breaking their flow.
Current status: This repository is the complete theoretical and architectural foundation of Universal AI Launcher. The application has not yet been built. Every design decision, module responsibility, platform-specific implementation detail, and failure mode is fully documented here. When the application is built, it will be published as a versioned GitHub Release attached to this repository.
Universal AI Launcher ships in two versions.
Version 1 — Core Launcher is a completely offline, voice-controlled desktop launcher. It opens and closes applications, manages browser tabs, and provides spoken feedback — with no AI, no external APIs, and no network dependency of any kind.
Version 2 — Developer Edition inherits everything from Version 1 and adds GPT-4o for AI question answering, Sarvam AI for human-quality voice responses, custom script execution bound to voice commands, multi-app command groups, and a full settings panel with encrypted API key storage.
| Feature | V1 Core | V2 Developer |
|---|---|---|
| Global hotkey activation | ✅ | ✅ |
| Offline voice recognition (Vosk) | ✅ | ✅ |
| Open installed applications | ✅ | ✅ |
| Close running applications | ✅ | ✅ |
| Open websites in new browser tab | ✅ | ✅ |
| Close specific browser tabs | ✅ | ✅ |
| Robotic TTS voice feedback (pyttsx3) | ✅ | ✅ |
| Visual waveform during listening | ✅ | ✅ |
| GPT-4o AI assistant | ❌ | ✅ |
| Sarvam AI human voice output | ❌ | ✅ |
| Custom script upload (.py and .bat) | ❌ | ✅ |
| Voice command groups | ❌ | ✅ |
| Settings and customisation panel | ❌ | ✅ |
| Encrypted API key storage | ❌ | ✅ |
| Configurable hotkey | ❌ | ✅ |
The user presses the global hotkey — Ctrl+Space on Windows and Linux, Cmd+Shift+Space on macOS. The launcher window appears at the centre of the screen and the microphone activates. The user speaks a command. Vosk transcribes it offline. The Intent Parser classifies it as one of four types — OPEN, CLOSE, RUN, or ASK — and routes it to the correct handler. The result is spoken aloud and the window closes after three seconds.
Figure 1 — The complete command pipeline from hotkey activation to spoken confirmation, showing all four intent paths and the shared input layer.
The four command types work as follows. An OPEN command first checks whether the target is a website (by domain pattern) or an installed application (by name match), then either opens a new browser tab or launches the application — focusing it instead if it is already running. A CLOSE command checks whether the target is an application or a website tab, then either terminates the process or closes matching tabs via the Chrome DevTools Protocol without touching the browser itself. A RUN command (Developer Edition) looks up the spoken phrase in commands.json, executes the bound script silently in the background, and speaks any captured output. An ASK command (Developer Edition) sends the question to GPT-4o — with real system data injected for PC-related questions — and speaks the response in Sarvam AI's human voice.
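The classification step above can be sketched in a few lines. This is a minimal illustration of a rule-based parser, not the actual `shared/intentParser.py` module — the keyword sets, the domain heuristic, and the function names are assumptions for this example:

```python
import re

# Illustrative keyword sets — the real parser's vocabulary lives in
# shared/intentParser.py per the architecture docs.
OPEN_WORDS = {"open", "launch", "start"}
CLOSE_WORDS = {"close", "quit", "kill"}
RUN_WORDS = {"run", "execute"}
ASK_WORDS = {"ask", "what", "how", "why", "who"}

# Crude check for targets that look like web domains.
DOMAIN_PATTERN = re.compile(r"\b\w+\.(com|org|net|io|dev)\b")

def parse_intent(text: str) -> tuple[str, str]:
    """Classify a spoken command as (intent, target)."""
    words = text.lower().strip().split()
    if not words:
        return ("UNKNOWN", "")
    head, rest = words[0], " ".join(words[1:])
    if head in OPEN_WORDS:
        return ("OPEN", rest)
    if head in CLOSE_WORDS:
        return ("CLOSE", rest)
    if head in RUN_WORDS:
        return ("RUN", rest)
    if head in ASK_WORDS:
        # ASK keeps the full sentence — it is the question itself.
        return ("ASK", text.lower().strip())
    return ("UNKNOWN", text.lower().strip())

def is_website(target: str) -> bool:
    """Heuristic used by OPEN/CLOSE to pick the browser path vs. the app path."""
    return bool(DOMAIN_PATTERN.search(target))
```

An OPEN or CLOSE result would then be routed through `is_website` to decide between the browser-tab handler and the application handler, exactly as the flow above describes.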
Each platform has its own source directory. The shared module contains voice engine, intent parser, and audio output logic that is identical across all three. Platform-specific modules handle app scanning, process management, window focus, hotkey registration, and browser control using the correct native mechanisms for each operating system.
| Capability | Windows | macOS | Linux |
|---|---|---|---|
| App scanning | Registry + Start Menu `.lnk` files | `/Applications` + Spotlight | `.desktop` files + Flatpak + Snap |
| Launch application | `subprocess` | `open -a` command | `subprocess` |
| Focus running app | `win32gui` | AppleScript | `xdotool` |
| Close application | `taskkill` | `osascript` quit + `pkill` | `pkill` |
| Browser tab control | CDP via `pychrome` | CDP via `pychrome` | CDP via `pychrome` |
| Global hotkey | `keyboard` library | `pynput` | `pynput` (X11) / `ydotool` (Wayland) |
| TTS voice engine | SAPI5 via `pyttsx3` | `say` command via `pyttsx3` | `espeak-ng` via `pyttsx3` |
| Packaged output | `.exe` | `.app` | `.AppImage` |
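The tab-control row is the same on every platform because it targets Chrome's DevTools Protocol rather than the OS. As an illustration of how "close YouTube" can close one tab, the sketch below talks to the protocol's plain HTTP endpoints directly (the spec proposes `pychrome`; this stdlib-only version assumes the browser was launched with `--remote-debugging-port=9222`):

```python
import json
import urllib.request

DEBUG_URL = "http://127.0.0.1:9222"  # assumes --remote-debugging-port=9222

def match_tabs(tabs: list[dict], target: str) -> list[dict]:
    """Return the open page tabs whose URL or title mentions the spoken target."""
    target = target.lower()
    return [
        t for t in tabs
        if t.get("type") == "page"
        and (target in t.get("url", "").lower() or target in t.get("title", "").lower())
    ]

def close_matching_tabs(target: str) -> int:
    """Close every tab matching `target` without touching the browser process."""
    # GET /json lists all open targets with their id, url, and title.
    with urllib.request.urlopen(f"{DEBUG_URL}/json") as resp:
        tabs = json.load(resp)
    matches = match_tabs(tabs, target)
    for tab in matches:
        # GET /json/close/<targetId> closes exactly one tab.
        urllib.request.urlopen(f"{DEBUG_URL}/json/close/{tab['id']}").read()
    return len(matches)
```

Keeping `match_tabs` pure means the matching logic can be tested without a running browser; only `close_matching_tabs` needs the live debug port.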
| Layer | Technology | Purpose |
|---|---|---|
| Voice input | Vosk | Offline speech recognition — no internet required |
| Intent classification | Custom rule-based parser | OPEN / CLOSE / RUN / ASK detection |
| App discovery | `winreg`, `plistlib`, `configparser` | Platform-native app scanning |
| Process management | `psutil` | Cross-platform process detection |
| App launch and close | `subprocess`, `win32gui`, AppleScript, `xdotool` | Platform-appropriate app control |
| Browser tab control | `pychrome` (Chrome DevTools Protocol) | Tab-level browser control |
| Hotkey registration | `keyboard` (Windows), `pynput` (macOS/Linux), `ydotool` (Wayland) | Global keyboard shortcut |
| Voice output (V1) | `pyttsx3` | Offline robotic TTS feedback |
| AI assistant (V2) | OpenAI GPT-4o | Natural language question answering |
| Human voice (V2) | Sarvam AI TTS API | Human-quality voice responses |
| API key storage (V2) | `cryptography` (Fernet) + `keyring` | Encrypted key storage in OS keychain |
| UI | PyQt6 | Native cross-platform desktop UI |
| Packaging | PyInstaller | Single executable for each platform |
```
universal-ai-launcher/
│
├── README.md                          # This file
├── CHANGELOG.md                       # Version history
├── LICENSE                            # MIT License
│
├── docs/
│   ├── 01-overview.md                 # Project overview and version comparison
│   ├── 02-architecture.md             # Full system architecture and module specs
│   ├── 03-version-roadmap.md          # Staged delivery plan with validation criteria
│   ├── 04-how-it-works.md             # Complete operational flow for all command types
│   ├── 05-windows.md                  # Windows-specific implementation
│   ├── 06-macos.md                    # macOS-specific implementation
│   ├── 07-linux.md                    # Linux-specific implementation
│   ├── 08-ai-layer.md                 # Developer Edition AI features
│   ├── 09-developer-edition.md        # Scripts, command groups, and settings panel
│   └── 10-fallbacks-and-solutions.md  # All 20 failure modes with exact responses
│
├── shared/
│   ├── voiceEngine.py                 # Vosk base class
│   ├── intentParser.py                # Intent classification logic
│   └── audioOutput.py                 # Audio queue management
│
├── windows/src/                       # Windows implementation
├── macos/src/                         # macOS implementation
├── linux/src/                         # Linux implementation
│
└── assets/
    └── flowcharts/                    # Architecture and flow diagrams
```
The complete specification is in the docs/ directory. The recommended reading order for a new contributor is: overview → architecture → how it works → your target platform → fallbacks. The version roadmap and developer edition documents are relevant when extending the application beyond Version 1.
| Document | What It Covers |
|---|---|
| `01-overview.md` | What the project is, who it is for, version comparison |
| `02-architecture.md` | Three-layer architecture, all module specifications, threading model |
| `03-version-roadmap.md` | Nine-stage delivery plan with validation criteria per stage |
| `04-how-it-works.md` | Full command flow trace for OPEN, CLOSE, RUN, and ASK |
| `05-windows.md` | Registry scanning, win32gui, taskkill, CDP, PyInstaller on Windows |
| `06-macos.md` | Permissions, plistlib, AppleScript, pynput, Gatekeeper on macOS |
| `07-linux.md` | `.desktop` parsing, Wayland/X11 hotkey, espeak, AppImage on Linux |
| `08-ai-layer.md` | GPT-4o integration, Sarvam TTS, encrypted key storage, all AI failure modes |
| `09-developer-edition.md` | Script manager, command groups, settings panel, complete feature reference |
| `10-fallbacks-and-solutions.md` | All 20 failure modes — cause, detection, and exact user-facing response |
This project is currently in the architecture specification phase. No code has been written. The documentation in this repository is the complete pre-build specification.
The build plan is structured in nine stages across two versions. Each stage has a specific validation criterion — a concrete, observable outcome that confirms the stage is complete.
Version 1 — Core Launcher (Stages 1–5)
- Stage 1 — Voice Input and Hotkey: global hotkey activates Vosk; spoken text is returned correctly on all three platforms.
- Stage 2 — App Discovery and Control: apps open, focus, and close by voice on all three platforms.
- Stage 3 — Browser Tab Control: websites open in new tabs; specific tabs close without closing the browser.
- Stage 4 — Error Handling: all 20 failure modes produce their defined spoken responses with no crashes.
- Stage 5 — Packaging: a single executable for each platform passes all validation criteria on a clean machine.
Version 2 — Developer Edition (Stages 6–9)
- Stage 6 — AI Assistant and Human Voice: GPT-4o answers questions via the Sarvam AI voice, with pyttsx3 fallback.
- Stage 7 — Custom Script Execution: scripts upload, bind, execute, and return spoken output correctly.
- Stage 8 — Settings Panel: all configurable options apply instantly without restart.
- Stage 9 — V2 Packaging: the Developer Edition executable passes all V1 and V2 validation criteria.
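Stage 7's voice-to-script binding could look like the sketch below. The `commands.json` shape (spoken phrase mapped to a script path), the 60-second timeout, and the function names are assumptions for illustration, not the documented schema:

```python
import json
import subprocess
import sys
from pathlib import Path

# Assumed commands.json shape: { "spoken phrase": "path/to/script.py" }
def load_bindings(path: Path) -> dict[str, str]:
    """Load the phrase-to-script map, tolerating a missing file."""
    return json.loads(path.read_text()) if path.exists() else {}

def run_bound_script(phrase: str, bindings: dict[str, str]) -> str:
    """Execute the script bound to a spoken phrase; return text for TTS."""
    script = bindings.get(phrase.lower().strip())
    if script is None:
        return f"No script is bound to '{phrase}'."
    if script.endswith(".py"):
        cmd = [sys.executable, script]  # run .py with the current interpreter
    else:
        cmd = [script]  # .bat on Windows; executable script elsewhere
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    output = result.stdout.strip()
    return output if output else "Script finished with no output."
```

Returning the captured stdout as a plain string keeps the handler decoupled from the audio layer: whatever text comes back is simply handed to the TTS queue, matching the "executes silently, speaks any captured output" behaviour described for RUN commands.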
MIT License. See LICENSE for full terms.
Document Version: 1.0 — April 2026
Status: Architecture specification — application not yet built
Contributions, questions, and challenges to the architecture are welcome via GitHub Issues.