A complete AI voice agent for Android — say anything, it does it.
AndroidAutopilot turns your Android phone into an AI-driven autopilot. You speak a command and the app:
- Listens to your voice via the built-in microphone
- Reads the screen via the Accessibility Service (full UI hierarchy dump)
- Thinks using Claude Opus 4.6 with Extended Thinking (via Anthropic API or Orbit Provider)
- Acts — taps, scrolls, types, opens apps, searches the web, all without any human interaction
- Speaks the result back to you via Text-to-Speech
- Loops until the task is complete (multi-step agent loop)
When the Claude context window fills up, Gemini automatically summarises all past conversation/actions into a compact memory block so the session continues seamlessly.
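The listen → read → think → act loop above can be sketched in pure Kotlin. Every name here (`readScreen`, `askClaude`, `execute`) is illustrative, not the app's real API; the actual orchestration lives in AgentService.kt.

```kotlin
// Minimal sketch of the multi-step agent loop. All callbacks are stand-ins:
// readScreen would dump the UI hierarchy, askClaude would hit the Claude API,
// execute would dispatch gestures through the Accessibility Service.
data class AgentStep(val done: Boolean, val actions: List<String>)

fun runAgentLoop(
    command: String,
    readScreen: () -> String,
    askClaude: (task: String, screen: String) -> AgentStep,
    execute: (String) -> Unit,
    maxTurns: Int = 10,
): Int {
    var turns = 0
    while (turns < maxTurns) {
        turns++
        val step = askClaude(command, readScreen())  // think over a fresh UI dump
        step.actions.forEach(execute)                // tap / type / scroll ...
        if (step.done) return turns                  // model reports task complete
    }
    return turns
}
```

The `maxTurns` cap is a safety assumption so a confused model cannot loop forever; the real service may use a different stop condition.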
AndroidAutopilot/
├── api/
│ ├── ClaudeApiClient.kt — Claude Opus 4.6 + Extended Thinking
│ └── GeminiApiClient.kt — Gemini for memory compression
├── managers/
│ ├── MemoryManager.kt — Raw conversation history + auto-compression
│ ├── VoiceManager.kt — STT (SpeechRecognizer) + TTS (TextToSpeech)
│ └── SettingsManager.kt — SharedPreferences persistence
├── models/
│ ├── ConversationMessage.kt — Chat data models + Action types
│ └── ApiModels.kt — Claude / Gemini request-response DTOs
├── services/
│ ├── AutopilotAccessibilityService.kt — Screen reader + gesture dispatcher
│ ├── AgentService.kt — Foreground service orchestrating the agent loop
│ └── OverlayService.kt — Floating mic bubble overlay
├── MainActivity.kt — Main UI with chat history
└── SettingsActivity.kt — API keys + configuration
```bash
git clone <repo-url>
# Open the root folder in Android Studio Hedgehog or later
```

Open the app → tap the gear icon → enter:
| Setting | Value |
|---|---|
| API Provider | Anthropic (official) or Orbit Provider |
| Anthropic API Key | sk-ant-... from console.anthropic.com |
| Orbit API Key | Your key from orbit-provider.com/dashboard/billing |
| Claude Model | claude-opus-4-6 (default) |
| Thinking Budget | 10 000 tokens (adjust up for harder tasks) |
| Max Context Tokens | 150 000 (triggers Gemini compression when reached) |
| Gemini API Key | AIza... from aistudio.google.com |
| Gemini Model | gemini-2.5-pro-exp-03-25 |
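The defaults from the table could be modelled as a plain settings holder. This is a hypothetical sketch; the real app persists these values via SharedPreferences in SettingsManager.kt, and the field names here are assumptions.

```kotlin
// Hypothetical defaults mirroring the settings table above.
data class AutopilotSettings(
    val provider: String = "Anthropic (official)",
    val claudeModel: String = "claude-opus-4-6",
    val thinkingBudgetTokens: Int = 10_000,   // raise for harder tasks
    val maxContextTokens: Int = 150_000,      // Gemini compression threshold
    val geminiModel: String = "gemini-2.5-pro-exp-03-25",
)
```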
The app will guide you through:
- Microphone — for voice input
- Accessibility Service — go to Settings → Accessibility → AndroidAutopilot Agent → Enable
- Draw Over Other Apps — for the floating mic bubble
```bash
./gradlew assembleDebug
adb install app/build/outputs/apk/debug/app-debug.apk
```

Every conversation turn is stored raw as the full message text:
```text
[User] "Search for the best pizza near me"
[AI]   "I'll search for that right now. ..." + JSON actions
[User] "[System: Actions executed. Updated screen below...]"
[AI]   "I can see the search results. ..." + JSON actions
...
```
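A turn could be modelled roughly like this. The real model lives in ConversationMessage.kt; the role names and fields below are assumptions for illustration.

```kotlin
// Rough sketch of a raw conversation turn as stored by the memory manager.
enum class Role { USER, AI, SYSTEM }

data class ConversationTurn(
    val role: Role,
    val text: String,                 // stored raw, as the full message text
    val actionsJson: String? = null,  // AI turns may carry a JSON action block
)
```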
When the estimated token count reaches maxContextTokens (default 150 000), Gemini is called to compress the entire history into a short summary. Claude then continues from that summary, so the session never hits a hard context limit.
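The compression trigger can be sketched as follows. The ~4-characters-per-token estimate and the `summarise` callback are assumptions; in the real app, MemoryManager.kt delegates summarisation to Gemini.

```kotlin
// Sketch of the auto-compression trigger, assuming a crude ~4 chars/token
// heuristic. summarise stands in for the Gemini summarisation call.
fun estimateTokens(history: List<String>): Int =
    history.sumOf { it.length } / 4

fun maybeCompress(
    history: MutableList<String>,
    maxContextTokens: Int,
    summarise: (List<String>) -> String,
) {
    if (estimateTokens(history) >= maxContextTokens) {
        val summary = summarise(history.toList())
        history.clear()
        history.add("[Memory summary] $summary")  // Claude resumes from this block
    }
}
```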
Claude returns a JSON block with actions:
| Action | Description |
|---|---|
| TAP | Tap screen coordinate (x, y) |
| LONG_PRESS | Long-press at (x, y) |
| SWIPE | Swipe from (x, y) to (endX, endY) |
| TYPE_TEXT | Type text into the focused field |
| CLEAR_TEXT | Clear the focused field |
| SCROLL_UP/DOWN/LEFT/RIGHT | Scroll the screen |
| PRESS_BACK/HOME/RECENTS | System navigation |
| OPEN_APP | Launch an app by package name |
| OPEN_URL | Open a URL in the default browser |
| SEARCH_WEB | Google search directly |
| FIND_AND_TAP | Find a UI element by text and tap it |
| SPEAK | Say text via TTS |
| WAIT | Pause for N milliseconds |
| TAKE_SCREENSHOT | Capture the screen (next turn gets a fresh UI dump) |
- "Search for the weather in London"
- "Open YouTube and search for relaxing jazz music"
- "Send a WhatsApp message to John saying I'm running late"
- "Take a screenshot and tell me what's on screen"
- "Open Settings and turn on airplane mode"
- "Go to my emails and read the latest unread message"
- Android 8.0+ (API 26+)
- Internet connection
- Claude API key (Anthropic or Orbit)
- Gemini API key (Google AI Studio — free tier available)