A complete AI voice agent for Android — say anything, it does it.
AndroidAutopilot turns your Android phone into an AI-driven autopilot. You speak a command and the app:
- Listens to your voice via the built-in microphone
- Reads the screen via the Accessibility Service (full UI hierarchy dump)
- Thinks using Claude Opus 4.6 with Extended Thinking (via Anthropic API or Orbit Provider)
- Acts — taps, scrolls, types, opens apps, searches the web, all without any human interaction
- Speaks the result back to you via Text-to-Speech
- Loops until the task is complete (multi-step agent loop)
When the Claude context window fills up, Gemini automatically summarises all past conversation/actions into a compact memory block so the session continues seamlessly.
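The listen → read → think → act loop above can be sketched in pure Kotlin. Every name here (`readScreen`, `askClaude`, `execute`) is illustrative, not the app's real API; the actual orchestration lives in AgentService.kt.

```kotlin
// Minimal sketch of the multi-step agent loop. All callbacks are stand-ins:
// readScreen would dump the UI hierarchy, askClaude would hit the Claude API,
// execute would dispatch gestures through the Accessibility Service.
data class AgentStep(val done: Boolean, val actions: List<String>)

fun runAgentLoop(
    command: String,
    readScreen: () -> String,
    askClaude: (task: String, screen: String) -> AgentStep,
    execute: (String) -> Unit,
    maxTurns: Int = 10,
): Int {
    var turns = 0
    while (turns < maxTurns) {
        turns++
        val step = askClaude(command, readScreen())  // think over a fresh UI dump
        step.actions.forEach(execute)                // tap / type / scroll ...
        if (step.done) return turns                  // model reports task complete
    }
    return turns
}
```

The `maxTurns` cap is a safety assumption so a confused model cannot loop forever; the real service may use a different stop condition.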
AndroidAutopilot/
├── api/
│ ├── ClaudeApiClient.kt — Claude Opus 4.6 + Extended Thinking
│ └── GeminiApiClient.kt — Gemini for memory compression
├── managers/
│ ├── MemoryManager.kt — Raw conversation history + auto-compression
│ ├── VoiceManager.kt — STT (SpeechRecognizer) + TTS (TextToSpeech)
│ └── SettingsManager.kt — SharedPreferences persistence
├── models/
│ ├── ConversationMessage.kt — Chat data models + Action types
│ └── ApiModels.kt — Claude / Gemini request-response DTOs
├── services/
│ ├── AutopilotAccessibilityService.kt — Screen reader + gesture dispatcher
│ ├── AgentService.kt — Foreground service orchestrating the agent loop
│ └── OverlayService.kt — Floating mic bubble overlay
├── MainActivity.kt — Main UI with chat history
└── SettingsActivity.kt — API keys + configuration
```bash
git clone <repo-url>
# Open the root folder in Android Studio Hedgehog or later
```

Open the app → tap the gear icon → enter:
| Setting | Value |
|---|---|
| API Provider | Anthropic (official) or Orbit Provider |
| Anthropic API Key | sk-ant-... from console.anthropic.com |
| Orbit API Key | Your key from orbit-provider.com/dashboard/billing |
| Claude Model | claude-opus-4-6 (default) |
| Thinking Budget | 10 000 tokens (adjust up for harder tasks) |
| Max Context Tokens | 150 000 (triggers Gemini compression when reached) |
| Gemini API Key | AIza... from aistudio.google.com |
| Gemini Model | gemini-2.5-pro-exp-03-25 |
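The defaults from the table could be modelled as a plain settings holder. This is a hypothetical sketch; the real app persists these values via SharedPreferences in SettingsManager.kt, and the field names here are assumptions.

```kotlin
// Hypothetical defaults mirroring the settings table above.
data class AutopilotSettings(
    val provider: String = "Anthropic (official)",
    val claudeModel: String = "claude-opus-4-6",
    val thinkingBudgetTokens: Int = 10_000,   // raise for harder tasks
    val maxContextTokens: Int = 150_000,      // Gemini compression threshold
    val geminiModel: String = "gemini-2.5-pro-exp-03-25",
)
```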
The app will guide you through:
- Microphone — for voice input
- Accessibility Service — go to Settings → Accessibility → AndroidAutopilot Agent → Enable
- Draw Over Other Apps — for the floating mic bubble
```bash
./gradlew assembleDebug
adb install app/build/outputs/apk/debug/app-debug.apk
```

Every conversation turn is stored raw as the full message text:
```text
[User] "Search for the best pizza near me"
[AI]   "I'll search for that right now. ..." + JSON actions
[User] "[System: Actions executed. Updated screen below...]"
[AI]   "I can see the search results. ..." + JSON actions
...
```
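A turn could be modelled roughly like this. The real model lives in ConversationMessage.kt; the role names and fields below are assumptions for illustration.

```kotlin
// Rough sketch of a raw conversation turn as stored by the memory manager.
enum class Role { USER, AI, SYSTEM }

data class ConversationTurn(
    val role: Role,
    val text: String,                 // stored raw, as the full message text
    val actionsJson: String? = null,  // AI turns may carry a JSON action block
)
```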
When the estimated token count reaches maxContextTokens (default 150 000), Gemini is called to compress the entire history into a short summary. Claude then continues from that summary, so the session never hits a hard context limit.
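The compression trigger can be sketched as follows. The ~4-characters-per-token estimate and the `summarise` callback are assumptions; in the real app, MemoryManager.kt delegates summarisation to Gemini.

```kotlin
// Sketch of the auto-compression trigger, assuming a crude ~4 chars/token
// heuristic. summarise stands in for the Gemini summarisation call.
fun estimateTokens(history: List<String>): Int =
    history.sumOf { it.length } / 4

fun maybeCompress(
    history: MutableList<String>,
    maxContextTokens: Int,
    summarise: (List<String>) -> String,
) {
    if (estimateTokens(history) >= maxContextTokens) {
        val summary = summarise(history.toList())
        history.clear()
        history.add("[Memory summary] $summary")  // Claude resumes from this block
    }
}
```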
Claude returns a JSON block with actions:
| Action | Description |
|---|---|
| TAP | Tap screen coordinate (x, y) |
| LONG_PRESS | Long-press at (x, y) |
| SWIPE | Swipe from (x, y) to (endX, endY) |
| TYPE_TEXT | Type text into the focused field |
| CLEAR_TEXT | Clear the focused field |
| SCROLL_UP/DOWN/LEFT/RIGHT | Scroll the screen |
| PRESS_BACK/HOME/RECENTS | System navigation |
| OPEN_APP | Launch an app by package name |
| OPEN_URL | Open a URL in the default browser |
| SEARCH_WEB | Google search directly |
| FIND_AND_TAP | Find a UI element by text and tap it |
| SPEAK | Say text via TTS |
| WAIT | Pause for N milliseconds |
| TAKE_SCREENSHOT | Capture the screen (next turn gets a fresh UI dump) |
- "Search for the weather in London"
- "Open YouTube and search for relaxing jazz music"
- "Send a WhatsApp message to John saying I'm running late"
- "Take a screenshot and tell me what's on screen"
- "Open Settings and turn on airplane mode"
- "Go to my emails and read the latest unread message"
- Android 8.0+ (API 26+)
- Internet connection
- Claude API key (Anthropic or Orbit)
- Gemini API key (Google AI Studio — free tier available)