
ENZIO3/ANDROID-AUTOPILOT-

AndroidAutopilot

A complete AI voice agent for Android — say anything, it does it.

What it does

AndroidAutopilot turns your Android phone into an AI-driven autopilot. You speak a command and the app:

  1. Listens to your voice via the built-in microphone
  2. Reads the screen via the Accessibility Service (full UI hierarchy dump)
  3. Thinks using Claude Opus 4.6 with Extended Thinking (via Anthropic API or Orbit Provider)
  4. Acts — taps, scrolls, types, opens apps, searches the web, all without any human interaction
  5. Speaks the result back to you via Text-to-Speech
  6. Loops until the task is complete (multi-step agent loop)
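The six steps above form an observe, think, act loop. The following is a minimal sketch of that control flow in plain Kotlin, not the project's actual AgentService. Every name in it (ScreenReader, Planner, Executor, Plan, runAgent, maxSteps) is hypothetical, and the Claude call, the Accessibility Service and TTS are replaced by small stand-in functions so the loop can run on its own.

```kotlin
// Hypothetical stand-ins: none of these names come from the project's source.
fun interface ScreenReader { fun dump(): String }   // stands in for the Accessibility Service
fun interface Planner {                             // stands in for the Claude call
    fun plan(command: String, screen: String, step: Int): Plan
}
fun interface Executor { fun run(action: String) }  // stands in for gesture dispatch

data class Plan(val say: String, val actions: List<String>, val done: Boolean)

/** Repeats observe -> think -> act until the planner reports completion or maxSteps is hit. */
fun runAgent(
    command: String,
    screen: ScreenReader,
    planner: Planner,
    executor: Executor,
    speak: (String) -> Unit,
    maxSteps: Int = 10, // assumed safety cap so a confused model cannot loop forever
): Int {
    for (step in 1..maxSteps) {
        val ui = screen.dump()                      // read the current screen
        val plan = planner.plan(command, ui, step)  // ask the model what to do next
        plan.actions.forEach { executor.run(it) }   // perform the requested actions
        speak(plan.say)                             // report progress to the user
        if (plan.done) return step
    }
    return maxSteps
}

fun main() {
    val executed = mutableListOf<String>()
    val steps = runAgent(
        command = "Open Settings",
        screen = { "<fake UI dump>" },
        planner = { _, _, step ->
            if (step < 2) Plan("Opening Settings.", listOf("OPEN_APP com.android.settings"), done = false)
            else Plan("Settings is open.", emptyList(), done = true)
        },
        executor = { executed += it },
        speak = { println("TTS: $it") },
    )
    println("finished after $steps steps, executed $executed")
}
```

The step cap is a design choice of this sketch: the source only says the agent loops until the task is complete, so a real implementation may bound the loop differently.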

When the Claude context window fills up, Gemini automatically summarises the past conversation and actions into a compact memory block so the session can continue without a restart.


Architecture

AndroidAutopilot/
├── api/
│   ├── ClaudeApiClient.kt       — Claude Opus 4.6 + Extended Thinking
│   └── GeminiApiClient.kt       — Gemini for memory compression
├── managers/
│   ├── MemoryManager.kt         — Raw conversation history + auto-compression
│   ├── VoiceManager.kt          — STT (SpeechRecognizer) + TTS (TextToSpeech)
│   └── SettingsManager.kt       — SharedPreferences persistence
├── models/
│   ├── ConversationMessage.kt   — Chat data models + Action types
│   └── ApiModels.kt             — Claude / Gemini request-response DTOs
├── services/
│   ├── AutopilotAccessibilityService.kt  — Screen reader + gesture dispatcher
│   ├── AgentService.kt          — Foreground service orchestrating the agent loop
│   └── OverlayService.kt        — Floating mic bubble overlay
├── MainActivity.kt              — Main UI with chat history
└── SettingsActivity.kt          — API keys + configuration

Setup

1. Clone and open in Android Studio

git clone <repo-url>
# Open the root folder in Android Studio Hedgehog or later

2. Configure API Keys

Open the app → tap the gear icon → enter:

| Setting | Value |
| --- | --- |
| API Provider | Anthropic (official) or Orbit Provider |
| Anthropic API Key | sk-ant-... from console.anthropic.com |
| Orbit API Key | Your key from orbit-provider.com/dashboard/billing |
| Claude Model | claude-opus-4-6 (default) |
| Thinking Budget | 10 000 tokens (adjust up for harder tasks) |
| Max Context Tokens | 150 000 (triggers Gemini compression when reached) |
| Gemini API Key | AIza... from aistudio.google.com |
| Gemini Model | gemini-2.5-pro-exp-03-25 |
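
The settings above can be pictured as one configuration object with the listed defaults. This is only an illustrative sketch: the names AgentSettings, Provider and problems() are hypothetical, the real SettingsManager persists values in SharedPreferences, and the sanity checks (key prefixes taken from the table, thinking budget smaller than the context limit) are assumptions rather than documented behaviour.

```kotlin
enum class Provider { ANTHROPIC, ORBIT }

// Hypothetical holder for the values in the settings table; defaults mirror the table.
data class AgentSettings(
    val provider: Provider = Provider.ANTHROPIC,
    val anthropicApiKey: String = "",
    val orbitApiKey: String = "",
    val claudeModel: String = "claude-opus-4-6",
    val thinkingBudgetTokens: Int = 10_000,
    val maxContextTokens: Int = 150_000,
    val geminiApiKey: String = "",
    val geminiModel: String = "gemini-2.5-pro-exp-03-25",
) {
    /** Returns human-readable problems; an empty list means the settings look usable. */
    fun problems(): List<String> {
        val found = mutableListOf<String>()
        when (provider) {
            Provider.ANTHROPIC ->
                if (!anthropicApiKey.startsWith("sk-ant-")) found += "Anthropic API key should start with sk-ant-"
            Provider.ORBIT ->
                if (orbitApiKey.isBlank()) found += "Orbit API key is missing"
        }
        if (!geminiApiKey.startsWith("AIza")) found += "Gemini API key should start with AIza"
        // Assumed sanity check: thinking tokens must fit inside the context limit.
        if (thinkingBudgetTokens <= 0 || thinkingBudgetTokens >= maxContextTokens) {
            found += "Thinking budget must be positive and below the context limit"
        }
        return found
    }
}

fun main() {
    println(AgentSettings().problems()) // both keys are still empty here
    val configured = AgentSettings(anthropicApiKey = "sk-ant-example", geminiApiKey = "AIza-example")
    println(configured.problems())
}
```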

3. Grant Permissions

The app will guide you through:

  1. Microphone — for voice input
  2. Accessibility Service — go to Settings → Accessibility → AndroidAutopilot Agent → Enable
  3. Draw Over Other Apps — for the floating mic bubble

4. Build and Install

./gradlew assembleDebug
adb install app/build/outputs/apk/debug/app-debug.apk

How Memory Works

Every conversation turn is stored verbatim, as the full message text:

[User]  "Search for the best pizza near me"
[AI]    "I'll search for that right now. ..." + JSON actions
[User]  "[System: Actions executed. Updated screen below...]"
[AI]    "I can see the search results. ..." + JSON actions
...

When the estimated token count reaches maxContextTokens (default 150 000), Gemini is called to compress the entire history into a short summary. Claude then continues from this summary, so the session keeps its essential context, although a summary is necessarily less detailed than the raw transcript.
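
The threshold logic described above can be sketched as follows. This is a hedged illustration, not the project's MemoryManager: the class and function names are hypothetical, the four-characters-per-token estimate is a rough heuristic assumed here (the source does not say how tokens are estimated), and the summarize parameter stands in for the Gemini call.

```kotlin
data class Message(val role: String, val text: String)

// Assumption: about 4 characters per token. A rough heuristic, not the project's estimator.
fun estimateTokens(text: String): Int = (text.length + 3) / 4

class Memory(
    private val maxContextTokens: Int = 150_000,
    private val summarize: (List<Message>) -> String, // stands in for the Gemini request
) {
    private val history = mutableListOf<Message>()

    val messages: List<Message> get() = history.toList()

    fun estimatedTokens(): Int = history.sumOf { estimateTokens(it.text) }

    /** Appends a turn, then collapses the whole history into one summary message once the limit is reached. */
    fun add(message: Message) {
        history += message
        if (estimatedTokens() >= maxContextTokens) {
            val summary = summarize(history.toList())
            history.clear()
            // The role and prefix of the summary message are hypothetical choices.
            history += Message("user", "[Memory summary] $summary")
        }
    }
}

fun main() {
    // A tiny limit so that compression triggers after two short messages.
    val memory = Memory(maxContextTokens = 10) { msgs -> "${msgs.size} earlier messages" }
    memory.add(Message("user", "Search for the best pizza near me"))
    memory.add(Message("assistant", "I'll search for that right now."))
    println(memory.messages)
}
```

Replacing the history with a single message keeps the sketch simple. An implementation could instead keep the most recent turns verbatim and summarise only the older ones.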


Action Types

Claude returns a JSON block with actions:

| Action | Description |
| --- | --- |
| TAP | Tap screen coordinate (x, y) |
| LONG_PRESS | Long-press at (x, y) |
| SWIPE | Swipe from (x, y) to (endX, endY) |
| TYPE_TEXT | Type text into focused field |
| CLEAR_TEXT | Clear focused field |
| SCROLL_UP/DOWN/LEFT/RIGHT | Scroll screen |
| PRESS_BACK/HOME/RECENTS | System navigation |
| OPEN_APP | Launch app by package name |
| OPEN_URL | Open URL in default browser |
| SEARCH_WEB | Google search directly |
| FIND_AND_TAP | Find UI element by text, tap it |
| SPEAK | Say text via TTS |
| WAIT | Pause for N milliseconds |
| TAKE_SCREENSHOT | Capture screen (next turn gets fresh UI) |
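
One way to handle such actions in Kotlin is to map each decoded JSON object onto a sealed type, so the dispatcher can use an exhaustive when. The sketch below covers only a handful of the action types and assumes the JSON has already been decoded into a Map. The field names ("type", "packageName", "ms") are hypothetical apart from x, y, endX and endY, which appear in the table, and the source does not show the real schema.

```kotlin
// Hypothetical model of a few action types from the table; not the project's own classes.
sealed interface Action {
    data class Tap(val x: Int, val y: Int) : Action
    data class Swipe(val x: Int, val y: Int, val endX: Int, val endY: Int) : Action
    data class TypeText(val text: String) : Action
    data class OpenApp(val packageName: String) : Action
    data class Wait(val millis: Long) : Action
    object PressBack : Action
    data class Unknown(val type: String) : Action // anything this sketch does not model
}

/** Converts one decoded JSON object into an Action, failing loudly on missing fields. */
fun parseAction(fields: Map<String, Any?>): Action {
    fun int(key: String): Int = (fields[key] as? Number)?.toInt() ?: error("missing number '$key'")
    fun str(key: String): String = (fields[key] as? String) ?: error("missing string '$key'")
    return when (val type = str("type")) {
        "TAP" -> Action.Tap(int("x"), int("y"))
        "SWIPE" -> Action.Swipe(int("x"), int("y"), int("endX"), int("endY"))
        "TYPE_TEXT" -> Action.TypeText(str("text"))
        "OPEN_APP" -> Action.OpenApp(str("packageName"))
        "WAIT" -> Action.Wait(int("ms").toLong())
        "PRESS_BACK" -> Action.PressBack
        else -> Action.Unknown(type)
    }
}

fun main() {
    val decoded: List<Map<String, Any?>> = listOf(
        mapOf("type" to "OPEN_APP", "packageName" to "com.android.settings"),
        mapOf("type" to "TAP", "x" to 540, "y" to 1200),
        mapOf("type" to "TAKE_SCREENSHOT"),
    )
    decoded.map { parseAction(it) }.forEach { println(it) }
}
```

Routing unrecognised types to Unknown instead of throwing lets the agent loop report the problem back to the model on the next turn. Whether the real app does this is not stated in the source.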

Example Commands

  • "Search for the weather in London"
  • "Open YouTube and search for relaxing jazz music"
  • "Send a WhatsApp message to John saying I'm running late"
  • "Take a screenshot and tell me what's on screen"
  • "Open Settings and turn on airplane mode"
  • "Go to my emails and read the latest unread message"

Requirements

  • Android 8.0+ (API 26+)
  • Internet connection
  • Claude API key (Anthropic or Orbit)
  • Gemini API key (Google AI Studio — free tier available)
