Open Source · MIT License

Tell your phone
what to do.

AppClaw is an AI agent that automates any mobile app. Describe a goal in plain English — it sees the screen, reasons about what to do, and executes actions across Android and iOS.

Works with Claude, GPT-4, Gemini, Groq, and Ollama

AppClaw demo — AI automating a mobile app in real time
appclaw
$ npm start "Send a WhatsApp message to Mom saying good morning"

1/30 launch → "com.whatsapp" (Open WhatsApp)
2/30 find_and_tap → "Mom" (Find Mom in chat list)
3/30 type → "message_input" text="good morning"
4/30 submit_message (Find and tap Send button)
5/30 done (Message sent successfully)

✓ Goal completed in 5 steps

── Token Usage ──────────────
Model: gemini-2.0-flash
Total: 15,425 tokens
Est. cost: $0.001874

Works with any LLM provider

Anthropic · OpenAI · Google · Groq · Ollama
Capabilities

Built for real-world
mobile automation

An agentic AI loop that perceives, reasons, and acts on any mobile app — no selectors, no scripting.

Visual Perception

Parses native UI hierarchy (XML) into structured elements. Understands buttons, inputs, toggles, and text — no brittle selectors needed.
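The parsing step can be sketched as follows. The regex-based extraction and `UiElement` shape are simplifications for illustration, not AppClaw's actual parser; the attribute names (`class`, `text`, `resource-id`, `clickable`) follow UiAutomator2's page-source XML format.

```typescript
// Sketch: turn a native page-source dump into structured UI elements.
interface UiElement {
  className: string;
  text: string;
  resourceId: string;
  clickable: boolean;
}

function parseHierarchy(xml: string): UiElement[] {
  const elements: UiElement[] = [];
  // Pull the attributes we care about out of each <node ...> tag.
  const attr = (tag: string, name: string): string => {
    const m = tag.match(new RegExp(`${name}="([^"]*)"`));
    return m ? m[1] : "";
  };
  for (const tag of xml.match(/<node\b[^>]*>/g) ?? []) {
    elements.push({
      className: attr(tag, "class"),
      text: attr(tag, "text"),
      resourceId: attr(tag, "resource-id"),
      clickable: attr(tag, "clickable") === "true",
    });
  }
  return elements;
}

// Example: two nodes from a WhatsApp-like screen.
const sample =
  '<hierarchy>' +
  '<node class="android.widget.EditText" text="" resource-id="com.whatsapp:id/entry" clickable="true" />' +
  '<node class="android.widget.TextView" text="Mom" resource-id="" clickable="true" />' +
  '</hierarchy>';
const els = parseHierarchy(sample);
```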

Natural Language Goals

Describe what you want in plain English. The AI figures out the sequence of taps, types, scrolls, and swipes to get it done.

Self-Healing Recovery

Stuck detection, checkpoint rollback, and alternative path suggestions. Adapts when UI changes or actions fail.

Cross-Platform

Android via UiAutomator2 and iOS via XCUITest. Same agent, same goals — different platforms handled transparently.

Record & Replay

Record goal executions and replay without LLM costs. Adaptive replayer handles layout changes across runs.

Human-in-the-Loop

Pauses for OTP codes, CAPTCHAs, or ambiguous choices. Asks the user and resumes seamlessly.

The Loop

Perceive. Reason. Act.

Every step follows the same agentic loop until the goal is done or max steps are reached.

1. Perceive

Reads the live screen via appium-mcp's page source. Parses native XML into a structured list of UI elements.

2. Reason

Sends the goal + current screen state to an LLM. Gets back a JSON action decision — what to tap, type, or swipe next.
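That JSON action decision can be modeled as a small typed payload. The field names here (`action`, `target`, `text`, `reasoning`) are assumptions for the sketch, not AppClaw's exact schema; the point is that the model's reply is parsed and sanity-checked before anything touches the device.

```typescript
// Illustrative shape of an LLM action decision (field names assumed).
interface ActionDecision {
  action: string;     // e.g. "tap", "type", "swipe", "done"
  target?: string;    // element id or description to act on
  text?: string;      // payload for typing actions
  reasoning: string;  // why the model chose this step
}

// Parse the model's JSON reply and validate the required fields.
function parseDecision(raw: string): ActionDecision {
  const d = JSON.parse(raw);
  if (typeof d.action !== "string" || typeof d.reasoning !== "string") {
    throw new Error("malformed action decision");
  }
  return d as ActionDecision;
}

const decision = parseDecision(
  '{"action":"type","target":"message_input","text":"good morning","reasoning":"Compose the message"}'
);
```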

3. Act

Executes the chosen action via appium-mcp tools — click, set_value, scroll, swipe, launch, and 26 more.

4. Repeat

Loops back to step 1 with the new screen state. Continues until the goal is achieved or max steps reached.
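The four steps above can be sketched as a single control loop. Here `perceive`, `reason`, and `act` are stand-in callbacks for AppClaw's MCP and LLM calls, injected so the control flow itself is visible.

```typescript
// Skeleton of the perceive → reason → act loop (callbacks assumed).
interface Step { action: string; done: boolean }

async function runGoal(
  goal: string,
  perceive: () => Promise<string>,                      // read live screen
  reason: (goal: string, screen: string) => Promise<Step>, // ask the LLM
  act: (step: Step) => Promise<void>,                   // execute via MCP tool
  maxSteps = 30,
): Promise<boolean> {
  for (let i = 1; i <= maxSteps; i++) {
    const screen = await perceive();   // 1. Perceive
    const step = await reason(goal, screen); // 2. Reason
    if (step.done) return true;        // goal achieved
    await act(step);                   // 3. Act, then 4. Repeat
  }
  return false;                        // max steps exhausted
}

// Usage with trivial stubs: the second decision reports done.
let n = 0;
runGoal(
  "demo",
  async () => "<screen/>",
  async () => ({ action: n++ < 1 ? "tap" : "done", done: n > 1 }),
  async () => {},
).then(ok => console.log(ok)); // logs true
```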

Architecture

Pure agentic brain,
zero device logic

AppClaw is the decision-maker. appium-mcp handles the device. Clean separation via MCP protocol.

Powered by the Model Context Protocol

AppClaw consumes appium-mcp's 32 tools over stdio or SSE. It never touches the device directly — it only decides which tool to call next.

32 MCP Tools

Tap, type, swipe, screenshot, launch, install, and more

Multi-Provider LLMs

Swap between Claude, GPT-4, Gemini, Groq, or local Ollama

Goal Decomposition

Complex multi-app goals are split into sequential sub-goals

Cloud or Local Devices

USB, emulator, simulator, or remote device farms via SSE

AppClaw (TypeScript): Agentic Loop · Perception · Skills · LLM Layer · Recovery · Recorder
    ↓ MCP Protocol
appium-mcp (32 tools): tap · type · swipe · screenshot · launch · ...
    ↓ Appium (UiAutomator2 / XCUITest)
Android / iOS Device
Power Features

Beyond simple automation

Built-in intelligence for the hardest parts of mobile automation.

Smart Actions

16 Agent Actions with Built-in Skills

From simple taps to multi-step compound operations. Smart typing detects non-editable wrappers and finds the real input. submit_message works across WhatsApp, Telegram, Slack, and more.

tap · type · swipe · find_and_tap · read_screen · ask_user
// Smart Type: auto-detects real input
smart_type "search_field"
  Target is non-editable wrapper
  Clicking to navigate...
  Re-reading page source...
  Found EditText (id: input_23)
  ✓ Typed into real input field
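The wrapper-detection idea behind that trace can be sketched like this. The element shape and the `resolveInput` helper are hypothetical, for illustration only: if the matched target is not editable, fall back to the first editable field found after tapping through.

```typescript
// Sketch: resolve a non-editable wrapper to the real input field.
interface El { id: string; className: string; editable: boolean }

function resolveInput(target: El, screenAfterTap: El[]): El {
  if (target.editable) return target; // already a real input
  // Target was a non-editable wrapper: pick the real EditText
  // from the re-read page source.
  const real = screenAfterTap.find(e => e.editable);
  if (!real) throw new Error("no editable field found");
  return real;
}

const wrapper = { id: "search_field", className: "FrameLayout", editable: false };
const after = [
  { id: "toolbar", className: "Toolbar", editable: false },
  { id: "input_23", className: "EditText", editable: true },
];
const input = resolveInput(wrapper, after);
```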
Recording

Record & Adaptive Replay

Record any goal execution and replay it without LLM costs. The replayer doesn't blindly repeat coordinates — it reads the current screen, matches elements, and adapts to layout changes.

--record · --replay · Adaptive
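One way to picture that adaptive matching is a similarity score over recorded attributes rather than a coordinate replay. The weights and element shape here are illustrative, not AppClaw's actual algorithm.

```typescript
// Sketch: match a recorded element against the current screen by
// attribute similarity instead of raw coordinates (weights assumed).
interface Recorded { resourceId: string; text: string }

function bestMatch(rec: Recorded, current: Recorded[]): Recorded | null {
  let best: Recorded | null = null;
  let bestScore = 0;
  for (const el of current) {
    let score = 0;
    if (el.resourceId && el.resourceId === rec.resourceId) score += 2;
    if (el.text && el.text === rec.text) score += 1;
    if (score > bestScore) { bestScore = score; best = el; }
  }
  return best; // null when nothing resembles the recorded element
}

const recorded = { resourceId: "send_btn", text: "Send" };
const screen = [
  { resourceId: "attach_btn", text: "Attach" },
  { resourceId: "send_btn", text: "Send message" }, // label changed since recording
];
const match = bestMatch(recorded, screen);
```

The stable resource id still wins even though the visible label changed between runs, which is the sense in which replay "adapts to layout changes".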
Planning

Goal Decomposition

Complex multi-app tasks are automatically broken into sequential sub-goals. "Copy the weather and send it on Slack" becomes 4 focused steps, each tracked and executed independently.

--plan · Multi-app · Sub-goals
32 MCP tools consumed
16 agent actions
5 LLM providers
2 platforms (iOS + Android)
Get Started

Up and running
in 60 seconds

Install, configure, connect a device, and run. Four steps to your first AI-automated mobile goal.

1. Clone and install

Requires Node.js 18+. Clone the repo, install dependencies, and set up Appium with the driver for your platform.

2. Configure your LLM

Create a .env file and set your provider and API key. Works with Anthropic, OpenAI, Google, Groq, or local Ollama.
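A minimal .env, using the `LLM_PROVIDER` and `LLM_API_KEY` variables shown in the quick-start commands. The full list of provider identifiers in the comment is an assumption based on the supported providers, so check the repo's docs for the exact strings.

```shell
# .env — pick one provider and its key
LLM_PROVIDER=gemini   # assumed options: anthropic | openai | gemini | groq | ollama
LLM_API_KEY=your-key
```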

3. Connect a device

Plug in an Android device (or start an emulator/iOS simulator). Verify with adb devices and start the Appium server.

4. Run a goal

Pass any goal in plain English, run a declarative YAML flow, or use interactive mode. AppClaw connects to your device and executes autonomously.

# Clone, install, and set up Appium
$ git clone https://github.com/AppiumTestDistribution/appclaw
$ cd appclaw && npm install
$ npm i -g appium && appium driver install uiautomator2

# Create .env with your LLM provider
$ echo "LLM_PROVIDER=gemini" > .env
$ echo "LLM_API_KEY=your-key" >> .env

# Verify device is connected
$ adb devices

# Run a goal in plain English
$ npm start "Open Settings and turn on WiFi"

# Or run a declarative YAML flow (no LLM needed)
$ npm start -- --flow examples/flows/google-search.yaml

# Record, replay, or decompose complex goals
$ npm start -- --record "Send hello on WhatsApp"
$ npm start -- --replay recordings/rec-*.json
$ npm start -- --plan "Copy weather and send on Slack"

Ready to automate
your mobile apps?

Open source. MIT licensed. Free forever. Start automating any Android or iOS app with AI today.

Star on GitHub
$ git clone https://github.com/AppiumTestDistribution/appclaw

MIT License · Free forever · Community driven