Open Source · MIT License

Tell your phone
what to do.

AppClaw is an AI agent that automates any mobile app. Describe a goal in plain English — it sees the screen, reasons about what to do, and executes actions across Android and iOS.

Works with Claude, GPT-4, Gemini, Groq, and Ollama

AppClaw demo — AI automating a mobile app in real time
appclaw
$ npm start "Send a WhatsApp message to Mom saying good morning"

1/30 launch → "com.whatsapp" (Open WhatsApp)
2/30 find_and_tap → "Mom" (Find Mom in chat list)
3/30 type → "message_input" text="good morning"
4/30 submit_message (Find and tap Send button)
5/30 done (Message sent successfully)

✓ Goal completed in 5 steps

── Token Usage ──────────────
Model: gemini-2.0-flash
Total: 15,425 tokens
Est. cost: $0.001874

Works with any LLM provider

Anthropic · OpenAI · Google · Groq · Ollama
Capabilities

Built for real-world
mobile automation

An agentic AI loop that perceives, reasons, and acts on any mobile app — no selectors, no scripting.

Visual Perception

Parses native UI hierarchy (XML) into structured elements. Understands buttons, inputs, toggles, and text — no brittle selectors needed.
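The parsing step can be sketched as follows. The regex-based extraction and `UiElement` shape are simplifications for illustration, not AppClaw's actual parser; the attribute names (`class`, `text`, `resource-id`, `clickable`) follow UiAutomator2's page-source XML format.

```typescript
// Sketch: turn a native page-source dump into structured UI elements.
interface UiElement {
  className: string;
  text: string;
  resourceId: string;
  clickable: boolean;
}

function parseHierarchy(xml: string): UiElement[] {
  const elements: UiElement[] = [];
  // Pull the attributes we care about out of each <node ...> tag.
  const attr = (tag: string, name: string): string => {
    const m = tag.match(new RegExp(`${name}="([^"]*)"`));
    return m ? m[1] : "";
  };
  for (const tag of xml.match(/<node\b[^>]*>/g) ?? []) {
    elements.push({
      className: attr(tag, "class"),
      text: attr(tag, "text"),
      resourceId: attr(tag, "resource-id"),
      clickable: attr(tag, "clickable") === "true",
    });
  }
  return elements;
}

// Example: two nodes from a WhatsApp-like screen.
const sample =
  '<hierarchy>' +
  '<node class="android.widget.EditText" text="" resource-id="com.whatsapp:id/entry" clickable="true" />' +
  '<node class="android.widget.TextView" text="Mom" resource-id="" clickable="true" />' +
  '</hierarchy>';
const els = parseHierarchy(sample);
```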

Natural Language Goals

Describe what you want in plain English. The AI figures out the sequence of taps, types, scrolls, and swipes to get it done.

Self-Healing Recovery

Stuck detection, checkpoint rollback, and alternative path suggestions. Adapts when UI changes or actions fail.

Cross-Platform

Android via UiAutomator2 and iOS via XCUITest. Same agent, same goals — different platforms handled transparently.

Record & Replay

Record goal executions and replay without LLM costs. Adaptive replayer handles layout changes across runs.

Human-in-the-Loop

Pauses for OTP codes, CAPTCHAs, or ambiguous choices. Asks the user and resumes seamlessly.

The Loop

Perceive. Reason. Act.

Every step follows the same agentic loop until the goal is done or max steps are reached.

1. Perceive

Reads the live screen via appium-mcp's page source. Parses native XML into a structured list of UI elements.

2. Reason

Sends the goal + current screen state to an LLM. Gets back a JSON action decision — what to tap, type, or swipe next.
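That JSON action decision can be modeled as a small typed payload. The field names here (`action`, `target`, `text`, `reasoning`) are assumptions for the sketch, not AppClaw's exact schema; the point is that the model's reply is parsed and sanity-checked before anything touches the device.

```typescript
// Illustrative shape of an LLM action decision (field names assumed).
interface ActionDecision {
  action: string;     // e.g. "tap", "type", "swipe", "done"
  target?: string;    // element id or description to act on
  text?: string;      // payload for typing actions
  reasoning: string;  // why the model chose this step
}

// Parse the model's JSON reply and validate the required fields.
function parseDecision(raw: string): ActionDecision {
  const d = JSON.parse(raw);
  if (typeof d.action !== "string" || typeof d.reasoning !== "string") {
    throw new Error("malformed action decision");
  }
  return d as ActionDecision;
}

const decision = parseDecision(
  '{"action":"type","target":"message_input","text":"good morning","reasoning":"Compose the message"}'
);
```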

3. Act

Executes the chosen action via appium-mcp tools — click, set_value, scroll, swipe, launch, and 26 more.

4. Repeat

Loops back to step 1 with the new screen state. Continues until the goal is achieved or max steps reached.
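The four steps above can be sketched as a single control loop. Here `perceive`, `reason`, and `act` are stand-in callbacks for AppClaw's MCP and LLM calls, injected so the control flow itself is visible.

```typescript
// Skeleton of the perceive → reason → act loop (callbacks assumed).
interface Step { action: string; done: boolean }

async function runGoal(
  goal: string,
  perceive: () => Promise<string>,                      // read live screen
  reason: (goal: string, screen: string) => Promise<Step>, // ask the LLM
  act: (step: Step) => Promise<void>,                   // execute via MCP tool
  maxSteps = 30,
): Promise<boolean> {
  for (let i = 1; i <= maxSteps; i++) {
    const screen = await perceive();   // 1. Perceive
    const step = await reason(goal, screen); // 2. Reason
    if (step.done) return true;        // goal achieved
    await act(step);                   // 3. Act, then 4. Repeat
  }
  return false;                        // max steps exhausted
}

// Usage with trivial stubs: the second decision reports done.
let n = 0;
runGoal(
  "demo",
  async () => "<screen/>",
  async () => ({ action: n++ < 1 ? "tap" : "done", done: n > 1 }),
  async () => {},
).then(ok => console.log(ok)); // logs true
```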

Architecture

Pure agentic brain,
zero device logic

AppClaw is the decision-maker. appium-mcp handles the device. Clean separation via MCP protocol.

Powered by the Model Context Protocol

AppClaw consumes appium-mcp's 32 tools over stdio or SSE. It never touches the device directly — it only decides which tool to call next.

32 MCP Tools

Tap, type, swipe, screenshot, launch, install, and more

Multi-Provider LLMs

Swap between Claude, GPT-4, Gemini, Groq, or local Ollama

Goal Decomposition

Complex multi-app goals are split into sequential sub-goals

Cloud or Local Devices

USB, emulator, simulator, or remote device farms via SSE

AppClaw (TypeScript): Agentic Loop · Perception · Skills · LLM Layer · Recovery · Recorder
    ↓ MCP Protocol
appium-mcp (32 tools): tap · type · swipe · screenshot · launch · ...
    ↓ Appium (UiAutomator2 / XCUITest)
Android / iOS Device
Power Features

Beyond simple automation

Built-in intelligence for the hardest parts of mobile automation.

Smart Actions

16 Agent Actions with Built-in Skills

From simple taps to multi-step compound operations. Smart typing detects non-editable wrappers and finds the real input. submit_message works across WhatsApp, Telegram, Slack, and more.

tap · type · swipe · find_and_tap · read_screen · ask_user
// Smart Type: auto-detects real input
smart_type "search_field"
  Target is non-editable wrapper
  Clicking to navigate...
  Re-reading page source...
  Found EditText (id: input_23)
  ✓ Typed into real input field
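The wrapper-detection idea behind that trace can be sketched like this. The element shape and the `resolveInput` helper are hypothetical, for illustration only: if the matched target is not editable, fall back to the first editable field found after tapping through.

```typescript
// Sketch: resolve a non-editable wrapper to the real input field.
interface El { id: string; className: string; editable: boolean }

function resolveInput(target: El, screenAfterTap: El[]): El {
  if (target.editable) return target; // already a real input
  // Target was a non-editable wrapper: pick the real EditText
  // from the re-read page source.
  const real = screenAfterTap.find(e => e.editable);
  if (!real) throw new Error("no editable field found");
  return real;
}

const wrapper = { id: "search_field", className: "FrameLayout", editable: false };
const after = [
  { id: "toolbar", className: "Toolbar", editable: false },
  { id: "input_23", className: "EditText", editable: true },
];
const input = resolveInput(wrapper, after);
```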
Recording

Record & Adaptive Replay

Record any goal execution and replay it without LLM costs. The replayer doesn't blindly repeat coordinates — it reads the current screen, matches elements, and adapts to layout changes.

--record · --replay · Adaptive
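One way to picture that adaptive matching is a similarity score over recorded attributes rather than a coordinate replay. The weights and element shape here are illustrative, not AppClaw's actual algorithm.

```typescript
// Sketch: match a recorded element against the current screen by
// attribute similarity instead of raw coordinates (weights assumed).
interface Recorded { resourceId: string; text: string }

function bestMatch(rec: Recorded, current: Recorded[]): Recorded | null {
  let best: Recorded | null = null;
  let bestScore = 0;
  for (const el of current) {
    let score = 0;
    if (el.resourceId && el.resourceId === rec.resourceId) score += 2;
    if (el.text && el.text === rec.text) score += 1;
    if (score > bestScore) { bestScore = score; best = el; }
  }
  return best; // null when nothing resembles the recorded element
}

const recorded = { resourceId: "send_btn", text: "Send" };
const screen = [
  { resourceId: "attach_btn", text: "Attach" },
  { resourceId: "send_btn", text: "Send message" }, // label changed since recording
];
const match = bestMatch(recorded, screen);
```

The stable resource id still wins even though the visible label changed between runs, which is the sense in which replay "adapts to layout changes".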
Planning

Goal Decomposition

Complex multi-app tasks are automatically broken into sequential sub-goals. "Copy the weather and send it on Slack" becomes 4 focused steps, each tracked and executed independently.

--plan · Multi-app · Sub-goals
32 MCP tools consumed
16 agent actions
5 LLM providers
2 platforms (iOS + Android)
Get Started

Up and running
in 60 seconds

Install, configure, connect a device, and run. Four steps to your first AI-automated mobile goal.

1. Clone and install

Requires Node.js 18+. Clone the repo, install dependencies, and set up Appium with the driver for your platform.

2. Configure your LLM

Create a .env file and set your provider and API key. Works with Anthropic, OpenAI, Google, Groq, or local Ollama.
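A minimal .env, using the `LLM_PROVIDER` and `LLM_API_KEY` variables shown in the quick-start commands. The full list of provider identifiers in the comment is an assumption based on the supported providers, so check the repo's docs for the exact strings.

```shell
# .env — pick one provider and its key
LLM_PROVIDER=gemini   # assumed options: anthropic | openai | gemini | groq | ollama
LLM_API_KEY=your-key
```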

3. Connect a device

Plug in an Android device (or start an emulator/iOS simulator). Verify with adb devices and start the Appium server.

4. Run a goal

Pass any goal in plain English, run a declarative YAML flow, or use interactive mode. AppClaw connects to your device and executes autonomously.

# Clone, install, and set up Appium
$ git clone https://github.com/AppiumTestDistribution/appclaw
$ cd appclaw && npm install
$ npm i -g appium && appium driver install uiautomator2

# Create .env with your LLM provider
$ echo "LLM_PROVIDER=gemini" > .env
$ echo "LLM_API_KEY=your-key" >> .env

# Verify device is connected
$ adb devices

# Run a goal in plain English
$ npm start "Open Settings and turn on WiFi"

# Or run a declarative YAML flow (no LLM needed)
$ npm start -- --flow examples/flows/google-search.yaml

# Record, replay, or decompose complex goals
$ npm start -- --record "Send hello on WhatsApp"
$ npm start -- --replay recordings/rec-*.json
$ npm start -- --plan "Copy weather and send on Slack"

Ready to automate
your mobile apps?

Open source. MIT licensed. Free forever. Start automating any Android or iOS app with AI today.

Star on GitHub
$ git clone https://github.com/AppiumTestDistribution/appclaw

MIT License · Free forever · Community driven