Quick Start

Install AppClaw globally and start automating in seconds.

Terminal
# Install
npm install -g appclaw

# Run with a natural language goal
appclaw "Open Settings and turn on Wi-Fi"

# Run a YAML flow
appclaw --flow my-test.yaml

# Interactive playground
appclaw --playground

CLI Options

All the flags you can pass to appclaw.

Platform & Device

Flag                    Description
--platform <os>         Target platform: android or ios
--device-type <type>    iOS only: simulator or real
--device <name>         Device name (partial match, e.g. "iPhone 17 Pro")
--udid <udid>           Device UDID (skips the device picker)
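
The device flags can be combined in a single invocation. A minimal sketch, assuming the goal string may follow the flags (the examples in this document only show a goal on its own); `run_on_simulator` is a hypothetical helper name, not part of AppClaw:

```shell
# Hypothetical helper: run a plain-English goal on a named iOS simulator.
# Assumption: appclaw accepts the goal string after the flags.
run_on_simulator() {
  appclaw --platform ios --device-type simulator --device "iPhone 17 Pro" "$1"
}
```

Calling `run_on_simulator "Open Settings and turn on Wi-Fi"` would target the first simulator whose name partially matches. Passing `--udid` instead of `--device` should skip the device picker.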

Execution

Flag               Description
--flow <file>      Run a declarative YAML flow file
--env <name>       Environment for variable/secret resolution
--playground       Launch the interactive REPL for building flows
--record           Record a goal execution for later replay
--replay <file>    Replay a previously recorded session
--plan             Decompose a complex goal into sub-goals
--json             JSON output mode (for IDE extensions)

Explorer (Test Generation)

Flag                  Description
--explore <prd>       Generate test flows from a PRD document
--num-flows <N>       Number of flows to generate (default: 5)
--no-crawl            Skip device crawling, use PRD only
--output-dir <dir>    Output directory (default: generated-flows)
--max-screens <N>     Max screens to crawl (default: 10)
--max-depth <N>       Max navigation depth (default: 3)
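
Two ways of combining the Explorer flags, as a sketch. The helper names are hypothetical and the PRD path is supplied by the caller. Only the flags and defaults come from the table above.

```shell
# Hypothetical helper: generate 3 flows from the PRD alone, without a device.
generate_flows_from_prd() {
  appclaw --explore "$1" --num-flows 3 --no-crawl --output-dir generated-flows
}

# Hypothetical helper: crawl the device too, with a tighter crawl budget
# than the documented defaults (10 screens, depth 3).
generate_flows_with_crawl() {
  appclaw --explore "$1" --max-screens 5 --max-depth 2
}
```

For example, `generate_flows_from_prd docs/prd.md` would write the generated flows to `generated-flows`, assuming a PRD exists at that illustrative path.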

Execution Modes

AppClaw has three distinct ways to automate mobile apps.

Agent Mode

Give AppClaw a goal in plain English. The AI agent takes a screenshot, reasons about what it sees, and decides what to tap, type, or swipe — step by step until the goal is complete.

appclaw "Search for 'Appium 3.0' on YouTube and find the TestMu AI video"

YAML Flows

Define repeatable, version-controlled test flows in YAML. Each step is a natural language instruction — no element selectors, no brittle locators.

YAML Flow
appclaw --flow tests/youtube-search.yaml --env dev

Playground

An interactive REPL where you type one instruction at a time and see it execute immediately. Great for exploring an app and building flows interactively.

appclaw --playground --platform ios --device-type simulator

Designing YAML Flows

YAML flows are the heart of AppClaw's repeatable automation. Write your test steps in plain English — AppClaw figures out how to execute them on the device. No XPath, no accessibility IDs, no brittle selectors.

Key Idea

Each step is a natural language instruction like tap Login or wait for the home screen to be visible. AppClaw uses AI to find the right elements on screen.

Flat Format

The simplest YAML structure — a metadata header separated by --- from a flat list of steps.

settings-wifi.yaml
name: Turn on Wi-Fi
platform: android
---
- open Settings app
- tap Connections
- wait 1s
- tap Wi-Fi
- verify Wi-Fi is visible
- done

Metadata Fields

Field          Description
name           Display name for the flow
description    Optional description of what the flow does
platform       android or ios; used as a fallback when no --platform CLI flag is given
appId          App bundle/package ID for launchApp steps
env            Environment name; resolves variables from .appclaw/env/<name>.yaml

Phased Format

For structured tests, organize your steps into three phases: setup, steps, and assertions. This gives clearer reporting and separates initialization from the actual test logic.

youtube-search.yaml
name: YouTube Search
description: Searches YouTube and verifies video results
platform: android
env: dev
---
setup:
  - open ${variables.app_name} app
  - wait until search icon is visible

steps:
  - click on search icon
  - type '${secrets.search_query}'
  - wait 3s
  - click on the first result from the list
  - wait for the search results to be visible
  - scroll down

assertions:
  - verify ${variables.expected_channel} is visible

Phases Explained

Phase         Purpose
setup         Initialization: launch the app, navigate to the starting screen, dismiss popups. Failures here skip the test.
steps         The main test actions: the interactions you're actually testing.
assertions    Verification checks: confirm the expected outcome. You can also mix in actions here if needed.

Variables & Secrets

Keep your flows flexible and secure with variable interpolation.

Variables: ${variables.X}

Loaded from environment files. Values appear in logs.

Secrets: ${secrets.X}

Resolved from shell environment variables at runtime. Always shown as *** in logs.
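
Because secrets come from the shell environment, they have to be exported before the run. The sketch below assumes `${secrets.search_query}` is read from an environment variable with the same name, `search_query`. This document does not state the naming rule, so verify it before relying on this. `run_search_flow` is a hypothetical helper.

```shell
# Assumption: ${secrets.search_query} maps to a shell variable named search_query.
# In CI, populate this from the pipeline's secret store instead of hard-coding it.
export search_query="Appium 3.0"

# Hypothetical helper: run the flow that consumes the secret.
run_search_flow() {
  appclaw --flow tests/youtube-search.yaml --env dev
}
```

If the mapping holds, the typed value should appear as *** in the logs rather than as the literal query.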

Environment File

Create .appclaw/env/<name>.yaml in your project root:

.appclaw/env/dev.yaml
variables:
  app_name: youtube
  expected_channel: TestMu AI
  timeout: 30
  locale: en-US

Then reference it in your YAML header with env: dev, or pass --env dev on the CLI.
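
A second environment follows the same pattern. The sketch below writes a hypothetical `staging` environment file from the shell. The file name and values are illustrative, and only the `variables:` layout is taken from the `dev.yaml` example.

```shell
# Create a hypothetical "staging" environment next to dev.yaml.
mkdir -p .appclaw/env
cat > .appclaw/env/staging.yaml <<'EOF'
variables:
  app_name: youtube
  expected_channel: TestMu AI
  timeout: 60
  locale: en-GB
EOF
```

A flow could then select it with `env: staging` in its header or `--env staging` on the command line.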

Inline Variables

For self-contained flows, embed variables directly in the YAML header:

Inline env block
name: Self-contained flow
env:
  variables:
    app_name: youtube
    search_term: appium 3.0
---
- open ${variables.app_name} app
- type ${variables.search_term}

Resolution Order

--env CLI flag wins over the YAML env: field, which wins over inline env: blocks. Secrets always come from shell environment variables.
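
To make the precedence concrete, here is a sketch that runs a flow whose header declares `env: dev` while passing a different environment on the command line. It assumes a `.appclaw/env/staging.yaml` file exists, and `run_with_staging` is a hypothetical helper name.

```shell
# tests/youtube-search.yaml declares `env: dev` in its header.
# Per the documented precedence, --env staging should win for this run,
# so variables would resolve from .appclaw/env/staging.yaml (assumed to exist).
run_with_staging() {
  appclaw --flow tests/youtube-search.yaml --env staging
}
```

The flow file itself stays unchanged, which keeps one flow reusable across environments.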

Tap / Click

Tap on an element by describing its label. AppClaw matches it against visible text and elements on screen.

Natural language
- tap Login
- click on the search icon
- press Submit
- select the first item
- choose English
- pick the blue option
- navigate to Settings
- toggle Dark Mode
- enable Notifications
- close the popup
- dismiss the dialog

All of these are equivalent — they find the element and tap it. Use whichever reads most naturally.

Structured YAML
- tap: "Login Button"

Type Text

Type text into the currently focused field, or specify a target field.

Natural language
# Type into focused field
- type "hello world"
- enter text "user@example.com"

# Type into a specific field
- type "john@example.com" in email field
- enter "password123" into password field

# Search (types the text)
- search for "Appium 3.0"
- look for "restaurants nearby"

Structured YAML
- type: "hello world"

Wait / Pause

Pause execution for a fixed duration.

Natural language
- wait 3s
- wait 1.5 seconds
- sleep 500ms
- pause 2 sec
- wait               # defaults to 2 seconds
- wait a moment      # defaults to 2 seconds

Structured YAML
- wait: 3          # seconds

Wait Until

Wait dynamically until a condition is met. Polls the screen every 500ms up to a timeout (default 10s). Uses AI vision to understand the screen — you can describe what you expect to see in plain English.

Wait for something to appear
- wait until search icon is visible
- wait for the search results to be visible
- wait for the home screen to be loaded
- wait until "Welcome back" appears
- wait 15s until login button is visible  # custom timeout

Wait for something to disappear
- wait until loading spinner is gone
- wait for the popup to be hidden
- wait until progress bar disappeared

Wait for screen to stabilize
- wait until screen is loaded
- wait until screen is stable
- wait 5s until screen is ready

Structured YAML
# With custom timeout
- waitUntil: "Login button"
  timeout: 15

# Wait for element to disappear
- waitUntilGone: "Loading spinner"
  timeout: 20

# Screen loaded (DOM stability check)
- waitUntil: "screen loaded"

Smart Vision

When you write something descriptive like wait for the search results to be visible, AppClaw uses AI vision to understand the screen holistically — it checks whether results are actually shown, not just whether the literal words "search results" appear. You can describe what you expect to see naturally.
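
The polling contract described for Wait Until (a check every 500ms, bounded by a timeout) can be sketched as a small shell loop. This is an illustration of the timing only, not AppClaw's implementation: `poll_until` is a hypothetical function, and its condition is any shell command, whereas AppClaw evaluates the screen.

```shell
# Sketch only: retry a command every 500 ms until it succeeds or the
# timeout (in whole seconds) runs out.
# Note: fractional `sleep` works with GNU and BSD sleep but is not strictly POSIX.
poll_until() {
  timeout_s=$1
  shift
  attempts=$(( timeout_s * 1000 / 500 ))   # a 10 s timeout allows 20 checks
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0                             # condition met
    fi
    sleep 0.5
    i=$(( i + 1 ))
  done
  return 1                                 # timed out
}
```

For instance, `poll_until 10 test -f ready.flag` returns as soon as the file exists and gives up with a non-zero status after roughly ten seconds otherwise.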

Scroll / Swipe

Scroll or swipe in any direction, optionally repeating multiple times or scrolling until an element is found.

Basic scroll / swipe
- scroll down
- scroll up 3 times
- swipe left
- swipe right 2 times

Scroll until found
# Scroll until an element appears
- scroll down until "Terms & Conditions" is visible
- scroll down 5 times to find "Accept"
- scroll down to see "Load More"

Structured YAML
- scrollAssert: "Terms & Conditions"
  direction: down
  maxScrolls: 5

Assert / Verify

Verify that something is visible on screen. Works with both literal text and visual/semantic descriptions via AI vision.

Natural language
- verify "Welcome back" is visible
- assert Dashboard is visible
- check that the login button is on the screen
- verify TestMu AI is visible

Structured YAML
- assert: "Welcome back"
- verify: "Dashboard"    # alias for assert
- check:  "Login button"  # alias for assert

Full Command Reference

Every supported step kind at a glance.

Kind            Parameters                     Description
openApp         query                          Open an app by name
launchApp       -                              Launch app defined in appId metadata
tap             label                          Tap element by visible text/label
type            text, target?                  Type text, optionally into a named field
enter           -                              Press Enter / Return key
back            -                              Press the Back button
home            -                              Press the Home button
wait            seconds                        Pause for a fixed duration
waitUntil       condition, text?, timeout      Poll until visible/gone/screenLoaded
swipe           direction, repeat?             Swipe up/down/left/right
assert          text                           Verify text or description is visible
scrollAssert    text, direction, maxScrolls    Scroll until text found
getInfo         query                          Ask the AI a question about the screen
done            message?                       Signal flow completion

Vision Modes

Control how AppClaw locates elements on screen.

Agent Mode (AGENT_MODE)

Value     Behavior
dom       Default. Uses the app's DOM/accessibility tree to find elements.
vision    Uses AI vision (screenshots + LLM) as the primary strategy for all interactions.

Vision Mode (VISION_MODE)

Value       Behavior
fallback    Default. Try DOM first, fall back to vision if no match found.
always      Skip DOM entirely, use vision for every interaction.
never       DOM only. No vision fallback.
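
Both settings are environment variables, so they can be set for a single run by prefixing the command. A sketch with hypothetical helper names. Each helper sets only one variable, because this document does not describe how AGENT_MODE and VISION_MODE interact.

```shell
# Hypothetical helper: run one goal with the vision fallback disabled.
run_dom_only() {
  VISION_MODE=never appclaw "$1"
}

# Hypothetical helper: run one goal with vision as the primary strategy.
run_vision_agent() {
  AGENT_MODE=vision appclaw "$1"
}
```

Prefixing the variable scopes it to that one command, which is convenient for comparing strategies on the same goal without changing the shell session.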

Environment Variables

All environment variables recognized by AppClaw. These are especially useful for CI/CD pipelines.

LLM Configuration

Variable                      Description
LLM_PROVIDER                  LLM provider: anthropic, openai, gemini, groq, ollama
LLM_API_KEY                   API key for the chosen provider
LLM_MODEL                     Specific model name to use
LLM_THINKING                  Extended thinking: on or off (default: on)
LLM_THINKING_BUDGET           Max thinking tokens: 1–10000 (default: 128)
LLM_SCREENSHOT_MAX_EDGE_PX    Downscale screenshots to this max edge (0 = disabled)
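
In a CI job these would typically be exported before any flow runs. A sketch with illustrative values: `CI_LLM_API_KEY` is a hypothetical name for a secret provided by the pipeline, and the budget and edge size are examples within the documented ranges, not recommendations.

```shell
# Provider and thinking settings use values listed in the table above.
export LLM_PROVIDER=anthropic
# Hypothetical CI secret; falls back to empty so the script does not
# abort under `set -u` when the secret is missing.
export LLM_API_KEY="${CI_LLM_API_KEY:-}"
export LLM_THINKING=on
export LLM_THINKING_BUDGET=1024          # must stay within 1-10000
export LLM_SCREENSHOT_MAX_EDGE_PX=1024   # illustrative; 0 disables downscaling
```

LLM_MODEL is left unset here, since valid model names depend on the chosen provider.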

Device & Platform

Variable       Description
PLATFORM       Same as --platform flag
DEVICE_TYPE    Same as --device-type flag
DEVICE_UDID    Same as --udid flag
DEVICE_NAME    Same as --device flag

Vision

Variable          Description
VISION_MODE       always, fallback, or never
AGENT_MODE        dom or vision
GEMINI_API_KEY    Gemini API key for Stark vision. Only needed when LLM_PROVIDER is not gemini and AGENT_MODE=vision. If the provider is already Gemini, LLM_API_KEY is reused automatically.

Execution Tuning

Variable             Description
MAX_STEPS            Max steps per goal (default: 30)
STEP_DELAY           Delay between steps in ms (default: 500)
MAX_ELEMENTS         Max DOM elements to parse (default: 40)
MAX_HISTORY_STEPS    Max action history retained (default: 10)

MCP Connection

Variable         Description
MCP_TRANSPORT    stdio or sse (default: stdio)
MCP_HOST         MCP server host (default: localhost)
MCP_PORT         MCP server port (default: 8080)
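
A sketch of switching the MCP connection from the default stdio transport to sse. The host and port are written out at their documented defaults so they are easy to change. Whether they are consulted only for sse is not stated here, so treat that as an open question.

```shell
# Switch from the default stdio transport to sse.
export MCP_TRANSPORT=sse
# Documented defaults, made explicit so they can be overridden per environment.
export MCP_HOST=localhost
export MCP_PORT=8080
```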