Quick Start
Install AppClaw globally and start automating in seconds.
    # Install
    npm install -g appclaw

    # Run with a natural language goal
    appclaw "Open Settings and turn on Wi-Fi"

    # Run a YAML flow
    appclaw --flow my-test.yaml

    # Interactive playground
    appclaw --playground
CLI Options
All the flags you can pass to appclaw.
Platform & Device
| Flag | Description |
|---|---|
| --platform <os> | Target platform: android or ios |
| --device-type <type> | iOS only: simulator or real |
| --device <name> | Device name (partial match, e.g. "iPhone 17 Pro") |
| --udid <udid> | Device UDID (skips the device picker) |
Execution
| Flag | Description |
|---|---|
| --flow <file> | Run a declarative YAML flow file |
| --env <name> | Environment for variable/secret resolution |
| --playground | Launch the interactive REPL for building flows |
| --record | Record a goal execution for later replay |
| --replay <file> | Replay a previously recorded session |
| --plan | Decompose a complex goal into sub-goals |
| --json | JSON output mode (for IDE extensions) |
Explorer (Test Generation)
| Flag | Description |
|---|---|
| --explore <prd> | Generate test flows from a PRD document |
| --num-flows <N> | Number of flows to generate (default: 5) |
| --no-crawl | Skip device crawling, use PRD only |
| --output-dir <dir> | Output directory (default: generated-flows) |
| --max-screens <N> | Max screens to crawl (default: 10) |
| --max-depth <N> | Max navigation depth (default: 3) |
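As a rough sketch of how the Explorer flags might be combined, the snippet below generates three flows from a PRD without crawling a device. The PRD path `docs/prd.md` is hypothetical, and combining these flags in a single invocation is an assumption based on the table above, not confirmed behavior. The guard skips the run when the CLI or the PRD file is missing.

```shell
PRD_FILE="docs/prd.md"       # hypothetical path to your PRD document
OUT_DIR="generated-flows"    # same as the documented default
NUM_FLOWS=3                  # fewer than the documented default of 5

# Run the Explorer only when both the CLI and the PRD file are available.
if command -v appclaw >/dev/null 2>&1 && [ -f "$PRD_FILE" ]; then
  appclaw --explore "$PRD_FILE" --num-flows "$NUM_FLOWS" --output-dir "$OUT_DIR" --no-crawl
else
  echo "Skipping: appclaw is not installed or $PRD_FILE does not exist"
fi
```

Dropping `--no-crawl` and adding `--max-screens` or `--max-depth` would presumably bound a device crawl instead of relying on the PRD alone.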
Execution Modes
AppClaw has three distinct ways to automate mobile apps.
Agent Mode
Give AppClaw a goal in plain English. The AI agent takes a screenshot, reasons about what it sees, and decides what to tap, type, or swipe — step by step until the goal is complete.
appclaw "Search for 'Appium 3.0' on YouTube and find the TestMu AI video"
YAML Flows
Define repeatable, version-controlled test flows in YAML. Each step is a natural language instruction — no element selectors, no brittle locators.
appclaw --flow tests/youtube-search.yaml --env dev
Playground
An interactive REPL where you type one instruction at a time and see it execute immediately. Useful for exploring an app and building flows step by step.
appclaw --playground --platform ios --device-type simulator
Designing YAML Flows
YAML flows are the heart of AppClaw's repeatable automation. Write your test steps in plain English — AppClaw figures out how to execute them on the device. No XPath, no accessibility IDs, no brittle selectors.
Each step is a natural language instruction like tap Login or wait for the home screen to be visible. AppClaw uses AI to find the right elements on screen.
Flat Format
The simplest YAML structure — a metadata header separated by --- from a flat list of steps.
    name: Turn on Wi-Fi
    platform: android
    ---
    - open Settings app
    - tap Connections
    - wait 1s
    - tap Wi-Fi
    - verify Wi-Fi is visible
    - done
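To try the flat-format flow above, one option is to save it to a file and pass that file to `--flow`. This is a minimal sketch: the file name `wifi-on.yaml` is an arbitrary choice, and the guard skips the run when `appclaw` is not on the PATH.

```shell
# Write the flat-format flow to a file (the name wifi-on.yaml is arbitrary).
# The quoted 'EOF' stops the shell from expanding anything inside the heredoc.
cat > wifi-on.yaml <<'EOF'
name: Turn on Wi-Fi
platform: android
---
- open Settings app
- tap Connections
- wait 1s
- tap Wi-Fi
- verify Wi-Fi is visible
- done
EOF

# Run the flow only if the CLI is available.
if command -v appclaw >/dev/null 2>&1; then
  appclaw --flow wifi-on.yaml
else
  echo "appclaw not found; install it with: npm install -g appclaw"
fi
```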
Metadata Fields
| Field | Description |
|---|---|
| name | Display name for the flow |
| description | Optional description of what the flow does |
| platform | android or ios — fallback if no --platform CLI flag |
| appId | App bundle/package ID for launchApp steps |
| env | Environment name — resolves variables from .appclaw/env/<name>.yaml |
Phased Format
For structured tests, organize your steps into three phases: setup, steps, and assertions. This gives clearer reporting and separates initialization from the actual test logic.
    name: YouTube Search
    description: Searches YouTube and verifies video results
    platform: android
    env: dev
    ---
    setup:
      - open ${variables.app_name} app
      - wait until search icon is visible
    steps:
      - click on search icon
      - type '${secrets.search_query}'
      - wait 3s
      - click on the first result from the list
      - wait for the search results to be visible
      - scroll down
    assertions:
      - verify ${variables.expected_channel} is visible
Phases Explained
| Phase | Purpose |
|---|---|
| setup | Initialization — launch the app, navigate to starting screen, dismiss popups. Failures here skip the test. |
| steps | The main test actions — the interactions you're actually testing. |
| assertions | Verification checks — confirm the expected outcome. You can also mix in actions here if needed. |
Variables & Secrets
Keep your flows flexible and secure with variable interpolation.
Variables ${variables.X}
Loaded from environment files. Values appear in logs.
Secrets ${secrets.X}
Resolved from shell environment variables at runtime. Always shown as *** in logs.
Environment File
Create .appclaw/env/<name>.yaml in your project root:
    variables:
      app_name: youtube
      expected_channel: TestMu AI
      timeout: 30
      locale: en-US
Then reference it in your YAML header with env: dev, or pass --env dev on the CLI.
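The setup described above can be scripted. The sketch below creates a `dev` environment file in the current directory (assumed to be the project root) and then selects it with `--env dev`. The flow path is borrowed from the YAML Flows example and is assumed to exist in your project.

```shell
# Create the environment directory in the project root (the current directory here).
mkdir -p .appclaw/env

# Write the "dev" environment file; the values mirror the example above.
# The quoted 'EOF' keeps the content literal.
cat > .appclaw/env/dev.yaml <<'EOF'
variables:
  app_name: youtube
  expected_channel: TestMu AI
  timeout: 30
  locale: en-US
EOF

# Select the environment on the command line.
if command -v appclaw >/dev/null 2>&1; then
  appclaw --flow tests/youtube-search.yaml --env dev
else
  echo "appclaw not found; install it with: npm install -g appclaw"
fi
```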
Inline Variables
For self-contained flows, embed variables directly in the YAML header:
    name: Self-contained flow
    env:
      variables:
        app_name: youtube
        search_term: appium 3.0
    ---
    - open ${variables.app_name} app
    - type ${variables.search_term}
Precedence: the --env CLI flag overrides the YAML env: field, which in turn overrides an inline env: block. Secrets always come from shell environment variables.
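A short sketch of how secrets and the precedence rule might look in practice. It assumes that `${secrets.search_query}` maps to a shell variable with the same name, `search_query`; the exact mapping is not specified here. The `staging` environment file is hypothetical.

```shell
# ASSUMPTION: ${secrets.search_query} is read from an environment variable
# named search_query. Its value should appear as *** in logs.
export search_query="Appium 3.0"

# Even if the flow header says "env: dev", the --env flag takes precedence,
# so variables would be resolved from .appclaw/env/staging.yaml (hypothetical file).
ENV_NAME="staging"

if command -v appclaw >/dev/null 2>&1; then
  appclaw --flow tests/youtube-search.yaml --env "$ENV_NAME"
else
  echo "appclaw not found; install it with: npm install -g appclaw"
fi
```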
Tap / Click
Tap on an element by describing its label. AppClaw matches it against visible text and elements on screen.
    - tap Login
    - click on the search icon
    - press Submit
    - select the first item
    - choose English
    - pick the blue option
    - navigate to Settings
    - toggle Dark Mode
    - enable Notifications
    - close the popup
    - dismiss the dialog
All of these are equivalent — they find the element and tap it. Use whichever reads most naturally.
- tap: "Login Button"
Type Text
Type text into the currently focused field, or specify a target field.
    # Type into focused field
    - type "hello world"
    - enter text "user@example.com"

    # Type into a specific field
    - type "john@example.com" in email field
    - enter "password123" into password field

    # Search (types the text)
    - search for "Appium 3.0"
    - look for "restaurants nearby"
- type: "hello world"
Wait / Pause
Pause execution for a fixed duration.
    - wait 3s
    - wait 1.5 seconds
    - sleep 500ms
    - pause 2 sec
    - wait            # defaults to 2 seconds
    - wait a moment   # defaults to 2 seconds
- wait: 3 # seconds
Wait Until
Wait dynamically until a condition is met. Polls the screen every 500ms up to a timeout (default 10s). Uses AI vision to understand the screen — you can describe what you expect to see in plain English.
    - wait until search icon is visible
    - wait for the search results to be visible
    - wait for the home screen to be loaded
    - wait until "Welcome back" appears
    - wait 15s until login button is visible   # custom timeout

    - wait until loading spinner is gone
    - wait for the popup to be hidden
    - wait until progress bar disappeared

    - wait until screen is loaded
    - wait until screen is stable
    - wait 5s until screen is ready

    # With custom timeout
    - waitUntil: "Login button"
      timeout: 15

    # Wait for element to disappear
    - waitUntilGone: "Loading spinner"
      timeout: 20

    # Screen loaded (DOM stability check)
    - waitUntil: "screen loaded"
When you write a descriptive condition such as wait for the search results to be visible, AppClaw uses AI vision to interpret the screen as a whole: it checks whether results are actually shown, not just whether the literal words "search results" appear.
Scroll / Swipe
Scroll or swipe in any direction, optionally repeating multiple times or scrolling until an element is found.
    - scroll down
    - scroll up 3 times
    - swipe left
    - swipe right 2 times

    # Scroll until an element appears
    - scroll down until "Terms & Conditions" is visible
    - scroll down 5 times to find "Accept"
    - scroll down to see "Load More"

    - scrollAssert: "Terms & Conditions"
      direction: down
      maxScrolls: 5
Assert / Verify
Verify that something is visible on screen. Works with both literal text and visual/semantic descriptions via AI vision.
    - verify "Welcome back" is visible
    - assert Dashboard is visible
    - check that the login button is on the screen
    - verify TestMu AI is visible

    - assert: "Welcome back"
    - verify: "Dashboard"      # alias for assert
    - check: "Login button"    # alias for assert
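The step types covered so far can be combined into one flow. The sketch below writes a hypothetical login smoke test using only the natural-language phrasings shown in the sections above. The app name `MyApp`, the file name `login-smoke.yaml`, and the screen contents are all assumptions for illustration.

```shell
# Write a flow that mixes tap, type, wait-until, scroll, and verify steps.
# MyApp and the on-screen labels are hypothetical; adjust them to your app.
cat > login-smoke.yaml <<'EOF'
name: Login smoke test
platform: android
---
- open MyApp app
- wait until login button is visible
- tap Login
- type "john@example.com" in email field
- enter "password123" into password field
- press Submit
- wait until loading spinner is gone
- verify "Welcome back" is visible
- scroll down until "Terms & Conditions" is visible
- done
EOF

# Run the flow only if the CLI is available.
if command -v appclaw >/dev/null 2>&1; then
  appclaw --flow login-smoke.yaml
else
  echo "appclaw not found; install it with: npm install -g appclaw"
fi
```

In a real test, the password would more likely be passed as a `${secrets.X}` reference so that it is masked in logs.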
Full Command Reference
Every supported step kind at a glance.
| Kind | Parameters | Description |
|---|---|---|
| openApp | query | Open an app by name |
| launchApp | — | Launch app defined in appId metadata |
| tap | label | Tap element by visible text/label |
| type | text, target? | Type text, optionally into a named field |
| enter | — | Press Enter / Return key |
| back | — | Press the Back button |
| home | — | Press the Home button |
| wait | seconds | Pause for a fixed duration |
| waitUntil | condition, text?, timeout | Poll until visible/gone/screenLoaded |
| swipe | direction, repeat? | Swipe up/down/left/right |
| assert | text | Verify text or description is visible |
| scrollAssert | text, direction, maxScrolls | Scroll until text found |
| getInfo | query | Ask the AI a question about the screen |
| done | message? | Signal flow completion |
Vision Modes
Control how AppClaw locates elements on screen.
Agent Mode AGENT_MODE
| Value | Behavior |
|---|---|
| dom | Default. Uses the app's DOM/accessibility tree to find elements. |
| vision | Uses AI vision (screenshots + LLM) as the primary strategy for all interactions. |
Vision Mode VISION_MODE
| Value | Behavior |
|---|---|
| fallback | Default. Try DOM first, fall back to vision if no match found. |
| always | Skip DOM entirely, use vision for every interaction. |
| never | DOM only. No vision fallback. |
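Both settings are environment variables, so they can be set in the shell before a run. The sketch below configures a DOM-only run using values from the two tables above. How the two settings interact when they disagree is not spelled out here, so treat the combination as an assumption.

```shell
# DOM-only run: use the accessibility tree and never fall back to vision.
export AGENT_MODE=dom
export VISION_MODE=never

if command -v appclaw >/dev/null 2>&1; then
  appclaw "Open Settings and turn on Wi-Fi"
else
  echo "appclaw not found; install it with: npm install -g appclaw"
fi
```

Setting `VISION_MODE=always` instead would skip the DOM for every interaction, per the table above.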
Environment Variables
All environment variables recognized by AppClaw. These are especially useful for CI/CD pipelines.
LLM Configuration
| Variable | Description |
|---|---|
| LLM_PROVIDER | LLM provider: anthropic, openai, gemini, groq, ollama |
| LLM_API_KEY | API key for the chosen provider |
| LLM_MODEL | Specific model name to use |
| LLM_THINKING | Extended thinking: on or off (default: on) |
| LLM_THINKING_BUDGET | Max thinking tokens: 1–10000 (default: 128) |
| LLM_SCREENSHOT_MAX_EDGE_PX | Downscale screenshots to this max edge (0 = disabled) |
Device & Platform
| Variable | Description |
|---|---|
| PLATFORM | Same as --platform flag |
| DEVICE_TYPE | Same as --device-type flag |
| DEVICE_UDID | Same as --udid flag |
| DEVICE_NAME | Same as --device flag |
Vision
| Variable | Description |
|---|---|
| VISION_MODE | always, fallback, or never |
| AGENT_MODE | dom or vision |
| GEMINI_API_KEY | Gemini API key for Stark vision — only needed when LLM_PROVIDER is not gemini and AGENT_MODE=vision. If provider is already Gemini, LLM_API_KEY is reused automatically. |
Execution Tuning
| Variable | Description |
|---|---|
| MAX_STEPS | Max steps per goal (default: 30) |
| STEP_DELAY | Delay between steps in ms (default: 500) |
| MAX_ELEMENTS | Max DOM elements to parse (default: 40) |
| MAX_HISTORY_STEPS | Max action history retained (default: 10) |
MCP Connection
| Variable | Description |
|---|---|
| MCP_TRANSPORT | stdio or sse (default: stdio) |
| MCP_HOST | MCP server host (default: localhost) |
| MCP_PORT | MCP server port (default: 8080) |
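For a CI job, the variables above could be set in a small wrapper script. This is a sketch under stated assumptions: `LLM_API_KEY` is expected to be injected by the CI secret store, the tuning values are arbitrary, and the flow path and environment name are borrowed from the earlier examples.

```shell
# LLM settings. LLM_API_KEY is expected from the CI secret store, so it is not set here.
export LLM_PROVIDER=anthropic
export LLM_THINKING=off

# Device selection through the environment instead of CLI flags.
export PLATFORM=android

# Execution tuning (arbitrary example values).
export MAX_STEPS=20
export STEP_DELAY=1000

# Skip the run cleanly when the key or the CLI is missing.
if [ -z "${LLM_API_KEY:-}" ]; then
  echo "LLM_API_KEY is not set; skipping run"
elif command -v appclaw >/dev/null 2>&1; then
  appclaw --flow tests/youtube-search.yaml --env dev
else
  echo "appclaw not found; install it with: npm install -g appclaw"
fi
```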