Usage Guide

Quick Start

Install AppClaw globally and start automating in seconds.

Terminal

# Install
npm install -g appclaw

# Run with a natural language goal
appclaw "Open Settings and turn on Wi-Fi"

# Run a YAML flow
appclaw --flow my-test.yaml

# Interactive playground
appclaw --playground

CLI Options

All the flags you can pass to appclaw.

Platform & Device

Flag	Description
--platform <os>	Target platform: `android` or `ios`
--device-type <type>	iOS only: `simulator` or `real`
--device <name>	Device name (partial match, e.g. "iPhone 17 Pro")
--udid <udid>	Device UDID (skips the device picker)

Execution

Flag	Description
--flow <file>	Run a declarative YAML flow file
--env <name>	Environment for variable/secret resolution
--playground	Launch the interactive REPL for building flows
--record	Record a goal execution for later replay
--replay <file>	Replay a previously recorded session
--plan	Decompose a complex goal into sub-goals
--json	JSON output mode (for IDE extensions)

Explorer (Test Generation)

Flag	Description
--explore <prd>	Generate test flows from a PRD document
--num-flows <N>	Number of flows to generate (default: 5)
--no-crawl	Skip device crawling, use PRD only
--output-dir <dir>	Output directory (default: generated-flows)
--max-screens <N>	Max screens to crawl (default: 10)
--max-depth <N>	Max navigation depth (default: 3)

Execution Modes

AppClaw has three distinct ways to automate mobile apps.

Agent Mode

Give AppClaw a goal in plain English. The AI agent takes a screenshot, reasons about what it sees, and decides what to tap, type, or swipe — step by step until the goal is complete.

Agent Mode

appclaw "Search for 'Appium 3.0' on YouTube and find the TestMu AI video"

YAML Flows

Define repeatable, version-controlled test flows in YAML. Each step is a natural language instruction — no element selectors, no brittle locators.

YAML Flow

appclaw --flow tests/youtube-search.yaml --env dev

Playground

An interactive REPL where you type one instruction at a time and see it execute immediately. Great for exploring an app and building flows interactively.

Playground

appclaw --playground --platform ios --device-type simulator

Designing YAML Flows

YAML flows are the heart of AppClaw's repeatable automation. Write your test steps in plain English — AppClaw figures out how to execute them on the device. No XPath, no accessibility IDs, no brittle selectors.

Key Idea

Each step is a natural language instruction like tap Login or wait for the home screen to be visible. AppClaw uses AI to find the right elements on screen.

Flat Format

The simplest YAML structure — a metadata header separated by --- from a flat list of steps.

settings-wifi.yaml

name: Turn on Wi-Fi
platform: android
---
- open Settings app
- tap Connections
- wait 1s
- tap Wi-Fi
- verify Wi-Fi is visible
- done

Metadata Fields

Field	Description
name	Display name for the flow
description	Optional description of what the flow does
platform	`android` or `ios` — fallback if no `--platform` CLI flag
appId	App bundle/package ID for `launchApp` steps
env	Environment name — resolves variables from `.appclaw/env/<name>.yaml`

Phased Format

For structured tests, organize your steps into three phases: setup, steps, and assertions. This gives clearer reporting and separates initialization from the actual test logic.

youtube-search.yaml

name: YouTube Search
description: Searches YouTube and verifies video results
platform: android
env: dev
---
setup:
  - open ${variables.app_name} app
  - wait until search icon is visible

steps:
  - click on search icon
  - type '${secrets.search_query}'
  - wait 3s
  - click on the first result from the list
  - wait for the search results to be visible
  - scroll down

assertions:
  - verify ${variables.expected_channel} is visible

Phases Explained

Phase	Purpose
setup	Initialization — launch the app, navigate to starting screen, dismiss popups. Failures here skip the test.
steps	The main test actions — the interactions you're actually testing.
assertions	Verification checks — confirm the expected outcome. You can also mix in actions here if needed.

Variables & Secrets

Keep your flows flexible and secure with variable interpolation.

Variables `${variables.X}`

Loaded from environment files. Values appear in logs.

Secrets `${secrets.X}`

Resolved from shell environment variables at runtime. Always shown as *** in logs.
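The two namespaces differ mainly in how they are logged. The following is a hypothetical sketch, not AppClaw's code. It assumes a secret `X` is read from the shell environment variable named `X`, and that each step is resolved twice: once with real values for the device, and once with secrets replaced by `***` for the log.

```typescript
// Hypothetical sketch: resolve ${variables.X} and ${secrets.X} in one step.
// Assumption: secret X is read from the shell environment variable named X.
type Env = Record<string, string | undefined>;

interface ResolvedStep {
  text: string;    // value sent to the device
  logText: string; // value safe to print: secrets replaced with ***
}

function resolveStep(
  step: string,
  variables: Record<string, string>,
  shellEnv: Env,
): ResolvedStep {
  const pattern = /\$\{(variables|secrets)\.([A-Za-z0-9_]+)\}/g;
  const lookup = (ns: string, key: string): string => {
    const value = ns === 'variables' ? variables[key] : shellEnv[key];
    if (value === undefined) {
      throw new Error(`Unresolved \${${ns}.${key}}`);
    }
    return value;
  };
  const text = step.replace(pattern, (_m: string, ns: string, key: string) => lookup(ns, key));
  // Variables still appear in logs; secrets never do.
  const logText = step.replace(pattern, (_m: string, ns: string, key: string) =>
    ns === 'secrets' ? '***' : lookup(ns, key),
  );
  return { text, logText };
}
```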

Environment File

Create .appclaw/env/<name>.yaml in your project root:

.appclaw/env/dev.yaml

variables:
  app_name: youtube
  expected_channel: TestMu AI
  timeout: 30
  locale: en-US

Then reference it in your YAML header with env: dev, or pass --env dev on the CLI.

Inline Variables

For self-contained flows, embed variables directly in the YAML header:

Inline env block

name: Self-contained flow
env:
  variables:
    app_name: youtube
    search_term: appium 3.0
---
- open ${variables.app_name} app
- type ${variables.search_term}

Resolution Order

--env CLI flag wins over the YAML env: field, which wins over inline env: blocks. Secrets always come from shell environment variables.
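One way to model that precedence is the hypothetical sketch below. It assumes higher-priority sources override lower ones key by key (the page does not say whether layers merge or replace each other), and `loadEnvFile` stands in for reading `.appclaw/env/<name>.yaml`.

```typescript
// Hypothetical sketch of variable resolution order (not AppClaw's source).
// Assumption: higher-priority sources override lower ones key by key.
type Vars = Record<string, string>;

// The header's `env:` is either an environment name or an inline block.
type HeaderEnv = string | { variables?: Vars } | undefined;

function resolveVariables(
  cliEnv: string | undefined,          // value of --env, if passed
  headerEnv: HeaderEnv,                // value of the YAML `env:` field
  loadEnvFile: (name: string) => Vars, // reads .appclaw/env/<name>.yaml
): Vars {
  // 1. --env on the CLI beats a named `env:` in the YAML header.
  const envName = cliEnv ?? (typeof headerEnv === 'string' ? headerEnv : undefined);
  // 2. Inline variables form the lowest-priority layer.
  const inline: Vars =
    typeof headerEnv === 'object' && headerEnv.variables ? headerEnv.variables : {};
  const fromFile: Vars = envName ? loadEnvFile(envName) : {};
  return { ...inline, ...fromFile };
}
```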

Tap / Click

Tap on an element by describing its label. AppClaw matches it against visible text and elements on screen.

Natural language

- tap Login
- click on the search icon
- press Submit
- select the first item
- choose English
- pick the blue option
- navigate to Settings
- toggle Dark Mode
- enable Notifications
- close the popup
- dismiss the dialog

All of these are equivalent — they find the element and tap it. Use whichever reads most naturally.

Structured YAML

- tap: "Login Button"

Type Text

Type text into the currently focused field, or specify a target field.

Natural language

# Type into focused field
- type "hello world"
- enter text "user@example.com"

# Type into a specific field
- type "john@example.com" in email field
- enter "password123" into password field

# Search (types the text)
- search for "Appium 3.0"
- look for "restaurants nearby"

Structured YAML

- type: "hello world"

Wait / Pause

Pause execution for a fixed duration.

Natural language

- wait 3s
- wait 1.5 seconds
- sleep 500ms
- pause 2 sec
- wait               # defaults to 2 seconds
- wait a moment      # defaults to 2 seconds

Structured YAML

- wait: 3          # seconds
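Every natural-language form above reduces to a number of seconds. A hypothetical parser is sketched below. The regular expression and the accepted unit spellings are assumptions inferred only from the examples on this page, and the 2-second default comes from the comments above.

```typescript
// Hypothetical sketch: turn "wait" steps into seconds.
// Covers the forms listed above; any other phrasing returns null.
const DEFAULT_WAIT_SECONDS = 2;

function parseWaitSeconds(step: string): number | null {
  const s = step.trim().toLowerCase();
  // Bare "wait" and "wait a moment" use the documented 2-second default.
  if (s === 'wait' || s === 'wait a moment') return DEFAULT_WAIT_SECONDS;
  const m = /^(?:wait|sleep|pause)\s+(\d+(?:\.\d+)?)\s*(ms|s|sec|secs|second|seconds)$/.exec(s);
  if (!m) return null;
  const value = Number(m[1]);
  return m[2] === 'ms' ? value / 1000 : value;
}
```

Returning `null` for anything else (for example `wait until screen is loaded`) leaves those steps to the Wait Until handling described next.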

Wait Until

Wait dynamically until a condition is met. Polls the screen every 500ms up to a timeout (default 10s). Uses AI vision to understand the screen — you can describe what you expect to see in plain English.

Wait for something to appear

- wait until search icon is visible
- wait for the search results to be visible
- wait for the home screen to be loaded
- wait until "Welcome back" appears
- wait 15s until login button is visible  # custom timeout

Wait for something to disappear

- wait until loading spinner is gone
- wait for the popup to be hidden
- wait until progress bar disappeared

Wait for screen to stabilize

- wait until screen is loaded
- wait until screen is stable
- wait 5s until screen is ready

Structured YAML

# With custom timeout
- waitUntil: "Login button"
  timeout: 15

# Wait for element to disappear
- waitUntilGone: "Loading spinner"
  timeout: 20

# Screen loaded (DOM stability check)
- waitUntil: "screen loaded"
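The polling behaviour described above (check every 500 ms, give up after a timeout that defaults to 10 s) can be sketched as a generic helper. The `check` callback is a placeholder for whatever DOM or vision check AppClaw actually runs, and the name and signature of `pollUntil` are illustrative only.

```typescript
// Hypothetical sketch of "wait until" polling: run `check` every intervalMs
// until it returns true or timeoutMs elapses. Resolves to whether it succeeded.
async function pollUntil(
  check: () => Promise<boolean> | boolean,
  timeoutMs = 10_000, // documented default timeout: 10s
  intervalMs = 500,   // documented poll interval: 500ms
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return true;
    // Stop if the next poll would start after the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The "disappear" variant is the same loop with the check negated, e.g. `pollUntil(async () => !(await isVisible('Loading spinner')), 20_000)`, where `isVisible` is again a hypothetical placeholder.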

Smart Vision

When you write something descriptive like wait for the search results to be visible, AppClaw uses AI vision to understand the screen holistically — it checks whether results are actually shown, not just whether the literal words "search results" appear. You can describe what you expect to see naturally.

Scroll / Swipe

Scroll or swipe in any direction, optionally repeating multiple times or scrolling until an element is found.

Basic scroll / swipe

- scroll down
- scroll up 3 times
- swipe left
- swipe right 2 times

Scroll until found

# Scroll until an element appears
- scroll down until "Terms & Conditions" is visible
- scroll down 5 times to find "Accept"
- scroll down to see "Load More"

Structured YAML

- scrollAssert: "Terms & Conditions"
  direction: down
  maxScrolls: 5
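`scrollAssert` combines scrolling with an assertion. One plausible reading of `maxScrolls` is sketched below: check first, then scroll and re-check up to `maxScrolls` times. The `isVisible` and `scroll` callbacks are hypothetical placeholders for the device operations, and the exact counting rule is an assumption.

```typescript
// Hypothetical sketch of scrollAssert semantics; isVisible and scroll are
// placeholders for the device operations AppClaw performs.
type Direction = 'up' | 'down' | 'left' | 'right';

interface ScrollAssertResult {
  found: boolean;
  scrollsUsed: number;
}

async function scrollAssert(
  text: string,
  direction: Direction,
  maxScrolls: number,
  isVisible: (text: string) => Promise<boolean>,
  scroll: (direction: Direction) => Promise<void>,
): Promise<ScrollAssertResult> {
  // The target may already be on screen, so check before the first scroll.
  if (await isVisible(text)) return { found: true, scrollsUsed: 0 };
  for (let i = 1; i <= maxScrolls; i++) {
    await scroll(direction);
    if (await isVisible(text)) return { found: true, scrollsUsed: i };
  }
  return { found: false, scrollsUsed: maxScrolls };
}
```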

Drag / Slider

Drag one element to another — sliders, carousels, reorderable lists, and any drag-and-drop interaction. Requires vision mode (AGENT_MODE=vision).

Natural language

- drag the green circle slider to the +100 mark
- slide the price handle to +80
- move the volume knob to maximum

Structured YAML — shorthand

# shorthand: "<from> to <to>"
- drag: "green circle slider to +100 mark"

Structured YAML — explicit from/to

- drag:
    from: green circle slider
    to:   +100 mark

Vision required

Drag uses AI vision to locate both the source and the target from their visual descriptions. Set AGENT_MODE=vision and VISION_LOCATE_PROVIDER=stark, with a valid LLM_API_KEY (or GEMINI_API_KEY if LLM_PROVIDER is not gemini).

Assert / Verify

Verify that something is visible on screen. Works with both literal text and visual/semantic descriptions via AI vision.

Natural language

- verify "Welcome back" is visible
- assert Dashboard is visible
- check that the login button is on the screen
- verify TestMu AI is visible

Structured YAML

- assert: "Welcome back"
- verify: "Dashboard"    # alias for assert
- check:  "Login button"  # alias for assert

Navigation & Control

App launch, back, home, enter, and flow completion.

Natural language

# Launch / open
- open YouTube app
- launch Settings
- start Chrome

# Navigation
- go back
- press back button
- go home

# Submit / Enter
- press enter
- submit
- perform search
- confirm

# Flow completion
- done
- done: "Login was successful"

Full Command Reference

Every supported step kind at a glance.

Kind	Parameters	Description
openApp	query	Open an app by name
launchApp	—	Launch app defined in `appId` metadata
tap	label	Tap element by visible text/label
type	text, target?	Type text, optionally into a named field
enter	—	Press Enter / Return key
back	—	Press the Back button
home	—	Press the Home button
wait	seconds	Pause for a fixed duration
waitUntil	condition, text?, timeout	Poll until visible/gone/screenLoaded
swipe	direction, repeat?	Swipe up/down/left/right
drag	from, to	Drag from one element to another (vision mode)
assert	text	Verify text or description is visible
scrollAssert	text, direction, maxScrolls	Scroll until text found
getInfo	query	Ask the AI a question about the screen
done	message?	Signal flow completion
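The table maps naturally onto a TypeScript discriminated union. The type below is a hypothetical rendering for illustration and is not AppClaw's exported type. Field names follow the Parameters column, and optional fields follow its `?` suffixes.

```typescript
// Hypothetical model of the step kinds in the reference table above.
type Direction = 'up' | 'down' | 'left' | 'right';

type FlowStep =
  | { kind: 'openApp'; query: string }
  | { kind: 'launchApp' }
  | { kind: 'tap'; label: string }
  | { kind: 'type'; text: string; target?: string }
  | { kind: 'enter' }
  | { kind: 'back' }
  | { kind: 'home' }
  | { kind: 'wait'; seconds: number }
  | { kind: 'waitUntil'; condition: 'visible' | 'gone' | 'screenLoaded'; text?: string; timeout: number }
  | { kind: 'swipe'; direction: Direction; repeat?: number }
  | { kind: 'drag'; from: string; to: string }
  | { kind: 'assert'; text: string }
  | { kind: 'scrollAssert'; text: string; direction: Direction; maxScrolls: number }
  | { kind: 'getInfo'; query: string }
  | { kind: 'done'; message?: string };

// Describe a step for logs; the `never` check makes the switch exhaustive,
// so adding a kind to FlowStep without handling it fails to compile.
function describeStep(step: FlowStep): string {
  switch (step.kind) {
    case 'openApp': return `open ${step.query}`;
    case 'launchApp': return 'launch app from appId';
    case 'tap': return `tap ${step.label}`;
    case 'type':
      return step.target ? `type "${step.text}" in ${step.target}` : `type "${step.text}"`;
    case 'enter':
    case 'back':
    case 'home': return `press ${step.kind}`;
    case 'wait': return `wait ${step.seconds}s`;
    case 'waitUntil':
      return `wait up to ${step.timeout}s until ${step.text ?? 'screen'} is ${step.condition}`;
    case 'swipe': return `swipe ${step.direction} x${step.repeat ?? 1}`;
    case 'drag': return `drag ${step.from} to ${step.to}`;
    case 'assert': return `verify ${step.text} is visible`;
    case 'scrollAssert':
      return `scroll ${step.direction} up to ${step.maxScrolls} times to find ${step.text}`;
    case 'getInfo': return `ask: ${step.query}`;
    case 'done': return step.message ? `done: ${step.message}` : 'done';
    default: {
      const unreachable: never = step;
      return unreachable;
    }
  }
}
```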

Vision Modes

Control how AppClaw locates elements on screen.

Agent Mode `AGENT_MODE`

Value	Behavior
dom	Default. Uses the app's DOM/accessibility tree to find elements.
vision	Uses AI vision (screenshots + LLM) as the primary strategy for all interactions.

Vision Mode `VISION_MODE`

Value	Behavior
fallback	Default. Try DOM first, fall back to vision if no match found.
always	Skip DOM entirely, use vision for every interaction.
never	DOM only. No vision fallback.
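The three `VISION_MODE` values describe a lookup order. That order is sketched below with two hypothetical locator callbacks (`findInDom`, `findWithVision`). It illustrates the documented behaviour and is not AppClaw's implementation.

```typescript
// Hypothetical sketch of the VISION_MODE lookup order.
type VisionMode = 'fallback' | 'always' | 'never';

interface Point { x: number; y: number; }

type Locator = (description: string) => Promise<Point | null>;

async function locate(
  description: string,
  mode: VisionMode,
  findInDom: Locator,      // accessibility-tree search
  findWithVision: Locator, // screenshot + LLM search
): Promise<Point | null> {
  if (mode === 'always') return findWithVision(description); // skip DOM entirely
  const hit = await findInDom(description);
  if (hit || mode === 'never') return hit;                    // DOM only
  return findWithVision(description);                         // fallback: DOM missed
}
```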

Environment Variables

All environment variables recognized by AppClaw. These are especially useful for CI/CD pipelines.

LLM Configuration

Variable	Description
LLM_PROVIDER	LLM provider: `anthropic`, `openai`, `gemini`, `groq`, `ollama`
LLM_API_KEY	API key for the chosen provider
LLM_MODEL	Specific model name to use
LLM_THINKING	Extended thinking: `on` or `off` (default: on)
LLM_THINKING_BUDGET	Max thinking tokens: 1–10000 (default: 128)
LLM_SCREENSHOT_MAX_EDGE_PX	Downscale screenshots to this max edge (0 = disabled)

Device & Platform

Variable	Description
PLATFORM	Same as `--platform` flag
DEVICE_TYPE	Same as `--device-type` flag
DEVICE_UDID	Same as `--udid` flag
DEVICE_NAME	Same as `--device` flag

Vision

Variable	Description
VISION_MODE	`always`, `fallback`, or `never`
AGENT_MODE	`dom` or `vision`
GEMINI_API_KEY	Gemini API key for Stark vision — only needed when `LLM_PROVIDER` is not `gemini` and `AGENT_MODE=vision`. If provider is already Gemini, `LLM_API_KEY` is reused automatically.

Execution Tuning

Variable	Description
MAX_STEPS	Max steps per goal (default: 30)
STEP_DELAY	Delay between steps in ms (default: 500)
MAX_ELEMENTS	Max DOM elements to parse (default: 40)
MAX_HISTORY_STEPS	Max action history retained (default: 10)

MCP Connection

Variable	Description
MCP_TRANSPORT	`stdio` or `sse` (default: stdio)
MCP_HOST	MCP server host (default: localhost)
MCP_PORT	MCP server port (default: 8080)

LambdaTest Cloud

Run AppClaw tests on real iOS and Android devices in the cloud — no local device or emulator required. AppClaw integrates with LambdaTest's real device cloud via its Appium-compatible hub.

Setup

Add your LambdaTest credentials and target device to .env:

.env

# Enable LambdaTest cloud
CLOUD_PROVIDER=lambdatest

# LambdaTest credentials (from app.lambdatest.com → Profile → Access Key)
LAMBDATEST_USERNAME=your_username
LAMBDATEST_ACCESS_KEY=your_access_key

# Target device
LAMBDATEST_DEVICE_NAME=iPhone 14
LAMBDATEST_OS_VERSION=16
PLATFORM=ios

# Your app (upload via LambdaTest portal, copy the lt:// ID)
LAMBDATEST_APP=lt://APP10xxxxxxxxxxxxxxxx

# LLM for AI-powered automation
LLM_PROVIDER=gemini
LLM_API_KEY=your_gemini_api_key

CLI Usage

Once .env is configured, run AppClaw exactly as you would locally:

Terminal

# Run a natural-language goal on a cloud device
appclaw "Open the app and navigate to the checkout screen"

# Run a YAML flow on a cloud device
appclaw --flow flows/checkout.yaml

SDK Usage

No SDK changes needed — when CLOUD_PROVIDER=lambdatest is set in .env, the SDK automatically routes the session through LambdaTest:

TypeScript

import { AppClaw } from 'appclaw';

// CLOUD_PROVIDER=lambdatest is read from .env automatically
const app = new AppClaw({
  provider:    'gemini',
  apiKey:      process.env.LLM_API_KEY,
  reportName:  'Checkout — LambdaTest',
});

await app.run('open the app');
await app.run('tap Add to Cart');
await app.run('tap Checkout');

await app.teardown();  // report saved to .appclaw/runs/

Environment Variables

Variable	Required	Description
CLOUD_PROVIDER	Yes	Set to `lambdatest` to enable cloud execution
LAMBDATEST_USERNAME	Yes	Your LambdaTest account username
LAMBDATEST_ACCESS_KEY	Yes	Your LambdaTest access key (from Profile → Access Key)
LAMBDATEST_DEVICE_NAME	Yes	Cloud device to use, e.g. `iPhone 14`, `Galaxy S23`
LAMBDATEST_OS_VERSION	Yes	OS version, e.g. `16` (iOS) or `13` (Android)
LAMBDATEST_APP	No	App ID from the LambdaTest portal (format: `lt://APP…`)
LAMBDATEST_BUILD_NAME	No	Build label shown in the LambdaTest dashboard
LAMBDATEST_PROJECT_NAME	No	Project label shown in the LambdaTest dashboard
LAMBDATEST_VIDEO	No	Record session video. Default: `true`
LAMBDATEST_NETWORK	No	Capture network logs. Default: `false`
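Because four variables besides CLOUD_PROVIDER are required, a CI script can fail fast with a clear message before any cloud session is requested. The helper below is hypothetical (the name `missingLambdaTestVars` is illustrative) and is built only from the Required column above.

```typescript
// Hypothetical pre-flight check built from the "Required" column above.
const REQUIRED_LAMBDATEST_VARS = [
  'LAMBDATEST_USERNAME',
  'LAMBDATEST_ACCESS_KEY',
  'LAMBDATEST_DEVICE_NAME',
  'LAMBDATEST_OS_VERSION',
] as const;

// Returns the names of required variables that are unset or empty.
// Only applies when CLOUD_PROVIDER=lambdatest; local runs need none of them.
function missingLambdaTestVars(env: Record<string, string | undefined>): string[] {
  if (env.CLOUD_PROVIDER !== 'lambdatest') return [];
  return REQUIRED_LAMBDATEST_VARS.filter((name) => !env[name]);
}
```

In a CI script, call it with `process.env` and exit non-zero if the returned list is not empty.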

No code changes required

Switching between local and cloud execution is purely config — set CLOUD_PROVIDER=lambdatest in your CI environment and remove it for local runs. Your YAML flows and SDK tests stay identical.

Node.js / TypeScript SDK

AppClaw ships a first-class programmatic API so you can drive mobile automation directly from Node.js or TypeScript — no CLI required. The SDK is the natural fit for QA automation inside test runners (Vitest, Jest, Mocha), CI pipelines, and any script that needs to control a device programmatically.

When to use the SDK vs the CLI

Use the CLI for one-off tasks and interactive exploration. Use the SDK when you want to run flows inside a test suite, assert on results, share a device connection across multiple flows, or integrate AppClaw into a larger automation pipeline.

Architecture

The SDK exposes a single AppClaw class that manages the full lifecycle:

Lazy MCP connect: the Appium connection is opened on the first run(), runFlow(), or runGoal() call, not on construction.
Connection reuse — subsequent calls share the same underlying connection, so you pay the startup cost once per test suite.
Explicit teardown — call teardown() in your afterAll hook to close the connection cleanly.
Silent by default — spinners and terminal colours are suppressed automatically, keeping CI logs clean.
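The lazy-connect and reuse behaviour can be pictured as a memoised promise. The class below is a hypothetical sketch, not the SDK's source. Its `connect` callback stands in for opening the MCP/Appium session. It shows why repeated calls pay the startup cost only once and why teardown() has to be called explicitly.

```typescript
// Hypothetical sketch: open a connection on first use, reuse it afterwards,
// and close it only when teardown() is called.
interface Connection {
  close(): Promise<void>;
}

class LazySession<C extends Connection> {
  private pending: Promise<C> | null = null;
  private readonly connect: () => Promise<C>;

  constructor(connect: () => Promise<C>) {
    this.connect = connect; // nothing is opened yet
  }

  // Every run*() call would go through get(); only the first one connects.
  get(): Promise<C> {
    if (!this.pending) this.pending = this.connect();
    return this.pending;
  }

  async teardown(): Promise<void> {
    if (!this.pending) return; // never connected, nothing to close
    const conn = await this.pending;
    this.pending = null;
    await conn.close();
  }
}
```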

Installation

AppClaw is a single package — the SDK is built in, nothing extra to install.

Terminal

npm install appclaw

Create a .env file in your project root (or pass options directly to the constructor — see Options Reference).

.env

LLM_PROVIDER=anthropic
LLM_API_KEY=sk-ant-...
PLATFORM=android

runFlow()

Parse and execute a YAML flow file against a connected device. Returns a FlowResult you can assert on.

TypeScript

import { AppClaw } from 'appclaw';

const app = new AppClaw({
  provider: 'anthropic',
  apiKey:   process.env.ANTHROPIC_API_KEY,
  platform: 'android',
});

const result = await app.runFlow('./flows/checkout.yaml');

console.log(result.success);    // true
console.log(result.stepsUsed);  // 6
console.log(result.stepsTotal); // 6

await app.teardown();

FlowResult shape

Field	Type	Description
success	boolean	Whether all steps completed successfully
stepsUsed	number	Steps executed before completion or failure
stepsTotal	number	Total steps in the flow (including unexecuted)
failedStep	number?	1-based index of the step that failed
failedPhase	string?	`setup` | `test` | `assertion`
error	string?	Human-readable failure reason
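A small helper can turn a FlowResult into a one-line CI message. The interface below is a local mirror of the table above so that the sketch runs on its own; in a real project you would import `type FlowResult` from 'appclaw' instead. The `summarizeFlow` helper is hypothetical.

```typescript
// Local mirror of the FlowResult table above, so the sketch runs standalone.
// In a real project, import `type FlowResult` from 'appclaw' instead.
interface FlowResult {
  success: boolean;
  stepsUsed: number;
  stepsTotal: number;
  failedStep?: number; // 1-based
  failedPhase?: 'setup' | 'test' | 'assertion';
  error?: string;
}

// Hypothetical helper: one-line summary suitable for CI logs.
function summarizeFlow(name: string, r: FlowResult): string {
  if (r.success) return `PASS ${name}: ${r.stepsUsed}/${r.stepsTotal} steps`;
  const where = r.failedPhase ? ` in ${r.failedPhase}` : '';
  const step = r.failedStep !== undefined ? ` at step ${r.failedStep}` : '';
  return `FAIL ${name}${where}${step}: ${r.error ?? 'unknown error'}`;
}
```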

runGoal()

Execute a plain-English goal using the agent loop — same as passing a goal string to the CLI. Returns an AgentResult.

TypeScript

import { AppClaw } from 'appclaw';

const app = new AppClaw({ provider: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY });

const result = await app.runGoal(
  'Log in with email qa@company.com and password Test1234'
);

console.log(result.success);   // true
console.log(result.stepsUsed); // 4
console.log(result.reason);    // "Logged in successfully"

await app.teardown();

Flow vs Goal

Use runFlow() for repeatable QA scenarios — structured, deterministic, zero LLM cost. Use runGoal() for exploratory tasks or when you need the agent to adapt to dynamic screen states.

run()

Execute a single natural-language instruction directly on the device — the programmatic equivalent of typing a command in the playground REPL. Each call is one atomic action: parse the instruction, execute it, return the result.

TypeScript

import { AppClaw } from 'appclaw';

const app = new AppClaw({ provider: 'gemini', apiKey: process.env.GEMINI_API_KEY, platform: 'android' });

await app.run('open YouTube app');       // regex match — no LLM call
await app.run('tap Search');             // regex match — no LLM call
await app.run('type Appium 3.0');        // regex match — no LLM call
await app.run('tap the search button');  // LLM fallback → tap
await app.run('wait 2 seconds');         // regex match — no LLM call
await app.run('scroll down');            // regex match — no LLM call

await app.teardown();  // report written to .appclaw/runs/

How instructions are resolved

Regex match — common patterns (open X, tap X, type X, wait N seconds, scroll down, …) are resolved instantly with zero LLM cost.
LLM fallback — anything that doesn't match a regex is sent to the configured LLM, which classifies it into a structured action (tap, type, swipe, etc.).
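The two-stage resolution can be sketched as a list of regular expressions tried in order, with an LLM classifier as the last resort. The patterns and the `classifyWithLlm` callback below are hypothetical, and AppClaw's real pattern set is not reproduced here. With these patterns, `tap Search` matches a rule while `tap the search button` falls through to the LLM, as in the example above.

```typescript
// Hypothetical sketch of regex-first, LLM-fallback instruction resolution.
type Action =
  | { kind: 'openApp'; query: string }
  | { kind: 'tap'; label: string }
  | { kind: 'type'; text: string }
  | { kind: 'wait'; seconds: number }
  | { kind: 'swipe'; direction: 'up' | 'down' };

interface Resolution {
  action: Action;
  viaLlm: boolean; // true when no regex matched
}

// Tried in order; the first matching pattern wins.
const RULES: Array<[RegExp, (m: RegExpMatchArray) => Action]> = [
  [/^open (.+?)(?: app)?$/i, (m) => ({ kind: 'openApp', query: m[1] })],
  [/^tap (\S+)$/i, (m) => ({ kind: 'tap', label: m[1] })], // single-word labels only
  [/^type (.+)$/i, (m) => ({ kind: 'type', text: m[1] })],
  [/^wait (\d+(?:\.\d+)?) seconds?$/i, (m) => ({ kind: 'wait', seconds: Number(m[1]) })],
  [/^scroll (up|down)$/i, (m) => ({ kind: 'swipe', direction: m[1].toLowerCase() === 'up' ? 'up' : 'down' })],
];

async function resolveInstruction(
  instruction: string,
  classifyWithLlm: (instruction: string) => Promise<Action>,
): Promise<Resolution> {
  const text = instruction.trim();
  for (const [pattern, build] of RULES) {
    const m = text.match(pattern);
    if (m) return { action: build(m), viaLlm: false }; // zero LLM cost
  }
  return { action: await classifyWithLlm(text), viaLlm: true };
}
```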

RunResult shape

Field	Type	Description
success	boolean	Whether the action completed successfully
action	string	Resolved step kind: `tap` | `type` | `openApp` | `wait` | `swipe` | …
message	string	Human-readable description of what happened

run() vs runGoal() vs runFlow()

Use run() when you want full control — one deterministic step at a time, easy to integrate with any test framework. Use runGoal() when you want the agent to figure out the steps itself. Use runFlow() for declarative YAML test cases you want to version-control.

Reports

Reports are enabled by default when using the SDK. After teardown() is called, AppClaw writes an HTML report to .appclaw/runs/ — one screenshot per step, plus a full execution summary. No extra configuration needed.

TypeScript

const app = new AppClaw({
  provider:   'gemini',
  apiKey:     process.env.GEMINI_API_KEY,
  platform:   'android',
  reportName: 'YouTube Search',  // shown in the report viewer
});

await app.run('open YouTube app');
await app.run('tap Search');
await app.run('type Appium 3.0');
await app.run('tap the search button');

await app.teardown();
// ↑ writes report to .appclaw/runs/<runId>/

Screen recording

Pass video: true to record the screen for the entire run and embed the video in the report. Recording starts automatically on the first run() call and stops in teardown().

TypeScript

const app = new AppClaw({
  provider:   'gemini',
  apiKey:     process.env.GEMINI_API_KEY,
  platform:   'android',
  reportName: 'YouTube Search',
  video:      true,               // record screen for the whole run
});

await app.run('open YouTube app');
await app.run('tap Search');
await app.run('type Appium 3.0');
await app.run('tap the search button');

await app.teardown();
// ↑ report includes recording.mp4 under the Recording tab

Parallel-safe

Each AppClaw instance records its own session independently — parallel tests do not interfere. Port allocation (MJPEG, system port) is also handled automatically per instance.

Viewing the report

Run the built-in report server after your tests complete:

Shell

npx appclaw --report

This starts a local server and opens the report in your browser. Every run is listed with its steps, screenshots, pass/fail status, and timing.

Report file layout

File tree

.appclaw/
  runs/
    runs.json              # global run index
    <runId>/
      manifest.json        # full run data (steps, timing, success)
      steps/
        step-000.png       # screenshot after step 1
        step-001.png       # screenshot after step 2
        step-002.png

Disabling reports

Set report: false to skip report generation (e.g. in performance-sensitive CI pipelines):

TypeScript

const app = new AppClaw({
  provider: 'gemini',
  apiKey:   process.env.GEMINI_API_KEY,
  report:   false,   // disable report generation
});

Using with Vitest / Jest

Create one AppClaw instance per test file and tear it down in afterAll. Because the connection opens lazily on the first call, no beforeAll hook is needed. Individual tests call runFlow() or runGoal() and assert on the result.

tests/checkout.test.ts

import { describe, it, expect, afterAll } from 'vitest';
import { AppClaw } from 'appclaw';

// Device flows run far longer than the 5 s default test timeout in Vitest and Jest.
const TIMEOUT_MS = 5 * 60_000;

const app = new AppClaw({
  provider: 'anthropic',
  apiKey:   process.env.ANTHROPIC_API_KEY,
  platform: 'android',
  maxSteps: 20,
});

// The connection opens on the first runFlow(); close it once after all tests.
afterAll(() => app.teardown(), TIMEOUT_MS);

describe('Checkout flow', () => {
  it('completes purchase as a logged-in user', async () => {
    const result = await app.runFlow('./flows/checkout.yaml');
    expect(result.success).toBe(true);
  }, TIMEOUT_MS);

  it('handles empty cart gracefully', async () => {
    const result = await app.runFlow('./flows/checkout-empty-cart.yaml');
    expect(result.success).toBe(true);
  }, TIMEOUT_MS);

  it('completes in under 15 steps', async () => {
    const result = await app.runFlow('./flows/checkout.yaml');
    expect(result.stepsUsed).toBeLessThan(15);
  }, TIMEOUT_MS);
});

Phased flows & assertion results

For flows that use setup / steps / assertions sections, the failedPhase field tells you exactly where execution broke down:

TypeScript

const result = await app.runFlow('./flows/login-phased.yaml');

if (!result.success) {
  // failedPhase: 'setup' | 'test' | 'assertion'
  console.error(`Failed in ${result.failedPhase} phase`);
  console.error(`Step ${result.failedStep}: ${result.error}`);
}

CI Scripts

For CI pipelines that don't use a test framework, run flows sequentially and exit non-zero on failure. The SDK's silent: true default keeps logs clean.

scripts/smoke-test.ts

import { AppClaw } from 'appclaw';

const app = new AppClaw({
  provider: 'gemini',
  apiKey:   process.env.GEMINI_API_KEY,
  platform: 'android',
  silent:   true,  // no spinners in CI
});

const flows = [
  './flows/login.yaml',
  './flows/checkout.yaml',
  './flows/search.yaml',
];

for (const flow of flows) {
  const result = await app.runFlow(flow);

  if (!result.success) {
    console.error(`FAILED: ${flow} — ${result.error}`);
    await app.teardown();
    process.exit(1);
  }

  console.log(`PASSED: ${flow} (${result.stepsUsed} steps)`);
}

await app.teardown();
console.log('All flows passed.');

Run it with tsx (no compilation step needed):

Terminal

npx tsx scripts/smoke-test.ts

Options Reference

All fields passed to new AppClaw(options). Every field is optional — unset fields fall back to .env values or built-in defaults, matching CLI behaviour exactly.

Option	Type	Default	Description
provider	string	`'gemini'`	`'anthropic'` | `'openai'` | `'gemini'` | `'groq'` | `'ollama'`
apiKey	string	—	API key for the chosen LLM provider
model	string	Provider default	Model ID override (e.g. `'claude-opus-4-6'`)
platform	string	—	`'android'` | `'ios'`
agentMode	string	`'dom'`	`'dom'` uses accessibility tree; `'vision'` uses AI vision
maxSteps	number	`30`	Maximum agent steps before giving up (applies to `runGoal`)
stepDelay	number	`500`	Delay between steps in milliseconds
silent	boolean	`true`	Suppress spinners and terminal colour output. Set `false` to debug locally.
report	boolean	`true`	Auto-generate an HTML report to `.appclaw/runs/` on `teardown()`. Set `false` to disable.
reportName	string	`'AppClaw SDK Run'`	Name shown in the report viewer.
video	boolean	`false`	Record the screen for the entire run and embed the video under the Recording tab in the report. Recording starts on the first `run()` call and stops automatically in `teardown()`. Requires Appium screen recording support.
mcpTransport	string	`'stdio'`	`'stdio'` (local appium-mcp) | `'sse'` (remote server)
mcpHost	string	`'localhost'`	appium-mcp host when transport is `'sse'`
mcpPort	number	`8080`	appium-mcp port when transport is `'sse'`
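The fallback rule in the opening sentence (explicit option, then .env, then built-in default) is sketched below for three numeric options. The environment variable names and defaults come from the Environment Variables section. The helper itself, and the rule that an unparseable value is ignored, are assumptions made for illustration.

```typescript
// Hypothetical sketch of option precedence: constructor > .env > default.
interface TuningOptions {
  maxSteps?: number;
  stepDelay?: number;
  mcpPort?: number;
}

const DEFAULTS: Required<TuningOptions> = { maxSteps: 30, stepDelay: 500, mcpPort: 8080 };

// Environment variable backing each option (names from the env var tables).
const ENV_NAMES: Record<keyof TuningOptions, string> = {
  maxSteps: 'MAX_STEPS',
  stepDelay: 'STEP_DELAY',
  mcpPort: 'MCP_PORT',
};

function resolveTuning(
  opts: TuningOptions,
  env: Record<string, string | undefined>,
): Required<TuningOptions> {
  const out = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS) as Array<keyof TuningOptions>) {
    const raw = env[ENV_NAMES[key]];
    // Unset, blank, or non-numeric env values are ignored (assumption).
    const parsed = raw === undefined || raw.trim() === '' ? NaN : Number(raw);
    const explicit = opts[key];
    if (explicit !== undefined) out[key] = explicit;
    else if (Number.isFinite(parsed)) out[key] = parsed;
  }
  return out;
}
```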

TypeScript types

All public types are exported from the top-level 'appclaw' import:

TypeScript

import {
  AppClaw,
  type AppClawOptions,   // constructor options
  type FlowResult,       // returned by runFlow()
  type RunResult,        // returned by run()
  type AgentResult,      // returned by runGoal()
  type RunYamlFlowOptions // second arg to runFlow()
} from 'appclaw';

GitHub Actions

Run AppClaw mobile UI automation flows and AI-driven goals directly in GitHub Actions — Android emulator or iOS simulator included, zero boilerplate.

Available on the GitHub Marketplace as AppClaw Mobile Tests.

workflow.yml

uses: AppiumTestDistribution/AppClaw@v1
with:
  flow: flows/login.yaml
  platform: android
  api-key: ${{ secrets.LLM_API_KEY }}

Quick Start

Android — run a YAML flow

android-flow.yml

name: Mobile Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: AppiumTestDistribution/AppClaw@v1
        with:
          flow: flows/login.yaml
          platform: android
          api-key: ${{ secrets.LLM_API_KEY }}

Android — natural language goal

android-goal.yml

- uses: AppiumTestDistribution/AppClaw@v1
  with:
    goal: 'Open YouTube, search for Appium 3.0, verify the first result is visible'
    platform: android
    api-key: ${{ secrets.LLM_API_KEY }}

iOS — run a YAML flow

ios-flow.yml

jobs:
  test:
    runs-on: macos-14  # iOS requires macOS (Apple Silicon)
    steps:
      - uses: actions/checkout@v4

      - uses: AppiumTestDistribution/AppClaw@v1
        with:
          flow: flows/ios-login.yaml
          platform: ios
          api-key: ${{ secrets.LLM_API_KEY }}

Inputs

All inputs are passed via the with: block in your workflow.

Input	Required	Default	Description
`flow`	one of*	—	Path to a YAML flow file relative to repo root
`goal`	one of*	—	Natural language goal executed by the LLM agent
`platform`	no	`android`	Target platform: `android` or `ios`
`provider`	no	`gemini`	LLM provider: `gemini`, `anthropic`, `openai`, `groq`
`api-key`	yes	—	LLM API key — stored as `LLM_API_KEY`
`model`	no	provider default	LLM model ID to pin (e.g. `gemini-2.0-flash`)
`agent-mode`	no	`dom`	`dom` (element locators) or `vision` (screenshot AI)
`max-steps`	no	`30`	Maximum agent steps before the run fails
`step-delay`	no	`500`	Milliseconds between steps
`android-api-level`	no	`33`	Android emulator API level (33 = Android 13)
`android-profile`	no	`pixel_6`	Android AVD hardware profile
`android-target`	no	`default`	Emulator target: `default` or `google_apis`
`ios-device-type`	no	`simulator`	iOS device type: `simulator` or `real`
`ios-simulator-name`	no	`iPhone 16`	iOS simulator model to boot (e.g. `iPhone 15`, `iPad Air`)
`ios-simulator-os`	no	latest	iOS version filter for simulator selection (e.g. `18.4`)
`mcp-debug`	no	`false`	Enable MCP debug logging (`MCP_DEBUG=1`). Useful for diagnosing CI timeouts.
`cloud-provider`	no	local	Cloud provider: `lambdatest`. Leave empty for local.
`lambdatest-username`	no**	—	LambdaTest account username
`lambdatest-access-key`	no**	—	LambdaTest access key
`lambdatest-device-name`	no**	—	Cloud device name (e.g. `Pixel 7`)
`lambdatest-os-version`	no**	—	Cloud OS version (e.g. `13`, `16`)
`lambdatest-app`	no	—	LambdaTest app ID (`lt://APP...`)
`report`	no	`true`	Upload HTML report as workflow artifact
`report-name`	no	`appclaw-report`	Name of the uploaded artifact
`appclaw-version`	no	`latest`	npm package version to pin

* Provide either flow or goal, not both.

** Required when cloud-provider: lambdatest.
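The two footnote rules can be stated precisely as a validation function. This is a hypothetical TypeScript sketch, not the action's own code. It treats the `with:` block as a plain string map in which an empty value means "not provided", and it is useful mainly as a checklist for wrapper scripts that generate workflow inputs.

```typescript
// Hypothetical sketch of the two footnote rules above, applied to the
// `with:` inputs as a plain string map (empty string = not provided).
type ActionInputs = Record<string, string | undefined>;

const LAMBDATEST_INPUTS = [
  'lambdatest-username',
  'lambdatest-access-key',
  'lambdatest-device-name',
  'lambdatest-os-version',
];

function validateInputs(inputs: ActionInputs): string[] {
  const errors: string[] = [];
  const has = (name: string): boolean => (inputs[name] ?? '').trim() !== '';
  // Rule *: exactly one of flow or goal.
  if (has('flow') === has('goal')) {
    errors.push('Provide exactly one of `flow` or `goal`.');
  }
  if (!has('api-key')) errors.push('`api-key` is required.');
  // Rule **: LambdaTest inputs become required with cloud-provider: lambdatest.
  if (inputs['cloud-provider'] === 'lambdatest') {
    for (const name of LAMBDATEST_INPUTS) {
      if (!has(name)) errors.push(`\`${name}\` is required when cloud-provider is lambdatest.`);
    }
  }
  return errors;
}
```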

Secrets Setup

Go to your repo → Settings → Secrets and variables → Actions → New repository secret:

Secret name	Description
`LLM_API_KEY`	Your API key — works for any provider (Gemini, Anthropic, OpenAI, Groq)
`LT_USERNAME`	LambdaTest username (only if using cloud devices)
`LT_ACCESS_KEY`	LambdaTest access key (only if using cloud devices)
`LT_APP_ID`	LambdaTest app ID (only if using cloud devices)

Examples

Parallel matrix — run multiple flows concurrently

matrix-parallel.yml

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        flow:
          - flows/login.yaml
          - flows/search.yaml
          - flows/checkout.yaml
    steps:
      - uses: actions/checkout@v4

      - uses: AppiumTestDistribution/AppClaw@v1
        with:
          flow: ${{ matrix.flow }}
          platform: android
          api-key: ${{ secrets.LLM_API_KEY }}
          report-name: report-${{ strategy.job-index }}

LambdaTest cloud devices

Run iOS tests on Ubuntu — no macOS runner needed.

lambdatest.yml

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: AppiumTestDistribution/AppClaw@v1
        with:
          flow: flows/ios-login.yaml
          platform: ios
          api-key: ${{ secrets.LLM_API_KEY }}
          cloud-provider: lambdatest
          lambdatest-username: ${{ secrets.LT_USERNAME }}
          lambdatest-access-key: ${{ secrets.LT_ACCESS_KEY }}
          lambdatest-device-name: 'iPhone 14'
          lambdatest-os-version: '16'
          lambdatest-app: ${{ secrets.LT_APP_ID }}

Vision mode (screenshot-based AI)

vision-mode.yml

- uses: AppiumTestDistribution/AppClaw@v1
  with:
    flow: flows/onboarding.yaml
    platform: android
    agent-mode: vision
    api-key: ${{ secrets.LLM_API_KEY }}

Pin model for cost control

pin-model.yml

- uses: AppiumTestDistribution/AppClaw@v1
  with:
    flow: flows/smoke.yaml
    platform: android
    api-key: ${{ secrets.LLM_API_KEY }}
    model: 'gemini-2.0-flash'  # cheaper/faster than pro

Nightly regression on a schedule

nightly.yml

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM UTC every night

jobs:
  nightly:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: AppiumTestDistribution/AppClaw@v1
        with:
          flow: flows/full-regression.yaml
          platform: android
          api-key: ${{ secrets.LLM_API_KEY }}
          report-name: nightly-report-${{ github.run_id }}

Reports

When report: true (default), an HTML report is uploaded as a workflow artifact after each run. Download it from the Actions run summary → Artifacts. The report includes:

Step-by-step screenshots with tap overlays
Pass/fail status per step
Execution timeline
Screen recording (if video: true is set in your flow)

Use report path in a downstream step

report-path.yml

- uses: AppiumTestDistribution/AppClaw@v1
  id: appclaw
  with:
    flow: flows/login.yaml
    platform: android
    api-key: ${{ secrets.LLM_API_KEY }}

- name: Print report location
  run: echo "Report at ${{ steps.appclaw.outputs.report-path }}"

Runner Requirements

Platform	Runner	Notes
`android`	`ubuntu-latest`	Free tier. KVM-enabled. Emulator boots in ~4-6 min.
`ios`	`macos-14`	Apple Silicon. macOS minutes cost ~10x Linux.

iOS tip: For faster iOS CI, use LambdaTest cloud devices on ubuntu-latest instead of a macOS runner.
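The ~10x multiplier adds up quickly on a budget. The TypeScript sketch below works through the arithmetic. The multipliers come from the table above and are approximations. `linuxEquivalentMinutes` is a hypothetical helper, not part of AppClaw. The per-job round-up follows GitHub's documented practice of rounding each job's usage up to a whole minute.

```typescript
// Approximate per-minute multipliers from the Runner Requirements table.
// Check your plan's current billing terms before relying on these numbers.
const MINUTE_MULTIPLIER = {
  "ubuntu-latest": 1,
  "macos-14": 10,
} as const;

type Runner = keyof typeof MINUTE_MULTIPLIER;

// Hypothetical helper: converts one job's wall-clock time into
// Linux-equivalent billed minutes.
function linuxEquivalentMinutes(runner: Runner, wallClockMinutes: number): number {
  // Each job's usage is rounded up to a whole minute before the multiplier applies.
  return Math.ceil(wallClockMinutes) * MINUTE_MULTIPLIER[runner];
}

// A 12.5-minute job: 13 billed minutes on Linux, 130 on macOS.
console.log(linuxEquivalentMinutes("ubuntu-latest", 12.5)); // 13
console.log(linuxEquivalentMinutes("macos-14", 12.5)); // 130
```

This sketch counts GitHub runner minutes only. It does not account for LambdaTest's own pricing when you use cloud devices.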

What are App Guides?

App Guides (AppGuides) are per-app knowledge snippets injected directly into the agent's context window at the start of every automation run. They encode navigation patterns, gesture shortcuts, and common action paths for a specific app — so the agent never needs to rediscover them by trial and error.

Context Engineering for Mobile

AppGuides are AppClaw's implementation of context engineering — the practice of giving the LLM exactly the right knowledge to act correctly, rather than relying on the model's general training alone. For mobile automation, this means app-specific navigation knowledge baked into the system prompt before the agent ever looks at the screen.

How it works

When AppClaw starts a run against a known app, it automatically loads the matching guide and prepends it to the agent's system prompt with an APP_GUIDE (AppName): prefix. The LLM sees this contextual knowledge before it takes any action — making the first step decisive rather than exploratory.

System prompt injection (simplified)

APP_GUIDE (WhatsApp):

## WhatsApp Navigation
- Bottom tabs: Chats | Updates | Communities | Calls
- New chat: floating pencil/message icon (bottom-right)
- Search: magnifying-glass icon at the top of Chats

## Messaging
- Open a chat → type in the message bar at the bottom → send via arrow icon
- Attach media: paperclip icon next to message bar
- Voice note: long-press the microphone icon
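In code terms, the injection amounts to string assembly. The TypeScript below is a minimal sketch, not AppClaw's implementation. `buildSystemPrompt` and its parameters are hypothetical. The only detail taken from this page is that the guide is prepended to the system prompt under an `APP_GUIDE (AppName):` header.

```typescript
// Hypothetical sketch: AppClaw's real prompt assembly may differ.
function buildSystemPrompt(basePrompt: string, appName?: string, guide?: string): string {
  // No guide for this app: the agent relies only on what it sees on screen.
  if (!appName || !guide || guide.trim() === "") {
    return basePrompt;
  }
  // Prepend the guide so the model reads it before its generic instructions.
  const header = `APP_GUIDE (${appName}):`;
  return `${header}\n\n${guide.trim()}\n\n${basePrompt}`;
}

const systemPrompt = buildSystemPrompt(
  "You are a mobile automation agent.",
  "WhatsApp",
  "## WhatsApp Navigation\n- Bottom tabs: Chats | Updates | Communities | Calls",
);
console.log(systemPrompt.split("\n")[0]); // APP_GUIDE (WhatsApp):
```

Placing the guide first matches the description above: the model has the app-specific knowledge before it looks at the first screen.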

Resolution order

1. Custom guide: .appclaw/guides/<appId>.md (highest priority, overrides built-ins)
2. Built-in guide: bundled guides for five common apps on both Android and iOS (10 app IDs in total)
3. No guide: the agent explores the app from scratch using only what it sees on screen
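The resolution order can be sketched as a three-way fallback. This TypeScript is illustrative only. `resolveGuide`, `ResolvedGuide`, and the in-memory `builtIns` map are hypothetical names, and AppClaw's actual lookup may differ. The file location `.appclaw/guides/<appId>.md` is the one documented here.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical result type; AppClaw's internal types may differ.
type GuideSource = "custom" | "built-in" | "none";

interface ResolvedGuide {
  source: GuideSource;
  content?: string;
}

function resolveGuide(
  appId: string,
  projectRoot: string,
  builtIns: ReadonlyMap<string, string>,
): ResolvedGuide {
  // 1. Custom guide file wins over everything else.
  const customPath = join(projectRoot, ".appclaw", "guides", `${appId}.md`);
  if (existsSync(customPath)) {
    return { source: "custom", content: readFileSync(customPath, "utf8") };
  }
  // 2. Fall back to a bundled guide for this app ID, if any.
  const builtIn = builtIns.get(appId);
  if (builtIn !== undefined) {
    return { source: "built-in", content: builtIn };
  }
  // 3. No guide: the agent works from the screen alone.
  return { source: "none" };
}

// Demo against a throwaway project directory.
const builtIns = new Map([
  ["com.whatsapp", "## WhatsApp Navigation\n- Bottom tabs: Chats | Updates | Communities | Calls"],
]);
const projectRoot = mkdtempSync(join(tmpdir(), "appclaw-guides-"));

const before = resolveGuide("com.whatsapp", projectRoot, builtIns);
mkdirSync(join(projectRoot, ".appclaw", "guides"), { recursive: true });
writeFileSync(join(projectRoot, ".appclaw", "guides", "com.whatsapp.md"), "## Custom WhatsApp notes\n");
const after = resolveGuide("com.whatsapp", projectRoot, builtIns);
const unknown = resolveGuide("com.example.unknown", projectRoot, builtIns);

console.log(before.source, after.source, unknown.source); // built-in custom none
```

Because the custom file is checked first, it replaces the built-in entirely rather than merging with it.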

Built-in Guides

AppClaw ships with guides for the most commonly automated apps on both Android and iOS. These activate automatically when AppClaw detects the matching package name or bundle ID.

App	Platform	App ID / Bundle ID
Gmail	Android	`com.google.android.gm`
Gmail	iOS	`com.google.gmail`
YouTube	Android	`com.google.android.youtube`
YouTube	iOS	`com.google.ios.youtube`
WhatsApp	Android	`com.whatsapp`
WhatsApp	iOS	`net.whatsapp.WhatsApp`
Chrome	Android	`com.android.chrome`
Chrome	iOS	`com.google.chrome`
Settings	Android	`com.android.settings`
Settings	iOS	`com.apple.Preferences`
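Detection keys on the package name or bundle ID, so the table above works as a lookup map. A hypothetical TypeScript transcription follows. The exact, case-sensitive comparison is an assumption about how AppClaw matches IDs. This page does not specify it.

```typescript
type Platform = "android" | "ios";

interface BuiltInGuideEntry {
  app: string; // display name, e.g. for an APP_GUIDE (AppName): header
  platform: Platform;
}

// App IDs transcribed from the Built-in Guides table.
const BUILT_IN_GUIDES: ReadonlyMap<string, BuiltInGuideEntry> = new Map<string, BuiltInGuideEntry>([
  ["com.google.android.gm", { app: "Gmail", platform: "android" }],
  ["com.google.gmail", { app: "Gmail", platform: "ios" }],
  ["com.google.android.youtube", { app: "YouTube", platform: "android" }],
  ["com.google.ios.youtube", { app: "YouTube", platform: "ios" }],
  ["com.whatsapp", { app: "WhatsApp", platform: "android" }],
  ["net.whatsapp.WhatsApp", { app: "WhatsApp", platform: "ios" }],
  ["com.android.chrome", { app: "Chrome", platform: "android" }],
  ["com.google.chrome", { app: "Chrome", platform: "ios" }],
  ["com.android.settings", { app: "Settings", platform: "android" }],
  ["com.apple.Preferences", { app: "Settings", platform: "ios" }],
]);

// Hypothetical lookup: exact, case-sensitive match on the app ID.
function findBuiltInGuide(appId: string): BuiltInGuideEntry | undefined {
  return BUILT_IN_GUIDES.get(appId);
}

console.log(findBuiltInGuide("net.whatsapp.WhatsApp")?.app); // WhatsApp
console.log(findBuiltInGuide("com.myapp.android")); // undefined
```

Each app appears twice because Android package names and iOS bundle IDs differ, for example `com.whatsapp` versus `net.whatsapp.WhatsApp`.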

Example: WhatsApp Guide

APP_GUIDE (WhatsApp)

## WhatsApp Navigation
- Bottom tabs: Chats | Updates | Communities | Calls
- New chat: floating pencil/message icon (bottom-right)
- Search: magnifying-glass icon at the top of Chats

## Messaging
- Open a chat → type in the message bar at the bottom → send via arrow icon
- Attach media: paperclip icon next to message bar
- Voice note: long-press the microphone icon
- Emoji/stickers: smiley face icon on the left of message bar

## Common Actions
- Star a message: long-press message → star icon
- Forward: long-press message → forward arrow
- Delete: long-press message → trash icon
- Group info: tap the group name at the top of the chat

Example: YouTube Guide

APP_GUIDE (YouTube)

## YouTube Navigation
- Bottom nav: Home | Shorts | + (upload) | Subscriptions | Library
- Search: magnifying-glass icon (top-right)
- Tap a video thumbnail to play; double-tap left/right to seek ±10 s

## Searching
- Tap the search icon → type query → press Enter or tap search icon again
- Filter results: tap "Filters" after searching

## Playback
- Full screen: rotate device or tap the expand icon (bottom-right of player)
- Quality: tap ⋮ inside player → Quality
- Captions: tap CC icon inside player

Custom Guides

Add a guide for any app — or override a built-in — by dropping a Markdown file at .appclaw/guides/<appId>.md in your project directory. Custom guides always take priority over built-ins.

Custom guides always win

If a custom guide exists for an app ID, it replaces the built-in entirely. To extend a built-in guide, copy its contents into your custom file and add your own sections.

Creating a custom guide

1. Find your app's package name (Android) or bundle ID (iOS). You can get this from the appId field in your YAML flow, or by inspecting the device.
2. Create the directory .appclaw/guides/ in your project root.
3. Write a Markdown file named <appId>.md.

.appclaw/guides/com.myapp.android.md

## Main Navigation
- Bottom tabs: Home | Search | Orders | Profile
- Hamburger menu (top-left) → categories and account settings

## Checkout Flow
- Cart icon is always in the top-right corner
- Tap "Proceed to Checkout" → select address → choose payment → Place Order
- Apply coupon: tap "Have a coupon?" on the order summary screen

## Product Search
- Tap the search bar at the top; supports filters: Brand | Price | Rating
- Long-press any product thumbnail to preview without navigating away

Once in place, AppClaw picks it up automatically — no code changes or restarts needed.
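A small script can connect the first and last steps above: it reads the `appId` from a flow and computes where that app's custom guide belongs. This is a hypothetical TypeScript helper, not an AppClaw API. It assumes `appId` appears as a top-level `appId: <value>` line in the flow file and uses a regex for brevity. A YAML parser would be more robust.

```typescript
import { join } from "node:path";

// Hypothetical helper: finds a top-level `appId:` line in flow YAML text.
// Assumes the key starts at column 0; use a real YAML parser in practice.
function extractAppId(flowYaml: string): string | undefined {
  const match = /^appId:\s*["']?([\w.-]+)["']?/m.exec(flowYaml);
  return match?.[1];
}

// Custom guides live at .appclaw/guides/<appId>.md under the project root.
function customGuidePath(projectRoot: string, appId: string): string {
  return join(projectRoot, ".appclaw", "guides", `${appId}.md`);
}

const flowText = "appId: com.myapp.android\n"; // hypothetical flow excerpt
const appId = extractAppId(flowText);
if (appId === undefined) {
  throw new Error("No top-level appId found in the flow");
}
console.log(customGuidePath(process.cwd(), appId));
```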

Tips for writing good guides

Do	Why
Describe where things are, not what they say	UI labels change; positions are stable
List gestures explicitly ("swipe right to archive")	The agent can't infer non-obvious gestures from a screenshot
Use bullet points over prose	Bullets say the same thing in fewer tokens, and every token counts
Document multi-step paths (Settings → Account → Privacy)	Saves the agent multiple round-trips for deeply nested flows
Keep it short (under 500 tokens)	Guides are injected on every step — brevity reduces cost
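A character-count heuristic gives a rough check against the 500-token guideline without calling a model. The sketch below assumes about 4 characters per token, a common rule of thumb for English text. The model's own tokenizer is the only exact measure. Because the guide is injected on every step, the sketch also multiplies its cost by the number of steps in a run. `estimateTokens` and `guideOverhead` are hypothetical helpers.

```typescript
// Rule of thumb for English text; real counts depend on the model's tokenizer.
const CHARS_PER_TOKEN = 4;
const GUIDE_TOKEN_BUDGET = 500; // guideline from the tips table

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

interface GuideOverhead {
  tokensPerStep: number;
  tokensPerRun: number;
  withinBudget: boolean;
}

// The guide accompanies every step, so its cost scales with step count.
function guideOverhead(guideText: string, steps: number): GuideOverhead {
  const tokensPerStep = estimateTokens(guideText);
  return {
    tokensPerStep,
    tokensPerRun: tokensPerStep * steps,
    withinBudget: tokensPerStep <= GUIDE_TOKEN_BUDGET,
  };
}

// A 2,000-character guide over a 20-step run.
const overhead = guideOverhead("x".repeat(2000), 20);
console.log(overhead.tokensPerStep, overhead.tokensPerRun, overhead.withinBudget); // 500 10000 true
```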
