agent-browser

by vercel-labs

agent-browser is a Chrome/Chromium automation CLI for AI agents and shell scripts. Use it to open pages, navigate, click, fill forms, capture snapshots, take screenshots, record video, profile performance, manage sessions, handle authentication, and automate end-to-end browser workflows.

Category: Browser Automation

Install Command

npx skills add https://github.com/vercel-labs/agent-browser --skill agent-browser

Overview

What is agent-browser?

agent-browser is a command-line browser automation tool designed for AI agents and shell-based workflows. It connects directly to Chrome or Chromium via the Chrome DevTools Protocol (CDP), so you can script real browser interactions from the terminal or an agent runtime.

With agent-browser you can:

Open and navigate web pages (agent-browser open <url>)
Discover interactive elements via structured snapshots
Click buttons, follow links, and interact with forms
Fill inputs, type text, and press keys
Take snapshots to understand page structure and available actions
Manage sessions and preserve authenticated state
Work through authentication flows (including OAuth and 2FA with human help)
Use proxies for geo-testing or corporate environments
Record performance traces for profiling
Capture video of browser sessions for debugging or documentation

Who is agent-browser for?

agent-browser is a good fit if you:

Run an AI agent or automation framework that needs real browser control
Want a CLI-first way to automate Chrome/Chromium workflows
Need robust element targeting that is friendly to LLMs (using compact @refs)
Automate login flows, form submissions, or multi-step web app flows
Capture reproducible tests, demos, or debugging sessions as video or traces

It is especially useful in these scenarios:

Browser automation: scripted navigation, clicking, and form filling
Workflow automation: end-to-end sequences like "log in → navigate → export report"
Test automation: smoke tests, regression checks, and performance profiling of web apps

When agent-browser is and is not a good fit

Use agent-browser when:

You can run a local CLI and have access to Chrome or Chromium
You want deterministic, scriptable browser behavior exposed to an AI agent
You require fine-grained control over sessions, cookies, and authentication

It may not be a good fit when:

You cannot install or run Chrome/Chromium on the host
You only need raw HTML or simple HTTP requests (a pure HTTP client or scraper may be simpler)
Your code already depends heavily on another browser automation library and needs in-process browser control from that language or runtime

How to Use

Installation options

agent-browser supports multiple installation methods. Choose one that matches your environment:

npm (Node.js)
```
npm i -g agent-browser
```
Homebrew (macOS/Linux)
```
brew install agent-browser
```
Rust / Cargo
```
cargo install agent-browser
```

After installing the CLI, run the built-in Chrome setup:

```
agent-browser install
```

This downloads and wires up a compatible Chrome/Chromium build. When a new version is available, update with:

```
agent-browser upgrade
```

If you are using agent-browser as a skill in an agent platform, you can also add it with:

```
npx skills add https://github.com/vercel-labs/agent-browser --skill agent-browser
```

Check the SKILL.md file in the repository for the latest skill-specific wiring details.
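For scripted setups such as a fresh VM or CI image, the install steps above can be wrapped so they are safe to rerun. This is a minimal sketch that assumes npm is your installer (swap in `brew install agent-browser` or `cargo install agent-browser` as needed). The function name `ensure_agent_browser` and the `AB_BIN` override are hypothetical helpers, not part of agent-browser.

```shell
#!/bin/sh
# Hypothetical helper, not part of agent-browser itself.
# AB_BIN lets you point at a different binary (or a stub when testing).
AB_BIN="${AB_BIN:-agent-browser}"

# ensure_agent_browser: install the CLI if missing, then set up Chrome/Chromium.
ensure_agent_browser() {
  # Only install the CLI when it is not already on PATH.
  if ! command -v "$AB_BIN" >/dev/null 2>&1; then
    npm i -g agent-browser || return 1
  fi
  # Download and configure the browser build that agent-browser controls.
  "$AB_BIN" install
}
```

Calling `ensure_agent_browser` at the top of a setup script is enough; on a machine that already has the CLI it just runs `agent-browser install`.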

Core browser automation workflow

A typical agent-browser workflow follows a simple loop: open → snapshot → interact → re-snapshot.

Navigate to a page

```
agent-browser open https://example.com/form
```

Take a snapshot to discover elements
Use the interactive snapshot mode to get a compact list of clickable and fillable elements with @refs:
```
agent-browser snapshot -i
```
Example output (simplified):
```
@e1 [input type="email"]
@e2 [input type="password"]
@e3 [button] "Submit"
```

Interact using refs

```
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
```

Wait and re-snapshot

```
agent-browser wait --load networkidle
agent-browser snapshot -i
```

This pattern allows an AI agent to reason over a compact structural view instead of the full DOM, which significantly reduces context usage.
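Because refs come from the snapshot, a shell script needs some way to pick one out of the output. A minimal sketch, assuming each interactive element is printed on its own line with an `@eN` token, as in the simplified example output above; the real output format may differ, so check it before relying on this. `ref_for` is a hypothetical helper name.

```shell
#!/bin/sh
# ref_for TEXT: read `agent-browser snapshot -i` output on stdin and print the
# @ref on the first line containing TEXT (fixed-string match).
# Assumes one element per line with an @eN token, e.g.: @e3 [button] "Submit"
# Exits with status 1 if no line matches.
ref_for() {
  grep -F -- "$1" |
    head -n 1 |
    sed -n 's/.*\(@e[0-9][0-9]*\).*/\1/p' |
    grep .   # fails (status 1) when nothing was extracted
}
```

For example, `submit=$(agent-browser snapshot -i | ref_for '"Submit"') && agent-browser click "$submit"`. Take a fresh snapshot before each lookup, since refs can change after the page updates.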

Command reference basics

agent-browser exposes a rich set of commands (see references/commands.md), including:

Navigation

```
agent-browser open <url>
agent-browser back
agent-browser forward
agent-browser reload
agent-browser close
```

Snapshot and refs

```
agent-browser snapshot          # full tree
agent-browser snapshot -i       # interactive elements only (recommended)
agent-browser snapshot -c       # compact output
agent-browser snapshot -d 3     # limit depth
agent-browser snapshot -s "#main"  # scoped to CSS selector
```

Interactions

```
agent-browser click @e1
agent-browser dblclick @e1
agent-browser hover @e1
agent-browser focus @e1
agent-browser fill @e2 "text"
agent-browser type @e2 "text"
agent-browser press Enter
```

Use references/snapshot-refs.md for deeper guidance on how @refs are generated and how long they remain valid.

Working with sessions and authentication

agent-browser provides built-in tools for authenticated and multi-session browsing. This is useful for login flows, multi-account testing, or isolating user roles.

Named sessions (see references/session-management.md):

```
# Session "auth": login flow
agent-browser --session auth open https://app.example.com/login

# Session "public": separate browsing
agent-browser --session public open https://example.com
```

Each session has isolated cookies, storage, cache, and history.

Session state persistence:

```
# Save cookies and storage
agent-browser state save ./auth-state.json

# Restore later
agent-browser state load ./auth-state.json
agent-browser open https://app.example.com/dashboard
```

Authentication patterns (see references/authentication.md):
- Import cookies from a debug-enabled Chrome you are already logged into
- Walk through standard login forms with snapshots and fill/click
- Handle cookie-based auth, HTTP basic auth, and token refresh

For complex OAuth or 2FA flows, a human may still be involved in the initial setup, after which agent-browser can reuse the saved authenticated state.
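The `state save` and `state load` commands combine into a "log in once, reuse afterwards" pattern. A minimal sketch; `restore_or_login`, the `AB_BIN` override, and the caller-supplied login function are hypothetical, and your login function would drive the form with the snapshot/fill/click loop shown earlier.

```shell
#!/bin/sh
# Hypothetical helper; AB_BIN can point at another binary or a test stub.
AB_BIN="${AB_BIN:-agent-browser}"

# restore_or_login STATE_FILE LOGIN_FN
# Reuse saved cookies/storage if STATE_FILE exists; otherwise run LOGIN_FN
# (your own function that performs the login) and save the resulting state.
restore_or_login() {
  if [ -f "$1" ]; then
    "$AB_BIN" state load "$1"
  else
    "$2" || return 1
    "$AB_BIN" state save "$1" || return 1
    # The file holds session cookies, so treat it like a credential.
    chmod 600 "$1"
  fi
}
```

A later run then only needs `restore_or_login ./auth-state.json my_login && agent-browser open https://app.example.com/dashboard`, where `my_login` is your (hypothetical) login function. Delete the state file to force a fresh login once the session expires.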

Proxy support and network configuration

If you need to route traffic through a proxy (for geo-testing, rate limiting, or corporate environments), use the options documented in references/proxy-support.md:

HTTP/HTTPS proxy via CLI flag

```
agent-browser --proxy "http://proxy.example.com:8080" open https://example.com
```

Environment variable configuration

```
export HTTP_PROXY="http://proxy.example.com:8080"
export HTTPS_PROXY="https://proxy.example.com:8080"
agent-browser open https://example.com
```

SOCKS proxy

```
export ALL_PROXY="socks5://proxy.example.com:1080"
agent-browser open https://example.com
```
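For geo-testing, the `--session` and `--proxy` flags can be combined so each region gets its own isolated cookies and storage. This sketch assumes the two flags can be passed together and that a proxy applies when a fresh session's browser starts; the helper `open_per_region`, the `AB_BIN` override, and the proxy hostnames are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper; AB_BIN can point at another binary or a test stub.
AB_BIN="${AB_BIN:-agent-browser}"

# open_per_region URL REGION=PROXY_URL...
# Opens URL once per region, each in its own named session routed through
# that region's proxy.
open_per_region() {
  url="$1"; shift
  for pair in "$@"; do
    region="${pair%%=*}"   # text before the first '='
    proxy="${pair#*=}"     # text after the first '='
    "$AB_BIN" --session "geo-$region" --proxy "$proxy" open "$url" || return 1
  done
}
```

For example: `open_per_region https://example.com us=http://us.proxy.example.com:8080 de=http://de.proxy.example.com:8080`, then snapshot each session with `agent-browser --session geo-us snapshot -i` and so on.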

Profiling and performance tracing

For test automation and performance investigations, agent-browser can capture Chrome performance traces (see references/profiling.md):

```
# Start profiling
agent-browser profiler start

# Run your scenario
agent-browser open https://example.com
agent-browser click @e1
agent-browser wait 1000

# Stop and save trace
agent-browser profiler stop ./trace.json
```

You can open the resulting trace.json in Chrome DevTools (Performance tab) or compatible viewers to analyze JavaScript execution, rendering, and user timing events.
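If a step in the scenario fails, a script that exits on the first error (for example with `set -e`) never reaches `profiler stop`, so no trace is saved. A minimal sketch that always stops the profiler and then reports the scenario's own exit status; `profile_flow`, the `AB_BIN` override, and the scenario function are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper; AB_BIN can point at another binary or a test stub.
AB_BIN="${AB_BIN:-agent-browser}"

# profile_flow TRACE_FILE SCENARIO_FN
# Starts the profiler, runs SCENARIO_FN, and always stops the profiler so the
# trace is written even when the scenario fails. Returns the scenario's status.
profile_flow() {
  "$AB_BIN" profiler start || return 1
  # Running the scenario as an `if` condition keeps a failure from aborting us.
  if "$2"; then scenario_status=0; else scenario_status=$?; fi
  "$AB_BIN" profiler stop "$1" || return 1
  return "$scenario_status"
}
```

For example, define `my_scenario() { agent-browser open https://example.com && agent-browser click @e1 && agent-browser wait 1000; }` and run `profile_flow ./trace.json my_scenario`.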

Video recording for debugging and documentation

agent-browser can record a video of the browser session, which is helpful for debugging failing automations or creating how-to guides (see references/video-recording.md):

```
# Start recording
agent-browser record start ./demo.webm

# Perform actions
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1

# Stop recording
agent-browser record stop
```

You can embed these .webm recordings in documentation, share them with teammates, or attach them to bug reports.
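For debugging, it is often enough to keep videos of failing runs only. A minimal sketch under that assumption: it records a flow, always stops the recording, and deletes the file when the flow succeeds. `record_flow`, the `AB_BIN` override, and the flow function are hypothetical, and the sketch assumes `record stop` has finished writing the file by the time it returns.

```shell
#!/bin/sh
# Hypothetical helper; AB_BIN can point at another binary or a test stub.
AB_BIN="${AB_BIN:-agent-browser}"

# record_flow VIDEO_FILE FLOW_FN
# Records FLOW_FN to VIDEO_FILE and keeps the video only if the flow fails.
record_flow() {
  "$AB_BIN" record start "$1" || return 1
  if "$2"; then flow_status=0; else flow_status=$?; fi
  "$AB_BIN" record stop || return 1
  if [ "$flow_status" -eq 0 ]; then
    rm -f "$1"   # success: discard the recording
  else
    echo "flow failed (status $flow_status); video kept at $1" >&2
  fi
  return "$flow_status"
}
```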

Using templates for common workflows

The repository includes shell script templates in the templates/ directory to help you build repeatable workflows:

templates/form-automation.sh – structured pattern for filling and submitting forms
templates/authenticated-session.sh – example for logging in and persisting session state
templates/capture-workflow.sh – pattern for snapshotting or recording a multi-step flow

You can copy and adapt these scripts to your own environment, CI jobs, or agent pipelines.

FAQ

What problems does agent-browser solve compared to simple HTTP clients?

agent-browser controls a real Chrome/Chromium instance via CDP. That means it can handle:

Client-side rendering and complex JavaScript
Single-page apps that depend on browser APIs
Real user interactions like clicks, typing, and key presses
Visual timing, rendering behavior, and performance traces

If you only need raw HTML or JSON from basic endpoints, an HTTP client might be enough. For anything that needs to behave like a real user in a browser, agent-browser is more appropriate.

How do I install Chrome or Chromium for agent-browser?

After installing the CLI with npm, Homebrew, or Cargo, run:

```
agent-browser install
```

This downloads and configures a compatible Chrome/Chromium build that agent-browser can control via CDP. When a new version is released, update with:

```
agent-browser upgrade
```

Can agent-browser reuse my existing logged-in browser session?

Yes. references/authentication.md describes how to start Chrome with --remote-debugging-port and import cookies from a session you are already logged into. Once imported, you can save that authenticated state with agent-browser state save and restore it later without repeating the entire login flow.

Is agent-browser suitable for CI and automated testing?

Yes. agent-browser is a CLI tool that works well in automated environments as long as Chrome/Chromium is available. You can:

Run end-to-end flows as part of test suites
Capture performance traces during builds
Record videos of failing scenarios

For CI, install the CLI with the method that matches your build image (npm, Homebrew, or Cargo), run agent-browser install to set up Chrome/Chromium, then script your flows using shell scripts or your agent framework.
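A CI smoke test can be as small as "open the page, wait for it to settle, and fail the job if the snapshot lacks an expected element". A minimal sketch, assuming the interactive snapshot text includes element labels such as button names; `smoke_check`, the `AB_BIN` override, and the expected text are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; AB_BIN can point at another binary or a test stub.
AB_BIN="${AB_BIN:-agent-browser}"

# smoke_check URL EXPECTED_TEXT
# Returns non-zero (failing the CI step) if the interactive snapshot of URL
# does not contain EXPECTED_TEXT.
smoke_check() {
  "$AB_BIN" open "$1" || return 1
  "$AB_BIN" wait --load networkidle || return 1
  snapshot=$("$AB_BIN" snapshot -i) || return 1
  case "$snapshot" in
    *"$2"*) echo "ok: found $2 on $1" ;;
    *) echo "fail: $2 not found on $1" >&2; return 1 ;;
  esac
}
```

In a CI job this might run as `smoke_check https://example.com/form '"Submit"'`, followed by `agent-browser close` to shut the browser down.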

How does agent-browser help AI agents work with complex pages?

Instead of dumping the full DOM, agent-browser provides compact snapshots with short @refs for important elements (links, buttons, inputs, etc.). This drastically reduces token usage and makes it easier for an AI agent to:

Understand page structure
Select the right element by ref
Issue precise click, fill, and press commands

references/snapshot-refs.md explains how refs are generated, when to refresh them, and best practices for robust automation.

Does agent-browser support proxies and corporate networks?

Yes. You can configure HTTP, HTTPS, and SOCKS proxies either through CLI flags (--proxy) or environment variables (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY). references/proxy-support.md covers basic configuration, authenticated proxies, bypass rules, and troubleshooting tips.

Where should I start in the repository to learn more?

For a practical deep dive into agent-browser:

Start with SKILL.md for the high-level overview and quick start
Read references/commands.md for the full command list and options
Check references/authentication.md, references/session-management.md, references/snapshot-refs.md, references/profiling.md, and references/video-recording.md for focused topics
Explore the templates/ directory for ready-made workflow scripts that you can adapt to your own use cases

More skills

slack

by vercel-labs

Automate Slack from the command line using browser automation. The slack skill connects to an existing Slack web session via agent-browser so you can check unread channels, scan DMs, search conversations, extract data, and capture structured reports as part of larger workflows.

Workflow Automation

dogfood

by vercel-labs

Automate exploratory QA of any web application with structured bug reports, screenshots, and videos. dogfood drives the agent-browser client to explore a target site, find visual, functional, UX, performance, console, and accessibility issues, and output a ready-to-share QA report with clear repro steps.

Test Automation

vercel-sandbox

by vercel-labs

Run agent-browser with headless Chrome inside Vercel Sandbox microVMs so Vercel-deployed apps can perform real browser automation, screenshots, and page interactions safely and at scale.

Browser Automation

electron

by vercel-labs

Automate existing Electron desktop apps like VS Code, Slack, Discord, Figma, Notion, and Spotify via agent-browser and Chrome DevTools Protocol (CDP). This skill helps you connect to a running Electron app, take snapshots, and interact with its UI as part of end-to-end desktop and workflow automation.

Desktop Automation

skill-creator

by anthropics

Create, refine, test, and benchmark agent skills with the skill-creator workflow, including eval review, grading, blind comparison, and description improvement.

Skill Authoring

deploy-to-vercel

by vercel-labs

Install the deploy-to-vercel skill to deploy apps and websites to Vercel preview environments with a practical CLI-first workflow.

Deployment

frontend-design

by anthropics

Use the frontend-design skill to create polished frontend interfaces with a strong visual direction, practical code output, and better-than-generic UI results.

UI Design

doc-coauthoring

by anthropics

A practical Claude skill for guiding users through a structured co-authoring workflow for docs, proposals, specs, RFCs, and decision documents.

Technical Writing
