agent-browser
by vercel-labs
agent-browser is a Chrome/Chromium automation CLI for AI agents and shell scripts. Use it to open pages, navigate, click, fill forms, capture snapshots, take screenshots, record video, profile performance, manage sessions, handle authentication, and automate end-to-end browser workflows.
Overview
What is agent-browser?
agent-browser is a command-line browser automation tool designed for AI agents and shell-based workflows. It connects directly to Chrome or Chromium via the Chrome DevTools Protocol (CDP), so you can script real browser interactions from the terminal or an agent runtime.
With agent-browser you can:
- Open and navigate web pages (agent-browser open <url>)
- Discover interactive elements via structured snapshots
- Click buttons, follow links, and interact with forms
- Fill inputs, type text, and press keys
- Take snapshots to understand page structure and available actions
- Manage sessions and preserve authenticated state
- Work through authentication flows (including OAuth and 2FA with human help)
- Use proxies for geo-testing or corporate environments
- Record performance traces for profiling
- Capture video of browser sessions for debugging or documentation
Who is agent-browser for?
agent-browser is a good fit if you:
- Run an AI agent or automation framework that needs real browser control
- Want a CLI-first way to automate Chrome/Chromium workflows
- Need robust element targeting that is friendly to LLMs (using compact @refs)
- Automate login flows, form submissions, or multi-step web app flows
- Capture reproducible tests, demos, or debugging sessions as video or traces
It is especially useful in these scenarios:
- Browser automation: scripted navigation, clicking, and form filling
- Workflow automation: end-to-end sequences like "log in → navigate → export report"
- Test automation: smoke tests, regression checks, and performance profiling of web apps
When agent-browser is and is not a good fit
Use agent-browser when:
- You can run a local CLI and have access to Chrome or Chromium
- You want deterministic, scriptable browser behavior exposed to an AI agent
- You require fine-grained control over sessions, cookies, and authentication
It may not be a good fit when:
- You cannot install or run Chrome/Chromium on the host
- You only need raw HTML or simple HTTP requests (a pure HTTP client or scraper may be simpler)
- You need in-language browser control from a runtime that is already tightly coupled to another browser automation library
How to Use
Installation options
agent-browser supports multiple installation methods. Choose one that matches your environment:
- npm (Node.js)
  npm i -g agent-browser
- Homebrew (macOS/Linux)
  brew install agent-browser
- Rust / Cargo
  cargo install agent-browser
After installing the CLI, run the built-in Chrome setup:
agent-browser install
This downloads and wires up a compatible Chrome/Chromium build. When a new version is available, update with:
agent-browser upgrade
If you are using agent-browser as a skill in an agent platform, you can also add it with:
npx skills add https://github.com/vercel-labs/agent-browser --skill agent-browser
Check the SKILL.md file in the repository for the latest skill-specific wiring details.
Core browser automation workflow
Every agent-browser workflow follows a simple loop: open → snapshot → interact → re-snapshot.
- Navigate to a page
  agent-browser open https://example.com/form
- Take a snapshot to discover elements
  Use the interactive snapshot mode to get a compact list of clickable and fillable elements with @refs:
  agent-browser snapshot -i
  Example output (simplified):
  @e1 [input type="email"]
  @e2 [input type="password"]
  @e3 [button] "Submit"
- Interact using refs
  agent-browser fill @e1 "user@example.com"
  agent-browser fill @e2 "password123"
  agent-browser click @e3
- Wait and re-snapshot
  agent-browser wait --load networkidle
  agent-browser snapshot -i
This pattern allows an AI agent to reason over a compact structural view instead of the full DOM, which significantly reduces context usage.
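As a rough sketch, one pass of the loop can be wrapped in a shell function. It uses only the commands shown in the steps above. The function name submit_login and the AGENT_BROWSER override variable are hypothetical, and the refs @e1, @e2, and @e3 are assumed to match the simplified example output; a real page will hand out different refs, so read them from the snapshot first.

```shell
# Path to the CLI. AGENT_BROWSER is a hypothetical override, handy for
# pointing the script at a wrapper or a test stub.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# submit_login URL EMAIL PASSWORD
# Runs one pass of open -> snapshot -> interact -> re-snapshot.
# Assumption: @e1 is the email field, @e2 the password field, and @e3 the
# submit button, as in the simplified example output.
submit_login() {
  "$AGENT_BROWSER" open "$1" || return 1
  "$AGENT_BROWSER" snapshot -i || return 1
  "$AGENT_BROWSER" fill @e1 "$2" || return 1
  "$AGENT_BROWSER" fill @e2 "$3" || return 1
  "$AGENT_BROWSER" click @e3 || return 1
  "$AGENT_BROWSER" wait --load networkidle || return 1
  # Refs may have changed after the page update, so snapshot again.
  "$AGENT_BROWSER" snapshot -i
}

# Usage (not run here):
#   submit_login https://example.com/form user@example.com password123
```

Each step stops the function on failure, so a later command never runs against a page that did not load.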
Command reference basics
agent-browser exposes a rich set of commands (see references/commands.md), including:
- Navigation
  agent-browser open <url>
  agent-browser back
  agent-browser forward
  agent-browser reload
  agent-browser close
- Snapshot and refs
  agent-browser snapshot             # full tree
  agent-browser snapshot -i          # interactive elements only (recommended)
  agent-browser snapshot -c          # compact output
  agent-browser snapshot -d 3        # limit depth
  agent-browser snapshot -s "#main"  # scoped to CSS selector
- Interactions
  agent-browser click @e1
  agent-browser dblclick @e1
  agent-browser hover @e1
  agent-browser focus @e1
  agent-browser fill @e2 "text"
  agent-browser type @e2 "text"
  agent-browser press Enter
Use references/snapshot-refs.md for deeper guidance on how @refs are generated and how long they remain valid.
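Because refs are only meaningful for the snapshot that produced them, a script may want to look a ref up by label instead of hard-coding it. The helper below is a minimal sketch that does this with plain text filtering. The name find_ref is hypothetical, and the parsing assumes one element per line with the @ref at the start of the line, as in the simplified example output earlier; actual snapshot output may be laid out differently, so check it before relying on this pattern.

```shell
# find_ref LABEL
# Reads snapshot text on stdin and prints the first ref whose line
# contains LABEL as a literal string. Prints nothing when there is no match.
# Assumption: every element line starts with its @ref.
find_ref() {
  grep -F -- "$1" \
    | sed -n 's/^[[:space:]]*\(@[A-Za-z0-9]*\).*/\1/p' \
    | head -n 1
}

# Usage (not run here):
#   ref="$(agent-browser snapshot -i | find_ref 'Submit')"
#   [ -n "$ref" ] && agent-browser click "$ref"
```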
Working with sessions and authentication
agent-browser provides built-in tools for authenticated and multi-session browsing. This is useful for login flows, multi-account testing, or isolating user roles.
- Named sessions (see references/session-management.md):
  # Session "auth": login flow
  agent-browser --session auth open https://app.example.com/login
  # Session "public": separate browsing
  agent-browser --session public open https://example.com
  Each session has isolated cookies, storage, cache, and history.
- Session state persistence:
  # Save cookies and storage
  agent-browser state save ./auth-state.json
  # Restore later
  agent-browser state load ./auth-state.json
  agent-browser open https://app.example.com/dashboard
- Authentication patterns (see references/authentication.md):
  - Import cookies from a debug-enabled Chrome you are already logged into
  - Walk through standard login forms with snapshots and fill/click
  - Handle cookie-based auth, HTTP basic auth, and token refresh
For complex OAuth or 2FA flows, a human may still be involved in the initial setup, after which agent-browser can reuse the saved authenticated state.
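The "log in once, reuse the state" idea can be sketched as a small wrapper around state save and state load. The function name open_with_auth and the AGENT_BROWSER variable are hypothetical, and the login step is left to a caller-supplied command, because how a login completes (scripted form fill, human-assisted 2FA) depends on the site.

```shell
# Hypothetical override for the CLI path, useful for wrappers and stubs.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# open_with_auth STATE_FILE APP_URL LOGIN_COMMAND [ARGS...]
# If STATE_FILE exists, restore it. Otherwise run the caller-supplied
# login command and save the resulting state for the next run.
# Either way, finish by opening APP_URL.
open_with_auth() {
  local state_file="$1" app_url="$2"
  shift 2
  if [ -f "$state_file" ]; then
    "$AGENT_BROWSER" state load "$state_file" || return 1
  else
    "$@" || return 1
    "$AGENT_BROWSER" state save "$state_file" || return 1
  fi
  "$AGENT_BROWSER" open "$app_url"
}

# Usage (not run here), where do_login is your own login function:
#   open_with_auth ./auth-state.json https://app.example.com/dashboard do_login
```

Saved state can expire on the server side, so a real script would likely also check that the dashboard actually loaded and fall back to the login step when it did not.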
Proxy support and network configuration
If you need to route traffic through a proxy (for geo-testing, rate limiting, or corporate environments), use the options documented in references/proxy-support.md:
- HTTP/HTTPS proxy via CLI flag
  agent-browser --proxy "http://proxy.example.com:8080" open https://example.com
- Environment variable configuration
  export HTTP_PROXY="http://proxy.example.com:8080"
  export HTTPS_PROXY="https://proxy.example.com:8080"
  agent-browser open https://example.com
- SOCKS proxy
  export ALL_PROXY="socks5://proxy.example.com:1080"
  agent-browser open https://example.com
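When the same script has to run both with and without a proxy (for example locally and inside a corporate network), one option is to add the --proxy flag only when a proxy is configured. This is a minimal sketch; open_via_proxy, PROXY_URL, and AGENT_BROWSER are hypothetical names chosen for illustration, not part of the CLI.

```shell
# Hypothetical override for the CLI path, useful for wrappers and stubs.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# open_via_proxy URL
# Opens URL through the proxy named by PROXY_URL (a hypothetical variable)
# when it is set and non-empty, and directly otherwise.
open_via_proxy() {
  if [ -n "${PROXY_URL:-}" ]; then
    "$AGENT_BROWSER" --proxy "$PROXY_URL" open "$1"
  else
    "$AGENT_BROWSER" open "$1"
  fi
}

# Usage (not run here):
#   PROXY_URL="http://proxy.example.com:8080"
#   open_via_proxy https://example.com
```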
Profiling and performance tracing
For test automation and performance investigations, agent-browser can capture Chrome performance traces (see references/profiling.md):
# Start profiling
agent-browser profiler start
# Run your scenario
agent-browser open https://example.com
agent-browser click @e1
agent-browser wait 1000
# Stop and save trace
agent-browser profiler stop ./trace.json
You can open the resulting trace.json in Chrome DevTools (Performance tab) or compatible viewers to analyze JavaScript execution, rendering, and user timing events.
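In scripts it is easy to leave the profiler running when a step in the middle fails. The wrapper below is a sketch of one way to guard against that: it always issues profiler stop, then reports the scenario's own exit status. The names profile_run and AGENT_BROWSER are hypothetical; only profiler start and profiler stop come from the commands above.

```shell
# Hypothetical override for the CLI path, useful for wrappers and stubs.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# profile_run TRACE_FILE COMMAND [ARGS...]
# Starts the profiler, runs COMMAND, and stops the profiler even when
# COMMAND fails. Returns COMMAND's exit status, or 1 if the profiler
# itself could not be started or stopped.
profile_run() {
  local trace_file="$1" status=0
  shift
  "$AGENT_BROWSER" profiler start || return 1
  "$@" || status=$?
  "$AGENT_BROWSER" profiler stop "$trace_file" || return 1
  return "$status"
}

# Usage (not run here), where my_scenario is your own function:
#   profile_run ./trace.json my_scenario
```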
Video recording for debugging and documentation
agent-browser can record a video of the browser session, which is helpful for debugging failing automations or creating how-to guides (see references/video-recording.md):
# Start recording
agent-browser record start ./demo.webm
# Perform actions
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1
# Stop recording
agent-browser record stop
You can embed these .webm recordings in documentation, share them with teammates, or attach them to bug reports.
Using templates for common workflows
The repository includes shell script templates in the templates/ directory to help you build repeatable workflows:
- templates/form-automation.sh: structured pattern for filling and submitting forms
- templates/authenticated-session.sh: example for logging in and persisting session state
- templates/capture-workflow.sh: pattern for snapshotting or recording a multi-step flow
You can copy and adapt these scripts to your own environment, CI jobs, or agent pipelines.
FAQ
What problems does agent-browser solve compared to simple HTTP clients?
agent-browser controls a real Chrome/Chromium instance via CDP. That means it can handle:
- Client-side rendering and complex JavaScript
- Single-page apps that depend on browser APIs
- Real user interactions like clicks, typing, and key presses
- Visual timing, rendering behavior, and performance traces
If you only need raw HTML or JSON from basic endpoints, an HTTP client might be enough. For anything that requires behaving like a real user in a browser, agent-browser is more appropriate.
How do I install Chrome or Chromium for agent-browser?
After installing the CLI with npm, Homebrew, or Cargo, run:
agent-browser install
This downloads and configures a compatible Chrome/Chromium build that agent-browser can control via CDP. When a new version is released, update with:
agent-browser upgrade
Can agent-browser reuse my existing logged-in browser session?
Yes. references/authentication.md describes how to start Chrome with --remote-debugging-port and import cookies from a session you are already logged into. Once imported, you can save that authenticated state with agent-browser state save and restore it later without repeating the entire login flow.
Is agent-browser suitable for CI and automated testing?
Yes. agent-browser is a CLI tool that works well in automated environments as long as Chrome/Chromium is available. You can:
- Run end-to-end flows as part of test suites
- Capture performance traces during builds
- Record videos of failing scenarios
For CI, use the installation method that matches your build image (npm, Homebrew, or Cargo), then script your flows using shell scripts or your agent framework.
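For the "record videos of failing scenarios" case, a CI job usually only wants to keep the recording when something went wrong. The sketch below records every run and deletes the file again on success. The names record_on_failure and AGENT_BROWSER are hypothetical, and it assumes the recording is written to the path given to record start.

```shell
# Hypothetical override for the CLI path, useful for wrappers and stubs.
AGENT_BROWSER="${AGENT_BROWSER:-agent-browser}"

# record_on_failure VIDEO_FILE COMMAND [ARGS...]
# Records the session while COMMAND runs. Keeps VIDEO_FILE only when
# COMMAND fails, and returns COMMAND's exit status.
record_on_failure() {
  local video="$1" status=0
  shift
  "$AGENT_BROWSER" record start "$video" || return 1
  "$@" || status=$?
  "$AGENT_BROWSER" record stop || return 1
  if [ "$status" -eq 0 ]; then
    rm -f -- "$video"   # passing runs do not need an artifact
  fi
  return "$status"
}

# Usage (not run here), where smoke_test is your own function:
#   record_on_failure ./artifacts/smoke.webm smoke_test
```

The surviving .webm file can then be uploaded through whatever artifact mechanism the CI system provides.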
How does agent-browser help AI agents work with complex pages?
Instead of dumping the full DOM, agent-browser provides compact snapshots with stable @refs for important elements (links, buttons, inputs, etc.). This drastically reduces token usage and makes it easier for an AI agent to:
- Understand page structure
- Select the right element by ref
- Issue precise click, fill, and press commands
references/snapshot-refs.md explains how refs are generated, when to refresh them, and best practices for robust automation.
Does agent-browser support proxies and corporate networks?
Yes. You can configure HTTP, HTTPS, and SOCKS proxies either through CLI flags (--proxy) or environment variables (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY). references/proxy-support.md covers basic configuration, authenticated proxies, bypass rules, and troubleshooting tips.
Where should I start in the repository to learn more?
For a practical deep dive into agent-browser:
- Start with SKILL.md for the high-level overview and quick start
- Read references/commands.md for the full command list and options
- Check references/authentication.md, references/session-management.md, references/snapshot-refs.md, references/profiling.md, and references/video-recording.md for focused topics
- Explore the templates/ directory for ready-made workflow scripts that you can adapt to your own use cases
