
agent-browser

by vercel-labs

agent-browser is a Chrome/Chromium automation CLI for AI agents and shell scripts. Use it to open pages, navigate, click, fill forms, capture snapshots, take screenshots, record video, profile performance, manage sessions, handle authentication, and automate end-to-end browser workflows.

Category: Browser Automation
Install command:
npx skills add https://github.com/vercel-labs/agent-browser --skill agent-browser
Overview

What is agent-browser?

agent-browser is a command-line browser automation tool designed for AI agents and shell-based workflows. It connects directly to Chrome or Chromium via the Chrome DevTools Protocol (CDP), so you can script real browser interactions from the terminal or an agent runtime.

With agent-browser you can:

  • Open and navigate web pages (agent-browser open <url>)
  • Discover interactive elements via structured snapshots
  • Click buttons, follow links, and interact with forms
  • Fill inputs, type text, and press keys
  • Take snapshots to understand page structure and available actions
  • Manage sessions and preserve authenticated state
  • Work through authentication flows (including OAuth and 2FA with human help)
  • Use proxies for geo-testing or corporate environments
  • Record performance traces for profiling
  • Capture video of browser sessions for debugging or documentation

Who is agent-browser for?

agent-browser is a good fit if you:

  • Run an AI agent or automation framework that needs real browser control
  • Want a CLI-first way to automate Chrome/Chromium workflows
  • Need robust element targeting that is friendly to LLMs (using compact @refs)
  • Automate login flows, form submissions, or multi-step web app flows
  • Capture reproducible tests, demos, or debugging sessions as video or traces

It is especially useful in these scenarios:

  • Browser automation: scripted navigation, clicking, and form filling
  • Workflow automation: end-to-end sequences like "log in → navigate → export report"
  • Test automation: smoke tests, regression checks, and performance profiling of web apps

When agent-browser is and is not a good fit

Use agent-browser when:

  • You can run a local CLI and have access to Chrome or Chromium
  • You want deterministic, scriptable browser behavior exposed to an AI agent
  • You require fine-grained control over sessions, cookies, and authentication

It may not be a good fit when:

  • You cannot install or run Chrome/Chromium on the host
  • You only need raw HTML or simple HTTP requests (a pure HTTP client or scraper may be simpler)
  • You need browser control embedded directly in a language or runtime that is already built around another browser automation library

How to Use

Installation options

agent-browser supports multiple installation methods. Choose one that matches your environment:

  • npm (Node.js)

    npm i -g agent-browser
    
  • Homebrew (macOS/Linux)

    brew install agent-browser
    
  • Rust / Cargo

    cargo install agent-browser
    

After installing the CLI, run the built-in Chrome setup:

agent-browser install

This downloads and wires up a compatible Chrome/Chromium build. When a new version is available, update with:

agent-browser upgrade

If you are using agent-browser as a skill in an agent platform, you can also add it with:

npx skills add https://github.com/vercel-labs/agent-browser --skill agent-browser

Check the SKILL.md file in the repository for the latest skill-specific wiring details.
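
The install steps above can be folded into a setup helper that is safe to re-run. This is a minimal sketch, not taken from the repository: it assumes npm is the install method (swap in the Homebrew or Cargo command as needed), and ensure_agent_browser is a hypothetical name.

```shell
# Hypothetical setup helper (POSIX sh). Assumes npm is the chosen install
# method; replace with `brew install agent-browser` or
# `cargo install agent-browser` if that matches your environment.
ensure_agent_browser() {
  # Install the CLI only if it is not already on PATH.
  if ! command -v agent-browser >/dev/null 2>&1; then
    npm i -g agent-browser || return 1
  fi
  # Download and configure the Chrome/Chromium build agent-browser controls.
  agent-browser install
}
```

The sketch runs agent-browser install on every call; check references/commands.md for whether that step is cheap when Chrome is already present.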

Core browser automation workflow

A typical agent-browser workflow follows a simple loop: open → snapshot → interact → re-snapshot.

  1. Navigate to a page

    agent-browser open https://example.com/form
    
  2. Take a snapshot to discover elements
    Use the interactive snapshot mode to get a compact list of clickable and fillable elements with @refs:

    agent-browser snapshot -i
    

    Example output (simplified):

    @e1 [input type="email"]
    @e2 [input type="password"]
    @e3 [button] "Submit"
    
  3. Interact using refs

    agent-browser fill @e1 "user@example.com"
    agent-browser fill @e2 "password123"
    agent-browser click @e3
    
  4. Wait and re-snapshot

    agent-browser wait --load networkidle
    agent-browser snapshot -i
    

This pattern allows an AI agent to reason over a compact structural view instead of the full DOM, which significantly reduces context usage.
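
To turn a snapshot into refs inside a script, the output can be filtered by label. The sketch below assumes snapshot lines look like the simplified example above (@e3 [button] "Submit"); the real output may carry more detail, and ref_for_label is a hypothetical helper.

```shell
# Hypothetical helper (POSIX sh): read snapshot output on stdin and print the
# ref (first field) of the first line containing the quoted label.
# Assumes the simplified format shown above, e.g.  @e3 [button] "Submit"
ref_for_label() {
  grep -F "\"$1\"" | head -n 1 | awk '{ print $1 }'
}

# Usage:
#   ref=$(agent-browser snapshot -i | ref_for_label "Submit")
#   [ -n "$ref" ] && agent-browser click "$ref"
```

Because it matches any quoted text on the line, a label such as email would also match an attribute like type="email"; prefer distinctive labels, and take a fresh snapshot before each lookup so the ref is current.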

Command reference basics

agent-browser exposes a rich set of commands (see references/commands.md), including:

  • Navigation

    agent-browser open <url>
    agent-browser back
    agent-browser forward
    agent-browser reload
    agent-browser close
    
  • Snapshot and refs

    agent-browser snapshot          # full tree
    agent-browser snapshot -i       # interactive elements only (recommended)
    agent-browser snapshot -c       # compact output
    agent-browser snapshot -d 3     # limit depth
    agent-browser snapshot -s "#main"  # scoped to CSS selector
    
  • Interactions

    agent-browser click @e1
    agent-browser dblclick @e1       # double-click
    agent-browser hover @e1
    agent-browser focus @e1
    agent-browser fill @e2 "text"    # clear the field, then fill it
    agent-browser type @e2 "text"    # type into the element
    agent-browser press Enter        # press a key
    

Use references/snapshot-refs.md for deeper guidance on how @refs are generated and how long they remain valid.
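
A common pattern is to click, let the page settle, and immediately take a fresh snapshot so later commands use current refs. A minimal sketch using only the commands listed above; click_and_resnapshot is a hypothetical name.

```shell
# Hypothetical wrapper (POSIX sh): click a ref, wait for network activity to
# settle, then print a new interactive snapshot. Clicking may navigate or
# re-render the page, so refs from the earlier snapshot may no longer apply.
click_and_resnapshot() {
  agent-browser click "$1" || return 1
  agent-browser wait --load networkidle
  agent-browser snapshot -i
}

# Usage:
#   click_and_resnapshot @e3
```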

Working with sessions and authentication

agent-browser provides built-in tools for authenticated and multi-session browsing. This is useful for login flows, multi-account testing, or isolating user roles.

  • Named sessions (see references/session-management.md):

    # Session "auth": login flow
    agent-browser --session auth open https://app.example.com/login
    
    # Session "public": separate browsing
    agent-browser --session public open https://example.com
    

    Each session has isolated cookies, storage, cache, and history.

  • Session state persistence:

    # Save cookies and storage
    agent-browser state save ./auth-state.json
    
    # Restore later
    agent-browser state load ./auth-state.json
    agent-browser open https://app.example.com/dashboard
    
  • Authentication patterns (see references/authentication.md):

    • Import cookies from a Chrome instance you are already logged into, started with remote debugging enabled
    • Walk through standard login forms with snapshots and fill/click
    • Handle cookie-based auth, HTTP basic auth, and token refresh

For complex OAuth or 2FA flows, a human may still be involved in the initial setup, after which agent-browser can reuse the saved authenticated state.
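
The save and restore steps can be combined so that a login runs only when no saved state exists yet. This is a sketch under assumptions: ensure_auth_state is hypothetical, and the login steps are a shell function you supply (for OAuth or 2FA, that function might simply open the login page and wait for a human to finish).

```shell
# Hypothetical helper (POSIX sh).
#   $1: path to the saved state file, e.g. ./auth-state.json
#   $2: name of a shell function you define that performs the login
ensure_auth_state() {
  local state_file=$1 login_fn=$2
  if [ -f "$state_file" ]; then
    # Reuse cookies and storage from an earlier login.
    agent-browser state load "$state_file"
  else
    # First run: log in, then persist the authenticated state.
    "$login_fn" || return 1
    agent-browser state save "$state_file"
  fi
}

# Usage (my_login is a placeholder for your own login steps):
#   my_login() { agent-browser open https://app.example.com/login; }
#   ensure_auth_state ./auth-state.json my_login
#   agent-browser open https://app.example.com/dashboard
```

Saved sessions can expire; deleting the state file forces a fresh login on the next run.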

Proxy support and network configuration

If you need to route traffic through a proxy (for geo-testing, rate limiting, or corporate environments), use the options documented in references/proxy-support.md:

  • HTTP/HTTPS proxy via CLI flag

    agent-browser --proxy "http://proxy.example.com:8080" open https://example.com
    
  • Environment variable configuration

    export HTTP_PROXY="http://proxy.example.com:8080"
    export HTTPS_PROXY="http://proxy.example.com:8080"
    agent-browser open https://example.com
    
  • SOCKS proxy

    export ALL_PROXY="socks5://proxy.example.com:1080"
    agent-browser open https://example.com
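
When the same script must run both behind a proxy and without one, a small wrapper can add the flag conditionally. In this sketch PROXY_URL and ab_run are names chosen for illustration; whether --proxy must be repeated on every command or only takes effect when the browser is first launched is not stated here, so check references/proxy-support.md.

```shell
# Hypothetical wrapper (POSIX sh): pass --proxy only when PROXY_URL is set.
ab_run() {
  if [ -n "${PROXY_URL:-}" ]; then
    agent-browser --proxy "$PROXY_URL" "$@"
  else
    agent-browser "$@"
  fi
}

# Usage:
#   PROXY_URL="http://proxy.example.com:8080" ab_run open https://example.com
```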
    

Profiling and performance tracing

For test automation and performance investigations, agent-browser can capture Chrome performance traces (see references/profiling.md):

# Start profiling
agent-browser profiler start

# Run your scenario
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1
agent-browser wait 1000

# Stop and save trace
agent-browser profiler stop ./trace.json

You can open the resulting trace.json in Chrome DevTools (Performance tab) or compatible viewers to analyze JavaScript execution, rendering, and user timing events.
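
If the scenario fails partway through, the profiler should still be stopped so a trace is written. A minimal sketch, assuming the scenario is wrapped in a shell function; profile_scenario is a hypothetical name.

```shell
# Hypothetical helper (POSIX sh).
#   $1: output trace path; remaining args: the command or function to profile.
# Stops the profiler even if the scenario fails, then returns its exit status.
profile_scenario() {
  local trace_file=$1
  shift
  agent-browser profiler start || return 1
  "$@"
  local scenario_rc=$?
  agent-browser profiler stop "$trace_file"
  return "$scenario_rc"
}

# Usage:
#   my_scenario() { agent-browser open https://example.com; agent-browser wait 1000; }
#   profile_scenario ./trace.json my_scenario
```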

Video recording for debugging and documentation

agent-browser can record a video of the browser session, which is helpful for debugging failing automations or creating how-to guides (see references/video-recording.md):

# Start recording
agent-browser record start ./demo.webm

# Perform actions
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1

# Stop recording
agent-browser record stop

You can embed these .webm recordings in documentation, share them with teammates, or attach them to bug reports.
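
For CI runs it is often enough to keep recordings of failing scenarios only. This sketch records every run and deletes the file when the scenario passes; record_on_failure is a hypothetical name, and it assumes record stop has finished writing the file by the time it returns.

```shell
# Hypothetical helper (POSIX sh).
#   $1: video path (e.g. ./demo.webm); remaining args: command or function to run.
record_on_failure() {
  local video=$1
  shift
  agent-browser record start "$video" || return 1
  "$@"
  local scenario_rc=$?
  agent-browser record stop
  # Passing run: discard the recording to keep artifacts small.
  if [ "$scenario_rc" -eq 0 ]; then
    rm -f "$video"
  fi
  return "$scenario_rc"
}
```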

Using templates for common workflows

The repository includes shell script templates in the templates/ directory to help you build repeatable workflows:

  • templates/form-automation.sh – structured pattern for filling and submitting forms
  • templates/authenticated-session.sh – example for logging in and persisting session state
  • templates/capture-workflow.sh – pattern for snapshotting or recording a multi-step flow

You can copy and adapt these scripts to your own environment, CI jobs, or agent pipelines.

FAQ

What problems does agent-browser solve compared to simple HTTP clients?

agent-browser controls a real Chrome/Chromium instance via CDP. That means it can handle:

  • Client-side rendering and complex JavaScript
  • Single-page apps that depend on browser APIs
  • Real user interactions like clicks, typing, and key presses
  • Visual timing, rendering behavior, and performance traces

If you only need raw HTML or JSON from basic endpoints, an HTTP client might be enough. For anything that behaves like a real user in a browser, agent-browser is more appropriate.

How do I install Chrome or Chromium for agent-browser?

After installing the CLI with npm, Homebrew, or Cargo, run:

agent-browser install

This downloads and configures a compatible Chrome/Chromium build that agent-browser can control via CDP. When a new version is released, update with:

agent-browser upgrade

Can agent-browser reuse my existing logged-in browser session?

Yes. references/authentication.md describes how to start Chrome with --remote-debugging-port and import cookies from a session you are already logged into. Once imported, you can save that authenticated state with agent-browser state save and restore it later without repeating the entire login flow.
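
Before importing cookies, it helps to confirm that the debug-enabled Chrome is actually listening. Chrome serves a /json/version endpoint on its remote debugging port, so a script can poll it. This sketch covers only launching and checking Chrome; the import itself is described in references/authentication.md. wait_for_cdp is a hypothetical name, the Chrome binary name varies by platform, and curl is assumed to be installed.

```shell
# Launch example (binary name varies: google-chrome, chromium, or the full
# path to Chrome on macOS). A separate --user-data-dir keeps this profile
# apart from your everyday one:
#   google-chrome --remote-debugging-port=9222 --user-data-dir="$HOME/chrome-debug" &

# Hypothetical helper (POSIX sh): poll the DevTools HTTP endpoint until it
# answers or the attempt limit is reached.
#   $1: port (default 9222)   $2: attempts, 0.5 s apart (default 20)
wait_for_cdp() {
  local port=${1:-9222} tries=${2:-20} i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS "http://127.0.0.1:${port}/json/version" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 0.5
  done
  return 1
}
```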

Is agent-browser suitable for CI and automated testing?

Yes. agent-browser is a CLI tool that works well in automated environments as long as Chrome/Chromium is available. You can:

  • Run end-to-end flows as part of test suites
  • Capture performance traces during builds
  • Record videos of failing scenarios

For CI, install the CLI with the method that matches your build image (npm, Homebrew, or Cargo), run agent-browser install to set up Chrome/Chromium, then script your flows using shell scripts or your agent framework.
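
A CI smoke test can be as small as opening a page and checking that an expected element appears in the interactive snapshot. A minimal sketch; ci_smoke is a hypothetical name, and because snapshot -i lists only interactive elements, the expected text should be something like a button label.

```shell
# Hypothetical CI check (POSIX sh). Assumes the CLI is installed and
# `agent-browser install` has already run in this job.
#   $1: URL to open   $2: text expected in the interactive snapshot
ci_smoke() {
  local url=$1 expected=$2
  agent-browser open "$url" || return 1
  agent-browser wait --load networkidle
  # Exit status is grep's: 0 if the text is present, non-zero otherwise.
  agent-browser snapshot -i | grep -qF -- "$expected"
}

# Usage:
#   ci_smoke https://example.com/form "Submit" || exit 1
```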

How does agent-browser help AI agents work with complex pages?

Instead of dumping the full DOM, agent-browser provides compact snapshots with short @refs for important elements (links, buttons, inputs, etc.). This substantially reduces token usage and makes it easier for an AI agent to:

  • Understand page structure
  • Select the right element by ref
  • Issue precise click, fill, and press commands

references/snapshot-refs.md explains how refs are generated, when to refresh them, and best practices for robust automation.

Does agent-browser support proxies and corporate networks?

Yes. You can configure HTTP, HTTPS, and SOCKS proxies either through CLI flags (--proxy) or environment variables (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY). references/proxy-support.md covers basic configuration, authenticated proxies, bypass rules, and troubleshooting tips.

Where should I start in the repository to learn more?

For a practical deep dive into agent-browser:

  • Start with SKILL.md for the high-level overview and quick start
  • Read references/commands.md for the full command list and options
  • Check references/authentication.md, references/session-management.md, references/snapshot-refs.md, references/profiling.md, and references/video-recording.md for focused topics
  • Explore the templates/ directory for ready-made workflow scripts that you can adapt to your own use cases
