agent-browser

by inferen-sh

agent-browser lets AI agents control a Playwright-powered browser via inference.sh. Open pages, use @e element refs to click, type, drag, upload files, scrape content, and capture screenshots or video. Ideal for web automation, data extraction, and agent-driven browsing workflows.

Added: Mar 27, 2026

Category: Browser Automation

Install Command

npx skills add https://github.com/inferen-sh/skills --skill agent-browser

Overview

What is agent-browser?

agent-browser is a browser automation skill designed for AI agents running on top of the inference.sh platform. It uses Playwright under the hood and exposes a simple, JSON-based interface so agents can:

Open and navigate web pages in a real browser
Interact with elements using stable @e references
Click, type, drag-and-drop, and upload files
Extract structured data for scraping and research
Capture screenshots and record video of sessions

Instead of hand-writing Playwright code, you call agent-browser through the infsh CLI (or from an agent that can run Bash commands). The skill coordinates the browser session, returns machine-friendly descriptions of the page, and lets your agent drive the interaction step by step.

Who is agent-browser for?

agent-browser is aimed at:

Developers wiring AI agents to real websites
Automation engineers who need repeatable browser workflows
Data and research teams doing targeted web scraping or UI-driven research
Workflow builders using inference.sh as an orchestration layer

It fits best when you already use, or are willing to use, inference.sh and want the browser to be a controlled, agent-accessible tool.

What problems does it solve?

agent-browser helps you solve common browser-automation jobs:

Automating login, navigation, and form workflows
Scraping structured content that requires interaction (search forms, filters, pagination)
Running agent-driven “testing-like” flows on live sites
Recording videos of an automated browsing session for review

It abstracts away direct Playwright scripting and gives the agent a higher-level set of actions using @e element references, which helps keep interactions stable across multiple steps.

When is agent-browser a good fit?

Use agent-browser when:

You run agents via inference.sh and need them to browse the web
You want Playwright-level reliability without writing Playwright code
Your flows consist of opening pages, interacting with elements, and reading results

It may not be a good fit when:

You cannot use the infsh CLI or Bash-like tooling
You need extremely custom Playwright features beyond what the skill exposes
Your use case is purely API-based and does not require a real browser

If you need fine-grained control of browser internals, direct Playwright scripts are the better choice. For typical agent-driven automation, agent-browser provides a simpler, higher-level interface.

How to Use

Prerequisites

Before using agent-browser, make sure you have:

An environment where you can run Bash commands
The inference.sh CLI (infsh) installed
An inference.sh account you can log into from the CLI

The skill’s Quick Start explicitly requires the infsh CLI. You can follow the official CLI install instructions from the repository:

CLI install documentation: https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md

Once infsh is installed and configured, you can invoke the agent-browser skill from your terminal or from any agent tooling that can run infsh commands.

Installation and skill activation

agent-browser is distributed as part of the inferen-sh/skills repository. In most inference.sh–based environments you do not need to install a separate npm package; instead, you make sure the skill is available and then call it via infsh.

Typical setup steps:

Install the inference.sh CLI
- Follow cli-install.md from the repo.
Authenticate
- Run:
```
infsh login
```
- Follow the prompts to authenticate with inference.sh.
Confirm skill availability
- Ensure your inference.sh environment has access to the agent-browser app/skill under tools/utilities/agent-browser in the inferen-sh/skills repository.

If you are integrating with a broader “skills” ecosystem that supports npx skills add, you can also wire this repository as a source, but the canonical flow for agent-browser usage is through infsh app run.

Core browser automation workflow

The skill documentation describes a consistent 4-step pattern:

Open – Start a browser session and navigate to a URL.
Interact – Use returned @e element references to click, type, drag, or upload.
Re-snapshot – Request an updated snapshot to get new @e refs after navigation or DOM changes.
Close – End the session; optionally retrieve a video recording if enabled.

This pattern lets your agent maintain a mental model of the page state. Each call passes JSON input and receives structured JSON output, which you feed into your agent’s reasoning loop.
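The four steps can be routed through one small shell helper so that every call is built the same way. This is only a sketch: `ab_run` is a hypothetical helper name, and the `interact`, `snapshot`, and `close` function names with their inputs are assumptions based on the step names above, so check SKILL.md for the real ones.

```shell
# ab_run is a hypothetical wrapper around "infsh app run agent-browser".
# Arguments: function name, session (or "new"), optional JSON input.
ab_run() {
  local fn="$1" session="$2" input="${3:-}"
  # Default to an empty JSON object when no input is given.
  [ -n "$input" ] || input='{}'
  infsh app run agent-browser --function "$fn" --session "$session" --input "$input"
}

# Usage outline (not executed here; SESSION_ID comes from the open call's output):
#   ab_run open new '{"url": "https://example.com"}'
#   ab_run interact "$SESSION_ID" '{"action": "click", "element": "@e:search-button"}'
#   ab_run snapshot "$SESSION_ID"
#   ab_run close "$SESSION_ID"
```

Keeping the command in one function also gives you a single place to add logging or retries later.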

Quick start example

To try agent-browser with a single page load, follow the Quick Start pattern from the repo:

infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new

What this does:

--function open tells agent-browser to launch a new browser page at the given URL.
--session new creates a new session so later actions can reuse the same browser state.
The skill returns JSON that typically includes element descriptions and @e references your agent can use in follow-up interact calls.
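When the URL comes from a variable rather than a literal, the `--input` JSON has to be built safely. The sketch below shows one way to do that with standard tools. `json_escape` and `open_input` are hypothetical helper names, and the escaping covers only backslashes and double quotes, not control characters such as newlines.

```shell
# json_escape (hypothetical): escape backslashes and double quotes so a
# shell value can be placed inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# open_input (hypothetical): build the {"url": ...} object used by the open call.
open_input() {
  printf '{"url": "%s"}' "$(json_escape "$1")"
}

# Usage outline (not executed here):
#   infsh app run agent-browser --function open --session new \
#     --input "$(open_input "$URL")"
```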

Working with @e element references

A central feature of agent-browser is its use of @e refs. Rather than requiring CSS selectors or XPath, the skill returns handles like @e:button-1 (the exact format depends on the implementation) along with human-readable descriptions.

Your agent then:

Reads the list of available elements and their descriptions.
Chooses the appropriate @e ref (for example, the button labeled “Search”).
Calls an interaction function (such as click or fill) using that @e ref.

This design is optimized for AI agents because they can reason over descriptions rather than low-level DOM details. It also helps keep interactions robust even if the underlying selectors change, as long as descriptions remain interpretable.
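While you are still exploring the output format, it can help to list every @e ref a response contains without committing to a schema. The sketch below does that with plain text matching. `list_refs` is a hypothetical helper, and it assumes refs appear as JSON string values that start with `@e`; once you know the real schema from SKILL.md, a proper JSON parser is the safer choice.

```shell
# list_refs (hypothetical): print each distinct JSON string value that starts
# with "@e", in order of first appearance. Reads the skill output on stdin.
list_refs() {
  grep -oE '"@e[^"]*"' | tr -d '"' | awk '!seen[$0]++'
}

# Usage outline (OPEN_RESULT holds the output of an open call):
#   printf '%s' "$OPEN_RESULT" | list_refs
```

Matching on the opening quote keeps strings such as email addresses from being picked up as refs.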

Example: open, click, and re-snapshot

A typical multi-step flow might look like this (pattern only; adjust to your specific functions):

# 1. Start a session and open a page
OPEN_RESULT=$(infsh app run agent-browser \
  --function open \
  --session new \
  --input '{"url": "https://example.com"}')

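# 2. Store the session ID from the open call so later calls reuse the same browser.
#    Assumption: the output is JSON with a "session_id" field; check the real output. Requires jq.
INF_SH_SESSION=$(printf '%s' "$OPEN_RESULT" | jq -r '.session_id')

# 3. Pick an @e ref from OPEN_RESULT (e.g. @e:search-button) in your agent logic,
#    then interact with that element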
INTERACT_RESULT=$(infsh app run agent-browser \
  --function interact \
  --session "$INF_SH_SESSION" \
  --input '{"action": "click", "element": "@e:search-button"}')

# 4. Re-snapshot after the click to get updated elements
SNAPSHOT_RESULT=$(infsh app run agent-browser \
  --function snapshot \
  --session "$INF_SH_SESSION" \
  --input '{}')

The names of functions beyond open can vary, so always check the latest SKILL.md and any associated docs in tools/utilities/agent-browser for the exact function signatures and input schema.

Screenshots and video recording

agent-browser can capture visual artifacts of the browsing session:

Screenshots – Useful for debugging agent behavior or storing visual confirmations.
Video – When you close the session with recording enabled, the skill can return or link to a video file of the full automated flow.

These features are configured through the skill’s input options. For details on enabling recording and accessing the outputs, consult the SKILL.md definition and any additional docs under tools/utilities/agent-browser in the repo.

Integration tips for agents and workflows

To make the most of agent-browser in your automation or research workflows:

Persist --session IDs: Make sure your agent stores the session identifier between calls so that multiple actions occur in the same browser.
Parse JSON output carefully: Use robust JSON parsing in your agent’s runtime; element lists and metadata can be rich.
Throttle interactions if needed: On slow or dynamic pages, have your agent wait for elements to appear before the next step, using the skill's own wait options where it provides them.
Log key actions: Keep a log of open, interact, and close calls if you need to audit or debug your agent’s browsing behavior.
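The first and last tips can be sketched with a few shell functions that store the session ID in a file and append one line per call to a log. All names here (`SESSION_FILE`, `LOG_FILE`, `save_session`, `load_session`, `log_call`) are hypothetical, and how you obtain the session ID from the skill's output is left to your agent.

```shell
# Hypothetical file locations; override them through the environment.
SESSION_FILE="${SESSION_FILE:-./agent-browser.session}"
LOG_FILE="${LOG_FILE:-./agent-browser.log}"

# Store the session ID so later calls can reuse the same browser.
save_session() { printf '%s\n' "$1" > "$SESSION_FILE"; }

# Print the stored session ID, or fail loudly if none has been saved yet.
load_session() {
  [ -s "$SESSION_FILE" ] || { echo "no stored session" >&2; return 1; }
  cat "$SESSION_FILE"
}

# Append a UTC timestamp, the function name, and the session to the log.
log_call() {
  printf '%s %s session=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >> "$LOG_FILE"
}
```

A typical flow would call `save_session` right after the open call, then `log_call interact "$(load_session)"` before each follow-up call.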

FAQ

What is the relationship between agent-browser, inference.sh, and Playwright?

agent-browser is a skill that runs inside the inference.sh ecosystem. When you invoke it via infsh app run, it uses Playwright as the underlying browser automation engine. You do not call Playwright directly; instead, you work with the higher-level skill functions and @e element references.

How do I install agent-browser?

You do not install agent-browser as a standalone binary or npm package. Instead:

Install the inference.sh CLI (infsh) using the official cli-install.md instructions.
Log in with infsh login.
Ensure your environment has access to the agent-browser skill from the inferen-sh/skills repository (under tools/utilities/agent-browser).

From there, you can immediately invoke the skill via infsh app run agent-browser.

Do I need programming experience to use agent-browser?

Basic familiarity with the command line and JSON is strongly recommended. You do not need to write Playwright scripts, but you should be comfortable:

Running infsh commands
Passing JSON as --input
Parsing JSON output in your agent or scripts

For more advanced workflows (conditional logic, loops, error handling), general scripting or programming knowledge is helpful.
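As an illustration of the kind of scripting meant here, the sketch below retries a command a fixed number of times with a pause between attempts. `retry` is a hypothetical helper, and it assumes the wrapped command reports failure through a nonzero exit status, which this page does not confirm for infsh.

```shell
# retry (hypothetical): run a command up to ATTEMPTS times, sleeping DELAY
# seconds after each failed attempt. Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage outline (not executed here; SESSION_ID comes from an earlier open call):
#   retry 3 2 infsh app run agent-browser --function snapshot \
#     --session "$SESSION_ID" --input '{}'
```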

Can I use agent-browser outside of inference.sh?

The skill is built specifically for use with inference.sh and is described as “Browser automation for AI agents via inference.sh.” The supported and documented way to run it is through the infsh CLI. If you require a standalone library, you may prefer using Playwright directly in your language of choice.

Is agent-browser suitable for large-scale web scraping?

agent-browser can be used for targeted scraping, especially when pages require interaction or JavaScript rendering. For high-volume scraping, however, you should consider:

inference.sh account limits and pricing
Respect for target site terms of service and robots.txt
Performance, concurrency, and rate limiting

For smaller-scale or workflow-specific scraping embedded in an agent, agent-browser is a strong fit. For massive crawling across many sites, a dedicated scraping stack may be more appropriate.
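For the smaller-scale case, basic rate limiting can be as simple as pausing between pages. The sketch below shows that idea; `throttled_each` and `open_url` are hypothetical names, and a suitable delay depends on the target site and your account limits.

```shell
# throttled_each (hypothetical): call COMMAND once per item, pausing DELAY
# seconds between calls (but not before the first one).
# Usage: throttled_each DELAY COMMAND item [item...]
throttled_each() {
  local delay="$1" cmd="$2" first=1 item
  shift 2
  for item in "$@"; do
    [ "$first" -eq 1 ] || sleep "$delay"
    first=0
    "$cmd" "$item"
  done
}

# Usage outline (not executed here; open_url is a hypothetical wrapper):
#   open_url() {
#     infsh app run agent-browser --function open --session new \
#       --input "{\"url\": \"$1\"}"
#   }
#   throttled_each 2 open_url https://example.com/a https://example.com/b
```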

How does session management work?

Session management is controlled via the --session flag when calling infsh app run. A typical pattern is:

--session new when you call open for the first time
Reusing that session ID for subsequent interact and snapshot calls
Calling the appropriate close function to end the session and optionally retrieve video

Always consult the current SKILL.md for the exact options and outputs related to session management.
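One way to make sure a session is always closed, even when a step in the middle fails, is to run the steps in a subshell with an exit trap. The sketch below assumes the closing function is named `close` and accepts an empty JSON input; `with_session_cleanup` is a hypothetical helper name.

```shell
# with_session_cleanup (hypothetical): run a command for a given session and
# always call the skill's close function afterwards, even if the command fails.
# Assumptions: the function is named "close" and accepts '{}' as input.
# Usage: with_session_cleanup SESSION_ID command [args...]
with_session_cleanup() {
  local session="$1"
  shift
  (
    # The trap fires when this subshell exits, whatever the command's status.
    trap 'infsh app run agent-browser --function close --session "$session" --input "{}"' EXIT
    "$@"
  )
}

# Usage outline (not executed here; run_steps is your own function):
#   with_session_cleanup "$SESSION_ID" run_steps
```

The helper returns the exit status of the wrapped command, so a failed step is still visible to the caller after the session has been closed.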

Where can I find the full specification of functions and inputs?

The authoritative reference for agent-browser lives in the repository:

SKILL.md at the root of the inferen-sh/skills repo
The tools/utilities/agent-browser directory for implementation details, examples, and any additional documentation

Open these files to see the current list of functions, expected JSON inputs, and output formats, then model your agent or scripts around those definitions.
