
agent-browser

by inferen-sh

agent-browser lets AI agents control a Playwright-powered browser via inference.sh. Open pages, use @e element refs to click, type, drag, upload files, scrape content, and capture screenshots or video. Ideal for web automation, data extraction, and agent-driven browsing workflows.

Added: Mar 27, 2026
Category: Browser Automation
Install Command
npx skills add https://github.com/inferen-sh/skills --skill agent-browser
Overview

What is agent-browser?

agent-browser is a browser automation skill designed for AI agents running on top of the inference.sh platform. It uses Playwright under the hood and exposes a simple, JSON-based interface so agents can:

  • Open and navigate web pages in a real browser
  • Interact with elements using stable @e references
  • Click, type, drag-and-drop, and upload files
  • Extract structured data for scraping and research
  • Capture screenshots and record video of sessions

Instead of hand-writing Playwright code, you call agent-browser through the infsh CLI (or from an agent that can run Bash commands). The skill coordinates the browser session, returns machine-friendly descriptions of the page, and lets your agent drive the interaction step by step.

Who is agent-browser for?

agent-browser is aimed at:

  • Developers wiring AI agents to real websites
  • Automation engineers who need repeatable browser workflows
  • Data and research teams doing targeted web scraping or UI-driven research
  • Workflow builders using inference.sh as an orchestration layer

It fits best when you already use, or are willing to use, inference.sh and want the browser to be a controlled, agent-accessible tool.

What problems does it solve?

agent-browser handles common browser-automation tasks:

  • Automating login, navigation, and form workflows
  • Scraping structured content that requires interaction (search forms, filters, pagination)
  • Running agent-driven “testing-like” flows on live sites
  • Recording videos of an automated browsing session for review

It abstracts away direct Playwright scripting and gives the agent a higher-level set of actions using @e element references, which helps keep interactions stable across multiple steps.

When is agent-browser a good fit?

Use agent-browser when:

  • You run agents via inference.sh and need them to browse the web
  • You want Playwright-level reliability without writing Playwright code
  • Your flows consist of opening pages, interacting with elements, and reading results

It may not be a good fit when:

  • You cannot use the infsh CLI or Bash-like tooling
  • You need extremely custom Playwright features beyond what the skill exposes
  • Your use case is purely API-based and does not require a real browser

If you need fine-grained control of browser internals, direct Playwright scripts may serve you better. For typical agent-driven automation, agent-browser provides a simpler, higher-level interface.

How to Use

Prerequisites

Before using agent-browser, make sure you have:

  • An environment where you can run Bash commands
  • The inference.sh CLI (infsh) installed
  • An inference.sh account you can log into from the CLI

The skill’s Quick Start explicitly requires the infsh CLI. You can follow the official CLI install instructions from the repository:

  • CLI install documentation: https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md

Once infsh is installed and configured, you can invoke the agent-browser skill from your terminal or from any agent tooling that can run infsh commands.

Installation and skill activation

agent-browser is distributed as part of the inferen-sh/skills repository. In most inference.sh–based environments you do not need to install a separate npm package; instead, you make sure the skill is available and then call it via infsh.

Typical setup steps:

  1. Install the inference.sh CLI
    • Follow cli-install.md from the repo.
  2. Authenticate
    • Run:
      infsh login
      
    • Follow the prompts to authenticate with inference.sh.
  3. Confirm skill availability
    • Ensure your inference.sh environment has access to the agent-browser app/skill under tools/utilities/agent-browser in the inferen-sh/skills repository.

If you are integrating with a broader “skills” ecosystem that supports npx skills add, you can also wire this repository as a source, but the canonical flow for agent-browser usage is through infsh app run.

Core browser automation workflow

The skill documentation describes a consistent 4-step pattern:

  1. Open – Start a browser session and navigate to a URL.
  2. Interact – Use returned @e element references to click, type, drag, or upload.
  3. Re-snapshot – Request an updated snapshot to get new @e refs after navigation or DOM changes.
  4. Close – End the session; optionally retrieve a video recording if enabled.

This pattern lets your agent maintain a mental model of the page state. Each call passes JSON input and receives structured JSON output, which you feed into your agent’s reasoning loop.
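
The four steps can be sketched as a dry-run script that only prints the commands it would run. The interact, snapshot, and close function names follow the ones mentioned in this guide; the run_step helper, the SESSION_ID placeholder, and the input fields are illustrative assumptions, so check SKILL.md before swapping the echo for a real call.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the open -> interact -> re-snapshot -> close pattern.
# Nothing is sent to inference.sh; each step is only printed.

# Hypothetical helper: prints the infsh command for one step.
run_step() {
  local fn="$1" session="$2" input="$3"
  echo "infsh app run agent-browser --function $fn --session $session --input '$input'"
}

# Placeholder; a real run would use the ID returned by the open call.
SESSION_ID="SESSION_ID_FROM_OPEN"

workflow() {
  run_step open new '{"url": "https://example.com"}'
  # Input fields below mirror this guide's pattern and may not match the real schema.
  run_step interact "$SESSION_ID" '{"action": "click", "element": "@e:search-button"}'
  run_step snapshot "$SESSION_ID" '{}'
  run_step close "$SESSION_ID" '{}'
}

workflow
```

Printing the commands first makes it easy to review the sequence an agent intends to run before any browser session is started.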

Quick start example

To see agent-browser in action with a simple one-page open, follow the Quick Start pattern from the repo:

infsh login

# Open a page and get interactive elements
infsh app run agent-browser --function open --input '{"url": "https://example.com"}' --session new

What this does:

  • --function open tells agent-browser to launch a new browser page at the given URL.
  • --session new creates a new session so later actions can reuse the same browser state.
  • The skill returns JSON that typically includes element descriptions and @e references your agent can use in follow-up interact calls.

Working with @e element references

A central feature of agent-browser is its use of @e refs. Rather than requiring CSS selectors or XPath, the skill returns handles like @e:button-1 (the exact format depends on the implementation) along with human-readable descriptions.

Your agent then:

  1. Reads the list of available elements and their descriptions.
  2. Chooses the appropriate @e ref (for example, the button labeled “Search”).
  3. Calls an interaction function (such as click or fill) using that @e ref.

This design suits AI agents because they can reason over descriptions rather than low-level DOM details, and they never have to maintain CSS selectors or XPath themselves. Treat each @e ref as tied to the snapshot that returned it: after navigation or DOM changes, request a new snapshot before interacting again.
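
As a rough illustration of step 2, the sketch below picks a ref by matching on its description. It assumes you have already flattened the skill's JSON output into one "ref<TAB>description" line per element; that layout, the sample refs, and the find_ref helper are all hypothetical and not part of the skill.

```shell
#!/usr/bin/env bash
# Hypothetical flattened element list: one "ref<TAB>description" per line.
ELEMENTS=$(printf '%s\t%s\n' \
  '@e:input-1' 'Text field: Search query' \
  '@e:button-1' 'Button: Search' \
  '@e:link-1' 'Link: About us')

# Print the ref of the first element whose description contains the
# given text (case-insensitive). Prints nothing if there is no match.
find_ref() {
  printf '%s\n' "$ELEMENTS" | awk -F '\t' -v want="$1" '
    index(tolower($2), tolower(want)) { print $1; exit }'
}

find_ref "button: search"
```

Matching on the full label ("button: search") rather than a bare keyword ("search") avoids picking the text field by accident, since the first match wins.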

Example: open, click, and resnapshot

A typical multi-step flow might look like this (pattern only; adjust to your specific functions):

# 1. Start a session and open a page
OPEN_RESULT=$(infsh app run agent-browser \
  --function open \
  --session new \
  --input '{"url": "https://example.com"}')

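# Keep the session ID from the open call so later steps reuse the same browser.
# Assumption: the output is JSON with a top-level "session_id" field and jq is
# installed; check SKILL.md for the actual field name.
INF_SH_SESSION=$(printf '%s' "$OPEN_RESULT" | jq -r '.session_id')

# 2. Use OPEN_RESULT to pick an @e ref (e.g. @e:search-button) in your agent logic
# 3. Interact with that element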
INTERACT_RESULT=$(infsh app run agent-browser \
  --function interact \
  --session "$INF_SH_SESSION" \
  --input '{"action": "click", "element": "@e:search-button"}')

# 4. Re-snapshot after the click to get updated elements
SNAPSHOT_RESULT=$(infsh app run agent-browser \
  --function snapshot \
  --session "$INF_SH_SESSION" \
  --input '{}')

The names of functions beyond open can vary, so always check the latest SKILL.md and any associated docs in tools/utilities/agent-browser for the exact function signatures and input schema.

Screenshots and video recording

agent-browser can capture visual artifacts of the browsing session:

  • Screenshots – Useful for debugging agent behavior or storing visual confirmations.
  • Video – When you close the session with recording enabled, the skill can return or link to a video file of the full automated flow.

These features are configured through the skill’s input options. For details on enabling recording and accessing the outputs, consult the SKILL.md definition and any additional docs under tools/utilities/agent-browser in the repo.

Integration tips for agents and workflows

To make the most of agent-browser in your automation or research workflows:

  • Persist --session IDs: Make sure your agent stores the session identifier between calls so that multiple actions occur in the same browser.
  • Parse JSON output carefully: Use robust JSON parsing in your agent’s runtime; element lists and metadata can be rich.
  • Throttle interactions if needed: On slow or dynamic pages, have your agent wait for elements to appear between steps, using the skill’s timing or wait options where it provides them.
  • Log key actions: Keep a log of open, interact, and close calls if you need to audit or debug your agent’s browsing behavior.
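
A minimal sketch of the first and last tips, assuming a simple file-based store. The file locations, helper names, and the example session value are hypothetical; how the real session ID is obtained depends on the skill's output.

```shell
#!/usr/bin/env bash
# Hypothetical file locations; adjust for your environment.
STATE_DIR="${STATE_DIR:-$(mktemp -d)}"
SESSION_FILE="$STATE_DIR/session_id"
LOG_FILE="$STATE_DIR/actions.log"

# Store the session ID so later calls can reuse the same browser.
save_session() { printf '%s\n' "$1" > "$SESSION_FILE"; }

# Print the stored session ID, or "new" if none has been saved yet.
load_session() {
  if [ -s "$SESSION_FILE" ]; then cat "$SESSION_FILE"; else echo "new"; fi
}

# Append a timestamped record of one call for later auditing.
log_action() {
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" >> "$LOG_FILE"
}

load_session                       # nothing saved yet
save_session "example-session-id"  # placeholder value, not a real ID
log_action open '{"url": "https://example.com"}'
load_session                       # now prints the stored value
```

The value printed by load_session can be passed straight to --session, so the first call starts a new session and later calls reuse it.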

FAQ

What is the relationship between agent-browser, inference.sh, and Playwright?

agent-browser is a skill that runs inside the inference.sh ecosystem. When you invoke it via infsh app run, it uses Playwright as the underlying browser automation engine. You do not call Playwright directly; instead, you work with the higher-level skill functions and @e element references.

How do I install agent-browser?

You do not install agent-browser as a standalone binary or npm package. Instead:

  1. Install the inference.sh CLI (infsh) using the official cli-install.md instructions.
  2. Log in with infsh login.
  3. Ensure your environment has access to the agent-browser skill from the inferen-sh/skills repository (under tools/utilities/agent-browser).

From there, you can immediately invoke the skill via infsh app run agent-browser.

Do I need programming experience to use agent-browser?

Basic command-line and JSON familiarity are strongly recommended. You do not need to write Playwright scripts, but you should be comfortable:

  • Running infsh commands
  • Passing JSON as --input
  • Parsing JSON output in your agent or scripts

For more advanced workflows (conditional logic, loops, error handling), general scripting or programming knowledge is helpful.
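
As one small example of the JSON handling involved, the sketch below builds the --input value for open from a URL held in a shell variable. The json_escape and build_open_input helpers are hypothetical, and the escaping covers only backslashes and double quotes, not control characters.

```shell
#!/usr/bin/env bash
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# This sketch does not handle control characters such as newlines.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the input for the open function from a URL held in a variable.
build_open_input() {
  printf '{"url": "%s"}' "$(json_escape "$1")"
}

build_open_input 'https://example.com/search?q="agent browser"'
```

The result can then be passed as --input "$(build_open_input "$URL")", which avoids hand-editing quotes inside a single-quoted JSON literal.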

Can I use agent-browser outside of inference.sh?

The skill is built specifically for use with inference.sh and is described as “Browser automation for AI agents via inference.sh.” The supported and documented way to run it is through the infsh CLI. If you require a standalone library, you may prefer using Playwright directly in your language of choice.

Is agent-browser suitable for large-scale web scraping?

agent-browser can be used for targeted scraping, especially when pages require interaction or JavaScript rendering. However, for very high-volume scraping at scale, you should consider:

  • inference.sh account limits and pricing
  • Respect for target site terms of service and robots.txt
  • Performance, concurrency, and rate limiting

For smaller-scale or workflow-specific scraping embedded in an agent, agent-browser is a strong fit. For massive crawling across many sites, a dedicated scraping stack may be more appropriate.
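
For small batches, a fixed pause between page visits is a simple way to rate-limit. The sketch below is a dry run that only prints the open commands; the visit_all helper, the DELAY_SECONDS variable, and the example URLs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical delay between page visits, in seconds.
DELAY_SECONDS="${DELAY_SECONDS:-1}"

# Dry run: prints the open command for each URL, pausing between visits.
visit_all() {
  local first=1 url
  for url in "$@"; do
    # Sleep before every visit except the first one.
    [ "$first" -eq 1 ] || sleep "$DELAY_SECONDS"
    first=0
    echo "infsh app run agent-browser --function open --session new --input '{\"url\": \"$url\"}'"
  done
}

visit_all "https://example.com/page/1" "https://example.com/page/2"
```

A fixed delay is the simplest policy; a real workflow may also need to respect the target site's stated limits and your inference.sh account limits.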

How does session management work?

Session management is controlled via the --session flag when calling infsh app run. A typical pattern is:

  • --session new when you call open for the first time
  • Reusing that session ID for subsequent interact and snapshot calls
  • Calling the appropriate close function to end the session and optionally retrieve video

Always consult the current SKILL.md for the exact options and outputs related to session management.
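
One way to make sure a session is always ended, even when a step fails, is a shell EXIT trap. The sketch below is a dry run: close_session only prints the command, and the close function name, its empty input, and the example session value are assumptions to verify against SKILL.md.

```shell
#!/usr/bin/env bash
# Dry run: print the close command instead of calling infsh.
close_session() {
  echo "infsh app run agent-browser --function close --session $1 --input '{}'"
}

# The function body is a subshell, so the EXIT trap fires when the flow
# finishes or fails, without affecting the calling shell.
run_flow() (
  session="example-session-id"   # placeholder, not a real ID
  trap 'close_session "$session"' EXIT
  echo "step 1 ok"
  # Simulate a failing step.
  false || { echo "step 2 failed" >&2; exit 1; }
  echo "never reached"
)

run_flow || echo "flow exited with status $?"
```

Because the trap runs on every exit path, the close call is issued whether the flow completes or stops early.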

Where can I find the full specification of functions and inputs?

The authoritative reference for agent-browser lives in the repository:

  • SKILL.md at the root of the inferen-sh/skills repo
  • The tools/utilities/agent-browser directory for implementation details, examples, and any additional documentation

Open these files to see the current list of functions, expected JSON inputs, and output formats, then model your agent or scripts around those definitions.
