
firecrawl-agent

by firecrawl

firecrawl-agent helps extract structured JSON from complex, multi-page websites. Learn when to use it, how to run the Firecrawl CLI agent, add schemas, set starting URLs, and save outputs for pricing, products, and directory-style data extraction.

Stars: 234 · Favorites: 0 · Comments: 0
Added: Mar 31, 2026
Category: Web Scraping
Install Command
npx skills add firecrawl/cli --skill firecrawl-agent
Curation Score

This skill scores 76/100, which means it is a solid directory listing candidate: agents get clear triggers, example commands, and a concrete output model for autonomous structured website extraction, though adopters should still expect some operational guesswork beyond the basics.

Strengths
  • Strong triggerability: the description names explicit use cases like extracting pricing, product listings, directory entries, and JSON-schema-driven website extraction.
  • Good operational starting point: Quick start examples show real `firecrawl agent` commands with `--wait`, `--schema`, `--urls`, and output files.
  • Meaningful agent leverage: it clearly positions the skill as more capable than simple scraping for multi-page structured extraction.
Cautions
  • Install and setup clarity is limited: SKILL.md has no install command and no linked support files or references for prerequisites.
  • Evidence of deeper workflow guidance is thin: the repository preview shows only one SKILL.md file, with limited constraints and no scripts, rules, or troubleshooting assets.
Overview

Overview of firecrawl-agent skill

What firecrawl-agent does

The firecrawl-agent skill is for autonomous web data extraction when a normal single-page scrape is not enough. It is designed to navigate a site, decide where relevant information lives, and return structured JSON, especially for jobs like pricing tables, product catalogs, directory entries, and feature lists.

Best fit users

This firecrawl-agent skill is best for people who need usable data rather than raw HTML: operators building datasets, analysts collecting competitor or market information, developers feeding downstream automations, and AI users who want multi-page extraction with a schema instead of ad hoc copy-paste.

The real job to be done

Most users are not looking for “web scraping” in the abstract. They want to answer concrete questions such as:

  • extract all pricing tiers from a SaaS site
  • collect product names and prices across many pages
  • turn a directory into JSON records
  • gather structured facts without hand-mapping every URL

That is where firecrawl-agent is meaningfully different from a generic prompt.

Why choose firecrawl-agent over a plain prompt

A normal model prompt can suggest selectors or summarize visible content, but it usually does not provide a robust autonomous extraction workflow across multiple pages. firecrawl-agent is built around that exact use case: give it an extraction goal, optionally give it a schema, and let it navigate and return machine-usable output.

Key tradeoff to know before installing

The upside is less manual page-by-page work. The tradeoff is runtime: the agent can take a few minutes, and output quality depends heavily on how clearly you define the target fields and scope. If your need is only “grab one page quickly,” this may be more than you need.

How to Use firecrawl-agent skill

Install context for firecrawl-agent

The upstream skill permits running firecrawl through Bash, including `firecrawl agent` and `npx firecrawl`. If you are installing it into a skills-based environment, use:

npx skills add https://github.com/firecrawl/cli --skill firecrawl-agent

In practice, you also need the Firecrawl CLI available in your environment and whatever authentication or setup that CLI requires.

Read this file first

Start with skills/firecrawl-agent/SKILL.md. In this repository, that file contains nearly all of the practical guidance. There are no obvious supporting rules/, resources/, or helper scripts for this skill, so your install decision should mostly hinge on whether the examples and CLI options match your workflow.

Understand the main invocation pattern

The core firecrawl-agent usage pattern is simple:

  1. describe the extraction goal
  2. optionally provide a schema
  3. optionally constrain with starting URLs
  4. wait for the job to finish
  5. save JSON output to a file

Typical examples from the skill:

firecrawl agent "extract all pricing tiers" --wait -o .firecrawl/pricing.json
firecrawl agent "extract products" --schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}' --wait -o .firecrawl/products.json
firecrawl agent "get feature list" --urls "<url>" --wait -o .firecrawl/features.json

What input the skill needs

The firecrawl-agent skill works best when you provide three things clearly:

  • the extraction objective
  • the site or starting URLs
  • the output shape you want

Weak input:

  • “scrape this site”

Stronger input:

  • “Extract all pricing tiers from https://example.com/pricing and related plan pages. Return plan name, monthly price, annual price, included seats, and top features as JSON.”

Best input:

  • “Starting from https://example.com/pricing, extract every current pricing tier visible on the site. Return JSON with plans[] containing name, billing_period, price, currency, seat_limit, features[], and source_url. Ignore blog pages, docs, and historical changelog content.”

When to use a schema

Use --schema when your output must feed code, spreadsheets, validation, or repeatable workflows. A schema matters most when:

  • field names must stay stable
  • you need typed values like numbers or arrays
  • you want fewer ambiguous summaries
  • you plan to compare outputs across runs or sites

Without a schema, the agent may still work well, but results can be less predictable for downstream automation.
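Since `--schema` takes the JSON Schema inline as a single shell argument, it is easy to break the command with nested quotes. One way to avoid that is to define the schema in Python and serialize it; a minimal sketch, where the field names are illustrative examples rather than anything mandated by the skill:

```python
import json
import shlex

# Example product schema, in the same spirit as the --schema usage above.
# Field names are illustrative; adjust them to your extraction target.
schema = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": ["number", "null"]},
                    "currency": {"type": "string"},
                    "source_url": {"type": "string"},
                },
                "required": ["name", "source_url"],
            },
        }
    },
}

# Serialize once and quote for the shell, so the embedded double quotes
# and braces survive intact on the command line.
arg = shlex.quote(json.dumps(schema))
print(f'firecrawl agent "extract products" --schema {arg} --wait -o .firecrawl/products.json')
```

Paste or pipe the printed command into your shell; the same approach works for any schema shape you need.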

How to turn a rough goal into a good prompt

A strong firecrawl-agent prompt usually includes:

  • target entity type: plans, products, listings, locations
  • coverage rule: all current items, not examples
  • exclusions: ignore docs, blog, careers, changelog
  • normalization: return prices as numbers, one record per item
  • provenance: include source_url
  • edge-case policy: if a field is missing, return null

Example:

firecrawl agent "Extract all products from the site. Return JSON with products[] containing name, price, currency, short_description, category, availability, and source_url. Only include live product pages. Ignore blog, support, and policy pages. If price is missing, use null." --urls "https://example.com" --wait -o .firecrawl/products.json

Use starting URLs to reduce drift

If you give no URLs, the agent has more room to decide where to explore. That can be useful, but it also increases the chance of wasted navigation. For better precision, seed likely entry points such as:

  • pricing pages
  • product category pages
  • company directories
  • marketplace listings

Seeding likely entry points is one of the highest-leverage improvements you can make for real-world results.

Suggested workflow for reliable extraction

A practical workflow:

  1. run a narrow test on one likely source page
  2. inspect the JSON for missing or merged fields
  3. add a schema and exclusions
  4. expand to broader starting URLs
  5. save outputs to a dedicated folder such as .firecrawl/
  6. validate counts and spot-check source pages

This workflow is faster than starting broad and debugging a noisy result set.
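Step 2 of the workflow (inspecting the JSON for missing or merged fields) can be scripted instead of eyeballed. A minimal sketch, assuming the output file holds a JSON object with a top-level list of records such as `plans` or `products` (adjust the key and field names to your own schema):

```python
import json

def field_coverage(path, list_key, fields):
    """Report how many records in a firecrawl output file have each field populated."""
    with open(path) as f:
        records = json.load(f).get(list_key, [])
    report = {}
    for field in fields:
        present = sum(1 for r in records if r.get(field) not in (None, "", []))
        report[field] = (present, len(records))
    return report

# Example (hypothetical paths and fields):
# field_coverage(".firecrawl/pricing.json", "plans",
#                ["name", "price", "currency", "source_url"])
```

A field with low coverage is a signal to tighten the schema or the prompt before expanding to more starting URLs.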

Output handling and file strategy

Use -o to write results to a predictable path. This matters because autonomous extraction jobs are easier to evaluate when outputs are versioned or compared over time. Good examples:

  • .firecrawl/pricing.json
  • .firecrawl/products.json
  • .firecrawl/directory.json

If you are iterating, keep each run's purpose obvious in the filename rather than constantly overwriting a generic output.json.
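One simple way to keep run purposes obvious is to stamp the purpose and date into the filename. A small sketch following the `.firecrawl/` folder convention from the examples above (the helper name is ours, not part of the skill):

```python
from datetime import date
from pathlib import Path

def run_output_path(purpose, folder=".firecrawl"):
    """Build a dated, purpose-named output path like .firecrawl/pricing-2026-03-31.json."""
    Path(folder).mkdir(exist_ok=True)
    return Path(folder) / f"{purpose}-{date.today().isoformat()}.json"

# Usage: firecrawl agent "extract all pricing tiers" --wait -o {run_output_path("pricing")}
```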

Practical fit: what it is great at

firecrawl-agent is strongest when:

  • the target data spans multiple pages
  • the site structure is not fully known in advance
  • you need structured JSON, not prose
  • hand-authored scraping rules would take longer than the extraction task justifies

Practical misfit: when not to use it

Skip firecrawl-agent if:

  • you only need one page summarized
  • exact deterministic selectors are required for compliance-heavy workflows
  • you already have a stable scraper for a well-known page structure
  • the website is highly interactive, gated, or dependent on session-specific flows not supported in your environment

firecrawl-agent skill FAQ

Is firecrawl-agent good for beginners?

Yes, if you can already use a CLI and think in terms of output fields. The basic examples are approachable. The main beginner hurdle is not installation syntax; it is knowing how to specify a complete extraction target instead of asking vaguely.

What makes firecrawl-agent different from ordinary AI prompting?

Ordinary prompts often stop at analysis or ad hoc page content. firecrawl-agent usage is built around autonomous site navigation plus structured extraction. That combination is the reason to use the skill rather than a generic “summarize this website” request.

Do I always need a JSON schema?

No. For exploratory work, a plain extraction request may be enough. But if you need consistency across runs, automation, or clean typed fields, a schema is usually worth the extra minute.

How long does firecrawl-agent take?

The skill notes that autonomous extraction can take around 2 to 5 minutes. Expect longer jobs than a simple single-page scrape, especially when the site has many relevant pages.

Can firecrawl-agent extract pricing, products, or directories?

Yes. Those are exactly the examples the skill is positioned for: pricing tiers, product listings, directory-style entries, and other structured records spread across a website.

Is firecrawl-agent the right choice for every scraping job?

No. If the task is trivial, deterministic, or already covered by a conventional scraper, this skill may be unnecessary. It is most valuable when discovery and navigation are part of the problem.

How to Improve firecrawl-agent skill

Give firecrawl-agent a clearer extraction contract

The biggest quality jump usually comes from upgrading the prompt from “extract data” to a contract with:

  • exact fields
  • inclusion rules
  • exclusion rules
  • null handling
  • source URL capture

That reduces hallucinated structure and makes results easier to trust.

Constrain scope before you expand it

Many poor runs come from starting at the domain root with a loose goal. Improve output by beginning with one or two high-signal URLs, confirming field quality, then broadening coverage only after the schema and prompt are working.

Ask for provenance in every record

If you want to review or debug results, ask for source_url per item. This single field makes the review workflow much easier because you can quickly verify whether extracted records came from the right pages.

Normalize fields that commonly vary

Tell the agent how to handle messy real-world variations:

  • numbers vs strings for price
  • monthly vs annual billing
  • arrays for feature lists
  • null for missing fields
  • one record per product or plan

These instructions materially improve machine-readability.
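Even with good normalization instructions, prices sometimes come back as text fragments like "$29/mo". A small post-processing helper is a cheap safety net; this is a sketch of one possible approach, not part of the skill itself:

```python
import re

def normalize_price(value):
    """Coerce a price the agent returned as text into a float, or None if nothing parses."""
    if isinstance(value, (int, float)):
        return float(value)
    if not isinstance(value, str):
        return None
    # Grab the first numeric token, allowing thousands separators and decimals.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", value)
    if not match:
        return None
    return float(match.group(0).replace(",", ""))
```

Run it over the extracted records before loading them into spreadsheets or downstream code.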

Watch for common failure modes

Typical issues include:

  • mixed page types in one dataset
  • duplicate records from variant pages
  • feature summaries merged into one blob
  • prices captured as text fragments instead of numeric values
  • partial site coverage because the starting point was too broad or too weak

Most of these are fixed by stronger scope and schema design, not by rerunning the same vague command.
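For the duplicate-records failure mode specifically, a quick dedupe pass over the output catches variants that slipped through. A minimal sketch, keying on a normalized name field (swap in whatever uniquely identifies your records):

```python
def dedupe(records, key="name"):
    """Drop records whose key field matches an earlier record, ignoring case and whitespace."""
    seen = set()
    unique = []
    for record in records:
        k = str(record.get(key, "")).strip().lower()
        if k and k not in seen:
            seen.add(k)
            unique.append(record)
    return unique
```

If duplicates persist after deduping, that usually points back to scope: the agent is visiting variant pages (tracking parameters, mirrored categories) that exclusion rules should rule out.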

Iterate based on output defects, not just missing volume

If the first run is wrong, do not only ask for “more pages.” First identify the defect:

  • wrong fields
  • wrong page classes
  • duplicates
  • missing normalization
  • incomplete coverage

Then revise the prompt directly around that defect. This is the fastest way to improve firecrawl-agent results.

A strong revision pattern

A useful second-pass prompt pattern is:

  • keep the same goal
  • add exclusions
  • tighten field definitions
  • request provenance
  • define how to handle missing values

Example revision:

  • first run: “extract all pricing tiers”
  • second run: “Extract all current pricing tiers from pricing and plan pages only. Ignore docs, blog, changelog, and legacy pages. Return plans[] with name, price, currency, billing_period, features[], and source_url. Use null when a field is not present.”

Improve install decisions by checking one thing first

Before adopting the firecrawl-agent skill, ask whether your real bottleneck is navigation discovery or extraction formatting. If it is navigation discovery across multi-page sites, this skill is a strong fit. If not, a simpler scrape or one-page extraction tool may be faster and easier to maintain.
