
firecrawl-agent

by firecrawl

firecrawl-agent helps extract structured JSON from complex, multi-page websites. Learn when to use it, how to run the Firecrawl CLI agent, add schemas, set starting URLs, and save outputs for pricing, products, and directory-style data extraction.

Stars: 234 · Favorites: 0 · Comments: 0
Added: Mar 31, 2026
Category: Web Scraping
Install Command
npx skills add firecrawl/cli --skill firecrawl-agent
Curation Score

This skill scores 76/100, which means it is a solid directory listing candidate: agents get clear triggers, example commands, and a concrete output model for autonomous structured website extraction, though adopters should still expect some operational guesswork beyond the basics.

Strengths
  • Strong triggerability: the description names explicit use cases like extracting pricing, product listings, directory entries, and JSON-schema-driven website extraction.
  • Good operational starting point: Quick start examples show real `firecrawl agent` commands with `--wait`, `--schema`, `--urls`, and output files.
  • Meaningful agent leverage: it clearly positions the skill as more capable than simple scraping for multi-page structured extraction.
Cautions
  • Install and setup clarity is limited: SKILL.md has no install command and no linked support files or references for prerequisites.
  • Evidence of deeper workflow guidance is thin: the repository preview shows only one SKILL.md file, with limited constraints and no scripts, rules, or troubleshooting assets.
Overview

Overview of firecrawl-agent skill

What firecrawl-agent does

The firecrawl-agent skill is for autonomous web data extraction when a normal single-page scrape is not enough. It is designed to navigate a site, decide where relevant information lives, and return structured JSON, especially for jobs like pricing tables, product catalogs, directory entries, and feature lists.

Best fit users

This firecrawl-agent skill is best for people who need usable data rather than raw HTML: operators building datasets, analysts collecting competitor or market information, developers feeding downstream automations, and AI users who want multi-page extraction with a schema instead of ad hoc copy-paste.

The real job to be done

Most users are not looking for “web scraping” in the abstract. They want to answer concrete questions such as:

  • extract all pricing tiers from a SaaS site
  • collect product names and prices across many pages
  • turn a directory into JSON records
  • gather structured facts without hand-mapping every URL

That is where firecrawl-agent is meaningfully different from a generic prompt.

Why choose firecrawl-agent over a plain prompt

A normal model prompt can suggest selectors or summarize visible content, but it usually does not provide a robust autonomous extraction workflow across multiple pages. firecrawl-agent is built around that exact use case: give it an extraction goal, optionally give it a schema, and let it navigate and return machine-usable output.

Key tradeoff to know before installing

The upside is less manual page-by-page work. The tradeoff is runtime: the agent can take a few minutes, and output quality depends heavily on how clearly you define the target fields and scope. If your need is only “grab one page quickly,” this may be more than you need.

How to Use firecrawl-agent skill

Install context for firecrawl-agent

The upstream skill permits running firecrawl through Bash, including `firecrawl agent` and `npx firecrawl`. If you are installing it into a skills-based environment, use:

npx skills add https://github.com/firecrawl/cli --skill firecrawl-agent

In practice, you also need the Firecrawl CLI available in your environment and whatever authentication or setup that CLI requires.

Read this file first

Start with skills/firecrawl-agent/SKILL.md. In this repository, that file contains nearly all of the practical guidance. There are no obvious supporting rules/, resources/, or helper scripts for this skill, so your install decision should mostly hinge on whether the examples and CLI options match your workflow.

Understand the main invocation pattern

The core firecrawl-agent usage pattern is simple:

  1. describe the extraction goal
  2. optionally provide a schema
  3. optionally constrain with starting URLs
  4. wait for the job to finish
  5. save JSON output to a file

Typical examples from the skill:

firecrawl agent "extract all pricing tiers" --wait -o .firecrawl/pricing.json
firecrawl agent "extract products" --schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}' --wait -o .firecrawl/products.json
firecrawl agent "get feature list" --urls "<url>" --wait -o .firecrawl/features.json

What input the skill needs

The firecrawl-agent skill works best when you provide three things clearly:

  • the extraction objective
  • the site or starting URLs
  • the output shape you want

Weak input:

  • “scrape this site”

Stronger input:

  • “Extract all pricing tiers from https://example.com/pricing and related plan pages. Return plan name, monthly price, annual price, included seats, and top features as JSON.”

Best input:

  • “Starting from https://example.com/pricing, extract every current pricing tier visible on the site. Return JSON with plans[] containing name, billing_period, price, currency, seat_limit, features[], and source_url. Ignore blog pages, docs, and historical changelog content.”

When to use a schema

Use --schema when your output must feed code, spreadsheets, validation, or repeatable workflows. A schema matters most when:

  • field names must stay stable
  • you need typed values like numbers or arrays
  • you want fewer ambiguous summaries
  • you plan to compare outputs across runs or sites

Without a schema, the agent may still work well, but results can be less predictable for downstream automation.
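Since `--schema` takes the JSON Schema inline as a single shell argument, it is easy to break the command with nested quotes. One way to avoid that is to define the schema in Python and serialize it; a minimal sketch, where the field names are illustrative examples rather than anything mandated by the skill:

```python
import json
import shlex

# Example product schema, in the same spirit as the --schema usage above.
# Field names are illustrative; adjust them to your extraction target.
schema = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": ["number", "null"]},
                    "currency": {"type": "string"},
                    "source_url": {"type": "string"},
                },
                "required": ["name", "source_url"],
            },
        }
    },
}

# Serialize once and quote for the shell, so the embedded double quotes
# and braces survive intact on the command line.
arg = shlex.quote(json.dumps(schema))
print(f'firecrawl agent "extract products" --schema {arg} --wait -o .firecrawl/products.json')
```

Paste or pipe the printed command into your shell; the same approach works for any schema shape you need.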

How to turn a rough goal into a good prompt

A strong firecrawl-agent prompt usually includes:

  • target entity type: plans, products, listings, locations
  • coverage rule: all current items, not examples
  • exclusions: ignore docs, blog, careers, changelog
  • normalization: return prices as numbers, one record per item
  • provenance: include source_url
  • edge-case policy: if a field is missing, return null

Example:

firecrawl agent "Extract all products from the site. Return JSON with products[] containing name, price, currency, short_description, category, availability, and source_url. Only include live product pages. Ignore blog, support, and policy pages. If price is missing, use null." --urls "https://example.com" --wait -o .firecrawl/products.json

Use starting URLs to reduce drift

If you give no URLs, the agent has more room to decide where to explore. That can be useful, but it also increases the chance of wasted navigation. For better precision, seed likely entry points such as:

  • pricing pages
  • product category pages
  • company directories
  • marketplace listings

Seeding likely entry points is one of the highest-leverage improvements you can make for real-world results.

Suggested workflow for reliable extraction

A practical workflow:

  1. run a narrow test on one likely source page
  2. inspect the JSON for missing or merged fields
  3. add a schema and exclusions
  4. expand to broader starting URLs
  5. save outputs to a dedicated folder such as .firecrawl/
  6. validate counts and spot-check source pages

This workflow is faster than starting broad and debugging a noisy result set.
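Step 2 of the workflow (inspecting the JSON for missing or merged fields) can be scripted instead of eyeballed. A minimal sketch, assuming the output file holds a JSON object with a top-level list of records such as `plans` or `products` (adjust the key and field names to your own schema):

```python
import json

def field_coverage(path, list_key, fields):
    """Report how many records in a firecrawl output file have each field populated."""
    with open(path) as f:
        records = json.load(f).get(list_key, [])
    report = {}
    for field in fields:
        present = sum(1 for r in records if r.get(field) not in (None, "", []))
        report[field] = (present, len(records))
    return report

# Example (hypothetical paths and fields):
# field_coverage(".firecrawl/pricing.json", "plans",
#                ["name", "price", "currency", "source_url"])
```

A field with low coverage is a signal to tighten the schema or the prompt before expanding to more starting URLs.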

Output handling and file strategy

Use -o to write results to a predictable path. This matters because autonomous extraction jobs are easier to evaluate when outputs are versioned or compared over time. Good examples:

  • .firecrawl/pricing.json
  • .firecrawl/products.json
  • .firecrawl/directory.json

If you are iterating, keep each run's purpose obvious in the filename rather than constantly overwriting a generic output.json.
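One simple way to keep run purposes obvious is to stamp the purpose and date into the filename. A small sketch following the `.firecrawl/` folder convention from the examples above (the helper name is ours, not part of the skill):

```python
from datetime import date
from pathlib import Path

def run_output_path(purpose, folder=".firecrawl"):
    """Build a dated, purpose-named output path like .firecrawl/pricing-2026-03-31.json."""
    Path(folder).mkdir(exist_ok=True)
    return Path(folder) / f"{purpose}-{date.today().isoformat()}.json"

# Usage: firecrawl agent "extract all pricing tiers" --wait -o {run_output_path("pricing")}
```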

Practical fit: what it is great at

firecrawl-agent is strongest when:

  • the target data spans multiple pages
  • the site structure is not fully known in advance
  • you need structured JSON, not prose
  • hand-authored scraping rules would take longer than the extraction task justifies

Practical misfit: when not to use it

Skip firecrawl-agent if:

  • you only need one page summarized
  • exact deterministic selectors are required for compliance-heavy workflows
  • you already have a stable scraper for a well-known page structure
  • the website is highly interactive, gated, or dependent on session-specific flows not supported in your environment

firecrawl-agent skill FAQ

Is firecrawl-agent good for beginners?

Yes, if you can already use a CLI and think in terms of output fields. The basic examples are approachable. The main beginner hurdle is not installation syntax; it is knowing how to specify a complete extraction target instead of asking vaguely.

What makes firecrawl-agent different from ordinary AI prompting?

Ordinary prompts often stop at analysis or ad hoc page content. firecrawl-agent usage is built around autonomous site navigation plus structured extraction. That combination is the reason to use the skill rather than a generic “summarize this website” request.

Do I always need a JSON schema?

No. For exploratory work, a plain extraction request may be enough. But if you need consistency across runs, automation, or clean typed fields, a schema is usually worth the extra minute.

How long does firecrawl-agent take?

The skill notes that autonomous extraction can take around 2 to 5 minutes. Expect longer jobs than a simple single-page scrape, especially when the site has many relevant pages.

Can firecrawl-agent extract pricing, products, or directories?

Yes. Those are exactly the examples the skill is positioned for: pricing tiers, product listings, directory-style entries, and other structured records spread across a website.

Is firecrawl-agent the right choice for every scraping job?

No. If the task is trivial, deterministic, or already covered by a conventional scraper, this skill may be unnecessary. It is most valuable when discovery and navigation are part of the problem.

How to Improve firecrawl-agent skill

Give firecrawl-agent a clearer extraction contract

The biggest quality jump usually comes from upgrading the prompt from “extract data” to a contract with:

  • exact fields
  • inclusion rules
  • exclusion rules
  • null handling
  • source URL capture

That reduces hallucinated structure and makes results easier to trust.

Constrain scope before you expand it

Many poor runs come from starting at the domain root with a loose goal. Improve output by beginning with one or two high-signal URLs, confirming field quality, then broadening coverage only after the schema and prompt are working.

Ask for provenance in every record

If you want to review or debug results, ask for source_url per item. This single field makes the review workflow much easier because you can quickly verify whether extracted records came from the right pages.

Normalize fields that commonly vary

Tell the agent how to handle messy real-world variations:

  • numbers vs strings for price
  • monthly vs annual billing
  • arrays for feature lists
  • null for missing fields
  • one record per product or plan

These instructions materially improve machine-readability.
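Even with good normalization instructions, prices sometimes come back as text fragments like "$29/mo". A small post-processing helper is a cheap safety net; this is a sketch of one possible approach, not part of the skill itself:

```python
import re

def normalize_price(value):
    """Coerce a price the agent returned as text into a float, or None if nothing parses."""
    if isinstance(value, (int, float)):
        return float(value)
    if not isinstance(value, str):
        return None
    # Grab the first numeric token, allowing thousands separators and decimals.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", value)
    if not match:
        return None
    return float(match.group(0).replace(",", ""))
```

Run it over the extracted records before loading them into spreadsheets or downstream code.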

Watch for common failure modes

Typical issues include:

  • mixed page types in one dataset
  • duplicate records from variant pages
  • feature summaries merged into one blob
  • prices captured as text fragments instead of numeric values
  • partial site coverage because the starting point was too broad or too weak

Most of these are fixed by stronger scope and schema design, not by rerunning the same vague command.
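For the duplicate-records failure mode specifically, a quick dedupe pass over the output catches variants that slipped through. A minimal sketch, keying on a normalized name field (swap in whatever uniquely identifies your records):

```python
def dedupe(records, key="name"):
    """Drop records whose key field matches an earlier record, ignoring case and whitespace."""
    seen = set()
    unique = []
    for record in records:
        k = str(record.get(key, "")).strip().lower()
        if k and k not in seen:
            seen.add(k)
            unique.append(record)
    return unique
```

If duplicates persist after deduping, that usually points back to scope: the agent is visiting variant pages (tracking parameters, mirrored categories) that exclusion rules should rule out.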

Iterate based on output defects, not just missing volume

If the first run is wrong, do not only ask for “more pages.” First identify the defect:

  • wrong fields
  • wrong page classes
  • duplicates
  • missing normalization
  • incomplete coverage

Then revise the prompt directly around that defect. This is the fastest way to improve firecrawl-agent results.

A strong revision pattern

A useful second-pass prompt pattern is:

  • keep the same goal
  • add exclusions
  • tighten field definitions
  • request provenance
  • define how to handle missing values

Example revision:

  • first run: “extract all pricing tiers”
  • second run: “Extract all current pricing tiers from pricing and plan pages only. Ignore docs, blog, changelog, and legacy pages. Return plans[] with name, price, currency, billing_period, features[], and source_url. Use null when a field is not present.”

Improve install decisions by checking one thing first

Before adopting the firecrawl-agent skill, ask whether your real bottleneck is navigation discovery or extraction formatting. If it is navigation discovery across multi-page sites, this skill is a strong fit. If not, a simpler scrape or one-page extraction tool may be faster and easier to maintain.
