firecrawl-crawl
by firecrawl

firecrawl-crawl helps agents bulk extract content from a website or docs section with path filters, depth limits, page caps, wait mode, and job status checks.
This skill scores 74/100, which means it is listable and likely useful for agents that need site-wide or section-wide content extraction, but directory users should expect a fairly command-focused guide rather than a deeply supported workflow package. The repository evidence shows strong trigger cues and practical CLI examples for crawling with limits, depth, and path filters, which gives agents more reliable execution guidance than a generic prompt.
- Strong triggerability: the description explicitly names crawl-style intents like "get all the pages," "/docs," and "bulk extract."
- Operationally usable: SKILL.md includes concrete `firecrawl crawl` examples for section crawling, depth-limited crawling, and checking a running crawl job.
- Good agent leverage for a common workflow: it documents key controls like `--include-paths`, `--limit`, `--max-depth`, `--wait`, and `--progress` for bulk extraction tasks.
- Limited install-decision context: there is no install command in SKILL.md and no support files, references, or metadata to help users assess setup requirements.
- Workflow depth appears modest: structural signals show workflow examples, but little evidence of constraints, edge-case handling, or troubleshooting guidance.
Overview of firecrawl-crawl skill
What firecrawl-crawl does
The firecrawl-crawl skill is for bulk website extraction, not single-page scraping. It helps an agent crawl a site or a specific section, follow links, and return content from many pages in one job. If your goal is “get all docs pages,” “extract everything under /docs,” or “crawl this help center up to depth 3,” this is the right tool.
Who should use firecrawl-crawl
The best fit for firecrawl-crawl is anyone doing multi-page content collection for documentation analysis, migration, indexing, QA, research, or knowledge ingestion. It is especially useful when a normal prompt would be too manual because the target content spans dozens of linked pages on the same domain.
The real job-to-be-done
Users adopt firecrawl-crawl when they need coverage, not just accuracy on one URL. The main job is to define a crawl boundary clearly enough that the tool gathers the right pages without wasting time on irrelevant sections, duplicates, or the entire public site.
What makes this skill different
The main differentiators are practical crawl controls: path filtering, depth limits, page limits, asynchronous job handling, and optional wait/progress behavior. That makes firecrawl-crawl more operational than a generic “scrape this site” instruction.
When this skill is a strong match
Use the firecrawl-crawl skill when:
- you need many pages from one site
- pages are discoverable through internal links
- you want to constrain scope with /docs, /blog, or similar paths
- you need a repeatable crawl command rather than ad hoc prompting
When not to use it
Do not start with firecrawl-crawl if you only need one page, need a URL inventory first, or are still unsure which section matters. In those cases, simpler search, scrape, or map steps are usually better before escalating to crawl.
How to Use firecrawl-crawl skill
Install context for firecrawl-crawl
This skill lives in the firecrawl/cli skill set and is meant to be invoked through the Firecrawl CLI tooling. If your environment supports Skills, the practical install pattern is:
npx skills add https://github.com/firecrawl/cli --skill firecrawl-crawl
You also need the Firecrawl CLI available so the agent can run commands such as `firecrawl crawl` or `npx firecrawl crawl`.
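Before the first run, it can help to confirm the CLI is actually reachable. A minimal sketch; the `npx` fallback is an assumption about your environment, not a documented requirement:

```shell
# Availability check sketch: confirm the Firecrawl CLI can be invoked at all.
if command -v firecrawl >/dev/null 2>&1; then
  FIRECRAWL_CMD="firecrawl"
else
  # Fall back to npx invocation if the binary is not on PATH (assumption).
  FIRECRAWL_CMD="npx firecrawl"
fi
echo "using: $FIRECRAWL_CMD crawl"
```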
Read this file first
Start with skills/firecrawl-crawl/SKILL.md. For this skill, that file contains most of the operational value: when to use it, quick-start commands, and the key options that control crawl scope and runtime behavior.
Core command patterns
The repository shows three key firecrawl-crawl usage patterns:
# Crawl a docs section
firecrawl crawl "<url>" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json
# Full crawl with depth limit
firecrawl crawl "<url>" --max-depth 3 --wait --progress -o .firecrawl/crawl.json
# Check status of a running crawl
firecrawl crawl <job-id>
These cover most real workflows: constrained section crawl, broader site crawl with depth control, and polling an existing job.
Inputs that matter most
To get good results from firecrawl-crawl, provide:
- a clean starting URL
- the intended site section, if any
- a sensible page cap with `--limit`
- a depth cap with `--max-depth` when the site is broad
- whether you want synchronous completion via `--wait`
- an output path so results are easy to inspect later
The biggest quality lever is crawl scope. A good boundary usually matters more than any downstream processing.
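The inputs above can be collected into a single command. A sketch that assembles (but only prints) the call, with placeholder values for the URL, section, and caps:

```shell
# Sketch: assemble a scoped crawl command from the inputs listed above.
# URL, SECTION, LIMIT, and DEPTH are placeholders for your actual target.
URL="https://example.com"
SECTION="/docs"
LIMIT=50
DEPTH=3
OUT=".firecrawl/crawl.json"
CRAWL_CMD="firecrawl crawl \"$URL\" --include-paths $SECTION --limit $LIMIT --max-depth $DEPTH --wait -o $OUT"
# Print rather than execute, so the scope can be reviewed first.
echo "$CRAWL_CMD"
```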
Turn a rough request into a strong prompt
Weak request:
- “Crawl this website and get everything.”
Stronger request:
- “Use `firecrawl-crawl` on https://example.com, restrict to /docs, cap at 50 pages, wait for completion, save output to .firecrawl/crawl.json, and summarize the main product setup pages after extraction.”
Why this works:
- it names the skill
- it gives a start URL
- it constrains the path
- it limits cost and runtime
- it states what should happen after crawl completion
Best first-run workflow
A practical firecrawl-crawl guide for first use:
- Choose the narrowest useful start URL.
- Add `--include-paths` if you only need a section.
- Set `--limit` conservatively on the first pass.
- Add `--max-depth` if the site has many branches.
- Use `--wait` for simple runs, or submit and check the job later for larger crawls.
- Save output with `-o` so you can review what actually got collected.
This sequence reduces wasted crawls and makes it easier to refine boundaries after the first result.
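The sequence above can be wrapped in a small dry-run script: with `DRY_RUN=1` it prints the command instead of spending a real crawl. The URL, section, and caps are placeholder values:

```shell
# First-run sketch: preview the conservative crawl command before executing.
DRY_RUN=1
CMD='firecrawl crawl "https://example.com" --include-paths /docs --limit 25 --max-depth 2 --wait -o .firecrawl/crawl.json'
if [ "$DRY_RUN" = "1" ]; then
  # Review the scope first; nothing is crawled in this branch.
  echo "would run: $CMD"
else
  eval "$CMD"
fi
```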
Scope controls that prevent bad crawls
The most important options surfaced in the skill are:
- `--include-paths` to keep the crawl in the right section
- `--limit <n>` to prevent runaway page counts
- `--max-depth <n>` to stop overly deep traversal
- `--wait` to block until completion
- `--progress` to inspect progress while waiting
If you skip these, a crawl can become too broad faster than expected, especially on docs sites with changelogs, blog links, or cross-linked navigation.
Async vs wait mode
Use `--wait` when you want a single workflow step and the crawl should finish now. Skip it when the crawl may take longer and you prefer a job-based workflow. The repo explicitly supports checking status later with `firecrawl crawl <job-id>`, which is useful for larger jobs or agent workflows that separate submission from analysis.
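The job-based path can be sketched as a polling loop. Here `check_status` is a stub standing in for `firecrawl crawl <job-id>`, since the CLI's exact status output format is not documented in the skill:

```shell
# Polling sketch for the job-based workflow.
# check_status is a stand-in; replace its body with: firecrawl crawl "$JOB_ID"
check_status() { echo "completed"; }  # stubbed so the loop shape is clear

STATUS=""
for attempt in 1 2 3 4 5; do
  STATUS=$(check_status)
  # Stop polling once the job reports completion.
  [ "$STATUS" = "completed" ] && break
  sleep 5
done
echo "crawl status: $STATUS"
```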
Output handling and review
Always write to a file on serious runs, for example:
firecrawl crawl "https://example.com" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json
This makes post-run review easier. Before asking the agent to summarize or transform the results, verify that the output contains the intended section and page count. Bad crawl boundaries lead to bad downstream synthesis.
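A quick sanity check on the saved file can be scripted before any summarization. This sketch assumes the output is JSON with either a top-level page list or a `data` array; adjust it to whatever schema your CLI version actually writes:

```shell
# Post-run review sketch: count collected pages before downstream synthesis.
# The JSON shape ("data" array vs top-level list) is an assumption.
OUT=".firecrawl/crawl.json"
if [ -f "$OUT" ]; then
  python3 - "$OUT" <<'PY'
import json, sys
doc = json.load(open(sys.argv[1]))
pages = doc.get("data", []) if isinstance(doc, dict) else doc
print(f"pages collected: {len(pages)}")
PY
else
  echo "no crawl output at $OUT yet"
fi
```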
Good firecrawl-crawl usage patterns
High-value uses include:
- collecting all docs pages for a product comparison
- pulling a help center section for internal search or RAG prep
- extracting a migration guide cluster before rewriting documentation
- bulk-scraping a known site section where links already connect the relevant pages
These are much better fits than “find anything interesting on this domain.”
firecrawl-crawl skill FAQ
Is firecrawl-crawl beginner-friendly?
Yes, if you already understand the difference between one-page scraping and multi-page crawling. The command surface is small, but beginners should start with a narrow path and low page limit to avoid oversized runs.
What is the difference between firecrawl-crawl and an ordinary prompt?
A plain prompt can describe the goal, but firecrawl-crawl gives the agent a defined operational path: submit a crawl job, control depth and limits, optionally wait, and save structured output. That lowers guesswork and makes repeated runs more consistent.
When should I use firecrawl-crawl instead of scrape?
Use firecrawl-crawl when the target content spans many linked pages. Use scrape when you only need one known URL. If you are not yet sure which pages matter, map or search may be a better earlier step than crawl.
Is firecrawl-crawl good for full-site extraction?
Sometimes, but only if you can tolerate broad coverage and have good limits. For large sites, “full site” is often a bad first run. A docs subsection crawl is usually more practical than starting at the homepage with loose controls.
Does firecrawl-crawl work well for docs sections?
Yes. The repository examples explicitly highlight section-based extraction such as /docs, which is one of the strongest use cases for this skill.
What can block good results?
The usual blockers are vague scope, missing path filters, no page cap, and starting from the wrong URL. These are not minor details; they directly decide whether the output is useful or noisy.
How to Improve firecrawl-crawl skill
Give tighter crawl boundaries
The fastest way to improve firecrawl-crawl output is to define the crawl boundary precisely. Name the start URL, section path, page cap, and desired depth. “Crawl the docs under /docs up to 2 levels deep” is much better than “crawl the site.”
Start small, then expand
For better adoption and fewer wasted runs, do a small validation crawl first:
- low `--limit`
- narrow `--include-paths`
- moderate `--max-depth`
If the output looks right, expand the limit. This catches scope errors before they become expensive or slow.
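The validate-then-expand pattern can be sketched as two passes. Both commands are printed rather than executed here, and the URL, section, and limits are placeholder values:

```shell
# Two-pass sketch: validate scope with a tiny crawl, then expand the limit.
SMALL='firecrawl crawl "https://example.com" --include-paths /docs --limit 10 --max-depth 2 --wait -o .firecrawl/validate.json'
FULL='firecrawl crawl "https://example.com" --include-paths /docs --limit 100 --max-depth 2 --wait -o .firecrawl/crawl.json'
# Review the validation output before committing to the larger run.
echo "pass 1: $SMALL"
echo "pass 2 (after reviewing pass 1): $FULL"
```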
Write prompts that include the post-crawl task
Installing firecrawl-crawl is only part of success. Also tell the agent what to do after extraction. Example:
- “Use `firecrawl-crawl` to extract /docs up to 50 pages, save to .firecrawl/crawl.json, then identify onboarding, auth, and API reference pages.”
This improves end-to-end usefulness because the crawl and analysis are aligned from the start.
Avoid common failure modes
Common issues with the firecrawl-crawl skill:
- starting at the homepage when only one section is needed
- omitting `--limit` on a large site
- omitting `--max-depth` when navigation is dense
- forgetting `-o` and losing an easy review point
- asking for “everything” without defining business relevance
Iterate based on output, not assumptions
After the first run, inspect what was actually collected. If irrelevant pages dominate, tighten `--include-paths` or lower depth. If important pages are missing, increase depth or start from a more relevant entry point. The best firecrawl-crawl guide is iterative: crawl, inspect, refine, rerun.
Keep firecrawl-crawl in the right role
Use firecrawl-crawl for collection, then hand off to summarization, classification, comparison, or indexing steps. Trying to make the crawl step solve every downstream task at once usually reduces clarity. The skill is strongest when it gathers the right corpus first.
