firecrawl-crawl
by firecrawl

firecrawl-crawl helps agents bulk extract content from a website or docs section with path filters, depth limits, page caps, wait mode, and job status checks.
This skill scores 74/100, which means it is listable and likely useful for agents that need site-wide or section-wide content extraction, but directory users should expect a fairly command-focused guide rather than a deeply supported workflow package. The repository evidence shows strong trigger cues and practical CLI examples for crawling with limits, depth, and path filters, which gives agents more reliable execution guidance than a generic prompt.
- Strong triggerability: the description explicitly names crawl-style intents like "get all the pages," "/docs," and "bulk extract."
- Operationally usable: SKILL.md includes concrete `firecrawl crawl` examples for section crawling, depth-limited crawling, and checking a running crawl job.
- Good agent leverage for a common workflow: it documents key controls like `--include-paths`, `--limit`, `--max-depth`, `--wait`, and `--progress` for bulk extraction tasks.
- Limited install-decision context: there is no install command in SKILL.md and no support files, references, or metadata to help users assess setup requirements.
- Workflow depth appears modest: structural signals show workflow examples, but little evidence of constraints, edge-case handling, or troubleshooting guidance.
Overview of firecrawl-crawl skill
What firecrawl-crawl does
The firecrawl-crawl skill is for bulk website extraction, not single-page scraping. It helps an agent crawl a site or a specific section, follow links, and return content from many pages in one job. If your goal is “get all docs pages,” “extract everything under /docs,” or “crawl this help center up to depth 3,” this is the right tool.
Who should use firecrawl-crawl
The best fit for firecrawl-crawl is anyone doing multi-page content collection for documentation analysis, migration, indexing, QA, research, or knowledge ingestion. It is especially useful when a normal prompt would be too manual because the target content spans dozens of linked pages on the same domain.
The real job-to-be-done
Users adopt firecrawl-crawl when they need coverage, not just accuracy on one URL. The main job is to define a crawl boundary clearly enough that the tool gathers the right pages without wasting time on irrelevant sections, duplicates, or the entire public site.
What makes this skill different
The main differentiators are practical crawl controls: path filtering, depth limits, page limits, asynchronous job handling, and optional wait/progress behavior. That makes firecrawl-crawl more operational than a generic “scrape this site” instruction.
When this skill is a strong match
Use the firecrawl-crawl skill when:
- you need many pages from one site
- pages are discoverable through internal links
- you want to constrain scope with /docs, /blog, or similar paths
- you need a repeatable crawl command rather than ad hoc prompting
When not to use it
Do not start with firecrawl-crawl if you only need one page, need a URL inventory first, or are still unsure which section matters. In those cases, simpler search, scrape, or map steps are usually better before escalating to crawl.
How to Use firecrawl-crawl skill
Install context for firecrawl-crawl
This skill lives in the firecrawl/cli skill set and is meant to be invoked through the Firecrawl CLI tooling. If your environment supports Skills, the practical install pattern is:
npx skills add https://github.com/firecrawl/cli --skill firecrawl-crawl
You also need the Firecrawl CLI available so the agent can run commands such as `firecrawl crawl` or `npx firecrawl crawl`.
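Before the first run, it can help to confirm the CLI is actually reachable. A minimal sketch; the `npx` fallback is an assumption about your environment, not a documented requirement:

```shell
# Availability check sketch: confirm the Firecrawl CLI can be invoked at all.
if command -v firecrawl >/dev/null 2>&1; then
  FIRECRAWL_CMD="firecrawl"
else
  # Fall back to npx invocation if the binary is not on PATH (assumption).
  FIRECRAWL_CMD="npx firecrawl"
fi
echo "using: $FIRECRAWL_CMD crawl"
```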
Read this file first
Start with skills/firecrawl-crawl/SKILL.md. For this skill, that file contains most of the operational value: when to use it, quick-start commands, and the key options that control crawl scope and runtime behavior.
Core command patterns
The repository shows three key firecrawl-crawl usage patterns:
# Crawl a docs section
firecrawl crawl "<url>" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json
# Full crawl with depth limit
firecrawl crawl "<url>" --max-depth 3 --wait --progress -o .firecrawl/crawl.json
# Check status of a running crawl
firecrawl crawl <job-id>
These cover most real workflows: constrained section crawl, broader site crawl with depth control, and polling an existing job.
Inputs that matter most
To get good results from firecrawl-crawl, provide:
- a clean starting URL
- the intended site section, if any
- a sensible page cap with `--limit`
- a depth cap with `--max-depth` when the site is broad
- whether you want synchronous completion via `--wait`
- an output path so results are easy to inspect later
The biggest quality lever is crawl scope. A good boundary usually matters more than any downstream processing.
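The inputs above can be collected into a single command. A sketch that assembles (but only prints) the call, with placeholder values for the URL, section, and caps:

```shell
# Sketch: assemble a scoped crawl command from the inputs listed above.
# URL, SECTION, LIMIT, and DEPTH are placeholders for your actual target.
URL="https://example.com"
SECTION="/docs"
LIMIT=50
DEPTH=3
OUT=".firecrawl/crawl.json"
CRAWL_CMD="firecrawl crawl \"$URL\" --include-paths $SECTION --limit $LIMIT --max-depth $DEPTH --wait -o $OUT"
# Print rather than execute, so the scope can be reviewed first.
echo "$CRAWL_CMD"
```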
Turn a rough request into a strong prompt
Weak request:
- “Crawl this website and get everything.”
Stronger request:
- “Use `firecrawl-crawl` on https://example.com, restrict to /docs, cap at 50 pages, wait for completion, save output to .firecrawl/crawl.json, and summarize the main product setup pages after extraction.”
Why this works:
- it names the skill
- it gives a start URL
- it constrains the path
- it limits cost and runtime
- it states what should happen after crawl completion
Best first-run workflow
A practical firecrawl-crawl guide for first use:
- Choose the narrowest useful start URL.
- Add `--include-paths` if you only need a section.
- Set `--limit` conservatively on the first pass.
- Add `--max-depth` if the site has many branches.
- Use `--wait` for simple runs, or submit and check the job later for larger crawls.
- Save output with `-o` so you can review what actually got collected.
This sequence reduces wasted crawls and makes it easier to refine boundaries after the first result.
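The sequence above can be wrapped in a small dry-run script: with `DRY_RUN=1` it prints the command instead of spending a real crawl. The URL, section, and caps are placeholder values:

```shell
# First-run sketch: preview the conservative crawl command before executing.
DRY_RUN=1
CMD='firecrawl crawl "https://example.com" --include-paths /docs --limit 25 --max-depth 2 --wait -o .firecrawl/crawl.json'
if [ "$DRY_RUN" = "1" ]; then
  # Review the scope first; nothing is crawled in this branch.
  echo "would run: $CMD"
else
  eval "$CMD"
fi
```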
Scope controls that prevent bad crawls
The most important options surfaced in the skill are:
- `--include-paths` to keep the crawl in the right section
- `--limit <n>` to prevent runaway page counts
- `--max-depth <n>` to stop overly deep traversal
- `--wait` to block until completion
- `--progress` to inspect progress while waiting
If you skip these, a crawl can become too broad faster than expected, especially on docs sites with changelogs, blog links, or cross-linked navigation.
Async vs wait mode
Use `--wait` when you want a single workflow step and the crawl should finish now. Skip it when the crawl may take longer and you prefer a job-based workflow. The repo explicitly supports checking status later with `firecrawl crawl <job-id>`, which is useful for larger jobs or agent workflows that separate submission from analysis.
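The job-based path can be sketched as a polling loop. Here `check_status` is a stub standing in for `firecrawl crawl <job-id>`, since the CLI's exact status output format is not documented in the skill:

```shell
# Polling sketch for the job-based workflow.
# check_status is a stand-in; replace its body with: firecrawl crawl "$JOB_ID"
check_status() { echo "completed"; }  # stubbed so the loop shape is clear

STATUS=""
for attempt in 1 2 3 4 5; do
  STATUS=$(check_status)
  # Stop polling once the job reports completion.
  [ "$STATUS" = "completed" ] && break
  sleep 5
done
echo "crawl status: $STATUS"
```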
Output handling and review
Always write to a file on serious runs, for example:
firecrawl crawl "https://example.com" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json
This makes post-run review easier. Before asking the agent to summarize or transform the results, verify that the output contains the intended section and page count. Bad crawl boundaries lead to bad downstream synthesis.
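A quick sanity check on the saved file can be scripted before any summarization. This sketch assumes the output is JSON with either a top-level page list or a `data` array; adjust it to whatever schema your CLI version actually writes:

```shell
# Post-run review sketch: count collected pages before downstream synthesis.
# The JSON shape ("data" array vs top-level list) is an assumption.
OUT=".firecrawl/crawl.json"
if [ -f "$OUT" ]; then
  python3 - "$OUT" <<'PY'
import json, sys
doc = json.load(open(sys.argv[1]))
pages = doc.get("data", []) if isinstance(doc, dict) else doc
print(f"pages collected: {len(pages)}")
PY
else
  echo "no crawl output at $OUT yet"
fi
```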
Good firecrawl-crawl usage patterns
High-value uses include:
- collecting all docs pages for a product comparison
- pulling a help center section for internal search or RAG prep
- extracting a migration guide cluster before rewriting documentation
- bulk-scraping a known site section where links already connect the relevant pages
These are much better fits than “find anything interesting on this domain.”
firecrawl-crawl skill FAQ
Is firecrawl-crawl beginner-friendly?
Yes, if you already understand the difference between one-page scraping and multi-page crawling. The command surface is small, but beginners should start with a narrow path and low page limit to avoid oversized runs.
What is the difference between firecrawl-crawl and an ordinary prompt?
A plain prompt can describe the goal, but firecrawl-crawl gives the agent a defined operational path: submit a crawl job, control depth and limits, optionally wait, and save structured output. That lowers guesswork and makes repeated runs more consistent.
When should I use firecrawl-crawl instead of scrape?
Use firecrawl-crawl when the target content spans many linked pages. Use scrape when you only need one known URL. If you are not yet sure which pages matter, map or search may be a better earlier step than crawl.
Is firecrawl-crawl good for full-site extraction?
Sometimes, but only if you can tolerate broad coverage and have good limits. For large sites, “full site” is often a bad first run. A docs subsection crawl is usually more practical than starting at the homepage with loose controls.
Does firecrawl-crawl work well for docs sections?
Yes. The repository examples explicitly highlight section-based extraction such as /docs, which is one of the strongest use cases for this skill.
What can block good results?
The usual blockers are vague scope, missing path filters, no page cap, and starting from the wrong URL. These are not minor details; they directly decide whether the output is useful or noisy.
How to Improve firecrawl-crawl skill
Give tighter crawl boundaries
The fastest way to improve firecrawl-crawl output is to define the crawl boundary precisely. Name the start URL, section path, page cap, and desired depth. “Crawl the docs under /docs up to 2 levels deep” is much better than “crawl the site.”
Start small, then expand
For better adoption and fewer wasted runs, do a small validation crawl first:
- low `--limit`
- narrow `--include-paths`
- moderate `--max-depth`
If the output looks right, expand the limit. This catches scope errors before they become expensive or slow.
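The validate-then-expand pattern can be sketched as two passes. Both commands are printed rather than executed here, and the URL, section, and limits are placeholder values:

```shell
# Two-pass sketch: validate scope with a tiny crawl, then expand the limit.
SMALL='firecrawl crawl "https://example.com" --include-paths /docs --limit 10 --max-depth 2 --wait -o .firecrawl/validate.json'
FULL='firecrawl crawl "https://example.com" --include-paths /docs --limit 100 --max-depth 2 --wait -o .firecrawl/crawl.json'
# Review the validation output before committing to the larger run.
echo "pass 1: $SMALL"
echo "pass 2 (after reviewing pass 1): $FULL"
```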
Write prompts that include the post-crawl task
Installing firecrawl-crawl is only part of success. Also tell the agent what to do after extraction. Example:
- “Use `firecrawl-crawl` to extract /docs up to 50 pages, save to .firecrawl/crawl.json, then identify onboarding, auth, and API reference pages.”
This improves end-to-end usefulness because the crawl and analysis are aligned from the start.
Avoid common failure modes
Common issues with the firecrawl-crawl skill:
- starting at the homepage when only one section is needed
- omitting `--limit` on a large site
- omitting `--max-depth` when navigation is dense
- forgetting `-o` and losing an easy review point
- asking for “everything” without defining business relevance
Iterate based on output, not assumptions
After the first run, inspect what was actually collected. If irrelevant pages dominate, tighten `--include-paths` or lower depth. If important pages are missing, increase depth or start from a more relevant entry point. The best firecrawl-crawl guide is iterative: crawl, inspect, refine, rerun.
Keep firecrawl-crawl in the right role
Use firecrawl-crawl for collection, then hand off to summarization, classification, comparison, or indexing steps. Trying to make the crawl step solve every downstream task at once usually reduces clarity. The skill is strongest when it gathers the right corpus first.
