
firecrawl-crawl

by firecrawl

firecrawl-crawl helps agents bulk-extract content from a website or docs section, with path filters, depth limits, page caps, a wait mode, and job status checks.

Stars: 234
Favorites: 0
Comments: 0
Added: Mar 31, 2026
Category: Web Scraping
Install Command
npx skills add firecrawl/cli --skill firecrawl-crawl
Curation Score

This skill scores 74/100: it is listable and likely useful for agents that need site-wide or section-wide content extraction, but directory users should expect a command-focused guide rather than a deeply supported workflow package. The repository evidence shows strong trigger cues and practical CLI examples for crawling with limits, depth, and path filters, which gives agents more reliable execution guidance than a generic prompt would.

74/100
Strengths
  • Strong triggerability: the description explicitly names crawl-style intents like "get all the pages," "/docs," and "bulk extract."
  • Operationally usable: SKILL.md includes concrete `firecrawl crawl` examples for section crawling, depth-limited crawling, and checking a running crawl job.
  • Good agent leverage for a common workflow: it documents key controls like `--include-paths`, `--limit`, `--max-depth`, `--wait`, and `--progress` for bulk extraction tasks.
Cautions
  • Limited install-decision context: there is no install command in SKILL.md and no support files, references, or metadata to help users assess setup requirements.
  • Workflow depth appears modest: structural signals show workflow examples, but little evidence of constraints, edge-case handling, or troubleshooting guidance.
Overview

Overview of firecrawl-crawl skill

What firecrawl-crawl does

The firecrawl-crawl skill is for bulk website extraction, not single-page scraping. It helps an agent crawl a site or a specific section, follow links, and return content from many pages in one job. If your goal is “get all docs pages,” “extract everything under /docs,” or “crawl this help center up to depth 3,” this is the right tool.

Who should use firecrawl-crawl

The best fit for firecrawl-crawl is anyone doing multi-page content collection for documentation analysis, migration, indexing, QA, research, or knowledge ingestion. It is especially useful when a normal prompt would be too manual because the target content spans dozens of linked pages on the same domain.

The real job-to-be-done

Users adopt firecrawl-crawl when they need coverage, not just accuracy on one URL. The main job is to define a crawl boundary clearly enough that the tool gathers the right pages without wasting time on irrelevant sections, duplicates, or the entire public site.

What makes this skill different

The main differentiators are practical crawl controls: path filtering, depth limits, page limits, asynchronous job handling, and optional wait/progress behavior. That makes firecrawl-crawl more operational than a generic "scrape this site" instruction.

When this skill is a strong match

Use the firecrawl-crawl skill when:

  • you need many pages from one site
  • pages are discoverable through internal links
  • you want to constrain scope with /docs, /blog, or similar paths
  • you need a repeatable crawl command rather than ad hoc prompting

When not to use it

Do not start with firecrawl-crawl if you only need one page, need a URL inventory first, or are still unsure which section matters. In those cases, simpler search, scrape, or map steps are usually better before escalating to crawl.

How to Use firecrawl-crawl skill

Install context for firecrawl-crawl

This skill lives in the firecrawl/cli skill set and is meant to be invoked through the Firecrawl CLI tooling. If your environment supports Skills, the practical install pattern is:

npx skills add https://github.com/firecrawl/cli --skill firecrawl-crawl

You also need the Firecrawl CLI available so the agent can run commands such as firecrawl crawl or npx firecrawl crawl.
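Before handing the CLI to an agent, it can save a failed first run to confirm the binary is actually reachable. The skill itself does not ship such a check; this is a minimal shell sketch:

```shell
# Quick sanity check: is the Firecrawl CLI reachable in this environment?
if command -v firecrawl >/dev/null 2>&1; then
  echo "firecrawl CLI found"
else
  echo "firecrawl CLI missing - install it or invoke via npx firecrawl"
fi
```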

Read this file first

Start with skills/firecrawl-crawl/SKILL.md. For this skill, that file contains most of the operational value: when to use it, quick-start commands, and the key options that control crawl scope and runtime behavior.

Core command patterns

The repository shows three key firecrawl-crawl usage patterns:

# Crawl a docs section
firecrawl crawl "<url>" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json

# Full crawl with depth limit
firecrawl crawl "<url>" --max-depth 3 --wait --progress -o .firecrawl/crawl.json

# Check status of a running crawl
firecrawl crawl <job-id>

These cover most real workflows: constrained section crawl, broader site crawl with depth control, and polling an existing job.

Inputs that matter most

To get good results from firecrawl-crawl, provide:

  • a clean starting URL
  • the intended site section, if any
  • a sensible page cap with --limit
  • a depth cap with --max-depth when the site is broad
  • whether you want synchronous completion via --wait
  • an output path so results are easy to inspect later

The biggest quality lever is crawl scope. A good boundary usually matters more than any downstream processing.
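All of the inputs above can be expressed in a single invocation. The flags come from the SKILL.md examples; the URL and paths here are placeholders, not values from the repository:

```shell
# One invocation combining every scope input listed above
# (URL and section path are placeholders -- substitute your target)
firecrawl crawl "https://example.com" \
  --include-paths /docs \
  --limit 50 \
  --max-depth 3 \
  --wait --progress \
  -o .firecrawl/crawl.json
```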

Turn a rough request into a strong prompt

Weak request:

  • “Crawl this website and get everything.”

Stronger request:

  • “Use firecrawl-crawl on https://example.com, restrict to /docs, cap at 50 pages, wait for completion, save output to .firecrawl/crawl.json, and summarize the main product setup pages after extraction.”

Why this works:

  • it names the skill
  • it gives a start URL
  • it constrains the path
  • it limits cost and runtime
  • it states what should happen after crawl completion

Best first-run workflow

A practical firecrawl-crawl guide for first use:

  1. Choose the narrowest useful start URL.
  2. Add --include-paths if you only need a section.
  3. Set --limit conservatively on the first pass.
  4. Add --max-depth if the site has many branches.
  5. Use --wait for simple runs, or submit and check the job later for larger crawls.
  6. Save output with -o so you can review what actually got collected.

This sequence reduces wasted crawls and makes it easier to refine boundaries after the first result.
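The sequence above can be sketched as a tiny wrapper that assembles a conservative first-pass command. `build_first_crawl` is a hypothetical helper, not part of the Firecrawl CLI; it only prints the command so you can review the boundary before running anything:

```shell
# Hypothetical helper (not part of the Firecrawl CLI): build a conservative
# first-pass crawl command and print it for review instead of executing it.
build_first_crawl() {
  url="$1"      # narrowest useful start URL
  section="$2"  # section filter, e.g. /docs
  echo "firecrawl crawl \"$url\" --include-paths $section" \
       "--limit 25 --max-depth 2 --wait -o .firecrawl/crawl.json"
}

build_first_crawl "https://example.com" /docs
```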

Scope controls that prevent bad crawls

The most important options surfaced in the skill are:

  • --include-paths to keep the crawl in the right section
  • --limit <n> to prevent runaway page counts
  • --max-depth <n> to stop overly deep traversal
  • --wait to block until completion
  • --progress to inspect progress during waiting

If you skip these, a crawl can balloon in scope faster than expected, especially on docs sites with changelogs, blog links, or cross-linked navigation.

Async vs wait mode

Use --wait when you want a single workflow step and the crawl should finish now. Skip it when the crawl may take longer and you prefer a job-based workflow. The repo explicitly supports checking status later with firecrawl crawl <job-id>, which is useful for larger jobs or agent workflows that separate submission from analysis.
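The job-based workflow looks like the sketch below. It assumes the submission output includes a job id, as the status-check pattern in SKILL.md implies; the URL and limit are placeholders:

```shell
# Submit without --wait; the CLI returns control immediately
firecrawl crawl "https://example.com" --include-paths /docs --limit 200 -o .firecrawl/crawl.json

# Later, check on the job using the id from the submission output
firecrawl crawl <job-id>
```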

Output handling and review

Always write to a file on serious runs, for example:

firecrawl crawl "https://example.com" --include-paths /docs --limit 50 --wait -o .firecrawl/crawl.json

This makes post-run review easier. Before asking the agent to summarize or transform the results, verify that the output contains the intended section and page count. Bad crawl boundaries lead to bad downstream synthesis.
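One way to verify the boundary is to pull the collected URLs out of the output file. The field layout below (a "data" array with "url" keys) is an assumption about the JSON shape, demonstrated against a synthetic file; adjust the pattern to whatever your CLI version actually writes:

```shell
# Synthetic stand-in for a real crawl output (the "data"/"url" layout is an
# assumption -- check your actual file)
cat > /tmp/crawl.json <<'EOF'
{"data":[{"url":"https://example.com/docs/a"},{"url":"https://example.com/docs/b"}]}
EOF

# List every crawled URL so out-of-scope pages stand out at a glance
grep -o '"url":"[^"]*"' /tmp/crawl.json | sed 's/"url":"//; s/"$//'
```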

Good firecrawl-crawl usage patterns

High-value uses include:

  • collecting all docs pages for a product comparison
  • pulling a help center section for internal search or RAG prep
  • extracting a migration guide cluster before rewriting documentation
  • bulk-scraping a known site section where links already connect the relevant pages

These are much better fits than “find anything interesting on this domain.”

firecrawl-crawl skill FAQ

Is firecrawl-crawl beginner-friendly?

Yes, if you already understand the difference between one-page scraping and multi-page crawling. The command surface is small, but beginners should start with a narrow path and low page limit to avoid oversized runs.

What is the difference between firecrawl-crawl and an ordinary prompt?

A plain prompt can describe the goal, but firecrawl-crawl gives the agent a defined operational path: submit a crawl job, control depth and limits, optionally wait, and save structured output. That lowers guesswork and makes repeated runs more consistent.

When should I use firecrawl-crawl instead of scrape?

Use firecrawl-crawl when the target content spans many linked pages. Use scrape when you only need one known URL. If you are not yet sure which pages matter, map or search may be a better earlier step than crawl.

Is firecrawl-crawl good for full-site extraction?

Sometimes, but only if you can tolerate broad coverage and have good limits. For large sites, “full site” is often a bad first run. A docs subsection crawl is usually more practical than starting at the homepage with loose controls.

Does firecrawl-crawl work well for docs sections?

Yes. The repository examples explicitly highlight section-based extraction such as /docs, which is one of the strongest use cases for this skill.

What can block good results?

The usual blockers are vague scope, missing path filters, no page cap, and starting from the wrong URL. These are not minor details; they directly decide whether the output is useful or noisy.

How to Improve firecrawl-crawl skill

Give tighter crawl boundaries

The fastest way to improve firecrawl-crawl output is to define the crawl boundary precisely. Name the start URL, section path, page cap, and desired depth. “Crawl the docs under /docs up to 2 levels deep” is much better than “crawl the site.”

Start small, then expand

For better adoption and fewer wasted runs, do a small validation crawl first:

  • low --limit
  • narrow --include-paths
  • moderate --max-depth

If the output looks right, expand the limit. This catches scope errors before they become expensive or slow.
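A two-pass version of this, with placeholder URL and paths, might look like:

```shell
# Pass 1: small validation crawl to confirm the boundary is right
firecrawl crawl "https://example.com" --include-paths /docs \
  --limit 10 --max-depth 2 --wait -o .firecrawl/probe.json

# Pass 2: if probe.json contains the right pages, raise the cap and rerun
firecrawl crawl "https://example.com" --include-paths /docs \
  --limit 100 --max-depth 2 --wait -o .firecrawl/crawl.json
```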

Write prompts that include the post-crawl task

Installing firecrawl-crawl is only part of success. Also tell the agent what to do after extraction. Example:

  • “Use firecrawl-crawl to extract /docs up to 50 pages, save to .firecrawl/crawl.json, then identify onboarding, auth, and API reference pages.”

This improves end-to-end usefulness because the crawl and analysis are aligned from the start.

Avoid common failure modes

Common issues with the firecrawl-crawl skill:

  • starting at the homepage when only one section is needed
  • omitting --limit on a large site
  • omitting --max-depth when navigation is dense
  • forgetting -o and losing an easy review point
  • asking for “everything” without defining business relevance

Iterate based on output, not assumptions

After the first run, inspect what was actually collected. If irrelevant pages dominate, tighten --include-paths or lower depth. If important pages are missing, increase depth or start from a more relevant entry point. The best firecrawl-crawl guide is iterative: crawl, inspect, refine, rerun.

Keep firecrawl-crawl in the right role

Use firecrawl-crawl for collection, then hand off to summarization, classification, comparison, or indexing steps. Trying to make the crawl step solve every downstream task at once usually reduces clarity. The skill is strongest when it gathers the right corpus first.
