
firecrawl-scrape

by firecrawl

firecrawl-scrape helps extract clean, LLM-friendly content from known URLs, including JS-rendered pages. Use it to scrape markdown, links, or page-specific answers with Firecrawl CLI or npx firecrawl.

Stars: 234
Favorites: 0
Comments: 0
Added: Mar 31, 2026
Category: Web Scraping
Install Command
npx skills add https://github.com/firecrawl/cli --skill firecrawl-scrape
Curation Score

This skill scores 72/100, which means it is acceptable to list for directory users who want a clear URL-scraping command, but it is not especially complete as an install-decision page. The repository evidence shows strong triggerability and practical command examples for scraping static or JS-rendered pages into markdown, including multi-URL use, output formats, and query-based extraction. However, adoption clarity is held back by very sparse top-level description text, no install command in SKILL.md, and no support files or deeper operational guidance.

72/100
Strengths
  • Strong trigger cues in the description explicitly map user intents like "scrape", "fetch", and "read this webpage" to this skill.
  • Quick-start examples show concrete usage patterns: basic scrape, main-content-only, JS wait, multiple URLs, alternate formats, and page querying.
  • Operational value is specific compared with a generic prompt: it directs agents to use `firecrawl scrape`/`npx firecrawl`, save outputs, and prefer this over WebFetch for webpage extraction.
Cautions
  • SKILL.md does not include an install command, so users still need outside context to set up the CLI before they can run it.
  • Repository support is thin beyond one markdown file; there are no scripts, references, or companion resources for troubleshooting, auth/setup, or edge-case handling.
Overview

Overview of firecrawl-scrape skill

What firecrawl-scrape does

The firecrawl-scrape skill is for extracting clean, LLM-friendly content from one or more web pages when you already know the URL. It is built for practical page retrieval, not broad site discovery: give it a page, and it returns structured output such as markdown, links, or a direct query answer based on that page.

Who should use firecrawl-scrape

This skill fits users who need reliable page content from:

  • documentation pages
  • blog posts
  • pricing pages
  • product pages
  • JavaScript-rendered sites and SPAs

It is especially useful if ordinary fetch tools fail on client-rendered pages or return noisy HTML that is awkward to pass into an LLM.

The real job-to-be-done

Most users do not want “web scraping” in the abstract. They want one of these outcomes:

  • read a page into markdown for later analysis
  • pull the main content without headers and footers
  • extract links alongside page text
  • ask a focused question about a known URL
  • scrape several known URLs in parallel

That is where firecrawl-scrape is stronger than a generic prompt that says “read this webpage.”

Why users pick this skill over generic fetch

The main differentiator is that firecrawl-scrape is purpose-built for webpage content extraction, including JS-rendered pages, and returns output optimized for LLM workflows. The upstream skill explicitly says to use it instead of WebFetch for that job. That matters if your usual browser or fetch path misses rendered content, navigation clutter, or link context.

Best-fit and misfit in one glance

Best fit:

  • you already have the URL
  • you want page content, not site-wide exploration
  • you need markdown or links in a machine-usable format
  • the page may require render time before content appears

Misfit:

  • you need to discover URLs first
  • you need whole-site traversal
  • you need interaction beyond page scraping
  • you only need a simple static HTML fetch and already trust another tool

How to Use firecrawl-scrape skill

firecrawl-scrape install context

This skill lives in the firecrawl/cli repository under skills/firecrawl-scrape. The skill itself is invocation guidance for the Firecrawl CLI, so the practical requirement is access to the firecrawl command or npx firecrawl. The examples in the skill use both forms:

  • firecrawl scrape ...
  • npx firecrawl ...

If your environment does not already have the CLI available, use the npx firecrawl form to reduce setup friction.
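If you want a script to work in both setups, a minimal sketch is to detect which form is available at runtime. This assumes both invocation forms accept the same arguments, which is what the skill's own examples suggest:

```shell
# Pick whichever invocation form is available: the installed CLI,
# or npx as a fallback. Assumes both accept the same arguments.
if command -v firecrawl >/dev/null 2>&1; then
  runner="firecrawl"
else
  runner="npx firecrawl"
fi
echo "runner: $runner"
```

You can then invoke `$runner scrape ...` everywhere else in the script.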

What input firecrawl-scrape needs

At minimum, firecrawl-scrape needs a concrete URL. From there, the quality of output depends on what else you specify:

  • output format needed: markdown, links, or both
  • whether to keep only main content
  • whether the page needs render delay with --wait-for
  • whether you want raw page content saved to a file
  • whether you want a targeted answer using --query

This is not a skill for vague goals like “research this company online.” It is for “scrape this exact page and return useful output.”

The fastest successful first command

If you just need readable page content, start here:

firecrawl scrape "<url>" -o .firecrawl/page.md

If the page is cluttered with navigation or sidebars, use:

firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md

If the page is a SPA or loads content after render:

firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md

When to use main-content mode

--only-main-content is one of the highest-value options because it often improves downstream summarization and extraction quality. Use it when your goal is:

  • summarizing an article
  • extracting product or pricing details
  • feeding content into another LLM step
  • reducing token waste from menus, footers, and repeated page chrome

Skip it if you explicitly need navigation links or surrounding layout context.

How to handle JavaScript-rendered pages

A common adoption blocker is pages that look fine in a browser but return incomplete content through simple fetch methods. firecrawl-scrape addresses that with render-aware scraping. In practice, if content appears late, add --wait-for with a realistic delay such as 3000.

Use render waiting when:

  • product specs populate after page load
  • documentation content hydrates client-side
  • pricing tables appear after scripts run

Do not add long waits by default. Start small and only increase delay when output is clearly missing content.
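One way to decide whether a rerun with --wait-for is worth it is to check the size of the saved output. This is only a heuristic sketch: the 500-byte threshold is an arbitrary assumption, not a Firecrawl default, and should be tuned per site:

```shell
# Heuristic: flag a scrape as possibly under-rendered if the saved
# markdown is suspiciously small. Threshold is a guess; tune per site.
check_needs_wait() {
  file="$1"
  min_bytes="${2:-500}"
  size=$(cat "$file" 2>/dev/null | wc -c | tr -d ' ')
  [ "${size:-0}" -lt "$min_bytes" ]
}

# Example: rerun with --wait-for only when the first pass looks thin.
# check_needs_wait .firecrawl/page.md && firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md
```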

How to scrape multiple URLs efficiently

The skill supports multiple URLs in one command and notes that they are scraped concurrently. That makes it useful for small known-page batches such as:

  • several docs pages
  • a homepage, pricing page, and FAQ
  • a blog post set you already selected

Example:

firecrawl scrape https://example.com https://example.com/blog https://example.com/docs

This is more appropriate than a crawl when you already know the exact targets.
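For batches where you want each page saved to its own file, one approach is a small loop that derives a filename from the URL. This sketch uses only the flags shown above; the slug scheme and the `echo` prefix (which prints the commands instead of running them) are our own conventions — drop the `echo` to actually execute:

```shell
# Build one output file per URL. The slug function and directory layout
# are conventions for this sketch, not part of the Firecrawl CLI.
slug() {
  printf '%s' "$1" | tr -c 'a-zA-Z0-9' '_'
}

for url in "https://example.com" "https://example.com/blog" "https://example.com/docs"; do
  echo firecrawl scrape "$url" -o ".firecrawl/$(slug "$url").md"
done
```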

If your next step depends on both readable content and page references, request multiple formats:

firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json

This is a strong choice for workflows like:

  • extract content, then inspect outbound links
  • build citation-aware notes
  • separate body text from navigation and referenced destinations

Choose JSON output when you need structured post-processing rather than a single markdown file.

How to use firecrawl-scrape for targeted questions

One of the most practical firecrawl-scrape usage patterns is asking a page-specific question during scraping:

firecrawl scrape "https://example.com/pricing" --query "What is the enterprise plan price?"

This works best when:

  • the answer is likely on one page
  • you want a focused extraction instead of full-page review
  • you want to reduce manual reading time

It is weaker when the answer spans multiple pages or requires comparing several documents.

Turn a rough request into a strong prompt

Weak request:

  • “Scrape this site and tell me what matters.”

Strong request:

  • “Use firecrawl-scrape on https://example.com/pricing with --only-main-content. Save markdown to .firecrawl/pricing.md. Then extract plan names, monthly prices, annual billing notes, and enterprise contact language.”

Why this is better:

  • it gives a specific URL
  • it chooses the right output mode
  • it defines what to extract after scraping
  • it reduces ambiguity about scope

Suggested workflow for firecrawl-scrape

A good practical sequence is:

  1. Confirm you have the exact page URL.
  2. Start with markdown extraction.
  3. Add --only-main-content if the page is noisy.
  4. Add --wait-for if rendered content is missing.
  5. Switch to --format markdown,links if link structure matters.
  6. Use --query only when the task is narrow and page-bounded.

This follows the upstream positioning of scrape as a middle step in a broader workflow: search → scrape → map → crawl → interact.
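The sequence above can be sketched as a simple flag accumulator. The three decision variables are placeholders you would set after inspecting the first run; the flags themselves are the ones documented in SKILL.md:

```shell
# Accumulate options based on what the first run revealed.
# These three booleans are placeholders, not CLI-detected values.
noisy=yes          # page chrome polluted the output
missing_render=no  # content was absent until JS ran
need_links=no      # downstream step needs link structure

flags=""
[ "$noisy" = yes ] && flags="$flags --only-main-content"
[ "$missing_render" = yes ] && flags="$flags --wait-for 3000"
[ "$need_links" = yes ] && flags="$flags --format markdown,links"

echo "firecrawl scrape \"<url>\"$flags -o .firecrawl/page.md"
```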

Files to read first in the repository

Read skills/firecrawl-scrape/SKILL.md first. It contains nearly all of the practical value:

  • when to use the skill
  • quick-start commands
  • supported options
  • usage tips

Because this skill directory entry is install-oriented, the key pre-install takeaway is simple: the source document is concise, and there are no extra helper scripts or references you need to inspect before trying it.

Practical adoption tips that change output quality

A few choices matter disproportionately:

  • Prefer exact URLs over top-level domains.
  • Use --only-main-content for analysis-heavy tasks.
  • Use --wait-for only when output is visibly incomplete.
  • Save outputs to .firecrawl/ so you can inspect raw results before chaining more automation.
  • Use --query for page-local facts, not open-ended research.

These small decisions usually matter more than adding more prompt wording.
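To keep raw results inspectable across runs, one option is a dated naming scheme inside .firecrawl/. The layout here is a suggestion, not a Firecrawl convention:

```shell
# One file per run, so earlier raw scrapes survive a rerun.
mkdir -p .firecrawl
stamp=$(date +%Y%m%d-%H%M%S)
out=".firecrawl/page-${stamp}.md"
echo "would save to: $out"
# firecrawl scrape "<url>" -o "$out"
```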

firecrawl-scrape skill FAQ

Is firecrawl-scrape better than a normal prompt with a URL?

Usually yes, if the job is actual webpage extraction. The firecrawl-scrape skill gives a clear invocation path, supports JS-rendered pages, can return markdown or links, and exposes scraping-specific options. A normal prompt may work for simple reading tasks, but it is less reliable when pages need rendering or cleaner output structure.

When should I use firecrawl-scrape instead of WebFetch?

Use firecrawl-scrape when you want webpage content extraction. The upstream skill explicitly recommends it instead of WebFetch for that purpose. That recommendation is most relevant for rendered pages, cleaner markdown output, and scraping workflows that need repeatable CLI behavior.

Is firecrawl-scrape beginner-friendly?

Yes, relative to many scraping tools. The first-run path is short: provide a URL, run a command, inspect the output. You do not need to understand full crawling strategy to get value. The main thing beginners must know is that this is page scraping, not site-wide exploration.

Can firecrawl-scrape handle SPAs and dynamic pages?

Yes. That is one of its core reasons to exist. If a page relies on JavaScript rendering, use --wait-for when needed so the content has time to appear before extraction.

When is firecrawl-scrape the wrong choice?

Avoid it when:

  • you do not know the target URL yet
  • you need broad domain discovery
  • you need recursive site traversal
  • your task requires interaction rather than extraction
  • the answer must be synthesized across many pages you have not identified

In those cases, search, map, crawl, or other tools are a better first step.

Do I need to install the whole repository to use it?

You need access to the Firecrawl CLI behavior the skill references, but the skill itself is lightweight. For decision-making, there is little repo overhead here: the practical instructions are concentrated in SKILL.md, and there are no companion scripts or resource folders you need to master first.

How to Improve firecrawl-scrape skill

Give firecrawl-scrape narrower goals

The most common quality issue is overbroad intent. Better results come from requests like:

  • "extract the pricing table"
  • "return markdown plus links"
  • "answer this one question from the page"

not:

  • "scrape everything useful"

The narrower the page task, the less cleanup you need afterward.

Improve inputs with page-aware instructions

Strong inputs combine URL, output mode, and extraction target. Example:

firecrawl scrape "https://example.com/docs/auth" \
  --only-main-content \
  -o .firecrawl/auth.md

Then tell the agent exactly what to do with that file:

  • summarize setup steps
  • list required headers
  • extract code examples
  • compare auth methods

This two-step pattern is often more dependable than asking for scraping and analysis in one vague request.

Fix missing content before changing the whole workflow

If output looks thin, first test whether the page needs rendering time:

firecrawl scrape "<url>" --wait-for 3000 -o .firecrawl/page.md

Many users switch tools too early when the real issue is simply that the page had not finished rendering.

Reduce noise before downstream analysis

If the result is full of navigation, cookie text, or footer content, switch to:

firecrawl scrape "<url>" --only-main-content -o .firecrawl/page.md

This often improves:

  • summarization quality
  • extraction precision
  • token efficiency
  • consistency across similar pages

Use structured output when you plan to automate

If the scraped page feeds another step, ask for structured formats up front rather than reparsing markdown later:

firecrawl scrape "<url>" --format markdown,links -o .firecrawl/page.json

That makes firecrawl-scrape install decisions easier too: if your workflow depends on link-aware automation, this skill has a clearer fit than plain text fetch tools.

Iterate after the first run, not before

A productive iteration pattern is:

  1. run the simplest scrape
  2. inspect what is missing or noisy
  3. add one option to fix that specific issue
  4. rerun and compare

Typical iteration path:

  • baseline scrape
  • add --only-main-content
  • add --wait-for
  • add --format markdown,links
  • use --query for direct extraction

This is faster than designing a complex command before you have seen the page output.
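Step 4 (rerun and compare) can be as simple as a diff between the baseline and the rerun. The file contents below are stand-ins for real firecrawl outputs:

```shell
# Compare a baseline scrape with a rerun to see what the new option changed.
# Sample files stand in for real firecrawl outputs.
printf 'nav\nbody text\nfooter\n' > baseline.md
printf 'body text\n' > rerun.md

if diff -q baseline.md rerun.md >/dev/null; then
  echo "no change between runs"
else
  echo "outputs differ - inspect with: diff baseline.md rerun.md"
fi
```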

Common failure modes to watch for

The biggest practical issues are:

  • using a homepage when the real target is a subpage
  • expecting scrape to behave like crawl
  • not waiting for JS-rendered content
  • asking --query questions that require multiple pages
  • saving only final summaries instead of raw scrape output

Most of these are avoidable with clearer scope and one inspection pass.

How advanced users get more from firecrawl-scrape

Advanced users usually improve results by composing firecrawl-scrape with later steps, not by overcomplicating the scrape itself. A strong pattern is:

  • scrape exact pages cleanly
  • save raw outputs
  • run extraction, comparison, or synthesis afterward

That keeps firecrawl-scrape focused on the page-retrieval layer, where it performs best.

Ratings & Reviews

No ratings yet