firecrawl-map
by firecrawl

firecrawl-map helps agents discover and list URLs on a site, with options for search filtering, limits, JSON output, sitemap modes, and subdomain control before deeper scraping or crawling.
This skill scores 76/100, which means it is a solid directory listing candidate: agents get clear trigger cues, concrete CLI examples, and enough option coverage to use it with less guesswork than a generic prompt. Directory users can make a credible install decision, though they should expect a fairly lean skill page without much edge-case or setup guidance.
- Very strong triggerability: the description names explicit user intents like “map the site,” “find the URL for,” and “list all pages.”
- Operationally clear examples show real commands for both targeted search and full URL discovery, including output files and JSON mode.
- Useful leverage in a broader workflow: it positions map as a step in a search → scrape → map → crawl → interact pattern.
- Install/adoption clarity is limited because the skill does not include an install command or setup guidance in SKILL.md.
- Support material is minimal: no scripts, references, resources, or explicit constraints/edge-case guidance are present.
Overview of firecrawl-map skill
What firecrawl-map does
firecrawl-map is a focused skill for discovering URLs on a website. It is best used when you know the domain but do not know the exact page, or when you want a fast inventory of site structure before scraping, crawling, or extracting content.
Who should use firecrawl-map skill
The best fit for the firecrawl-map skill is anyone doing web research, site discovery, or pre-scrape planning:
- AI agents that need to find the right page before deeper extraction
- Developers building web scraping workflows
- Researchers auditing a site's public URL footprint
- Operators who need a quick list of URLs without launching a full crawl
The real job-to-be-done
Users typically do not want “all pages” as an end in itself. They want to answer questions like:
- “Where is the authentication doc on this site?”
- “What pages exist under this domain before I scrape?”
- “Is there a sitemap-backed shortcut to discover URLs quickly?”
- “Should I map first or jump straight to crawl?”
That makes firecrawl-map especially useful as a discovery step, not a final data extraction step.
Why people choose firecrawl-map
The main differentiator is speed and scope control. Compared with a generic prompt like “find the docs page,” the firecrawl-map skill gives you a reproducible CLI path for listing URLs, filtering by search terms, and exporting output for later steps.
Key strengths surfaced by the repository:
- Direct CLI usage with firecrawl map
- Optional --search filtering for large sites
- URL inventory output in text or JSON
- Supports sitemap strategy selection
- Useful as a middle step between search and deeper crawl/scrape work
What it is not for
firecrawl-map is not the right tool when you need:
- Full page content extraction
- Interactive browsing
- Detailed structured scraping from each page
- Rich site traversal logic beyond URL discovery
In those cases, mapping is the setup step, not the finish line.
How to Use firecrawl-map skill
Install context for firecrawl-map skill
This skill lives in the firecrawl/cli repository under skills/firecrawl-map. It is designed to be invoked in environments that can run:
firecrawl or npx firecrawl
If your agent or local workflow can execute Bash commands, this firecrawl-map install path is usually enough:
npx firecrawl map "<url>" --limit 100
If you already have the Firecrawl CLI available globally, use:
firecrawl map "<url>" --limit 100
Read this file first
Start with:
skills/firecrawl-map/SKILL.md
This repository slice is small, so there is not much supporting material to inspect. That is good for adoption speed, but it also means you should be explicit in your prompts about domain, goal, and output format.
Basic firecrawl-map usage patterns
The skill supports two common usage modes.
- Find a likely page by topic:
firecrawl map "https://example.com" --search "authentication" -o .firecrawl/filtered.txt
- Get a broader URL inventory:
firecrawl map "https://example.com" --limit 500 --json -o .firecrawl/urls.json
This is the core firecrawl-map usage pattern: start narrow with search if you are hunting for one page, or start broad with a capped URL list if you are planning the next scraping step.
What input the skill needs
To use the firecrawl-map skill well, provide these inputs clearly:
- The root URL or domain
- Whether you need one likely page or many URLs
- A search phrase, if you know the topic
- Desired limit on returned URLs
- Output format: plain text or JSON
- Whether subdomains should count
- How to treat sitemaps
Weak input:
- “Find docs on this site”
Strong input:
- “Map https://docs.example.com, search for authentication, return top matching URLs as JSON, and include subdomains only if the main docs domain has too few results.”
The stronger version reduces guesswork and makes the command choice obvious.
How to turn a rough request into a strong prompt
A good rule for prompting firecrawl-map is to specify five things in one sentence:
- site
- intent
- scope
- filter
- output
Example:
- “Use firecrawl-map on https://example.com to list up to 200 public URLs, prefer sitemap discovery, skip unrelated subdomains, and save JSON output for later scraping.”
Example for targeted discovery:
- “Use firecrawl-map to find the page on https://example.com most related to pricing API limits, and write matching URLs to a text file.”
Best workflow: map before scrape or crawl
A practical workflow looks like this:
- Use firecrawl map with --search if you are trying to locate one page.
- Use firecrawl map with --limit and --json if you need a broader URL set.
- Review the returned URLs.
- Select the most relevant pages.
- Move to scrape or crawl only after you know the site structure well enough.
This saves time and cost compared with scraping blindly.
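The “review the returned URLs” step above can be sketched in code. This is a minimal, hypothetical example, assuming the map output has already been loaded as a plain list of URL strings (the actual --json shape may differ): it ranks mapped URLs by how often topic terms appear in them, so the best scrape candidates surface first.

```python
# Rank mapped URLs by topic relevance before scraping.
# Assumes `urls` is a plain list of URL strings loaded from map output.
def select_candidates(urls, terms, top_n=5):
    def score(url):
        lowered = url.lower()
        return sum(lowered.count(t.lower()) for t in terms)

    ranked = sorted(urls, key=score, reverse=True)
    return [u for u in ranked if score(u) > 0][:top_n]

# Sample data standing in for real map output.
mapped = [
    "https://docs.example.com/guides/authentication",
    "https://docs.example.com/pricing",
    "https://docs.example.com/reference/auth-tokens",
    "https://example.com/blog/company-news",
]

print(select_candidates(mapped, ["authentication", "auth"]))
# The authentication guide ranks first; unrelated pages are dropped.
```

The scoring here is deliberately crude; the point is that a few lines of filtering between map and scrape keep the expensive steps focused on relevant pages.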
Options that materially change output quality
The most important options are:
- --search <query>: best for locating a topic page on a large site
- --limit <n>: prevents oversized result sets
- --json: makes downstream filtering and automation easier
- --sitemap <include|skip|only>: useful when sitemap coverage matters
- --include-subdomains: expands scope, but can add noise
- -o, --output <path>: makes results reusable in a pipeline
If results are noisy, the first things to tighten are search phrase, domain scope, and subdomain inclusion.
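A quick way to decide which of those to tighten is to count mapped URLs per hostname. This sketch assumes the URL list has already been loaded from the output file (names and sample data here are illustrative, not part of the CLI):

```python
from collections import Counter
from urllib.parse import urlsplit

# Count mapped URLs per hostname to see where noise comes from
# before tightening --search or dropping --include-subdomains.
def urls_per_host(urls):
    return Counter(urlsplit(u).hostname for u in urls)

# Sample data standing in for a real map result.
urls = [
    "https://docs.example.com/api",
    "https://docs.example.com/auth",
    "https://support.example.com/tickets",
    "https://www.example.com/about",
]

print(urls_per_host(urls))
```

If most hits land on hostnames you do not care about, drop subdomain inclusion or narrow the root URL before touching the search phrase.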
Choosing sitemap strategy
The --sitemap option matters more than many users expect:
- only: fastest when you trust the site's sitemap and want cleaner coverage
- include: good default when you want sitemap help without depending on it fully
- skip: useful when sitemap results are stale, incomplete, or misleading
For documentation sites, include or only often produces better mapping results than unconstrained discovery.
When to include subdomains
Use --include-subdomains only if the target content may live outside the main hostname, such as:
- docs.example.com
- developers.example.com
- support.example.com
Do not enable it by default for corporate sites unless you truly want broader coverage. It can flood your URL list with marketing, support, or app surfaces unrelated to your goal.
Practical examples users actually need
Find a login or auth doc page:
firecrawl map "https://docs.example.com" --search "authentication" -o .firecrawl/auth-pages.txt
Get a reusable JSON URL inventory:
firecrawl map "https://example.com" --limit 300 --json -o .firecrawl/site-map.json
Prefer sitemap-only discovery for a docs site:
firecrawl map "https://docs.example.com" --sitemap only --limit 500 --json
Broaden scope to subdomains when docs location is unclear:
firecrawl map "https://example.com" --search "API reference" --include-subdomains
Common adoption blockers
The main reasons people struggle with the firecrawl-map skill are not installation issues but request quality issues:
- Starting with too broad a domain
- Forgetting to add --search when hunting one page
- Pulling too many URLs without a limit
- Including subdomains too early
- Treating map as a content extraction tool
If the first result is messy, narrow the site and sharpen the topic before changing tools.
firecrawl-map skill FAQ
Is firecrawl-map better than a normal prompt?
Yes when the task is URL discovery on a known site. A normal prompt may guess likely pages, but firecrawl-map gives a concrete, repeatable way to enumerate and filter URLs from the target domain.
Is firecrawl-map skill good for beginners?
Yes, because the command surface is small. The easiest starting point is one of these two commands:
firecrawl map "https://example.com" --search "pricing"
firecrawl map "https://example.com" --limit 100 --json
The main beginner mistake is asking it to extract page content, which is outside the skill's core purpose.
When should I use firecrawl-map instead of crawling?
Use firecrawl-map first when you need to understand site structure or locate candidate pages. Use crawling later when you need broader traversal or page-level processing after discovery.
When should I not use firecrawl-map?
Skip it if:
- You already know the exact URL
- You need page text, metadata, or structured extraction
- You need browser interaction rather than URL listing
- The task is not site discovery
Does firecrawl-map work well for large sites?
Yes, but only if you control scope. Use --search, --limit, and sitemap strategy deliberately. Large sites are where firecrawl-map usage gets the most value, but also where loose prompts create the most noise.
What output format should I choose?
Choose plain text when a human just needs a quick page list. Choose --json when another tool, script, or downstream step will process the results.
How to Improve firecrawl-map skill
Start with a narrower target than you think
The easiest way to improve firecrawl-map results is to reduce scope early. If you know the content is likely in docs, use the docs hostname directly instead of the company's homepage.
Better:
https://docs.example.com
Worse:
https://example.com
Use search phrases that match page intent
For the firecrawl-map skill, search quality matters more than keyword quantity. Short intent phrases usually beat stuffed queries.
Better:
- authentication
- rate limits
- API reference
Worse:
where can I find complete developer authentication API reference and login documentation
The better version is easier for URL filtering and usually returns cleaner matches.
Pick JSON whenever results feed another step
If your next step is scrape, filter, classify, or deduplicate, use:
--json
This small choice makes the firecrawl-map workflow much more automation-friendly and reduces manual cleanup.
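As a concrete example of that downstream step, here is a hedged sketch of normalizing and deduplicating a URL inventory. It assumes the --json output can be loaded as a flat list of URL strings (an assumption; adjust the loading code to match the actual file shape):

```python
from urllib.parse import urlsplit, urlunsplit

# Collapse near-duplicate URLs (trailing slashes, fragments)
# before feeding the inventory to the next pipeline step.
def normalize(url):
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment entirely; it never changes the fetched page.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))

def dedupe(urls):
    seen, out = set(), []
    for u in urls:
        n = normalize(u)
        if n not in seen:
            seen.add(n)
            out.append(n)
    return out

# Sample data standing in for a parsed urls.json file.
raw = [
    "https://example.com/docs/",
    "https://example.com/docs",
    "https://example.com/docs#install",
    "https://example.com/pricing",
]

print(dedupe(raw))
# The three /docs variants collapse into one entry.
```

Running this kind of cleanup once, right after map, keeps every later scrape or crawl step from paying for the same page twice.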
Use map iteratively, not once
A strong workflow is:
- Run a narrow --search
- Inspect likely URLs
- Run a second map on the best subdomain or section
- Increase --limit only if needed
- Move to scrape/crawl after discovery stabilizes
This beats one huge run because it keeps signal high.
Watch for common failure modes
Typical failure modes with firecrawl-map:
- Too many irrelevant URLs from broad domains
- Missing target pages because search terms are vague
- Incomplete inventories from relying on the wrong sitemap strategy
- Noisy results from enabling subdomains unnecessarily
Each has a simple fix: tighten site, sharpen query, change sitemap mode, or shrink scope.
Improve prompts by specifying success criteria
Do not just ask for “all URLs.” Say what would count as success.
Example:
- “Use firecrawl-map to find pages related to authentication setup on https://docs.example.com. Return the most relevant URLs first, cap at 50, and save JSON output for follow-up scraping.”
That makes the tool choice, parameters, and stopping point much clearer.
Keep a simple escalation path
Use this practical decision path:
- Need one likely page: map --search
- Need a URL inventory: map --limit --json
- Need page content: scrape after map
- Need broader traversal: crawl after map
This is the most useful way to improve firecrawl-map outcomes without overcomplicating your workflow.
