data-scraper-agent
by affaan-m

data-scraper-agent helps build a repeatable public-data pipeline for web scraping, enrichment, and storage. It is designed for monitoring jobs, prices, news, repos, sports, and listings on a schedule using GitHub Actions, with outputs to Notion, Sheets, or Supabase. Best for ongoing tracking, not one-off extractions.
This skill scores 84/100, which means it is a solid directory listing candidate: users get a clearly triggerable data-scraping workflow, enough operational detail to understand the stack and purpose quickly, and real guidance beyond a generic prompt. It should help agents execute public-data monitoring tasks with less guesswork, though users should still verify fit for their specific target site and storage setup.
- Explicit activation guidance covers common public-data monitoring requests like scraping, tracking, and scheduled collection.
- Strong workflow framing shows the full COLLECT → ENRICH → STORE pipeline, which helps agents execute with less ambiguity.
- Substantive body content with no placeholder markers, plus concrete stack references (Python, Gemini Flash, GitHub Actions, Notion/Sheets/Supabase).
- No install command or support files are present, so setup and integration may require manual interpretation from the SKILL.md alone.
- The skill is broad by design, so edge cases like site-specific anti-bot measures or unusual data sources are not deeply operationalized in the excerpt.
Overview of data-scraper-agent skill
What data-scraper-agent does
The data-scraper-agent skill helps you build an automated pipeline that collects public data, enriches it with an LLM, and stores the output for ongoing tracking. It is best suited to web-scraping tasks where the goal is not a one-off scrape, but a repeatable agent that keeps checking sources like job boards, pricing pages, news feeds, GitHub repos, sports results, and listings.
Who should install it
Install the data-scraper-agent skill if you need a low-cost way to monitor public sources on a schedule, without maintaining your own server. It fits users who want alerts, structured records, or trend tracking more than ad hoc scraping. It is less useful if you only need a single manual extraction or if the target site is private, login-gated, or heavily anti-bot protected.
Why it is different
The main value of this data-scraper-agent skill is the workflow, not just the scraper. It emphasizes a three-step loop: collect, enrich, store. That makes it easier to turn raw pages into usable data, classify results, and keep the system running via GitHub Actions. The practical tradeoff is that quality depends on the source being public and on giving the agent clear schema and filtering rules.
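The collect, enrich, store loop can be sketched as three small functions. This is a minimal illustration under assumed names, not the skill's actual code: the `Record` fields, the sample URL, and the in-memory store stand in for whatever schema and destination you configure.

```python
from dataclasses import dataclass

@dataclass
class Record:
    url: str            # kept literal from the source
    title: str          # kept literal from the source
    category: str = ""  # filled in by the enrich step

def collect() -> list[Record]:
    # A real pipeline would fetch and parse a public page, feed, or API here.
    return [Record(url="https://example.com/item/1", title="Sample item")]

def enrich(records: list[Record]) -> list[Record]:
    # A real pipeline would call an LLM (e.g. Gemini Flash) to classify each record.
    for r in records:
        r.category = "uncategorized"
    return records

def store(records: list[Record]) -> dict[str, Record]:
    # A real pipeline would write to Notion, Sheets, or Supabase; keyed by URL here.
    return {r.url: r for r in records}
```

Keeping the three steps as separate functions mirrors the skill's framing: each stage can be tested and tightened independently before the whole loop is scheduled.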
How to Use data-scraper-agent skill
Install and inspect the skill
Use the data-scraper-agent install command in your Claude Code workflow:
npx skills add affaan-m/everything-claude-code --skill data-scraper-agent
After install, read SKILL.md first, then check the rest of the skill context in the repo if present. Even though this skill is self-contained, the best way to use data-scraper-agent is to confirm the execution path, output format, and any assumptions before you ask it to build against a real target.
Turn a vague request into a usable brief
A weak prompt like “scrape this site” does not give enough structure. A strong prompt tells the skill what source to monitor, what fields to collect, how often to run, and where results should land. For example: “Build a data-scraper-agent for public software engineering jobs on two boards, collect title/company/location/salary/posted date, dedupe by URL, enrich with role seniority, and store weekly results in Google Sheets.”
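The example brief above implies an explicit record schema and a dedupe rule. A sketch of what that might look like, with field names taken from the example request rather than any fixed skill format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JobPosting:
    url: str          # dedupe key: one record per unique URL
    title: str
    company: str
    location: str
    salary: str       # kept as the literal string shown on the board
    posted_date: str  # date as scraped, e.g. ISO format
    seniority: str    # enriched field, inferred by the LLM

def dedupe(postings: list[JobPosting]) -> list[JobPosting]:
    """Keep the first posting seen for each URL."""
    seen, unique = set(), []
    for p in postings:
        if p.url not in seen:
            seen.add(p.url)
            unique.append(p)
    return unique
```

Writing the schema down before prompting makes the brief concrete: every field in the dataclass is a field you must name in the request.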
What to specify for better output
The skill works best when you provide the public source, the desired schema, and the decision logic. Include whether the site is static or JS-rendered, how fresh the data needs to be, and what counts as a new or changed record. If you omit those details, the agent may scrape too much, miss important fields, or produce records that are hard to compare over time.
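"What counts as a new or changed record" can be pinned down with a small diff over two snapshots. This is a hypothetical sketch assuming records are keyed by URL and compared on a chosen set of fields; the `price` default is illustrative:

```python
def diff_records(previous: dict, current: dict, compare_fields=("price",)):
    """Return (new, changed) records keyed by URL.

    `previous` and `current` map URL -> dict of scraped fields.
    A record is "changed" only if one of `compare_fields` differs.
    """
    new = {u: r for u, r in current.items() if u not in previous}
    changed = {
        u: r for u, r in current.items()
        if u in previous
        and any(r.get(f) != previous[u].get(f) for f in compare_fields)
    }
    return new, changed
```

Stating `compare_fields` in your brief is what keeps records comparable over time: the agent then knows that a reworded title is noise but a price change is signal.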
Files and concepts to read first
Start with SKILL.md and focus on the sections that explain activation, the three-layer architecture, and the free stack. Those parts tell you when the skill is the right fit and how to wire the pipeline. If you are adapting it to a new repo, look for the concrete examples of schedule setup, storage choices, and enrichment rules before you modify prompts.
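For the schedule setup, a GitHub Actions workflow with a cron trigger is the usual shape. The workflow name, script path, and secret name below are placeholders for whatever the skill generates, not values from SKILL.md:

```yaml
# Hypothetical scheduled scrape workflow (.github/workflows/scrape.yml).
name: scrape
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly, Mondays 06:00 UTC
  workflow_dispatch: {}    # allow manual runs while testing
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python scrape.py
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
```

The `workflow_dispatch` trigger is worth keeping: it lets you run the pipeline manually while debugging instead of waiting for the next cron tick.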
data-scraper-agent skill FAQ
Is this only for web pages?
No. The data-scraper-agent guide is for any public source the agent can reach, including APIs, feeds, and pages that may need browser rendering. For simple HTML pages, basic HTTP scraping is often enough. For dynamic sites, you may need a browser-based approach, which increases setup complexity.
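One cheap way to decide between plain HTTP and a browser is to check whether the fetched markup actually contains content. The threshold below is an illustrative heuristic, not a rule from the skill:

```python
def needs_browser(html: str, min_chars: int = 500) -> bool:
    """Heuristic: very short markup often means content is rendered client-side,
    so a browser-based approach (e.g. Playwright) may be needed."""
    return len(html.strip()) < min_chars
```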
Do I need coding experience to use it?
Basic comfort with prompting helps, but this is still a build-oriented skill. Beginners can use it if they can describe the source and desired output clearly. If you cannot define the fields, schedule, or destination, the result will likely be too vague to deploy reliably.
How is it different from a normal prompt?
A normal prompt usually produces a one-off scraper or summary. The data-scraper-agent skill is meant to create a repeatable system with collection, enrichment, storage, and scheduled runs. That makes it more suitable when you care about maintaining data over time, not just extracting it once.
When should I not use it?
Do not use data-scraper-agent if the source requires login, enforces strict rate limits, or blocks automation, or if the data is highly sensitive. It is also a poor fit when you only need a quick manual export or when the source changes so often that a simple prompt would be easier than maintaining an agent.
How to Improve data-scraper-agent skill
Give tighter source definitions
The strongest data-scraper-agent results come from naming exact URLs, patterns, and scope boundaries. Say which pages matter, which ones do not, and what the agent should ignore. For example, “monitor only the listing pages for remote backend roles in the US; exclude internships, sponsored posts, and duplicate reposts.” That kind of brief reduces false positives and helps the agent stay stable.
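Scope boundaries like the example above translate directly into a filter predicate. The field names and exclusion terms here are illustrative, taken from the sample brief rather than the skill itself:

```python
EXCLUDE_TERMS = ("internship", "sponsored")

def in_scope(listing: dict) -> bool:
    """Keep remote US backend-style roles; drop internships and sponsored posts."""
    title = listing.get("title", "").lower()
    return (
        listing.get("remote") is True
        and listing.get("country") == "US"
        and not any(term in title for term in EXCLUDE_TERMS)
    )
```

Encoding the scope as a predicate also gives you something to test: a handful of known in-scope and out-of-scope examples catch drift when the brief changes.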
Define the enrichment and storage rules
If you want useful output, tell the skill what the LLM should infer and what must remain literal. Use enrichment for classification, priority scoring, or short summaries, but keep source fields like price, title, and URL exact. Also specify the destination format up front: Notion for review workflows, Sheets for lightweight analysis, Supabase for structured querying.
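The literal-versus-enriched split can be enforced mechanically at merge time. A sketch under assumed field names, where literal fields always win over anything the LLM produced:

```python
LITERAL_FIELDS = ("url", "title", "price")

def merge_enrichment(source: dict, enriched: dict) -> dict:
    """Literal fields come only from the source; enrichment never overwrites them."""
    record = {k: source[k] for k in LITERAL_FIELDS if k in source}
    record.update({k: v for k, v in enriched.items() if k not in LITERAL_FIELDS})
    return record
```

This guards against a common failure: an LLM "helpfully" normalizing a price or rewriting a title, which silently corrupts the record you meant to keep exact.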
Review the first run for failure modes
The most common problems are duplicated records, missing fields from dynamic pages, and over-aggressive enrichment that changes the meaning of the source. After the first run, inspect a few records and tighten the prompt around deduping, selectors, and accepted source fields. If the output is noisy, reduce the scope before adding more automation.
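The first-run inspection can be partly automated with a quick report over the failure modes above: duplicate keys and missing required fields. Field names here are illustrative:

```python
def first_run_report(records: list[dict], required=("url", "title")) -> dict:
    """Count duplicate URLs and records missing any required field."""
    urls = [r.get("url") for r in records]
    duplicates = len(urls) - len(set(urls))
    missing = [r for r in records if any(not r.get(f) for f in required)]
    return {"duplicates": duplicates, "missing_fields": len(missing)}
```

A nonzero `duplicates` count points at the dedupe rule; a nonzero `missing_fields` count usually points at dynamic pages where selectors came back empty.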
Iterate based on what you actually track
Use the first version to prove the monitoring loop, then improve data-scraper-agent based on the signals you care about most: freshness, completeness, or classification quality. If freshness matters, refine the schedule. If completeness matters, adjust extraction rules. If decision-making matters, improve the enrichment prompt so the agent explains why each item was included.
