data-scraper-agent
by affaan-m

data-scraper-agent helps build a repeatable public-data pipeline for web scraping, enrichment, and storage. It is designed for monitoring jobs, prices, news, repos, sports, and listings on a schedule using GitHub Actions, with outputs to Notion, Sheets, or Supabase. Best for ongoing tracking, not one-off extractions.
This skill scores 84/100, which means it is a solid directory listing candidate: users get a clearly triggerable data-scraping workflow, enough operational detail to understand the stack and purpose quickly, and real guidance beyond a generic prompt. It should help agents execute public-data monitoring tasks with less guesswork, though users should still verify fit for their specific target site and storage setup.
- Explicit activation guidance covers common public-data monitoring requests like scraping, tracking, and scheduled collection.
- Strong workflow framing shows the full COLLECT → ENRICH → STORE pipeline, which helps agents execute with less ambiguity.
- Substantive body content with no placeholder markers, plus concrete stack references (Python, Gemini Flash, GitHub Actions, Notion/Sheets/Supabase).
- No install command or support files are present, so setup and integration may require manual interpretation from the SKILL.md alone.
- The skill is broad by design, so edge cases like site-specific anti-bot measures or unusual data sources are not deeply operationalized in the excerpt.
Overview of data-scraper-agent skill
What data-scraper-agent does
The data-scraper-agent skill helps you build an automated pipeline that collects public data, enriches it with an LLM, and stores the output for ongoing tracking. It is best suited to web-scraping tasks where the goal is not a one-off scrape, but a repeatable agent that keeps checking sources like job boards, pricing pages, news feeds, GitHub repos, sports results, and listings.
Who should install it
Install the data-scraper-agent skill if you need a low-cost way to monitor public sources on a schedule, without maintaining your own server. It fits users who want alerts, structured records, or trend tracking more than ad hoc scraping. It is less useful if you only need a single manual extraction or if the target site is private, login-gated, or heavily anti-bot protected.
Why it is different
The main value of this data-scraper-agent skill is the workflow, not just the scraper. It emphasizes a three-step loop: collect, enrich, store. That makes it easier to turn raw pages into usable data, classify results, and keep the system running via GitHub Actions. The practical tradeoff is that quality depends on the source being public and on giving the agent clear schema and filtering rules.
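The collect, enrich, store loop can be sketched as three small functions. This is a minimal illustration under assumed names, not the skill's actual code: the `Record` fields, the sample URL, and the in-memory store stand in for whatever schema and destination you configure.

```python
from dataclasses import dataclass

@dataclass
class Record:
    url: str            # kept literal from the source
    title: str          # kept literal from the source
    category: str = ""  # filled in by the enrich step

def collect() -> list[Record]:
    # A real pipeline would fetch and parse a public page, feed, or API here.
    return [Record(url="https://example.com/item/1", title="Sample item")]

def enrich(records: list[Record]) -> list[Record]:
    # A real pipeline would call an LLM (e.g. Gemini Flash) to classify each record.
    for r in records:
        r.category = "uncategorized"
    return records

def store(records: list[Record]) -> dict[str, Record]:
    # A real pipeline would write to Notion, Sheets, or Supabase; keyed by URL here.
    return {r.url: r for r in records}
```

Keeping the three steps as separate functions mirrors the skill's framing: each stage can be tested and tightened independently before the whole loop is scheduled.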
How to Use data-scraper-agent skill
Install and inspect the skill
Use the data-scraper-agent install command in your Claude Code workflow:
npx skills add affaan-m/everything-claude-code --skill data-scraper-agent
After install, read SKILL.md first, then check the rest of the skill context in the repo if present. Even though this skill is self-contained, the best way to use data-scraper-agent is to confirm the execution path, output format, and any assumptions before you ask it to build against a real target.
Turn a vague request into a usable brief
A weak prompt like “scrape this site” does not give enough structure. A strong prompt tells the skill what source to monitor, what fields to collect, how often to run, and where results should land. For example: “Build a data-scraper-agent for public software engineering jobs on two boards, collect title/company/location/salary/posted date, dedupe by URL, enrich with role seniority, and store weekly results in Google Sheets.”
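The example brief above implies an explicit record schema and a dedupe rule. A sketch of what that might look like, with field names taken from the example request rather than any fixed skill format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JobPosting:
    url: str          # dedupe key: one record per unique URL
    title: str
    company: str
    location: str
    salary: str       # kept as the literal string shown on the board
    posted_date: str  # date as scraped, e.g. ISO format
    seniority: str    # enriched field, inferred by the LLM

def dedupe(postings: list[JobPosting]) -> list[JobPosting]:
    """Keep the first posting seen for each URL."""
    seen, unique = set(), []
    for p in postings:
        if p.url not in seen:
            seen.add(p.url)
            unique.append(p)
    return unique
```

Writing the schema down before prompting makes the brief concrete: every field in the dataclass is a field you must name in the request.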
What to specify for better output
The skill works best when you provide the public source, the desired schema, and the decision logic. Include whether the site is static or JS-rendered, how fresh the data needs to be, and what counts as a new or changed record. If you omit those details, the agent may scrape too much, miss important fields, or produce records that are hard to compare over time.
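"What counts as a new or changed record" can be pinned down with a small diff over two snapshots. This is a hypothetical sketch assuming records are keyed by URL and compared on a chosen set of fields; the `price` default is illustrative:

```python
def diff_records(previous: dict, current: dict, compare_fields=("price",)):
    """Return (new, changed) records keyed by URL.

    `previous` and `current` map URL -> dict of scraped fields.
    A record is "changed" only if one of `compare_fields` differs.
    """
    new = {u: r for u, r in current.items() if u not in previous}
    changed = {
        u: r for u, r in current.items()
        if u in previous
        and any(r.get(f) != previous[u].get(f) for f in compare_fields)
    }
    return new, changed
```

Stating `compare_fields` in your brief is what keeps records comparable over time: the agent then knows that a reworded title is noise but a price change is signal.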
Files and concepts to read first
Start with SKILL.md and focus on the sections that explain activation, the three-layer architecture, and the free stack. Those parts tell you when the skill is the right fit and how to wire the pipeline. If you are adapting it to a new repo, look for the concrete examples of schedule setup, storage choices, and enrichment rules before you modify prompts.
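For the schedule setup, a GitHub Actions workflow with a cron trigger is the usual shape. The workflow name, script path, and secret name below are placeholders for whatever the skill generates, not values from SKILL.md:

```yaml
# Hypothetical scheduled scrape workflow (.github/workflows/scrape.yml).
name: scrape
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly, Mondays 06:00 UTC
  workflow_dispatch: {}    # allow manual runs while testing
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python scrape.py
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
```

The `workflow_dispatch` trigger is worth keeping: it lets you run the pipeline manually while debugging instead of waiting for the next cron tick.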
data-scraper-agent skill FAQ
Is this only for web pages?
No. The data-scraper-agent guide is for any public source the agent can reach, including APIs, feeds, and pages that may need browser rendering. For simple HTML pages, basic HTTP scraping is often enough. For dynamic sites, you may need a browser-based approach, which increases setup complexity.
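One cheap way to decide between plain HTTP and a browser is to check whether the fetched markup actually contains content. The threshold below is an illustrative heuristic, not a rule from the skill:

```python
def needs_browser(html: str, min_chars: int = 500) -> bool:
    """Heuristic: very short markup often means content is rendered client-side,
    so a browser-based approach (e.g. Playwright) may be needed."""
    return len(html.strip()) < min_chars
```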
Do I need coding experience to use it?
Basic comfort with prompting helps, but this is still a build-oriented skill. Beginners can use it if they can describe the source and desired output clearly. If you cannot define the fields, schedule, or destination, the result will likely be too vague to deploy reliably.
How is it different from a normal prompt?
A normal prompt usually produces a one-off scraper or summary. The data-scraper-agent skill is meant to create a repeatable system with collection, enrichment, storage, and scheduled runs. That makes it more suitable when you care about maintaining data over time, not just extracting it once.
When should I not use it?
Do not use data-scraper-agent if the source requires login, enforces strict rate limits, or blocks automation, or if the data is highly sensitive. It is also a poor fit when you only need a quick manual export or when the source changes so often that a simple prompt would be easier than maintaining an agent.
How to Improve data-scraper-agent skill
Give tighter source definitions
The strongest data-scraper-agent results come from naming exact URLs, patterns, and scope boundaries. Say which pages matter, which ones do not, and what the agent should ignore. For example, “monitor only the listing pages for remote backend roles in the US; exclude internships, sponsored posts, and duplicate reposts.” That kind of brief reduces false positives and helps the agent stay stable.
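Scope boundaries like the example above translate directly into a filter predicate. The field names and exclusion terms here are illustrative, taken from the sample brief rather than the skill itself:

```python
EXCLUDE_TERMS = ("internship", "sponsored")

def in_scope(listing: dict) -> bool:
    """Keep remote US backend-style roles; drop internships and sponsored posts."""
    title = listing.get("title", "").lower()
    return (
        listing.get("remote") is True
        and listing.get("country") == "US"
        and not any(term in title for term in EXCLUDE_TERMS)
    )
```

Encoding the scope as a predicate also gives you something to test: a handful of known in-scope and out-of-scope examples catch drift when the brief changes.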
Define the enrichment and storage rules
If you want useful output, tell the skill what the LLM should infer and what must remain literal. Use enrichment for classification, priority scoring, or short summaries, but keep source fields like price, title, and URL exact. Also specify the destination format up front: Notion for review workflows, Sheets for lightweight analysis, Supabase for structured querying.
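The literal-versus-enriched split can be enforced mechanically at merge time. A sketch under assumed field names, where literal fields always win over anything the LLM produced:

```python
LITERAL_FIELDS = ("url", "title", "price")

def merge_enrichment(source: dict, enriched: dict) -> dict:
    """Literal fields come only from the source; enrichment never overwrites them."""
    record = {k: source[k] for k in LITERAL_FIELDS if k in source}
    record.update({k: v for k, v in enriched.items() if k not in LITERAL_FIELDS})
    return record
```

This guards against a common failure: an LLM "helpfully" normalizing a price or rewriting a title, which silently corrupts the record you meant to keep exact.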
Review the first run for failure modes
The most common problems are duplicated records, missing fields from dynamic pages, and over-aggressive enrichment that changes the meaning of the source. After the first run, inspect a few records and tighten the prompt around deduping, selectors, and accepted source fields. If the output is noisy, reduce the scope before adding more automation.
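The first-run inspection can be partly automated with a quick report over the failure modes above: duplicate keys and missing required fields. Field names here are illustrative:

```python
def first_run_report(records: list[dict], required=("url", "title")) -> dict:
    """Count duplicate URLs and records missing any required field."""
    urls = [r.get("url") for r in records]
    duplicates = len(urls) - len(set(urls))
    missing = [r for r in records if any(not r.get(f) for f in required)]
    return {"duplicates": duplicates, "missing_fields": len(missing)}
```

A nonzero `duplicates` count points at the dedupe rule; a nonzero `missing_fields` count usually points at dynamic pages where selectors came back empty.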
Iterate based on what you actually track
Use the first version to prove the monitoring loop, then improve data-scraper-agent based on the signals you care about most: freshness, completeness, or classification quality. If freshness matters, refine the schedule. If completeness matters, adjust extraction rules. If decision-making matters, improve the enrichment prompt so the agent explains why each item was included.
