defuddle

by kepano

defuddle extracts clean markdown from web pages with the Defuddle CLI, removing clutter for research, docs, and articles. Use it for standard HTML pages, install with npm, and skip URLs ending in .md.

Stars: 19.7k

Added: Apr 5, 2026

Category: Web Research

Install Command

npx skills add kepano/obsidian-skills --skill defuddle

Curation Score

This skill scores 76/100, which means it is a solid directory listing candidate: agents get a clear trigger, a simple command pattern, and a concrete reason to use it instead of a generic web fetch for normal web pages. Directory users can make a credible install decision, though they should expect a lightweight wrapper around an external CLI rather than a deeply guided workflow.

Strengths

Strong triggerability: it explicitly says to use Defuddle when a user provides a standard web URL to read or analyze, and not for URLs ending in .md.
Operationally clear: the skill gives install guidance plus concrete commands for markdown extraction, file output, and metadata retrieval.
Good agent leverage: it explains the practical benefit of removing navigation, ads, and clutter to reduce token usage versus raw page fetching.

Cautions

Limited edge-case guidance: beyond excluding .md URLs, it does not explain handling failures, unsupported pages, auth walls, or dynamic sites.
Minimal supporting material: there are no scripts, references, or examples showing expected outputs, so adoption relies on the short SKILL.md alone.

Tags: CLI, npm, Markdown, Websites, Documentation, Blog, Automation

Overview of defuddle skill

What the defuddle skill does

The defuddle skill turns a normal web page into clean, readable markdown with much less clutter than a raw fetch. It is built for pages like articles, docs, guides, blog posts, and other HTML pages where menus, ads, sidebars, and navigation waste tokens and distract analysis.

Best fit for Web Research

Use defuddle for web research when your real goal is to read, summarize, compare, quote, or analyze page content rather than inspect site chrome or raw HTML. The main value is cleaner input for downstream reasoning. If a user gives you a standard page URL and wants the content, defuddle is usually a better starting point than a generic web fetch.

Key limits and when not to use it

The biggest boundary is simple: do not use defuddle on URLs ending in .md. Those pages are already markdown, so a direct fetch is cleaner and avoids unnecessary transformation. It is also a weak fit when you need exact page structure, interactive elements, scripts, or full DOM fidelity.
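
The .md rule can be sketched as a small routing check. This is an illustration under stated assumptions, not part of the skill: is_markdown_url and fetch_page are hypothetical helper names, curl stands in for "a direct fetch", and ignoring the query string and fragment when testing the suffix is an added assumption.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the ".md" rule; not part of the skill.

# Succeeds (exit 0) when the URL path ends in .md.
# Assumption: query string and fragment are ignored when checking the suffix.
is_markdown_url() {
  url=${1%%\#*}     # drop "#fragment"
  url=${url%%\?*}   # drop "?query"
  case "$url" in
    *.md) return 0 ;;
    *)    return 1 ;;
  esac
}

# Route a URL: direct fetch for markdown files, defuddle for everything else.
# Assumption: curl is the "direct fetch" tool.
fetch_page() {
  if is_markdown_url "$1"; then
    curl -fsSL "$1"
  else
    defuddle parse "$1" --md
  fi
}
```

With these defined, fetch_page "<url>" would pick the fetch path from the URL alone, so the caller does not have to remember the exception.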

Why users choose defuddle

The practical differentiator is not “can it fetch a page,” but “can it give me the main text in a token-efficient format quickly.” That makes the defuddle skill attractive for research pipelines, note capture, article summarization, and documentation reading where cleaner markdown materially improves output quality.

How to Use the defuddle skill

defuddle install and basic command

To install defuddle, the repository points to the Defuddle CLI itself:

npm install -g defuddle

Core command:

defuddle parse <url> --md

Use --md consistently. That is the recommended output for most research and analysis workflows because it removes visual noise while preserving readable structure.
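
A minimal sketch of wrapping the core command so a missing install produces a clear hint instead of a bare "command not found". The helper names have_cmd and page_markdown are hypothetical, and the exit code 127 is only a convention chosen here.

```shell
#!/bin/sh
# Hypothetical wrapper; helper names are illustrative, not part of the skill.

# Succeeds when the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print a page as markdown, or explain how to install the CLI if it is missing.
page_markdown() {
  if ! have_cmd defuddle; then
    echo "defuddle not found; install it with: npm install -g defuddle" >&2
    return 127
  fi
  defuddle parse "$1" --md
}
```

The wrapper deliberately does not run the install itself, which keeps a global npm install an explicit user decision.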

Inputs the defuddle skill needs

The defuddle skill needs a page URL and, ideally, a clear intent. Good input looks like:

the exact URL
what you need from it
whether you want full markdown, saved output, or only metadata

Examples:

“Read this article and summarize the main argument: <url>”
“Extract clean markdown from this docs page and save it to content.md: <url>”
“Get only the page title and description for <url>”

Useful commands:

defuddle parse <url> --md -o content.md
defuddle parse <url> -p title
defuddle parse <url> -p description
defuddle parse <url> -p domain
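
The three intents above (full markdown, saved output, metadata only) can be mapped onto these commands with one dispatcher. This is a sketch, not the skill's own tooling: run_defuddle and the DEFUDDLE_BIN override are hypothetical, and the defaults content.md and title are assumptions taken from the examples above.

```shell
#!/bin/sh
# Hypothetical dispatcher mapping the three intents to the commands above.
# DEFUDDLE_BIN is an assumed override so the call can be dry-run, e.g. with echo.

run_defuddle() {
  mode=$1
  url=$2
  arg=$3
  case "$mode" in
    markdown) set -- parse "$url" --md ;;
    save)     set -- parse "$url" --md -o "${arg:-content.md}" ;;
    meta)     set -- parse "$url" -p "${arg:-title}" ;;
    *)
      echo "usage: run_defuddle markdown|save|meta <url> [file|property]" >&2
      return 2
      ;;
  esac
  # Run the real CLI unless DEFUDDLE_BIN points somewhere else.
  "${DEFUDDLE_BIN:-defuddle}" "$@"
}
```

Setting DEFUDDLE_BIN=echo prints the arguments that would be passed instead of running the CLI, which is a cheap way to check the mapping before touching the network.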

Turn a rough goal into a strong defuddle prompt

Weak request: “Look at this URL.”

Better prompts:

“Use defuddle on <url> with markdown output. Ignore site navigation. Then summarize the key points in 5 bullets and quote the most important section.”
“Use defuddle for this documentation page: <url>. Extract markdown, identify setup steps, prerequisites, and caveats, then rewrite them as a checklist.”
“Pull only metadata from <url> first. If the title and description match the topic, then extract full markdown.”

This works better because it tells the agent both how to call defuddle and what to do with the cleaned content afterward.

defuddle skill FAQ

Is defuddle better than an ordinary prompt plus fetch?

Usually yes for article-style pages. A normal fetch often includes headers, footers, cookie notices, and navigation. defuddle improves the signal-to-noise ratio before analysis starts, which can lower token cost and reduce summarization errors caused by irrelevant page elements.

When should I not use the defuddle skill?

Skip defuddle for .md URLs, raw files, or cases where you need exact HTML, embedded media behavior, page scripts, or layout details. It is a content-extraction tool, not a browser automation or DOM inspection tool.

Is the defuddle skill beginner-friendly?

Yes. The command surface is very small: install once, then use defuddle parse <url> --md. That makes the defuddle skill easy to adopt even if you only want cleaner source text for research or note capture.

What outputs can defuddle return?

You can get markdown with --md, JSON with --json, HTML by default, or specific metadata using -p <name>. For most reading and research tasks, markdown is the best default; metadata mode is useful for quick validation and routing.

How to Improve defuddle skill results

Give defuddle a precise page target

The easiest way to improve defuddle results is to supply the canonical content page, not a homepage, search page, or listing page. Article URLs and single-doc pages produce cleaner markdown than hubs full of navigation and repeated links.

Ask for the downstream task in the same request

The defuddle skill is stronger when extraction is paired with a concrete next step. Instead of only saying “parse this,” ask for:

summary
key claims
setup steps
FAQs
quotes
comparison points

That reduces handoff ambiguity and helps the agent structure its output around what you actually need.

Use metadata mode before full extraction when uncertain

If the URL may redirect, be low quality, or be the wrong page, start with:

defuddle parse <url> -p title
defuddle parse <url> -p description
defuddle parse <url> -p domain

This is a simple but effective tactic: validate relevance first, then spend effort on full markdown extraction.
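
The validate-then-extract order can be sketched as a small gate. Everything here is an assumption for illustration: matches_topic and extract_if_relevant are hypothetical names, and a case-insensitive substring match is a crude stand-in for whatever relevance check you actually need.

```shell
#!/bin/sh
# Hypothetical helpers for a metadata-first check; names are illustrative.

# Succeeds when the text contains the topic (case-insensitive, literal match).
matches_topic() {
  printf '%s\n' "$1" | grep -qiF -- "$2"
}

# Fetch title and description first; extract markdown only if either mentions the topic.
extract_if_relevant() {
  url=$1
  topic=$2
  title=$(defuddle parse "$url" -p title) || return 1
  description=$(defuddle parse "$url" -p description) || return 1
  if matches_topic "$title $description" "$topic"; then
    defuddle parse "$url" --md
  else
    echo "skipped $url: metadata does not mention '$topic'" >&2
    return 3
  fi
}
```

A call such as extract_if_relevant "<url>" "authentication" then costs two small metadata requests before any full extraction is attempted.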

Common failure modes and how to iterate

If output feels thin or oddly structured, the issue is often the source page, not the CLI. Try a more specific URL, switch from a category page to an article page, or save the markdown and inspect it manually. If your first result is too broad, rerun with a narrower instruction like “extract setup steps only” or “quote sections about authentication only.”
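
One way to make "thin" concrete is a word count on the saved file. This is only a sketch: is_thin and save_and_check are hypothetical helpers, and the 50-word default is an arbitrary assumption rather than anything the skill specifies.

```shell
#!/bin/sh
# Hypothetical check for "thin" output; the 50-word threshold is an arbitrary assumption.

# Succeeds when the file has fewer words than the threshold (default 50).
is_thin() {
  words=$(( $(wc -w < "$1") ))
  [ "$words" -lt "${2:-50}" ]
}

# Save markdown for manual inspection and warn if it looks thin.
save_and_check() {
  url=$1
  out=${2:-content.md}
  defuddle parse "$url" --md -o "$out" || return 1
  if is_thin "$out"; then
    echo "warning: $out looks thin; try a more specific article URL" >&2
  fi
}
```

The saved file stays on disk either way, so the warning is a prompt to open it and look, not a verdict on the page.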
