web-to-markdown

by softaworks

web-to-markdown is a Format Conversion skill that turns live web pages into clean Markdown through the local web2md CLI, using a Chromium-family browser for JS-rendered pages, interactive flows, and batch URL conversion. It only runs when explicitly invoked by name.

Stars: 1.3k

Added: Apr 1, 2026

Category: Format Conversion

Install Command

npx skills add softaworks/agent-toolkit --skill web-to-markdown

Curation Score

This skill scores 77/100, which means it is a solid directory listing candidate for users who specifically want webpage-to-Markdown conversion via a local browser-driven CLI. It is clear enough for an agent to follow with less guesswork than a generic prompt, but install-decision clarity is held back by missing setup specifics in the skill itself and its dependence on an external local tool/browser environment.

Strengths

Strong operational framing: the skill clearly states what it does, what it will not do, and which inputs to collect before running.
Real agent leverage over a generic prompt: it targets JS-rendered pages through a local browser stack and documents practical flags like `--print`, `--out`, `--chrome-path`, and `--interactive`.
Repository evidence is substantive rather than placeholder content, with both SKILL.md and README explaining purpose, workflow, and usage constraints.

Cautions

Adoption is less turnkey because SKILL.md has no install command and the skill depends on a locally available `web2md` CLI plus a Chromium-family browser.
The hard trigger gate requires the user to explicitly name `web-to-markdown`, which improves safety but makes the skill less naturally triggerable from ordinary web-extraction requests.

Tags: CLI, Scraping, Chrome, Websites, Markdown

Overview

Overview of web-to-markdown skill

web-to-markdown is a narrowly scoped Format Conversion skill for turning live web pages into clean Markdown through a locally installed web2md CLI. Its value is not “summarize a page” but “render the actual page in a real browser, extract the main article or document body, and convert that result into portable Markdown.” That makes it a strong fit for users dealing with JavaScript-rendered pages, documentation sites, blog posts, gated flows that need interactive rendering, or archiving tasks where simple HTTP fetching is not enough.

Who web-to-markdown is best for

This web-to-markdown skill is best for users who need to:

convert one or more URLs into readable Markdown
handle pages that depend on client-side JavaScript
save content to files for later analysis or reuse
extract article-like content instead of scraping every page element

If your real goal is “get the main content from a page I can already access in a browser,” this skill is a better fit than a generic prompt.

What makes web-to-markdown different

The important differentiator is the pipeline:

Puppeteer via a local Chromium-family browser
Readability for main-content extraction
Turndown for Markdown conversion

That combination is designed for rendered content, not raw HTML. In practice, that means the web-to-markdown skill can work on pages where ordinary fetch-based tools fail or return incomplete content.

The hard trigger gate matters

This skill has an unusual but important constraint: it must only be used when the user explicitly requests it by name, with wording like “use the skill web-to-markdown.” If that explicit trigger is missing, the skill should not be applied. For directory users, this means adoption is simple, but invocation discipline matters.

Real job-to-be-done

Most users are not looking for “a browser automation skill.” They want one of these outcomes:

“Turn this article into Markdown I can keep.”
“Convert this docs page, even though it renders client-side.”
“Process a batch of URLs into .md files.”
“Open the page in a real browser so I can get past login or verification, then save the content.”

That is the real use case web-to-markdown is optimized for.

When not to choose this skill

Skip web-to-markdown if:

you only need a quick summary, not Markdown output
a plain HTTP fetch already gives you the content cleanly
you need a full crawler or site scraper
you want Playwright-based automation; this skill explicitly uses web2md, not other browser stacks

How to Use web-to-markdown skill

Install context before first use

Treat web-to-markdown as two dependencies:

the skill itself in your agent environment
a working local web2md CLI plus an available Chromium-family browser

A practical skill install path is:

npx skills add softaworks/agent-toolkit --skill web-to-markdown

The repository is at:
https://github.com/softaworks/agent-toolkit/tree/main/skills/web-to-markdown

Just adding the skill is not enough if your machine cannot run web2md or launch Chrome/Chromium/Brave/Edge. That local browser requirement is the main adoption blocker to check early.

Read these files first

This skill is small, so the best reading order is:

skills/web-to-markdown/SKILL.md
skills/web-to-markdown/README.md

SKILL.md gives you the trigger rule, required inputs, and workflow shape. README.md is where you confirm intended use cases such as JS-rendered pages, interactive mode, and batch conversion.

What input web-to-markdown needs

For reliable web-to-markdown usage, provide:

a URL or list of URLs
output mode:
- print to stdout with --print
- write to a file with --out ./file.md
- write to a directory with --out ./some-dir/
optional browser controls when needed:
- --chrome-path <path> if browser detection fails
- --interactive for login walls, consent screens, or human verification

If you do not specify output behavior, the agent has to guess. That friction is unnecessary, and output mode is usually the easiest input to make explicit.
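The inputs above map onto a command line fairly directly. The sketch below is a hypothetical helper (`build_web2md_args` is not part of the skill or of web2md) that assembles an argument list from those inputs. It assumes the URL is passed as a positional argument, which this listing does not confirm; only the `--print`, `--out`, `--chrome-path`, and `--interactive` flags come from the listing itself.

```python
from typing import List, Optional


def build_web2md_args(
    url: str,
    out: Optional[str] = None,
    chrome_path: Optional[str] = None,
    interactive: bool = False,
) -> List[str]:
    """Assemble a web2md argument list (hypothetical helper).

    Assumption: the URL is a positional argument. Check the web2md
    help output before relying on this layout.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got: {url!r}")

    args = ["web2md", url]
    # Make the output mode explicit: a path means --out, no path means --print.
    if out is None:
        args.append("--print")
    else:
        args.extend(["--out", out])
    if chrome_path:
        args.extend(["--chrome-path", chrome_path])
    if interactive:
        args.append("--interactive")
    return args


if __name__ == "__main__":
    print(build_web2md_args("https://example.com/article", out="./notes/article.md"))
```

Returning a list rather than a single string keeps the result safe to hand to `subprocess.run` without shell quoting concerns.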

The exact invocation requirement

This web-to-markdown skill should only be triggered when the user explicitly writes something like:

use the skill web-to-markdown ...
use a skill web-to-markdown ...

If you are testing the skill, say the name directly. This is not optional repository etiquette; it is core execution logic.
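A minimal sketch of that gate, assuming the check is a case-insensitive match on the two phrasings listed above. `is_explicitly_invoked` is a hypothetical name, and the skill itself may enforce the rule differently, for example purely through its instructions to the agent.

```python
import re

# Matches "use the skill web-to-markdown" or "use a skill web-to-markdown".
_TRIGGER = re.compile(
    r"\buse\s+(?:the|a)\s+skill\s+web-to-markdown\b", re.IGNORECASE
)


def is_explicitly_invoked(message: str) -> bool:
    """Return True only when the user names the skill in the expected phrasing."""
    return _TRIGGER.search(message) is not None


if __name__ == "__main__":
    print(is_explicitly_invoked("use the skill web-to-markdown on https://example.com"))
    print(is_explicitly_invoked("convert this page"))
```

An ordinary request such as “convert this page” does not pass the gate, which is the behavior the trigger rule describes.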

Turn a rough request into a strong prompt

Weak request:

convert this page

Strong request:

use the skill web-to-markdown to convert https://example.com/article to Markdown and save it to ./notes/article.md

Even better:

use the skill web-to-markdown to convert these 5 docs URLs to Markdown, save them in ./docs-md/, and use interactive mode if a consent screen appears

Good prompts reduce failure by telling the skill:

what page(s) to process
where output should go
whether browser interaction may be needed
whether this is a one-off or a batch job

Practical command patterns to ask for

Useful web-to-markdown usage patterns include:

single page to terminal: --print
single page to file: --out ./page.md
many pages to a folder: --out ./pages/
difficult page with visible browser: --interactive
explicit browser binary path: --chrome-path <path>

The repository guidance is built around these patterns, so asking for one of them works better than an open-ended request like “scrape this site,” which is broader than the skill’s design.

Best workflow for one page

A high-success workflow looks like this:

confirm the user explicitly invoked web-to-markdown
collect the URL
decide whether output should print or save
use --interactive only for pages that need human help
review the Markdown result for missing sections or navigation noise
rerun with better browser settings if extraction was incomplete

This is faster than trying to overdesign the prompt up front.

Best workflow for multiple URLs

For batch work:

give the skill a list of URLs
choose a directory output target
expect filenames to be derived from page titles when saving to a folder
spot-check a few outputs before running a large batch

The main reason to batch is consistency. The main risk is assuming every page template on a site extracts equally well.
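One way to act on the spot-check advice is to split the URL list into a small trial run and the remaining batch. The helper below is a hypothetical sketch (`plan_batch` is not part of the skill); it simply takes the first few unique URLs as the sample, so reorder the list yourself if you want the sample to cover different page templates.

```python
from typing import List, Tuple


def plan_batch(urls: List[str], sample_size: int = 3) -> Tuple[List[str], List[str]]:
    """Split URLs into a spot-check sample and the remaining batch.

    Blank entries and duplicates are dropped, keeping first-seen order.
    """
    cleaned = (u.strip() for u in urls)
    unique = list(dict.fromkeys(u for u in cleaned if u))
    return unique[:sample_size], unique[sample_size:]


if __name__ == "__main__":
    sample, rest = plan_batch(
        [
            "https://example.com/docs/intro",
            "https://example.com/docs/install",
            "https://example.com/docs/intro",  # duplicate, dropped
            "https://example.com/blog/post",
        ],
        sample_size=2,
    )
    print("spot-check first:", sample)
    print("then convert:", rest)
```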

Common local setup blockers

Most failed web-to-markdown installs are not prompt problems. They are local environment issues:

web2md is not installed or not on PATH
no supported browser is available locally
browser auto-detection fails, requiring --chrome-path
the page needs a visible browser and human interaction

If you want a quick adoption test, try one public article page and one JS-heavy page before using the skill in production workflows.
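The first three blockers can be checked before any conversion is attempted. The sketch below looks for the CLI and a browser on PATH using only the standard library. The browser executable names are assumptions that vary by operating system and install method, and `preflight` is a hypothetical helper rather than anything shipped with the skill.

```python
import shutil
from typing import Dict, Optional, Sequence

# Assumed executable names; real names vary by OS and install method.
BROWSER_CANDIDATES = (
    "google-chrome",
    "chromium",
    "chromium-browser",
    "brave-browser",
    "microsoft-edge",
)


def preflight(
    cli_name: str = "web2md",
    browsers: Sequence[str] = BROWSER_CANDIDATES,
) -> Dict[str, Optional[str]]:
    """Report where the CLI and the first detected browser live on PATH."""
    browser_path = next(
        (path for path in map(shutil.which, browsers) if path), None
    )
    return {"cli": shutil.which(cli_name), "browser": browser_path}


if __name__ == "__main__":
    report = preflight()
    if report["cli"] is None:
        print("web2md was not found on PATH")
    if report["browser"] is None:
        print("no browser found on PATH; consider passing --chrome-path <path>")
```

A missing browser here is not conclusive: on macOS and Windows the browser is often installed outside PATH, which is exactly the case where `--chrome-path <path>` is needed.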

Output quality expectations

web-to-markdown aims for clean main-content Markdown, not a pixel-perfect copy of the original page. That means:

article and documentation body content should come through well
headers, footers, ads, and page chrome are usually de-emphasized
unusual widgets, app shells, and embedded tools may not convert neatly

That tradeoff is usually desirable for archiving and analysis, but it is worth knowing before you install.

web-to-markdown skill FAQ

Is web-to-markdown better than an ordinary prompt?

Yes, when the real need is rendered-page conversion. A generic prompt can discuss a URL, but it does not inherently open a browser, wait for JavaScript, extract the readable body, and produce Markdown. This web-to-markdown skill is useful because it operationalizes that workflow.

Is web-to-markdown good for beginners?

Yes, if your task is simple: one URL, one output file, straightforward page. The main beginner challenge is local setup, not the skill design. If you can run a local browser automation CLI, the skill is approachable.

Does web-to-markdown handle JavaScript-heavy pages?

That is one of its main reasons to exist. It uses a real local browser through Puppeteer, so it is more suitable for JS-rendered pages than raw-fetch approaches.

Can web-to-markdown get past login or verification screens?

Sometimes, with --interactive. The repository explicitly supports a mode where Chrome is shown and paused so the user can complete human steps. This is a practical advantage for protected or semi-protected pages.

When should I not use the web-to-markdown skill?

Do not use it when:

the user did not explicitly request web-to-markdown
a simple page fetch would already solve the task
you need structured scraping across many page components
you want a non-browser conversion path

The skill is specialized, and that specialization is a strength, not a weakness.

Does it work with any browser?

The documented fit is Chromium-family browsers such as Chrome, Chromium, Brave, or Edge via puppeteer-core. If auto-detection fails, expect to supply a path manually.

Is this only for articles?

No. Articles are the easiest fit, but the web-to-markdown skill can also help with docs pages and other content-heavy pages where “main body extraction” is the right output model. It is less ideal for dashboards or highly interactive apps.

How to Improve web-to-markdown skill

Give web-to-markdown explicit output instructions

A better request is not just “convert this URL,” but:

print it
save it to ./tmp/page.md
save all results under ./exports/

This removes guesswork and makes the first run more likely to match your workflow.

Use interactive mode only when the page needs it

--interactive is valuable for consent gates, login flows, and verification prompts, but it is slower and less automatable. For routine public pages, avoid it. For blocked pages, use it early instead of retrying blind.

Test browser detection early

If the first run fails to launch a browser, do not keep changing the prompt. Fix the execution context:

confirm a Chromium-family browser exists
provide --chrome-path <path> when needed

For many users, this is the single most important web-to-markdown install tip.

Choose representative pages before a big rollout

Before converting hundreds of URLs, test:

one simple article page
one JS-rendered page
one page behind consent or login friction

This tells you whether the skill is a fit for your actual site mix, not just for ideal cases.

Strengthen prompts with page-specific constraints

If you know a page is tricky, say so:

use the skill web-to-markdown on this docs page; it renders client-side, save to ./docs/intro.md
use the skill web-to-markdown on this member page with interactive mode because I need to pass a verification screen first

That extra context changes execution quality more than adding generic wording.

Validate the first Markdown result, then iterate

After the first output, check:

was the main content captured?
did the output include too much nav or boilerplate?
was the page only partially rendered?
did the filename or folder behavior match expectations?

Then rerun with better controls. web-to-markdown usually improves through one targeted retry, not through long speculative prompting.
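Part of the checklist above can be roughed out as an automated first pass. The sketch below flags output that is suspiciously short, has no headings, or consists mostly of link-only lines. The thresholds are arbitrary starting points chosen for illustration, not values taken from web2md, and `review_markdown` is a hypothetical helper.

```python
import re
from typing import List

# A line that is only a Markdown link, optionally bulleted.
_LINK_ONLY = re.compile(r"(?:[-*+]\s+)?\[[^\]]*\]\([^)]*\)")


def review_markdown(
    markdown: str, min_words: int = 200, max_link_ratio: float = 0.5
) -> List[str]:
    """Return warnings for a converted page; an empty list means nothing looked off."""
    warnings = []
    lines = [line.strip() for line in markdown.splitlines() if line.strip()]
    words = re.findall(r"\w+", markdown)

    if len(words) < min_words:
        warnings.append("short output: page may be partially rendered")
    if not any(line.startswith("#") for line in lines):
        warnings.append("no headings: main content may not have been captured")

    # Link-only lines usually come from navigation menus or footers.
    link_only = [line for line in lines if _LINK_ONLY.fullmatch(line)]
    if lines and len(link_only) / len(lines) > max_link_ratio:
        warnings.append("mostly link-only lines: likely nav or boilerplate")
    return warnings


if __name__ == "__main__":
    menu = "\n".join(f"- [Item {i}](/p/{i})" for i in range(10))
    for warning in review_markdown(menu):
        print(warning)
```

A check like this only narrows down which files deserve a manual read; it does not replace looking at the output.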

Know the main failure modes

Common failure modes are:

no explicit trigger phrase, so the skill should not run
local browser launch issues
pages that need visible interaction
pages whose “main content” is ambiguous to Readability
users expecting full-site scraping instead of page conversion

Recognizing these early helps you decide whether to keep using web-to-markdown or switch tools.

Use web-to-markdown for the right output standard

You will get the best results when your success criterion is:

clean, readable Markdown
main content over page chrome
portable output for notes, archives, analysis, or downstream AI processing

If your success criterion is “preserve every layout detail,” this skill is the wrong tool. Matching your expectation to its design is the fastest way to improve results.


