baoyu-youtube-transcript

by JimLiu

baoyu-youtube-transcript helps extract YouTube transcripts, subtitles, and cover images from a URL or video ID. It supports language selection, translation, markdown or SRT output, cached reformatting, and a fallback from the InnerTube API to yt-dlp for more reliable transcript retrieval.

Stars: 13.2k

Added: Apr 5, 2026

Category: Format Conversion

Install Command

npx skills add JimLiu/baoyu-skills --skill baoyu-youtube-transcript

Curation Score

This skill scores 84/100, which means it is a solid directory listing candidate for users who need reliable YouTube transcript extraction with less guesswork than a generic prompt. The repository shows a real, runnable workflow with explicit triggers, CLI usage, fallback behavior, and tests, so an agent can likely invoke it correctly and produce transcripts, subtitles, or cover images with reasonable confidence.

Strengths

Strong triggerability: the description names concrete user intents and input patterns such as YouTube URLs, transcript/subtitle requests, and cover-image requests.
Good operational substance: SKILL.md documents usage and the repo includes a working TypeScript/Bun CLI plus 7 supporting scripts for fetching, parsing, caching, and formatting transcripts.
Meaningful agent leverage: it uses YouTube InnerTube directly, falls back to yt-dlp when blocked, and supports language selection and translation, chapters, a speaker-processing prompt, and caching for reformatting.

Cautions

Install/runtime setup is only partially clear: SKILL.md notes Bun/npx requirements and runtime resolution, but there is no simple install command in the skill file.
Some advanced behavior still requires interpretation by the agent, especially around speaker identification and chapter processing, which are guided by a prompt rather than a tightly enforced end-to-end workflow.

Tags: Video, Audio, Translation, Markdown, CLI, Bun, TypeScript

Overview of baoyu-youtube-transcript skill

What baoyu-youtube-transcript does well

baoyu-youtube-transcript is a YouTube transcript extraction skill for people who need usable text files, not just captions on screen. It downloads transcripts, subtitles, and cover images from a YouTube URL or video ID, supports language selection and translation, and can reformat cached data into markdown or SRT without fetching again. Its biggest practical advantage is reliability: it uses YouTube’s InnerTube API first and falls back to yt-dlp when direct access is blocked.
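The primary-then-fallback strategy can be illustrated with a small, generic sketch. It is not the skill's code: `fetchViaInnerTube` and `fetchViaYtDlp` are hypothetical stand-ins that only simulate a blocked request and a successful yt-dlp run, and the real logic in scripts/youtube.ts may be structured differently.

```typescript
// Result of a transcript fetch attempt (hypothetical shape for illustration).
interface FetchResult {
  source: string;
  text: string;
}

type Fetcher = (videoId: string) => Promise<FetchResult>;

// Try each fetcher in order and return the first success. If every fetcher
// fails, throw one error that lists all the individual failure messages.
async function fetchWithFallback(videoId: string, fetchers: Fetcher[]): Promise<FetchResult> {
  const errors: string[] = [];
  for (const fetcher of fetchers) {
    try {
      return await fetcher(videoId);
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  throw new Error(`All fetchers failed: ${errors.join("; ")}`);
}

// Hypothetical stand-ins: the first simulates a blocked direct request,
// the second simulates a successful yt-dlp run.
const fetchViaInnerTube: Fetcher = async () => {
  throw new Error("InnerTube request blocked");
};
const fetchViaYtDlp: Fetcher = async (id) => ({ source: "yt-dlp", text: `transcript for ${id}` });
```

With this ordering, `fetchWithFallback("dQw4w9WgXcQ", [fetchViaInnerTube, fetchViaYtDlp])` resolves with the yt-dlp result, and the caller never has to know the first path failed unless both do.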

Best-fit users and real job-to-be-done

This skill is best for researchers, note-takers, archivists, content repurposers, and agents doing Format Conversion from video into markdown, subtitle, or transcript assets. The real job is usually: “take this video, get the transcript in the language I need, keep timestamps or chapters if useful, and save it in a file structure I can reuse later.”

Key differentiators before you install

Compared with a generic “summarize this YouTube video” prompt, baoyu-youtube-transcript gives file-based outputs, caching, language-aware track selection, and a more deterministic extraction path. The repo also includes a speaker-processing prompt in prompts/speaker-transcript.md, which matters if your end goal is a cleaner editorial transcript rather than raw caption lines.

How to Use baoyu-youtube-transcript skill

Install context and runtime requirements

To install and run baoyu-youtube-transcript, you need either bun or npx available. The skill's scripts live in skills/baoyu-youtube-transcript/scripts/, and SKILL.md explicitly resolves the runtime as bun first, then npx -y bun. If you are evaluating the skill before adopting it, read these files first:

SKILL.md
scripts/main.ts
scripts/youtube.ts
prompts/speaker-transcript.md
scripts/main.test.ts

That path tells you the actual CLI behavior, fallback logic, and post-processing workflow faster than browsing the whole repo.
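The runtime order described in SKILL.md (bun first, then npx -y bun) can be sketched as follows. `commandExists` and `resolveRuntime` are hypothetical helpers written for illustration, not functions from the skill, and the probe assumes that running `<cmd> --version` is a reasonable availability check.

```typescript
import { spawnSync } from "node:child_process";

// True when `<cmd> --version` starts and exits 0, i.e. the command is on PATH.
// Note: on Windows, npx is usually npx.cmd and may need { shell: true }.
function commandExists(cmd: string): boolean {
  const result = spawnSync(cmd, ["--version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

// Mirror the documented order: use bun directly if present, otherwise run
// bun through npx. Returns the command prefix as separate argv parts.
function resolveRuntime(exists: (cmd: string) => boolean = commandExists): string[] {
  if (exists("bun")) return ["bun"];
  if (exists("npx")) return ["npx", "-y", "bun"];
  throw new Error("Neither bun nor npx is available on PATH");
}
```

The returned prefix would go in front of the script path; the script's own arguments should come from SKILL.md rather than from this sketch. Injecting `exists` keeps the decision logic testable without touching the real PATH.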

How baoyu-youtube-transcript usage works in practice

In practice, you call the main script with a YouTube URL or an 11-character video ID. The script can:

fetch transcript tracks
prefer better subtitle formats such as json3
choose manual vs auto-generated captions
translate when available
output markdown or SRT
cache metadata and transcript payloads under an output directory
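As an illustration of why a structured format such as json3 is convenient, here is a minimal parser sketch. It assumes the commonly observed (but undocumented) json3 shape: an `events` array whose entries carry `tStartMs`, an optional `dDurationMs`, and optional `segs` holding `utf8` text. The `Segment` type and `parseJson3` name are hypothetical, and the skill's own parser may differ.

```typescript
// Assumed shape of a YouTube json3 caption payload (undocumented format).
interface Json3Event {
  tStartMs: number;
  dDurationMs?: number;
  segs?: { utf8: string }[];
}
interface Json3Payload {
  events: Json3Event[];
}

// A normalized caption segment with times in milliseconds.
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

// Flatten json3 events into segments. Events without text (for example,
// window setup or bare line breaks) are skipped.
function parseJson3(payload: Json3Payload): Segment[] {
  const segments: Segment[] = [];
  for (const ev of payload.events) {
    if (!ev.segs) continue;
    const text = ev.segs.map((s) => s.utf8).join("").replace(/\s+/g, " ").trim();
    if (text === "") continue;
    segments.push({ startMs: ev.tStartMs, endMs: ev.tStartMs + (ev.dDurationMs ?? 0), text });
  }
  return segments;
}
```

Because each event already carries millisecond timing, the same segment list can feed both markdown and SRT output without re-parsing.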

The input quality that matters most is not a long prompt; it is precise extraction intent. Good requests specify:

video URL or ID
preferred languages in order
whether generated captions are acceptable
desired output format: markdown or SRT
whether timestamps, chapters, or speakers are needed

A stronger request looks like: “Use baoyu-youtube-transcript on this YouTube URL, prefer en then zh-Hans, allow generated captions, output markdown with timestamps, and save under a reusable output directory.”
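To make "markdown with timestamps" concrete, the sketch below renders segments as one list item per caption with an `[m:ss]` prefix. The layout, the `toMarkdown` and `formatTimestamp` names, and the `Segment` shape are assumptions for illustration; the skill's actual markdown template may look different.

```typescript
// Minimal caption segment (hypothetical shape).
interface Segment {
  startMs: number;
  text: string;
}

// Format milliseconds as m:ss, or h:mm:ss once the video passes an hour.
function formatTimestamp(ms: number): string {
  const total = Math.floor(ms / 1000);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const ss = String(total % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${ss}` : `${m}:${ss}`;
}

// Render a title heading followed by one list item per caption, so each line
// stays separate when the markdown is rendered.
function toMarkdown(title: string, segments: Segment[]): string {
  const lines = [`# ${title}`, ""];
  for (const seg of segments) lines.push(`- [${formatTimestamp(seg.startMs)}] ${seg.text}`);
  return lines.join("\n") + "\n";
}
```

List items are used instead of plain lines because consecutive plain lines would collapse into a single paragraph in most markdown renderers.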

Prompting and workflow that reduce guesswork

If you are invoking this through an AI agent, turn a vague goal into an execution-ready instruction. For example:

Extraction: “Fetch the transcript for this video ID in en; if unavailable, use translated en from another track.”
Formatting: “Return markdown with timestamps for review.”
Enhancement: “Then use prompts/speaker-transcript.md to convert the raw transcript into a chaptered, speaker-labeled transcript without translating.”

This two-step workflow matters because speaker labeling is a separate processing task, not the same as raw caption download. The prompt file stresses verbatim fidelity and consistent speaker names, which is useful for interviews, podcasts, and lecture transcripts.
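The extraction fallback ("if unavailable, use translated en from another track") can be sketched as a URL decision. This rests on assumptions drawn from commonly observed InnerTube behavior, not from the skill's source: caption track entries carry a `baseUrl`, a `languageCode`, and an `isTranslatable` flag, and the timedtext endpoint accepts `fmt=json3` and a `tlang` parameter for machine translation. `transcriptUrlFor` is a hypothetical helper.

```typescript
// Assumed subset of an InnerTube captionTracks entry.
interface CaptionTrack {
  baseUrl: string;
  languageCode: string;
  kind?: string; // "asr" usually marks auto-generated captions
  isTranslatable?: boolean;
}

// Build a json3 URL for the target language: use a track already in that
// language if one exists, otherwise request a machine translation of the
// first translatable track. Returns null when neither option exists.
function transcriptUrlFor(tracks: CaptionTrack[], lang: string): string | null {
  const native = tracks.find((t) => t.languageCode === lang);
  const source = native ?? tracks.find((t) => t.isTranslatable);
  if (!source) return null;
  const url = new URL(source.baseUrl);
  url.searchParams.set("fmt", "json3");
  if (!native) url.searchParams.set("tlang", lang);
  return url.toString();
}
```

A translated track is machine output, so it is worth recording in the saved file that the transcript came from translation rather than from a native track.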

Output structure, caching, and practical tips

The baoyu-youtube-transcript skill stores metadata and transcript cache so repeated reformatting is faster. That is valuable when you want both raw and polished outputs from the same video. Practical tips:

Use a stable outputDir if you revisit videos often.
Keep raw transcript output before applying speaker cleanup.
Use SRT when timing precision matters; use markdown when readability matters.
If chapter extraction matters, check whether the video description contains timestamp chapters, because the scripts derive chapters from the description's timestamps plus the video duration.
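Deriving chapters from "description plus duration" can work roughly like this: find description lines that start with a timestamp, then close each chapter at the next one's start and the last one at the video duration. The regex, the `Chapter` shape, and the function names are assumptions for illustration, not the skill's implementation.

```typescript
interface Chapter {
  title: string;
  startSec: number;
  endSec: number;
}

// Convert "m:ss" or "h:mm:ss" into seconds.
function toSeconds(stamp: string): number {
  return stamp.split(":").reduce((acc, part) => acc * 60 + Number(part), 0);
}

// Collect lines such as "0:00 Intro" or "1:30 - Main topic", then set each
// chapter's end to the next chapter's start (or the video duration).
function parseChapters(description: string, durationSec: number): Chapter[] {
  const starts: { title: string; startSec: number }[] = [];
  for (const line of description.split("\n")) {
    const m = line.trim().match(/^((?:\d{1,2}:)?\d{1,2}:\d{2})\s*[-:|]?\s*(.+)$/);
    if (m) starts.push({ startSec: toSeconds(m[1]), title: m[2].trim() });
  }
  return starts.map((c, i) => ({
    ...c,
    endSec: i + 1 < starts.length ? starts[i + 1].startSec : durationSec,
  }));
}
```

This is also why the duration matters: without it, the last chapter has no end time.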

baoyu-youtube-transcript skill FAQ

Is baoyu-youtube-transcript better than a normal prompt?

Yes, when you need reproducible extraction instead of best-effort reasoning. A normal prompt cannot reliably download subtitle tracks, inspect available languages, cache raw assets, or fall back to yt-dlp. baoyu-youtube-transcript is stronger when your task is acquisition and conversion, not just summarization.

When is this skill a poor fit?

It is a poor fit if there is no accessible transcript track and you expect full speech-to-text transcription from audio alone. This repo is built around YouTube transcript/subtitle retrieval, not a standalone ASR pipeline. It is also overkill if you only want a quick human summary and do not need saved files.

Is baoyu-youtube-transcript beginner-friendly?

Moderately. The skill is script-driven rather than click-driven, so basic comfort with bun, npx, paths, and output folders helps. The good news is that the repo is implementation-heavy, so its behavior is easy to verify: scripts/main.test.ts shows the selection logic, and SKILL.md gives the command patterns you need to start safely.

How to Improve baoyu-youtube-transcript skill

Give better inputs for better outputs

The fastest way to improve baoyu-youtube-transcript results is to be explicit about transcript selection. Mention language priority, whether manual captions should be preferred, and whether auto-generated captions are acceptable. If you skip this, you may get a usable but lower-quality track or an unexpected translated variant.
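The three selection choices above (language priority, manual preference, generated captions allowed or not) interact, and a short sketch shows one reasonable way they combine. It assumes auto-generated tracks are marked `kind: "asr"`, as commonly seen in InnerTube responses; `selectTrack` is hypothetical, and the skill's real rules are in scripts/main.test.ts and may differ.

```typescript
interface CaptionTrack {
  languageCode: string;
  kind?: string; // "asr" = auto-generated (assumed convention)
}

// Walk the language priority list. Within each language, prefer a manual
// track; accept an auto-generated one only when allowGenerated is true.
function selectTrack(
  tracks: CaptionTrack[],
  languages: string[],
  allowGenerated: boolean,
): CaptionTrack | undefined {
  for (const lang of languages) {
    const candidates = tracks.filter((t) => t.languageCode === lang);
    const manual = candidates.find((t) => t.kind !== "asr");
    if (manual) return manual;
    const generated = candidates.find((t) => t.kind === "asr");
    if (generated && allowGenerated) return generated;
  }
  return undefined;
}
```

In this sketch language order wins over caption quality: a generated English track beats a manual Chinese one when English comes first. If you want the opposite, say so explicitly in the request, because that is exactly the kind of ambiguity that produces an unexpected track.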

Handle common failure modes early

Common issues are invalid video identifiers, blocked direct fetches, missing target-language captions, and confusion between "translate subtitles" and "summarize transcript." If extraction fails, keep in mind how scripts/youtube.ts behaves: the skill already has a fallback path, so your next move is usually to change the language constraints or allow generated captions, not to rewrite the whole prompt.
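Invalid identifiers are the easiest failure to rule out before running anything. The sketch below accepts a bare 11-character ID or a few common URL forms (watch, youtu.be, shorts, embed, live). The accepted forms and the `extractVideoId` name are assumptions; the skill's own parser may accept more or fewer variants.

```typescript
// YouTube video IDs are 11 characters from [A-Za-z0-9_-].
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// Return the video ID from a bare ID or a common YouTube URL, or null
// when no valid ID can be extracted.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (ID_PATTERN.test(trimmed)) return trimmed;
  let url: URL;
  try {
    url = new URL(trimmed);
  } catch {
    return null; // neither a bare ID nor a parseable URL
  }
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.slice(1).split("/")[0];
  } else if (host === "youtube.com" || host === "music.youtube.com") {
    candidate = url.searchParams.get("v");
    if (!candidate) {
      const m = url.pathname.match(/^\/(?:shorts|embed|live)\/([^/?#]+)/);
      candidate = m ? m[1] : null;
    }
  }
  return candidate && ID_PATTERN.test(candidate) ? candidate : null;
}
```

If this returns null for an input you expected to work, the problem is the identifier itself, not language constraints or blocking.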

Iterate after the first transcript

For Format Conversion work with baoyu-youtube-transcript, the best workflow is iterative:

fetch raw transcript
verify language and completeness
re-run in a different format if needed
apply speaker/chapter post-processing
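The "verify completeness" step can be partly automated with a rough coverage check: does the transcript run to the end of the video, and are there long gaps between captions? This heuristic and the `checkCoverage` name are assumptions for illustration, not part of the skill; what counts as "too short" or "too long a gap" is your call.

```typescript
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

interface CoverageReport {
  coveredRatio: number; // last caption end divided by video duration, capped at 1
  longestGapSec: number; // largest stretch with no caption, including before the first one
}

// Rough completeness check for a fetched transcript.
function checkCoverage(segments: Segment[], durationSec: number): CoverageReport {
  if (segments.length === 0 || durationSec <= 0) {
    return { coveredRatio: 0, longestGapSec: Math.max(durationSec, 0) };
  }
  const sorted = [...segments].sort((a, b) => a.startMs - b.startMs);
  let longestGapMs = sorted[0].startMs; // silence before the first caption
  let lastEndMs = sorted[0].endMs;
  for (const seg of sorted.slice(1)) {
    longestGapMs = Math.max(longestGapMs, seg.startMs - lastEndMs);
    lastEndMs = Math.max(lastEndMs, seg.endMs);
  }
  return {
    coveredRatio: Math.min(lastEndMs / 1000 / durationSec, 1),
    longestGapSec: longestGapMs / 1000,
  };
}
```

A low ratio or a long gap does not prove the track is broken (music intros and silent demos exist), but it tells you which transcripts deserve a manual look before post-processing.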

If the first markdown looks messy, do not discard the skill. Instead, keep the cached raw files and rerun formatting or apply prompts/speaker-transcript.md for a cleaner final document. That is where this skill becomes more valuable than a one-off download script.
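Re-running formatting from cached data means no network call is needed. The sketch below assumes, hypothetically, that the cache holds a JSON array of `{ startMs, endMs, text }` segments; the skill's real cache files and schema are not described here and may differ. It renders standard SRT cues with `HH:MM:SS,mmm` timestamps.

```typescript
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

// SRT timestamps are HH:MM:SS,mmm with a comma before the milliseconds.
function srtTime(ms: number): string {
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Render segments as numbered SRT cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.startMs)} --> ${srtTime(seg.endMs)}\n${seg.text}\n`)
    .join("\n");
}

// Re-render from a cached JSON string instead of fetching the video again.
function srtFromCache(cachedJson: string): string {
  return toSrt(JSON.parse(cachedJson) as Segment[]);
}
```

Keeping the raw segments as the cached source of truth is what makes this cheap: markdown, SRT, and speaker-cleaned versions can all be regenerated from the same data.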

More skills

kreuzberg

by kreuzberg-dev

The kreuzberg skill helps you install and use Kreuzberg for document extraction across 91+ formats, including PDFs, Office files, images, HTML, email, and archives. It covers Python, Node.js/TypeScript, Rust, and CLI workflows for OCR, tables, metadata, batch processing, and practical parsing guidance.

PDF Processing

GitHub 0

xlsx

by anthropics

The xlsx skill helps agents read, edit, repair, create, and convert .xlsx, .xlsm, .csv, and .tsv files when the required deliverable is a spreadsheet. It is strongest for template-preserving updates, formula-safe workbook edits, messy tabular cleanup, and practical spreadsheet workflows backed by repo scripts for packing, validation, and recalculation.

Spreadsheet Workflows

GitHub 105.1k

pdf

by anthropics

The pdf skill guides PDF Processing tasks like text extraction, merge and split operations, rendering pages to images, and PDF form workflows. It is especially useful for checking fillable fields, extracting form metadata, and validating non-fillable form layouts with scripts.

PDF Processing

GitHub 105.1k

baoyu-url-to-markdown

by JimLiu

baoyu-url-to-markdown converts live URLs to Markdown with a vendored baoyu-fetch CLI using Chrome CDP, site adapters, and generic fallback. Review Bun runtime needs, first-time EXTEND.md setup, and usage for X, YouTube, Hacker News, and rendered pages.

Format Conversion

GitHub 13.2k

pymatgen

by K-Dense-AI

pymatgen is a Python materials science toolkit for crystal structures, phase diagrams, electronic structure, and file conversion. This pymatgen skill helps with scientific workflows using CIF, POSCAR, VASP, and Materials Project data.

Scientific

GitHub 0

minimax-xlsx

by MiniMax-AI

The minimax-xlsx skill helps create, read, edit, validate, and format Excel workbooks with an Excel-first workflow. Use it for Spreadsheet Workflows when you need structured files that preserve formulas, styles, sheet layout, and workbook behavior. It supports .xlsx, .xlsm, .csv, and .tsv tasks, including analysis, new workbook creation, minimally invasive edits, formula repair, and validation. It is designed for real workbook handoff, not flat tables.

Spreadsheet Workflows

GitHub 0

baoyu-format-markdown

by JimLiu

baoyu-format-markdown formats plain text or messy Markdown into cleaner, publishable Markdown while preserving meaning. It repairs frontmatter, headings, lists, code blocks, quotes, and CJK spacing, making it useful for Format Conversion without rewriting content.

Format Conversion

GitHub 13.2k

baoyu-danger-x-to-markdown

by JimLiu

baoyu-danger-x-to-markdown converts X posts, threads, and some articles into Markdown with YAML front matter. It uses scripts in `scripts/` with `bun` or `npx -y bun`, supports cookie-based access and consent flow, and fits repeatable Format Conversion workflows better than a generic prompt.

Format Conversion

GitHub 13.2k

baoyu-markdown-to-html

by JimLiu

baoyu-markdown-to-html converts Markdown into styled HTML for WeChat-style publishing. It supports themes, code highlighting, math, PlantUML, footnotes, image handling, and optional link citations, with runtime execution through bun or npx -y bun.

Format Conversion

GitHub 13.2k

nutrient-document-processing

by affaan-m

nutrient-document-processing is a skill for PDF processing and document automation with the Nutrient DWS API. It can convert, OCR, extract, redact, sign, watermark, and fill files such as PDFs, DOCX, XLSX, PPTX, HTML, and images.

PDF Processing

GitHub 156.2k

speech-to-text

by NoizAI

The speech-to-text skill transcribes supported audio files into plain text, with options for timestamps, speaker labels, and JSON output. It is designed for practical speech-to-text usage in repeatable workflows, including interviews, meetings, podcasts, lectures, and automation tasks where consistent transcription matters.

Workflow Automation

GitHub 498

transcribe-video

by rameerez

The transcribe-video skill turns video or audio files into .srt, .vtt, and .txt outputs with AWS Transcribe. Use it when you need captions, a searchable transcript, or a clean text version of spoken content. It also fits Format Conversion workflows.

Format Conversion

GitHub 23

markitdown

by K-Dense-AI

markitdown converts files and office documents to Markdown for easier reading, chunking, search, and LLM workflows. This skill supports PDF, DOCX, PPTX, XLSX, HTML, CSV, JSON, XML, ZIP, EPUB, images with OCR, and audio transcription, making it a practical choice for format conversion.

Format Conversion

GitHub 0

pdf

by openai

Use the pdf skill for PDF Processing tasks where layout, pagination, and rendered output matter. It helps you read, create, edit, and review PDFs with a visual-first workflow: render pages, inspect the result, then adjust. Use it when document accuracy depends on how the PDF actually renders.

PDF Processing

GitHub 0

web-to-markdown

by softaworks

web-to-markdown is a Format Conversion skill that turns live web pages into clean Markdown through the local web2md CLI, using a Chromium-family browser for JS-rendered pages, interactive flows, and batch URL conversion. It only runs when explicitly invoked by name.

Format Conversion

GitHub 1.3k

defuddle

by kepano

defuddle extracts clean markdown from web pages with the Defuddle CLI, removing clutter for research, docs, and articles. Use it for standard HTML pages, install with npm, and skip URLs ending in .md.

Web Research

GitHub 19.7k