transcribe-video

by rameerez

The transcribe-video skill turns video or audio files into .srt, .vtt, and .txt outputs with AWS Transcribe. Use it when you need captions, a searchable transcript, or a clean text version of spoken content. It also fits Format Conversion workflows.

Stars: 23

Favorites: 0

Comments: 0

Added: May 9, 2026

Category: Format Conversion

Install Command

npx skills add rameerez/claude-code-startup-skills --skill transcribe-video

Curation Score

This skill scores 78/100, which means it is a solid listing candidate for directory users: it has a clear, real workflow for turning video or audio into SRT/VTT and plain text using AWS Transcribe, and the install decision is reasonably straightforward. Users should still expect some setup overhead because it depends on ffmpeg, AWS CLI, and configured AWS permissions.

Strengths

Explicit trigger and output contract: transcribes a video or audio file path with optional language code and produces .srt, .vtt, and .txt files.
Operational workflow is concrete: prerequisites, audio extraction, temporary S3 upload, AWS Transcribe job, result download, and cleanup are all described.
Good agent leverage from repo content: valid frontmatter, substantial body text, code fences, and file references reduce guesswork versus a generic prompt.

Cautions

Requires external setup and credentials: ffmpeg, AWS CLI, and permissions for s3:* and transcribe:* are mandatory.
No install command or supporting scripts/resources are provided, so users must follow the documented steps manually.

Tags: AWS, ffmpeg, transcription, video, audio

Overview of transcribe-video skill

What transcribe-video does

The transcribe-video skill turns a video or audio file into .srt, .vtt, and .txt outputs using AWS Transcribe. It is most useful when you need captions, a searchable transcript, or a clean text version of spoken content without manually transcribing it. The transcribe-video skill is a good fit if your workflow already includes AWS and you want a repeatable, file-based transcription process.

Who should use it

Use this skill if you work with recorded meetings, interviews, webinars, demos, or course videos and need transcripts fast enough to keep up with production. It is especially useful for people who care about subtitle formats, not just a text dump. If your goal is format conversion, this skill converts raw media into caption and transcript artifacts that are easier to reuse downstream.

Main tradeoffs to know

The biggest advantage is that the workflow is concrete: extract audio, upload it, run a transcribe job, and clean up resources. That makes transcribe-video easier to operationalize than a vague “please transcribe this” prompt. The main limitation is dependency overhead: you need ffmpeg, the AWS CLI, and valid AWS permissions. If those are not already available, the install and setup cost may outweigh the benefit for one-off use.

How to Use transcribe-video skill

Install and readiness check

To install transcribe-video, add the skill with npx skills add rameerez/claude-code-startup-skills --skill transcribe-video. Before running it, confirm that ffmpeg and aws are installed and that the AWS CLI has valid credentials (for example, set up with aws configure). The skill also needs permission to create and delete S3 buckets and to start and delete Transcribe jobs, so it can fail in locked-down AWS accounts even when the command looks correct.
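
The tool check described above can be sketched in a few lines of Python. This is an illustrative helper, not part of the skill: the function name missing_tools is hypothetical, and it only confirms that the two executables are on PATH. It does not verify AWS credentials or IAM permissions.

```python
import shutil

# Tools the listing names as prerequisites.
REQUIRED_TOOLS = ("ffmpeg", "aws")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("ffmpeg and aws found on PATH; credentials and permissions still need checking.")
```

An empty result means both executables were found; it says nothing about whether the account can create buckets or start Transcribe jobs.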

Give the skill a usable input

The skill expects a media file path and optionally a language code such as en-US or es-ES. A weak request is “transcribe this video”; a stronger request is transcribe-video /path/to/demo.mp4 en-US or “Transcribe /work/interview.mp4 to SRT, VTT, and TXT in English, then clean up temp AWS resources.” If language is known, include it. If the file is noisy, long, or multi-speaker, say so up front because those conditions affect accuracy more than the command syntax does.
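
As a rough illustration of that input contract, the sketch below validates a media path and an optional language code before anything is sent to AWS. The function name parse_request and the regular expression are assumptions made for this example; the pattern only checks the general language-REGION shape (as in en-US or es-ES) and does not confirm that AWS Transcribe supports the code.

```python
import re
from pathlib import Path

# Loose shape check for codes such as en-US or es-ES (an assumption, not an official list).
LANGUAGE_CODE = re.compile(r"^[a-z]{2,3}-[A-Z]{2}$")


def parse_request(media_path, language_code=None):
    """Validate the inputs and return (Path, language_code or None)."""
    path = Path(media_path)
    if not path.suffix:
        raise ValueError(f"expected a media file path with an extension: {media_path}")
    if language_code is not None and not LANGUAGE_CODE.match(language_code):
        raise ValueError(f"unexpected language code format: {language_code}")
    return path, language_code
```

For example, parse_request("/path/to/demo.mp4", "en-US") passes, while a free-form value such as "english" is rejected early instead of failing later inside the transcription job.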

Recommended workflow

Start by reading SKILL.md, then inspect the repository file paths it references, especially README.md, AGENTS.md, metadata.json, and any rules/, resources/, or references/ folders if they exist in your local setup. In this repo, the source is intentionally compact, so the real value is understanding the process: audio extraction, temporary S3 upload, Transcribe job execution, output download, and cleanup. That sequence matters because failures usually happen at permissions, file naming, or cleanup rather than transcription itself.
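
To make that sequence concrete, here is a dry-run sketch that builds the commands as argument lists without executing them. It is not the skill's actual implementation: the function name plan_commands, the .mp3 intermediate file, and the bucket and job names are all hypothetical, and the sketch assumes the input is a video rather than a file that is already an .mp3. The ffmpeg and AWS CLI options shown are standard ones, but the exact flags the skill uses may differ.

```python
from pathlib import PurePosixPath


def plan_commands(media_path, language_code, bucket, job_name):
    """Build the command sequence as argv lists without running anything."""
    media = PurePosixPath(media_path)
    audio = media.with_suffix(".mp3")  # hypothetical intermediate audio file
    s3_uri = f"s3://{bucket}/{audio.name}"
    return [
        # 1. Extract the audio track (-vn drops the video stream).
        ["ffmpeg", "-i", str(media), "-vn", str(audio)],
        # 2. Create a temporary bucket and upload the audio.
        ["aws", "s3", "mb", f"s3://{bucket}"],
        ["aws", "s3", "cp", str(audio), s3_uri],
        # 3. Start the job and request both subtitle formats.
        ["aws", "transcribe", "start-transcription-job",
         "--transcription-job-name", job_name,
         "--language-code", language_code,
         "--media", f"MediaFileUri={s3_uri}",
         "--output-bucket-name", bucket,
         "--subtitles", "Formats=srt,vtt"],
        # 4. Poll this until the job status is COMPLETED or FAILED.
        ["aws", "transcribe", "get-transcription-job",
         "--transcription-job-name", job_name],
        # 5. Download the result files written under the job name.
        ["aws", "s3", "cp", f"s3://{bucket}/", str(media.parent),
         "--recursive", "--exclude", "*", "--include", f"{job_name}*"],
        # 6. Clean up the job and the temporary bucket.
        ["aws", "transcribe", "delete-transcription-job",
         "--transcription-job-name", job_name],
        ["aws", "s3", "rb", f"s3://{bucket}", "--force"],
    ]
```

Printing the plan before running it is a cheap way to spot a bad file name or bucket name, and the last two entries are the ones to rerun by hand if a job fails midway and leaves temporary resources behind.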

Tips that improve output quality

Use a source file with the best available audio track. If the video has multiple audio streams, bad compression, or background music, fix that before transcribing. Prefer explicit filenames and output expectations when you prompt the skill, such as “preserve the base filename” or “I need both subtitle formats and a plain text transcript for editing.” To make results predictable, specify the language code, the output location, and whether the transcript should be optimized for captions or for reading.
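
One way to find out whether a file has more than one audio stream is to inspect it with ffprobe, which ships with ffmpeg. The sketch below only parses ffprobe's JSON output; the function name audio_stream_indexes is hypothetical, and the probe command in the comment is a suggestion rather than something the skill is documented to run.

```python
import json

# Suggested probe command (run separately; INPUT is your media file):
#   ffprobe -v error -select_streams a -show_entries stream=index,codec_name -of json INPUT


def audio_stream_indexes(ffprobe_json):
    """Return the stream indexes listed in ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    return [stream["index"] for stream in data.get("streams", [])]
```

If the list has more than one entry, pick the intended track explicitly when extracting audio, for example with ffmpeg's -map 0:a:0 option for the first audio stream.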

transcribe-video skill FAQ

Is this better than a generic prompt?

Usually yes, if you want a repeatable transcription workflow instead of a one-off response. A generic prompt can ask for a transcript, but it will not reliably handle the AWS Transcribe setup, audio extraction, temporary bucket creation, and cleanup steps. The transcribe-video skill is more useful when the job needs files, formats, and operational discipline.

Do I need AWS to use it?

Yes. This skill depends on AWS Transcribe and S3, so it is not a local-only transcription tool. If you cannot use AWS credentials or do not want to manage cloud permissions, this is probably not the right skill. In that case, a local speech-to-text tool may be a better fit.

Is it beginner-friendly?

It is beginner-friendly only if you are comfortable installing command-line tools and granting AWS permissions. The transcription workflow itself is straightforward, but setup can block first use. Beginners usually do best when they copy the repo’s expected file path and language-code pattern exactly, then adjust only one variable at a time.

When should I not use transcribe-video?

Do not use it for tiny, disposable tasks if you do not already have AWS configured. Also avoid it when you need offline processing, custom diarization logic, or deep editorial cleanup beyond basic transcript generation. If your goal is only to summarize spoken content, this skill is more infrastructure than you may need.

How to Improve transcribe-video skill

Provide stronger source context

The best results come from telling the skill what the file is and what matters most in the output. For example: “This is a 42-minute product demo with one speaker and clear audio; generate accurate English captions and a readable transcript.” That is better than a bare path because it helps the workflow prioritize language, formatting, and likely failure points.

Reduce avoidable transcription errors

If the audio is muddy, mixed with music, or captured from a noisy room, improve the source before running the skill. If the video includes multiple languages, say which language should be transcribed. If the main goal is subtitles, mention that explicitly so the output is judged by timing and readability instead of only raw text accuracy. These details matter more than asking for “better quality” in the abstract.

Iterate after the first output

Review the .srt, .vtt, and .txt outputs separately. Captions may be technically correct but too long for display, while the text transcript may need punctuation or speaker cleanup for notes. If the first pass is close but imperfect, rerun transcribe-video with a clearer language code, a better audio source, or a narrower output goal rather than trying to fix everything in post.
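
The "too long for display" check can be partly automated. The sketch below scans an .srt file's text and reports caption lines over a character limit. The function name long_caption_lines is hypothetical, and the default of 42 characters is an assumed guideline for this example, not a value taken from the skill or from AWS Transcribe.

```python
def long_caption_lines(srt_text, max_chars=42):
    """Return (cue_number, line) pairs whose text exceeds max_chars."""
    flagged = []
    # Normalize Windows line endings so cues split cleanly on blank lines.
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in normalized.split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # not a complete cue: number, timing line, text
        cue_number = lines[0].strip()
        for text in lines[2:]:
            if len(text) > max_chars:
                flagged.append((cue_number, text))
    return flagged
```

A long list of flagged cues suggests the captions need re-segmenting for display, even when the .txt transcript reads fine.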

Watch the common failure modes

The most common issues are missing ffmpeg, AWS CLI misconfiguration, insufficient IAM permissions, and accidental retention of temporary AWS resources. If a run fails, check prerequisites first, then permissions, then the exact file path. For transcribe-video, successful use is less about prompt cleverness and more about providing a valid media file, the right language hint, and an AWS environment that can complete the job end to end.

More skills

kreuzberg

by kreuzberg-dev

The kreuzberg skill helps you install and use Kreuzberg for document extraction across 91+ formats, including PDFs, Office files, images, HTML, email, and archives. It covers Python, Node.js/TypeScript, Rust, and CLI workflows for OCR, tables, metadata, batch processing, and practical parsing guidance.

PDF Processing

Favorites 0 | GitHub 0

xlsx

by anthropics

The xlsx skill helps agents read, edit, repair, create, and convert .xlsx, .xlsm, .csv, and .tsv files when the required deliverable is a spreadsheet. It is strongest for template-preserving updates, formula-safe workbook edits, messy tabular cleanup, and practical spreadsheet workflows backed by repo scripts for packing, validation, and recalculation.

Spreadsheet Workflows

Favorites 0 | GitHub 105.1k

pdf

by anthropics

The pdf skill guides PDF Processing tasks like text extraction, merge and split operations, rendering pages to images, and PDF form workflows. It is especially useful for checking fillable fields, extracting form metadata, and validating non-fillable form layouts with scripts.

PDF Processing

Favorites 0 | GitHub 105.1k

baoyu-youtube-transcript

by JimLiu

baoyu-youtube-transcript helps extract YouTube transcripts, subtitles, and cover images from a URL or video ID. It supports language selection, translation, markdown or SRT output, cached reformatting, and a fallback from InnerTube API to yt-dlp for more reliable transcript retrieval.

Format Conversion

Favorites 0 | GitHub 13.2k

baoyu-url-to-markdown

by JimLiu

baoyu-url-to-markdown converts live URLs to Markdown with a vendored baoyu-fetch CLI using Chrome CDP, site adapters, and generic fallback. Review Bun runtime needs, first-time EXTEND.md setup, and usage for X, YouTube, Hacker News, and rendered pages.

Format Conversion

Favorites 0 | GitHub 13.2k

pymatgen

by K-Dense-AI

pymatgen is a Python materials science toolkit for crystal structures, phase diagrams, electronic structure, and file conversion. This pymatgen skill helps with scientific workflows using CIF, POSCAR, VASP, and Materials Project data.

Scientific

Favorites 0 | GitHub 0

minimax-xlsx

by MiniMax-AI

The minimax-xlsx skill helps create, read, edit, validate, and format Excel workbooks with an Excel-first workflow. Use minimax-xlsx for Spreadsheet Workflows when you need structured files that preserve formulas, styles, sheet layout, and workbook behavior. It supports .xlsx, .xlsm, .csv, and .tsv tasks, including analysis, new workbook creation, minimal-invasive edits, formula repair, and validation. The minimax-xlsx guide is designed for real workbook handoff, not flat tables.

Spreadsheet Workflows

Favorites 0 | GitHub 0

baoyu-format-markdown

by JimLiu

baoyu-format-markdown formats plain text or messy Markdown into cleaner, publishable Markdown while preserving meaning. It repairs frontmatter, headings, lists, code blocks, quotes, and CJK spacing, making it useful for Format Conversion without rewriting content.

Format Conversion

Favorites 0 | GitHub 13.2k

baoyu-danger-x-to-markdown

by JimLiu

baoyu-danger-x-to-markdown converts X posts, threads, and some articles into Markdown with YAML front matter. It uses scripts in `scripts/` with `bun` or `npx -y bun`, supports cookie-based access and consent flow, and fits repeatable Format Conversion workflows better than a generic prompt.

Format Conversion

Favorites 0 | GitHub 13.2k

baoyu-markdown-to-html

by JimLiu

baoyu-markdown-to-html converts Markdown into styled HTML for WeChat-style publishing. It supports themes, code highlighting, math, PlantUML, footnotes, image handling, and optional link citations, with runtime execution through bun or npx -y bun.

Format Conversion

Favorites 0 | GitHub 13.2k

nutrient-document-processing

by affaan-m

nutrient-document-processing skill for PDF processing and document automation with the Nutrient DWS API. Convert, OCR, extract, redact, sign, watermark, and fill files like PDFs, DOCX, XLSX, PPTX, HTML, and images.

PDF Processing

Favorites 0 | GitHub 156.2k

speech-to-text

by NoizAI

The speech-to-text skill transcribes supported audio files into plain text, with options for timestamps, speaker labels, and JSON output. It is designed for practical speech-to-text usage in repeatable workflows, including interviews, meetings, podcasts, lectures, and automation tasks where consistent transcription matters.

Workflow Automation

Favorites 0 | GitHub 498

markitdown

by K-Dense-AI

markitdown converts files and office documents to Markdown for easier reading, chunking, search, and LLM workflows. This markitdown skill supports PDF, DOCX, PPTX, XLSX, HTML, CSV, JSON, XML, ZIP, EPUB, images with OCR, and audio transcription, making it practical for format conversion.

Format Conversion

Favorites 0 | GitHub 0

pdf

by openai

Use the pdf skill for PDF Processing tasks where layout, pagination, and rendered output matter. It helps you read, create, edit, and review PDFs with a visual-first workflow: render pages, inspect the result, then adjust. Use it when document accuracy matters and you want practical guidance on installing and using the skill.

PDF Processing

Favorites 0 | GitHub 0

web-to-markdown

by softaworks

web-to-markdown is a Format Conversion skill that turns live web pages into clean Markdown through the local web2md CLI, using a Chromium-family browser for JS-rendered pages, interactive flows, and batch URL conversion. It only runs when explicitly invoked by name.

Format Conversion

Favorites 0 | GitHub 1.3k

defuddle

by kepano

defuddle extracts clean markdown from web pages with the Defuddle CLI, removing clutter for research, docs, and articles. Use it for standard HTML pages, install with npm, and skip URLs ending in .md.

Web Research

Favorites 0 | GitHub 19.7k