
transcribe

by openai

transcribe turns audio or video into text with optional diarization and known-speaker hints. It is well suited for Technical Writing, meeting notes, interviews, lectures, and content ops when you need a repeatable transcription workflow with clear output formats and less guesswork than a generic prompt.

Stars: 18.8k
Favorites: 0
Comments: 0
Added: May 11, 2026
Category: Technical Writing
Install Command
npx skills add openai/skills --skill transcribe
Curation Score

This skill scores 74/100, which means it is a credible install candidate for directory users: it has a clear transcription use case, a bundled CLI, and enough operational guidance to reduce guesswork versus a generic prompt. It is still somewhat limited because the repository evidence points to a focused audio-transcription workflow rather than a broadly documented end-to-end package.

74/100
Strengths
  • Explicit triggerability for audio/video transcription, speaker labeling, and interview/meeting use cases in SKILL.md.
  • Bundled script and quick reference document the key operating constraints: response formats, chunking strategy, max file size, and known-speaker limits.
  • Operational workflow is concrete: check API key, run the CLI, validate output, and save results in a standard output path.
Cautions
  • The skill is narrow in scope and centered on one transcription workflow, so users needing broader media-processing behavior will need something else.
  • The install path is not fully self-serve in the evidence shown: SKILL.md mentions dependencies, but the excerpt does not show a complete install command or full quick-start example.

Overview of transcribe skill

What the transcribe skill does

The transcribe skill turns audio or video into text using OpenAI, with optional speaker diarization and known-speaker hints. It is a good fit when you need a reliable transcribe result from recordings, interviews, meetings, lectures, or short video clips, especially when speaker labels matter.

Who should use it

Use this transcribe skill if you want a repeatable workflow rather than a one-off prompt. It is especially useful for Technical Writing, meeting notes, content ops, research interviews, and anyone who needs clean text plus traceable speaker structure.

Why this skill is different

The main advantage is operational clarity: it prefers a bundled CLI, has explicit decision rules for model and output format, and supports diarized output when requested. That makes transcribe easier to run consistently than a generic “please transcribe this” prompt, especially when you care about repeatability and output shape.

How to Use transcribe skill

Install the transcribe skill

Install with npx skills add openai/skills --skill transcribe. If you are using the repository directly, start from skills/.curated/transcribe and keep the bundled workflow intact unless your environment requires a change.

Prepare the right input for transcribe usage

For best transcribe usage, provide:

  • the audio or video file path
  • the desired response format: text, json, or diarized_json
  • an optional language hint
  • known speaker references if you need diarization

A strong prompt looks like: “Transcribe this 18-minute interview, return diarized_json, and label the host and two guests if possible.” That is better than asking for “a transcript” because it tells the skill what output structure and speaker context to optimize for.
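
As a sketch, the inputs above can be collected into a single request payload before anything is run. The field names below are illustrative assumptions, not the skill's actual CLI flags or schema; check SKILL.md for the real interface.

```python
# Sketch: assemble the inputs the transcribe skill expects.
# Field names are illustrative assumptions, not the skill's real schema.

def build_request(file_path, response_format="text", language=None, known_speakers=None):
    """Collect transcription inputs; known_speakers only matters for diarized output."""
    if response_format not in {"text", "json", "diarized_json"}:
        raise ValueError(f"unsupported response format: {response_format}")
    request = {"file": file_path, "response_format": response_format}
    if language:
        request["language"] = language  # optional hint, e.g. "en"
    if known_speakers and response_format == "diarized_json":
        request["known_speakers"] = list(known_speakers)
    return request

# Example: the 18-minute interview prompt above, expressed as structured input.
req = build_request("interview.mp3", "diarized_json", language="en",
                    known_speakers=["Host", "Guest 1", "Guest 2"])
```

Spelling the request out like this makes it obvious when a diarized format is requested without any speaker context.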

Read these files first

Start with SKILL.md, then check references/api.md for format limits and diarization rules. If you are extending or automating the flow, inspect scripts/transcribe_diarize.py and agents/openai.yaml for the default model, CLI behavior, and prompt entrypoint.

Practical workflow tips

Use gpt-4o-mini-transcribe for fast plain transcription, and switch to gpt-4o-transcribe-diarize when speaker labels are important. Keep chunking_strategy on auto for audio longer than about 30 seconds. Make sure OPENAI_API_KEY is set locally before you run; this skill expects a configured environment rather than pasted secrets.
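
The decision rules above can be sketched as a small helper. The model names and the 30-second chunking threshold are taken from the tips above; treat them as the skill's defaults rather than a guaranteed API contract.

```python
# Sketch of the decision rules above: pick a model based on whether speaker
# labels are needed, and enable auto chunking for longer audio.

def pick_settings(need_diarization: bool, duration_seconds: float) -> dict:
    model = "gpt-4o-transcribe-diarize" if need_diarization else "gpt-4o-mini-transcribe"
    settings = {"model": model}
    if duration_seconds > 30:
        settings["chunking_strategy"] = "auto"
    return settings
```

Encoding the choice this way keeps runs consistent: the same inputs always produce the same model and chunking settings.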

transcribe skill FAQ

Is transcribe good for Technical Writing?

Yes. The transcribe skill is a strong fit for Technical Writing when you need source audio converted into editable text for docs, interviews, or content cleanup. It is less about creative rewriting and more about turning speech into dependable structured text.

When should I not use transcribe?

Do not use transcribe if you only need a rough summary with no transcript, or if your file is too large for the supported request limits without splitting. It is also a poor fit if you want heavy paraphrasing instead of literal speech conversion.

How is this different from a normal prompt?

A normal prompt can ask for transcription, but this transcribe skill adds a reproducible workflow, a preferred CLI, explicit response-format choices, and diarization guidance. That reduces guesswork when you need consistent output across multiple files.

Is transcribe beginner-friendly?

Yes, if you can identify the file and desired output. Beginners usually only need to choose between plain text and diarized output. The main blocker is environment setup, so verify OPENAI_API_KEY first.
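
A minimal preflight check for that blocker might look like the following sketch, which fails early instead of partway through a transcription run:

```python
import os

# Fail early if the environment is not configured. The env mapping is a
# parameter so the check is testable; it defaults to the real environment.

def check_environment(env=os.environ) -> None:
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running transcribe")
```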

How to Improve transcribe skill

Give transcribe better source context

The biggest quality gain usually comes from better inputs, not more prompting. For example, say whether the audio is a podcast, call recording, or lecture; whether there are overlapping speakers; and whether you want verbatim text or cleaned transcript output. That helps transcribe choose a more suitable path.

Use speaker hints when diarization matters

If you know the speaker names, include them as references instead of expecting the model to infer everything from audio alone. This is especially important for transcribe when one person sounds similar to another or when the recording has multiple guests. Known speakers improve label consistency, but only if the references are accurate.
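
A sketch of validating speaker references before requesting diarization is below. The cap of four references is an assumption drawn from typical diarization APIs, not a documented limit of this skill; confirm the actual cap in references/api.md.

```python
# Sketch: sanity-check known-speaker references before a diarized run.
# MAX_SPEAKER_REFS is an assumption; see references/api.md for the real limit.

MAX_SPEAKER_REFS = 4  # assumption

def validate_speaker_refs(refs: dict) -> list:
    """refs maps speaker name -> reference audio path; returns a list of warnings."""
    warnings = []
    if len(refs) > MAX_SPEAKER_REFS:
        warnings.append(f"too many speaker references ({len(refs)} > {MAX_SPEAKER_REFS})")
    for name, path in refs.items():
        if not name.strip():
            warnings.append("empty speaker name will produce unusable labels")
        if not path:
            warnings.append(f"missing reference audio for {name!r}")
    return warnings
```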

Iterate with one change at a time

If the first transcribe output is weak, change one variable: model, chunking, response format, or speaker hints. Avoid rewriting the whole request at once. For example, if labels are wrong, keep the transcript goal the same and only add speaker references or switch to diarized JSON.

Watch for common failure modes

The most common issues are missing API keys, unsupported file handling, vague output requests, and asking for diarization without usable speaker context. If you are building a transcribe guide for a workflow, document the file types you expect, the preferred output format, and the fallback when the recording is noisy or too long.
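
Those failure modes can be caught before the run starts. In the sketch below, the accepted extensions and the size limit are placeholder assumptions, not the skill's documented limits; substitute the values from references/api.md.

```python
from pathlib import Path

# Sketch of pre-run checks for the common failure modes above.
ACCEPTED = {".mp3", ".mp4", ".wav", ".m4a"}   # assumption, not the real list
MAX_BYTES = 25 * 1024 * 1024                  # assumption, not the real limit

def preflight(path: str, size_bytes: int, response_format: str, known_speakers=None):
    """Return a list of problems; an empty list means the run can proceed."""
    problems = []
    if Path(path).suffix.lower() not in ACCEPTED:
        problems.append(f"unsupported file type: {Path(path).suffix}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds the request size limit; split it first")
    if response_format == "diarized_json" and not known_speakers:
        problems.append("diarization requested without speaker references; labels may be generic")
    return problems
```

Documenting checks like these in your workflow guide doubles as the fallback plan for noisy or oversized recordings.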

Ratings & Reviews

No ratings yet