
speech

by openai

Use the speech skill to turn text into spoken audio for narration, voiceover, IVR prompts, accessibility reads, and batch speech generation. It uses the OpenAI Audio API with built-in voices, a bundled CLI, and `OPENAI_API_KEY` for live runs. Custom voice creation is out of scope.

Added: May 8, 2026
Category: Design Implementation
Install Command
npx skills add openai/skills --skill speech
Curation Score

This skill scores 88/100, which means it is a solid directory listing with good practical value for agents. Users should expect a clearly triggerable speech-generation workflow that is more actionable than a generic prompt, with enough CLI and reference detail to support real installs, though it still depends on network access and the OpenAI API for live output.

Strengths
  • Strong triggerability: the frontmatter explicitly scopes use cases like text-to-speech narration, voiceover, accessibility reads, and batch speech generation.
  • Operationally clear: SKILL.md provides a decision tree for single vs. batch and a step-by-step workflow, backed by a bundled CLI reference.
  • Good agent leverage: supporting references cover voices, audio API parameters, accessibility defaults, and batch usage, reducing guesswork for execution.
Cautions
  • Live generation requires `OPENAI_API_KEY` and network access, so it is not fully self-contained for offline use.
  • Custom voice creation is out of scope, so users needing bespoke voices or advanced audio workflows will need something else.
Overview

Overview of speech skill

What the speech skill does

The speech skill turns text into spoken audio for narration, voiceover, IVR prompts, accessibility reads, and batch speech generation. It is best when you need reproducible audio output from a prompt, not a freeform “make it sound nice” request.

Who should use it

Use speech if you need the speech install to fit a real workflow: product demos, app onboarding, accessibility assets, or many short clips from structured text. It is a strong match when you care about voice choice, pacing, output format, and consistent generation across runs.

What makes it different

The speech guide is built around the OpenAI Audio API and the bundled CLI, so it favors deterministic use over ad hoc prompting. It uses built-in voices, supports single or batch jobs, and expects `OPENAI_API_KEY` for live generation. Custom voice creation is out of scope.
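To make the API dependency concrete, here is a minimal sketch of a live call using the OpenAI Python SDK. The model name, default voice, and the `build_speech_request` helper are assumptions for illustration, not values mandated by the skill; check `references/audio-api.md` for the actual limits.

```python
def build_speech_request(text, voice="alloy", fmt="mp3"):
    """Assemble keyword arguments for the speech endpoint.
    Model name and defaults here are illustrative assumptions."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        "response_format": fmt,  # mp3, wav, opus, flac, aac, pcm
    }


def synthesize(text, out_path, **kwargs):
    """Live generation; requires OPENAI_API_KEY and network access."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.audio.speech.create(**build_speech_request(text, **kwargs))
    response.write_to_file(out_path)


# Live usage (needs a key and network), e.g.:
# synthesize("Welcome to the demo.", "welcome.mp3", voice="alloy")
```

Separating request construction from the network call keeps the parameters inspectable, which is also what makes `--dry-run` style checks cheap.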

How to Use speech skill

Install and locate the workflow

Install with `npx skills add openai/skills --skill speech`. After installing, read `SKILL.md` first, then `references/cli.md` for command details, `references/audio-api.md` for model and parameter limits, and `references/prompting.md` or `references/voice-directions.md` for better instruction writing. For quick context, check `agents/openai.yaml` and `references/sample-prompts.md`.

Turn a rough goal into a usable prompt

The speech usage pattern works best when you give the skill the exact text to read, the target voice, the delivery style, output format, and any pronunciation constraints. A strong request looks like: “Generate a 45-second product demo voiceover from this script, use cedar, keep it warm and steady, output mp3, and emphasize the product name on first mention.” That is better than “make this sound professional,” because it gives the skill concrete synthesis controls.
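The strong request above can be captured as a structured spec rather than freeform prose. This is a sketch with assumed field names (the skill's real schema may differ), and "Acme Flow" is a hypothetical product name:

```python
def speech_spec(script, voice, style, fmt, emphasis=None, pronunciations=None):
    """Bundle the synthesis controls into one dict.
    Field names are illustrative, not the skill's actual schema."""
    return {
        "script": script,                      # exact text to read, verbatim
        "voice": voice,                        # e.g. "cedar"
        "style": style,                        # delivery direction
        "format": fmt,                         # "mp3", "wav", ...
        "emphasis": emphasis or [],            # terms to stress on first mention
        "pronunciations": pronunciations or {},
    }


demo = speech_spec(
    script="Introducing Acme Flow, the fastest way to ship audio.",
    voice="cedar",
    style="warm and steady",
    fmt="mp3",
    emphasis=["Acme Flow"],
)
```

Every field here maps to a concrete control the skill can act on; "make this sound professional" maps to none of them.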

Single vs batch workflow

The skill is designed for two paths: one clip or many clips. If you have multiple lines, prompts, or files, treat it as batch and prepare a temporary JSONL file under `tmp/`, then run the CLI once and delete the JSONL after use. If you have one script, use the single-file path. This decision matters because the skill’s structure and validation steps change with output volume.
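The batch path above can be sketched as follows. The per-line schema (`text`, `voice`, `format`) is an assumption; check `references/cli.md` for the fields the bundled CLI actually expects:

```python
import json
import os
import tempfile


def write_batch_jsonl(items, tmp_dir="tmp"):
    """Write one JSON object per line and return the file path.
    The line schema is illustrative; the real one lives in references/cli.md."""
    os.makedirs(tmp_dir, exist_ok=True)
    fd, path = tempfile.mkstemp(suffix=".jsonl", dir=tmp_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")
    return path


clips = [
    {"text": "Press one for sales.", "voice": "alloy", "format": "mp3"},
    {"text": "Press two for support.", "voice": "alloy", "format": "mp3"},
]
path = write_batch_jsonl(clips)
# ... run the CLI once against `path`, then clean up as the skill instructs:
os.remove(path)
```

Writing the file once and deleting it after the run matches the skill's "temporary JSONL under tmp/" convention and avoids stale batch inputs leaking into later runs.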

What to check before you run

For best results, verify the text verbatim, not just the theme. Confirm the voice, file format, speed, and whether the output must be neutral, expressive, or accessibility-first. The main repository file to inspect for execution is `scripts/text_to_speech.py`; do not modify it unless the repository maintainer instructs you to.
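Those pre-run checks can be automated cheaply before spending an API call. The checks below are illustrative defaults; the format list and the 0.25–4.0 speed range follow the Audio API's documented `response_format` and `speed` parameters:

```python
def preflight(spec):
    """Return a list of problems; empty means the spec is safe to run.
    Rules here are illustrative defaults, not the skill's own validation."""
    problems = []
    if not spec.get("text", "").strip():
        problems.append("text is empty; supply the verbatim script")
    if spec.get("format") not in {"mp3", "wav", "opus", "flac", "aac", "pcm"}:
        problems.append(f"unsupported format: {spec.get('format')!r}")
    speed = spec.get("speed", 1.0)
    if not 0.25 <= speed <= 4.0:  # documented range for the speed parameter
        problems.append(f"speed {speed} outside 0.25-4.0")
    return problems
```

Running a check like this first turns a failed (and billed) synthesis call into a free local error message.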

speech skill FAQ

Is the speech skill only for narration?

No. The speech skill also fits voiceover, accessibility reads, IVR prompts, and short audio prompts. It is less useful for custom voice cloning or creative voice design, which this repo does not cover.

Do I need the CLI to use speech?

For reliable speech usage, yes. The bundled CLI is the intended path for live generation, while `--dry-run` is useful for checking invocation shape without making an API call. If you only write a generic prompt, you lose the structure that makes the skill reproducible.

Is this beginner friendly?

Yes, if you can provide the exact text and a basic voice direction. The speech install is simple, but the output quality depends on how clearly you define pacing, tone, format, and pronunciation. Beginners usually succeed faster when they start with a short clip and one voice.

When should I not use this skill?

Do not use speech if you need custom voice creation, heavy post-production, or a workflow that depends on modifying the bundled script. It is also a poor fit if you cannot use networked OpenAI API calls or do not have an `OPENAI_API_KEY`.

How to Improve speech skill

Give the skill fewer ambiguities

The biggest quality gain in speech skill output comes from removing guesswork. Provide the exact text, not a summary; name the intended listener; and specify whether the read should sound like narration, support messaging, accessibility, or an IVR prompt. If a term is hard to pronounce, spell it out or add a pronunciation note.

Tune one variable at a time

When the first pass is close but not right, change only one thing: voice, speed, or instruction style. That makes iteration cleaner than rewriting the whole prompt. For example, if the timing feels rushed, keep the text and voice fixed and adjust only the speed from 1.0 to 0.95.
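The one-variable rule can even be enforced mechanically. This helper is a hypothetical convenience, not part of the skill; it simply refuses multi-field edits so each iteration isolates a single change:

```python
def retune(spec, **change):
    """Return a copy of spec with exactly one field changed.
    Rejects multi-field edits so each iteration isolates one variable."""
    if len(change) != 1:
        raise ValueError("change exactly one variable per iteration")
    out = dict(spec)
    out.update(change)
    return out


base = {"voice": "cedar", "speed": 1.0, "style": "warm and steady"}
slower = retune(base, speed=0.95)  # timing felt rushed: only speed changes
```

Keeping the base spec immutable also gives you a clean record of what each iteration actually changed.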

Use output constraints that matter

The speech guide works better when constraints are operational, not vague. Say “mp3 for quick playback,” “wav for review,” or “steady and neutral for accessibility.” For batch jobs, keep each line narrowly scoped so the skill can preserve consistent delivery across outputs.

Read the right references first

If you want better results from speech for Design Implementation, prioritize `references/accessibility.md` for neutral reads, `references/voiceover.md` for presentation-style delivery, and `references/sample-prompts.md` for prompt shape. These files help you write instructions that the CLI and API can execute without extra interpretation.

Ratings & Reviews

No ratings yet