Audio

Browse agent skills tagged with Audio and compare related workflows across the directory.

18 skills
videodb

by affaan-m

videodb helps you ingest video and audio from local files, URLs, RTSP/RTMP live feeds, or desktop capture; search for moments with timestamps and playable evidence; and act on them with clips, overlays, transcription, alerts, and timeline editing. It is a practical guide to VideoDB for video editing and live-stream analysis.

Video Editing
Favorites 0 | GitHub 156.3k
video-editing

by affaan-m

The video-editing skill helps you turn existing footage into polished, platform-ready videos faster. It focuses on cutting, structuring, captioning, reframing, and light augmentation for vlogs, tutorials, demos, short clips, and interview edits. Best when you already have raw footage and need a practical video-editing guide.

Video Editing
Favorites 0 | GitHub 156.3k
fal-ai-media

by affaan-m

fal-ai-media is a GitHub skill for unified media generation through fal.ai MCP. It covers installation and usage for image generation, image editing, video, speech, and audio workflows, with model search, cost checks, and guided prompts.

Image Generation
Favorites 0 | GitHub 156.1k
transcribe

by openai

transcribe turns audio or video into text with optional diarization and known-speaker hints. It is well suited for Technical Writing, meeting notes, interviews, lectures, and content ops when you need a repeatable transcribe skill with clear output formats and less guesswork than a generic prompt.

Technical Writing
Favorites 0 | GitHub 18.8k
baoyu-youtube-transcript

by JimLiu

baoyu-youtube-transcript helps extract YouTube transcripts, subtitles, and cover images from a URL or video ID. It supports language selection, translation, markdown or SRT output, cached reformatting, and a fallback from InnerTube API to yt-dlp for more reliable transcript retrieval.

Format Conversion
Favorites 0 | GitHub 13.2k
hyperframes

by heygen-com

hyperframes is a workflow skill for building HTML-based video compositions in HyperFrames. Use it for title cards, overlays, captions, voiceovers, audio-reactive motion, and scene transitions when you need structured, code-first compositions for video editing. It favors layout, timing, and animation decisions over generic prompt-only video requests.

Video Editing
Favorites 0 | GitHub 2.7k
azure-ai-voicelive-ts

by microsoft

azure-ai-voicelive-ts helps you build real-time voice AI apps with the Azure AI Voice Live TypeScript SDK. Use it for Node.js or browser projects that need bidirectional audio, streaming responses, session setup, and function calling. This azure-ai-voicelive-ts guide is useful when you want practical install, usage, and code generation help.

Code Generation
Favorites 0 | GitHub 2.3k
azure-ai-contentunderstanding-py

by microsoft

azure-ai-contentunderstanding-py is the Python skill for Azure AI Content Understanding. It extracts structured content from documents, images, audio, and video for RAG workflows and automation. Use it when you need reliable multimodal extraction, Azure authentication, and repeatable pipeline-ready output.

RAG Workflows
Favorites 0 | GitHub 2.2k
azure-ai-voicelive-java

by microsoft

azure-ai-voicelive-java is an Azure AI VoiceLive SDK skill for Java backend development. It covers install, authentication, WebSocket voice streaming, event handling, and example-driven usage for real-time assistant builds.

Backend Development
Favorites 0 | GitHub 2.2k
azure-ai-voicelive-dotnet

by microsoft

azure-ai-voicelive-dotnet is the .NET skill for building real-time voice AI apps with Azure AI Voice Live. It covers install, setup, auth, and usage guidance for backend development, including bidirectional audio, low-latency sessions, and speech-to-speech workflows.

Backend Development
Favorites 0 | GitHub 2.2k
podcast-generation

by microsoft

podcast-generation helps build AI-generated podcast-style audio from text using Azure OpenAI GPT Realtime Mini over WebSocket. It suits full-stack development, with guidance for React, Python FastAPI, PCM streaming, transcript capture, and WAV conversion. Use it when you need a practical guide for real app integration, not a generic prompt.

Full-Stack Development
Favorites 0 | GitHub 2.2k
github-issue-creator

by microsoft

github-issue-creator converts raw notes, error logs, voice dictation, and screenshots into crisp GitHub-flavored issue drafts. This github-issue-creator skill helps with Issue Tracking by organizing summary, environment, reproduction steps, expected vs actual behavior, impact, and evidence into a reviewable markdown issue.

Issue Tracking
Favorites 0 | GitHub 2.2k
seedance-2.0-prompter

by pexoai

seedance-2.0-prompter helps turn multimodal Seedance 2.0 assets into structured prompts with clear roles, @asset syntax, and reusable templates for install, setup, and practical usage.

Prompt Writing
Favorites 0 | GitHub 452
transcribe-video

by rameerez

The transcribe-video skill turns video or audio files into .srt, .vtt, and .txt outputs with AWS Transcribe. Use it when you need captions, a searchable transcript, or a clean text version of spoken content. It also fits format conversion workflows.

Format Conversion
Favorites 0 | GitHub 23
detecting-deepfake-audio-in-vishing-attacks

by mukul975

detecting-deepfake-audio-in-vishing-attacks helps security teams analyze audio for AI-generated speech in vishing, fraud, and impersonation cases. It extracts spectral and MFCC-based features, scores suspicious samples, and produces a forensic-style report for review. Ideal for Security Audit and incident response workflows.

Security Audit
Favorites 0 | GitHub 0
speech

by openai

Use the speech skill to turn text into spoken audio for narration, voiceover, IVR prompts, accessibility reads, and batch speech generation. It uses the OpenAI Audio API with built-in voices, a bundled CLI, and `OPENAI_API_KEY` for live runs. Custom voice creation is out of scope.

Design Implementation
Favorites 0 | GitHub 0
azure-ai-voicelive-py

by microsoft

azure-ai-voicelive-py helps you build real-time voice AI apps in Python with Azure AI Voice Live. Use it for bidirectional WebSocket audio, voice assistants, speech-to-speech chat, transcription, avatars, and tool-using voice agents. Best fit for backend development when you need async connections, Azure auth, session control, and low-latency streaming.

Backend Development
Favorites 0 | GitHub 0
azure-ai-transcription-py

by microsoft

azure-ai-transcription-py is a Python skill for Azure AI Transcription. Use it for batch or real-time speech-to-text with timestamps and diarization. It fits backend development, uses subscription key auth, and points you to the right install and usage flow for the Azure client library.

Backend Development
Favorites 0 | GitHub 0