Audio

Browse agent skills tagged with Audio and compare related workflows across the directory.

25 skills

videodb

by affaan-m

videodb helps you ingest video and audio from local files, URLs, RTSP/RTMP live feeds, or desktop capture; search for moments with timestamps and playable evidence; and act on the results with clips, overlays, transcription, alerts, and timeline editing. It is a practical guide to using VideoDB for video editing and live-stream analysis.

Video Editing

GitHub 156.3k

video-editing

by affaan-m

The video-editing skill helps you turn existing footage into polished, platform-ready videos faster. It focuses on cutting, structuring, captioning, reframing, and light augmentation for vlogs, tutorials, demos, short clips, and interview edits. It works best when you already have raw footage and need a practical video-editing guide.

Video Editing

GitHub 156.3k

fal-ai-media

by affaan-m

fal-ai-media is a skill for unified media generation through the fal.ai MCP. It covers installing and using the skill for image generation, image editing, video, speech, and audio workflows, with model search, cost checks, and guided prompts.

Image Generation

GitHub 156.1k

transcribe

by openai

transcribe turns audio or video into text, with optional diarization and known-speaker hints. It is well suited to technical writing, meeting notes, interviews, lectures, and content operations when you need a repeatable transcription workflow with clear output formats and less guesswork than a generic prompt.

Technical Writing

GitHub 18.8k

baoyu-youtube-transcript

by JimLiu

baoyu-youtube-transcript helps extract YouTube transcripts, subtitles, and cover images from a URL or video ID. It supports language selection, translation, markdown or SRT output, cached reformatting, and a fallback from InnerTube API to yt-dlp for more reliable transcript retrieval.

Format Conversion

GitHub 13.2k
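
baoyu-youtube-transcript's fallback from the InnerTube API to yt-dlp follows a common primary/fallback retrieval pattern. The Python below is a minimal sketch of that pattern only, not the skill's code: `fetch_via_innertube` and `fetch_via_ytdlp` are hypothetical placeholders (the first simulates a failure), and `fetch_with_fallback` and `TranscriptUnavailable` are illustrative names.

```python
from typing import Callable, Sequence


class TranscriptUnavailable(Exception):
    """Hypothetical error raised when every fetcher fails."""


def fetch_with_fallback(video_id: str, fetchers: Sequence[Callable[[str], str]]) -> str:
    """Try each fetcher in order and return the first transcript that succeeds."""
    errors = []
    for fetch in fetchers:
        try:
            return fetch(video_id)
        except Exception as exc:  # record the failure, then try the next source
            errors.append(f"{getattr(fetch, '__name__', repr(fetch))}: {exc}")
    raise TranscriptUnavailable("; ".join(errors))


# Hypothetical stand-ins for the two retrieval paths (not real APIs).
def fetch_via_innertube(video_id: str) -> str:
    raise ConnectionError("InnerTube request failed")  # simulate the primary path failing


def fetch_via_ytdlp(video_id: str) -> str:
    return f"transcript for {video_id}"


print(fetch_with_fallback("abc123", [fetch_via_innertube, fetch_via_ytdlp]))
```

Passing the fetchers as an ordered list keeps the retry order explicit, and collecting each error means a total failure still reports why every source was skipped.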

hyperframes

by heygen-com

hyperframes is a workflow skill for building HTML-based video compositions in HyperFrames. Use it for title cards, overlays, captions, voiceovers, audio-reactive motion, and scene transitions when you need structured, code-first video compositions. It favors layout, timing, and animation decisions over generic prompt-only video requests.

Video Editing

GitHub 2.7k

azure-ai-voicelive-ts

by microsoft

azure-ai-voicelive-ts helps you build real-time voice AI apps with the Azure AI Voice Live TypeScript SDK. Use it for Node.js or browser projects that need bidirectional audio, streaming responses, session setup, and function calling. It is useful when you want practical help with installation, usage, and code generation.

Code Generation

GitHub 2.3k

azure-ai-contentunderstanding-py

by microsoft

azure-ai-contentunderstanding-py is the Python skill for Azure AI Content Understanding. It extracts structured content from documents, images, audio, and video for RAG workflows and automation. Use it when you need reliable multimodal extraction, Azure authentication, and repeatable pipeline-ready output.

RAG Workflows

GitHub 2.2k

azure-ai-voicelive-java

by microsoft

azure-ai-voicelive-java is an Azure AI VoiceLive SDK skill for Java backend development. It covers install, authentication, WebSocket voice streaming, event handling, and example-driven usage for real-time assistant builds.

Backend Development

GitHub 2.2k

azure-ai-voicelive-dotnet

by microsoft

azure-ai-voicelive-dotnet is the .NET skill for building real-time voice AI apps with Azure AI Voice Live. It covers install, setup, auth, and usage guidance for backend development, including bidirectional audio, low-latency sessions, and speech-to-speech workflows.

Backend Development

GitHub 2.2k

podcast-generation

by microsoft

podcast-generation helps build AI-generated, podcast-style audio from text using Azure OpenAI GPT Realtime Mini over WebSocket. It targets full-stack development, with guidance for React, Python FastAPI, PCM streaming, transcript capture, and WAV conversion. Use it when you need a practical guide for real app integration rather than a generic prompt.

Full-Stack Development

GitHub 2.2k

github-issue-creator

by microsoft

github-issue-creator converts raw notes, error logs, voice dictation, and screenshots into crisp GitHub-flavored issue drafts. For issue tracking, it organizes the summary, environment, reproduction steps, expected vs. actual behavior, impact, and evidence into a reviewable Markdown issue.

Issue Tracking

GitHub 2.2k

speech-to-text

by NoizAI

The speech-to-text skill transcribes supported audio files into plain text, with options for timestamps, speaker labels, and JSON output. It is designed for practical speech-to-text usage in repeatable workflows, including interviews, meetings, podcasts, lectures, and automation tasks where consistent transcription matters.

Workflow Automation

GitHub 498

tts

by NoizAI

The tts skill turns text into speech audio for narration, dubbing, voiceover, and timeline-aligned playback. Use it to generate a voice file from plain text, convert articles or text files to speech, or render SRT-driven audio with timing control. It supports simple and timeline modes, plus backend-aware workflows for repeatable tts usage.

Voice Generation

GitHub 498
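
The tts skill's timeline mode renders audio from SRT timing. As a rough illustration of the timing data such a mode works from (an assumption about the input, not the skill's own parser), this stdlib-only sketch reads SRT cues into start and end offsets in seconds; `Cue`, `to_seconds`, and `parse_srt` are hypothetical names.

```python
import re
from dataclasses import dataclass

# SRT timestamps look like 00:01:02,345 (hours:minutes:seconds,milliseconds).
TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    start: float  # seconds from the start of the timeline
    end: float    # seconds from the start of the timeline
    text: str


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp string to seconds."""
    m = TIME_RE.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, mins, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mins * 60 + s + ms / 1000


def parse_srt(srt: str) -> list[Cue]:
    """Split SRT text into cues: index line, timing line, then one or more text lines."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):  # cues are separated by blank lines
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip malformed fragments
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append(Cue(to_seconds(start), to_seconds(end), " ".join(lines[2:])))
    return cues


sample = "1\n00:00:01,000 --> 00:00:02,500\nHello there\n\n2\n00:00:03,000 --> 00:00:04,250\nSecond line\ncontinues\n"
for cue in parse_srt(sample):
    print(f"{cue.start:.3f}-{cue.end:.3f}s: {cue.text}")
```

Each cue's start offset is what a timeline renderer would use to place the generated clip, and the gap between one cue's end and the next cue's start is where silence would go.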

sound-fx

by NoizAI

Use the sound-fx skill to turn text prompts into sound effects, foley, ambient beds, creature sounds, and UI noises. It suits audio editing, quick prototyping, and producing downloadable audio assets. Install it from NoizAI/skills, then use the script-based workflow with a valid Noiz API key. It is not intended for speech, lyrics, melody, or voice cloning.

Audio Editing

GitHub 498

characteristic-voice

by NoizAI

characteristic-voice is a voice-generation skill for warm, companion-like, emotionally present speech. Use it for comforting replies, morning or night messages, casual banter, and character-style delivery with pauses, laughter, or tenderness. It includes a preset-driven workflow and backend support.

Voice Generation

GitHub 498

chat-with-anyone

by NoizAI

chat-with-anyone helps you clone a real person's voice from public audio or design a matching voice from an image, then generate synthetic replies with TTS. It supports practical workflows for roleplay, narration, and voice generation, with guidance on install, source selection, and safe usage.

Voice Generation

GitHub 498

seedance-2.0-prompter

by pexoai

seedance-2.0-prompter helps turn multimodal Seedance 2.0 assets into structured prompts with clear roles, @asset syntax, and reusable templates, along with guidance on install, setup, and practical usage.

Prompt Writing

GitHub 452

transcribe-video

by rameerez

The transcribe-video skill turns video or audio files into .srt, .vtt, and .txt outputs using AWS Transcribe. Use it when you need captions, a searchable transcript, or a clean text version of spoken content. It also fits format conversion workflows.

Format Conversion

GitHub 23
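
transcribe-video's three outputs differ mostly in framing: SRT numbers each cue and puts a comma before the milliseconds, WebVTT starts with a `WEBVTT` header and uses a period, and plain text drops the timing. A minimal sketch, assuming segments are already available as hypothetical `(start_seconds, end_seconds, text)` tuples; it does not call AWS Transcribe, and `fmt_time`, `to_srt`, `to_vtt`, and `to_txt` are illustrative names.

```python
def fmt_time(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm; SRT uses ',' and WebVTT uses '.'."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(segments) -> str:
    """Numbered cues separated by blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{fmt_time(start, ',')} --> {fmt_time(end, ',')}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """WEBVTT header, then unnumbered cues separated by blank lines."""
    cues = [f"{fmt_time(s, '.')} --> {fmt_time(e, '.')}\n{t}\n" for s, e, t in segments]
    return "WEBVTT\n\n" + "\n".join(cues)


def to_txt(segments) -> str:
    """Plain transcript text with timing removed, one segment per line."""
    return "\n".join(text for _, _, text in segments) + "\n"


segments = [(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]
print(to_srt(segments))
print(to_vtt(segments))
print(to_txt(segments))
```

Generating all three files from one segment list keeps captions and the text transcript in sync, since only the formatting step differs.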

transformers

by K-Dense-AI

The transformers skill helps you use Hugging Face Transformers for model loading, inference, tokenization, and fine-tuning. It is a practical guide for machine learning tasks across text, vision, audio, and multimodal workflows, with clear paths to quick baselines and custom training.

Machine Learning

GitHub 0

markitdown

by K-Dense-AI

markitdown converts files and office documents to Markdown for easier reading, chunking, search, and LLM workflows. The skill supports PDF, DOCX, PPTX, XLSX, HTML, CSV, JSON, XML, ZIP, EPUB, images with OCR, and audio transcription, making it a practical guide for format conversion.

Format Conversion

GitHub 0

detecting-deepfake-audio-in-vishing-attacks

by mukul975

detecting-deepfake-audio-in-vishing-attacks helps security teams analyze audio for AI-generated speech in vishing, fraud, and impersonation cases. It extracts spectral and MFCC-based features, scores suspicious samples, and produces a forensic-style report for review. It is well suited to security audit and incident response workflows.

Security Audit

GitHub 0

speech

by openai

Use the speech skill to turn text into spoken audio for narration, voiceover, IVR prompts, accessibility reads, and batch speech generation. It uses the OpenAI Audio API with built-in voices, a bundled CLI, and `OPENAI_API_KEY` for live runs. Custom voice creation is out of scope.

Design Implementation

GitHub 0

azure-ai-voicelive-py

by microsoft

azure-ai-voicelive-py helps you build real-time voice AI apps in Python with Azure AI Voice Live. Use it for bidirectional WebSocket audio, voice assistants, speech-to-speech chat, transcription, avatars, and tool-using voice agents. Best fit for backend development when you need async connections, Azure auth, session control, and low-latency streaming.

Backend Development

GitHub 0