ai-podcast-creation

by inferen-sh

Create AI-powered podcasts and voice content from text using Kokoro TTS, DIA TTS, and the inference.sh CLI. Mix multiple voices, add music, and assemble full episodes for podcasts, audiobooks, and audio newsletters.

Category: Voice Generation

Install Command

npx skills add https://github.com/inferen-sh/skills --skill ai-podcast-creation

Tags: Audio, Video, Automation, Workflow, CLI, AI

Overview

What is ai-podcast-creation?

The ai-podcast-creation skill is a workflow for generating AI-driven podcasts and voice content using the inference.sh CLI. It focuses on turning text prompts into natural-sounding speech with Kokoro TTS and DIA TTS, then using additional tools for music and media merging to assemble complete podcast-style segments.

This skill is tailored for creators who want an automated, script-to-audio pipeline rather than manually recording and editing voice tracks.

Key capabilities

With ai-podcast-creation, you can:

Generate high-quality text-to-speech using Kokoro TTS via infsh app run infsh/kokoro-tts.
Use different predefined voice IDs (e.g., af_sarah, af_nicole, am_michael) to fit hosts, guests, or narrators.
Produce podcast segments and narrations directly from written scripts.
Build multi-voice conversations and character voices by calling the TTS app multiple times with different voice IDs.
Integrate with other inference.sh apps such as DIA TTS, Chatterbox, AI music generation, and media merger for background music and multi-track assembly (as described in the skill).

Who is this skill for?

ai-podcast-creation is a good fit if you are:

A podcast creator or production team wanting to prototype or automate episodes.
A content marketer turning articles or newsletters into audio.
An indie developer or automation engineer building CLI-based media workflows.
A researcher or educator generating lecture-style audio or explainer content.

It is less suitable if you need:

Real-time, interactive voice chat in a browser (this skill is CLI-focused).
Manual DAW-style editing inside the skill itself (you would export audio then edit in a separate tool).

When ai-podcast-creation is a good fit

Use this skill when:

You already write scripts, show notes, or long-form text and want to convert them into spoken audio.
You prefer terminal-based automation and reproducible pipelines over GUI tools.
You want to experiment with voices quickly before committing to a more complex production setup.

Consider other options if you:

Need deeply customized audio post-processing of the kind only a DAW provides.
Cannot install or use the inference.sh CLI (infsh), which is required for this skill.

How to Use

Prerequisites

To run ai-podcast-creation, you need:

A terminal on macOS, Linux, or WSL (or a compatible environment).
The inference.sh CLI (infsh) installed.
A valid inference.sh account and credentials to run infsh login.

The skill’s own SKILL.md explicitly notes:

"Requires inference.sh CLI (infsh)", followed by a link to the install instructions.

Follow the install instructions linked from SKILL.md for the official CLI installation steps before using this skill.

1. Install the ai-podcast-creation skill

Use the Agent Skills CLI to add the skill from the inferen-sh/skills repository:

npx skills add https://github.com/inferen-sh/skills --skill ai-podcast-creation

This pulls in the ai-podcast-creation guide and metadata so your agent or toolchain can reference it.

2. Set up inference.sh CLI

Once the CLI is installed, authenticate:

infsh login

Follow the prompts to complete the login with your inference.sh account.

After logging in, you can call apps like infsh/kokoro-tts directly from your terminal or scripted workflows.

3. Generate your first podcast segment

The quickest way to test ai-podcast-creation is to run the Kokoro TTS example from SKILL.md:

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to the AI Frontiers podcast. Today we explore the latest developments in generative AI.",
  "voice": "am_michael"
}'

This command:

Sends the prompt text to the infsh/kokoro-tts app.
Uses the am_michael voice (American male, authoritative; recommended for documentary or tech content).
Returns generated speech audio, which you can save or pipe into further processing, depending on your CLI configuration.

4. Choose the right voice

The skill’s documentation provides a voice table under Available Voices → Kokoro TTS. Example voices include:

af_sarah – American female, warm; suitable for hosts and narrators.
af_nicole – American female, professional; suitable for news or business shows.
am_michael – American male, authoritative; suitable for tech or documentary podcasts.
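
If you pick voices in a script, one option is a small lookup function. The sketch below covers only the three voices listed above; the format labels (host, news, tech, and so on) are illustrative names chosen for this example, not identifiers defined by the skill.

```shell
# Map a show format to one of the Kokoro voice IDs listed above.
# The format labels are hypothetical; only the voice IDs come from the skill docs.
voice_for_format() {
  case "$1" in
    host|narrator)    echo "af_sarah" ;;
    news|business)    echo "af_nicole" ;;
    tech|documentary) echo "am_michael" ;;
    *) echo "unknown format: $1" >&2; return 1 ;;
  esac
}
```

For example, voice_for_format news prints af_nicole, and an unrecognized label returns a non-zero status so a calling script can stop early.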

You can swap out the voice in your command:

infsh app run infsh/kokoro-tts --input '{
  "prompt": "In today'\''s episode, we break down three key trends in machine learning.",
  "voice": "af_nicole"
}'
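
Apostrophes and double quotes in prompt text are easy to get wrong inside a single-quoted --input string, because a backslash does not escape a single quote in the shell. A minimal sketch of one workaround is to build the payload with helper functions and pass it in double quotes. The helper names json_escape and kokoro_payload are made up for this example, and the escaping handles only backslashes and double quotes.

```shell
# Escape backslashes and double quotes so text can sit inside a JSON string.
# Limitation: newlines and other control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the --input payload for infsh/kokoro-tts from a prompt and a voice ID.
kokoro_payload() {
  printf '{"prompt": "%s", "voice": "%s"}' "$(json_escape "$1")" "$2"
}

# Usage (requires an installed, logged-in infsh):
#   infsh app run infsh/kokoro-tts --input "$(kokoro_payload "In today's episode, ..." af_nicole)"
```

With this approach the prompt is an ordinary double-quoted shell argument, so apostrophes need no special treatment.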

By running multiple commands with different voices and prompts, you can create multi-speaker segments and later merge them with music or effects using other apps described by the skill (e.g., media merger).

5. Build a repeatable workflow

Once you are comfortable generating individual lines, wrap your process into scripts. For example, you might:

Store your episode script in a file like episode01.txt.
Split it into segments for host intro, guest answers, and outro.
Call infsh app run infsh/kokoro-tts for each segment with different voices.
Use additional inference.sh apps (AI music generation, media merger) to add intro music, background beds, or crossfades as suggested in the skill description.
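
The steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the skill's own script: the one-segment-per-line file format (voice_id|text), the render_segments name, and the INFSH override variable are all invented for this example. Only the infsh app run infsh/kokoro-tts --input call comes from the skill documentation.

```shell
# Render every line of a segment file with Kokoro TTS.
# Assumed (hypothetical) file format, one segment per line:  voice_id|text
# Set INFSH=echo for a dry run that prints each command instead of calling the CLI.
render_segments() {
  while IFS='|' read -r voice text; do
    # Skip blank lines and comment lines starting with '#'.
    case "$voice" in ''|'#'*) continue ;; esac
    # Escape backslashes and double quotes for JSON (newlines are not handled).
    escaped=$(printf '%s' "$text" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    # Redirect stdin so the command cannot consume the rest of the segment file.
    ${INFSH:-infsh} app run infsh/kokoro-tts \
      --input "{\"prompt\": \"$escaped\", \"voice\": \"$voice\"}" </dev/null
  done < "$1"
}
```

Running INFSH=echo render_segments episode01.segments (a hypothetical file name) lets you review the generated commands before spending any inference time. How the resulting audio is saved depends on your CLI configuration, so that step is left out here.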

Although the repository excerpt focuses on Kokoro TTS, the skill description also lists DIA TTS and Chatterbox. You would follow the same infsh app run pattern for those apps, using the parameters each app documents.

6. Explore the skill documentation in the repo

After installation, open the skill files for deeper guidance:

SKILL.md – Primary guide for ai-podcast-creation, including the quick start and details on available voices.
Other referenced folders in the repository (e.g., guides/content/ai-podcast-creation) – Contain extended content and examples for working with TTS and media workflows.

Use these documents to refine:

Voice selection for different show formats.
How you chain together TTS, music, and media merging.
How to adapt the workflow to your existing automation or CI/CD systems.

FAQ

What does ai-podcast-creation actually do?

ai-podcast-creation is a documented workflow that shows you how to use the inference.sh CLI, Kokoro TTS, DIA TTS, Chatterbox, and related apps to generate podcast-style audio from text. It gives you voice options, command examples, and guidance for assembling full episodes with music and editing tools.

Do I need the inference.sh CLI to use this skill?

Yes. The skill explicitly requires the inference.sh CLI (infsh). You must install it and run infsh login before you can execute commands like:

infsh app run infsh/kokoro-tts --input '{"prompt": "...", "voice": "am_michael"}'

Without infsh, the ai-podcast-creation workflow cannot run.

Can I create multi-voice conversations with this skill?

Yes. While the code excerpt shows a single-voice example, the skill’s description emphasizes multi-voice conversations. You implement this by:

Calling the TTS app multiple times with different voice IDs for each speaker.
Generating separate audio clips for each line or segment.
Combining those clips (and optionally music) with a media merging tool, as indicated in the skill description.
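
As a rough sketch of the first two steps, a dialogue script can be turned into per-speaker segments before any TTS call. The SPEAKER: text input format, the HOST and GUEST labels, the label-to-voice mapping, and the voice_id|text output format are all assumptions made for this example.

```shell
# Convert "SPEAKER: text" lines on stdin into "voice_id|text" lines on stdout.
# Speaker labels and their voice assignments are hypothetical examples.
dialogue_to_segments() {
  while IFS= read -r line; do
    speaker=${line%%:*}   # everything before the first colon
    text=${line#*: }      # everything after the first ": "
    case "$speaker" in
      HOST)  voice="af_sarah" ;;
      GUEST) voice="am_michael" ;;
      *) continue ;;      # ignore lines without a known speaker label
    esac
    printf '%s|%s\n' "$voice" "$text"
  done
}
```

Each output line then corresponds to one TTS call with its own voice ID, which keeps the speaker assignment in one place if you later swap voices.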

Is this a full podcast editor or DAW replacement?

No. ai-podcast-creation focuses on generation and assembly using CLI apps. It is excellent for:

Script-to-audio conversion.
Multi-voice audio and AI-generated music.
Automated or batch workflows.

For detailed waveform editing, mixing, or mastering, you would still use a dedicated DAW (e.g., Audacity or Reaper) after generating your audio files.

Can I use ai-podcast-creation for audiobooks and voiceovers?

Yes. The skill description explicitly lists audiobooks, voice content, and audio newsletters as use cases. The same TTS commands you use for podcasts can narrate long-form text, training materials, or promotional scripts. You simply adapt your script structure and voice choices to the format.

How does ai-podcast-creation compare to browser-based AI podcast tools?

Browser-based tools usually provide a GUI, whereas ai-podcast-creation is CLI-first and scriptable. Choose ai-podcast-creation if you:

Prefer automation and reproducible command-line workflows.
Want to integrate voice generation into existing pipelines, cron jobs, or CI.

Choose a browser tool if you:

Need a point-and-click interface.
Do not plan to work with terminals or scripts.

Where can I find the list of available voices?

The voice list for Kokoro TTS appears under Available Voices → Kokoro TTS in SKILL.md. Open that file in the inferen-sh/skills repository to see each voice ID, its description, and recommendations (e.g., host, narrator, news).

How do I troubleshoot if my command fails?

If infsh app run fails:

Confirm the inference.sh CLI is installed correctly using the official install guide.
Run infsh login again to ensure your session is valid.
Double-check your JSON in --input is valid (proper quotes and escaping).
Verify the app name (infsh/kokoro-tts) and voice IDs match those documented in SKILL.md.
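
The first check in that list can be automated with a short preflight function. This is a minimal sketch: the function names are invented for this example, and it verifies only that the CLI is on your PATH, not that your login session is valid.

```shell
# Return 0 if a command is available on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Check for the CLI (default: infsh) and print a hint about the next step.
preflight() {
  if ! have_cmd "${1:-infsh}"; then
    echo "CLI not found on PATH: ${1:-infsh}. Re-check the official install guide." >&2
    return 1
  fi
  echo "ok: ${1:-infsh} is installed; if runs still fail, try 'infsh login' again"
}
```

To check the JSON itself, one option (assuming python3 is installed) is to pipe the payload through python3 -m json.tool, which exits with an error on malformed input.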

If issues persist, refer to the main inference.sh documentation or repository issues for environment-specific help.
