ai-podcast-creation

by inferen-sh

Create AI-powered podcasts and voice content from text using Kokoro TTS, DIA TTS, and the inference.sh CLI. Mix multiple voices, add music, and assemble full episodes for podcasts, audiobooks, and audio newsletters.

Category: Voice Generation
Install Command
npx skills add https://github.com/inferen-sh/skills --skill ai-podcast-creation
Overview

What is ai-podcast-creation?

The ai-podcast-creation skill is a workflow for generating AI-driven podcasts and voice content using the inference.sh CLI. It focuses on turning text prompts into natural-sounding speech with Kokoro TTS and DIA TTS, then using additional tools for music and media merging to assemble complete podcast-style segments.

This skill is tailored for creators who want an automated, script-to-audio pipeline rather than manually recording and editing voice tracks.

Key capabilities

With ai-podcast-creation, you can:

  • Generate high-quality text-to-speech using Kokoro TTS via infsh app run infsh/kokoro-tts.
  • Use different predefined voice IDs (e.g., af_sarah, af_nicole, am_michael) to fit hosts, guests, or narrators.
  • Produce podcast segments and narrations directly from written scripts.
  • Build multi-voice conversations and character voices by calling the TTS app multiple times with different voice IDs.
  • Integrate with other inference.sh apps such as DIA TTS, Chatterbox, AI music generation, and media merger for background music and multi-track assembly (as described in the skill).

Who is this skill for?

ai-podcast-creation is a good fit if you are:

  • A podcast creator or production team wanting to prototype or automate episodes.
  • A content marketer turning articles or newsletters into audio.
  • An indie developer or automation engineer building CLI-based media workflows.
  • A researcher or educator generating lecture-style audio or explainer content.

It is less suitable if you need:

  • Real-time, interactive voice chat in a browser (this skill is CLI-focused).
  • Manual DAW-style editing inside the skill itself (you would export audio then edit in a separate tool).

When ai-podcast-creation is a good fit

Use this skill when:

  • You already write scripts, show notes, or long-form text and want to convert them into spoken audio.
  • You prefer terminal-based automation and reproducible pipelines over GUI tools.
  • You want to experiment with voices quickly before committing to a more complex production setup.

Consider other options if you:

  • Need deeply customized audio post-processing that only a DAW provides.
  • Cannot install or use the inference.sh CLI (infsh), which is required for this skill.

How to Use

Prerequisites

To run ai-podcast-creation, you need:

  • Access to a terminal on macOS, Linux, or WSL (or a compatible environment).
  • The inference.sh CLI (infsh) installed.
  • A valid inference.sh account and credentials to run infsh login.

The skill’s own SKILL.md explicitly notes:

Requires inference.sh CLI (infsh).

Follow the official inference.sh installation steps linked from SKILL.md before using this skill.

1. Install the ai-podcast-creation skill

Use the Agent Skills CLI to add the skill from the inferen-sh/skills repository:

npx skills add https://github.com/inferen-sh/skills --skill ai-podcast-creation

This pulls in the ai-podcast-creation guide and metadata so your agent or toolchain can reference it.

2. Set up inference.sh CLI

Once the CLI is installed, authenticate:

infsh login

Follow the prompts to complete the login with your inference.sh account.

After logging in, you can call apps like infsh/kokoro-tts directly from your terminal or scripted workflows.

3. Generate your first podcast segment

The quickest way to test ai-podcast-creation is to run the Kokoro TTS example from SKILL.md:

infsh app run infsh/kokoro-tts --input '{
  "prompt": "Welcome to the AI Frontiers podcast. Today we explore the latest developments in generative AI.",
  "voice": "am_michael"
}'

This command:

  • Sends the prompt text to the infsh/kokoro-tts app.
  • Uses the am_michael voice (an American male, authoritative style recommended for documentary or tech content).
  • Returns generated speech audio, which you can save or pipe into further processing, depending on your CLI configuration.
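Hand-written JSON inside single quotes breaks easily once the prompt contains quotes or apostrophes. The sketch below builds the `--input` payload from shell arguments instead. The helper names `json_escape` and `tts_payload` are hypothetical and not part of the skill; only the `prompt` and `voice` fields and the `infsh app run infsh/kokoro-tts --input` form come from the example above.

```shell
# json_escape TEXT: escape backslashes and double quotes so TEXT can sit
# inside a JSON string. Sketch only: newlines and control characters
# are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# tts_payload VOICE TEXT: print the JSON object passed to --input.
tts_payload() {
  printf '{"prompt": "%s", "voice": "%s"}\n' \
    "$(json_escape "$2")" "$(json_escape "$1")"
}

payload=$(tts_payload am_michael 'Welcome to the "AI Frontiers" podcast. Today'\''s topic: generative AI.')
printf '%s\n' "$payload"

# Call the real CLI only when it is installed; it must also be logged in.
if command -v infsh >/dev/null 2>&1; then
  infsh app run infsh/kokoro-tts --input "$payload"
fi
```

Printing the payload first makes it easy to inspect exactly what is sent before any request is made.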

4. Choose the right voice

The skill’s documentation provides a voice table under Available Voices → Kokoro TTS. Example voices include:

  • af_sarah – American female, warm; suitable for hosts and narrators.
  • af_nicole – American female, professional; suitable for news or business shows.
  • am_michael – American male, authoritative; suitable for tech or documentary podcasts.

You can swap out the voice in your command:

infsh app run infsh/kokoro-tts --input '{
  "prompt": "In today'\''s episode, we break down three key trends in machine learning.",
  "voice": "af_nicole"
}'

By running multiple commands with different voices and prompts, you can create multi-speaker segments and later merge them with music or effects using other apps described by the skill (e.g., media merger).
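One way to plan a multi-speaker segment is to turn a labelled dialogue into an ordered list of clips, one per spoken line. This is a minimal sketch under stated assumptions: the `SPEAKER: text` script format, the speaker labels, and the function names `voice_for` and `render_dialogue` are all hypothetical. Only the voice IDs come from the skill's voice table.

```shell
# voice_for SPEAKER: map a speaker label to a Kokoro voice ID.
# Labels are illustrative; unknown speakers fall back to af_nicole.
voice_for() {
  case "$1" in
    HOST) echo af_sarah ;;
    GUEST) echo am_michael ;;
    *) echo af_nicole ;;
  esac
}

# render_dialogue: read "SPEAKER: text" lines on stdin and print one
# tab-separated row (clip number, voice ID, text) per spoken line.
# Lines without a ": " separator are skipped.
render_dialogue() {
  n=0
  while IFS= read -r line; do
    case "$line" in
      *": "*) ;;
      *) continue ;;
    esac
    n=$((n + 1))
    speaker=${line%%: *}   # text before the first ": "
    text=${line#*: }       # text after the first ": "
    printf '%02d\t%s\t%s\n' "$n" "$(voice_for "$speaker")" "$text"
  done
}

render_dialogue <<'EOF'
HOST: Welcome back to the show.
GUEST: Thanks for having me.
EOF
```

Each row then corresponds to one `infsh app run infsh/kokoro-tts` call, and the clip number preserves the order in which the resulting audio should be merged.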

5. Build a repeatable workflow

Once you are comfortable generating individual lines, wrap your process into scripts. For example, you might:

  • Store your episode script in a file like episode01.txt.
  • Split it into segments for host intro, guest answers, and outro.
  • Call infsh app run infsh/kokoro-tts for each segment with different voices.
  • Use additional inference.sh apps (AI music generation, media merger) to add intro music, background beds, or crossfades as suggested in the skill description.
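The splitting step above can be sketched with standard tools. This example assumes segments in the script file are separated by blank lines. That convention, the `split_segments` name, and the `segment-NN.txt` file naming are assumptions for illustration, not something the skill prescribes.

```shell
# split_segments FILE OUTDIR: write each blank-line-separated paragraph
# of FILE to OUTDIR/segment-NN.txt and print how many were written.
split_segments() {
  mkdir -p "$2"
  awk -v dir="$2" '
    BEGIN { RS = ""; n = 0 }   # paragraph mode: blank lines separate records
    {
      n++
      f = sprintf("%s/segment-%02d.txt", dir, n)
      print $0 > f
      close(f)
    }
    END { print n }
  ' "$1"
}

# Example run, only if the script file from the list above exists.
if [ -f episode01.txt ]; then
  count=$(split_segments episode01.txt segments)
  printf 'wrote %s segment files to ./segments\n' "$count"
fi
```

Keeping each segment in its own numbered file lets you regenerate a single clip after editing one paragraph, rather than rerunning the whole episode.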

The quick start in SKILL.md focuses on Kokoro TTS, but the skill description also lists DIA TTS and Chatterbox. Those apps should follow the same infsh app run pattern; check each app's documentation for its exact name and parameters.

6. Explore the skill documentation in the repo

After installation, open the skill files for deeper guidance:

  • SKILL.md – Primary guide for ai-podcast-creation, including the quick start and details on available voices.
  • Other referenced folders in the repository (e.g., guides/content/ai-podcast-creation) – Contain extended content and examples for working with TTS and media workflows.

Use these documents to refine:

  • Voice selection for different show formats.
  • How you chain together TTS, music, and media merging.
  • How to adapt the workflow to your existing automation or CI/CD systems.

FAQ

What does ai-podcast-creation actually do?

ai-podcast-creation is a documented workflow that shows you how to use the inference.sh CLI, Kokoro TTS, DIA TTS, Chatterbox, and related apps to generate podcast-style audio from text. It gives you voice options, command examples, and guidance for assembling full episodes with music and editing tools.

Do I need the inference.sh CLI to use this skill?

Yes. The skill explicitly requires the inference.sh CLI (infsh). You must install it and run infsh login before you can execute commands like:

infsh app run infsh/kokoro-tts --input '{"prompt": "...", "voice": "am_michael"}'

Without infsh, the ai-podcast-creation workflow cannot run.

Can I create multi-voice conversations with this skill?

Yes. While the code excerpt shows a single-voice example, the skill’s description emphasizes multi-voice conversations. You implement this by:

  • Calling the TTS app multiple times with different voice IDs for each speaker.
  • Generating separate audio clips for each line or segment.
  • Combining those clips (and optionally music) with a media merging tool, as indicated in the skill description.

Is this a full podcast editor or DAW replacement?

No. ai-podcast-creation focuses on generation and assembly using CLI apps. It is excellent for:

  • Script-to-audio conversion.
  • Multi-voice dialogue and AI-generated music.
  • Automated or batch workflows.

For detailed waveform editing, mixing, or mastering, you would still use a dedicated DAW (e.g., Audacity or Reaper) after generating your audio files.

Can I use ai-podcast-creation for audiobooks and voiceovers?

Yes. The skill description explicitly lists audiobooks, voice content, and audio newsletters as use cases. The same TTS commands you use for podcasts can narrate long-form text, training materials, or promotional scripts. You simply adapt your script structure and voice choices to the format.

How does ai-podcast-creation compare to browser-based AI podcast tools?

Browser-based tools usually provide a GUI, whereas ai-podcast-creation is CLI-first and scriptable. Choose ai-podcast-creation if you:

  • Prefer automation and reproducible command-line workflows.
  • Want to integrate voice generation into existing pipelines, cron jobs, or CI.

Choose a browser tool if you:

  • Need a point-and-click interface.
  • Do not plan to work with terminals or scripts.

Where can I find the list of available voices?

The voice list for Kokoro TTS appears under Available Voices → Kokoro TTS in SKILL.md. Open that file in the inferen-sh/skills repository to see each voice ID, its description, and recommendations (e.g., host, narrator, news).

How do I troubleshoot if my command fails?

If infsh app run fails:

  • Confirm the inference.sh CLI is installed correctly using the official install guide.
  • Run infsh login again to ensure your session is valid.
  • Double-check your JSON in --input is valid (proper quotes and escaping).
  • Verify the app name (infsh/kokoro-tts) and voice IDs match those documented in SKILL.md.
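Two of these checks can be automated before a request is sent: whether infsh is on the PATH and whether the payload parses as JSON. The sketch below assumes python3 is available for the JSON check (jq would serve equally well). The function names `check_json` and `preflight` are hypothetical. It does not verify your login session, since that needs the CLI itself.

```shell
# check_json STRING: succeed only if STRING parses as JSON.
# Assumes python3 is installed.
check_json() {
  printf '%s' "$1" | python3 -c 'import json, sys; json.load(sys.stdin)' 2>/dev/null
}

# preflight PAYLOAD: report problems that can be detected locally and
# return non-zero if any were found.
preflight() {
  rc=0
  if ! command -v infsh >/dev/null 2>&1; then
    echo "infsh not found on PATH: install the inference.sh CLI first"
    rc=1
  fi
  if ! check_json "$1"; then
    echo "payload is not valid JSON: check quotes and escaping"
    rc=1
  fi
  return "$rc"
}

preflight '{"prompt": "Hello and welcome.", "voice": "am_michael"}' ||
  echo "fix the issues above before running infsh app run"
```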

If issues persist, refer to the main inference.sh documentation or repository issues for environment-specific help.
