elevenlabs-dialogue
by inferen-sh

Generate polished multi-speaker dialogue audio with ElevenLabs via the inference.sh CLI. Turn structured scripts into natural-sounding conversations with multiple voices in a single file for podcasts, audiobooks, explainers, tutorials, character dialogue, and video scripts.
Overview
What is elevenlabs-dialogue?
The elevenlabs-dialogue skill is a focused audio-generation tool that turns a structured script into natural multi-speaker dialogue using ElevenLabs voices. It runs through the inference.sh (infsh) CLI, so you can generate polished conversation audio directly from the command line or from agents that can call Bash.
Instead of stitching single lines or voices together manually, you define dialogue segments in a simple JSON structure (text + voice per line). The skill sends this to the elevenlabs/text-to-dialogue app via infsh and returns a single, mixed dialogue audio file.
Who is elevenlabs-dialogue for?
This skill is designed for people who need repeatable, script-driven dialogue audio, especially when you want different characters or speakers in the same track:
- Podcast and interview creators who want quick draft dialogue or synthetic Q&A voices.
- Video and course creators producing explainers, walkthroughs, or tutorials with two or more speakers.
- Audiobook, fiction, and game writers who need character dialogue with distinct voices.
- Product and marketing teams creating conversational demos or product tours.
- Developers and automation-focused users integrating ElevenLabs dialogue into CI, agents, or batch workflows via CLI.
If your workflow is already command-line oriented or uses agent skills that can run Bash (infsh *), elevenlabs-dialogue gives you a clean way to script entire conversations.
What problems does elevenlabs-dialogue solve?
This skill helps you:
- Generate multi-voice dialogue in one pass – specify multiple speakers and get a single, ready-to-use audio file.
- Stay script-driven – define all dialogue in structured JSON, ideal for version control and automation.
- Control voice casting – pick from 22+ ElevenLabs voices and pair them for different scenarios.
- Speed up iteration – change lines, voices, or ordering and regenerate the full conversation quickly.
It is especially useful when you need consistent, repeatable dialogue assets rather than ad-hoc one-off lines.
When is elevenlabs-dialogue a good fit?
Use elevenlabs-dialogue when:
- You are comfortable using a CLI or running commands via an agent.
- You want multi-speaker audio rather than a single narrator.
- Your dialogue is scripted (podcasts, explainers, training content, story scenes).
- You want to leverage ElevenLabs premium voices via inference.sh.
It may not be the best fit when:
- You only need a single voice reading long-form text (a simpler text-to-speech tool may be enough).
- You are not able or willing to install and authenticate the inference.sh CLI.
- You need heavy-duty post-production editing (you will still likely bring the generated audio into a DAW for final polish).
How to Use
Prerequisites
Before using the elevenlabs-dialogue skill, make sure you have:
- A working inference.sh CLI (infsh) installation.
- Access to the ElevenLabs-backed app elevenlabs/text-to-dialogue through inference.sh.
- An environment (local or agent) that can run Bash with infsh.
The upstream SKILL definition specifies:
allowed-tools: Bash(infsh *) – meaning usage is designed around infsh commands in Bash.
1. Install the elevenlabs-dialogue skill
To add this skill from the inferen-sh/skills repository, use the standard skills installer:
npx skills add https://github.com/inferen-sh/skills --skill elevenlabs-dialogue
This pulls the elevenlabs-dialogue configuration and metadata into your skills environment so agents or workflows that understand this registry can call it.
After installation, open the SKILL.md file in the skill directory if you want to see the upstream quick start and additional voice information.
2. Set up inference.sh (infsh)
The skill relies on the infsh CLI to call the underlying ElevenLabs dialogue app.
- Install the inference.sh CLI following the official instructions: see the cli-install.md referenced in the SKILL file (URL: https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md).
- Log in from your terminal so infsh can access your account and apps:
infsh login
Ensure this succeeds before you try running the dialogue app.
3. Run a basic dialogue generation
Once infsh is configured, you can generate multi-speaker dialogue with a single command. The upstream quick start example looks like this:
infsh app run elevenlabs/text-to-dialogue --input '{
"segments": [
{"text": "Have you tried the new feature?", "voice": "george"},
{"text": "Not yet, but I heard it is amazing.", "voice": "aria"},
{"text": "You should check it out today.", "voice": "george"}
]
}'
Key points:
- elevenlabs/text-to-dialogue is the app that powers elevenlabs-dialogue.
- segments is an array of dialogue turns.
- Each segment specifies:
  - text: what the speaker says.
  - voice: which ElevenLabs voice to use.
The output is a synthesized audio file with all segments arranged as a single conversation.
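Because the payload is plain JSON passed as a shell string, it is easy to break with a stray quote or comma. As a minimal sketch, you can validate the payload locally before sending it; this assumes only that python3 is available on your machine (the payload here mirrors the quick start above and is otherwise illustrative):

```shell
# Build the segments payload as a shell variable, mirroring the quick start.
payload='{
  "segments": [
    {"text": "Have you tried the new feature?", "voice": "george"},
    {"text": "Not yet, but I heard it is amazing.", "voice": "aria"}
  ]
}'

# python3 -m json.tool exits non-zero on malformed JSON, so this catches
# quoting mistakes before you spend an API call on them.
printf '%s' "$payload" | python3 -m json.tool >/dev/null && echo "payload OK"
```

If validation passes, the same $payload string can be handed directly to infsh app run --input "$payload".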
4. Structure your own dialogue scripts
To use elevenlabs-dialogue effectively in real projects:
- Draft your conversation in a text editor.
- Convert it into the JSON segments structure.
- Map each character or speaker to a chosen voice name.
- Run via infsh app run as shown above.
Example for a short product demo dialogue:
infsh app run elevenlabs/text-to-dialogue --input '{
"segments": [
{"text": "Welcome to the analytics dashboard.", "voice": "aria"},
{"text": "Here you can track your key performance metrics.", "voice": "brian"},
{"text": "Let me show you how to create a new report.", "voice": "aria"}
]
}'
This pattern works well in scripts, CI, or any agent that can construct JSON and call Bash.
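For version control and CI, it can help to keep the payload in a file rather than an inline string. The sketch below assumes the file name payload.json (our convention, not the skill's) and that --input accepts a JSON string exactly as in the quick start; the infsh call is guarded so the script still runs on machines without the CLI:

```shell
# Store the dialogue payload in a version-controlled file.
cat > payload.json <<'EOF'
{
  "segments": [
    {"text": "Welcome to the analytics dashboard.", "voice": "aria"},
    {"text": "Here you can track your key performance metrics.", "voice": "brian"}
  ]
}
EOF

# Pass the file contents as the inline JSON string the app expects.
# Guarded so the script is a no-op where infsh is not installed.
if command -v infsh >/dev/null 2>&1; then
  infsh app run elevenlabs/text-to-dialogue --input "$(cat payload.json)"
fi
```

Editing payload.json and re-running the command is then a one-step regeneration, and diffs in the payload show exactly which lines of dialogue changed.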
5. Choose and combine voices
The SKILL documentation notes 22+ premium voices available for each speaker and provides popular pairings such as:
- Interview: george + aria for professional Q&A.
- Casual chat: brian + sarah for a relaxed tone.
To make the most of elevenlabs-dialogue:
- Assign a consistent voice per character so listeners can easily follow who is speaking.
- Use different pairings for different content types (e.g., more formal voices for B2B explainers, warmer voices for storytelling).
- Keep a small mapping file in your project (e.g. voices.json) that defines which character uses which voice name.
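One way to keep casting consistent is to write your script as plain "character: line" text and derive the segments payload from a single mapping. The sketch below is POSIX shell only; the file names (script.txt) and the voice_for mapping are illustrative assumptions, with the mapping inlined as a shell function rather than a separate voices.json for simplicity:

```shell
# A plain-text script: one "character: line" per row.
cat > script.txt <<'EOF'
host: Welcome back to the show.
guest: Thanks for having me.
EOF

# Central casting: map each character name to an ElevenLabs voice name.
voice_for() {
  case "$1" in
    host)  echo george ;;
    guest) echo aria ;;
    *)     echo brian ;;   # fallback voice for unmapped characters
  esac
}

# Emit the segments JSON expected by elevenlabs/text-to-dialogue.
printf '{"segments": ['
sep=''
while IFS=': ' read -r who line; do
  printf '%s{"text": "%s", "voice": "%s"}' "$sep" "$line" "$(voice_for "$who")"
  sep=', '
done < script.txt
printf ']}\n'
```

Changing a character's voice everywhere then means editing one line of the mapping, not every segment.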
6. Integrate into your workflow
Because elevenlabs-dialogue is CLI-driven, it fits naturally into automated audio workflows:
- For audio and video production – generate dialogue tracks, then import them into your DAW or video editor for music, sound design, and timing.
- For docs and tutorials – script product walkthroughs and generate conversational narrations.
- For agents – let an agent build the segments JSON from context or user prompts, then call infsh app run to produce dialogue on demand.
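A batch step built on this pattern might keep one payload file per episode and loop over them. The directory layout (episodes/*.json) is our convention, not part of the skill, and the infsh call is guarded so the loop degrades gracefully where the CLI is absent:

```shell
# One payload file per episode, kept under version control.
mkdir -p episodes
cat > episodes/ep1.json <<'EOF'
{"segments": [{"text": "Hello and welcome.", "voice": "george"}]}
EOF

# Generate dialogue for every episode payload in turn.
for f in episodes/*.json; do
  echo "Generating dialogue for $f"
  if command -v infsh >/dev/null 2>&1; then
    infsh app run elevenlabs/text-to-dialogue --input "$(cat "$f")"
  fi
done
```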
The skill itself does not handle editing, layering, or distribution; it focuses on generation. Downstream tools should handle mixing, trimming, and export.
FAQ
What does the elevenlabs-dialogue skill actually do?
The elevenlabs-dialogue skill orchestrates multi-speaker dialogue generation with ElevenLabs voices via the inference.sh CLI. You provide a list of dialogue segments (text + voice), and it returns a single, mixed audio file where each line is spoken by the specified voice in sequence.
How is elevenlabs-dialogue different from regular text-to-speech?
Typical text-to-speech tools generate audio for a single speaker or a single block of text at a time. elevenlabs-dialogue is designed for conversations: multiple lines, multiple voices, one final audio track. This makes it better suited for interviews, character dialogue, scripted chats, and two-speaker explainers.
Do I need inference.sh installed to use elevenlabs-dialogue?
Yes. The skill relies on the inference.sh (infsh) CLI. You must:
- Install the CLI using the official cli-install.md instructions.
- Run infsh login to authenticate.
Without infsh, the elevenlabs-dialogue commands and agents that depend on it will not work.
Can I choose any ElevenLabs voice?
The SKILL documentation notes 22+ premium voices available for use. You reference voices by name in each segment, for example "voice": "george" or "voice": "aria". Exact voice availability and naming is managed by the ElevenLabs integration behind elevenlabs/text-to-dialogue.
What kind of projects is elevenlabs-dialogue best for?
Ideal use cases include:
- Synthetic podcast segments or interview mockups.
- Video explainers with two or more presenters.
- Audiobook scenes with multiple characters.
- Tutorials and product tours where different speakers guide the user.
- Character dialogue for prototypes, demos, or game design.
If you only need a single narrator, a simpler text-to-speech tool may be enough; elevenlabs-dialogue shines when you want multiple voices interacting.
Can I edit the audio after generation?
Yes. elevenlabs-dialogue focuses on generating the dialogue track. You can import the resulting audio file into any audio editor or video editor to:
- Adjust timing and pacing.
- Add music, sound effects, or ambience.
- Apply EQ, compression, and mastering.
The skill does not include an editor itself; it is intended to integrate into an existing audio/video production workflow.
How do I start quickly with elevenlabs-dialogue?
- Install the skill: npx skills add https://github.com/inferen-sh/skills --skill elevenlabs-dialogue
- Install and log in to the inference.sh CLI.
- Copy the quick start example from above and run it with infsh app run.
- Replace the sample segments with your own script and voice choices.
From there you can iterate on your dialogue structure and embed the command into scripts, agents, or build pipelines.
Where can I see more details for elevenlabs-dialogue?
For the most precise, up-to-date usage notes, open the upstream SKILL.md file in the inferen-sh/skills repository under tools/audio/elevenlabs-dialogue. That file includes the official description, quick start snippet, and voice pairing guidance used as the basis for this overview.
