elevenlabs-dialogue
by inferen-sh

Generate polished multi-speaker dialogue audio with ElevenLabs via the inference.sh CLI. Turn structured scripts into natural-sounding conversations with multiple voices in a single file for podcasts, audiobooks, explainers, tutorials, character dialogue, and video scripts.
Overview
What is elevenlabs-dialogue?
The elevenlabs-dialogue skill is a focused audio-generation tool that turns a structured script into natural multi-speaker dialogue using ElevenLabs voices. It runs through the inference.sh (infsh) CLI, so you can generate polished conversation audio directly from the command line or from agents that can call Bash.
Instead of stitching single lines or voices together manually, you define dialogue segments in a simple JSON structure (text + voice per line). The skill sends this to the elevenlabs/text-to-dialogue app via infsh and returns a single, mixed dialogue audio file.
Who is elevenlabs-dialogue for?
This skill is designed for people who need repeatable, script-driven dialogue audio, especially when you want different characters or speakers in the same track:
- Podcast and interview creators who want quick draft dialogue or synthetic Q&A voices.
- Video and course creators producing explainers, walkthroughs, or tutorials with two or more speakers.
- Audiobook, fiction, and game writers who need character dialogue with distinct voices.
- Product and marketing teams creating conversational demos or product tours.
- Developers and automation-focused users integrating ElevenLabs dialogue into CI, agents, or batch workflows via CLI.
If your workflow is already command-line oriented or uses agent skills that can run Bash (infsh *), elevenlabs-dialogue gives you a clean way to script entire conversations.
What problems does elevenlabs-dialogue solve?
This skill helps you:
- Generate multi-voice dialogue in one pass – specify multiple speakers and get a single, ready-to-use audio file.
- Stay script-driven – define all dialogue in structured JSON, ideal for version control and automation.
- Control voice casting – pick from 22+ ElevenLabs voices and pair them for different scenarios.
- Speed up iteration – change lines, voices, or ordering and regenerate the full conversation quickly.
It is especially useful when you need consistent, repeatable dialogue assets rather than ad-hoc one-off lines.
When is elevenlabs-dialogue a good fit?
Use elevenlabs-dialogue when:
- You are comfortable using a CLI or running commands via an agent.
- You want multi-speaker audio rather than a single narrator.
- Your dialogue is scripted (podcasts, explainers, training content, story scenes).
- You want to leverage ElevenLabs premium voices via inference.sh.
It may not be the best fit when:
- You only need a single voice reading long-form text (a simpler text-to-speech tool may be enough).
- You are not able or willing to install and authenticate the inference.sh CLI.
- You need heavy-duty post-production editing (you will still likely bring the generated audio into a DAW for final polish).
How to Use
Prerequisites
Before using the elevenlabs-dialogue skill, make sure you have:
- A working inference.sh CLI (infsh) installation.
- Access to the ElevenLabs-backed app elevenlabs/text-to-dialogue through inference.sh.
- An environment (local or agent) that can run Bash with infsh.
The upstream SKILL definition specifies:
allowed-tools: Bash(infsh *) – meaning usage is designed around infsh commands in Bash.
1. Install the elevenlabs-dialogue skill
To add this skill from the inferen-sh/skills repository, use the standard skills installer:
npx skills add https://github.com/inferen-sh/skills --skill elevenlabs-dialogue
This pulls the elevenlabs-dialogue configuration and metadata into your skills environment so agents or workflows that understand this registry can call it.
After installation, open the SKILL.md file in the skill directory if you want to see the upstream quick start and additional voice information.
2. Set up inference.sh (infsh)
The skill relies on the infsh CLI to call the underlying ElevenLabs dialogue app.
- Install the inference.sh CLI following the official instructions: see the cli-install.md referenced in the SKILL file (URL: https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md).
- Log in from your terminal so infsh can access your account and apps:
infsh login
Ensure this succeeds before you try running the dialogue app.
3. Run a basic dialogue generation
Once infsh is configured, you can generate multi-speaker dialogue with a single command. The upstream quick start example looks like this:
infsh app run elevenlabs/text-to-dialogue --input '{
"segments": [
{"text": "Have you tried the new feature?", "voice": "george"},
{"text": "Not yet, but I heard it is amazing.", "voice": "aria"},
{"text": "You should check it out today.", "voice": "george"}
]
}'
Key points:
- elevenlabs/text-to-dialogue is the app that powers elevenlabs-dialogue.
- segments is an array of dialogue turns.
- Each segment specifies:
  - text: what the speaker says.
  - voice: which ElevenLabs voice to use.
The output is a synthesized audio file with all segments arranged as a single conversation.
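Because the payload is plain JSON passed as a shell string, it is easy to break with a stray quote or comma. As a minimal sketch, you can validate the payload locally before sending it; this assumes only that python3 is available on your machine (the payload here mirrors the quick start above and is otherwise illustrative):

```shell
# Build the segments payload as a shell variable, mirroring the quick start.
payload='{
  "segments": [
    {"text": "Have you tried the new feature?", "voice": "george"},
    {"text": "Not yet, but I heard it is amazing.", "voice": "aria"}
  ]
}'

# python3 -m json.tool exits non-zero on malformed JSON, so this catches
# quoting mistakes before you spend an API call on them.
printf '%s' "$payload" | python3 -m json.tool >/dev/null && echo "payload OK"
```

If validation passes, the same $payload string can be handed directly to infsh app run --input "$payload".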
4. Structure your own dialogue scripts
To use elevenlabs-dialogue effectively in real projects:
- Draft your conversation in a text editor.
- Convert it into the JSON segments structure.
- Map each character or speaker to a chosen voice name.
- Run via infsh app run as shown above.
Example for a short product demo dialogue:
infsh app run elevenlabs/text-to-dialogue --input '{
"segments": [
{"text": "Welcome to the analytics dashboard.", "voice": "aria"},
{"text": "Here you can track your key performance metrics.", "voice": "brian"},
{"text": "Let me show you how to create a new report.", "voice": "aria"}
]
}'
This pattern works well in scripts, CI, or any agent that can construct JSON and call Bash.
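For version control and CI, it can help to keep the payload in a file rather than an inline string. The sketch below assumes the file name payload.json (our convention, not the skill's) and that --input accepts a JSON string exactly as in the quick start; the infsh call is guarded so the script still runs on machines without the CLI:

```shell
# Store the dialogue payload in a version-controlled file.
cat > payload.json <<'EOF'
{
  "segments": [
    {"text": "Welcome to the analytics dashboard.", "voice": "aria"},
    {"text": "Here you can track your key performance metrics.", "voice": "brian"}
  ]
}
EOF

# Pass the file contents as the inline JSON string the app expects.
# Guarded so the script is a no-op where infsh is not installed.
if command -v infsh >/dev/null 2>&1; then
  infsh app run elevenlabs/text-to-dialogue --input "$(cat payload.json)"
fi
```

Editing payload.json and re-running the command is then a one-step regeneration, and diffs in the payload show exactly which lines of dialogue changed.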
5. Choose and combine voices
The SKILL documentation notes 22+ premium voices available for each speaker and provides popular pairings such as:
- Interview: george + aria for professional Q&A.
- Casual chat: brian + sarah for a relaxed tone.
To make the most of elevenlabs-dialogue:
- Assign a consistent voice per character so listeners can easily follow who is speaking.
- Use different pairings for different content types (e.g., more formal voices for B2B explainers, warmer voices for storytelling).
- Keep a small mapping file in your project (e.g. voices.json) that defines which character uses which voice name.
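One way to keep casting consistent is to write your script as plain "character: line" text and derive the segments payload from a single mapping. The sketch below is POSIX shell only; the file names (script.txt) and the voice_for mapping are illustrative assumptions, with the mapping inlined as a shell function rather than a separate voices.json for simplicity:

```shell
# A plain-text script: one "character: line" per row.
cat > script.txt <<'EOF'
host: Welcome back to the show.
guest: Thanks for having me.
EOF

# Central casting: map each character name to an ElevenLabs voice name.
voice_for() {
  case "$1" in
    host)  echo george ;;
    guest) echo aria ;;
    *)     echo brian ;;   # fallback voice for unmapped characters
  esac
}

# Emit the segments JSON expected by elevenlabs/text-to-dialogue.
printf '{"segments": ['
sep=''
while IFS=': ' read -r who line; do
  printf '%s{"text": "%s", "voice": "%s"}' "$sep" "$line" "$(voice_for "$who")"
  sep=', '
done < script.txt
printf ']}\n'
```

Changing a character's voice everywhere then means editing one line of the mapping, not every segment.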
6. Integrate into your workflow
Because elevenlabs-dialogue is CLI-driven, it fits naturally into automated audio workflows:
- For audio and video production – generate dialogue tracks, then import them into your DAW or video editor for music, sound design, and timing.
- For docs and tutorials – script product walkthroughs and generate conversational narrations.
- For agents – let an agent build the segments JSON from context or user prompts, then call infsh app run to produce dialogue on demand.
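A batch step built on this pattern might keep one payload file per episode and loop over them. The directory layout (episodes/*.json) is our convention, not part of the skill, and the infsh call is guarded so the loop degrades gracefully where the CLI is absent:

```shell
# One payload file per episode, kept under version control.
mkdir -p episodes
cat > episodes/ep1.json <<'EOF'
{"segments": [{"text": "Hello and welcome.", "voice": "george"}]}
EOF

# Generate dialogue for every episode payload in turn.
for f in episodes/*.json; do
  echo "Generating dialogue for $f"
  if command -v infsh >/dev/null 2>&1; then
    infsh app run elevenlabs/text-to-dialogue --input "$(cat "$f")"
  fi
done
```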
The skill itself does not handle editing, layering, or distribution; it focuses on generation. Downstream tools should handle mixing, trimming, and export.
FAQ
What does the elevenlabs-dialogue skill actually do?
The elevenlabs-dialogue skill orchestrates multi-speaker dialogue generation with ElevenLabs voices via the inference.sh CLI. You provide a list of dialogue segments (text + voice), and it returns a single, mixed audio file where each line is spoken by the specified voice in sequence.
How is elevenlabs-dialogue different from regular text-to-speech?
Typical text-to-speech tools generate audio for a single speaker or a single block of text at a time. elevenlabs-dialogue is designed for conversations: multiple lines, multiple voices, one final audio track. This makes it better suited for interviews, character dialogue, scripted chats, and two-speaker explainers.
Do I need inference.sh installed to use elevenlabs-dialogue?
Yes. The skill relies on the inference.sh (infsh) CLI. You must:
- Install the CLI using the official cli-install.md instructions.
- Run infsh login to authenticate.
Without infsh, the elevenlabs-dialogue commands and agents that depend on it will not work.
Can I choose any ElevenLabs voice?
The SKILL documentation notes 22+ premium voices available for use. You reference voices by name in each segment, for example "voice": "george" or "voice": "aria". Exact voice availability and naming is managed by the ElevenLabs integration behind elevenlabs/text-to-dialogue.
What kind of projects is elevenlabs-dialogue best for?
Ideal use cases include:
- Synthetic podcast segments or interview mockups.
- Video explainers with two or more presenters.
- Audiobook scenes with multiple characters.
- Tutorials and product tours where different speakers guide the user.
- Character dialogue for prototypes, demos, or game design.
If you only need a single narrator, a simpler text-to-speech tool may be enough; elevenlabs-dialogue shines when you want multiple voices interacting.
Can I edit the audio after generation?
Yes. elevenlabs-dialogue focuses on generating the dialogue track. You can import the resulting audio file into any audio editor or video editor to:
- Adjust timing and pacing.
- Add music, sound effects, or ambience.
- Apply EQ, compression, and mastering.
The skill does not include an editor itself; it is intended to integrate into an existing audio/video production workflow.
How do I start quickly with elevenlabs-dialogue?
- Install the skill: npx skills add https://github.com/inferen-sh/skills --skill elevenlabs-dialogue
- Install and log in to the inference.sh CLI.
- Copy the quick start example from above and run it with infsh app run.
- Replace the sample segments with your own script and voice choices.
From there you can iterate on your dialogue structure and embed the command into scripts, agents, or build pipelines.
Where can I see more details for elevenlabs-dialogue?
For the most precise, up-to-date usage notes, open the upstream SKILL.md file in the inferen-sh/skills repository under tools/audio/elevenlabs-dialogue. That file includes the official description, quick start snippet, and voice pairing guidance used as the basis for this overview.
