elevenlabs-tts

by inferen-sh

ElevenLabs text-to-speech via inference.sh CLI, with 22+ premium voices, multilingual support, and fast model options for production voice generation workflows.

Added: Mar 27, 2026
Category: Voice Generation
Install Command
npx skills add https://github.com/inferen-sh/skills --skill elevenlabs-tts
Overview

What is elevenlabs-tts?

The elevenlabs-tts skill connects the ElevenLabs text-to-speech API to the inference.sh (infsh) CLI, giving you a fast, scriptable way to turn text into high-quality speech. It exposes ElevenLabs models and voice options as a reusable tool inside the inferen-sh skills ecosystem.

This skill focuses on premium, natural-sounding voices with support for 32 languages and multiple performance tiers so you can choose between maximum quality or ultra-low latency.

Key capabilities

  • Text-to-speech generation from plain text
  • 22+ premium voices accessible via the CLI
  • Model selection for different speed/quality trade-offs:
    • eleven_multilingual_v2 – highest quality, multilingual
    • eleven_turbo_v2_5 – balanced speed and quality
    • eleven_flash_v2_5 – ultra-fast, low latency
  • Voice selection from the ElevenLabs voice library
  • Designed for CLI and automation workflows using infsh

Who is elevenlabs-tts for?

This skill is aimed at users who:

  • Already use, or are comfortable with, a command line interface
  • Want to automate or batch-produce voiceovers and narration
  • Need consistent, reusable voices across projects
  • Work within the inference.sh / inferen-sh skills ecosystem

Typical users include:

  • Video editors and creators who need voiceovers for YouTube, product demos, and explainer videos
  • Podcasters and audio producers generating intros, outros, and segments
  • E-learning and training teams producing course narration
  • Developers building IVR, assistants, or accessibility features that require natural speech

When is elevenlabs-tts a good fit?

Use elevenlabs-tts when you:

  • Need reliable, production-ready voices rather than experimental models
  • Want to run everything from the CLI rather than a web UI
  • Need to script or schedule TTS generation as part of CI, pipelines, or batch jobs
  • Are already using, or willing to install, the inference.sh CLI (infsh)

It’s not an ideal fit if you:

  • Only want a point-and-click web interface for manual use
  • Need fine-grained audio editing (cutting, mixing, effects) inside the skill itself — you’ll generate audio here, then edit in a DAW (e.g., Audacity, Reaper, Premiere)
  • Cannot use external CLIs or outbound network access in your environment

How to Use

Prerequisites

Before using elevenlabs-tts, make sure you have:

  • inference.sh CLI (infsh) installed
  • A working infsh login configured
  • Access to the ElevenLabs TTS app through inference.sh

You can find CLI install instructions in the repository’s cli-install.md referenced from SKILL.md.

Step 1 – Install the elevenlabs-tts skill

From a compatible Agent Skills / inferen-sh environment, add the skill:

npx skills add https://github.com/inferen-sh/skills --skill elevenlabs-tts

This pulls the elevenlabs-tts skill from the inferen-sh/skills repository and registers it so your agents or workflows can call it.

Step 2 – Log in with the inference.sh CLI

The skill relies on the infsh CLI to talk to the ElevenLabs backend.

infsh login

Follow the prompts to authenticate. Once you’re logged in, the CLI can run the ElevenLabs TTS app on your behalf.

Step 3 – Run a basic text-to-speech conversion

The quickest way to see elevenlabs-tts in action is by calling the ElevenLabs TTS app directly via infsh:

infsh app run elevenlabs/tts --input '{"text": "Hello, welcome to our product demo.", "voice": "aria"}'

This example:

  • Sends the text "Hello, welcome to our product demo."
  • Uses the "aria" voice (a sample voice ID from the ElevenLabs voice library)
  • Returns generated speech audio (e.g., as a file or stream depending on your infsh configuration)
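If your script text contains double quotes or backslashes, the single-quoted JSON in the example above will break. Here is a minimal POSIX-shell sketch of escaping the text before building the payload (the `json_escape` helper is ours, not part of infsh; the payload shape matches the example above):

```shell
#!/bin/sh
# Minimal JSON string escaping for quotes and backslashes (does not handle
# newlines or control characters; use jq for anything more complex).
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

TEXT='She said "hello" and left.'
INPUT="{\"text\": \"$(json_escape "$TEXT")\", \"voice\": \"aria\"}"
echo "$INPUT"
# Then pass it along:
# infsh app run elevenlabs/tts --input "$INPUT"
```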

Once the skill is integrated, your agents can call this same capability programmatically.

Step 4 – Choose the right ElevenLabs model

The elevenlabs-tts skill supports multiple models, each tuned for a specific balance of quality and latency:

  • eleven_multilingual_v2

    • Best for: highest quality, long-form content, and 32-language support
    • Typical use: audiobooks, course narration, branded voiceovers
  • eleven_turbo_v2_5

    • Best for: a balanced mix of quality and speed
    • Typical use: product demos, marketing videos, internal training
  • eleven_flash_v2_5

    • Best for: ultra-low latency responses where speed is critical
    • Typical use: chatbots, assistants, IVR systems that must respond quickly

How you specify the model may depend on your infsh app run configuration or agent wiring. Check your local toolchain docs for how to pass model IDs as parameters when using this skill.
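As a sketch of what that might look like, the payload below adds an explicit model choice. Note that the "model" field name is an assumption on our part; confirm it against the elevenlabs/tts app schema before relying on it:

```shell
#!/bin/sh
# Sketch: build the --input payload with an explicit model choice.
# The "model" field name is an assumption; verify it against the
# elevenlabs/tts app schema.
make_input() {
  printf '{"text": "%s", "voice": "%s", "model": "%s"}' "$1" "$2" "$3"
}

# Low-latency call for an IVR-style prompt:
INPUT=$(make_input "Thanks for calling. How can I help?" "aria" "eleven_flash_v2_5")
echo "$INPUT"
# infsh app run elevenlabs/tts --input "$INPUT"
```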

Step 5 – Integrate into your workflows

Once installed and tested, you can:

  • Wire elevenlabs-tts into agent prompts so text responses are automatically converted to speech
  • Use it in CLI scripts to batch-generate voiceovers from a list of text files
  • Add it to CI pipelines to automatically produce updated narration when documentation or scripts change
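A dry-run sketch of the batch pattern above: it walks a directory of script files and prints the infsh command that would be run for each. Remove the leading echo to actually execute; the directory layout and the "aria" voice are illustrative, and output handling depends on your infsh configuration:

```shell
#!/bin/sh
# Dry-run batch voiceover generation: one clip per .txt script file.
mkdir -p scripts
printf 'Hello from scene one.' > scripts/scene1.txt   # demo input

for f in scripts/*.txt; do
  text=$(cat "$f")
  # Print the command rather than running it (dry run):
  echo infsh app run elevenlabs/tts --input "{\"text\": \"$text\", \"voice\": \"aria\"}"
done
```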

For deeper context on how the skill is defined and any helper logic, open the following repo file:

  • tools/audio/elevenlabs-tts/SKILL.md

That file documents the skill metadata, description, and any specific notes about allowed tools (it currently allows Bash via infsh).


FAQ

What does the elevenlabs-tts skill actually do?

The elevenlabs-tts skill provides a preconfigured way for agents and CLI workflows to call ElevenLabs text-to-speech through the inference.sh CLI. It focuses on generating natural-sounding speech audio from plain text, with access to multiple models and voices.

Do I need the inference.sh CLI to use elevenlabs-tts?

Yes. The repository’s SKILL.md explicitly references infsh and the inference.sh CLI as a requirement. You must install the CLI, run infsh login, and ensure it can access the elevenlabs/tts app.

What kinds of projects is elevenlabs-tts best for?

This skill is well-suited for:

  • Voiceovers for product demos, tutorials, and marketing videos
  • Audiobooks and long-form narration, especially using eleven_multilingual_v2
  • E-learning and training narration
  • Podcasts and trailers (intros, outros, scripted segments)
  • Accessibility and IVR systems that need clear, natural voices

Can I use elevenlabs-tts for real-time applications?

For more responsive use cases, choose eleven_turbo_v2_5 or eleven_flash_v2_5, which are designed for lower latency than the highest-quality multilingual model. Actual “real-time” behavior will depend on your network and integration, but these models are intended to support faster turnarounds.

How many voices does elevenlabs-tts support?

The skill description in SKILL.md notes 22+ premium voices. You can select among these using the voice field (for example, "aria") when calling infsh app run elevenlabs/tts or when wiring the skill into your agents.

Does elevenlabs-tts support multiple languages?

Yes. The eleven_multilingual_v2 model is described as supporting 32 languages, making elevenlabs-tts suitable for multilingual narration and global products. Other models may be more optimized for latency but still offer broad language support through ElevenLabs.
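For multilingual narration, the call looks the same with non-English text and the multilingual model selected. As before, the "model" field name is an assumption to verify against the app schema:

```shell
#!/bin/sh
# Sketch: French narration using the multilingual model.
# The "model" field name is an assumption; confirm against the app schema.
INPUT='{"text": "Bienvenue dans notre démonstration produit.", "voice": "aria", "model": "eleven_multilingual_v2"}'
echo "$INPUT"
# infsh app run elevenlabs/tts --input "$INPUT"
```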

Where can I see how the skill is configured?

Look in the inferen-sh/skills repository under:

  • tools/audio/elevenlabs-tts/SKILL.md

This file contains the official description, allowed tools, and pointers to installation information for the inference.sh CLI.

Can I edit audio inside elevenlabs-tts?

No. The elevenlabs-tts skill focuses on audio generation, not editing. You’ll typically:

  1. Use elevenlabs-tts to generate clean speech audio from text.
  2. Import that audio into a DAW or video editor (e.g., Audacity, Reaper, Premiere, Resolve) for cutting, mixing, and adding effects.

What if I only want a web UI, not a CLI?

If you prefer a purely web-based workflow, elevenlabs-tts may not be the best fit, because it is built around the inference.sh CLI and agent skills ecosystem. In that case, consider using ElevenLabs’ own web dashboard or other UI-focused tools.
