elevenlabs-voice-changer
by inferen-sh
An ElevenLabs voice-changer skill that uses the inference.sh CLI (infsh) to transform recorded speech into a different synthetic voice while preserving content and emotion. It supports eleven_multilingual_sts_v2 (70+ languages) and eleven_english_sts_v2 for speech-to-speech conversion, accent change, and voice disguise in content creation, dubbing, and character voices.
Overview
What is elevenlabs-voice-changer?
elevenlabs-voice-changer is a skill that connects the ElevenLabs speech-to-speech voice changer to the inference.sh command-line interface (infsh). It lets you send an existing audio recording and receive the same speech back in a different synthetic voice, while preserving what is said and how it is expressed.
Under the hood, the skill calls the ElevenLabs voice-changer app via infsh app run elevenlabs/voice-changer, so you don’t need to wire up APIs manually. You describe the input audio and the target voice, and the service returns transformed audio.
Key capabilities
- Speech-to-speech conversion – turn any spoken audio into a new voice without re-recording.
- Multilingual support (70+ languages) – via eleven_multilingual_sts_v2.
- English-optimized model – via eleven_english_sts_v2 for higher-quality English results.
- Accent and style changes – swap accents, tone, or persona using ElevenLabs' premium voices.
- Voice disguise and privacy – anonymize or mask your real voice for public content.
Who is this skill for?
This skill is a good fit if you:
- Create YouTube, TikTok, or social media content and want to change or upgrade your narration voice.
- Produce podcasts or voiceovers and need quick language, accent, or voice swaps.
- Work in marketing or product explainers and want multiple branded voices without hiring different actors.
- Build AI characters or demos and need consistent, reusable voices.
It is less suitable if you:
- Need a GUI-only workflow with timeline editing (this skill is CLI-focused).
- Require completely offline processing (it depends on inference.sh and ElevenLabs in the cloud).
- Want fine-grained audio engineering tools like EQ, mixing, or multi-track editing; this is focused on voice transformation, not full DAW features.
Models and voice options
The elevenlabs-voice-changer skill exposes the same models described in the repository:
- Multilingual STS v2 – model ID: eleven_multilingual_sts_v2 (default, supports 70+ languages).
- English STS v2 – model ID: eleven_english_sts_v2 (optimized for English speech).
It can also use the 22+ premium ElevenLabs voices available in their TTS products, including defaults like:
- george – British, authoritative (the default voice in the docs).
- aria – American, conversational.
You select these voices by passing the voice parameter when calling the app.
How to Use
1. Prerequisites and installation
Before using elevenlabs-voice-changer, you must have the inference.sh CLI installed and authenticated.
1. Install the inference.sh CLI (infsh)
   Follow the official instructions from the repository:
   https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md
2. Log in to inference.sh using your account:
   infsh login
3. Add the skill (Agent Skills Finder / skills registry)
   If you are using this as a skill inside the skills collection, add it with:
   npx skills add https://github.com/inferen-sh/skills --skill elevenlabs-voice-changer
After these steps, your environment is ready to call the ElevenLabs voice changer app via infsh.
2. Basic voice transformation
The quickest way to try elevenlabs-voice-changer is to run the provided example from the skill docs:
infsh login
# Transform voice
infsh app run elevenlabs/voice-changer --input '{"audio": "https://recording.mp3", "voice": "aria"}'
In this example:
- audio is a URL pointing to your input recording (e.g., an .mp3 file hosted online).
- voice is the target ElevenLabs voice ID (aria in this case).
The app processes the recording and returns a new audio file with the same speech content, but in the aria voice.
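Hand-quoting JSON inside --input is easy to get wrong in a shell. One way to build it safely is with jq (a sketch, assuming jq is installed; the audio and voice field names come from the quick-start example, and the URL is a placeholder):

```shell
# Build the --input JSON with jq instead of hand-quoting it in the shell.
# jq is assumed to be installed; the audio/voice keys come from the
# skill's quick-start example, and the URL here is a placeholder.
AUDIO_URL="https://recording.mp3"
VOICE="george"
INPUT=$(jq -n --arg audio "$AUDIO_URL" --arg voice "$VOICE" \
  '{audio: $audio, voice: $voice}')
echo "$INPUT"
# Then: infsh app run elevenlabs/voice-changer --input "$INPUT"
```

This keeps URLs containing quotes or special characters from breaking the command line.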
3. Choosing models and languages
By default, the skill is configured to use:
eleven_multilingual_sts_v2for broad language coverage (70+ languages).
If your use case is strictly English and you want an English-optimized model, configure the app input or your workflow to use:
eleven_english_sts_v2for improved English clarity and prosody.
The exact field for selecting the model is handled inside the ElevenLabs app configuration, but when you choose models, use these IDs as referenced in the skill documentation.
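SKILL.md's example shows only the audio and voice fields. If the app also accepts a model field in the input JSON, a call could look like the sketch below; the "model" key is an assumption, so verify it against the app's input schema before relying on it.

```shell
# Hypothetical sketch: select the English-optimized model explicitly.
# The "model" input field is an ASSUMPTION, not documented in SKILL.md;
# only the model IDs themselves come from the docs.
MODEL="eleven_english_sts_v2"
CMD="infsh app run elevenlabs/voice-changer --input '{\"audio\": \"https://recording.mp3\", \"voice\": \"george\", \"model\": \"$MODEL\"}'"
echo "$CMD"   # printed for review; run it manually once verified
```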
4. Working with different voices and accents
To experiment with different accents or styles, change the voice parameter in your --input JSON.
Examples (pattern):
# British, authoritative
infsh app run elevenlabs/voice-changer --input '{"audio": "https://recording.mp3", "voice": "george"}'
# American, conversational
infsh app run elevenlabs/voice-changer --input '{"audio": "https://recording.mp3", "voice": "aria"}'
You can repeat the same original audio with multiple runs and different voice IDs to quickly audition voices for your project.
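The audition workflow above can be scripted as a small loop. This sketch only prints each command (a dry run); remove the echo, or pipe the output to sh, to execute them. Both voice IDs are the ones named in the docs.

```shell
# Print one infsh command per candidate voice (dry run).
audition() {
  AUDIO="$1"; shift
  for VOICE in "$@"; do
    echo "infsh app run elevenlabs/voice-changer --input '{\"audio\": \"$AUDIO\", \"voice\": \"$VOICE\"}'"
  done
}

audition "https://recording.mp3" george aria
```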
5. Integrating into your workflow
Because elevenlabs-voice-changer runs entirely through the CLI, it integrates well with scripted or automated pipelines:
- Batch processing – loop over a folder of audio URLs or pre-uploaded recordings and call infsh app run repeatedly.
- Content localization – record once, then transform narrations into different accents or voices for different markets.
- Voice anonymization – post-process recorded calls, interviews, or user submissions before publishing.
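The batch-processing idea can be sketched as a loop over a plain-text list of audio URLs, one per line. The urls.txt filename and example.com URLs are placeholders, and as above the loop only prints each command rather than running it.

```shell
# Create a sample URL list; in practice, point this at your own file.
printf '%s\n' "https://example.com/a.mp3" "https://example.com/b.mp3" > urls.txt

VOICE="george"
CMDS=$(while IFS= read -r URL; do
  [ -n "$URL" ] || continue   # skip blank lines
  echo "infsh app run elevenlabs/voice-changer --input '{\"audio\": \"$URL\", \"voice\": \"$VOICE\"}'"
done < urls.txt)
echo "$CMDS"
```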
If you are using a broader agent framework or orchestration layer, you can call this skill as a step in your pipeline wherever “voice conversion” or “dubbing” is required.
6. Files to review in the repository
When you open the skill in the inferen-sh/skills repository, start with:
- SKILL.md – high-level description, capabilities, and a quick-start command you can copy and adapt.
Other common files in the skills repository (like AGENTS.md, metadata.json, and rules/ or scripts/ folders when present in other tools) show how skills fit into larger agent workflows. For elevenlabs-voice-changer, SKILL.md is the main documentation.
FAQ
What does elevenlabs-voice-changer actually do?
elevenlabs-voice-changer uses the ElevenLabs speech-to-speech models, called through the inference.sh CLI, to convert an existing voice recording into a different AI-generated voice. It keeps the wording and emotion of the original but changes how the voice sounds.
How do I install elevenlabs-voice-changer?
You don’t install the skill as a standalone app. Instead, you:
1. Install the infsh CLI using the instructions at:
   https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md
2. Run infsh login to authenticate.
3. Optionally, register the skill in your skills setup with:
   npx skills add https://github.com/inferen-sh/skills --skill elevenlabs-voice-changer
After that, you can call the ElevenLabs voice changer app with infsh app run elevenlabs/voice-changer.
Do I need an ElevenLabs account to use this?
The skill itself is a wrapper around ElevenLabs’ models running via inference.sh. Any underlying requirements for ElevenLabs usage (such as accounts, credits, or quotas) are handled by your inference.sh and ElevenLabs setup. Check the inference.sh and ElevenLabs documentation for current access and billing details.
Can I run elevenlabs-voice-changer locally without the cloud?
The repository documentation shows the skill running via infsh against an online ElevenLabs app. It does not document a fully offline mode. Expect to need network access to inference.sh and the ElevenLabs backend.
What audio formats can I use as input?
The example uses an .mp3 file referenced by an HTTPS URL ("https://recording.mp3"). The specific format and size limits are governed by the ElevenLabs app itself. For best results, use common web audio formats (such as mp3) hosted at a stable URL.
Can I use my own custom voice?
The skill description focuses on the standard ElevenLabs voice set (22+ premium voices) such as george and aria. It does not describe custom voice-training flows. If you need a bespoke voice, consult ElevenLabs’ own documentation to see how custom voices integrate with their speech-to-speech app.
Is this good for real-time voice changing?
The repository shows file-based speech-to-speech usage via CLI, where you provide a recorded file URL and get a processed file back. It does not describe real-time or live call voice conversion, so treat it as an asynchronous, file-based tool rather than a live voice changer.
When should I not use elevenlabs-voice-changer?
Consider other tools if you:
- Need a full DAW or nonlinear editor for detailed audio mixing and mastering.
- Require live, low-latency voice effects for streaming or gaming.
- Must run everything offline without cloud services.
For scripted, repeatable speech-to-speech voice conversion via CLI, elevenlabs-voice-changer is a strong fit.
Where can I see or modify the configuration?
Open the skill in the inferen-sh/skills GitHub repository under:
tools/audio/elevenlabs-voice-changer/
Review SKILL.md there to see the official quick start, models, and voice options, and adapt the example commands to your environment.
