elevenlabs-voice-isolator
by inferen-sh
CLI-driven ElevenLabs voice isolator skill for removing background noise and isolating vocals from audio via inference.sh. Ideal for podcast cleanup, interviews, music vocals, noisy recordings, and audio restoration workflows.
Overview
What is elevenlabs-voice-isolator?
The elevenlabs-voice-isolator skill is a command-line audio cleanup tool that uses the ElevenLabs Voice Isolator app through the inference.sh (infsh) CLI. It focuses on removing background noise and isolating spoken voice or vocals from an input audio file.
It is built as a reusable skill inside the inferen-sh/skills repository, so you can call it from compatible agent environments or from your own terminal as long as you have the infsh CLI set up.
Key capabilities
Using the ElevenLabs voice isolator model via infsh, this skill can:
- Remove ambient background noise (room tone, hum, traffic, crowd noise)
- Isolate voices or vocals from a noisy recording
- Clean up podcast tracks and interview recordings
- Improve intelligibility of speech in difficult environments
- Support common audio formats (WAV, MP3, FLAC, OGG, AAC)
- Handle long recordings (up to 1 hour, 500MB per file as indicated in the skill docs)
Who is this skill for?
Use elevenlabs-voice-isolator if you:
- Record podcasts and want cleaner voice tracks without manual noise reduction
- Capture remote interviews and need to reduce background noise from guests
- Work with music demos or vocal takes and want to better isolate the vocal line
- Maintain audio archives and want basic speech-focused restoration
- Build AI agents or automation that must clean audio on the fly using a CLI tool
If you already use ffmpeg or a DAW but want a higher-level voice isolation step accessible from the terminal or an agent, this skill fits that niche.
When it’s a good fit (and when it isn’t)
A good fit when:
- Your main goal is voice isolation or speech cleanup, not full multitrack audio mixing.
- You are comfortable running CLI commands (Bash) and working with URLs or local files.
- You can install and authenticate the inference.sh CLI (infsh).
Not the best fit when:
- You need deep editing, multitrack mixing, or effects chains inside a GUI DAW.
- Your workflow is entirely offline and you cannot use the infsh CLI or external model calls.
- You require fine-grained, frame-level control over the DSP process instead of a model-driven isolator.
How to Use
Prerequisites
Before using elevenlabs-voice-isolator, make sure you have:
- inference.sh CLI (infsh) installed
  - The skill’s quick start references infsh and links to CLI install instructions.
  - Follow the latest installation instructions from: https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md
- Access to the ElevenLabs Voice Isolator app via infsh
  - The skill calls elevenlabs/voice-isolator through infsh app run.
- Bash-capable environment
  - The skill’s allowed-tools include Bash(infsh *), so it is designed for Bash shells and CLI workflows.
Basic installation in an agent skills environment
If you are using an environment that supports npx skills and the inferen-sh/skills repository, you can add the skill with:
```bash
npx skills add https://github.com/inferen-sh/skills --skill elevenlabs-voice-isolator
```
This makes the elevenlabs-voice-isolator skill available alongside other tools from the same repo. Once added, your agent or tooling can invoke the underlying `infsh` commands defined by the skill.
Log in to inference.sh
Before running any audio isolation, authenticate the CLI:
```bash
infsh login
```
Follow the prompts to complete login. This step is required for the subsequent infsh app run commands to work.
Run a simple voice isolation command
The core usage pattern for elevenlabs-voice-isolator through infsh looks like this:
infsh app run elevenlabs/voice-isolator --input '{"audio": "https://noisy-recording.mp3"}'
Replace https://noisy-recording.mp3 with the URL to your own noisy audio file. The app processes the input and returns a response (typically JSON) with references to the cleaned audio.
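When the URL comes from a variable, the JSON payload can be assembled in a small wrapper. The following is a minimal sketch, not part of the skill: build_isolator_input and isolate_url are hypothetical helper names, and only the infsh app run elevenlabs/voice-isolator --input call is taken from the command above.

```bash
# Build the JSON payload for the isolator from a URL passed as $1.
# build_isolator_input is a hypothetical helper, not part of infsh.
build_isolator_input() {
  local escaped
  # Escape backslashes and double quotes so the URL stays a valid JSON string.
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"audio": "%s"}\n' "$escaped"
}

# Run the isolator on one URL; same call as above, only parameterized.
isolate_url() {
  infsh app run elevenlabs/voice-isolator --input "$(build_isolator_input "$1")"
}

# Usage (requires a prior `infsh login`):
#   isolate_url "https://noisy-recording.mp3"
```

Escaping the value before embedding it keeps the payload valid even if a URL contains a quote or backslash, which a hand-written single-quoted string would not guard against.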
Supported audio formats and limits
According to the skill documentation, the ElevenLabs voice isolator supports:
- WAV – up to 500MB, max 1 hour
- MP3 – up to 500MB, max 1 hour
- FLAC – up to 500MB, max 1 hour
- OGG – up to 500MB, max 1 hour
- AAC – up to 500MB, max 1 hour
For best stability, stay within these sizes and durations when preparing audio for elevenlabs-voice-isolator.
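If the source file is available locally before you host it, the format and size limits can be checked up front. This is a rough sketch under stated assumptions: check_audio_limits is a hypothetical helper, 500MB is read as 500 MiB, and the one-hour duration limit is not checked because that would need an extra tool such as ffprobe.

```bash
# check_audio_limits is a hypothetical helper; it only inspects a local file.
MAX_BYTES=$((500 * 1024 * 1024))   # assumption: "500MB" read as 500 MiB

check_audio_limits() {
  local file ext size
  file=$1
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }

  # Compare the lowercased extension against the documented formats.
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    wav|mp3|flac|ogg|aac) ;;
    *) echo "unsupported format: .$ext" >&2; return 1 ;;
  esac

  # wc -c reports the size in bytes on both GNU and BSD systems.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  if [ "$size" -gt "$MAX_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
}

# Usage:
#   check_audio_limits ./episode-12.wav && echo "within documented limits"
```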
Example: Clean up a podcast recording
This example mirrors the skill’s own quick-start scenario for podcast cleanup:
# Remove background noise from a podcast recording
infsh app run elevenlabs/voice-isolator --input '{"audio": "https://noisy-podcast.mp3"}'
Use this pattern for any spoken-word content where you want clearer narration or dialogue. Host your file somewhere accessible over HTTPS (or follow current infsh guidance for local file usage if supported in your environment).
Example: Clean an interview recording
To improve an interview with room noise or street sounds, adjust the input URL:
infsh app run elevenlabs/voice-isolator --input '{"audio": "https://noisy-interview-file.mp3"}'
You can integrate this command into scripts that automatically clean every new interview file before editing.
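One way such a script could look is sketched below. It is an illustration only: clean_all, the URL list file, and the response-N.json naming are assumptions, and the script stores whatever infsh prints without assuming a particular response layout. URLs are assumed not to contain double quotes.

```bash
# clean_all is a hypothetical helper: one URL per line in, one response file per URL out.
clean_all() {
  local list outdir n url
  list=$1
  outdir=$2
  n=0
  mkdir -p "$outdir" || return 1
  while IFS= read -r url; do
    [ -n "$url" ] || continue          # skip blank lines
    n=$((n + 1))
    # Save the raw response; </dev/null keeps infsh from reading the URL list.
    if infsh app run elevenlabs/voice-isolator \
         --input "{\"audio\": \"$url\"}" > "$outdir/response-$n.json" < /dev/null; then
      echo "ok: $url"
    else
      echo "failed: $url" >&2
    fi
  done < "$list"
}

# Usage (requires a prior `infsh login`):
#   clean_all interviews.txt ./responses
```

A failed run is reported on stderr and the loop continues, so one unreachable file does not stop the rest of the batch.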
Integrating with your own tools and agents
Because elevenlabs-voice-isolator is defined as a skill in inferen-sh/skills:
- Agents: An AI agent that can call Bash(infsh *) can use this skill to clean audio as part of a pipeline (for example, isolation → transcription → summarization).
- CLI pipelines: You can wrap infsh app run elevenlabs/voice-isolator inside shell scripts, CI workflows, or batch processing tools.
- Audio post-production: Use this as a pre-processing step before importing the cleaned file into a DAW or editor like Audacity, Reaper, or Adobe Audition.
Files and configuration to inspect
Within the inferen-sh/skills repository, open:
tools/audio/elevenlabs-voice-isolator/SKILL.md
This file contains the skill’s description and example usage commands. It exposes no per-user configuration, but the CLI and app may offer additional options documented elsewhere in the inference.sh ecosystem.
FAQ
What does elevenlabs-voice-isolator actually do to my audio?
The elevenlabs-voice-isolator skill sends your audio to the ElevenLabs Voice Isolator model via the inference.sh CLI. The model focuses on separating and enhancing the voice while reducing background noise. The result is an audio output where speech or vocals are clearer and less noisy, suitable for podcasts, interviews, and similar content.
Do I need the inference.sh CLI to use elevenlabs-voice-isolator?
Yes. The published quick start shows usage through the inference.sh CLI (infsh). You must install and authenticate infsh before running the example commands or integrating the skill into an agent.
Which audio formats can I process?
Based on the skill’s documentation, elevenlabs-voice-isolator supports:
- WAV, MP3, FLAC, OGG, and AAC
- Up to 500MB file size and 1 hour duration per file
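If a file exceeds these limits, split it into shorter segments (to meet the duration limit) or re-encode it at a lower bitrate or sample rate (to meet the size limit) before processing.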
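If ffmpeg is installed, bringing an oversized file under the limits might look like the sketch below. The helper names and file names are hypothetical, and the 3500-second chunk length and mono 128 kbit/s MP3 target are illustrative choices, not values from the skill documentation.

```bash
# Hypothetical helpers around ffmpeg; file names are placeholders.

# Split into chunks just under one hour without re-encoding.
# $2 is an output pattern such as 'part-%03d.wav'.
split_under_one_hour() {
  ffmpeg -i "$1" -f segment -segment_time 3500 -c copy "$2"
}

# Re-encode to mono MP3 at 128 kbit/s to shrink a large file.
shrink_to_mp3() {
  ffmpeg -i "$1" -ac 1 -b:a 128k "$2"
}

# Usage:
#   split_under_one_hour long-interview.wav 'part-%03d.wav'
#   shrink_to_mp3 huge-recording.wav smaller.mp3
```

Splitting with stream copy leaves the audio data untouched, while re-encoding to a lossy format trades some fidelity for size, so prefer splitting when duration is the only problem.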
Can I run elevenlabs-voice-isolator on local files instead of URLs?
The examples in SKILL.md use HTTPS URLs for the audio field. Whether local paths are supported depends on current infsh capabilities and configuration. Check the latest inference.sh CLI documentation for how to reference local files (for example, via upload or local path conventions) and adapt your --input argument accordingly.
Is elevenlabs-voice-isolator suitable for music production?
It can be helpful for isolating vocals or cleaning noisy demo recordings, but it is not a full music production suite. Use it as a pre-processing or utility step, then finish detailed mixing and mastering in your DAW.
How does this differ from traditional noise reduction in a DAW?
Traditional DAW noise reduction often requires noise prints, manual tuning, and real-time monitoring. elevenlabs-voice-isolator is a model-based, batch-style process accessed via CLI. You pass an audio file, the model performs isolation and noise removal, and you receive a processed output. This is convenient for automated or large-scale cleanup, especially when paired with agents or scripts.
What if I just want a simple denoise filter without voice isolation?
The elevenlabs-voice-isolator skill focuses on voice isolation and background removal together. If you only need basic denoising or EQ, a local ffmpeg filter or DAW plugin may be simpler. Use this skill when you specifically want voice separation and enhanced speech clarity driven by the ElevenLabs model.
Where can I learn more or troubleshoot issues?
For the most accurate and current details:
- Open tools/audio/elevenlabs-voice-isolator/SKILL.md in the inferen-sh/skills repository.
- Review the general infsh installation and usage guide at cli-install.md in the same repo.
- Consult inference.sh and ElevenLabs documentation for service-specific limits, authentication, and error codes.
If something fails, start by confirming infsh login succeeds, your audio URL is reachable, and your file respects the supported formats and size/duration limits.
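Two of these checks can be scripted. The sketch below is an assumption-laden convenience, not part of the skill: preflight is a hypothetical helper, it cannot confirm login state (run infsh login for that), and some hosts reject HEAD requests even when the file is downloadable.

```bash
# preflight is a hypothetical helper; it does not verify login state.
preflight() {
  local url
  url=$1
  # 1. Is the CLI on PATH?
  command -v infsh >/dev/null 2>&1 || {
    echo "infsh not found; install the CLI first" >&2
    return 1
  }
  # 2. Does the audio URL answer a HEAD request? (--fail: non-zero on HTTP errors)
  curl --fail --silent --head --location "$url" >/dev/null || {
    echo "URL not reachable: $url" >&2
    return 2
  }
  echo "preflight ok"
}

# Usage:
#   preflight "https://noisy-recording.mp3" && \
#     infsh app run elevenlabs/voice-isolator --input '{"audio": "https://noisy-recording.mp3"}'
```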
