# ai-music-generation

by inferen-sh

Generate AI music and full songs from text prompts using ElevenLabs Music, Diffrythm, and Tencent Song Generation via the inference.sh CLI. Ideal for background tracks, soundtracks, social clips, podcasts, and royalty-free music. Supports fast song generation, instrumentals, and full vocal songs.
## Overview

### What is ai-music-generation?
The ai-music-generation skill lets you generate original music and full songs from simple text prompts using the inference.sh CLI (infsh). It connects your agent or CLI workflow to multiple AI music models, so you can quickly create background tracks, intros, jingles, and full vocal songs without leaving your terminal.
Under the hood, ai-music-generation calls hosted apps on inference.sh, giving you a clean, repeatable way to script and automate music creation.
### Key capabilities
With ai-music-generation you can:
- Turn text prompts into music: Describe genre, mood, tempo, and instrumentation in natural language.
- Generate full songs or short clips: Create quick stings for social media or longer tracks for videos and podcasts.
- Choose between multiple models (via inference.sh apps):
  - ElevenLabs Music (`elevenlabs/music`): up to ~10 minutes, commercial-use friendly licensing.
  - Diffrythm (`infsh/diffrythm`): fast text-to-song generation, good for rapid iteration.
  - Tencent Song Generation (`infsh/tencent-song-generation`): full songs with vocals.
- Create different kinds of audio:
  - Instrumentals
  - Backing tracks
  - Full vocal songs
  - Ambient soundtracks and loops
### Who is this skill for?
ai-music-generation is a good fit if you:
- Produce YouTube, TikTok, or social content and need quick, unique background music.
- Make podcasts and want intros, outros, and segment stings.
- Build games or apps and need dynamic soundtracks or loops.
- Work in marketing or creative agencies and want fast demo music for client mockups.
- Run agents or automation workflows that need to generate on-demand audio.
It is designed for technical users who are comfortable with the command line and want to integrate AI music generation into scripts, CI pipelines, or agent frameworks.
### When is ai-music-generation not a good fit?
This skill may not be ideal if you:
- Need a GUI-based music editor or DAW (e.g., Ableton, Logic) – this is CLI-first.
- Want to edit or remix existing audio; ai-music-generation is focused on generating new music, not detailed audio editing.
- Require offline or on-prem generation – models are accessed remotely via inference.sh.
- Are not comfortable managing a CLI tool or external API-like service.
If you mainly need fine-grained waveform editing, multi-track mixing, or mastering, combine this skill with a traditional audio editor; use ai-music-generation only for the creation step.
## How to Use

### Prerequisites
Before installing the ai-music-generation skill, make sure you have:
- Node.js and npx available (to install the skill into your agent skills setup).
- The inference.sh CLI (`infsh`) installed and configured.
To install the inference.sh CLI, follow the official instructions from the repository:
- Install guide: https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md
Once `infsh` is installed, run:

```bash
infsh login
```

and complete the login flow so the CLI can access the music models.
### Install the ai-music-generation skill
Use npx to add the skill from the inferen-sh/skills repository:
```bash
npx skills add https://github.com/inferen-sh/skills --skill ai-music-generation
```
This pulls the ai-music-generation skill metadata and supporting files into your local skills environment, so your agents or tools can call it.
Recommended files to review after installation:

- `SKILL.md`: high-level description and supported tools.
- Any nearby tools/audio utilities in the repository: useful for broader audio workflows.
### Quick start: generate your first AI song
Once infsh is logged in, you can immediately generate a track using the Diffrythm model, which is optimised for fast text-to-song creation.
Run this from your terminal:
```bash
infsh app run infsh/diffrythm --input '{"prompt": "upbeat electronic dance track"}'
```
What this does:
- `infsh app run infsh/diffrythm` selects the Diffrythm music app.
- `--input '{"prompt": "..."}'` passes a JSON payload with your prompt text.
- The app returns an audio file (or URL) you can play, download, or feed into your pipeline.
You can change the prompt to control genre, mood, tempo, and more, for example:
```bash
infsh app run infsh/diffrythm --input '{"prompt": "cinematic orchestral soundtrack, slow build, inspiring"}'
```
### Choosing the right model

The ai-music-generation skill surfaces three main music models via inference.sh:

#### ElevenLabs Music (`elevenlabs/music`)
Best when you need:
- Longer tracks (up to around 10 minutes).
- Commercial licensing suitable for business or client work.
- High-quality, polished background music.
Example call:
```bash
infsh app run elevenlabs/music --input '{"prompt": "lofi chillhop beat with warm piano and vinyl crackle"}'
```
#### Diffrythm (`infsh/diffrythm`)
Best when you need:
- Fast feedback and iteration on ideas.
- Short to medium-length songs for social clips or concept demos.
Example call:
```bash
infsh app run infsh/diffrythm --input '{"prompt": "high-energy rock track with driving guitars"}'
```
#### Tencent Song Generation (`infsh/tencent-song-generation`)
Best when you need:
- Full songs with vocals, not just instrumentals.
- More song-like structures for demos or concept pieces.
Example call:
```bash
infsh app run infsh/tencent-song-generation --input '{"prompt": "emotional pop ballad with powerful female vocals"}'
```
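The trade-offs above can be captured in a small lookup table. A minimal sketch in Python: the selection keys, helper name, and fallback choice are illustrative assumptions; only the app IDs come from this page.

```python
# Map a rough need to an inference.sh app ID. The keys and the
# default fallback are illustrative, not official guidance.
MUSIC_APPS = {
    "long_commercial": "elevenlabs/music",          # polished, longer tracks
    "fast_iteration": "infsh/diffrythm",            # quick text-to-song drafts
    "vocals": "infsh/tencent-song-generation",      # full songs with vocals
}


def choose_app(need: str) -> str:
    """Pick an app ID for a rough need, falling back to the fast model."""
    return MUSIC_APPS.get(need, "infsh/diffrythm")
```

A chooser like this keeps model selection in one place, so agents or scripts can swap models without touching the command-building code.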
### Integrating with agents and workflows
Once the ai-music-generation skill is added to your skills setup, you can:
- Expose it as a tool an LLM-based agent can call when it needs music.
- Wire it into scripts that:
  - Take a text brief (e.g., a marketing campaign description).
  - Generate several prompt variations.
  - Call `infsh` with different models.
  - Save the resulting audio into a content folder or asset pipeline.
A simple CLI-oriented workflow might look like:
1. Accept a description and target duration from the user.
2. Build a structured JSON `--input` for the chosen app.
3. Run `infsh app run ...` from your script.
4. Store the output file path and optionally log metadata for reuse.
Because all calls go through infsh, it is easy to integrate this into CI jobs, cron tasks, or chat-style agents that respond with generated music links.
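The steps above can be sketched as a short Python wrapper around the CLI. This is a sketch under assumptions: `build_input` and `generate_track` are hypothetical helper names, the duration hint is folded into the prompt text (the only input field documented on this page), and the exact shape of `infsh`'s output is not specified here, so raw stdout is returned as-is.

```python
import json
import subprocess


def build_input(description: str, duration_hint=None) -> str:
    """Step 2: build the JSON --input payload from a text brief.

    The duration is appended as a prompt hint, since the prompt is
    the one input field documented for these apps.
    """
    prompt = description if duration_hint is None else f"{description}, {duration_hint}"
    return json.dumps({"prompt": prompt})


def generate_track(app_id: str, description: str, duration_hint=None) -> str:
    """Step 3: run `infsh app run` and return its raw stdout
    (an audio file path or URL, depending on the app)."""
    payload = build_input(description, duration_hint)
    result = subprocess.run(
        ["infsh", "app", "run", app_id, "--input", payload],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Example (requires a logged-in infsh):
# track = generate_track("infsh/diffrythm", "upbeat electronic dance track",
#                        duration_hint="around 60 seconds")
```

Step 4 (storing the file and logging metadata) is left to the surrounding pipeline, since where output lands is project-specific.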
### Best practices for prompts
To get better results from ai-music-generation models, try prompts that include:
- Genre: "lofi hip hop", "cinematic orchestral", "synthwave".
- Mood: "relaxing", "dark and tense", "uplifting".
- Tempo / energy: "slow and atmospheric", "high energy", "mid-tempo groove".
- Key elements: "warm piano", "heavy bass", "female vocals", "acoustic guitar".
- Use case: "for a podcast intro", "for a game boss fight", "for a product launch video".
Example prompt:
```bash
infsh app run infsh/diffrythm --input '{
  "prompt": "driving synthwave track, nostalgic 80s vibe, steady 120 bpm, for a tech product trailer"
}'
```
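As a sketch, the checklist above can be turned into a tiny prompt builder; the function and argument names are illustrative, not part of any API.

```python
def build_prompt(genre, mood=None, tempo=None, elements=(), use_case=None):
    """Assemble a music prompt from genre, mood, tempo,
    key elements, and use case, skipping parts not given."""
    parts = [genre]
    if mood:
        parts.append(mood)
    if tempo:
        parts.append(tempo)
    parts.extend(elements)
    if use_case:
        parts.append(f"for {use_case}")
    return ", ".join(parts)


build_prompt(
    "driving synthwave track",
    mood="nostalgic 80s vibe",
    tempo="steady 120 bpm",
    use_case="a tech product trailer",
)
# -> "driving synthwave track, nostalgic 80s vibe, steady 120 bpm, for a tech product trailer"
```

A builder like this makes it easy to generate several prompt variations from one brief and feed each to `infsh` in a loop.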
## FAQ

### What does ai-music-generation actually install?
ai-music-generation adds a skill definition (from inferen-sh/skills) that describes how an agent can use the inference.sh CLI to call supported music-generation apps. It does not install the music models themselves; those are hosted and accessed remotely via infsh.
### Do I need the inference.sh CLI to use ai-music-generation?
Yes. The skill relies on the inference.sh CLI (infsh) to communicate with the AI music models. Without infsh installed, logged in, and configured, calls to the underlying apps (like infsh/diffrythm or elevenlabs/music) will not work.
### Which AI music models are supported?
ai-music-generation is built around these models available via inference.sh:
- ElevenLabs Music (`elevenlabs/music`): longer tracks, commercial-friendly licensing.
- Diffrythm (`infsh/diffrythm`): fast, general-purpose song generation.
- Tencent Song Generation (`infsh/tencent-song-generation`): full songs with vocals.
You select the model by choosing the appropriate app ID in your `infsh app run` command.
### Can I use ai-music-generation for commercial projects?
The skill itself is just an integration layer. Whether you can use the generated audio commercially depends on each model’s licensing and the inference.sh terms. The SKILL metadata notes that ElevenLabs Music supports commercial licensing, but you should always review the current terms on:
- The inference.sh documentation for each app.
- The model provider’s site (e.g., ElevenLabs) for their latest license.
### Does this skill edit existing audio files?
No. ai-music-generation focuses on creating new music and songs from text prompts. For editing, mixing, or mastering existing audio, you will need to use other audio-editing tools or DAWs and treat ai-music-generation as the source audio generator.
### Can I control song length, structure, or vocals?
The level of control depends on the underlying app:
- ElevenLabs Music: supports longer durations (up to around 10 minutes); check its parameters in the inference.sh docs.
- Diffrythm: geared toward fast, default-length song generation.
- Tencent Song Generation: focused on full songs with vocals.
Where supported, you can add duration or style hints to your prompt or additional fields in the JSON --input. Refer to the specific app’s documentation on inference.sh for all available parameters.
### Is ai-music-generation suitable for non-technical users?
Not directly. ai-music-generation assumes you are comfortable with:
- Running CLI commands.
- Editing JSON in `--input` arguments.
- Installing and configuring `infsh`.
Non-technical users will typically interact with a UI, chatbot, or custom tool that sits on top of this skill, while developers connect that interface to ai-music-generation under the hood.
### How do I troubleshoot if music generation fails?
If a command fails:
1. Confirm `infsh` is installed and on your `PATH`.
2. Run `infsh login` again to ensure your session is valid.
3. Check your command syntax, especially JSON quotes in `--input`.
4. Try a simple prompt with a known app, for example:

   ```bash
   infsh app run infsh/diffrythm --input '{"prompt": "simple piano melody"}'
   ```

5. Review any error messages from `infsh`; they usually indicate authentication, quota, or input-format issues.
If problems persist, consult the main inferen-sh/skills repository and inference.sh documentation for current limits or service status.
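The first checks above can be partially automated. A hedged sketch: `preflight` is a hypothetical helper; it assumes only that `infsh` exits successfully when invoked with `--help` (an assumption, since flags beyond `app run` are not documented here), and it cannot verify login state, which is interactive.

```python
import shutil
import subprocess


def preflight(binary: str = "infsh") -> list:
    """Return a list of problems found before attempting generation."""
    problems = []
    if shutil.which(binary) is None:
        problems.append(
            f"{binary} not found on PATH - install the inference.sh CLI first"
        )
    else:
        # Assumes the CLI accepts --help; we only confirm the binary responds.
        try:
            subprocess.run([binary, "--help"], capture_output=True, check=True)
        except subprocess.CalledProcessError:
            problems.append(
                f"{binary} is installed but not responding - try `{binary} login`"
            )
    return problems
```

Running a check like this at the top of a script turns a cryptic mid-pipeline failure into an actionable message before any generation is attempted.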
