# ai-music-generation

by inferen-sh

Generate AI music and full songs from text prompts using ElevenLabs Music, Diffrythm, and Tencent Song Generation via the inference.sh CLI. Ideal for background tracks, soundtracks, social clips, podcasts, and royalty-free music. Supports fast song generation, instrumentals, and full vocal songs.
## Overview

### What is ai-music-generation?
The ai-music-generation skill lets you generate original music and full songs from simple text prompts using the inference.sh CLI (infsh). It connects your agent or CLI workflow to multiple AI music models, so you can quickly create background tracks, intros, jingles, and full vocal songs without leaving your terminal.
Under the hood, ai-music-generation calls hosted apps on inference.sh, giving you a clean, repeatable way to script and automate music creation.
### Key capabilities
With ai-music-generation you can:
- Turn text prompts into music: Describe genre, mood, tempo, and instrumentation in natural language.
- Generate full songs or short clips: Create quick stings for social media or longer tracks for videos and podcasts.
- Choose between multiple models (via inference.sh apps):
  - ElevenLabs Music (`elevenlabs/music`): up to ~10 minutes, commercial-use friendly licensing.
  - Diffrythm (`infsh/diffrythm`): fast text-to-song generation, good for rapid iteration.
  - Tencent Song Generation (`infsh/tencent-song-generation`): full songs with vocals.
- Create different kinds of audio:
  - Instrumentals
  - Backing tracks
  - Full vocal songs
  - Ambient soundtracks and loops
### Who is this skill for?
ai-music-generation is a good fit if you:
- Produce YouTube, TikTok, or social content and need quick, unique background music.
- Make podcasts and want intros, outros, and segment stings.
- Build games or apps and need dynamic soundtracks or loops.
- Work in marketing or creative agencies and want fast demo music for client mockups.
- Run agents or automation workflows that need to generate on-demand audio.
It is designed for technical users who are comfortable with the command line and want to integrate AI music generation into scripts, CI pipelines, or agent frameworks.
### When is ai-music-generation not a good fit?
This skill may not be ideal if you:
- Need a GUI-based music editor or DAW (e.g., Ableton, Logic) – this is CLI-first.
- Want to edit or remix existing audio; ai-music-generation is focused on generating new music, not detailed audio editing.
- Require offline or on-prem generation – models are accessed remotely via inference.sh.
- Are not comfortable managing a CLI tool or external API-like service.
If you mainly need fine-grained waveform editing, multi-track mixing, or mastering, combine this skill with a traditional audio editor; use ai-music-generation only for the creation step.
## How to Use

### Prerequisites
Before installing the ai-music-generation skill, make sure you have:
- Node.js and npx available (to install the skill into your agent skills setup).
- The inference.sh CLI (`infsh`) installed and configured.
To install the inference.sh CLI, follow the official instructions from the repository:
- Install guide: https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md
Once `infsh` is installed, run:

```bash
infsh login
```

and complete the login flow so the CLI can access the music models.
### Install the ai-music-generation skill
Use npx to add the skill from the inferen-sh/skills repository:
```bash
npx skills add https://github.com/inferen-sh/skills --skill ai-music-generation
```
This pulls the ai-music-generation skill metadata and supporting files into your local skills environment, so your agents or tools can call it.
Recommended files to review after installation:

- `SKILL.md`: high-level description and supported tools.
- Any nearby tools/audio utilities in the repository: useful for broader audio workflows.
### Quick start: generate your first AI song
Once infsh is logged in, you can immediately generate a track using the Diffrythm model, which is optimised for fast text-to-song creation.
Run this from your terminal:
```bash
infsh app run infsh/diffrythm --input '{"prompt": "upbeat electronic dance track"}'
```
What this does:
- `infsh app run infsh/diffrythm` selects the Diffrythm music app.
- `--input '{"prompt": "..."}'` passes a JSON payload with your prompt text.
- The app returns an audio file (or URL) you can play, download, or feed into your pipeline.
You can change the prompt to control genre, mood, tempo, and more, for example:
```bash
infsh app run infsh/diffrythm --input '{"prompt": "cinematic orchestral soundtrack, slow build, inspiring"}'
```
### Choosing the right model

The ai-music-generation skill surfaces three main music models via inference.sh:

#### ElevenLabs Music (`elevenlabs/music`)
Best when you need:
- Longer tracks (up to around 10 minutes).
- Commercial licensing suitable for business or client work.
- High-quality, polished background music.
Example call:
```bash
infsh app run elevenlabs/music --input '{"prompt": "lofi chillhop beat with warm piano and vinyl crackle"}'
```
#### Diffrythm (`infsh/diffrythm`)
Best when you need:
- Fast feedback and iteration on ideas.
- Short to medium-length songs for social clips or concept demos.
Example call:
```bash
infsh app run infsh/diffrythm --input '{"prompt": "high-energy rock track with driving guitars"}'
```
#### Tencent Song Generation (`infsh/tencent-song-generation`)
Best when you need:
- Full songs with vocals, not just instrumentals.
- More song-like structures for demos or concept pieces.
Example call:
```bash
infsh app run infsh/tencent-song-generation --input '{"prompt": "emotional pop ballad with powerful female vocals"}'
```
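The trade-offs above can be captured in a small lookup table. A minimal sketch in Python: the selection keys, helper name, and fallback choice are illustrative assumptions; only the app IDs come from this page.

```python
# Map a rough need to an inference.sh app ID. The keys and the
# default fallback are illustrative, not official guidance.
MUSIC_APPS = {
    "long_commercial": "elevenlabs/music",          # polished, longer tracks
    "fast_iteration": "infsh/diffrythm",            # quick text-to-song drafts
    "vocals": "infsh/tencent-song-generation",      # full songs with vocals
}


def choose_app(need: str) -> str:
    """Pick an app ID for a rough need, falling back to the fast model."""
    return MUSIC_APPS.get(need, "infsh/diffrythm")
```

A chooser like this keeps model selection in one place, so agents or scripts can swap models without touching the command-building code.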
### Integrating with agents and workflows
Once the ai-music-generation skill is added to your skills setup, you can:
- Expose it as a tool an LLM-based agent can call when it needs music.
- Wire it into scripts that:
  - Take a text brief (e.g., a marketing campaign description).
  - Generate several prompt variations.
  - Call `infsh` with different models.
  - Save the resulting audio into a content folder or asset pipeline.
A simple CLI-oriented workflow might look like:
1. Accept a description and target duration from the user.
2. Build a structured JSON `--input` for the chosen app.
3. Run `infsh app run ...` from your script.
4. Store the output file path and optionally log metadata for reuse.
Because all calls go through infsh, it is easy to integrate this into CI jobs, cron tasks, or chat-style agents that respond with generated music links.
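The steps above can be sketched as a short Python wrapper around the CLI. This is a sketch under assumptions: `build_input` and `generate_track` are hypothetical helper names, the duration hint is folded into the prompt text (the only input field documented on this page), and the exact shape of `infsh`'s output is not specified here, so raw stdout is returned as-is.

```python
import json
import subprocess


def build_input(description: str, duration_hint=None) -> str:
    """Step 2: build the JSON --input payload from a text brief.

    The duration is appended as a prompt hint, since the prompt is
    the one input field documented for these apps.
    """
    prompt = description if duration_hint is None else f"{description}, {duration_hint}"
    return json.dumps({"prompt": prompt})


def generate_track(app_id: str, description: str, duration_hint=None) -> str:
    """Step 3: run `infsh app run` and return its raw stdout
    (an audio file path or URL, depending on the app)."""
    payload = build_input(description, duration_hint)
    result = subprocess.run(
        ["infsh", "app", "run", app_id, "--input", payload],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Example (requires a logged-in infsh):
# track = generate_track("infsh/diffrythm", "upbeat electronic dance track",
#                        duration_hint="around 60 seconds")
```

Step 4 (storing the file and logging metadata) is left to the surrounding pipeline, since where output lands is project-specific.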
### Best practices for prompts
To get better results from ai-music-generation models, try prompts that include:
- Genre: "lofi hip hop", "cinematic orchestral", "synthwave".
- Mood: "relaxing", "dark and tense", "uplifting".
- Tempo / energy: "slow and atmospheric", "high energy", "mid-tempo groove".
- Key elements: "warm piano", "heavy bass", "female vocals", "acoustic guitar".
- Use case: "for a podcast intro", "for a game boss fight", "for a product launch video".
Example prompt:
```bash
infsh app run infsh/diffrythm --input '{
  "prompt": "driving synthwave track, nostalgic 80s vibe, steady 120 bpm, for a tech product trailer"
}'
```
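As a sketch, the checklist above can be turned into a tiny prompt builder; the function and argument names are illustrative, not part of any API.

```python
def build_prompt(genre, mood=None, tempo=None, elements=(), use_case=None):
    """Assemble a music prompt from genre, mood, tempo,
    key elements, and use case, skipping parts not given."""
    parts = [genre]
    if mood:
        parts.append(mood)
    if tempo:
        parts.append(tempo)
    parts.extend(elements)
    if use_case:
        parts.append(f"for {use_case}")
    return ", ".join(parts)


build_prompt(
    "driving synthwave track",
    mood="nostalgic 80s vibe",
    tempo="steady 120 bpm",
    use_case="a tech product trailer",
)
# -> "driving synthwave track, nostalgic 80s vibe, steady 120 bpm, for a tech product trailer"
```

A builder like this makes it easy to generate several prompt variations from one brief and feed each to `infsh` in a loop.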
## FAQ

### What does ai-music-generation actually install?
ai-music-generation adds a skill definition (from inferen-sh/skills) that describes how an agent can use the inference.sh CLI to call supported music-generation apps. It does not install the music models themselves; those are hosted and accessed remotely via infsh.
### Do I need the inference.sh CLI to use ai-music-generation?
Yes. The skill relies on the inference.sh CLI (infsh) to communicate with the AI music models. Without infsh installed, logged in, and configured, calls to the underlying apps (like infsh/diffrythm or elevenlabs/music) will not work.
### Which AI music models are supported?
ai-music-generation is built around these models available via inference.sh:
- ElevenLabs Music (`elevenlabs/music`): longer tracks, commercial-friendly licensing.
- Diffrythm (`infsh/diffrythm`): fast, general-purpose song generation.
- Tencent Song Generation (`infsh/tencent-song-generation`): full songs with vocals.
You select the model by choosing the appropriate app ID in your `infsh app run` command.
### Can I use ai-music-generation for commercial projects?
The skill itself is just an integration layer. Whether you can use the generated audio commercially depends on each model’s licensing and the inference.sh terms. The SKILL metadata notes that ElevenLabs Music supports commercial licensing, but you should always review the current terms on:
- The inference.sh documentation for each app.
- The model provider’s site (e.g., ElevenLabs) for their latest license.
### Does this skill edit existing audio files?
No. ai-music-generation focuses on creating new music and songs from text prompts. For editing, mixing, or mastering existing audio, you will need to use other audio-editing tools or DAWs and treat ai-music-generation as the source audio generator.
### Can I control song length, structure, or vocals?
The level of control depends on the underlying app:
- ElevenLabs Music: supports longer durations (up to around 10 minutes); check its parameters in the inference.sh docs.
- Diffrythm: geared toward fast, default-length song generation.
- Tencent Song Generation: focused on full songs with vocals.
Where supported, you can add duration or style hints to your prompt or additional fields in the JSON --input. Refer to the specific app’s documentation on inference.sh for all available parameters.
### Is ai-music-generation suitable for non-technical users?
Not directly. ai-music-generation assumes you are comfortable with:
- Running CLI commands.
- Editing JSON in `--input` arguments.
- Installing and configuring `infsh`.
Non-technical users will typically interact with a UI, chatbot, or custom tool that sits on top of this skill, while developers connect that interface to ai-music-generation under the hood.
### How do I troubleshoot if music generation fails?
If a command fails:
1. Confirm `infsh` is installed and on your `PATH`.
2. Run `infsh login` again to ensure your session is valid.
3. Check your command syntax, especially JSON quotes in `--input`.
4. Try a simple prompt with a known app, for example:

   ```bash
   infsh app run infsh/diffrythm --input '{"prompt": "simple piano melody"}'
   ```

5. Review any error messages from `infsh`; they usually indicate authentication, quota, or input-format issues.
If problems persist, consult the main inferen-sh/skills repository and inference.sh documentation for current limits or service status.
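The first checks above can be partially automated. A hedged sketch: `preflight` is a hypothetical helper; it assumes only that `infsh` exits successfully when invoked with `--help` (an assumption, since flags beyond `app run` are not documented here), and it cannot verify login state, which is interactive.

```python
import shutil
import subprocess


def preflight(binary: str = "infsh") -> list:
    """Return a list of problems found before attempting generation."""
    problems = []
    if shutil.which(binary) is None:
        problems.append(
            f"{binary} not found on PATH - install the inference.sh CLI first"
        )
    else:
        # Assumes the CLI accepts --help; we only confirm the binary responds.
        try:
            subprocess.run([binary, "--help"], capture_output=True, check=True)
        except subprocess.CalledProcessError:
            problems.append(
                f"{binary} is installed but not responding - try `{binary} login`"
            )
    return problems
```

Running a check like this at the top of a script turns a cryptic mid-pipeline failure into an actionable message before any generation is attempted.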
