ai-avatar-video
by inferen-sh

Generate AI avatar and talking head videos from an image and audio track using the inference.sh CLI. ai-avatar-video wraps OmniHuman, Fabric, and PixVerse Lipsync apps for audio-driven avatars, lipsync videos, and virtual presenters, ideal for marketing, explainers, and social content workflows.
Overview
What is ai-avatar-video?
ai-avatar-video is a CLI-focused skill for creating AI avatar and talking head videos using the inference.sh platform. It lets you send an image and an audio file to pre-built video apps (OmniHuman, Fabric, PixVerse Lipsync) and receive a rendered video where the avatar speaks and lip-syncs to your audio.
This skill is designed for Bash-based workflows and uses the infsh CLI under the hood.
Key capabilities
- AI talking head generation from a single portrait image
- Audio-driven avatars: map voiceover MP3/other supported audio to a digital human
- Lipsync videos using dedicated lipsync models
- Virtual presenters and AI presenters for explainers, product tours, or announcements
- Model choice via inference.sh apps:
- OmniHuman 1.5 – multi-character, higher quality
- OmniHuman 1.0 – single-character avatar
- Fabric 1.0 – “image talks” lipsync
- PixVerse Lipsync – focused lipsync generation
Who is ai-avatar-video for?
ai-avatar-video is a good fit if you:
- Produce marketing videos, short promos, or social media content
- Need AI spokesperson or virtual presenter clips without hiring talent
- Want to prototype digital humans or virtual influencers from still images
- Prefer CLI and automation (Bash, scripting, CI pipelines) over manual web tools
It is less suitable if you:
- Need a full video editor (timelines, effects, multi-track editing)
- Require a purely offline workflow with no external API calls
- Want a GUI-only solution instead of command-line tools
How it works at a glance
1. Install and log in to the infsh CLI.
2. Choose a model (e.g., bytedance/omnihuman-1-5).
3. Provide an image_url and audio_url in JSON.
4. Run infsh app run ... and download the resulting video.
ai-avatar-video focuses on the video generation step and can be embedded inside larger automation or post-production pipelines.
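As a rough sketch, the steps above can be combined into one script. The placeholder URLs below match the examples used throughout this document, and the script skips the actual call when infsh is not installed:

```shell
# Sketch of the full flow in one script, using placeholder asset URLs;
# substitute your own hosted image and audio. The infsh call is skipped
# when the CLI is not on PATH.
IMAGE_URL="https://portrait.jpg"
AUDIO_URL="https://speech.mp3"
PAYLOAD="{\"image_url\": \"$IMAGE_URL\", \"audio_url\": \"$AUDIO_URL\"}"

if command -v infsh >/dev/null 2>&1; then
  # Submit the job; the CLI prints output info including a video URL.
  infsh app run bytedance/omnihuman-1-5 --input "$PAYLOAD"
else
  echo "infsh not found; install it first (see the CLI install guide)" >&2
fi
```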
How to Use
Installation and prerequisites
1. Install the skill
Use the skills CLI to add the skill to your environment:
npx skills add https://github.com/inferen-sh/skills --skill ai-avatar-video
This pulls the ai-avatar-video skill definition from the inferen-sh/skills repository under tools/video/ai-avatar-video.
2. Install the inference.sh CLI (infsh)
ai-avatar-video assumes you have the infsh CLI installed and available in your shell. Follow the official instructions:
- CLI install guide:
https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md
After installing, log in:
infsh login
You will be guided through authentication so the CLI can call inference.sh apps.
Basic workflow: create an AI avatar video
1. Prepare your media assets
- Image: A clear, front-facing portrait image hosted at a reachable URL, e.g. https://portrait.jpg.
- Audio: A speech or voiceover file (e.g., MP3) hosted at a reachable URL, e.g. https://speech.mp3.
You can use object storage, a web server, or any hosting that provides direct URLs.
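Because both assets must be reachable by direct URL, a quick sanity check before submitting a job can save a failed run. The helper below is illustrative only (it is not part of the skill) and simply checks that a reference looks like a direct HTTPS URL rather than a local file path:

```shell
# Illustrative helper (not part of the skill): reject references that are
# not direct HTTPS URLs before they ever reach the inference.sh app.
is_direct_url() {
  case "$1" in
    https://*.*) return 0 ;;  # https scheme plus at least one dot
    *)           return 1 ;;
  esac
}

is_direct_url "https://example.com/portrait.jpg" && echo "ok"
is_direct_url "portrait.jpg" || echo "rejected: not a direct URL"
```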
2. Run OmniHuman 1.5 for a high-quality avatar
Use the bytedance/omnihuman-1-5 app for multi-character and best-quality talking heads:
infsh app run bytedance/omnihuman-1-5 --input '{
"image_url": "https://portrait.jpg",
"audio_url": "https://speech.mp3"
}'
The CLI will process the request and print output information, typically including a URL where you can download the generated video.
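If the printed output does include a video URL, it can be pulled out and fetched in the same script. The exact output format of infsh app run may differ from this sketch, so treat the pattern below as an assumption to adjust against what you actually see:

```shell
# Hedged sketch: extract the first https URL from whatever the CLI prints.
# Adjust the pattern to the real output format of `infsh app run`.
extract_first_url() {
  grep -oE 'https://[^[:space:]"]+' | head -n 1
}

# Example usage (commented out because it needs a live infsh session):
# infsh app run bytedance/omnihuman-1-5 --input "$payload" \
#   | extract_first_url \
#   | xargs -r curl -fLo avatar.mp4
```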
3. Try alternative models
Switch the app ID to explore different trade-offs.
OmniHuman 1.0 – single-character avatar
infsh app run bytedance/omnihuman-1-0 --input '{
"image_url": "https://portrait.jpg",
"audio_url": "https://speech.mp3"
}'
Fabric 1.0 – image talks with lipsync
infsh app run falai/fabric-1-0 --input '{
"image_url": "https://portrait.jpg",
"audio_url": "https://speech.mp3"
}'
PixVerse Lipsync – focused lipsync generation
infsh app run falai/pixverse-lipsync --input '{
"image_url": "https://portrait.jpg",
"audio_url": "https://speech.mp3"
}'
Choose the app based on your quality needs and output style. The exact options and outputs are defined by the respective inference.sh apps.
Integrating ai-avatar-video into workflows
Bash and CLI automation
ai-avatar-video is designed for Bash (infsh) use, so it fits well into scripts such as:
- Batch-generating videos from a list of images and voiceovers
- Nightly jobs that produce updated marketing or product videos
- CI/CD steps that render release announcement videos when you tag a release
Example loop (conceptual):
while read -r image audio; do
  infsh app run bytedance/omnihuman-1-5 --input "{\"image_url\": \"$image\", \"audio_url\": \"$audio\"}"
done < avatar_jobs.txt
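When interpolating shell variables into the JSON payload, a small helper keeps the quoting in one place. build_input is a hypothetical name used for illustration, not part of infsh:

```shell
# Illustrative helper: assemble the --input JSON for one image/audio pair.
# build_input is a hypothetical name, not part of the infsh CLI.
build_input() {
  printf '{"image_url": "%s", "audio_url": "%s"}' "$1" "$2"
}

# Example: build the payload for a single job, then pass it to the CLI.
payload="$(build_input "https://portrait.jpg" "https://speech.mp3")"
echo "$payload"
# infsh app run bytedance/omnihuman-1-5 --input "$payload"
```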
Combining with editing and publishing tools
The skill focuses on generating the talking-head clip. You can then:
- Bring the output into a video editor for overlays, subtitles, or B-roll
- Feed the clip into social media schedulers or marketing automation
- Use accompanying skills (if available in your environment) for captioning or reformatting
Files and structure to inspect
After installing the skill from the repository, useful references include:
- SKILL.md – Core description, quick start commands, and model overview
- tools/video/ai-avatar-video/ – Location in the repo for context alongside other video tools
Reviewing these files will help you align your implementation with the intended usage patterns.
FAQ
When should I use ai-avatar-video instead of web-based avatar tools?
Use ai-avatar-video when you want CLI-first, scriptable control over avatar video generation. If you are comfortable with Bash and want to plug AI avatar creation into pipelines, build tools, or back-end services, this skill is a strong fit.
If you prefer to design everything visually in the browser and never touch a terminal, a purely web-based product may be more convenient.
Do I need the inference.sh CLI for ai-avatar-video?
Yes. The skill is built around the infsh CLI and the underlying inference.sh apps. You must:
- Install the CLI using the official instructions.
- Run infsh login.
- Use infsh app run ... commands as shown in the quick start.
Without the CLI, ai-avatar-video cannot call the models it relies on.
Which model should I start with?
For most use cases, start with OmniHuman 1.5 (bytedance/omnihuman-1-5), since it supports multiple characters and is described as the highest-quality option.
You might choose alternatives when:
- OmniHuman 1.0: You only need a simpler, single-character avatar.
- Fabric 1.0: You want a straightforward “image talks with lipsync” style.
- PixVerse Lipsync: You are primarily focused on lipsync behavior.
Experiment across a few clips to see which app fits your visual and timing expectations.
What kind of input image works best?
While specifics depend on the underlying apps, you generally get better results with:
- A clear, front-facing portrait
- Good lighting and visible facial features
- Minimal obstructions (no heavy shadows or occluding objects)
The closer your input matches a clean studio headshot, the more natural the avatar movement and lipsync will tend to look.
Can I automate social media or marketing video production with this skill?
Yes. ai-avatar-video is well suited for:
- Generating recurring marketing updates with an AI presenter
- Creating social media talking-head clips from scripted audio
- Integrating with other CLI tools for resizing, captioning, or uploading
You can orchestrate the entire flow in Bash or your preferred automation tooling, using this skill as the avatar-generation step.
Is ai-avatar-video a full video editor?
No. ai-avatar-video focuses on generating AI avatar / talking-head segments from image + audio using inference.sh apps. It does not replace a full non-linear editor.
For full productions, treat the generated video as one asset in your editing timeline, and use your usual video editing tools for cuts, transitions, titles, and effects.
Where can I see or modify the skill definition?
The skill lives in the inferen-sh/skills repository under:
tools/video/ai-avatar-video
Open SKILL.md for the primary description and quick start. You can browse the directory tree in the repository to understand how this skill sits alongside other CLI tools for video workflows.
