
ai-avatar-video

by inferen-sh

Generate AI avatar and talking head videos from an image and audio track using the inference.sh CLI. ai-avatar-video wraps OmniHuman, Fabric, and PixVerse Lipsync apps for audio-driven avatars, lipsync videos, and virtual presenters, ideal for marketing, explainers, and social content workflows.

Added: Mar 27, 2026
Category: Video Editing
Install Command
npx skills add https://github.com/inferen-sh/skills --skill ai-avatar-video
Overview

What is ai-avatar-video?

ai-avatar-video is a CLI-focused skill for creating AI avatar and talking head videos using the inference.sh platform. It lets you send an image and an audio file to pre-built video apps (OmniHuman, Fabric, PixVerse Lipsync) and receive a rendered video where the avatar speaks and lip-syncs to your audio.

This skill is designed for Bash-based workflows and uses the infsh CLI under the hood.

Key capabilities

  • AI talking head generation from a single portrait image
  • Audio-driven avatars: map voiceover MP3/other supported audio to a digital human
  • Lipsync videos using dedicated lipsync models
  • Virtual presenters and AI presenters for explainers, product tours, or announcements
  • Model choice via inference.sh apps:
    • OmniHuman 1.5 – multi-character, higher quality
    • OmniHuman 1.0 – single-character avatar
    • Fabric 1.0 – “image talks” lipsync
    • PixVerse Lipsync – focused lipsync generation

Who is ai-avatar-video for?

ai-avatar-video is a good fit if you:

  • Produce marketing videos, short promos, or social media content
  • Need AI spokesperson or virtual presenter clips without hiring talent
  • Want to prototype digital humans or virtual influencers from still images
  • Prefer CLI and automation (Bash, scripting, CI pipelines) over manual web tools

It is less suitable if you:

  • Need a full video editor (timelines, effects, multi-track editing)
  • Require a purely offline workflow with no external API calls
  • Want a GUI-only solution instead of command-line tools

How it works at a glance

  1. Install and log in to the infsh CLI.
  2. Choose a model (e.g., bytedance/omnihuman-1-5).
  3. Provide an image_url and audio_url in JSON.
  4. Run infsh app run ... and download the resulting video.
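The four steps above can be sketched as a single shell snippet. The app ID and JSON field names come from the quick-start commands below; the asset URLs are hypothetical placeholders, and the infsh call is guarded so the sketch degrades to a dry run when the CLI is not installed:

```shell
# Hypothetical hosted assets; replace with your own URLs.
IMAGE_URL="https://example.com/portrait.jpg"
AUDIO_URL="https://example.com/speech.mp3"

# Build the JSON input the app expects.
payload="{\"image_url\": \"$IMAGE_URL\", \"audio_url\": \"$AUDIO_URL\"}"

# Run the app if the CLI is available; otherwise just show the command.
if command -v infsh >/dev/null 2>&1; then
  infsh app run bytedance/omnihuman-1-5 --input "$payload"
else
  echo "would run: infsh app run bytedance/omnihuman-1-5 --input $payload"
fi
```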

ai-avatar-video focuses on the video generation step and can be embedded inside larger automation or post-production pipelines.

How to Use

Installation and prerequisites

1. Install the skill

Use the skills CLI to add the skill to your environment:

npx skills add https://github.com/inferen-sh/skills --skill ai-avatar-video

This pulls the ai-avatar-video skill definition from the inferen-sh/skills repository under tools/video/ai-avatar-video.

2. Install the inference.sh CLI (infsh)

ai-avatar-video assumes you have the infsh CLI installed and available in your shell. Follow the official instructions:

  • CLI install guide: https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md

After installing, log in:

infsh login

You will be guided through authentication so the CLI can call inference.sh apps.

Basic workflow: create an AI avatar video

1. Prepare your media assets

  • Image: A clear, front-facing portrait image hosted at a reachable URL (e.g., https://example.com/portrait.jpg).
  • Audio: A speech or voiceover file (e.g., MP3) hosted at a reachable URL (e.g., https://example.com/speech.mp3).

You can use object storage, a web server, or any hosting that provides direct URLs.
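Confirming that each URL is directly fetchable before submitting a job can save a failed run. A minimal sketch using curl (the URLs are hypothetical placeholders):

```shell
# HEAD-request each asset URL; -f fails on HTTP errors, --max-time bounds the wait.
check_url() {
  if curl -fsI --max-time 10 "$1" >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "UNREACHABLE: $1"
  fi
}

check_url "https://example.com/portrait.jpg"
check_url "https://example.com/speech.mp3"
```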

2. Run OmniHuman 1.5 for a high-quality avatar

Use the bytedance/omnihuman-1-5 app for multi-character and best-quality talking heads:

infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://example.com/portrait.jpg",
  "audio_url": "https://example.com/speech.mp3"
}'

The CLI will process the request and print output information, typically including a URL where you can download the generated video.
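The exact output format depends on the CLI version, but if the result URL appears anywhere in stdout, a simple pattern match is enough to script the download. A sketch against a made-up sample output (the real format is defined by infsh):

```shell
# Hypothetical CLI output; the real format is defined by infsh.
sample_output='status: completed
video: https://example.com/outputs/avatar.mp4'

# Grab the first https URL that appears in the output.
video_url=$(printf '%s\n' "$sample_output" | grep -oE 'https://[^[:space:]"]+' | head -n 1)
echo "$video_url"

# Download step (uncomment once video_url comes from a real run):
# curl -L -o avatar.mp4 "$video_url"
```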

3. Try alternative models

Switch the app ID to explore different trade-offs.

OmniHuman 1.0 – single-character avatar

infsh app run bytedance/omnihuman-1-0 --input '{
  "image_url": "https://example.com/portrait.jpg",
  "audio_url": "https://example.com/speech.mp3"
}'

Fabric 1.0 – image talks with lipsync

infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://example.com/portrait.jpg",
  "audio_url": "https://example.com/speech.mp3"
}'

PixVerse Lipsync – focused lipsync generation

infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://example.com/portrait.jpg",
  "audio_url": "https://example.com/speech.mp3"
}'

Choose the app based on your quality needs and output style. The exact options and outputs are defined by the respective inference.sh apps.
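Since all four apps accept the same image_url/audio_url input shape, a small wrapper makes switching models a one-argument change. A sketch (the dry-run branch is a fallback for when infsh is not on the PATH):

```shell
run_avatar() {
  # $1: app ID, $2: image URL, $3: audio URL
  local app="$1" image="$2" audio="$3" payload
  payload=$(printf '{"image_url": "%s", "audio_url": "%s"}' "$image" "$audio")
  if command -v infsh >/dev/null 2>&1; then
    infsh app run "$app" --input "$payload"
  else
    echo "would run: infsh app run $app --input $payload"
  fi
}

run_avatar falai/pixverse-lipsync "https://example.com/portrait.jpg" "https://example.com/speech.mp3"
```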

Integrating ai-avatar-video into workflows

Bash and CLI automation

ai-avatar-video is designed for Bash-based (infsh) use, so it fits well into scripts such as:

  • Batch-generating videos from a list of images and voiceovers
  • Nightly jobs that produce updated marketing or product videos
  • CI/CD steps that render release announcement videos when you tag a release

Example loop (conceptual):

# avatar_jobs.txt contains one job per line: an image URL and an audio URL,
# separated by whitespace.
while read -r image audio; do
  infsh app run bytedance/omnihuman-1-5 --input "{\"image_url\": \"$image\", \"audio_url\": \"$audio\"}"
done < avatar_jobs.txt
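Hand-escaping JSON inside double quotes breaks as soon as a URL contains a quote or backslash. If jq is available (an assumption; the skill itself does not require it), it builds the payload safely:

```shell
# Two sample jobs: image URL and audio URL, whitespace-separated, one per line.
cat > avatar_jobs.txt <<'EOF'
https://example.com/a.jpg https://example.com/a.mp3
https://example.com/b.jpg https://example.com/b.mp3
EOF

while read -r image audio; do
  # jq --arg escapes the values correctly; -c emits compact single-line JSON.
  payload=$(jq -cn --arg img "$image" --arg aud "$audio" \
    '{image_url: $img, audio_url: $aud}')
  echo "$payload"   # swap echo for: infsh app run bytedance/omnihuman-1-5 --input "$payload"
done < avatar_jobs.txt
```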

Combining with editing and publishing tools

The skill focuses on generating the talking-head clip. You can then:

  • Bring the output into a video editor for overlays, subtitles, or B-roll
  • Feed the clip into social media schedulers or marketing automation
  • Use accompanying skills (if available in your environment) for captioning or reformatting

Files and structure to inspect

After installing the skill from the repository, useful references include:

  • SKILL.md – Core description, quick start commands, and model overview
  • tools/video/ai-avatar-video/ – Location in the repo for context alongside other video tools

Reviewing these files will help you align your implementation with the intended usage patterns.

FAQ

When should I use ai-avatar-video instead of web-based avatar tools?

Use ai-avatar-video when you want CLI-first, scriptable control over avatar video generation. If you are comfortable with Bash and want to plug AI avatar creation into pipelines, build tools, or back-end services, this skill is a strong fit.

If you prefer to design everything visually in the browser and never touch a terminal, a purely web-based product may be more convenient.

Do I need the inference.sh CLI for ai-avatar-video?

Yes. The skill is built around the infsh CLI and the underlying inference.sh apps. You must:

  1. Install the CLI using the official instructions.
  2. Run infsh login.
  3. Use infsh app run ... commands as shown in the quick start.

Without the CLI, ai-avatar-video cannot call the models it relies on.
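In a script, a quick preflight check can fail early with a helpful message instead of erroring mid-pipeline:

```shell
# Report whether the CLI is present before doing any work.
if command -v infsh >/dev/null 2>&1; then
  msg="infsh found at: $(command -v infsh)"
else
  msg="infsh not found; install it, then authenticate with: infsh login"
fi
echo "$msg"
```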

Which model should I start with?

For most use cases, start with OmniHuman 1.5 (bytedance/omnihuman-1-5) because it is noted as multi-character and best quality.

You might choose alternatives when:

  • OmniHuman 1.0: You only need a simpler, single-character avatar.
  • Fabric 1.0: You want a straightforward “image talks with lipsync” style.
  • PixVerse Lipsync: You are primarily focused on lipsync behavior.

Experiment across a few clips to see which app fits your visual and timing expectations.

What kind of input image works best?

While specifics depend on the underlying apps, you generally get better results with:

  • A clear, front-facing portrait
  • Good lighting and visible facial features
  • Minimal obstructions (no heavy shadows or occluding objects)

The closer your input matches a clean studio headshot, the more natural the avatar movement and lipsync will tend to look.

Can I automate social media or marketing video production with this skill?

Yes. ai-avatar-video is well suited for:

  • Generating recurring marketing updates with an AI presenter
  • Creating social media talking-head clips from scripted audio
  • Integrating with other CLI tools for resizing, captioning, or uploading

You can orchestrate the entire flow in Bash or your preferred automation tooling, using this skill as the avatar-generation step.

Is ai-avatar-video a full video editor?

No. ai-avatar-video focuses on generating AI avatar / talking-head segments from image + audio using inference.sh apps. It does not replace a full non-linear editor.

For full productions, treat the generated video as one asset in your editing timeline, and use your usual video editing tools for cuts, transitions, titles, and effects.

Where can I see or modify the skill definition?

The skill lives in the inferen-sh/skills repository under:

  • tools/video/ai-avatar-video

Open SKILL.md for the primary description and quick start. You can browse the directory tree in the repository to understand how this skill sits alongside other CLI tools for video workflows.
