
ai-avatar-video

by inferen-sh

Generate AI avatar and talking head videos from an image and audio track using the inference.sh CLI. ai-avatar-video wraps OmniHuman, Fabric, and PixVerse Lipsync apps for audio-driven avatars, lipsync videos, and virtual presenters, ideal for marketing, explainers, and social content workflows.

Added: Mar 27, 2026
Category: Video Editing
Install Command
npx skills add https://github.com/inferen-sh/skills --skill ai-avatar-video
Overview

What is ai-avatar-video?

ai-avatar-video is a CLI-focused skill for creating AI avatar and talking head videos using the inference.sh platform. It lets you send an image and an audio file to pre-built video apps (OmniHuman, Fabric, PixVerse Lipsync) and receive a rendered video where the avatar speaks and lip-syncs to your audio.

This skill is designed for Bash-based workflows and uses the infsh CLI under the hood.

Key capabilities

  • AI talking head generation from a single portrait image
  • Audio-driven avatars: map voiceover MP3/other supported audio to a digital human
  • Lipsync videos using dedicated lipsync models
  • Virtual presenters and AI presenters for explainers, product tours, or announcements
  • Model choice via inference.sh apps:
    • OmniHuman 1.5 – multi-character, higher quality
    • OmniHuman 1.0 – single-character avatar
    • Fabric 1.0 – “image talks” lipsync
    • PixVerse Lipsync – focused lipsync generation

Who is ai-avatar-video for?

ai-avatar-video is a good fit if you:

  • Produce marketing videos, short promos, or social media content
  • Need AI spokesperson or virtual presenter clips without hiring talent
  • Want to prototype digital humans or virtual influencers from still images
  • Prefer CLI and automation (Bash, scripting, CI pipelines) over manual web tools

It is less suitable if you:

  • Need a full video editor (timelines, effects, multi-track editing)
  • Require a purely offline workflow with no external API calls
  • Want a GUI-only solution instead of command-line tools

How it works at a glance

  1. Install and log in to the infsh CLI.
  2. Choose a model (e.g., bytedance/omnihuman-1-5).
  3. Provide an image_url and audio_url in JSON.
  4. Run infsh app run ... and download the resulting video.
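The four steps above can be sketched as a single shell snippet. The app ID and JSON field names come from the quick-start commands below; the asset URLs are hypothetical placeholders, and the infsh call is guarded so the sketch degrades to a dry run when the CLI is not installed:

```shell
# Hypothetical hosted assets; replace with your own URLs.
IMAGE_URL="https://example.com/portrait.jpg"
AUDIO_URL="https://example.com/speech.mp3"

# Build the JSON input the app expects.
payload="{\"image_url\": \"$IMAGE_URL\", \"audio_url\": \"$AUDIO_URL\"}"

# Run the app if the CLI is available; otherwise just show the command.
if command -v infsh >/dev/null 2>&1; then
  infsh app run bytedance/omnihuman-1-5 --input "$payload"
else
  echo "would run: infsh app run bytedance/omnihuman-1-5 --input $payload"
fi
```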

ai-avatar-video focuses on the video generation step and can be embedded inside larger automation or post-production pipelines.

How to Use

Installation and prerequisites

1. Install the skill

Use the skills CLI to add the skill to your environment:

npx skills add https://github.com/inferen-sh/skills --skill ai-avatar-video

This pulls the ai-avatar-video skill definition from the inferen-sh/skills repository under tools/video/ai-avatar-video.

2. Install the inference.sh CLI (infsh)

ai-avatar-video assumes you have the infsh CLI installed and available in your shell. Follow the official instructions:

  • CLI install guide: https://raw.githubusercontent.com/inference-sh/skills/refs/heads/main/cli-install.md

After installing, log in:

infsh login

You will be guided through authentication so the CLI can call inference.sh apps.

Basic workflow: create an AI avatar video

1. Prepare your media assets

  • Image: A clear, front-facing portrait image hosted at a reachable URL (e.g., https://example.com/portrait.jpg).
  • Audio: A speech or voiceover file (e.g., MP3) hosted at a reachable URL (e.g., https://example.com/speech.mp3).

You can use object storage, a web server, or any hosting that provides direct URLs.
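Confirming that each URL is directly fetchable before submitting a job can save a failed run. A minimal sketch using curl (the URLs are hypothetical placeholders):

```shell
# HEAD-request each asset URL; -f fails on HTTP errors, --max-time bounds the wait.
check_url() {
  if curl -fsI --max-time 10 "$1" >/dev/null 2>&1; then
    echo "OK: $1"
  else
    echo "UNREACHABLE: $1"
  fi
}

check_url "https://example.com/portrait.jpg"
check_url "https://example.com/speech.mp3"
```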

2. Run OmniHuman 1.5 for a high-quality avatar

Use the bytedance/omnihuman-1-5 app for multi-character and best-quality talking heads:

infsh app run bytedance/omnihuman-1-5 --input '{
  "image_url": "https://example.com/portrait.jpg",
  "audio_url": "https://example.com/speech.mp3"
}'

The CLI will process the request and print output information, typically including a URL where you can download the generated video.
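The exact output format depends on the CLI version, but if the result URL appears anywhere in stdout, a simple pattern match is enough to script the download. A sketch against a made-up sample output (the real format is defined by infsh):

```shell
# Hypothetical CLI output; the real format is defined by infsh.
sample_output='status: completed
video: https://example.com/outputs/avatar.mp4'

# Grab the first https URL that appears in the output.
video_url=$(printf '%s\n' "$sample_output" | grep -oE 'https://[^[:space:]"]+' | head -n 1)
echo "$video_url"

# Download step (uncomment once video_url comes from a real run):
# curl -L -o avatar.mp4 "$video_url"
```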

3. Try alternative models

Switch the app ID to explore different trade-offs.

OmniHuman 1.0 – single-character avatar

infsh app run bytedance/omnihuman-1-0 --input '{
  "image_url": "https://example.com/portrait.jpg",
  "audio_url": "https://example.com/speech.mp3"
}'

Fabric 1.0 – image talks with lipsync

infsh app run falai/fabric-1-0 --input '{
  "image_url": "https://example.com/portrait.jpg",
  "audio_url": "https://example.com/speech.mp3"
}'

PixVerse Lipsync – focused lipsync generation

infsh app run falai/pixverse-lipsync --input '{
  "image_url": "https://example.com/portrait.jpg",
  "audio_url": "https://example.com/speech.mp3"
}'

Choose the app based on your quality needs and output style. The exact options and outputs are defined by the respective inference.sh apps.
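Since all four apps accept the same image_url/audio_url input shape, a small wrapper makes switching models a one-argument change. A sketch (the dry-run branch is a fallback for when infsh is not on the PATH):

```shell
run_avatar() {
  # $1: app ID, $2: image URL, $3: audio URL
  local app="$1" image="$2" audio="$3" payload
  payload=$(printf '{"image_url": "%s", "audio_url": "%s"}' "$image" "$audio")
  if command -v infsh >/dev/null 2>&1; then
    infsh app run "$app" --input "$payload"
  else
    echo "would run: infsh app run $app --input $payload"
  fi
}

run_avatar falai/pixverse-lipsync "https://example.com/portrait.jpg" "https://example.com/speech.mp3"
```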

Integrating ai-avatar-video into workflows

Bash and CLI automation

ai-avatar-video is designed for Bash-based (infsh) use, so it fits well into scripts such as:

  • Batch-generating videos from a list of images and voiceovers
  • Nightly jobs that produce updated marketing or product videos
  • CI/CD steps that render release announcement videos when you tag a release

Example loop (conceptual):

# avatar_jobs.txt contains one job per line: an image URL and an audio URL,
# separated by whitespace.
while read -r image audio; do
  infsh app run bytedance/omnihuman-1-5 --input "{\"image_url\": \"$image\", \"audio_url\": \"$audio\"}"
done < avatar_jobs.txt
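Hand-escaping JSON inside double quotes breaks as soon as a URL contains a quote or backslash. If jq is available (an assumption; the skill itself does not require it), it builds the payload safely:

```shell
# Two sample jobs: image URL and audio URL, whitespace-separated, one per line.
cat > avatar_jobs.txt <<'EOF'
https://example.com/a.jpg https://example.com/a.mp3
https://example.com/b.jpg https://example.com/b.mp3
EOF

while read -r image audio; do
  # jq --arg escapes the values correctly; -c emits compact single-line JSON.
  payload=$(jq -cn --arg img "$image" --arg aud "$audio" \
    '{image_url: $img, audio_url: $aud}')
  echo "$payload"   # swap echo for: infsh app run bytedance/omnihuman-1-5 --input "$payload"
done < avatar_jobs.txt
```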

Combining with editing and publishing tools

The skill focuses on generating the talking-head clip. You can then:

  • Bring the output into a video editor for overlays, subtitles, or B-roll
  • Feed the clip into social media schedulers or marketing automation
  • Use accompanying skills (if available in your environment) for captioning or reformatting

Files and structure to inspect

After installing the skill from the repository, useful references include:

  • SKILL.md – Core description, quick start commands, and model overview
  • tools/video/ai-avatar-video/ – Location in the repo for context alongside other video tools

Reviewing these files will help you align your implementation with the intended usage patterns.

FAQ

When should I use ai-avatar-video instead of web-based avatar tools?

Use ai-avatar-video when you want CLI-first, scriptable control over avatar video generation. If you are comfortable with Bash and want to plug AI avatar creation into pipelines, build tools, or back-end services, this skill is a strong fit.

If you prefer to design everything visually in the browser and never touch a terminal, a purely web-based product may be more convenient.

Do I need the inference.sh CLI for ai-avatar-video?

Yes. The skill is built around the infsh CLI and the underlying inference.sh apps. You must:

  1. Install the CLI using the official instructions.
  2. Run infsh login.
  3. Use infsh app run ... commands as shown in the quick start.

Without the CLI, ai-avatar-video cannot call the models it relies on.
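In a script, a quick preflight check can fail early with a helpful message instead of erroring mid-pipeline:

```shell
# Report whether the CLI is present before doing any work.
if command -v infsh >/dev/null 2>&1; then
  msg="infsh found at: $(command -v infsh)"
else
  msg="infsh not found; install it, then authenticate with: infsh login"
fi
echo "$msg"
```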

Which model should I start with?

For most use cases, start with OmniHuman 1.5 (bytedance/omnihuman-1-5) because it is noted as multi-character and best quality.

You might choose alternatives when:

  • OmniHuman 1.0: You only need a simpler, single-character avatar.
  • Fabric 1.0: You want a straightforward “image talks with lipsync” style.
  • PixVerse Lipsync: You are primarily focused on lipsync behavior.

Experiment across a few clips to see which app fits your visual and timing expectations.

What kind of input image works best?

While specifics depend on the underlying apps, you generally get better results with:

  • A clear, front-facing portrait
  • Good lighting and visible facial features
  • Minimal obstructions (no heavy shadows or occluding objects)

The closer your input matches a clean studio headshot, the more natural the avatar movement and lipsync will tend to look.

Can I automate social media or marketing video production with this skill?

Yes. ai-avatar-video is well suited for:

  • Generating recurring marketing updates with an AI presenter
  • Creating social media talking-head clips from scripted audio
  • Integrating with other CLI tools for resizing, captioning, or uploading

You can orchestrate the entire flow in Bash or your preferred automation tooling, using this skill as the avatar-generation step.

Is ai-avatar-video a full video editor?

No. ai-avatar-video focuses on generating AI avatar / talking-head segments from image + audio using inference.sh apps. It does not replace a full non-linear editor.

For full productions, treat the generated video as one asset in your editing timeline, and use your usual video editing tools for cuts, transitions, titles, and effects.

Where can I see or modify the skill definition?

The skill lives in the inferen-sh/skills repository under:

  • tools/video/ai-avatar-video

Open SKILL.md for the primary description and quick start. You can browse the directory tree in the repository to understand how this skill sits alongside other CLI tools for video workflows.
