PackedAvatar

PackedAvatar is a self-contained talking-head generation runtime that bundles the SadTalker-based avatar pipeline into a single .pt artifact.

It supports generating animated talking avatars from:

  • a single image + audio
  • a prebuilt AvatarBank identity
  • explicit avatar conditioning bundles
  • motion transfer bundles
  • reference-video driving
  • optional Wav2Lip post-processing

All core runtime assets are packaged inside PackedAvatar.pt. A few auxiliary face-analysis helper weights may still be downloaded on the first run if they are not already cached locally.


What is included

PackedAvatar.pt contains:

  • SadTalker source code snapshot
  • SadTalker checkpoints
  • AvatarBank identity system
  • Bria RMBG 2.0 background removal assets
  • Wav2Lip GAN checkpoint
  • BFM / face model assets
  • configuration files
  • runtime manifests and hashes
  • cached avatar metadata

This is a runtime artifact, not a training checkpoint.


Repository contents

  • PackedAvatar.pt: full bundled runtime
  • PackedAvatar.py: loader + inference engine
  • requirements.txt: dependencies
  • README.md: usage guide

Features

  • Single-file deployment (.pt) for the main runtime
  • Full SadTalker pipeline bundled
  • AvatarBank identity system
  • Image / avatar / motion / video conditioning
  • Automatic background removal (Bria RMBG)
  • Optional Wav2Lip GAN post-processing
  • CPU / CUDA / Apple Silicon MPS
  • Automatic caching and extraction system
  • CLI + Python API support

Requirements

  • Python 3.10+
  • PyTorch
  • FFmpeg (for reference-video audio extraction)
  • Dependencies listed in requirements.txt

GPU is recommended; CPU is supported.


Quick start

1) Install dependencies

pip install -r requirements.txt

2) Place the bundle

Put PackedAvatar.pt in your working directory, or pass its full path to the PackedAvatar constructor.

3) Basic generation

from PackedAvatar import PackedAvatar

model = PackedAvatar("PackedAvatar.pt")

video = model.generate(
    source_image="person.jpg",
    driven_audio="speech.wav"
)

print(video)

AvatarBank usage

Generate directly from a prebuilt identity:

video = model.generate(
    avatar_id="Rebecca",
    driven_audio="speech.wav"
)

No source image is required for this path.

If avatar_id is omitted and neither source_image nor avatar_condition is given, the runtime selects a default avatar from the packed bank.


Prepacked AvatarBank table

The following avatars are prepacked in the bank.

Female

Style | Names
----- | -----
anime | Alison, Amber, Andrea, Angela, Christine, Cynthia, Heidi, Jennifer, Karla, Kristen, Laura, Nancy, Patricia, Rebecca, Sandra, Tara
cyber | Amanda, Brenda, Christina, Janet, Jill, Julie, Lisa, Mallory, Mandy, Martha, Melissa, Michelle, Regina
drawn | Alyssa, Danielle, Joan, Kaitlyn, Kimberly, Marie, Samantha, Veronica
paint | Alejandra, Barbara, Briana, Brittany, Emily, Jacqueline, Jodi, Mary, Rhonda, Savannah, Tammy, Victoria, Yolanda
real | Amy, Ann, Ashley, Colleen, Heather, Holly, Jordan, Kristin, Kristine, Mariah, Pamela, Sara, Sharon

Male

Style | Names
----- | -----
anime | Brad, Brian, David, Gregory, John, Jose, Lawrence, Robert
cyber | Daniel, Hayden, James, Jeremy, Paul, Ryan, Sean
drawn | Bobby, George, Gregg, Kevin, Matthew, Ricky, Thomas
paint | Jacob, Justin, Michael, Nicholas, Steven, William, Zachary
real | Aaron, Andrew, Benjamin, Christopher, Derek, Frank, Jesse, Joseph

There are 100 avatars in total: 63 female and 37 male.


Default avatar

If no avatar is explicitly selected, the runtime resolves a default in this order:

  1. defaults.default_avatar from the manifest, if present and valid
  2. first real-style male avatar
  3. any real-style avatar
  4. any male avatar
  5. any avatar with embeddings
  6. first available avatar entry
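The fallback order above can be sketched as a small function. This is a minimal illustration, not PackedAvatar's implementation: the entry format (avatar_id, gender, style, has_embeddings keys) and the helper name resolve_default_avatar are hypothetical, and "first" is assumed to mean first in bank order.

```python
from typing import Optional


def resolve_default_avatar(entries: list[dict], manifest_default: Optional[str] = None) -> Optional[str]:
    """Pick a default avatar ID following the documented fallback order.

    Each entry is a hypothetical dict such as
    {"avatar_id": "Aaron", "gender": "male", "style": "real", "has_embeddings": True}.
    """
    ids = [e["avatar_id"] for e in entries]

    # 1. defaults.default_avatar from the manifest, if present and valid
    if manifest_default is not None and manifest_default in ids:
        return manifest_default

    # 2-5. progressively looser filters, each scanned in bank order
    rules = [
        lambda e: e.get("style") == "real" and e.get("gender") == "male",
        lambda e: e.get("style") == "real",
        lambda e: e.get("gender") == "male",
        lambda e: bool(e.get("has_embeddings")),
    ]
    for rule in rules:
        for e in entries:
            if rule(e):
                return e["avatar_id"]

    # 6. first available entry (None for an empty bank)
    return ids[0] if ids else None
```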

Source image mode

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav"
)

Pipeline:

image → face detection → crop → 3DMM extraction → animation

Background removal (Bria RMBG)

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    remove_background=True
)

Pipeline:

image → Bria RMBG → foreground → SadTalker → video

Explicit avatar conditioning

avatar_condition may be:

  • a Python dict
  • a .pt / .pth file
  • a .mat file

When avatar_condition is provided, it supersedes source_image-driven conditioning.

video = model.generate(
    avatar_condition="my_avatar_condition.pt",
    driven_audio="speech.wav"
)

A valid avatar bundle can include fields such as:

  • coeff_3dmm
  • motion_3dmm
  • full_3dmm
  • crop_preview
  • crop_info
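The accepted avatar_condition forms can be told apart before anything is loaded. The sketch below uses a hypothetical helper, classify_avatar_condition, which is not part of PackedAvatar: it dispatches on type and file suffix and reports which of the fields listed above a dict bundle carries. It deliberately does not open .pt or .mat files; in practice that would need torch.load or scipy.io.loadmat.

```python
from pathlib import Path
from typing import Any

# Field names listed in this README; a real bundle may contain others.
KNOWN_FIELDS = ("coeff_3dmm", "motion_3dmm", "full_3dmm", "crop_preview", "crop_info")


def classify_avatar_condition(cond: Any) -> tuple[str, list[str]]:
    """Return (input kind, known fields present) for an avatar_condition value.

    Files are classified by suffix only; their contents are not loaded here.
    """
    if isinstance(cond, dict):
        # In-memory bundle: report which documented fields it provides.
        return "dict", [k for k in KNOWN_FIELDS if k in cond]
    suffix = Path(str(cond)).suffix.lower()
    if suffix in (".pt", ".pth"):
        return "torch", []
    if suffix == ".mat":
        return "mat", []
    raise ValueError(f"unsupported avatar_condition: {cond!r}")
```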

Motion conditioning

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    motion_condition="motion.pt"
)

Supported motion inputs include:

  • motion_3dmm
  • coeff_3dmm
  • full_3dmm_seq
  • full_3dmm

Reference-video driving

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    use_ref_video=True,
    ref_video="reference.mp4",
    ref_info="pose"
)

Supported ref_info values:

  • pose
  • blink
  • pose+blink
  • all

When ref_info="all", the runtime uses the reference video coefficients directly instead of predicting them from driven_audio.
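One way to read the ref_info values is as a switch over which signals come from the reference video. The sketch is loosely modeled on SadTalker's handling of this option as we understand it, which is an assumption about PackedAvatar's internals; RefPlan and plan_ref_video are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefPlan:
    pose_from_ref: bool   # head pose taken from the reference video
    blink_from_ref: bool  # eye blinks taken from the reference video
    use_ref_coeffs: bool  # reference coefficients replace audio-predicted ones


_PLANS = {
    "pose": RefPlan(True, False, False),
    "blink": RefPlan(False, True, False),
    "pose+blink": RefPlan(True, True, False),
    # "all": the whole coefficient sequence comes from the reference,
    # so pose and blinks follow the reference as a consequence.
    "all": RefPlan(True, True, True),
}


def plan_ref_video(ref_info: str) -> RefPlan:
    """Translate a ref_info value into which signals come from the reference video."""
    try:
        return _PLANS[ref_info]
    except KeyError:
        raise ValueError(f"ref_info must be one of {sorted(_PLANS)}, got {ref_info!r}") from None
```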


Wav2Lip GAN (optional)

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    use_wav2lip=True,
    wav2lip_repo="/path/to/Wav2Lip"
)

Post-processes the SadTalker output for improved lip sync.

The bundled checkpoint checkpoints/wav2lip_gan.pth is used automatically.

If no runnable Wav2Lip inference code is found, the runtime falls back to the SadTalker video instead of crashing.
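The fallback behavior can be sketched as a wrapper that returns the SadTalker video whenever refinement is unavailable. maybe_wav2lip and the run_wav2lip callable are hypothetical; treating an error raised during Wav2Lip inference as another fallback case is an assumption beyond what is documented here.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("packedavatar.sketch")


def maybe_wav2lip(sadtalker_video: str,
                  audio: str,
                  run_wav2lip: Optional[Callable[[str, str], str]]) -> str:
    """Return the Wav2Lip-refined video, or the SadTalker video if refinement is not possible."""
    if run_wav2lip is None:
        # No runnable Wav2Lip inference code was found: keep the SadTalker output.
        log.warning("Wav2Lip not available; returning SadTalker output")
        return sadtalker_video
    try:
        return run_wav2lip(sadtalker_video, audio)
    except Exception as exc:
        # Assumed extra safety net: degrade gracefully instead of crashing.
        log.warning("Wav2Lip failed (%s); returning SadTalker output", exc)
        return sadtalker_video
```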


Idle mode

Generate with silent audio instead of an input file; length_of_audio sets the clip length in seconds:

video = model.generate(
    avatar_id="Aaron",
    use_idle_mode=True,
    length_of_audio=4
)
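If you want a silent audio file of a given length yourself (for example, to try driven_audio with silence), the standard-library wave module is enough. The 16 kHz mono 16-bit format is an assumption, chosen because SadTalker processes audio at 16 kHz; write_silent_wav is a hypothetical helper.

```python
import wave


def write_silent_wav(path: str, seconds: float, sample_rate: int = 16000) -> int:
    """Write a mono 16-bit PCM WAV file of silence and return its frame count."""
    n_frames = int(round(seconds * sample_rate))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # zero amplitude = silence
    return n_frames
```

For example, write_silent_wav("silence.wav", 4) produces a 4-second file of 64000 frames.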

Still mode

Reduces head movement:

still_mode=True

Expression control

exp_scale=1.2
  • higher values → more expressive motion
  • lower values → more neutral motion
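To our understanding, SadTalker's expression_scale multiplies the predicted expression coefficients. The sketch assumes exp_scale behaves the same way, as a per-frame linear multiplier; scale_expression is a hypothetical helper and the nested lists stand in for real 3DMM coefficient tensors.

```python
def scale_expression(exp_coeffs: list[list[float]], exp_scale: float = 1.0) -> list[list[float]]:
    """Multiply every per-frame expression coefficient by exp_scale.

    exp_scale > 1 exaggerates expressions, exp_scale < 1 flattens them toward neutral (0).
    """
    if exp_scale < 0:
        raise ValueError("exp_scale must be non-negative")
    return [[c * exp_scale for c in frame] for frame in exp_coeffs]
```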

Face render backend

facerender="facevid2vid"

Selects the face rendering backend. SadTalker's renderers are facevid2vid and pirender.

Device selection

Automatically chooses:

  • CUDA when available
  • Apple Silicon MPS on macOS when available
  • CPU fallback otherwise

Override:

PackedAvatar("PackedAvatar.pt", device="cuda")
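The selection order above can be written as a pure function. pick_device is a hypothetical helper: in a real runtime the two flags would come from torch.cuda.is_available() and torch.backends.mps.is_available(); here they are plain arguments so the logic can be checked without PyTorch.

```python
import platform
from typing import Optional


def pick_device(cuda_available: bool,
                mps_available: bool,
                system: Optional[str] = None,
                override: Optional[str] = None) -> str:
    """Choose a device string: explicit override, then CUDA, then MPS on macOS, then CPU."""
    if override:
        return override
    if cuda_available:
        return "cuda"
    # MPS is only considered on macOS ("Darwin" in platform.system()).
    if (system or platform.system()) == "Darwin" and mps_available:
        return "mps"
    return "cpu"
```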

Python API (full example)

from PackedAvatar import PackedAvatar

model = PackedAvatar(
    packed_pt_path="PackedAvatar.pt",
    device="cuda",
    cache_dir="./cache"
)

video = model.generate(
    source_image="speaker.png",
    driven_audio="speech.wav",
    remove_background=True,
    use_wav2lip=True,
    size=512,
    exp_scale=1.2,
    pose_style=1,
    still_mode=False
)

print(video)

Preprocessing helpers

The runtime exposes an embedding extraction helper for image or video conditioning:

bundle = model.extract_embeddings(
    input_path="test_image.png",
    crop_or_resize="crop",
    pic_size=256
)

PascalCase alias:

bundle = model.ExtractEmbeddings("test_image.png")

The returned bundle can be saved and reused as avatar_condition or motion_condition.


CLI usage

Basic

python PackedAvatar.py \
  --source-image person.jpg \
  --driven-audio speech.wav

AvatarBank

python PackedAvatar.py \
  --avatar-id Rebecca \
  --driven-audio speech.wav

Background removal

python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --remove-background

Wav2Lip

python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --use-wav2lip \
  --wav2lip-repo /path/to/Wav2Lip

Reference video driving

python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --use-ref-video \
  --ref-video reference.mp4 \
  --ref-info pose+blink

Idle mode

python PackedAvatar.py \
  --avatar-id Aaron \
  --use-idle-mode \
  --length-of-audio 5

Explicit avatar conditioning bundle

python PackedAvatar.py \
  --avatar-condition avatar_condition.pt \
  --driven-audio speech.wav

Motion conditioning bundle

python PackedAvatar.py \
  --motion-condition motion_condition.pt \
  --driven-audio speech.wav

How it works

PackedAvatar runs a full multimodal pipeline.

1. Asset extraction

  • extracts SadTalker + checkpoints from .pt
  • verifies SHA256 hashes
  • builds the runtime cache
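Hash verification of extracted files can be sketched with hashlib, streaming each file in chunks so large checkpoints never have to fit in memory. The manifest shape (relative path mapped to a hex digest) and both helper names are assumptions, not PackedAvatar's actual format.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_extracted(root: Path, expected: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose SHA-256 does not match the manifest."""
    bad = []
    for rel, want in expected.items():
        f = root / rel
        if not f.is_file() or sha256_file(f) != want.lower():
            bad.append(rel)
    return bad
```

An empty list means every listed file is present and intact; any other result would mean re-extracting from the bundle.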

2. Avatar resolution

Priority:

avatar_condition
→ source_image-driven SadTalker path
→ avatar_id / default AvatarBank resolution

If avatar_condition is provided, it supersedes source_image conditioning.
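The priority order can be expressed as a short chain of checks. resolve_conditioning is a hypothetical helper that only reports which path would be taken; an avatar_id of None stands for the default-avatar fallback.

```python
from typing import Any, Optional


def resolve_conditioning(avatar_condition: Any = None,
                         source_image: Optional[str] = None,
                         avatar_id: Optional[str] = None) -> tuple[str, Any]:
    """Return (path taken, value) following the documented priority order."""
    if avatar_condition is not None:
        return "avatar_condition", avatar_condition  # supersedes source_image
    if source_image is not None:
        return "source_image", source_image          # SadTalker image-driven path
    return "avatar_bank", avatar_id                  # None -> default avatar
```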

3. Preprocessing

  • face detection
  • cropping
  • 3DMM extraction

4. Motion generation

  • audio → facial coefficients
  • or motion transfer injection

5. Rendering

  • SadTalker / PIRender animation
  • frame synthesis

6. Optional post-processing

  • Wav2Lip GAN lip-sync enhancement

AvatarBank API

PackedAvatar exposes the packed AvatarBank directly, allowing you to browse, search, inspect, and manage avatars at runtime.

The AvatarBank is loaded automatically when the model is initialized.

from PackedAvatar import PackedAvatar

model = PackedAvatar("PackedAvatar.pt")

List available avatars

avatars = model.list_avatars()

print(avatars)

Returns:

[
    "Aaron",
    "Rebecca",
    "Amy",
    ...
]

Search by name

Perform a fuzzy search against avatar IDs.

matches = model.fuzzy_search_avatar("rebeca")

print(matches)

Example:

["Rebecca"]
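A fuzzy ID search of this kind can be approximated with difflib from the standard library. This is a sketch of the idea only: fuzzy_search is a hypothetical helper, and PackedAvatar's actual matcher, scoring, and cutoff may differ.

```python
import difflib


def fuzzy_search(query: str, ids: list[str], n: int = 5, cutoff: float = 0.6) -> list[str]:
    """Case-insensitive fuzzy match of query against avatar IDs, best matches first."""
    # Compare lowercase forms, but return the original spelling of each ID.
    lowered = {i.lower(): i for i in ids}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[h] for h in hits]
```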

Query by gender and style

Search the bank using metadata filters.

female_anime = model.query_avatars(
    gender="female",
    style="anime"
)

print(female_anime)

Example:

[
    "Alison",
    "Amber",
    "Andrea",
    ...
]

Fuzzy matching is supported:

model.query_avatars(
    gender="femal",
    style="anim"
)

Retrieve avatar metadata

metadata = model.get_avatar_metadata("Rebecca")

print(metadata)

Example:

{
    "gender": "female",
    "style": "anime"
}

Retrieve avatar conditioning

Access the full conditioning bundle used internally by PackedAvatar.

avatar = model.get_avatar("Rebecca")

Returned fields may include:

{
    "avatar_id": "Rebecca",
    "gender": "female",
    "style": "anime",
    "coeff_3dmm": ...,
    "motion_3dmm": ...,
    "full_3dmm": ...,
    "crop_info": ...,
    "crop_preview": ...
}

This bundle can be passed directly as an avatar_condition.

video = model.generate(
    avatar_condition=avatar,
    driven_audio="speech.wav"
)

Retrieve avatar previews

Preview images stored in the AvatarBank can be accessed directly.

preview = model.get_avatar_preview("Rebecca")

preview.show()

Returns a PIL image.


Delete avatars

Remove an avatar from the in-memory runtime.

model.delete_avatar("Rebecca")

This only affects the current runtime session.

To persist changes, save the bank.


Save AvatarBank changes

model.save_avatar_bank("ModifiedAvatarBank.pt")

This writes the current AvatarBank state to disk.


Check whether an avatar exists

exists = model.avatar_bank.exists("Rebecca")

print(exists)

Returns:

True

Direct AvatarBank access

The underlying AvatarBank runtime is also exposed.

bank = model.avatar_bank

Available methods include:

bank.available_ids()
bank.list_avatars()
bank.exists(...)
bank.query(...)
bank.fuzzy_search_id(...)
bank.get_avatar(...)
bank.get_metadata(...)
bank.get_preview(...)
bank.delete_avatar(...)
bank.save(...)

This provides full programmatic access to the packed avatar database.


First run vs later runs

First run

  • extract bundle
  • build cache
  • initialize models
  • download a couple of auxiliary face-analysis weights if they are not already cached locally

Later runs

  • reuse cache
  • skip the auxiliary downloads when the files are already present
  • faster startup
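Cache reuse, together with the rule that the cache is invalidated when the bundle changes, can be sketched as a marker file recording which bundle digest the cache was built from. The marker file name bundle.sha256 and both helpers are hypothetical; the digest itself could come from hashing PackedAvatar.pt.

```python
from pathlib import Path

MARKER = "bundle.sha256"  # hypothetical marker file name


def cache_needs_rebuild(cache_dir: Path, bundle_digest: str) -> bool:
    """True when the cache is missing or was built from a different bundle."""
    marker = cache_dir / MARKER
    return not marker.is_file() or marker.read_text().strip() != bundle_digest


def mark_cache_built(cache_dir: Path, bundle_digest: str) -> None:
    """Record which bundle digest the freshly extracted cache belongs to."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / MARKER).write_text(bundle_digest)
```

On startup, a runtime following this pattern would extract only when cache_needs_rebuild returns True, then call mark_cache_built.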

Performance notes

  • GPU is strongly recommended for 512 resolution
  • CPU is supported but slower
  • Wav2Lip increases runtime cost
  • RMBG adds preprocessing overhead

Why PackedAvatar?

Compared to a standard SadTalker setup:

  • single .pt deployment artifact
  • no model downloads for the main runtime
  • no external repos required for core use
  • built-in AvatarBank system
  • built-in background removal
  • optional lip-sync enhancement
  • fully offline execution after first-run helper caching
  • reproducible runtime via bundle hashing

Notes

  • This repo is inference-only
  • Bundles are treated as trusted artifacts
  • Cache is auto-invalidated when the bundle changes
  • Bundled model assets and source code are resolved internally; the Python packages in requirements.txt and FFmpeg must still be installed

Credits

Built on top of:

  • SadTalker
  • FaceVid2Vid / PIRender
  • Wav2Lip GAN
  • Bria RMBG