PackedAvatar

PackedAvatar is a self-contained talking-head generation runtime that bundles the SadTalker-based avatar pipeline into a single .pt artifact.

It supports generating animated talking avatars from:

a single image + audio
a prebuilt AvatarBank identity
explicit avatar conditioning bundles
motion transfer bundles
reference-video driving
optional Wav2Lip post-processing

All core runtime assets are packaged inside PackedAvatar.pt.

Core model assets are bundled, but a few auxiliary helper weights may still be downloaded on the first run if they are not already cached locally.

What is included

PackedAvatar.pt contains:

SadTalker source code snapshot
SadTalker checkpoints
AvatarBank identity system
Bria RMBG 2.0 background removal assets
Wav2Lip GAN checkpoint
BFM / face model assets
configuration files
runtime manifests and hashes
cached avatar metadata

This is a runtime artifact, not a training checkpoint.

Repository contents

PackedAvatar.pt — full bundled runtime
PackedAvatar.py — loader + inference engine
requirements.txt — dependencies
README.md — usage guide

Features

Single-file deployment (.pt) for the main runtime
Full SadTalker pipeline bundled
AvatarBank identity system
Image / avatar / motion / video conditioning
Automatic background removal (Bria RMBG)
Optional Wav2Lip GAN post-processing
CPU, CUDA, and Apple Silicon (MPS) support
Automatic caching and extraction system
CLI + Python API support

Requirements

Python 3.10+
PyTorch
FFmpeg (for reference-video audio extraction)
Dependencies listed in requirements.txt

GPU is recommended; CPU is supported.

Quick start

1) Install dependencies

pip install -r requirements.txt

2) Place the bundle

Put PackedAvatar.pt in your working directory, or pass its full path when constructing PackedAvatar.

3) Basic generation

from PackedAvatar import PackedAvatar

model = PackedAvatar("PackedAvatar.pt")

video = model.generate(
    source_image="person.jpg",
    driven_audio="speech.wav"
)

print(video)

AvatarBank usage

Generate directly from a prebuilt identity:

video = model.generate(
    avatar_id="Rebecca",
    driven_audio="speech.wav"
)

No source image is required for this path.

If avatar_id is omitted, the runtime selects a default avatar from the packed bank.

Prepacked AvatarBank table

The following avatars are prepacked in the bank.

Female

Style	Names
anime	Alison, Amber, Andrea, Angela, Christine, Cynthia, Heidi, Jennifer, Karla, Kristen, Laura, Nancy, Patricia, Rebecca, Sandra, Tara
cyber	Amanda, Brenda, Christina, Janet, Jill, Julie, Lisa, Mallory, Mandy, Martha, Melissa, Michelle, Regina
drawn	Alyssa, Danielle, Joan, Kaitlyn, Kimberly, Marie, Samantha, Veronica
paint	Alejandra, Barbara, Briana, Brittany, Emily, Jacqueline, Jodi, Mary, Rhonda, Savannah, Tammy, Victoria, Yolanda
real	Amy, Ann, Ashley, Colleen, Heather, Holly, Jordan, Kristin, Kristine, Mariah, Pamela, Sara, Sharon

Male

Style	Names
anime	Brad, Brian, David, Gregory, John, Jose, Lawrence, Robert
cyber	Daniel, Hayden, James, Jeremy, Paul, Ryan, Sean
drawn	Bobby, George, Gregg, Kevin, Matthew, Ricky, Thomas
paint	Jacob, Justin, Michael, Nicholas, Steven, William, Zachary
real	Aaron, Andrew, Benjamin, Christopher, Derek, Frank, Jesse, Joseph

There are 100 avatars total in the bank.

Default avatar

If no avatar is explicitly selected, the runtime resolves a default in this order:

defaults.default_avatar from the manifest, if present and valid
first real-style male avatar
any real-style avatar
any male avatar
any avatar with embeddings
first available avatar entry
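The fallback order above can be sketched as a small standalone function. This is only an illustration of the documented order, not the runtime's own code: the entry layout (dicts with avatar_id, gender, style) and the has_embeddings flag are assumptions made for the example.

```python
from typing import Optional


def resolve_default_avatar(
    entries: list[dict], manifest_default: Optional[str] = None
) -> Optional[str]:
    """Pick a default avatar id following the documented fallback order."""
    known_ids = {entry["avatar_id"] for entry in entries}

    # 1. defaults.default_avatar from the manifest, if present and valid
    if manifest_default in known_ids:
        return manifest_default

    # 2-6. progressively looser rules; the first matching entry wins
    rules = [
        lambda e: e.get("style") == "real" and e.get("gender") == "male",
        lambda e: e.get("style") == "real",
        lambda e: e.get("gender") == "male",
        lambda e: bool(e.get("has_embeddings")),  # hypothetical flag
        lambda e: True,  # first available entry
    ]
    for rule in rules:
        for entry in entries:
            if rule(entry):
                return entry["avatar_id"]
    return None  # empty bank
```

With the prepacked bank, a rule set like this would land on one of the real-style male avatars unless the manifest names a valid default.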

Source image mode

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav"
)

Pipeline:

image → face detection → crop → 3DMM extraction → animation

Background removal (Bria RMBG)

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    remove_background=True
)

Pipeline:

image → Bria RMBG → foreground → SadTalker → video

Explicit avatar conditioning

avatar_condition may be:

a Python dict
a .pt / .pth file
a .mat file

When avatar_condition is provided, it supersedes source_image-driven conditioning.

video = model.generate(
    avatar_condition="my_avatar_condition.pt",
    driven_audio="speech.wav"
)

A valid avatar bundle can include fields such as:

coeff_3dmm
motion_3dmm
full_3dmm
crop_preview
crop_info

Motion conditioning

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    motion_condition="motion.pt"
)

Supported motion inputs include:

motion_3dmm
coeff_3dmm
full_3dmm_seq
full_3dmm

Reference-video driving

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    use_ref_video=True,
    ref_video="reference.mp4",
    ref_info="pose"
)

Supported ref_info values:

pose
blink
pose+blink
all

When ref_info="all", the runtime uses the reference video coefficients directly.

Wav2Lip GAN (optional)

video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    use_wav2lip=True,
    wav2lip_repo="/path/to/Wav2Lip"
)

Wav2Lip post-processes the SadTalker output for improved lip sync. wav2lip_repo should point to a local Wav2Lip checkout that provides the inference code.

The bundled checkpoint checkpoints/wav2lip_gan.pth is used automatically.

If no runnable Wav2Lip inference code is found, the runtime falls back to the SadTalker video instead of crashing.

Idle mode

Generate with silent audio instead of an input file:

video = model.generate(
    avatar_id="Aaron",
    use_idle_mode=True,
    length_of_audio=4
)
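For comparison, a silent clip of the same length can be written by hand and passed as driven_audio. The sketch below uses only the standard library; the 16 kHz mono 16-bit format is an assumption, and idle mode may build its silence differently internally.

```python
import wave


def write_silent_wav(path: str, seconds: float, sample_rate: int = 16000) -> int:
    """Write a mono 16-bit PCM WAV of pure silence; return the frame count."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # zero samples are silence
    return n_frames
```

Calling write_silent_wav("idle.wav", 4) produces a four-second file that can stand in for speech.wav in the other examples.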

Still mode

Reduces head movement. Pass it as a generate() argument:

still_mode=True

Expression control

exp_scale=1.2

higher values → more expressive motion
lower values → more neutral motion

Face render backend

facerender="facevid2vid"

Device selection

Automatically chooses:

CUDA when available
Apple Silicon MPS on macOS when available
CPU fallback otherwise

Override:

PackedAvatar("PackedAvatar.pt", device="cuda")
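The automatic choice can be reproduced outside the runtime, for example to log which device a job will use. This is a sketch of the documented preference order, not the loader's actual code; pick_device and detect_device are hypothetical helper names.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the documented preference order: CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)
```

The result of detect_device() can be passed as the device argument when you want the choice to be explicit.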

Python API (full example)

from PackedAvatar import PackedAvatar

model = PackedAvatar(
    packed_pt_path="PackedAvatar.pt",
    device="cuda",
    cache_dir="./cache"
)

video = model.generate(
    source_image="speaker.png",
    driven_audio="speech.wav",
    remove_background=True,
    use_wav2lip=True,
    size=512,
    exp_scale=1.2,
    pose_style=1,
    still_mode=False
)

print(video)

Preprocessing helpers

The runtime exposes an embedding extraction helper for image or video conditioning:

bundle = model.extract_embeddings(
    input_path="test_image.png",
    crop_or_resize="crop",
    pic_size=256
)

PascalCase alias:

bundle = model.ExtractEmbeddings("test_image.png")

The returned bundle can be saved and reused as avatar_condition or motion_condition.

CLI usage

Basic

python PackedAvatar.py \
  --source-image person.jpg \
  --driven-audio speech.wav

AvatarBank

python PackedAvatar.py \
  --avatar-id Rebecca \
  --driven-audio speech.wav

Background removal

python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --remove-background

Wav2Lip

python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --use-wav2lip \
  --wav2lip-repo /path/to/Wav2Lip

Reference video driving

python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --use-ref-video \
  --ref-video reference.mp4 \
  --ref-info pose+blink

Idle mode

python PackedAvatar.py \
  --avatar-id Aaron \
  --use-idle-mode \
  --length-of-audio 5

Explicit avatar conditioning bundle

python PackedAvatar.py \
  --avatar-condition avatar_condition.pt \
  --driven-audio speech.wav

Motion conditioning bundle

python PackedAvatar.py \
  --motion-condition motion_condition.pt \
  --driven-audio speech.wav
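When the CLI is driven from another script, the command line can be assembled from keyword options. The sketch below relies on the pattern visible in the examples above, where each flag is the Python argument name with underscores replaced by hyphens; build_command is a hypothetical helper, and flags not shown in this README are not guaranteed to exist.

```python
import shlex
import sys


def build_command(script: str = "PackedAvatar.py", **options) -> list[str]:
    """Turn keyword options into an argv list for the PackedAvatar CLI."""
    cmd = [sys.executable, script]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)                 # boolean switch, no value
        elif value is False or value is None:
            continue                         # option left off entirely
        else:
            cmd.extend([flag, str(value)])   # flag followed by its value
    return cmd


cmd = build_command(
    source_image="portrait.png",
    driven_audio="speech.wav",
    use_ref_video=True,
    ref_video="reference.mp4",
    ref_info="pose+blink",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building a list rather than a shell string avoids quoting problems with paths that contain spaces.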

How it works

PackedAvatar runs the following stages in order.

1. Asset extraction

extracts the SadTalker source and checkpoints from the .pt bundle
verifies SHA256 hashes
builds the runtime cache
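The hash check in this stage can be pictured with a generic sketch. The bundle's real manifest layout is not documented here, so the mapping of relative paths to hex digests below is an assumption, as are the function names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_extracted(root: Path, expected: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose hash differs."""
    bad = []
    for rel_path, want in expected.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != want.lower():
            bad.append(rel_path)
    return bad
```

An empty return value means every listed file is present and unmodified; anything else is a reason to re-extract.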

2. Avatar resolution

Priority:

avatar_condition
→ source_image-driven SadTalker path
→ avatar_id / default AvatarBank resolution

If avatar_condition is provided, it supersedes source_image conditioning.
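The priority chain reads as a simple first-match rule. The function below is an illustrative sketch of that rule only; resolve_conditioning and its return labels are hypothetical and not part of the PackedAvatar API.

```python
def resolve_conditioning(avatar_condition=None, source_image=None, avatar_id=None):
    """Return (kind, value) for the input that wins under the documented priority."""
    if avatar_condition is not None:
        return ("avatar_condition", avatar_condition)  # highest priority
    if source_image is not None:
        return ("source_image", source_image)          # SadTalker image path
    if avatar_id is not None:
        return ("avatar_id", avatar_id)                # named AvatarBank entry
    return ("default_avatar", None)                    # fall back to bank default
```

So a call that supplies both an avatar_condition and a source_image is conditioned on the bundle, not the image.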

3. Preprocessing

face detection
cropping
3DMM extraction

4. Motion generation

audio → facial coefficients
or motion transfer injection

5. Rendering

animation with SadTalker's face renderer (FaceVid2Vid or PIRender)
frame synthesis

6. Optional post-processing

Wav2Lip GAN lip-sync enhancement

AvatarBank API

PackedAvatar exposes the packed AvatarBank directly, allowing you to browse, search, inspect, and manage avatars at runtime.

The AvatarBank is loaded automatically when the model is initialized.

from PackedAvatar import PackedAvatar

model = PackedAvatar("PackedAvatar.pt")

List available avatars

avatars = model.list_avatars()

print(avatars)

Returns:

[
    "Aaron",
    "Rebecca",
    "Amy",
    ...
]

Search by name

Perform a fuzzy search against avatar IDs.

matches = model.fuzzy_search_avatar("rebeca")

print(matches)

Example:

["Rebecca"]
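To get a feel for this kind of matching, here is a standalone sketch built on the standard library's difflib. The algorithm and threshold that fuzzy_search_avatar actually uses are not documented, so treat the cutoff and the case-insensitive comparison as assumptions.

```python
import difflib


def fuzzy_search(
    query: str, avatar_ids: list[str], limit: int = 3, cutoff: float = 0.6
) -> list[str]:
    """Case-insensitive closest-match lookup over a list of avatar ids."""
    # Compare in lowercase, then map hits back to their original spelling
    lowered = {name.lower(): name for name in avatar_ids}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]


print(fuzzy_search("rebeca", ["Aaron", "Rebecca", "Amy"]))  # ['Rebecca']
```

Raising cutoff makes the search stricter; a query with nothing close returns an empty list.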

Query by gender and style

Search the bank using metadata filters.

female_anime = model.query_avatars(
    gender="female",
    style="anime"
)

print(female_anime)

Example:

[
    "Alison",
    "Amber",
    "Andrea",
    ...
]

Fuzzy matching is supported:

model.query_avatars(
    gender="femal",
    style="anim"
)

Retrieve avatar metadata

metadata = model.get_avatar_metadata("Rebecca")

print(metadata)

Example:

{
    "gender": "female",
    "style": "anime"
}

Retrieve avatar conditioning

Access the full conditioning bundle used internally by PackedAvatar.

avatar = model.get_avatar("Rebecca")

Returned fields may include:

{
    "avatar_id": "Rebecca",
    "gender": "female",
    "style": "anime",
    "coeff_3dmm": ...,
    "motion_3dmm": ...,
    "full_3dmm": ...,
    "crop_info": ...,
    "crop_preview": ...
}

This bundle can be passed directly as an avatar_condition.

video = model.generate(
    avatar_condition=avatar,
    driven_audio="speech.wav"
)

Retrieve avatar previews

Preview images stored in the AvatarBank can be accessed directly.

preview = model.get_avatar_preview("Rebecca")

preview.show()

Returns a PIL image.

Delete avatars

Remove an avatar from the in-memory runtime.

model.delete_avatar("Rebecca")

This only affects the current runtime session.

To persist changes, save the bank.

Save AvatarBank changes

model.save_avatar_bank("ModifiedAvatarBank.pt")

This writes the current AvatarBank state to disk.
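Deleting and saving combine naturally into a pruning step, for example to ship a smaller bank with a single style. The sketch below uses only the model methods documented in this section; keep_only is a hypothetical helper, and it assumes query_avatars and list_avatars return lists of avatar ids as in the examples.

```python
def keep_only(model, gender: str, style: str, out_path: str) -> list[str]:
    """Delete every avatar outside one gender/style group, then save the bank."""
    keep = set(model.query_avatars(gender=gender, style=style))
    removed = []
    # Copy the id list first so deletions do not disturb the iteration
    for avatar_id in list(model.list_avatars()):
        if avatar_id not in keep:
            model.delete_avatar(avatar_id)
            removed.append(avatar_id)
    model.save_avatar_bank(out_path)  # persist the in-memory changes
    return removed
```

For instance, keep_only(model, "female", "anime", "FemaleAnimeBank.pt") would leave only that group in the saved file while the original bundle stays untouched.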

Check whether an avatar exists

exists = model.avatar_bank.exists("Rebecca")

print(exists)

Returns:

True

Direct AvatarBank access

The underlying AvatarBank runtime is also exposed.

bank = model.avatar_bank

Available methods include:

bank.available_ids()
bank.list_avatars()
bank.exists(...)
bank.query(...)
bank.fuzzy_search_id(...)
bank.get_avatar(...)
bank.get_metadata(...)
bank.get_preview(...)
bank.delete_avatar(...)
bank.save(...)

This provides full programmatic access to the packed avatar database.

First run vs later runs

First run

extract bundle
build cache
initialize models
download a couple of auxiliary face-analysis weights if they are not already cached locally

Later runs

reuse cache
skip the auxiliary downloads when the files are already present
faster startup

Performance notes

GPU is strongly recommended for 512 resolution
CPU is supported but slower
Wav2Lip increases runtime cost
RMBG adds preprocessing overhead

Why PackedAvatar?

Compared to a standard SadTalker setup:

single .pt deployment artifact
no model downloads for the main runtime
no external repos required for core use
built-in AvatarBank system
built-in background removal
optional lip-sync enhancement
fully offline execution after first-run helper caching
reproducible runtime via bundle hashing

Notes

This repo is inference-only
Bundles are treated as trusted artifacts
Cache is auto-invalidated when the bundle changes
All runtime dependencies are resolved internally

Credits

Built on top of:

SadTalker
FaceVid2Vid / PIRender
Wav2Lip GAN
Bria RMBG
