- PackedAvatar
- What is included
- Repository contents
- Features
- Requirements
- Quick start
- AvatarBank usage
- Prepacked AvatarBank table
- Default avatar
- Source image mode
- Background removal (Bria RMBG)
- Explicit avatar conditioning
- Motion conditioning
- Reference-video driving
- Wav2Lip GAN (optional)
- Idle mode
- Still mode
- Expression control
- Face render backend
- Device selection
- Python API (full example)
- Preprocessing helpers
- CLI usage
- How it works
- AvatarBank API
- First run vs later runs
- Performance notes
- Why PackedAvatar?
- Notes
- Credits
PackedAvatar
PackedAvatar is a self-contained talking-head generation runtime that bundles the SadTalker-based avatar pipeline into a single .pt artifact.
It supports generating animated talking avatars from:
- a single image + audio
- a prebuilt AvatarBank identity
- explicit avatar conditioning bundles
- motion transfer bundles
- reference-video driving
- optional Wav2Lip post-processing
All core runtime assets are packaged inside PackedAvatar.pt.
Core model assets are bundled, but a few auxiliary helper weights may still be downloaded on the first run if they are not already cached locally.
What is included
PackedAvatar.pt contains:
- SadTalker source code snapshot
- SadTalker checkpoints
- AvatarBank identity system
- Bria RMBG 2.0 background removal assets
- Wav2Lip GAN checkpoint
- BFM / face model assets
- configuration files
- runtime manifests and hashes
- cached avatar metadata
This is a runtime artifact, not a training checkpoint.
Repository contents
- PackedAvatar.pt: full bundled runtime
- PackedAvatar.py: loader + inference engine
- requirements.txt: dependencies
- README.md: usage guide
Features
- Single-file deployment (.pt) for the main runtime
- Full SadTalker pipeline bundled
- AvatarBank identity system
- Image / avatar / motion / video conditioning
- Automatic background removal (Bria RMBG)
- Optional Wav2Lip GAN post-processing
- CPU / CUDA / Apple Silicon MPS support
- Automatic caching and extraction system
- CLI + Python API support
Requirements
- Python 3.10+
- PyTorch
- FFmpeg (for reference-video audio extraction)
- Dependencies listed in requirements.txt
GPU is recommended; CPU is supported.
Quick start
1) Install dependencies
pip install -r requirements.txt
2) Place the bundle
PackedAvatar.pt
3) Basic generation
from PackedAvatar import PackedAvatar
model = PackedAvatar("PackedAvatar.pt")
video = model.generate(
source_image="person.jpg",
driven_audio="speech.wav"
)
print(video)
AvatarBank usage
Generate directly from a prebuilt identity:
video = model.generate(
avatar_id="Rebecca",
driven_audio="speech.wav"
)
No source image is required for this path.
If avatar_id is omitted, the runtime selects a default avatar from the packed bank.
Prepacked AvatarBank table
The following avatars are prepacked in the bank.
Female
| Style | Names |
|---|---|
| anime | Alison, Amber, Andrea, Angela, Christine, Cynthia, Heidi, Jennifer, Karla, Kristen, Laura, Nancy, Patricia, Rebecca, Sandra, Tara |
| cyber | Amanda, Brenda, Christina, Janet, Jill, Julie, Lisa, Mallory, Mandy, Martha, Melissa, Michelle, Regina |
| drawn | Alyssa, Danielle, Joan, Kaitlyn, Kimberly, Marie, Samantha, Veronica |
| paint | Alejandra, Barbara, Briana, Brittany, Emily, Jacqueline, Jodi, Mary, Rhonda, Savannah, Tammy, Victoria, Yolanda |
| real | Amy, Ann, Ashley, Colleen, Heather, Holly, Jordan, Kristin, Kristine, Mariah, Pamela, Sara, Sharon |
Male
| Style | Names |
|---|---|
| anime | Brad, Brian, David, Gregory, John, Jose, Lawrence, Robert |
| cyber | Daniel, Hayden, James, Jeremy, Paul, Ryan, Sean |
| drawn | Bobby, George, Gregg, Kevin, Matthew, Ricky, Thomas |
| paint | Jacob, Justin, Michael, Nicholas, Steven, William, Zachary |
| real | Aaron, Andrew, Benjamin, Christopher, Derek, Frank, Jesse, Joseph |
There are 100 avatars total in the bank.
Default avatar
If no avatar is explicitly selected, the runtime resolves a default in this order:
- defaults.default_avatar from the manifest, if present and valid
- first real-style male avatar
- any real-style avatar
- any male avatar
- any avatar with embeddings
- first available avatar entry
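The fallback order above can be sketched as a small pure function. This is an illustration only, not the runtime's actual code: the entry layout and the has_embeddings flag are assumptions, while avatar_id, gender, and style mirror the metadata fields shown elsewhere in this README.

```python
from typing import Optional

def resolve_default_avatar(entries: list[dict], manifest_default: Optional[str] = None) -> Optional[str]:
    """Pick a default avatar ID using the documented fallback order."""
    known_ids = {entry["avatar_id"] for entry in entries}
    # 1. Manifest default, if present and valid.
    if manifest_default in known_ids:
        return manifest_default
    # 2-6. Progressively looser rules; the first matching entry wins.
    rules = [
        lambda e: e.get("style") == "real" and e.get("gender") == "male",
        lambda e: e.get("style") == "real",
        lambda e: e.get("gender") == "male",
        lambda e: bool(e.get("has_embeddings")),  # hypothetical flag
        lambda e: True,                            # first available entry
    ]
    for rule in rules:
        for entry in entries:
            if rule(entry):
                return entry["avatar_id"]
    return None  # empty bank

bank = [
    {"avatar_id": "Rebecca", "gender": "female", "style": "anime", "has_embeddings": True},
    {"avatar_id": "Amy", "gender": "female", "style": "real", "has_embeddings": True},
    {"avatar_id": "Aaron", "gender": "male", "style": "real", "has_embeddings": True},
]
print(resolve_default_avatar(bank))
```

With this sample bank, the real-style male rule selects Aaron unless a valid manifest default is supplied.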
Source image mode
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav"
)
Pipeline:
image → face detection → crop → 3DMM extraction → animation
Background removal (Bria RMBG)
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav",
remove_background=True
)
Pipeline:
image → Bria RMBG → foreground → SadTalker → video
Explicit avatar conditioning
avatar_condition may be:
- a Python dict
- a .pt / .pth file
- a .mat file
When avatar_condition is provided, it supersedes source_image-driven conditioning.
video = model.generate(
avatar_condition="my_avatar_condition.pt",
driven_audio="speech.wav"
)
A valid avatar bundle can include fields such as:
- coeff_3dmm
- motion_3dmm
- full_3dmm
- crop_preview
- crop_info
Motion conditioning
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav",
motion_condition="motion.pt"
)
Supported motion inputs include:
- motion_3dmm
- coeff_3dmm
- full_3dmm_seq
- full_3dmm
Reference-video driving
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav",
use_ref_video=True,
ref_video="reference.mp4",
ref_info="pose"
)
Supported ref_info values:
- pose
- blink
- pose+blink
- all
When ref_info="all", the runtime uses the reference video coefficients directly.
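The Requirements section lists FFmpeg for reference-video audio extraction. As a rough sketch of what such a step can look like, the helper below builds a standard FFmpeg command that drops the video stream and writes mono WAV audio. The function names and the 16 kHz sample rate are assumptions for illustration; the runtime's internal command may differ.

```python
import subprocess

def build_audio_extract_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that extracts mono audio from a video."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", video_path,         # input reference video
        "-vn",                    # discard the video stream
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample (16 kHz is an assumption)
        wav_path,
    ]

def extract_audio(video_path: str, wav_path: str) -> None:
    """Run the command; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_audio_extract_cmd(video_path, wav_path), check=True)

print(" ".join(build_audio_extract_cmd("reference.mp4", "reference.wav")))
```

Running extract_audio requires the ffmpeg binary on PATH.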
Wav2Lip GAN (optional)
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav",
use_wav2lip=True,
wav2lip_repo="/path/to/Wav2Lip"
)
Post-processes the SadTalker output for improved lip sync.
The bundled checkpoint checkpoints/wav2lip_gan.pth is used automatically.
If no runnable Wav2Lip inference code is found, the runtime falls back to the SadTalker video instead of crashing.
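The fallback behaviour can be pictured with the sketch below. It assumes "runnable inference code" means an inference.py script at the root of the Wav2Lip checkout (the upstream Wav2Lip repository ships one); the helper names and the run_wav2lip callback are hypothetical and not part of the PackedAvatar API.

```python
from pathlib import Path
from typing import Callable, Optional

def find_wav2lip_inference(repo_dir: Optional[str]) -> Optional[Path]:
    """Return the path to inference.py inside repo_dir, or None if absent."""
    if not repo_dir:
        return None
    script = Path(repo_dir) / "inference.py"
    return script if script.is_file() else None

def postprocess_with_fallback(
    video_path: str,
    repo_dir: Optional[str],
    run_wav2lip: Callable[[Path, str], str],
) -> str:
    """Apply Wav2Lip when possible; otherwise keep the SadTalker video."""
    script = find_wav2lip_inference(repo_dir)
    if script is None:
        return video_path  # graceful fallback instead of an error
    return run_wav2lip(script, video_path)

# No repo configured, so the original video path is returned unchanged.
print(postprocess_with_fallback("sadtalker.mp4", None, lambda s, v: "synced.mp4"))
```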
Idle mode
Generate with silent audio instead of an input file:
video = model.generate(
avatar_id="Aaron",
use_idle_mode=True,
length_of_audio=4
)
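Conceptually, idle mode stands in for a silent audio track of the requested length. The sketch below writes such a track with the standard-library wave module. The helper name, the 16 kHz mono 16-bit format, and the idea of writing a file at all are assumptions; the runtime may synthesize silence differently.

```python
import wave

def write_silent_wav(path: str, seconds: float, sample_rate: int = 16000) -> int:
    """Write a mono 16-bit WAV containing only silence; return the frame count."""
    n_frames = int(round(seconds * sample_rate))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)  # assumed sample rate
        wav.writeframes(b"\x00\x00" * n_frames)  # zero-valued samples
    return n_frames
```

A file produced this way could also be passed as driven_audio when an explicit silent input is preferred over use_idle_mode.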
Still mode
Reduces head movement:
still_mode=True
Expression control
exp_scale=1.2
- higher values → more expressive motion
- lower values → more neutral motion
Face render backend
facerender="facevid2vid"
Device selection
Automatically chooses:
- CUDA when available
- Apple Silicon MPS on macOS when available
- CPU fallback otherwise
Override:
PackedAvatar("PackedAvatar.pt", device="cuda")
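The selection order can be written as a small pure function plus a thin PyTorch wrapper. This is a sketch under assumptions, not the loader's code: choose_device and detect_device are hypothetical names, while torch.cuda.is_available() and torch.backends.mps.is_available() are the standard PyTorch checks.

```python
from typing import Optional

def choose_device(cuda_available: bool, mps_available: bool, override: Optional[str] = None) -> str:
    """Apply the documented priority: override, then CUDA, then MPS, then CPU."""
    if override:
        return override
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

def detect_device(override: Optional[str] = None) -> str:
    """Query PyTorch for availability; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return choose_device(False, False, override)
    mps_backend = getattr(torch.backends, "mps", None)  # absent on old builds
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return choose_device(torch.cuda.is_available(), mps_ok, override)

print(choose_device(cuda_available=False, mps_available=True))
```

Keeping the priority logic separate from the hardware probe makes the ordering easy to check without a GPU.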
Python API (full example)
from PackedAvatar import PackedAvatar
model = PackedAvatar(
packed_pt_path="PackedAvatar.pt",
device="cuda",
cache_dir="./cache"
)
video = model.generate(
source_image="speaker.png",
driven_audio="speech.wav",
remove_background=True,
use_wav2lip=True,
size=512,
exp_scale=1.2,
pose_style=1,
still_mode=False
)
print(video)
Preprocessing helpers
The runtime exposes an embedding extraction helper for image or video conditioning:
bundle = model.extract_embeddings(
input_path="test_image.png",
crop_or_resize="crop",
pic_size=256
)
Camel-case alias:
bundle = model.ExtractEmbeddings("test_image.png")
The returned bundle can be saved and reused as avatar_condition or motion_condition.
CLI usage
Basic
python PackedAvatar.py \
--source-image person.jpg \
--driven-audio speech.wav
AvatarBank
python PackedAvatar.py \
--avatar-id Rebecca \
--driven-audio speech.wav
Background removal
python PackedAvatar.py \
--source-image portrait.png \
--driven-audio speech.wav \
--remove-background
Wav2Lip
python PackedAvatar.py \
--source-image portrait.png \
--driven-audio speech.wav \
--use-wav2lip \
--wav2lip-repo /path/to/Wav2Lip
Reference video driving
python PackedAvatar.py \
--source-image portrait.png \
--driven-audio speech.wav \
--use-ref-video \
--ref-video reference.mp4 \
--ref-info pose+blink
Idle mode
python PackedAvatar.py \
--avatar-id Aaron \
--use-idle-mode \
--length-of-audio 5
Explicit avatar conditioning bundle
python PackedAvatar.py \
--avatar-condition avatar_condition.pt \
--driven-audio speech.wav
Motion conditioning bundle
python PackedAvatar.py \
--motion-condition motion_condition.pt \
--driven-audio speech.wav
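Each CLI flag above corresponds to a generate() keyword argument with dashes replaced by underscores. The argparse sketch below illustrates that mapping. It is not the parser shipped in PackedAvatar.py: argument types and defaults (for example, float for --length-of-audio) are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering the flags shown in this README."""
    p = argparse.ArgumentParser(prog="PackedAvatar.py")
    for name in ("--source-image", "--driven-audio", "--avatar-id",
                 "--avatar-condition", "--motion-condition",
                 "--wav2lip-repo", "--ref-video"):
        p.add_argument(name)
    for flag in ("--remove-background", "--use-wav2lip",
                 "--use-ref-video", "--use-idle-mode"):
        p.add_argument(flag, action="store_true")
    p.add_argument("--ref-info", choices=["pose", "blink", "pose+blink", "all"])
    p.add_argument("--length-of-audio", type=float)  # type is an assumption
    return p

def cli_to_generate_kwargs(argv: list[str]) -> dict:
    """Parse argv and keep only the options the user actually set."""
    args = vars(build_parser().parse_args(argv))
    return {k: v for k, v in args.items() if v is not None and v is not False}

print(cli_to_generate_kwargs(["--avatar-id", "Aaron", "--use-idle-mode", "--length-of-audio", "5"]))
```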
How it works
PackedAvatar runs a six-stage pipeline, from asset extraction to optional post-processing.
1. Asset extraction
- extracts SadTalker + checkpoints from .pt
- verifies SHA256 hashes
- builds the runtime cache
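The hash-verification step can be sketched with hashlib. The manifest layout assumed here (a mapping from relative file path to hex digest) and both function names are hypothetical; the real manifest format inside the bundle is not documented in this README.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_hash_mismatches(root, expected: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest differs."""
    bad = []
    for rel_path, want in expected.items():
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of(target) != want.lower():
            bad.append(rel_path)
    return sorted(bad)
```

An empty result means every extracted file matches its recorded digest; any listed path would call for re-extraction.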
2. Avatar resolution
Priority:
avatar_condition
→ source_image-driven SadTalker path
→ avatar_id / default AvatarBank resolution
If avatar_condition is provided, it supersedes source_image conditioning.
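As a minimal sketch of this priority, the function below reports which identity source would win for a given set of arguments. The function name and the returned labels are hypothetical and only illustrate the ordering.

```python
from typing import Any, Optional

def pick_identity_source(
    avatar_condition: Optional[Any] = None,
    source_image: Optional[str] = None,
    avatar_id: Optional[str] = None,
) -> str:
    """Return the label of the highest-priority identity input that was given."""
    if avatar_condition is not None:
        return "avatar_condition"   # explicit bundle always wins
    if source_image is not None:
        return "source_image"       # image-driven SadTalker path
    if avatar_id is not None:
        return "avatar_id"          # named AvatarBank identity
    return "default_avatar"         # fall back to the bank's default

print(pick_identity_source(avatar_condition={"coeff_3dmm": None}, source_image="portrait.png"))
```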
3. Preprocessing
- face detection
- cropping
- 3DMM extraction
4. Motion generation
- audio → facial coefficients
- or motion transfer injection
5. Rendering
- SadTalker / PIRender animation
- frame synthesis
6. Optional post-processing
- Wav2Lip GAN lip-sync enhancement
AvatarBank API
PackedAvatar exposes the packed AvatarBank directly, allowing you to browse, search, inspect, and manage avatars at runtime.
The AvatarBank is loaded automatically when the model is initialized.
from PackedAvatar import PackedAvatar
model = PackedAvatar("PackedAvatar.pt")
List available avatars
avatars = model.list_avatars()
print(avatars)
Returns:
[
"Aaron",
"Rebecca",
"Amy",
...
]
Search by name
Perform a fuzzy search against avatar IDs.
matches = model.fuzzy_search_avatar("rebeca")
print(matches)
Example:
["Rebecca"]
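For intuition, a comparable lookup can be built with the standard-library difflib module. This is a sketch of the general technique, not the matcher PackedAvatar uses; the function name, the 0.6 cutoff, and the result limit are assumptions.

```python
import difflib

def fuzzy_search_id(query: str, ids: list[str], limit: int = 3, cutoff: float = 0.6) -> list[str]:
    """Return up to `limit` IDs that resemble `query`, ignoring case."""
    by_lower = {avatar_id.lower(): avatar_id for avatar_id in ids}
    hits = difflib.get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]  # map back to the original casing

print(fuzzy_search_id("rebeca", ["Aaron", "Rebecca", "Amy"]))
```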
Query by gender and style
Search the bank using metadata filters.
female_anime = model.query_avatars(
gender="female",
style="anime"
)
print(female_anime)
Example:
[
"Alison",
"Amber",
"Andrea",
...
]
Fuzzy matching is supported:
model.query_avatars(
gender="femal",
style="anim"
)
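One way to picture fuzzy metadata filtering: each filter value is first snapped to the closest known value, then an exact match is applied. The sketch below assumes a bank shaped as a mapping from avatar ID to a metadata dict with gender and style keys; the helper names are hypothetical and the real query logic may differ.

```python
import difflib
from typing import Optional

def snap(value: str, allowed: list[str]) -> Optional[str]:
    """Map a possibly misspelled value to the closest allowed value."""
    hits = difflib.get_close_matches(value.lower(), allowed, n=1, cutoff=0.6)
    return hits[0] if hits else None

def query_bank(bank: dict[str, dict], gender: Optional[str] = None, style: Optional[str] = None) -> list[str]:
    """Return sorted avatar IDs whose metadata matches the (fuzzy) filters."""
    filters = {}
    for key, raw in (("gender", gender), ("style", style)):
        if raw is None:
            continue
        allowed = sorted({meta[key] for meta in bank.values()})
        snapped = snap(raw, allowed)
        if snapped is None:
            return []  # nothing resembles the requested value
        filters[key] = snapped
    return sorted(
        avatar_id for avatar_id, meta in bank.items()
        if all(meta[key] == want for key, want in filters.items())
    )

sample = {
    "Alison": {"gender": "female", "style": "anime"},
    "Brad": {"gender": "male", "style": "anime"},
    "Amy": {"gender": "female", "style": "real"},
}
print(query_bank(sample, gender="femal", style="anim"))
```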
Retrieve avatar metadata
metadata = model.get_avatar_metadata("Rebecca")
print(metadata)
Example:
{
"gender": "female",
"style": "anime"
}
Retrieve avatar conditioning
Access the full conditioning bundle used internally by PackedAvatar.
avatar = model.get_avatar("Rebecca")
Returned fields may include:
{
"avatar_id": "Rebecca",
"gender": "female",
"style": "anime",
"coeff_3dmm": ...,
"motion_3dmm": ...,
"full_3dmm": ...,
"crop_info": ...,
"crop_preview": ...
}
This bundle can be passed directly as an avatar_condition.
video = model.generate(
avatar_condition=avatar,
driven_audio="speech.wav"
)
Retrieve avatar previews
Preview images stored in the AvatarBank can be accessed directly.
preview = model.get_avatar_preview("Rebecca")
preview.show()
Returns a PIL image.
Delete avatars
Remove an avatar from the in-memory runtime.
model.delete_avatar("Rebecca")
This only affects the current runtime session.
To persist changes, save the bank.
Save AvatarBank changes
model.save_avatar_bank("ModifiedAvatarBank.pt")
This writes the current AvatarBank state to disk.
Check whether an avatar exists
exists = model.avatar_bank.exists("Rebecca")
print(exists)
Returns:
True
Direct AvatarBank access
The underlying AvatarBank runtime is also exposed.
bank = model.avatar_bank
Available methods include:
bank.available_ids()
bank.list_avatars()
bank.exists(...)
bank.query(...)
bank.fuzzy_search_id(...)
bank.get_avatar(...)
bank.get_metadata(...)
bank.get_preview(...)
bank.delete_avatar(...)
bank.save(...)
This provides full programmatic access to the packed avatar database.
First run vs later runs
First run
- extract bundle
- build cache
- initialize models
- download a couple of auxiliary face-analysis weights if they are not already cached locally
Later runs
- reuse cache
- skip the auxiliary downloads when the files are already present
- faster startup
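The first-run versus later-run behaviour, together with invalidation when the bundle changes, can be sketched with a fingerprint stamp stored next to the extracted files. The stamp file name, the function names, and the extract callback are hypothetical; the runtime's actual cache layout is not documented here.

```python
import hashlib
from pathlib import Path
from typing import Callable

STAMP_NAME = "bundle.sha256"  # hypothetical stamp file inside the cache dir

def bundle_fingerprint(bundle_path: str) -> str:
    """SHA256 of the bundle file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(bundle_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def ensure_cache(bundle_path: str, cache_dir: str, extract: Callable[[str, str], None]) -> bool:
    """Extract only when the cache is missing or stale; return True if extraction ran."""
    fingerprint = bundle_fingerprint(bundle_path)
    stamp = Path(cache_dir) / STAMP_NAME
    if stamp.is_file() and stamp.read_text().strip() == fingerprint:
        return False  # later run: reuse the existing cache
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    extract(bundle_path, cache_dir)  # first run, or the bundle changed
    stamp.write_text(fingerprint)    # record what the cache was built from
    return True
```

Writing the stamp only after extraction succeeds means an interrupted first run is retried on the next start.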
Performance notes
- GPU is strongly recommended for 512 resolution
- CPU is supported but slower
- Wav2Lip increases runtime cost
- RMBG adds preprocessing overhead
Why PackedAvatar?
Compared to a standard SadTalker setup:
- single .pt deployment artifact
- no model downloads for the main runtime
- no external repos required for core use
- built-in AvatarBank system
- built-in background removal
- optional lip-sync enhancement
- fully offline execution after first-run helper caching
- reproducible runtime via bundle hashing
Notes
- This repo is inference-only
- Bundles are treated as trusted artifacts
- Cache is auto-invalidated when the bundle changes
- All bundled runtime assets (code, checkpoints, configs) are resolved internally; Python packages still come from requirements.txt
Credits
Built on top of:
- SadTalker
- FaceVid2Vid / PIRender
- Wav2Lip GAN
- Bria RMBG