---
license: apache-2.0
language:
- en
tags:
- talking-head
- face-animation
- avatar
- image-to-video
- audio-to-video
- motion-transfer
- lip-sync
- face-synthesis
- video-generation
- generative-ai
- multimodal
- pytorch
- sad-talker
- wav2lip
- rmbg
- packed-model
---
# PackedAvatar
PackedAvatar is a **self-contained talking-head generation runtime** that bundles the SadTalker-based avatar pipeline into a single `.pt` artifact.
It supports generating animated talking avatars from:
* a single image + audio
* a prebuilt AvatarBank identity
* explicit avatar conditioning bundles
* motion transfer bundles
* reference-video driving
* optional Wav2Lip post-processing
All core runtime assets are packaged inside `PackedAvatar.pt`.
A few auxiliary helper weights may still be downloaded on the first run if they are not already cached locally.
---
# What is included
`PackedAvatar.pt` contains:
* SadTalker source code snapshot
* SadTalker checkpoints
* AvatarBank identity system
* Bria RMBG 2.0 background removal assets
* Wav2Lip GAN checkpoint
* BFM / face model assets
* configuration files
* runtime manifests and hashes
* cached avatar metadata
This is a **runtime artifact**, not a training checkpoint.
---
# Repository contents
* `PackedAvatar.pt` — full bundled runtime
* `PackedAvatar.py` — loader + inference engine
* `requirements.txt` — dependencies
* `README.md` — usage guide
---
# Features
* Single-file deployment (`.pt`) for the main runtime
* Full SadTalker pipeline bundled
* AvatarBank identity system
* Image / avatar / motion / video conditioning
* Automatic background removal (Bria RMBG)
* Optional Wav2Lip GAN post-processing
* CPU, CUDA, and Apple Silicon MPS execution
* Automatic caching and extraction system
* CLI + Python API support
---
# Requirements
* Python 3.10+
* PyTorch
* FFmpeg (for reference-video audio extraction)
* Dependencies listed in `requirements.txt`
GPU is recommended; CPU is supported.
---
# Quick start
## 1) Install dependencies
```bash
pip install -r requirements.txt
```
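## 2) Place the bundle
Put the bundle where the loader can find it. The examples below load it by relative path from the working directory:
```text
PackedAvatar.pt
```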
## 3) Basic generation
```python
from PackedAvatar import PackedAvatar
model = PackedAvatar("PackedAvatar.pt")
video = model.generate(
source_image="person.jpg",
driven_audio="speech.wav"
)
print(video)
```
---
# AvatarBank usage
Generate directly from a prebuilt identity:
```python
video = model.generate(
avatar_id="Rebecca",
driven_audio="speech.wav"
)
```
No source image is required for this path.
If `avatar_id` is omitted, the runtime selects a default avatar from the packed bank.
---
# Prepacked AvatarBank table
The following avatars are prepacked in the bank.
## Female
| Style | Names |
| ----- | --------------------------------------------------------------------------------------------------------------------------------- |
| anime | Alison, Amber, Andrea, Angela, Christine, Cynthia, Heidi, Jennifer, Karla, Kristen, Laura, Nancy, Patricia, Rebecca, Sandra, Tara |
| cyber | Amanda, Brenda, Christina, Janet, Jill, Julie, Lisa, Mallory, Mandy, Martha, Melissa, Michelle, Regina |
| drawn | Alyssa, Danielle, Joan, Kaitlyn, Kimberly, Marie, Samantha, Veronica |
| paint | Alejandra, Barbara, Briana, Brittany, Emily, Jacqueline, Jodi, Mary, Rhonda, Savannah, Tammy, Victoria, Yolanda |
| real | Amy, Ann, Ashley, Colleen, Heather, Holly, Jordan, Kristin, Kristine, Mariah, Pamela, Sara, Sharon |
## Male
| Style | Names |
| ----- | ----------------------------------------------------------------- |
| anime | Brad, Brian, David, Gregory, John, Jose, Lawrence, Robert |
| cyber | Daniel, Hayden, James, Jeremy, Paul, Ryan, Sean |
| drawn | Bobby, George, Gregg, Kevin, Matthew, Ricky, Thomas |
| paint | Jacob, Justin, Michael, Nicholas, Steven, William, Zachary |
| real | Aaron, Andrew, Benjamin, Christopher, Derek, Frank, Jesse, Joseph |
There are **100 avatars total** in the bank.
---
# Default avatar
If no avatar is explicitly selected, the runtime resolves a default in this order:
1. `defaults.default_avatar` from the manifest, if present and valid
2. first real-style male avatar
3. any real-style avatar
4. any male avatar
5. any avatar with embeddings
6. first available avatar entry
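The fallback order above can be sketched as a chain of progressively looser predicates. This is a minimal illustration, not the runtime's actual code: the metadata keys (`gender`, `style`, `has_embeddings`) and the function name `resolve_default_avatar` are assumptions made for the example.

```python
from typing import Optional


def resolve_default_avatar(bank: dict, manifest_default: Optional[str] = None) -> Optional[str]:
    """Return the first avatar id that satisfies the documented fallback order.

    `bank` maps avatar id -> metadata dict. The keys "gender", "style" and
    "has_embeddings" are hypothetical names chosen for this sketch.
    """
    # 1. Manifest default, if present and valid (i.e. it exists in the bank).
    if manifest_default in bank:
        return manifest_default

    # 2-5. Try each rule in order; the first avatar matching a rule wins.
    rules = [
        lambda m: m.get("style") == "real" and m.get("gender") == "male",
        lambda m: m.get("style") == "real",
        lambda m: m.get("gender") == "male",
        lambda m: bool(m.get("has_embeddings")),
    ]
    for rule in rules:
        for avatar_id, meta in bank.items():
            if rule(meta):
                return avatar_id

    # 6. First available entry, or None when the bank is empty.
    return next(iter(bank), None)


bank = {
    "Rebecca": {"gender": "female", "style": "anime", "has_embeddings": True},
    "Amy": {"gender": "female", "style": "real", "has_embeddings": True},
    "Aaron": {"gender": "male", "style": "real", "has_embeddings": True},
}
print(resolve_default_avatar(bank))             # Aaron
print(resolve_default_avatar(bank, "Rebecca"))  # Rebecca
```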
---
# Source image mode
```python
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav"
)
```
Pipeline:
```text
image → face detection → crop → 3DMM extraction → animation
```
---
# Background removal (Bria RMBG)
```python
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav",
remove_background=True
)
```
Pipeline:
```text
image → Bria RMBG → foreground → SadTalker → video
```
---
# Explicit avatar conditioning
`avatar_condition` may be:
* a Python `dict`
* a `.pt` / `.pth` file
* a `.mat` file
When `avatar_condition` is provided, it supersedes `source_image`-driven conditioning.
```python
video = model.generate(
avatar_condition="my_avatar_condition.pt",
driven_audio="speech.wav"
)
```
A valid avatar bundle can include fields such as:
* `coeff_3dmm`
* `motion_3dmm`
* `full_3dmm`
* `crop_preview`
* `crop_info`
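Before passing a value as `avatar_condition`, it can help to check which of the three documented input kinds it is. The helper below is a hypothetical pre-flight check written for this guide; it only inspects types and file suffixes and does not load anything, and `describe_condition` is not part of the PackedAvatar API.

```python
from pathlib import Path
from typing import Union

# Field names listed above as possible members of an avatar bundle.
KNOWN_FIELDS = ("coeff_3dmm", "motion_3dmm", "full_3dmm", "crop_preview", "crop_info")


def describe_condition(condition: Union[dict, str, Path]) -> str:
    """Classify an avatar_condition value as dict, torch file, or .mat file."""
    if isinstance(condition, dict):
        present = [name for name in KNOWN_FIELDS if name in condition]
        return "dict with fields: " + (", ".join(present) or "none of the known fields")
    suffix = Path(condition).suffix.lower()
    if suffix in (".pt", ".pth"):
        return "torch file"
    if suffix == ".mat":
        return "mat file"
    raise ValueError(f"unsupported avatar_condition: {condition!r}")


print(describe_condition("my_avatar_condition.pt"))
print(describe_condition({"coeff_3dmm": None, "crop_info": None}))
```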
---
# Motion conditioning
```python
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav",
motion_condition="motion.pt"
)
```
Supported motion inputs include:
* `motion_3dmm`
* `coeff_3dmm`
* `full_3dmm_seq`
* `full_3dmm`
---
# Reference-video driving
```python
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav",
use_ref_video=True,
ref_video="reference.mp4",
ref_info="pose"
)
```
Supported `ref_info` values:
* `pose`
* `blink`
* `pose+blink`
* `all`
When `ref_info="all"`, the runtime uses the reference video coefficients directly.
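One way to think about the four `ref_info` values is as a small set of switches. The sketch below validates the string and expands it into flags; the flag names (`pose`, `blink`, `direct_coefficients`) are assumptions for illustration, and the runtime may represent this differently.

```python
REF_INFO_VALUES = ("pose", "blink", "pose+blink", "all")


def parse_ref_info(ref_info: str) -> dict:
    """Expand a ref_info string into boolean switches (hypothetical names)."""
    if ref_info not in REF_INFO_VALUES:
        raise ValueError(f"ref_info must be one of {REF_INFO_VALUES}, got {ref_info!r}")
    return {
        # Take head pose from the reference video.
        "pose": ref_info in ("pose", "pose+blink"),
        # Take eye blinks from the reference video.
        "blink": ref_info in ("blink", "pose+blink"),
        # "all": use the reference video coefficients directly.
        "direct_coefficients": ref_info == "all",
    }


print(parse_ref_info("pose+blink"))
```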
---
# Wav2Lip GAN (optional)
```python
video = model.generate(
source_image="portrait.png",
driven_audio="speech.wav",
use_wav2lip=True,
wav2lip_repo="/path/to/Wav2Lip"
)
```
Post-processes the SadTalker output for improved lip sync.
The bundled checkpoint `checkpoints/wav2lip_gan.pth` is used automatically.
If no runnable Wav2Lip inference code is found, the runtime falls back to the SadTalker video instead of crashing.
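The fallback behaviour described above follows a common pattern: attempt the optional step and return the earlier result if it cannot run. The sketch below illustrates that pattern only. It assumes the check is for the upstream Wav2Lip entry script `inference.py`, which may not be how PackedAvatar detects runnable code, and `run_wav2lip` is a placeholder callable supplied by the caller.

```python
from pathlib import Path
from typing import Callable, Optional


def postprocess_or_fallback(
    sadtalker_video: str,
    wav2lip_repo: Optional[str],
    run_wav2lip: Callable[[str, Path], str],
) -> str:
    """Return the Wav2Lip output if it can be produced, else the SadTalker video."""
    script = Path(wav2lip_repo) / "inference.py" if wav2lip_repo else None
    if script is None or not script.is_file():
        # No runnable inference code found: keep the SadTalker result.
        return sadtalker_video
    try:
        return run_wav2lip(sadtalker_video, script)
    except Exception:
        # Post-processing failed: degrade gracefully instead of crashing.
        return sadtalker_video


# With no repo configured, the input video is returned unchanged.
print(postprocess_or_fallback("out.mp4", None, lambda video, script: "out_wav2lip.mp4"))
```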
---
# Idle mode
Generate with silent audio instead of an input file:
```python
video = model.generate(
avatar_id="Aaron",
use_idle_mode=True,
length_of_audio=4
)
```
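For comparison, a silent clip of a given length can also be produced by hand and passed as `driven_audio`. The sketch below writes one with the standard-library `wave` module. The 16 kHz mono 16-bit format is an assumption chosen for the example; how idle mode builds its silent audio internally is not documented here.

```python
import os
import tempfile
import wave


def write_silence(path: str, seconds: float, sample_rate: int = 16000) -> int:
    """Write a mono 16-bit PCM WAV of pure silence and return the frame count."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # every sample is zero
    return n_frames


with tempfile.TemporaryDirectory() as tmp:
    frames = write_silence(os.path.join(tmp, "idle.wav"), seconds=4)
    print(frames)  # 64000
```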
---
# Still mode
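Pass `still_mode=True` to `generate()` to reduce head movement:
```python
video = model.generate(avatar_id="Aaron", driven_audio="speech.wav", still_mode=True)
```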
---
# Expression control
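Pass `exp_scale` to `generate()` to scale expression intensity:
```python
video = model.generate(avatar_id="Aaron", driven_audio="speech.wav", exp_scale=1.2)
```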
* higher values → more expressive motion
* lower values → more neutral motion
---
# Face render backend
```python
facerender="facevid2vid"
```
---
# Device selection
Automatically chooses:
* CUDA when available
* Apple Silicon MPS on macOS when available
* CPU fallback otherwise
Override:
```python
model = PackedAvatar("PackedAvatar.pt", device="cuda")
```
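The automatic choice follows a simple priority: CUDA, then MPS, then CPU. The sketch below reproduces that order with standard PyTorch availability checks. It is an illustration rather than the loader's own code; `pick_device` and `detect_device` are names invented for the example, and the function falls back to CPU if PyTorch is not importable.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the documented priority: CUDA, then Apple Silicon MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends and pick one."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps attribute.
    mps = getattr(torch.backends, "mps", None)
    return pick_device(
        torch.cuda.is_available(),
        bool(mps is not None and mps.is_available()),
    )


print(detect_device())
```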
---
# Python API (full example)
```python
from PackedAvatar import PackedAvatar
model = PackedAvatar(
packed_pt_path="PackedAvatar.pt",
device="cuda",
cache_dir="./cache"
)
video = model.generate(
source_image="speaker.png",
driven_audio="speech.wav",
remove_background=True,
use_wav2lip=True,
size=512,
exp_scale=1.2,
pose_style=1,
still_mode=False
)
print(video)
```
---
# Preprocessing helpers
The runtime exposes an embedding extraction helper for image or video conditioning:
```python
bundle = model.extract_embeddings(
input_path="test_image.png",
crop_or_resize="crop",
pic_size=256
)
```
PascalCase alias:
```python
bundle = model.ExtractEmbeddings("test_image.png")
```
The returned bundle can be saved and reused as `avatar_condition` or `motion_condition`.
---
# CLI usage
## Basic
```bash
python PackedAvatar.py \
--source-image person.jpg \
--driven-audio speech.wav
```
## AvatarBank
```bash
python PackedAvatar.py \
--avatar-id Rebecca \
--driven-audio speech.wav
```
## Background removal
```bash
python PackedAvatar.py \
--source-image portrait.png \
--driven-audio speech.wav \
--remove-background
```
## Wav2Lip
```bash
python PackedAvatar.py \
--source-image portrait.png \
--driven-audio speech.wav \
--use-wav2lip \
--wav2lip-repo /path/to/Wav2Lip
```
## Reference video driving
```bash
python PackedAvatar.py \
--source-image portrait.png \
--driven-audio speech.wav \
--use-ref-video \
--ref-video reference.mp4 \
--ref-info pose+blink
```
## Idle mode
```bash
python PackedAvatar.py \
--avatar-id Aaron \
--use-idle-mode \
--length-of-audio 5
```
## Explicit avatar conditioning bundle
```bash
python PackedAvatar.py \
--avatar-condition avatar_condition.pt \
--driven-audio speech.wav
```
## Motion conditioning bundle
```bash
python PackedAvatar.py \
--motion-condition motion_condition.pt \
--driven-audio speech.wav
```
---
# How it works
PackedAvatar runs its pipeline in six stages, from asset extraction to optional post-processing.
## 1. Asset extraction
* extracts SadTalker + checkpoints from `.pt`
* verifies SHA256 hashes
* builds the runtime cache
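Hash verification of extracted files can be sketched with `hashlib`. The example assumes a manifest shaped as a mapping from relative path to hex SHA256 digest; the real manifest layout inside `PackedAvatar.pt` is not documented here, so treat the structure and the function names as hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_extracted(root: Path, expected: dict) -> list:
    """Return the relative paths that are missing or whose hash differs.

    `expected` maps relative path -> hex SHA256 digest (assumed manifest shape).
    """
    bad = []
    for rel_path, want in expected.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != want.lower():
            bad.append(rel_path)
    return bad
```

An empty list from `verify_extracted` means every listed file is present and matches its digest.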
## 2. Avatar resolution
Priority:
```text
avatar_condition
→ source_image-driven SadTalker path
→ avatar_id / default AvatarBank resolution
```
If `avatar_condition` is provided, it supersedes `source_image` conditioning.
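The priority above amounts to a first-match check over the three inputs. A minimal sketch, with the function name and the returned labels invented for illustration:

```python
from typing import Any, Optional, Tuple


def resolve_identity(
    avatar_condition: Any = None,
    source_image: Optional[str] = None,
    avatar_id: Optional[str] = None,
) -> Tuple[str, Any]:
    """Return (path_label, value) for the highest-priority input that was given."""
    if avatar_condition is not None:
        # An explicit bundle supersedes source_image conditioning.
        return ("avatar_condition", avatar_condition)
    if source_image is not None:
        return ("source_image", source_image)
    # avatar_id may be None here, meaning "use the bank default".
    return ("avatar_bank", avatar_id)


print(resolve_identity(avatar_condition="cond.pt", source_image="portrait.png"))
print(resolve_identity(source_image="portrait.png", avatar_id="Rebecca"))
print(resolve_identity())
```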
## 3. Preprocessing
* face detection
* cropping
* 3DMM extraction
## 4. Motion generation
* audio → facial coefficients
* or motion transfer injection
## 5. Rendering
* SadTalker / PIRender animation
* frame synthesis
## 6. Optional post-processing
* Wav2Lip GAN lip-sync enhancement
---
# AvatarBank API
PackedAvatar exposes the packed AvatarBank directly, allowing you to browse, search, inspect, and manage avatars at runtime.
The AvatarBank is loaded automatically when the model is initialized.
```python
from PackedAvatar import PackedAvatar
model = PackedAvatar("PackedAvatar.pt")
```
## List available avatars
```python
avatars = model.list_avatars()
print(avatars)
```
Returns:
```python
[
"Aaron",
"Rebecca",
"Amy",
...
]
```
---
## Search by name
Perform a fuzzy search against avatar IDs.
```python
matches = model.fuzzy_search_avatar("rebeca")
print(matches)
```
Example:
```python
["Rebecca"]
```
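To give a feel for how a misspelled query can still resolve to an avatar ID, here is a stand-in built on the standard-library `difflib`. The bank's actual matching algorithm and thresholds are not documented, so the cutoff and result limit below are assumptions.

```python
import difflib


def fuzzy_search(query: str, avatar_ids: list, limit: int = 3, cutoff: float = 0.6) -> list:
    """Return avatar ids that closely match `query`, best match first."""
    # Compare case-insensitively, then map back to the original spelling.
    by_lower = {avatar_id.lower(): avatar_id for avatar_id in avatar_ids}
    hits = difflib.get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]


print(fuzzy_search("rebeca", ["Aaron", "Rebecca", "Amy"]))  # ['Rebecca']
```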
---
## Query by gender and style
Search the bank using metadata filters.
```python
female_anime = model.query_avatars(
gender="female",
style="anime"
)
print(female_anime)
```
Example:
```python
[
"Alison",
"Amber",
"Andrea",
...
]
```
Fuzzy matching is supported:
```python
model.query_avatars(
gender="femal",
style="anim"
)
```
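Fuzzy filter values such as `"femal"` can be understood as being snapped to the nearest known value before filtering. The sketch below does this with `difflib` over a small in-memory table drawn from the prepacked list; the metadata layout and the helper names are assumptions, not the bank's real implementation.

```python
import difflib

GENDERS = ["female", "male"]
STYLES = ["anime", "cyber", "drawn", "paint", "real"]


def normalize(value, choices):
    """Snap a possibly misspelled filter value to the closest known choice."""
    if value is None:
        return None
    hits = difflib.get_close_matches(value.lower(), choices, n=1, cutoff=0.6)
    return hits[0] if hits else value.lower()


def query(bank: dict, gender=None, style=None) -> list:
    """Return avatar ids whose metadata matches every filter that was given."""
    gender = normalize(gender, GENDERS)
    style = normalize(style, STYLES)
    return [
        avatar_id
        for avatar_id, meta in bank.items()
        if (gender is None or meta["gender"] == gender)
        and (style is None or meta["style"] == style)
    ]


bank = {
    "Alison": {"gender": "female", "style": "anime"},
    "Rebecca": {"gender": "female", "style": "anime"},
    "Amy": {"gender": "female", "style": "real"},
    "Aaron": {"gender": "male", "style": "real"},
    "Brad": {"gender": "male", "style": "anime"},
}
print(query(bank, gender="femal", style="anim"))  # ['Alison', 'Rebecca']
```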
---
## Retrieve avatar metadata
```python
metadata = model.get_avatar_metadata("Rebecca")
print(metadata)
```
Example:
```python
{
"gender": "female",
"style": "anime"
}
```
---
## Retrieve avatar conditioning
Access the full conditioning bundle used internally by PackedAvatar.
```python
avatar = model.get_avatar("Rebecca")
```
Returned fields may include:
```python
{
"avatar_id": "Rebecca",
"gender": "female",
"style": "anime",
"coeff_3dmm": ...,
"motion_3dmm": ...,
"full_3dmm": ...,
"crop_info": ...,
"crop_preview": ...
}
```
This bundle can be passed directly as an `avatar_condition`.
```python
video = model.generate(
avatar_condition=avatar,
driven_audio="speech.wav"
)
```
---
## Retrieve avatar previews
Preview images stored in the AvatarBank can be accessed directly.
```python
preview = model.get_avatar_preview("Rebecca")
preview.show()
```
Returns a PIL image.
---
## Delete avatars
Remove an avatar from the in-memory runtime.
```python
model.delete_avatar("Rebecca")
```
This only affects the current runtime session.
To persist changes, save the bank.
---
## Save AvatarBank changes
```python
model.save_avatar_bank("ModifiedAvatarBank.pt")
```
This writes the current AvatarBank state to disk.
---
## Check whether an avatar exists
```python
exists = model.avatar_bank.exists("Rebecca")
print(exists)
```
Returns:
```python
True
```
---
## Direct AvatarBank access
The underlying AvatarBank runtime is also exposed.
```python
bank = model.avatar_bank
```
Available methods include:
```python
bank.available_ids()
bank.list_avatars()
bank.exists(...)
bank.query(...)
bank.fuzzy_search_id(...)
bank.get_avatar(...)
bank.get_metadata(...)
bank.get_preview(...)
bank.delete_avatar(...)
bank.save(...)
```
This provides full programmatic access to the packed avatar database.
---
# First run vs later runs
## First run
* extract bundle
* build cache
* initialize models
* download a couple of auxiliary face-analysis weights if they are not already cached locally
## Later runs
* reuse cache
* skip the auxiliary downloads when the files are already present
* faster startup
---
# Performance notes
* GPU is strongly recommended for `size=512` output
* CPU is supported but slower
* Wav2Lip increases runtime cost
* RMBG adds preprocessing overhead
---
# Why PackedAvatar?
Compared to a standard SadTalker setup:
* single `.pt` deployment artifact
* no model downloads for the main runtime
* no external repos required for core use
* built-in AvatarBank system
* built-in background removal
* optional lip-sync enhancement
* fully offline execution after first-run helper caching
* reproducible runtime via bundle hashing
---
# Notes
* This repo is inference-only
* Bundles are treated as trusted artifacts
* Cache is auto-invalidated when the bundle changes
* Bundled runtime assets (source snapshot, checkpoints, configuration) are resolved internally from the extracted cache
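
Cache invalidation on bundle change is commonly implemented by storing a fingerprint of the bundle next to the extracted files and comparing it at startup. The sketch below shows that pattern under stated assumptions: the stamp file name `bundle.sha256` and both function names are hypothetical, and the runtime's real mechanism may differ.

```python
import hashlib
from pathlib import Path

STAMP_NAME = "bundle.sha256"  # hypothetical stamp file name


def fingerprint(bundle_path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA256 of the bundle file, read in chunks."""
    digest = hashlib.sha256()
    with open(bundle_path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_is_fresh(bundle_path: Path, cache_dir: Path) -> bool:
    """True when the cache was built from exactly this bundle."""
    stamp = cache_dir / STAMP_NAME
    return stamp.is_file() and stamp.read_text().strip() == fingerprint(bundle_path)


def mark_cache(bundle_path: Path, cache_dir: Path) -> None:
    """Record which bundle the cache was extracted from."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / STAMP_NAME).write_text(fingerprint(bundle_path))
```

A loader using this pattern would re-extract whenever `cache_is_fresh` returns `False`, then call `mark_cache` once extraction succeeds.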
---
# Credits
Built on top of:
* SadTalker
* FaceVid2Vid / PIRender
* Wav2Lip GAN
* Bria RMBG