---
license: apache-2.0

language:
  - en

tags:
  - talking-head
  - face-animation
  - avatar
  - image-to-video
  - audio-to-video
  - motion-transfer
  - lip-sync
  - face-synthesis
  - video-generation
  - generative-ai
  - multimodal
  - pytorch
  - sad-talker
  - wav2lip
  - rmbg
  - packed-model
---

# PackedAvatar

PackedAvatar is a **self-contained talking-head generation runtime** that bundles the SadTalker-based avatar pipeline into a single `.pt` artifact.

It supports generating animated talking avatars from:

* a single image + audio
* a prebuilt AvatarBank identity
* explicit avatar conditioning bundles
* motion transfer bundles
* reference-video driving
* optional Wav2Lip post-processing

All core runtime assets are packaged inside `PackedAvatar.pt`. A few auxiliary helper weights may still be downloaded on the first run if they are not already cached locally.

---

# What is included

`PackedAvatar.pt` contains:

* SadTalker source code snapshot
* SadTalker checkpoints
* AvatarBank identity system
* Bria RMBG 2.0 background removal assets
* Wav2Lip GAN checkpoint
* BFM / face model assets
* configuration files
* runtime manifests and hashes
* cached avatar metadata

This is a **runtime artifact**, not a training checkpoint.

---

# Repository contents

* `PackedAvatar.pt` — full bundled runtime
* `PackedAvatar.py` — loader + inference engine
* `requirements.txt` — dependencies
* `README.md` — usage guide

---

# Features

* Single-file deployment (`.pt`) for the main runtime
* Full SadTalker pipeline bundled
* AvatarBank identity system
* Image / avatar / motion / video conditioning
* Automatic background removal (Bria RMBG)
* Optional Wav2Lip GAN post-processing
* Runs on CUDA, Apple Silicon MPS, or CPU
* Automatic caching and extraction system
* CLI + Python API support

---

# Requirements

* Python 3.10+
* PyTorch
* FFmpeg (for reference-video audio extraction)
* Dependencies listed in `requirements.txt`

GPU is recommended; CPU is supported.

---

# Quick start

## 1) Install dependencies

```bash
pip install -r requirements.txt
```

## 2) Place the bundle

```text
PackedAvatar.pt
```

## 3) Basic generation

```python
from PackedAvatar import PackedAvatar

model = PackedAvatar("PackedAvatar.pt")

video = model.generate(
    source_image="person.jpg",
    driven_audio="speech.wav"
)

print(video)
```

---

# AvatarBank usage

Generate directly from a prebuilt identity:

```python
video = model.generate(
    avatar_id="Rebecca",
    driven_audio="speech.wav"
)
```

No source image is required for this path.

If `avatar_id` is omitted, the runtime selects a default avatar from the packed bank.

---

# Prepacked AvatarBank table

The following avatars are prepacked in the bank.

## Female

| Style | Names                                                                                                                             |
| ----- | --------------------------------------------------------------------------------------------------------------------------------- |
| anime | Alison, Amber, Andrea, Angela, Christine, Cynthia, Heidi, Jennifer, Karla, Kristen, Laura, Nancy, Patricia, Rebecca, Sandra, Tara |
| cyber | Amanda, Brenda, Christina, Janet, Jill, Julie, Lisa, Mallory, Mandy, Martha, Melissa, Michelle, Regina                            |
| drawn | Alyssa, Danielle, Joan, Kaitlyn, Kimberly, Marie, Samantha, Veronica                                                              |
| paint | Alejandra, Barbara, Briana, Brittany, Emily, Jacqueline, Jodi, Mary, Rhonda, Savannah, Tammy, Victoria, Yolanda                   |
| real  | Amy, Ann, Ashley, Colleen, Heather, Holly, Jordan, Kristin, Kristine, Mariah, Pamela, Sara, Sharon                                |

## Male

| Style | Names                                                             |
| ----- | ----------------------------------------------------------------- |
| anime | Brad, Brian, David, Gregory, John, Jose, Lawrence, Robert         |
| cyber | Daniel, Hayden, James, Jeremy, Paul, Ryan, Sean                   |
| drawn | Bobby, George, Gregg, Kevin, Matthew, Ricky, Thomas               |
| paint | Jacob, Justin, Michael, Nicholas, Steven, William, Zachary        |
| real  | Aaron, Andrew, Benjamin, Christopher, Derek, Frank, Jesse, Joseph |

There are **100 avatars total** in the bank.

---

# Default avatar

If no avatar is explicitly selected, the runtime resolves a default in this order:

1. `defaults.default_avatar` from the manifest, if present and valid
2. first real-style male avatar
3. any real-style avatar
4. any male avatar
5. any avatar with embeddings
6. first available avatar entry
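
The fallback order above can be sketched as a chain of rules. This is only an illustration: the entry layout (`avatar_id`, `gender`, `style`, `has_embeddings`) and the function name `resolve_default_avatar` are assumptions made for the example, not the runtime's actual internals.

```python
from typing import Optional


def resolve_default_avatar(manifest: dict, avatars: list[dict]) -> Optional[str]:
    """Return a default avatar id by following the documented fallback order."""
    known_ids = {entry["avatar_id"] for entry in avatars}

    # 1. Manifest default, if present and pointing at a known avatar.
    default_id = manifest.get("defaults", {}).get("default_avatar")
    if default_id in known_ids:
        return default_id

    # 2-6. Fallback rules, tried in order; the first entry matching a rule wins.
    rules = [
        lambda e: e.get("style") == "real" and e.get("gender") == "male",
        lambda e: e.get("style") == "real",
        lambda e: e.get("gender") == "male",
        lambda e: bool(e.get("has_embeddings")),  # hypothetical field name
        lambda e: True,
    ]
    for rule in rules:
        for entry in avatars:
            if rule(entry):
                return entry["avatar_id"]
    return None  # the bank is empty
```

Keeping the rules in a list makes the priority explicit and easy to reorder.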

---

# Source image mode

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav"
)
```

Pipeline:

```text
image → face detection → crop → 3DMM extraction → animation
```

---

# Background removal (Bria RMBG)

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    remove_background=True
)
```

Pipeline:

```text
image → Bria RMBG → foreground → SadTalker → video
```

---

# Explicit avatar conditioning

`avatar_condition` may be:

* a Python `dict`
* a `.pt` / `.pth` file
* a `.mat` file

When `avatar_condition` is provided, it supersedes `source_image`-driven conditioning.

```python
video = model.generate(
    avatar_condition="my_avatar_condition.pt",
    driven_audio="speech.wav"
)
```

A valid avatar bundle can include fields such as:

* `coeff_3dmm`
* `motion_3dmm`
* `full_3dmm`
* `crop_preview`
* `crop_info`

---

# Motion conditioning

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    motion_condition="motion.pt"
)
```

Supported motion inputs include:

* `motion_3dmm`
* `coeff_3dmm`
* `full_3dmm_seq`
* `full_3dmm`

---

# Reference-video driving

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    use_ref_video=True,
    ref_video="reference.mp4",
    ref_info="pose"
)
```

Supported `ref_info` values:

* `pose`
* `blink`
* `pose+blink`
* `all`

When `ref_info="all"`, the runtime uses the reference video coefficients directly.
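
Because `ref_info` only accepts the values listed above, it can help to validate it before starting a long render. The helper below is a hypothetical convenience wrapper (the name `ref_video_kwargs` is not part of PackedAvatar); it only builds the keyword arguments shown in this section.

```python
ALLOWED_REF_INFO = ("pose", "blink", "pose+blink", "all")


def ref_video_kwargs(ref_video: str, ref_info: str = "pose") -> dict:
    """Build the reference-video keyword arguments, rejecting unknown modes early."""
    if ref_info not in ALLOWED_REF_INFO:
        raise ValueError(
            f"ref_info must be one of {ALLOWED_REF_INFO}, got {ref_info!r}"
        )
    return {"use_ref_video": True, "ref_video": ref_video, "ref_info": ref_info}


# Intended usage (requires a loaded model):
# video = model.generate(
#     source_image="portrait.png",
#     driven_audio="speech.wav",
#     **ref_video_kwargs("reference.mp4", "pose+blink"),
# )
```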

---

# Wav2Lip GAN (optional)

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    use_wav2lip=True,
    wav2lip_repo="/path/to/Wav2Lip"
)
```

Post-processes the SadTalker output with Wav2Lip for improved lip sync. `wav2lip_repo` is expected to point at a local copy of the Wav2Lip inference code.

The bundled checkpoint `checkpoints/wav2lip_gan.pth` is used automatically.

If no runnable Wav2Lip inference code is found, the runtime falls back to the SadTalker video instead of crashing.

---

# Idle mode

Generate with silent audio instead of an input file:

```python
video = model.generate(
    avatar_id="Aaron",
    use_idle_mode=True,
    length_of_audio=4
)
```
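
For comparison, a silent clip can also be written by hand and passed as `driven_audio`. This is a sketch of the idea only: whether idle mode works this way internally is not documented here, and the 16 kHz mono 16-bit format is an assumption.

```python
import wave


def write_silent_wav(path: str, seconds: float, sample_rate: int = 16000) -> None:
    """Write a mono 16-bit PCM WAV file containing only silence."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # zero-valued samples


# Intended usage (requires a loaded model):
# write_silent_wav("silence.wav", seconds=4)
# video = model.generate(avatar_id="Aaron", driven_audio="silence.wav")
```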

---

# Still mode

Reduces head movement:

```python
video = model.generate(avatar_id="Aaron", driven_audio="speech.wav", still_mode=True)
```

---

# Expression control

```python
video = model.generate(avatar_id="Aaron", driven_audio="speech.wav", exp_scale=1.2)
```

* higher values → more expressive motion
* lower values → more neutral motion

---

# Face render backend

```python
facerender="facevid2vid"
```

---

# Device selection

Automatically chooses:

* CUDA when available
* Apple Silicon MPS on macOS when available
* CPU fallback otherwise

Override:

```python
model = PackedAvatar("PackedAvatar.pt", device="cuda")
```
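
The automatic order described above can be approximated as follows. The function names are hypothetical and the runtime's own detection logic may differ. `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are standard PyTorch calls.

```python
def choose_device(cuda_available: bool, mps_available: bool) -> str:
    """Pick a device string using the documented priority: CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for the available backends and apply the priority above."""
    import torch  # imported lazily so choose_device stays usable without PyTorch

    return choose_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

The result of `detect_device()` can be passed as the `device` argument when an explicit value is preferred.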

---

# Python API (full example)

```python
from PackedAvatar import PackedAvatar

model = PackedAvatar(
    packed_pt_path="PackedAvatar.pt",
    device="cuda",
    cache_dir="./cache"
)

video = model.generate(
    source_image="speaker.png",
    driven_audio="speech.wav",
    remove_background=True,
    use_wav2lip=True,
    size=512,
    exp_scale=1.2,
    pose_style=1,
    still_mode=False
)

print(video)
```

---

# Preprocessing helpers

The runtime exposes an embedding extraction helper for image or video conditioning:

```python
bundle = model.extract_embeddings(
    input_path="test_image.png",
    crop_or_resize="crop",
    pic_size=256
)
```

Camel-case alias:

```python
bundle = model.ExtractEmbeddings("test_image.png")
```

The returned bundle can be saved and reused as `avatar_condition` or `motion_condition`.

---

# CLI usage

## Basic

```bash
python PackedAvatar.py \
  --source-image person.jpg \
  --driven-audio speech.wav
```

## AvatarBank

```bash
python PackedAvatar.py \
  --avatar-id Rebecca \
  --driven-audio speech.wav
```

## Background removal

```bash
python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --remove-background
```

## Wav2Lip

```bash
python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --use-wav2lip \
  --wav2lip-repo /path/to/Wav2Lip
```

## Reference video driving

```bash
python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --use-ref-video \
  --ref-video reference.mp4 \
  --ref-info pose+blink
```

## Idle mode

```bash
python PackedAvatar.py \
  --avatar-id Aaron \
  --use-idle-mode \
  --length-of-audio 5
```

## Explicit avatar conditioning bundle

```bash
python PackedAvatar.py \
  --avatar-condition avatar_condition.pt \
  --driven-audio speech.wav
```

## Motion conditioning bundle

```bash
python PackedAvatar.py \
  --motion-condition motion_condition.pt \
  --driven-audio speech.wav
```
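
The CLI flags above mirror the Python keyword arguments with underscores replaced by hyphens. A small helper (hypothetical, not shipped with PackedAvatar) can build the command line from keyword arguments, which is convenient for batch jobs:

```python
import shlex
import sys


def build_cli_command(script: str = "PackedAvatar.py", **options) -> list[str]:
    """Translate keyword arguments into the flag style used in the examples above."""
    cmd = [sys.executable, script]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            cmd.append(flag)                 # boolean switch, e.g. --use-idle-mode
        elif value is False or value is None:
            continue                         # disabled or unset options are omitted
        else:
            cmd.extend([flag, str(value)])   # option with a value
    return cmd


cmd = build_cli_command(avatar_id="Aaron", use_idle_mode=True, length_of_audio=5)
print(shlex.join(cmd))
```

The resulting list can be run with `subprocess.run(cmd, check=True)`.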

---

# How it works

PackedAvatar runs the pipeline in six stages, from asset extraction to optional post-processing.

## 1. Asset extraction

* extracts SadTalker + checkpoints from `.pt`
* verifies SHA256 hashes
* builds the runtime cache
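
The hash check in this stage can be illustrated with the standard library. The layout assumed here, a mapping from relative path to expected SHA256 hex digest, is an assumption for the example. The real manifest format is not documented in this README.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(root: Path, expected: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest differs."""
    bad = []
    for rel_path, want in expected.items():
        target = root / rel_path
        if not target.is_file() or sha256_of_file(target) != want.lower():
            bad.append(rel_path)
    return bad
```

An empty result means every extracted file matches its recorded digest.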

## 2. Avatar resolution

Priority:

```text
avatar_condition
→ source_image-driven SadTalker path
→ avatar_id / default AvatarBank resolution
```

If `avatar_condition` is provided, it supersedes `source_image` conditioning.
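
Expressed as code, the priority looks roughly like this. The function name and the returned labels are hypothetical and only illustrate the order in which the inputs are considered.

```python
from typing import Any, Optional


def pick_conditioning_path(
    avatar_condition: Optional[Any] = None,
    source_image: Optional[str] = None,
    avatar_id: Optional[str] = None,
) -> str:
    """Return which input drives identity conditioning, highest priority first."""
    if avatar_condition is not None:
        return "avatar_condition"   # explicit bundle wins over everything else
    if source_image is not None:
        return "source_image"       # SadTalker preprocessing path
    if avatar_id is not None:
        return "avatar_id"          # named AvatarBank identity
    return "default_avatar"         # nothing given: fall back to the bank default
```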

## 3. Preprocessing

* face detection
* cropping
* 3DMM extraction

## 4. Motion generation

* audio → facial coefficients
* or motion transfer injection

## 5. Rendering

* FaceVid2Vid / PIRender animation
* frame synthesis

## 6. Optional post-processing

* Wav2Lip GAN lip-sync enhancement

---

# AvatarBank API

PackedAvatar exposes the packed AvatarBank directly, allowing you to browse, search, inspect, and manage avatars at runtime.

The AvatarBank is loaded automatically when the model is initialized.

```python
from PackedAvatar import PackedAvatar

model = PackedAvatar("PackedAvatar.pt")
```

## List available avatars

```python
avatars = model.list_avatars()

print(avatars)
```

Returns:

```python
[
    "Aaron",
    "Rebecca",
    "Amy",
    ...
]
```

---

## Search by name

Perform a fuzzy search against avatar IDs.

```python
matches = model.fuzzy_search_avatar("rebeca")

print(matches)
```

Example:

```python
["Rebecca"]
```
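
How the runtime scores matches is not specified here. As an illustration of the idea, a case-insensitive fuzzy lookup can be built on the standard library's `difflib`. The function name and the `cutoff` value are assumptions for this sketch.

```python
import difflib


def fuzzy_search_ids(
    query: str, avatar_ids: list[str], limit: int = 5, cutoff: float = 0.6
) -> list[str]:
    """Return avatar ids whose lowercase form is close to the lowercase query."""
    lowered = {avatar_id.lower(): avatar_id for avatar_id in avatar_ids}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]  # map back to the original spelling


print(fuzzy_search_ids("rebeca", ["Aaron", "Rebecca", "Amy"]))  # ['Rebecca']
```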

---

## Query by gender and style

Search the bank using metadata filters.

```python
female_anime = model.query_avatars(
    gender="female",
    style="anime"
)

print(female_anime)
```

Example:

```python
[
    "Alison",
    "Amber",
    "Andrea",
    ...
]
```

Fuzzy matching is supported:

```python
model.query_avatars(
    gender="femal",
    style="anim"
)
```
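
One way to picture fuzzy metadata filtering is to snap each filter value to the closest known value before comparing. The sketch below assumes a list of metadata dictionaries and uses `difflib`. The runtime's actual matching rules may differ.

```python
import difflib

GENDERS = ("female", "male")
STYLES = ("anime", "cyber", "drawn", "paint", "real")


def query(entries: list[dict], gender: str = None, style: str = None) -> list[str]:
    """Filter entries by gender and style, tolerating small typos in the filters."""
    wanted = {}
    for key, value, allowed in (("gender", gender, GENDERS), ("style", style, STYLES)):
        if value is None:
            continue  # no filter on this field
        hits = difflib.get_close_matches(value.lower(), allowed, n=1, cutoff=0.6)
        if not hits:
            return []  # the filter value resembles no known value
        wanted[key] = hits[0]
    return [
        entry["avatar_id"]
        for entry in entries
        if all(entry.get(key) == value for key, value in wanted.items())
    ]
```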

---

## Retrieve avatar metadata

```python
metadata = model.get_avatar_metadata("Rebecca")

print(metadata)
```

Example:

```python
{
    "gender": "female",
    "style": "anime"
}
```

---

## Retrieve avatar conditioning

Access the full conditioning bundle used internally by PackedAvatar.

```python
avatar = model.get_avatar("Rebecca")
```

Returned fields may include:

```python
{
    "avatar_id": "Rebecca",
    "gender": "female",
    "style": "anime",
    "coeff_3dmm": ...,
    "motion_3dmm": ...,
    "full_3dmm": ...,
    "crop_info": ...,
    "crop_preview": ...
}
```

This bundle can be passed directly as an `avatar_condition`.

```python
video = model.generate(
    avatar_condition=avatar,
    driven_audio="speech.wav"
)
```

---

## Retrieve avatar previews

Preview images stored in the AvatarBank can be accessed directly.

```python
preview = model.get_avatar_preview("Rebecca")

preview.show()
```

Returns a PIL image.

---

## Delete avatars

Remove an avatar from the in-memory runtime.

```python
model.delete_avatar("Rebecca")
```

This only affects the current runtime session. To persist the change, save the bank with `save_avatar_bank`.

---

## Save AvatarBank changes

```python
model.save_avatar_bank("ModifiedAvatarBank.pt")
```

This writes the current AvatarBank state to disk.

---

## Check whether an avatar exists

```python
exists = model.avatar_bank.exists("Rebecca")

print(exists)
```

Returns:

```python
True
```

---

## Direct AvatarBank access

The underlying AvatarBank runtime is also exposed.

```python
bank = model.avatar_bank
```

Available methods include:

```python
bank.available_ids()
bank.list_avatars()
bank.exists(...)
bank.query(...)
bank.fuzzy_search_id(...)
bank.get_avatar(...)
bank.get_metadata(...)
bank.get_preview(...)
bank.delete_avatar(...)
bank.save(...)
```

This provides full programmatic access to the packed avatar database.

---

# First run vs later runs

## First run

* extract bundle
* build cache
* initialize models
* download a couple of auxiliary face-analysis weights if they are not already cached locally

## Later runs

* reuse cache
* skip the auxiliary downloads when the files are already present
* faster startup

---

# Performance notes

* GPU is strongly recommended when rendering at `size=512`
* CPU is supported but slower
* Wav2Lip increases runtime cost
* RMBG adds preprocessing overhead

---

# Why PackedAvatar?

Compared to a standard SadTalker setup:

* single `.pt` deployment artifact
* no model downloads for the main runtime
* no external repos required for core use
* built-in AvatarBank system
* built-in background removal
* optional lip-sync enhancement
* fully offline execution after first-run helper caching
* reproducible runtime via bundle hashing

---

# Notes

* This repo is inference-only
* Bundles are treated as trusted artifacts
* Cache is auto-invalidated when the bundle changes
* Bundled assets are resolved internally; the Python packages in `requirements.txt` and FFmpeg must be installed separately

---

# Credits

Built on top of:

* SadTalker
* FaceVid2Vid / PIRender
* Wav2Lip GAN
* Bria RMBG