| ---
|
| license: apache-2.0
|
|
|
| language:
|
| - en
|
|
|
| tags:
|
| - talking-head
|
| - face-animation
|
| - avatar
|
| - image-to-video
|
| - audio-to-video
|
| - motion-transfer
|
| - lip-sync
|
| - face-synthesis
|
| - video-generation
|
| - generative-ai
|
| - multimodal
|
| - pytorch
|
| - sad-talker
|
| - wav2lip
|
| - rmbg
|
| - packed-model
|
| ---
|
|
|
| # PackedAvatar
|
|
|
| PackedAvatar is a **self-contained talking-head generation runtime** that bundles the SadTalker-based avatar pipeline into a single `.pt` artifact.
|
|
|
| It supports generating animated talking avatars from:
|
|
|
| * a single image + audio
|
| * a prebuilt AvatarBank identity
|
| * explicit avatar conditioning bundles
|
| * motion transfer bundles
|
| * reference-video driving
|
| * optional Wav2Lip post-processing
|
|
|
| All core runtime assets are packaged inside `PackedAvatar.pt`.
|
|
|
| A few auxiliary helper weights may still be downloaded on the first run if they are not already cached locally.
|
|
|
| ---
|
|
|
| # What is included
|
|
|
| `PackedAvatar.pt` contains:
|
|
|
| * SadTalker source code snapshot
|
| * SadTalker checkpoints
|
| * AvatarBank identity system
|
| * Bria RMBG 2.0 background removal assets
|
| * Wav2Lip GAN checkpoint
|
| * BFM / face model assets
|
| * configuration files
|
| * runtime manifests and hashes
|
| * cached avatar metadata
|
|
|
| This is a **runtime artifact**, not a training checkpoint.
|
|
|
| ---
|
|
|
| # Repository contents
|
|
|
| * `PackedAvatar.pt` — full bundled runtime
|
| * `PackedAvatar.py` — loader + inference engine
|
| * `requirements.txt` — dependencies
|
| * `README.md` — usage guide
|
|
|
| ---
|
|
|
| # Features
|
|
|
| * Single-file deployment (`.pt`) for the main runtime
|
| * Full SadTalker pipeline bundled
|
| * AvatarBank identity system
|
| * Image / avatar / motion / video conditioning
|
| * Automatic background removal (Bria RMBG)
|
| * Optional Wav2Lip GAN post-processing
|
| * CPU, CUDA, and Apple Silicon MPS device support
|
| * Automatic caching and extraction system
|
| * CLI + Python API support
|
|
|
| ---
|
|
|
| # Requirements
|
|
|
| * Python 3.10+
|
| * PyTorch
|
| * FFmpeg (for reference-video audio extraction)
|
| * Dependencies listed in `requirements.txt`
|
|
|
| GPU is recommended; CPU is supported.
|
|
|
| ---
|
|
|
| # Quick start
|
|
|
| ## 1) Install dependencies
|
|
|
| ```bash
|
| pip install -r requirements.txt
|
| ```
|
|
|
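| ## 2) Place the bundle
|
|
|
| Put `PackedAvatar.pt` where your script can reach it, for example next to `PackedAvatar.py`: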
|
|
|
| ```text
|
| PackedAvatar.pt
|
| ```
|
|
|
| ## 3) Basic generation
|
|
|
| ```python
|
| from PackedAvatar import PackedAvatar
|
|
|
| model = PackedAvatar("PackedAvatar.pt")
|
|
|
| video = model.generate(
|
| source_image="person.jpg",
|
| driven_audio="speech.wav"
|
| )
|
|
|
| print(video)
|
| ```
|
|
|
| ---
|
|
|
| # AvatarBank usage
|
|
|
| Generate directly from a prebuilt identity:
|
|
|
| ```python
|
| video = model.generate(
|
| avatar_id="Rebecca",
|
| driven_audio="speech.wav"
|
| )
|
| ```
|
|
|
| No source image is required for this path.
|
|
|
| If `avatar_id` is omitted, the runtime selects a default avatar from the packed bank.
|
|
|
| ---
|
|
|
| # Prepacked AvatarBank table
|
|
|
| The following avatars are prepacked in the bank.
|
|
|
| ## Female
|
|
|
| | Style | Names |
|
| | ----- | --------------------------------------------------------------------------------------------------------------------------------- |
|
| | anime | Alison, Amber, Andrea, Angela, Christine, Cynthia, Heidi, Jennifer, Karla, Kristen, Laura, Nancy, Patricia, Rebecca, Sandra, Tara |
|
| | cyber | Amanda, Brenda, Christina, Janet, Jill, Julie, Lisa, Mallory, Mandy, Martha, Melissa, Michelle, Regina |
|
| | drawn | Alyssa, Danielle, Joan, Kaitlyn, Kimberly, Marie, Samantha, Veronica |
|
| | paint | Alejandra, Barbara, Briana, Brittany, Emily, Jacqueline, Jodi, Mary, Rhonda, Savannah, Tammy, Victoria, Yolanda |
|
| | real | Amy, Ann, Ashley, Colleen, Heather, Holly, Jordan, Kristin, Kristine, Mariah, Pamela, Sara, Sharon |
|
|
|
| ## Male
|
|
|
| | Style | Names |
|
| | ----- | ----------------------------------------------------------------- |
|
| | anime | Brad, Brian, David, Gregory, John, Jose, Lawrence, Robert |
|
| | cyber | Daniel, Hayden, James, Jeremy, Paul, Ryan, Sean |
|
| | drawn | Bobby, George, Gregg, Kevin, Matthew, Ricky, Thomas |
|
| | paint | Jacob, Justin, Michael, Nicholas, Steven, William, Zachary |
|
| | real | Aaron, Andrew, Benjamin, Christopher, Derek, Frank, Jesse, Joseph |
|
|
|
| There are **100 avatars total** in the bank.
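The total can be checked against the two tables above. The per-style counts in this sketch were tallied by hand from the name lists, so treat it as a sanity check on the tables rather than output from the runtime.

```python
# Per-style avatar counts, tallied by hand from the Female and Male tables.
female = {"anime": 16, "cyber": 13, "drawn": 8, "paint": 13, "real": 13}
male = {"anime": 8, "cyber": 7, "drawn": 7, "paint": 7, "real": 8}

female_total = sum(female.values())
male_total = sum(male.values())
total = female_total + male_total

print(female_total, male_total, total)  # 63 37 100
```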
|
|
|
| ---
|
|
|
| # Default avatar
|
|
|
| If no avatar is explicitly selected, the runtime resolves a default in this order:
|
|
|
| 1. `defaults.default_avatar` from the manifest, if present and valid
|
| 2. first real-style male avatar
|
| 3. any real-style avatar
|
| 4. any male avatar
|
| 5. any avatar with embeddings
|
| 6. first available avatar entry
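The fallback order above can be sketched as a small pure function. This is an illustration only: the entry layout (`avatar_id`, `gender`, `style`, `has_embeddings`) and the `resolve_default_avatar` helper are assumptions made for the example, not the runtime's actual data structures.

```python
from typing import Optional

def resolve_default_avatar(entries: list[dict],
                           manifest_default: Optional[str] = None) -> Optional[str]:
    """Return a default avatar id using the documented fallback order."""
    known_ids = {e["avatar_id"] for e in entries}

    # 1. Manifest default, if it is present and names a known avatar.
    if manifest_default in known_ids:
        return manifest_default

    # 2-6. Progressively looser rules; the first entry matching a rule wins.
    rules = [
        lambda e: e.get("style") == "real" and e.get("gender") == "male",
        lambda e: e.get("style") == "real",
        lambda e: e.get("gender") == "male",
        lambda e: bool(e.get("has_embeddings")),  # hypothetical flag
        lambda e: True,
    ]
    for rule in rules:
        for entry in entries:
            if rule(entry):
                return entry["avatar_id"]
    return None  # empty bank

bank = [
    {"avatar_id": "Rebecca", "gender": "female", "style": "anime", "has_embeddings": True},
    {"avatar_id": "Amy", "gender": "female", "style": "real"},
    {"avatar_id": "Aaron", "gender": "male", "style": "real"},
]
print(resolve_default_avatar(bank))  # Aaron
```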
|
|
|
| ---
|
|
|
| # Source image mode
|
|
|
| ```python
|
| video = model.generate(
|
| source_image="portrait.png",
|
| driven_audio="speech.wav"
|
| )
|
| ```
|
|
|
| Pipeline:
|
|
|
| ```text
|
| image → face detection → crop → 3DMM extraction → animation
|
| ```
|
|
|
| ---
|
|
|
| # Background removal (Bria RMBG)
|
|
|
| ```python
|
| video = model.generate(
|
| source_image="portrait.png",
|
| driven_audio="speech.wav",
|
| remove_background=True
|
| )
|
| ```
|
|
|
| Pipeline:
|
|
|
| ```text
|
| image → Bria RMBG → foreground → SadTalker → video
|
| ```
|
|
|
| ---
|
|
|
| # Explicit avatar conditioning
|
|
|
| `avatar_condition` may be:
|
|
|
| * a Python `dict`
|
| * a `.pt` / `.pth` file
|
| * a `.mat` file
|
|
|
| When `avatar_condition` is provided, it supersedes `source_image`-driven conditioning.
|
|
|
| ```python
|
| video = model.generate(
|
| avatar_condition="my_avatar_condition.pt",
|
| driven_audio="speech.wav"
|
| )
|
| ```
|
|
|
| A valid avatar bundle can include fields such as:
|
|
|
| * `coeff_3dmm`
|
| * `motion_3dmm`
|
| * `full_3dmm`
|
| * `crop_preview`
|
| * `crop_info`
|
|
|
| ---
|
|
|
| # Motion conditioning
|
|
|
| ```python
|
| video = model.generate(
|
| source_image="portrait.png",
|
| driven_audio="speech.wav",
|
| motion_condition="motion.pt"
|
| )
|
| ```
|
|
|
| Supported motion inputs include:
|
|
|
| * `motion_3dmm`
|
| * `coeff_3dmm`
|
| * `full_3dmm_seq`
|
| * `full_3dmm`
|
|
|
| ---
|
|
|
| # Reference-video driving
|
|
|
| ```python
|
| video = model.generate(
|
| source_image="portrait.png",
|
| driven_audio="speech.wav",
|
| use_ref_video=True,
|
| ref_video="reference.mp4",
|
| ref_info="pose"
|
| )
|
| ```
|
|
|
| Supported `ref_info` values:
|
|
|
| * `pose`
|
| * `blink`
|
| * `pose+blink`
|
| * `all`
|
|
|
| When `ref_info="all"`, the runtime uses the reference video coefficients directly.
|
|
|
| ---
|
|
|
| # Wav2Lip GAN (optional)
|
|
|
| ```python
|
| video = model.generate(
|
| source_image="portrait.png",
|
| driven_audio="speech.wav",
|
| use_wav2lip=True,
|
| wav2lip_repo="/path/to/Wav2Lip"
|
| )
|
| ```
|
|
|
| Wav2Lip post-processes the SadTalker output to improve lip sync.
|
|
|
| The bundled checkpoint `checkpoints/wav2lip_gan.pth` is used automatically.
|
|
|
| If no runnable Wav2Lip inference code is found, the runtime falls back to the SadTalker video instead of crashing.
|
|
|
| ---
|
|
|
| # Idle mode
|
|
|
| Generate with silent audio instead of an input file:
|
|
|
| ```python
|
| video = model.generate(
|
| avatar_id="Aaron",
|
| use_idle_mode=True,
|
| length_of_audio=4
|
| )
|
| ```
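To make "silent audio" concrete, the sketch below writes a silent mono WAV of a given length using only the standard library. It is an illustration, not the runtime's internals: the 16 kHz sample rate, the 16-bit sample format, and the `write_silence` helper are all assumptions.

```python
import os
import tempfile
import wave

def write_silence(path: str, seconds: float, sample_rate: int = 16000) -> int:
    """Write a silent 16-bit mono WAV file and return the number of frames."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # all-zero samples are silence
    return n_frames

silence_path = os.path.join(tempfile.gettempdir(), "packedavatar_silence.wav")
frames = write_silence(silence_path, seconds=4)
print(frames)  # 64000
```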
|
|
|
| ---
|
|
|
| # Still mode
|
|
|
| Reduces head movement:
|
|
|
| ```python
|
| video = model.generate(source_image="portrait.png", driven_audio="speech.wav", still_mode=True)
|
| ```
|
|
|
| ---
|
|
|
| # Expression control
|
|
|
| ```python
|
| video = model.generate(source_image="portrait.png", driven_audio="speech.wav", exp_scale=1.2)
|
| ```
|
|
|
| * higher values → more expressive motion
|
| * lower values → more neutral motion
|
|
|
| ---
|
|
|
| # Face render backend
|
|
|
| ```python
|
| video = model.generate(source_image="portrait.png", driven_audio="speech.wav", facerender="facevid2vid")
|
| ```
|
|
|
| ---
|
|
|
| # Device selection
|
|
|
| Automatically chooses:
|
|
|
| * CUDA when available
|
| * Apple Silicon MPS on macOS when available
|
| * CPU fallback otherwise
|
|
|
| Override:
|
|
|
| ```python
|
| model = PackedAvatar("PackedAvatar.pt", device="cuda")
|
| ```
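The automatic preference order described above can be sketched as a pure function. The `pick_device` helper is hypothetical and takes availability flags as arguments so that the example runs without PyTorch installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Mirror the documented preference order: CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

print(pick_device(cuda_available=False, mps_available=True))  # mps
```

With PyTorch installed, the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.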
|
|
|
| ---
|
|
|
| # Python API (full example)
|
|
|
| ```python
|
| from PackedAvatar import PackedAvatar
|
|
|
| model = PackedAvatar(
|
| packed_pt_path="PackedAvatar.pt",
|
| device="cuda",
|
| cache_dir="./cache"
|
| )
|
|
|
| video = model.generate(
|
| source_image="speaker.png",
|
| driven_audio="speech.wav",
|
| remove_background=True,
|
| use_wav2lip=True,
|
| size=512,
|
| exp_scale=1.2,
|
| pose_style=1,
|
| still_mode=False
|
| )
|
|
|
| print(video)
|
| ```
|
|
|
| ---
|
|
|
| # Preprocessing helpers
|
|
|
| The runtime exposes an embedding extraction helper for image or video conditioning:
|
|
|
| ```python
|
| bundle = model.extract_embeddings(
|
| input_path="test_image.png",
|
| crop_or_resize="crop",
|
| pic_size=256
|
| )
|
| ```
|
|
|
| PascalCase alias:
|
|
|
| ```python
|
| bundle = model.ExtractEmbeddings("test_image.png")
|
| ```
|
|
|
| The returned bundle can be saved and reused as `avatar_condition` or `motion_condition`.
|
|
|
| ---
|
|
|
| # CLI usage
|
|
|
| ## Basic
|
|
|
| ```bash
|
| python PackedAvatar.py \
|
| --source-image person.jpg \
|
| --driven-audio speech.wav
|
| ```
|
|
|
| ## AvatarBank
|
|
|
| ```bash
|
| python PackedAvatar.py \
|
| --avatar-id Rebecca \
|
| --driven-audio speech.wav
|
| ```
|
|
|
| ## Background removal
|
|
|
| ```bash
|
| python PackedAvatar.py \
|
| --source-image portrait.png \
|
| --driven-audio speech.wav \
|
| --remove-background
|
| ```
|
|
|
| ## Wav2Lip
|
|
|
| ```bash
|
| python PackedAvatar.py \
|
| --source-image portrait.png \
|
| --driven-audio speech.wav \
|
| --use-wav2lip \
|
| --wav2lip-repo /path/to/Wav2Lip
|
| ```
|
|
|
| ## Reference video driving
|
|
|
| ```bash
|
| python PackedAvatar.py \
|
| --source-image portrait.png \
|
| --driven-audio speech.wav \
|
| --use-ref-video \
|
| --ref-video reference.mp4 \
|
| --ref-info pose+blink
|
| ```
|
|
|
| ## Idle mode
|
|
|
| ```bash
|
| python PackedAvatar.py \
|
| --avatar-id Aaron \
|
| --use-idle-mode \
|
| --length-of-audio 5
|
| ```
|
|
|
| ## Explicit avatar conditioning bundle
|
|
|
| ```bash
|
| python PackedAvatar.py \
|
| --avatar-condition avatar_condition.pt \
|
| --driven-audio speech.wav
|
| ```
|
|
|
| ## Motion conditioning bundle
|
|
|
| ```bash
|
| python PackedAvatar.py \
|
| --motion-condition motion_condition.pt \
|
| --driven-audio speech.wav
|
| ```
|
|
|
| ---
|
|
|
| # How it works
|
|
|
| PackedAvatar runs the following pipeline stages in order.
|
|
|
| ## 1. Asset extraction
|
|
|
| * extracts SadTalker + checkpoints from `.pt`
|
| * verifies SHA256 hashes
|
| * builds the runtime cache
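The hash check in this stage can be illustrated with the standard library. This is a minimal sketch under stated assumptions: the `expected` mapping of relative path to hex digest and both helper names are hypothetical, and the real manifest layout may differ.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_files(root: Path, expected: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose SHA256 does not match."""
    failed = []
    for rel_path, want in expected.items():
        candidate = root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != want:
            failed.append(rel_path)
    return failed
```

An empty return value means every listed file was found and matched its recorded digest.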
|
|
|
| ## 2. Avatar resolution
|
|
|
| Priority:
|
|
|
| ```text
|
| avatar_condition
|
| → source_image-driven SadTalker path
|
| → avatar_id / default AvatarBank resolution
|
| ```
|
|
|
| If `avatar_condition` is provided, it supersedes `source_image` conditioning.
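The priority chain above amounts to a first-match check over the three inputs. The helper below is a hypothetical sketch of that decision and does not reproduce the runtime's actual code.

```python
from typing import Any, Optional

def resolve_identity_source(avatar_condition: Any = None,
                            source_image: Optional[str] = None,
                            avatar_id: Optional[str] = None) -> str:
    """Name the input that would drive identity, following the documented priority."""
    if avatar_condition is not None:
        return "avatar_condition"   # explicit bundle wins over everything else
    if source_image is not None:
        return "source_image"       # image-driven SadTalker path
    if avatar_id is not None:
        return "avatar_id"          # named AvatarBank identity
    return "default_avatar"         # fall back to the bank's default

print(resolve_identity_source(source_image="portrait.png", avatar_id="Rebecca"))  # source_image
```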
|
|
|
| ## 3. Preprocessing
|
|
|
| * face detection
|
| * cropping
|
| * 3DMM extraction
|
|
|
| ## 4. Motion generation
|
|
|
| * audio → facial coefficients
|
| * or motion transfer injection
|
|
|
| ## 5. Rendering
|
|
|
| * SadTalker face rendering (FaceVid2Vid or PIRender)
|
| * frame synthesis
|
|
|
| ## 6. Optional post-processing
|
|
|
| * Wav2Lip GAN lip-sync enhancement
|
|
|
| ---
|
|
|
| # AvatarBank API
|
|
|
| PackedAvatar exposes the packed AvatarBank directly, allowing you to browse, search, inspect, and manage avatars at runtime.
|
|
|
| The AvatarBank is loaded automatically when the model is initialized.
|
|
|
| ```python
|
| from PackedAvatar import PackedAvatar
|
|
|
| model = PackedAvatar("PackedAvatar.pt")
|
| ```
|
|
|
| ## List available avatars
|
|
|
| ```python
|
| avatars = model.list_avatars()
|
|
|
| print(avatars)
|
| ```
|
|
|
| Returns:
|
|
|
| ```python
|
| [
|
| "Aaron",
|
| "Rebecca",
|
| "Amy",
|
| ...
|
| ]
|
| ```
|
|
|
| ---
|
|
|
| ## Search by name
|
|
|
| Perform a fuzzy search against avatar IDs.
|
|
|
| ```python
|
| matches = model.fuzzy_search_avatar("rebeca")
|
|
|
| print(matches)
|
| ```
|
|
|
| Example:
|
|
|
| ```python
|
| ["Rebecca"]
|
| ```
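For intuition, the same kind of lookup can be approximated with the standard library's `difflib`. This is a sketch of one possible matching strategy, not necessarily the one `fuzzy_search_avatar` uses; the `fuzzy_search` helper and its `cutoff` value are assumptions.

```python
import difflib

def fuzzy_search(query: str, avatar_ids: list[str],
                 limit: int = 3, cutoff: float = 0.6) -> list[str]:
    """Case-insensitive approximate match of a query against avatar ids."""
    by_lower = {avatar_id.lower(): avatar_id for avatar_id in avatar_ids}
    hits = difflib.get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]  # map back to the original casing

print(fuzzy_search("rebeca", ["Aaron", "Rebecca", "Amy"]))  # ['Rebecca']
```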
|
|
|
| ---
|
|
|
| ## Query by gender and style
|
|
|
| Search the bank using metadata filters.
|
|
|
| ```python
|
| female_anime = model.query_avatars(
|
| gender="female",
|
| style="anime"
|
| )
|
|
|
| print(female_anime)
|
| ```
|
|
|
| Example:
|
|
|
| ```python
|
| [
|
| "Alison",
|
| "Amber",
|
| "Andrea",
|
| ...
|
| ]
|
| ```
|
|
|
| Fuzzy matching is supported:
|
|
|
| ```python
|
| model.query_avatars(
|
| gender="femal",
|
| style="anim"
|
| )
|
| ```
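One way to picture fuzzy metadata filters is to snap each filter value to the closest known value before filtering. The sketch below does this with `difflib`; the entry layout and the `normalize` and `query` helpers are hypothetical and may not match how `query_avatars` works internally.

```python
import difflib
from typing import Optional

def normalize(value: str, allowed: list[str]) -> Optional[str]:
    """Snap a possibly misspelled filter value to the closest allowed value."""
    hits = difflib.get_close_matches(value.lower(), allowed, n=1, cutoff=0.6)
    return hits[0] if hits else None

def query(entries: list[dict], gender: Optional[str] = None,
          style: Optional[str] = None) -> list[str]:
    """Return avatar ids whose metadata matches the (fuzzily corrected) filters."""
    filters = {}
    for key, value in (("gender", gender), ("style", style)):
        if value is None:
            continue  # filter not requested
        allowed = sorted({e[key] for e in entries})
        corrected = normalize(value, allowed)
        if corrected is None:
            return []  # nothing resembles the requested value
        filters[key] = corrected
    return [e["avatar_id"] for e in entries
            if all(e[k] == v for k, v in filters.items())]

bank = [
    {"avatar_id": "Alison", "gender": "female", "style": "anime"},
    {"avatar_id": "Amber", "gender": "female", "style": "anime"},
    {"avatar_id": "Amy", "gender": "female", "style": "real"},
    {"avatar_id": "Brad", "gender": "male", "style": "anime"},
]
print(query(bank, gender="femal", style="anim"))  # ['Alison', 'Amber']
```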
|
|
|
| ---
|
|
|
| ## Retrieve avatar metadata
|
|
|
| ```python
|
| metadata = model.get_avatar_metadata("Rebecca")
|
|
|
| print(metadata)
|
| ```
|
|
|
| Example:
|
|
|
| ```python
|
| {
|
| "gender": "female",
|
| "style": "anime"
|
| }
|
| ```
|
|
|
| ---
|
|
|
| ## Retrieve avatar conditioning
|
|
|
| Access the full conditioning bundle used internally by PackedAvatar.
|
|
|
| ```python
|
| avatar = model.get_avatar("Rebecca")
|
| ```
|
|
|
| Returned fields may include:
|
|
|
| ```python
|
| {
|
| "avatar_id": "Rebecca",
|
| "gender": "female",
|
| "style": "anime",
|
| "coeff_3dmm": ...,
|
| "motion_3dmm": ...,
|
| "full_3dmm": ...,
|
| "crop_info": ...,
|
| "crop_preview": ...
|
| }
|
| ```
|
|
|
| This bundle can be passed directly as an `avatar_condition`.
|
|
|
| ```python
|
| video = model.generate(
|
| avatar_condition=avatar,
|
| driven_audio="speech.wav"
|
| )
|
| ```
|
|
|
| ---
|
|
|
| ## Retrieve avatar previews
|
|
|
| Preview images stored in the AvatarBank can be accessed directly.
|
|
|
| ```python
|
| preview = model.get_avatar_preview("Rebecca")
|
|
|
| preview.show()
|
| ```
|
|
|
| Returns a PIL image.
|
|
|
| ---
|
|
|
| ## Delete avatars
|
|
|
| Remove an avatar from the in-memory runtime.
|
|
|
| ```python
|
| model.delete_avatar("Rebecca")
|
| ```
|
|
|
| This only affects the current runtime session.
|
|
|
| To persist changes, save the bank.
|
|
|
| ---
|
|
|
| ## Save AvatarBank changes
|
|
|
| ```python
|
| model.save_avatar_bank("ModifiedAvatarBank.pt")
|
| ```
|
|
|
| This writes the current AvatarBank state to disk.
|
|
|
| ---
|
|
|
| ## Check whether an avatar exists
|
|
|
| ```python
|
| exists = model.avatar_bank.exists("Rebecca")
|
|
|
| print(exists)
|
| ```
|
|
|
| Returns:
|
|
|
| ```python
|
| True
|
| ```
|
|
|
| ---
|
|
|
| ## Direct AvatarBank access
|
|
|
| The underlying AvatarBank runtime is also exposed.
|
|
|
| ```python
|
| bank = model.avatar_bank
|
| ```
|
|
|
| Available methods include:
|
|
|
| ```python
|
| bank.available_ids()
|
| bank.list_avatars()
|
| bank.exists(...)
|
| bank.query(...)
|
| bank.fuzzy_search_id(...)
|
| bank.get_avatar(...)
|
| bank.get_metadata(...)
|
| bank.get_preview(...)
|
| bank.delete_avatar(...)
|
| bank.save(...)
|
| ```
|
|
|
| This provides full programmatic access to the packed avatar database.
|
|
|
| ---
|
|
|
| # First run vs later runs
|
|
|
| ## First run
|
|
|
| * extract bundle
|
| * build cache
|
| * initialize models
|
| * download a couple of auxiliary face-analysis weights if they are not already cached locally
|
|
|
| ## Later runs
|
|
|
| * reuse cache
|
| * skip the auxiliary downloads when the files are already present
|
| * faster startup
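The difference between the two cases can be sketched as a cache keyed on the bundle's hash: a changed bundle maps to a new cache directory, so stale extractions are never reused. Everything here is an assumption for illustration, including the `.complete` marker file, the 16-character key, and the helper names.

```python
import hashlib
from pathlib import Path

def bundle_digest(bundle_path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA256 of the bundle, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with bundle_path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def cache_dir_for(bundle_path: Path, cache_root: Path) -> Path:
    """Each distinct bundle gets its own cache directory (hypothetical layout)."""
    return cache_root / bundle_digest(bundle_path)[:16]

def is_first_run(bundle_path: Path, cache_root: Path) -> bool:
    """True when no finished extraction exists for this exact bundle."""
    marker = cache_dir_for(bundle_path, cache_root) / ".complete"
    return not marker.is_file()
```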
|
|
|
| ---
|
|
|
| # Performance notes
|
|
|
| * GPU is strongly recommended for `size=512` output
|
| * CPU is supported but slower
|
| * Wav2Lip increases runtime cost
|
| * RMBG adds preprocessing overhead
|
|
|
| ---
|
|
|
| # Why PackedAvatar?
|
|
|
| Compared to a standard SadTalker setup:
|
|
|
| * single `.pt` deployment artifact
|
| * no model downloads for the main runtime
|
| * no external repos required for core use
|
| * built-in AvatarBank system
|
| * built-in background removal
|
| * optional lip-sync enhancement
|
| * fully offline execution after first-run helper caching
|
| * reproducible runtime via bundle hashing
|
|
|
| ---
|
|
|
| # Notes
|
|
|
| * This repo is inference-only
|
| * Bundles are treated as trusted artifacts, so load only `.pt` files from sources you trust
|
| * Cache is auto-invalidated when the bundle changes
|
| * Bundled source code and model assets are resolved from the bundle itself; Python packages still come from `requirements.txt`
|
|
|
| ---
|
|
|
| # Credits
|
|
|
| Built on top of:
|
|
|
| * SadTalker
|
| * FaceVid2Vid / PIRender
|
| * Wav2Lip GAN
|
| * Bria RMBG
|
|
|
|
|