README.md · HiMind/Packed-Avatar at main

	---
	license: apache-2.0

	language:
	- en

	tags:
	- talking-head
	- face-animation
	- avatar
	- image-to-video
	- audio-to-video
	- motion-transfer
	- lip-sync
	- face-synthesis
	- video-generation
	- generative-ai
	- multimodal
	- pytorch
	- sad-talker
	- wav2lip
	- rmbg
	- packed-model
	---

	# PackedAvatar

	PackedAvatar is a self-contained talking-head generation runtime that bundles the SadTalker-based avatar pipeline into a single `.pt` artifact.

	It supports generating animated talking avatars from:

	* a single image + audio
	* a prebuilt AvatarBank identity
	* explicit avatar conditioning bundles
	* motion transfer bundles
	* reference-video driving
	* optional Wav2Lip post-processing

	All core runtime assets are packaged inside `PackedAvatar.pt`.

	Core model assets are bundled, but a few auxiliary helper weights may still be downloaded on the first run if they are not already cached locally.

	---

	# What is included

	`PackedAvatar.pt` contains:

	* SadTalker source code snapshot
	* SadTalker checkpoints
	* AvatarBank identity system
	* Bria RMBG 2.0 background removal assets
	* Wav2Lip GAN checkpoint
	* BFM / face model assets
	* configuration files
	* runtime manifests and hashes
	* cached avatar metadata

	This is a runtime artifact, not a training checkpoint.

	---

	# Repository contents

	* `PackedAvatar.pt` — full bundled runtime
	* `PackedAvatar.py` — loader + inference engine
	* `requirements.txt` — dependencies
	* `README.md` — usage guide

	---

	# Features

	* Single-file deployment (`.pt`) for the main runtime
	* Full SadTalker pipeline bundled
	* AvatarBank identity system
	* Image / avatar / motion / video conditioning
	* Automatic background removal (Bria RMBG)
	* Optional Wav2Lip GAN post-processing
	* CPU, CUDA, and Apple Silicon MPS execution
	* Automatic caching and extraction system
	* CLI + Python API support

	---

	# Requirements

	* Python 3.10+
	* PyTorch
	* FFmpeg (for reference-video audio extraction)
	* Dependencies listed in `requirements.txt`

	GPU is recommended; CPU is supported.

	---

	# Quick start

	## 1) Install dependencies

	```bash
	pip install -r requirements.txt
	```

	## 2) Place the bundle

	```text
	PackedAvatar.pt
	```

	## 3) Basic generation

	```python
	from PackedAvatar import PackedAvatar

	model = PackedAvatar("PackedAvatar.pt")

	video = model.generate(
	    source_image="person.jpg",
	    driven_audio="speech.wav"
	)

	print(video)
	```

	---

	# AvatarBank usage

	Generate directly from a prebuilt identity:

	```python
	video = model.generate(
	    avatar_id="Rebecca",
	    driven_audio="speech.wav"
	)
	```

	No source image is required for this path.

	If `avatar_id` is omitted, the runtime selects a default avatar from the packed bank.

	---

	# Prepacked AvatarBank table

	The following avatars are prepacked in the bank.

	## Female

	| Style | Names |
	| ----- | ----- |
	| anime | Alison, Amber, Andrea, Angela, Christine, Cynthia, Heidi, Jennifer, Karla, Kristen, Laura, Nancy, Patricia, Rebecca, Sandra, Tara |
	| cyber | Amanda, Brenda, Christina, Janet, Jill, Julie, Lisa, Mallory, Mandy, Martha, Melissa, Michelle, Regina |
	| drawn | Alyssa, Danielle, Joan, Kaitlyn, Kimberly, Marie, Samantha, Veronica |
	| paint | Alejandra, Barbara, Briana, Brittany, Emily, Jacqueline, Jodi, Mary, Rhonda, Savannah, Tammy, Victoria, Yolanda |
	| real | Amy, Ann, Ashley, Colleen, Heather, Holly, Jordan, Kristin, Kristine, Mariah, Pamela, Sara, Sharon |

	## Male

	| Style | Names |
	| ----- | ----- |
	| anime | Brad, Brian, David, Gregory, John, Jose, Lawrence, Robert |
	| cyber | Daniel, Hayden, James, Jeremy, Paul, Ryan, Sean |
	| drawn | Bobby, George, Gregg, Kevin, Matthew, Ricky, Thomas |
	| paint | Jacob, Justin, Michael, Nicholas, Steven, William, Zachary |
	| real | Aaron, Andrew, Benjamin, Christopher, Derek, Frank, Jesse, Joseph |

	There are 100 avatars total in the bank.

	---

	# Default avatar

	If no avatar is explicitly selected, the runtime resolves a default in this order:

	1. `defaults.default_avatar` from the manifest, if present and valid
	2. first real-style male avatar
	3. any real-style avatar
	4. any male avatar
	5. any avatar with embeddings
	6. first available avatar entry
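The fallback order above can be sketched as a chain of filters. This is a minimal illustration, not the runtime's actual code: the function name `resolve_default_avatar`, the entry layout, and the `has_embeddings` key are all assumptions made for the example.

```python
from typing import Optional


def resolve_default_avatar(
    entries: list[dict], manifest_default: Optional[str] = None
) -> Optional[str]:
    """Pick a default avatar id using the documented fallback order.

    `entries` is a hypothetical list of dicts with the keys "avatar_id",
    "gender", "style", and "has_embeddings"; the real bank layout may differ.
    """
    known_ids = {entry["avatar_id"] for entry in entries}

    # 1. Manifest default, if present and valid.
    if manifest_default in known_ids:
        return manifest_default

    # 2-6. Progressively looser rules; the first entry matching a rule wins.
    rules = [
        lambda e: e.get("style") == "real" and e.get("gender") == "male",
        lambda e: e.get("style") == "real",
        lambda e: e.get("gender") == "male",
        lambda e: bool(e.get("has_embeddings")),
        lambda e: True,
    ]
    for rule in rules:
        for entry in entries:
            if rule(entry):
                return entry["avatar_id"]
    return None  # empty bank
```

Under these assumptions, a bank containing Rebecca (female, anime), Amy (female, real), and Aaron (male, real) resolves to Aaron unless the manifest names another valid avatar.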

	---

	# Source image mode

	```python
	video = model.generate(
	    source_image="portrait.png",
	    driven_audio="speech.wav"
	)
	```

	Pipeline:

	```text
	image → face detection → crop → 3DMM extraction → animation
	```

	---

	# Background removal (Bria RMBG)

	```python
	video = model.generate(
	    source_image="portrait.png",
	    driven_audio="speech.wav",
	    remove_background=True
	)
	```

	Pipeline:

	```text
	image → Bria RMBG → foreground → SadTalker → video
	```

	---

	# Explicit avatar conditioning

	`avatar_condition` may be:

	* a Python `dict`
	* a `.pt` / `.pth` file
	* a `.mat` file

	When `avatar_condition` is provided, it supersedes `source_image`-driven conditioning.

	```python
	video = model.generate(
	    avatar_condition="my_avatar_condition.pt",
	    driven_audio="speech.wav"
	)
	```

	A valid avatar bundle can include fields such as:

	* `coeff_3dmm`
	* `motion_3dmm`
	* `full_3dmm`
	* `crop_preview`
	* `crop_info`

	---

	# Motion conditioning

	```python
	video = model.generate(
	    source_image="portrait.png",
	    driven_audio="speech.wav",
	    motion_condition="motion.pt"
	)
	```

	Supported motion inputs include:

	* `motion_3dmm`
	* `coeff_3dmm`
	* `full_3dmm_seq`
	* `full_3dmm`

	---

	# Reference-video driving

	```python
	video = model.generate(
	    source_image="portrait.png",
	    driven_audio="speech.wav",
	    use_ref_video=True,
	    ref_video="reference.mp4",
	    ref_info="pose"
	)
	```

	Supported `ref_info` values:

	* `pose`
	* `blink`
	* `pose+blink`
	* `all`

	When `ref_info="all"`, the runtime uses the reference video coefficients directly.
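If you build `ref_info` from user input, validating it up front avoids a failed run later. The sketch below is an assumption about how such a check could look. `parse_ref_info` is a hypothetical helper, and the mapping only names the signals each value refers to, not how the runtime applies them.

```python
# Supported values, as listed above, mapped to the signals each one names.
REF_INFO_COMPONENTS = {
    "pose": {"pose"},
    "blink": {"blink"},
    "pose+blink": {"pose", "blink"},
    "all": {"all"},  # reference coefficients are used directly
}


def parse_ref_info(ref_info: str) -> set[str]:
    """Validate a ref_info string and return the signals it selects."""
    try:
        return set(REF_INFO_COMPONENTS[ref_info])
    except KeyError:
        allowed = ", ".join(sorted(REF_INFO_COMPONENTS))
        raise ValueError(f"ref_info must be one of: {allowed}") from None
```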

	---

	# Wav2Lip GAN (optional)

	```python
	video = model.generate(
	    source_image="portrait.png",
	    driven_audio="speech.wav",
	    use_wav2lip=True,
	    wav2lip_repo="/path/to/Wav2Lip"
	)
	```

	Post-processes the SadTalker output for improved lip sync.

	The bundled checkpoint `checkpoints/wav2lip_gan.pth` is used automatically.

	If no runnable Wav2Lip inference code is found, the runtime falls back to the SadTalker video instead of crashing.

	---

	# Idle mode

	Generate with silent audio instead of an input file:

	```python
	video = model.generate(
	    avatar_id="Aaron",
	    use_idle_mode=True,
	    length_of_audio=4
	)
	```
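For intuition, "silent audio" of a given length is just a WAV file full of zero samples. The sketch below writes one with the standard library. The helper name `write_silent_wav` and the 16 kHz mono 16-bit format are assumptions for illustration; the runtime's internal idle-audio handling is not documented here.

```python
import wave


def write_silent_wav(path: str, seconds: float, sample_rate: int = 16000) -> int:
    """Write a mono 16-bit PCM WAV of pure silence; return the frame count."""
    n_frames = int(round(seconds * sample_rate))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample (16-bit PCM)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)  # one zero sample per frame
    return n_frames
```

A file produced this way could also be passed as `driven_audio` if you prefer an explicit input over `use_idle_mode`.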

	---

	# Still mode

	Reduces head movement:

	```python
	video = model.generate(source_image="portrait.png", driven_audio="speech.wav", still_mode=True)
	```

	---

	# Expression control

	```python
	video = model.generate(source_image="portrait.png", driven_audio="speech.wav", exp_scale=1.2)
	```

	* higher values → more expressive motion
	* lower values → more neutral motion

	---

	# Face render backend

	```python
	video = model.generate(source_image="portrait.png", driven_audio="speech.wav", facerender="facevid2vid")
	```

	---

	# Device selection

	Automatically chooses:

	* CUDA when available
	* Apple Silicon MPS on macOS when available
	* CPU fallback otherwise

	Override:

	```python
	model = PackedAvatar("PackedAvatar.pt", device="cuda")
	```
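The selection order can be sketched as follows. This is an illustration under assumptions, not the loader's code: `pick_device` and `detect_device` are hypothetical names. The PyTorch calls used (`torch.cuda.is_available()` and `torch.backends.mps.is_available()`) are real, and the sketch returns `"cpu"` if PyTorch is not importable.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the documented priority: CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends and pick one."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda = torch.cuda.is_available()
    # Older PyTorch builds have no MPS backend module, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(cuda, mps)
```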

	---

	# Python API (full example)

	```python
	from PackedAvatar import PackedAvatar

	model = PackedAvatar(
	    packed_pt_path="PackedAvatar.pt",
	    device="cuda",
	    cache_dir="./cache"
	)

	video = model.generate(
	    source_image="speaker.png",
	    driven_audio="speech.wav",
	    remove_background=True,
	    use_wav2lip=True,
	    size=512,
	    exp_scale=1.2,
	    pose_style=1,
	    still_mode=False
	)

	print(video)
	```

	---

	# Preprocessing helpers

	The runtime exposes an embedding extraction helper for image or video conditioning:

	```python
	bundle = model.extract_embeddings(
	    input_path="test_image.png",
	    crop_or_resize="crop",
	    pic_size=256
	)
	```

	Camel-case alias:

	```python
	bundle = model.ExtractEmbeddings("test_image.png")
	```

	The returned bundle can be saved and reused as `avatar_condition` or `motion_condition`.

	---

	# CLI usage

	## Basic

	```bash
	python PackedAvatar.py \
	    --source-image person.jpg \
	    --driven-audio speech.wav
	```

	## AvatarBank

	```bash
	python PackedAvatar.py \
	    --avatar-id Rebecca \
	    --driven-audio speech.wav
	```

	## Background removal

	```bash
	python PackedAvatar.py \
	    --source-image portrait.png \
	    --driven-audio speech.wav \
	    --remove-background
	```

	## Wav2Lip

	```bash
	python PackedAvatar.py \
	    --source-image portrait.png \
	    --driven-audio speech.wav \
	    --use-wav2lip \
	    --wav2lip-repo /path/to/Wav2Lip
	```

	## Reference video driving

	```bash
	python PackedAvatar.py \
	    --source-image portrait.png \
	    --driven-audio speech.wav \
	    --use-ref-video \
	    --ref-video reference.mp4 \
	    --ref-info pose+blink
	```

	## Idle mode

	```bash
	python PackedAvatar.py \
	    --avatar-id Aaron \
	    --use-idle-mode \
	    --length-of-audio 5
	```

	## Explicit avatar conditioning bundle

	```bash
	python PackedAvatar.py \
	    --avatar-condition avatar_condition.pt \
	    --driven-audio speech.wav
	```

	## Motion conditioning bundle

	```bash
	python PackedAvatar.py \
	    --motion-condition motion_condition.pt \
	    --driven-audio speech.wav
	```

	---

	# How it works

	PackedAvatar runs a full multimodal pipeline.

	## 1. Asset extraction

	* extracts SadTalker + checkpoints from `.pt`
	* verifies SHA256 hashes
	* builds the runtime cache
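The hash check in this step can be pictured as follows. This is a minimal sketch, assuming a manifest that maps relative file paths to hex SHA256 digests. The manifest layout and the function names are hypothetical, since the real bundle format is not documented here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_hash_mismatches(root: Path, expected: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose SHA256 differs."""
    bad = []
    for rel_path, want in expected.items():
        target = root / rel_path
        if not target.is_file() or sha256_of_file(target) != want.lower():
            bad.append(rel_path)
    return bad
```

An empty result means every extracted file matches its recorded digest; any entry in the list would be a reason to re-extract.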

	## 2. Avatar resolution

	Priority:

	```text
	avatar_condition
	→ source_image-driven SadTalker path
	→ avatar_id / default AvatarBank resolution
	```

	If `avatar_condition` is provided, it supersedes `source_image` conditioning.
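The priority chain amounts to a first-match check over the three inputs. The function below is an illustrative sketch with a hypothetical name (`choose_identity_source`); it only returns a label for the path that would be taken.

```python
def choose_identity_source(avatar_condition=None, source_image=None, avatar_id=None) -> str:
    """Return which identity path applies, following the priority above."""
    if avatar_condition is not None:
        return "avatar_condition"   # explicit bundle wins over everything else
    if source_image is not None:
        return "source_image"       # SadTalker preprocessing on the image
    if avatar_id is not None:
        return "avatar_id"          # named AvatarBank identity
    return "default_avatar"         # fall back to the bank's default
```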

	## 3. Preprocessing

	* face detection
	* cropping
	* 3DMM extraction

	## 4. Motion generation

	* audio → facial coefficients
	* or motion transfer injection

	## 5. Rendering

	* FaceVid2Vid / PIRender animation
	* frame synthesis

	## 6. Optional post-processing

	* Wav2Lip GAN lip-sync enhancement

	---

	# AvatarBank API

	PackedAvatar exposes the packed AvatarBank directly, allowing you to browse, search, inspect, and manage avatars at runtime.

	The AvatarBank is loaded automatically when the model is initialized.

	```python
	from PackedAvatar import PackedAvatar

	model = PackedAvatar("PackedAvatar.pt")
	```

	## List available avatars

	```python
	avatars = model.list_avatars()

	print(avatars)
	```

	Returns:

	```python
	[
	    "Aaron",
	    "Rebecca",
	    "Amy",
	    ...
	]
	```

	---

	## Search by name

	Perform a fuzzy search against avatar IDs.

	```python
	matches = model.fuzzy_search_avatar("rebeca")

	print(matches)
	```

	Example:

	```python
	["Rebecca"]
	```
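How the runtime scores matches is not specified. As one plausible approach, the standard library's `difflib.get_close_matches` gives similar behavior. The sketch below is an assumption: `fuzzy_search_ids` is a hypothetical stand-in, not the implementation of `fuzzy_search_avatar`.

```python
import difflib


def fuzzy_search_ids(
    query: str, avatar_ids: list[str], limit: int = 3, cutoff: float = 0.6
) -> list[str]:
    """Case-insensitive approximate match of `query` against avatar ids."""
    # Compare in lowercase, then map hits back to the original spelling.
    by_lower = {avatar_id.lower(): avatar_id for avatar_id in avatar_ids}
    hits = difflib.get_close_matches(query.lower(), list(by_lower), n=limit, cutoff=cutoff)
    return [by_lower[hit] for hit in hits]


print(fuzzy_search_ids("rebeca", ["Aaron", "Rebecca", "Amy"]))  # ['Rebecca']
```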

	---

	## Query by gender and style

	Search the bank using metadata filters.

	```python
	female_anime = model.query_avatars(
	    gender="female",
	    style="anime"
	)

	print(female_anime)
	```

	Example:

	```python
	[
	    "Alison",
	    "Amber",
	    "Andrea",
	    ...
	]
	```

	Fuzzy matching is supported:

	```python
	model.query_avatars(
	    gender="femal",
	    style="anim"
	)
	```
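One way to picture fuzzy filter matching is to snap each filter to the closest known value and then filter on exact equality. The sketch below uses `difflib` for the snapping step. The function names and the in-memory metadata layout are assumptions, and the runtime may match differently.

```python
import difflib
from typing import Optional

GENDERS = ("female", "male")
STYLES = ("anime", "cyber", "drawn", "paint", "real")


def normalize(value: Optional[str], choices: tuple[str, ...]) -> Optional[str]:
    """Snap a possibly misspelled filter value to the closest known choice."""
    if value is None:
        return None
    hits = difflib.get_close_matches(value.lower(), choices, n=1, cutoff=0.6)
    # An unrecognized value is kept as-is, so it simply matches nothing.
    return hits[0] if hits else value.lower()


def query_avatars(
    metadata: dict[str, dict], gender: Optional[str] = None, style: Optional[str] = None
) -> list[str]:
    """Return avatar ids whose metadata matches the (normalized) filters."""
    want_gender = normalize(gender, GENDERS)
    want_style = normalize(style, STYLES)
    return [
        avatar_id
        for avatar_id, meta in metadata.items()
        if (want_gender is None or meta["gender"] == want_gender)
        and (want_style is None or meta["style"] == want_style)
    ]
```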

	---

	## Retrieve avatar metadata

	```python
	metadata = model.get_avatar_metadata("Rebecca")

	print(metadata)
	```

	Example:

	```python
	{
	    "gender": "female",
	    "style": "anime"
	}
	```

	---

	## Retrieve avatar conditioning

	Access the full conditioning bundle used internally by PackedAvatar.

	```python
	avatar = model.get_avatar("Rebecca")
	```

	Returned fields may include:

	```python
	{
	    "avatar_id": "Rebecca",
	    "gender": "female",
	    "style": "anime",
	    "coeff_3dmm": ...,
	    "motion_3dmm": ...,
	    "full_3dmm": ...,
	    "crop_info": ...,
	    "crop_preview": ...
	}
	```

	This bundle can be passed directly as an `avatar_condition`.

	```python
	video = model.generate(
	    avatar_condition=avatar,
	    driven_audio="speech.wav"
	)
	```

	---

	## Retrieve avatar previews

	Preview images stored in the AvatarBank can be accessed directly.

	```python
	preview = model.get_avatar_preview("Rebecca")

	preview.show()
	```

	Returns a PIL image.

	---

	## Delete avatars

	Remove an avatar from the in-memory runtime.

	```python
	model.delete_avatar("Rebecca")
	```

	This only affects the current runtime session.

	To persist changes, save the bank.

	---

	## Save AvatarBank changes

	```python
	model.save_avatar_bank("ModifiedAvatarBank.pt")
	```

	This writes the current AvatarBank state to disk.

	---

	## Check whether an avatar exists

	```python
	exists = model.avatar_bank.exists("Rebecca")

	print(exists)
	```

	Returns:

	```python
	True
	```

	---

	## Direct AvatarBank access

	The underlying AvatarBank runtime is also exposed.

	```python
	bank = model.avatar_bank
	```

	Available methods include:

	```python
	bank.available_ids()
	bank.list_avatars()
	bank.exists(...)
	bank.query(...)
	bank.fuzzy_search_id(...)
	bank.get_avatar(...)
	bank.get_metadata(...)
	bank.get_preview(...)
	bank.delete_avatar(...)
	bank.save(...)
	```

	This provides full programmatic access to the packed avatar database.

	---

	# First run vs later runs

	## First run

	* extract bundle
	* build cache
	* initialize models
	* download a couple of auxiliary face-analysis weights if they are not already cached locally

	## Later runs

	* reuse cache
	* skip the auxiliary downloads when the files are already present
	* faster startup

	---

	# Performance notes

	* GPU is strongly recommended for 512 resolution
	* CPU is supported but slower
	* Wav2Lip increases runtime cost
	* RMBG adds preprocessing overhead

	---

	# Why PackedAvatar?

	Compared to a standard SadTalker setup:

	* single `.pt` deployment artifact
	* no model downloads for the main runtime
	* no external repos required for core use
	* built-in AvatarBank system
	* built-in background removal
	* optional lip-sync enhancement
	* fully offline execution after first-run helper caching
	* reproducible runtime via bundle hashing

	---

	# Notes

	* This repo is inference-only
	* Bundles are treated as trusted artifacts
	* Cache is auto-invalidated when the bundle changes
	* Bundled runtime assets are resolved internally; Python packages still come from `requirements.txt`

	---

	# Credits

	Built on top of:

	* SadTalker
	* FaceVid2Vid / PIRender
	* Wav2Lip GAN
	* Bria RMBG