---
license: apache-2.0
language:
- en
tags:
- talking-head
- face-animation
- avatar
- image-to-video
- audio-to-video
- motion-transfer
- lip-sync
- face-synthesis
- video-generation
- generative-ai
- multimodal
- pytorch
- sad-talker
- wav2lip
- rmbg
- packed-model
---

# PackedAvatar

PackedAvatar is a **self-contained talking-head generation runtime** that bundles the SadTalker-based avatar pipeline into a single `.pt` artifact.

It can generate animated talking avatars from:

* a single image + audio
* a prebuilt AvatarBank identity
* explicit avatar conditioning bundles
* motion transfer bundles
* reference-video driving

Optional Wav2Lip post-processing can be applied to the result.

All core runtime assets are packaged inside `PackedAvatar.pt`. A few auxiliary helper weights may still be downloaded on the first run if they are not already cached locally.

---

# What is included

`PackedAvatar.pt` contains:

* SadTalker source code snapshot
* SadTalker checkpoints
* AvatarBank identity system
* Bria RMBG 2.0 background removal assets
* Wav2Lip GAN checkpoint
* BFM / face model assets
* configuration files
* runtime manifests and hashes
* cached avatar metadata

This is a **runtime artifact**, not a training checkpoint.

---

# Repository contents

* `PackedAvatar.pt`: full bundled runtime
* `PackedAvatar.py`: loader + inference engine
* `requirements.txt`: dependencies
* `README.md`: usage guide

---

# Features

* Single-file deployment (`.pt`) for the main runtime
* Full SadTalker pipeline bundled
* AvatarBank identity system
* Image / avatar / motion / video conditioning
* Automatic background removal (Bria RMBG)
* Optional Wav2Lip GAN post-processing
* CPU, CUDA, and Apple Silicon (MPS) execution
* Automatic caching and extraction system
* CLI + Python API support

---

# Requirements

* Python 3.10+
* PyTorch
* FFmpeg (for reference-video audio extraction)
* Dependencies listed in `requirements.txt`

GPU is recommended; CPU is supported.
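
Before the first run, it can help to confirm that the requirements above are met. The following is a minimal sketch and not part of PackedAvatar itself: the helper name `check_environment` is hypothetical, and it only checks the Python version, whether `torch` can be found, and whether an `ffmpeg` executable is on `PATH`.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of PackedAvatar.
    """
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    # find_spec locates the package without importing it.
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    # FFmpeg is only needed for reference-video audio extraction.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

A missing `ffmpeg` is reported as a warning rather than an error because, per the list above, it is only needed when audio is extracted from a reference video.
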

---

# Quick start

## 1) Install dependencies

```bash
pip install -r requirements.txt
```

## 2) Place the bundle

Place the bundle in your working directory, or pass its path to `PackedAvatar(...)`:

```text
PackedAvatar.pt
```

## 3) Basic generation

```python
from PackedAvatar import PackedAvatar

model = PackedAvatar("PackedAvatar.pt")

video = model.generate(
    source_image="person.jpg",
    driven_audio="speech.wav"
)

print(video)
```

---

# AvatarBank usage

Generate directly from a prebuilt identity:

```python
video = model.generate(
    avatar_id="Rebecca",
    driven_audio="speech.wav"
)
```

No source image is required for this path.

If `avatar_id` is omitted, the runtime selects a default avatar from the packed bank.

---

# Prepacked AvatarBank table

The following avatars are prepacked in the bank.

## Female

| Style | Names |
| ----- | ----- |
| anime | Alison, Amber, Andrea, Angela, Christine, Cynthia, Heidi, Jennifer, Karla, Kristen, Laura, Nancy, Patricia, Rebecca, Sandra, Tara |
| cyber | Amanda, Brenda, Christina, Janet, Jill, Julie, Lisa, Mallory, Mandy, Martha, Melissa, Michelle, Regina |
| drawn | Alyssa, Danielle, Joan, Kaitlyn, Kimberly, Marie, Samantha, Veronica |
| paint | Alejandra, Barbara, Briana, Brittany, Emily, Jacqueline, Jodi, Mary, Rhonda, Savannah, Tammy, Victoria, Yolanda |
| real | Amy, Ann, Ashley, Colleen, Heather, Holly, Jordan, Kristin, Kristine, Mariah, Pamela, Sara, Sharon |

## Male

| Style | Names |
| ----- | ----- |
| anime | Brad, Brian, David, Gregory, John, Jose, Lawrence, Robert |
| cyber | Daniel, Hayden, James, Jeremy, Paul, Ryan, Sean |
| drawn | Bobby, George, Gregg, Kevin, Matthew, Ricky, Thomas |
| paint | Jacob, Justin, Michael, Nicholas, Steven, William, Zachary |
| real | Aaron, Andrew, Benjamin, Christopher, Derek, Frank, Jesse, Joseph |

There are **100 avatars total** in the bank.
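
As a quick cross-check of the stated total, the per-style counts can be tallied. This is only a worked sum: the numbers below were counted by hand from the two tables above, not read from the bundle.

```python
# Per-style avatar counts, transcribed by hand from the two tables above.
BANK_COUNTS = {
    "female": {"anime": 16, "cyber": 13, "drawn": 8, "paint": 13, "real": 13},
    "male": {"anime": 8, "cyber": 7, "drawn": 7, "paint": 7, "real": 8},
}

per_gender = {gender: sum(styles.values()) for gender, styles in BANK_COUNTS.items()}
total = sum(per_gender.values())

print(per_gender)  # {'female': 63, 'male': 37}
print(total)       # 100
```
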

---

# Default avatar

If no avatar is explicitly selected, the runtime resolves a default in this order:

1. `defaults.default_avatar` from the manifest, if present and valid
2. first real-style male avatar
3. any real-style avatar
4. any male avatar
5. any avatar with embeddings
6. first available avatar entry

---

# Source image mode

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav"
)
```

Pipeline:

```text
image → face detection → crop → 3DMM extraction → animation
```

---

# Background removal (Bria RMBG)

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    remove_background=True
)
```

Pipeline:

```text
image → Bria RMBG → foreground → SadTalker → video
```

---

# Explicit avatar conditioning

`avatar_condition` may be:

* a Python `dict`
* a `.pt` / `.pth` file
* a `.mat` file

When `avatar_condition` is provided, it supersedes `source_image`-driven conditioning.

```python
video = model.generate(
    avatar_condition="my_avatar_condition.pt",
    driven_audio="speech.wav"
)
```

A valid avatar bundle can include fields such as:

* `coeff_3dmm`
* `motion_3dmm`
* `full_3dmm`
* `crop_preview`
* `crop_info`

---

# Motion conditioning

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    motion_condition="motion.pt"
)
```

Supported motion inputs include:

* `motion_3dmm`
* `coeff_3dmm`
* `full_3dmm_seq`
* `full_3dmm`

---

# Reference-video driving

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    use_ref_video=True,
    ref_video="reference.mp4",
    ref_info="pose"
)
```

Supported `ref_info` values:

* `pose`
* `blink`
* `pose+blink`
* `all`

When `ref_info="all"`, the runtime uses the reference video coefficients directly.
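
Because `ref_info` is a plain string, a typo is easy to make. The sketch below checks the value before calling `generate()`. The helper `ref_video_kwargs` is hypothetical; only the keyword names and the four accepted values come from this README, and how the runtime itself treats an unsupported value is not stated here.

```python
VALID_REF_INFO = ("pose", "blink", "pose+blink", "all")


def ref_video_kwargs(ref_video, ref_info="pose"):
    """Build the reference-video keyword arguments for generate().

    Hypothetical helper; only the keyword names and the accepted
    ref_info values come from this README.
    """
    if ref_info not in VALID_REF_INFO:
        raise ValueError(
            f"ref_info must be one of {VALID_REF_INFO}, got {ref_info!r}"
        )
    return {"use_ref_video": True, "ref_video": ref_video, "ref_info": ref_info}


# Intended use (requires a loaded PackedAvatar instance named `model`):
# video = model.generate(
#     source_image="portrait.png",
#     driven_audio="speech.wav",
#     **ref_video_kwargs("reference.mp4", "pose+blink"),
# )
```
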

---

# Wav2Lip GAN (optional)

```python
video = model.generate(
    source_image="portrait.png",
    driven_audio="speech.wav",
    use_wav2lip=True,
    wav2lip_repo="/path/to/Wav2Lip"
)
```

Post-processes the SadTalker output for improved lip sync.

The bundled checkpoint `checkpoints/wav2lip_gan.pth` is used automatically. If no runnable Wav2Lip inference code is found, the runtime falls back to the SadTalker video instead of crashing.

---

# Idle mode

Generate with silent audio instead of an input file:

```python
video = model.generate(
    avatar_id="Aaron",
    use_idle_mode=True,
    length_of_audio=4
)
```

---

# Still mode

Reduces head movement:

```python
still_mode=True
```

---

# Expression control

```python
exp_scale=1.2
```

* higher values → more expressive motion
* lower values → more neutral motion

---

# Face render backend

```python
facerender="facevid2vid"
```

---

# Device selection

The runtime automatically chooses:

* CUDA when available
* Apple Silicon MPS on macOS when available
* CPU fallback otherwise

Override:

```python
PackedAvatar("PackedAvatar.pt", device="cuda")
```

---

# Python API (full example)

```python
from PackedAvatar import PackedAvatar

model = PackedAvatar(
    packed_pt_path="PackedAvatar.pt",
    device="cuda",
    cache_dir="./cache"
)

video = model.generate(
    source_image="speaker.png",
    driven_audio="speech.wav",
    remove_background=True,
    use_wav2lip=True,
    size=512,
    exp_scale=1.2,
    pose_style=1,
    still_mode=False
)

print(video)
```

---

# Preprocessing helpers

The runtime exposes an embedding extraction helper for image or video conditioning:

```python
bundle = model.extract_embeddings(
    input_path="test_image.png",
    crop_or_resize="crop",
    pic_size=256
)
```

Camel-case alias:

```python
bundle = model.ExtractEmbeddings("test_image.png")
```

The returned bundle can be saved and reused as `avatar_condition` or `motion_condition`.
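
The extract-then-reuse flow can be sketched as a single function. This is an illustration under stated assumptions: `generate_from_extracted` is a hypothetical helper, and it assumes the bundle returned by `extract_embeddings` can be passed in memory as `avatar_condition` without being saved first.

```python
def generate_from_extracted(model, image_path, audio_path):
    """Extract a conditioning bundle once, then drive generation with it.

    `model` is a loaded PackedAvatar instance. Hypothetical helper.
    """
    bundle = model.extract_embeddings(
        input_path=image_path,
        crop_or_resize="crop",
        pic_size=256,
    )
    # Assumption: the returned bundle is accepted as-is by avatar_condition.
    return model.generate(avatar_condition=bundle, driven_audio=audio_path)
```

Keeping the bundle around and passing it to later `generate()` calls lets one portrait drive several audio clips from a single extraction.
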

---

# CLI usage

## Basic

```bash
python PackedAvatar.py \
  --source-image person.jpg \
  --driven-audio speech.wav
```

## AvatarBank

```bash
python PackedAvatar.py \
  --avatar-id Rebecca \
  --driven-audio speech.wav
```

## Background removal

```bash
python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --remove-background
```

## Wav2Lip

```bash
python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --use-wav2lip \
  --wav2lip-repo /path/to/Wav2Lip
```

## Reference video driving

```bash
python PackedAvatar.py \
  --source-image portrait.png \
  --driven-audio speech.wav \
  --use-ref-video \
  --ref-video reference.mp4 \
  --ref-info pose+blink
```

## Idle mode

```bash
python PackedAvatar.py \
  --avatar-id Aaron \
  --use-idle-mode \
  --length-of-audio 5
```

## Explicit avatar conditioning bundle

```bash
python PackedAvatar.py \
  --avatar-condition avatar_condition.pt \
  --driven-audio speech.wav
```

## Motion conditioning bundle

```bash
python PackedAvatar.py \
  --motion-condition motion_condition.pt \
  --driven-audio speech.wav
```

---

# How it works

PackedAvatar runs a full multimodal pipeline.

## 1. Asset extraction

* extracts SadTalker + checkpoints from `.pt`
* verifies SHA256 hashes
* builds the runtime cache

## 2. Avatar resolution

Priority:

```text
avatar_condition
→ source_image-driven SadTalker path
→ avatar_id / default AvatarBank resolution
```

If `avatar_condition` is provided, it supersedes `source_image` conditioning.

## 3. Preprocessing

* face detection
* cropping
* 3DMM extraction

## 4. Motion generation

* audio → facial coefficients
* or motion transfer injection

## 5. Rendering

* SadTalker / PIRender animation
* frame synthesis

## 6. Optional post-processing

* Wav2Lip GAN lip-sync enhancement

---

# AvatarBank API

PackedAvatar exposes the packed AvatarBank directly, allowing you to browse, search, inspect, and manage avatars at runtime.

The AvatarBank is loaded automatically when the model is initialized.

```python
from PackedAvatar import PackedAvatar

model = PackedAvatar("PackedAvatar.pt")
```

## List available avatars

```python
avatars = model.list_avatars()

print(avatars)
```

Returns:

```python
[
    "Aaron",
    "Rebecca",
    "Amy",
    ...
]
```

---

## Search by name

Perform a fuzzy search against avatar IDs.

```python
matches = model.fuzzy_search_avatar("rebeca")

print(matches)
```

Example:

```python
["Rebecca"]
```

---

## Query by gender and style

Search the bank using metadata filters.

```python
female_anime = model.query_avatars(
    gender="female",
    style="anime"
)

print(female_anime)
```

Example:

```python
[
    "Alison",
    "Amber",
    "Andrea",
    ...
]
```

Fuzzy matching is supported:

```python
model.query_avatars(
    gender="femal",
    style="anim"
)
```

---

## Retrieve avatar metadata

```python
metadata = model.get_avatar_metadata("Rebecca")

print(metadata)
```

Example:

```python
{
    "gender": "female",
    "style": "anime"
}
```

---

## Retrieve avatar conditioning

Access the full conditioning bundle used internally by PackedAvatar.

```python
avatar = model.get_avatar("Rebecca")
```

Returned fields may include:

```python
{
    "avatar_id": "Rebecca",
    "gender": "female",
    "style": "anime",
    "coeff_3dmm": ...,
    "motion_3dmm": ...,
    "full_3dmm": ...,
    "crop_info": ...,
    "crop_preview": ...
}
```

This bundle can be passed directly as an `avatar_condition`.

```python
video = model.generate(
    avatar_condition=avatar,
    driven_audio="speech.wav"
)
```

---

## Retrieve avatar previews

Preview images stored in the AvatarBank can be accessed directly.

```python
preview = model.get_avatar_preview("Rebecca")

preview.show()
```

Returns a PIL image.

---

## Delete avatars

Remove an avatar from the in-memory runtime.

```python
model.delete_avatar("Rebecca")
```

This only affects the current runtime session. To persist changes, save the bank.

---

## Save AvatarBank changes

```python
model.save_avatar_bank("ModifiedAvatarBank.pt")
```

This writes the current AvatarBank state to disk.
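
Querying, deleting, and saving can be combined into one step. The sketch below is a hypothetical helper (`prune_and_save` is not part of PackedAvatar) built only from the `query_avatars`, `delete_avatar`, and `save_avatar_bank` calls shown above.

```python
def prune_and_save(model, gender, style, out_path):
    """Delete every avatar matching gender/style, then save the bank.

    `model` is a loaded PackedAvatar instance. Hypothetical helper.
    Returns the list of deleted avatar IDs.
    """
    # Copy the result so deletions cannot disturb the list being iterated.
    to_delete = list(model.query_avatars(gender=gender, style=style))
    for avatar_id in to_delete:
        model.delete_avatar(avatar_id)
    # Without this call the deletions last only for the current session.
    model.save_avatar_bank(out_path)
    return to_delete
```

For example, `prune_and_save(model, "female", "anime", "ModifiedAvatarBank.pt")` would remove the female anime avatars from the running session and write the reduced bank to a new file.
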

---

## Check whether an avatar exists

```python
exists = model.avatar_bank.exists("Rebecca")

print(exists)
```

Returns:

```python
True
```

---

## Direct AvatarBank access

The underlying AvatarBank runtime is also exposed.

```python
bank = model.avatar_bank
```

Available methods include:

```python
bank.available_ids()
bank.list_avatars()
bank.exists(...)
bank.query(...)
bank.fuzzy_search_id(...)
bank.get_avatar(...)
bank.get_metadata(...)
bank.get_preview(...)
bank.delete_avatar(...)
bank.save(...)
```

This provides full programmatic access to the packed avatar database.

---

# First run vs later runs

## First run

* extract bundle
* build cache
* initialize models
* download a couple of auxiliary face-analysis weights if they are not already cached locally

## Later runs

* reuse cache
* skip the auxiliary downloads when the files are already present
* faster startup

---

# Performance notes

* GPU is strongly recommended for 512 resolution
* CPU is supported but slower
* Wav2Lip increases runtime cost
* RMBG adds preprocessing overhead

---

# Why PackedAvatar?

Compared to a standard SadTalker setup:

* single `.pt` deployment artifact
* no model downloads for the main runtime
* no external repos required for core use
* built-in AvatarBank system
* built-in background removal
* optional lip-sync enhancement
* fully offline execution after first-run helper caching
* reproducible runtime via bundle hashing

---

# Notes

* This repo is inference-only
* Bundles are treated as trusted artifacts
* Cache is auto-invalidated when the bundle changes
* Bundled runtime assets are resolved internally; the Python packages in `requirements.txt` and FFmpeg still need to be installed

---

# Credits

Built on top of:

* SadTalker
* FaceVid2Vid / PIRender
* Wav2Lip GAN
* Bria RMBG