---
license: mit
pipeline_tag: image-to-video
tags:
- lip-sync
- talking-head
- face-animation
- musetalk
- safetensors
---

# MuseTalk V15 UNet — AEmotionStudio Mirror

**Mirror of the MuseTalk V15 UNet weights** for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

## About

[MuseTalk](https://github.com/TMElyralab/MuseTalk) is a real-time, high-quality lip-sync model that synchronizes lip movements in video to match provided audio. It supports:

- **Video + audio lip sync** — make a person in a video speak new dialogue
- **Image + audio talking head** — animate a portrait photo with speech audio
- **Multi-face support** — sync multiple faces in a single video
- **Batch inference** — process multiple frames simultaneously for speed

## Files

| File | Precision | Size | Description |
|------|-----------|------|-------------|
| `musetalkV15/unet_fp16.safetensors` | fp16 | ~1.6 GB | **Recommended** — half-precision UNet weights |
| `musetalkV15/unet.safetensors` | fp32 | ~3.2 GB | Full-precision UNet weights (fallback) |
| `musetalkV15/musetalk.json` | — | < 1 KB | Model configuration |

## Usage with ComfyUI-FFMPEGA

This model is **auto-downloaded** when you use the `lip_sync` skill in [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

### Example Prompts

```
Lip sync this video to the provided audio
```

```
Make the person's lips match the speech
```

```
Dub this video with the new voiceover
```

The fp16 variant is used by default when `use_float16` is enabled (the default); loading falls back to the fp32 weights if the fp16 file is unavailable.
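
This selection logic can be sketched as follows. It is a minimal illustration of the documented behavior only; the helper name `pick_unet_path` is hypothetical, and the actual loader in ComfyUI-FFMPEGA may differ.

```python
from pathlib import Path

def pick_unet_path(model_dir: str, use_float16: bool = True) -> Path:
    """Return the preferred UNet weights file, falling back to fp32.

    Prefers unet_fp16.safetensors when use_float16 is enabled; otherwise,
    or if the fp16 file is missing, uses unet.safetensors.
    """
    base = Path(model_dir)
    fp16 = base / "unet_fp16.safetensors"
    fp32 = base / "unet.safetensors"
    if use_float16 and fp16.exists():
        return fp16
    if fp32.exists():
        return fp32
    raise FileNotFoundError(f"No UNet weights found in {base}")
```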

### Manual Download

If auto-download is disabled, download the files and place them in:
```
ComfyUI/models/musetalk/musetalkV15/
```
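
To verify a manual install, a quick check against the layout above can look like this. The helper `missing_files` is hypothetical and assumes the directory structure documented here; note that only one of the two UNet precisions is strictly required.

```python
from pathlib import Path

# Files listed in the "Files" table above.
EXPECTED = ["unet_fp16.safetensors", "unet.safetensors", "musetalk.json"]

def missing_files(comfyui_root: str) -> list[str]:
    """List expected MuseTalk files absent from the documented directory."""
    target = Path(comfyui_root) / "models" / "musetalk" / "musetalkV15"
    return [name for name in EXPECTED if not (target / name).exists()]
```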

### Additional Dependencies

MuseTalk also requires these models (auto-downloaded from HuggingFace on first use):

- **SD-VAE** (`stabilityai/sd-vae-ft-mse`) — ~335 MB
- **Whisper-tiny** (`openai/whisper-tiny`) — ~75 MB

## VRAM Requirements

- **Minimum**: ~4 GB
- **Recommended**: ~6 GB
- Uses subprocess isolation to prevent CUDA memory leaks
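
The subprocess-isolation pattern mentioned above can be sketched in a few lines: the heavy inference runs in a child process, so when the child exits the OS reclaims all of its memory, including any CUDA allocations. This is a generic illustration of the pattern, not ComfyUI-FFMPEGA's actual implementation; `run_isolated` and the snippet passed to it are hypothetical.

```python
import subprocess
import sys

def run_isolated(snippet: str) -> str:
    """Run a Python snippet in a child process and return its stdout.

    Any memory (including GPU memory) allocated by the snippet is
    released when the child process exits.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Hypothetical usage: the real snippet would load the model and run inference.
out = run_isolated("print('lip sync done')")
```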

## License

- **MuseTalk code**: [MIT License](https://github.com/TMElyralab/MuseTalk/blob/main/LICENSE)
- **SD-VAE**: [CreativeML Open RAIL-M](https://huggingface.co/stabilityai/sd-vae-ft-mse/blob/main/LICENSE)
- **Whisper**: [MIT License](https://github.com/openai/whisper/blob/main/LICENSE)

## Citation

```bibtex
@article{zhang2024musetalk,
  title={MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting},
  author={Zhang, Yue and Liu, Minhao and Chen, Zhaokang and Wu, Bin and others},
  journal={arXiv preprint arXiv:2410.10122},
  year={2024}
}
```