---
license: mit
pipeline_tag: image-to-video
tags:
- lip-sync
- talking-head
- face-animation
- musetalk
- safetensors
---

# MuseTalk V15 UNet — AEmotionStudio Mirror

**Mirror of the MuseTalk V15 UNet weights** for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

## About

[MuseTalk](https://github.com/TMElyralab/MuseTalk) is a real-time, high-quality lip-sync model that synchronizes lip movements in a video to match provided audio. It supports:

- **Video + Audio lip sync** — make a person in a video speak new dialogue
- **Image + Audio talking head** — animate a portrait photo with speech audio
- **Multi-face support** — sync multiple faces in a single video
- **Batch inference** — process multiple frames simultaneously for speed

## Files

| File | Precision | Size | Description |
|------|-----------|------|-------------|
| `musetalkV15/unet_fp16.safetensors` | fp16 | ~1.6 GB | **Recommended** — half-precision UNet weights |
| `musetalkV15/unet.safetensors` | fp32 | ~3.2 GB | Full-precision UNet weights (fallback) |
| `musetalkV15/musetalk.json` | — | < 1 KB | Model configuration |

## Usage with ComfyUI-FFMPEGA

This model is **auto-downloaded** when you use the `lip_sync` skill in [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

### Example Prompts

```
Lip sync this video to the provided audio
```
```
Make the person's lips match the speech
```
```
Dub this video with the new voiceover
```

The fp16 variant is preferred when `use_float16` is enabled (the default); loading falls back to fp32 if the fp16 file is unavailable.
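
The fallback behavior described above can be sketched as follows. This is an illustrative stdlib-only sketch: `pick_unet_weights` and its `model_dir` argument are hypothetical names, not part of the actual ComfyUI-FFMPEGA API; only the filenames come from the Files table above.

```python
from pathlib import Path

def pick_unet_weights(model_dir, use_float16=True):
    """Prefer the fp16 UNet weights, falling back to fp32 if absent (sketch)."""
    names = ["unet_fp16.safetensors"] if use_float16 else []
    names.append("unet.safetensors")  # fp32 fallback
    for name in names:
        path = Path(model_dir) / name
        if path.is_file():
            return path
    raise FileNotFoundError(f"no UNet weights found in {model_dir}")
```

With `use_float16=False` the fp16 file is never considered, matching the setting's intent of forcing full precision.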

### Manual Download

If auto-download is disabled, download the files and place them in:
```
ComfyUI/models/musetalk/musetalkV15/
```

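After placing the files, a quick check can confirm the layout. `check_install` and the required-file list below are a hypothetical helper derived from the Files table, not part of ComfyUI-FFMPEGA; the fp32 UNet is treated as optional since it is only a fallback.

```python
from pathlib import Path

# Required files per the Files table; the fp32 UNet is an optional fallback.
REQUIRED = ["unet_fp16.safetensors", "musetalk.json"]

def check_install(comfy_root="ComfyUI"):
    """Return the names of required MuseTalk files missing from the ComfyUI tree."""
    model_dir = Path(comfy_root) / "models" / "musetalk" / "musetalkV15"
    return [name for name in REQUIRED if not (model_dir / name).is_file()]
```

An empty result means the expected layout is in place.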
### Additional Dependencies

MuseTalk also requires these models (auto-downloaded from HuggingFace on first use):
- **SD-VAE** (`stabilityai/sd-vae-ft-mse`) — ~335 MB
- **Whisper-tiny** (`openai/whisper-tiny`) — ~75 MB

## VRAM Requirements

- **Minimum**: ~4 GB
- **Recommended**: ~6 GB
- Uses subprocess isolation to prevent CUDA memory leaks

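The subprocess-isolation idea can be illustrated with a minimal stdlib sketch. The worker below is a stand-in for the real MuseTalk inference call (which would import torch and hold the CUDA context inside the child); `run_isolated` is a hypothetical name, not the extension's actual API.

```python
import json
import subprocess
import sys

def run_isolated(payload):
    """Run a placeholder inference step in a child Python process.

    Any CUDA context would belong to the child; when the child exits,
    the driver frees all of its GPU memory, so leaks cannot persist
    across invocations.
    """
    # Stand-in worker: a real version would load MuseTalk and run inference.
    worker = (
        "import json, sys; "
        "data = json.load(sys.stdin); "
        "json.dump({'frames': len(data['frames'])}, sys.stdout)"
    )
    proc = subprocess.run(
        [sys.executable, "-c", worker],
        input=json.dumps(payload),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

Because every call gets a fresh process, any memory leaked during one inference run is reclaimed by the OS before the next one starts.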
## License

- **MuseTalk code**: [MIT License](https://github.com/TMElyralab/MuseTalk/blob/main/LICENSE)
- **SD-VAE**: [CreativeML Open RAIL-M](https://huggingface.co/stabilityai/sd-vae-ft-mse/blob/main/LICENSE)
- **Whisper**: [MIT License](https://github.com/openai/whisper/blob/main/LICENSE)

## Citation

```bibtex
@article{zhang2024musetalk,
  title={MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting},
  author={Zhang, Yue and Liu, Minhao and Chen, Zhaokang and Wu, Bin and others},
  journal={arXiv preprint arXiv:2410.10122},
  year={2024}
}
```