# MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
Paper: arXiv:2410.10122
Mirror of the MuseTalk V15 UNet weights for use with ComfyUI-FFMPEGA.
MuseTalk is a real-time, high-quality lip-sync model that synchronizes the lip movements in a video with a provided audio track.
| File | Precision | Size | Description |
|---|---|---|---|
| `musetalkV15/unet_fp16.safetensors` | fp16 | ~1.6 GB | Half-precision UNet weights (recommended) |
| `musetalkV15/unet.safetensors` | fp32 | ~3.2 GB | Full-precision UNet weights (fallback) |
| `musetalkV15/musetalk.json` | n/a | < 1 KB | Model configuration |
This model is auto-downloaded when you use the `lip_sync` skill in ComfyUI-FFMPEGA, for example with prompts such as:

- "Lip sync this video to the provided audio"
- "Make the person's lips match the speech"
- "Dub this video with the new voiceover"
When `use_float16` is enabled (the default), the fp16 weights are preferred; if the fp16 file is unavailable, the fp32 weights are used instead.
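The weight-selection rule above can be sketched as follows. This is a minimal illustration, not ComfyUI-FFMPEGA's actual loader: the function name `select_unet_weights` is hypothetical, and the behavior when `use_float16` is disabled (use the fp32 file only) is an assumption.

```python
from pathlib import Path

# File names taken from the table of files in this repository.
FP16_NAME = "unet_fp16.safetensors"
FP32_NAME = "unet.safetensors"


def select_unet_weights(model_dir: Path, use_float16: bool = True) -> Path:
    """Return the UNet weight file to load from model_dir.

    Hypothetical helper: prefers the fp16 file when use_float16 is True,
    otherwise (or when the fp16 file is missing) falls back to fp32.
    """
    fp16 = model_dir / FP16_NAME
    fp32 = model_dir / FP32_NAME
    if use_float16 and fp16.is_file():
        return fp16
    # Assumption: with use_float16 disabled, only the fp32 file is acceptable.
    if fp32.is_file():
        return fp32
    raise FileNotFoundError(f"No MuseTalk UNet weights found in {model_dir}")
```

For example, `select_unet_weights(Path("ComfyUI/models/musetalk/musetalkV15"))` returns the fp16 path when that file is present and the fp32 path otherwise.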
If auto-download is disabled, download the files manually and place them in `ComfyUI/models/musetalk/musetalkV15/`.
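To sanity-check a manual install, a short script can look for the files listed in the table under the ComfyUI models directory. This is a sketch: `check_musetalk_install` is a hypothetical name, and treating `musetalk.json` plus at least one of the two UNet weight files as the minimum required set is an assumption based on the fp16/fp32 fallback described above.

```python
from pathlib import Path

CONFIG = "musetalk.json"
# Assumption: either weight file is enough, since fp16 falls back to fp32.
WEIGHTS = ("unet_fp16.safetensors", "unet.safetensors")


def check_musetalk_install(comfyui_root: Path) -> list[str]:
    """Return a list of problems; an empty list means the layout looks complete."""
    model_dir = comfyui_root / "models" / "musetalk" / "musetalkV15"
    if not model_dir.is_dir():
        return [f"missing directory: {model_dir}"]
    problems = []
    if not (model_dir / CONFIG).is_file():
        problems.append(f"missing config: {model_dir / CONFIG}")
    if not any((model_dir / name).is_file() for name in WEIGHTS):
        problems.append(f"missing UNet weights: expected one of {', '.join(WEIGHTS)}")
    return problems
```

Calling `check_musetalk_install(Path("ComfyUI"))` from the directory that contains your ComfyUI install returns an empty list when the expected files are in place.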
MuseTalk also requires the following models, which are auto-downloaded from Hugging Face on first use:

- VAE: `stabilityai/sd-vae-ft-mse` (~335 MB)
- Whisper audio encoder: `openai/whisper-tiny` (~75 MB)

Citation:

```bibtex
@article{zhang2024musetalk,
  title={MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting},
  author={Zhang, Yue and Liu, Minhao and Chen, Zhaokang and Wu, Bin and others},
  journal={arXiv preprint arXiv:2410.10122},
  year={2024}
}
```