---
license: mit
pipeline_tag: image-to-video
tags:
- lip-sync
- talking-head
- face-animation
- musetalk
- safetensors
---
# MuseTalk V15 UNet — AEmotionStudio Mirror

Mirror of the MuseTalk V15 UNet weights for use with ComfyUI-FFMPEGA.

## About
MuseTalk is a real-time, high-quality lip sync model that synchronizes lip movements in video to match provided audio. It supports:
- **Video + Audio lip sync** — make a person in a video speak new dialogue
- **Image + Audio talking head** — animate a portrait photo with speech audio
- **Multi-face support** — sync multiple faces in a single video
- **Batch inference** — process multiple frames simultaneously for speed
## Files

| File | Precision | Size | Description |
|---|---|---|---|
| `musetalkV15/unet_fp16.safetensors` | fp16 | ~1.6 GB | **Recommended** — half-precision UNet weights |
| `musetalkV15/unet.safetensors` | fp32 | ~3.2 GB | Full-precision UNet weights (fallback) |
| `musetalkV15/musetalk.json` | — | < 1 KB | Model configuration |
## Usage with ComfyUI-FFMPEGA

This model is auto-downloaded when you use the `lip_sync` skill in ComfyUI-FFMPEGA.
### Example Prompts

- "Lip sync this video to the provided audio"
- "Make the person's lips match the speech"
- "Dub this video with the new voiceover"
The fp16 variant is used by default when `use_float16` is enabled (the default); the loader falls back to the fp32 weights if the fp16 file is unavailable.
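The fallback order can be sketched as follows. This is a minimal illustration of the documented behavior; `find_unet_weights` is a hypothetical helper, not the actual ComfyUI-FFMPEGA API.

```python
from pathlib import Path

def find_unet_weights(model_dir: str, use_float16: bool = True) -> Path:
    """Pick the UNet weights file, preferring fp16 when enabled.

    Hypothetical helper illustrating the documented fallback order;
    the real loader in ComfyUI-FFMPEGA may differ.
    """
    base = Path(model_dir)
    candidates = []
    if use_float16:
        candidates.append(base / "unet_fp16.safetensors")  # preferred half precision
    candidates.append(base / "unet.safetensors")           # fp32 fallback
    for path in candidates:
        if path.is_file():
            return path
    raise FileNotFoundError(f"No UNet weights found under {base}")
```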
## Manual Download

If auto-download is disabled, download the files and place them in:

```
ComfyUI/models/musetalk/musetalkV15/
```
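For example, the expected layout can be prepared like this (run from the directory containing your ComfyUI install; the comment shows where each downloaded file goes):

```shell
# Create the expected model directory.
mkdir -p ComfyUI/models/musetalk/musetalkV15

# After copying the downloaded files, the layout should be:
#   ComfyUI/models/musetalk/musetalkV15/unet_fp16.safetensors
#   ComfyUI/models/musetalk/musetalkV15/unet.safetensors
#   ComfyUI/models/musetalk/musetalkV15/musetalk.json
ls ComfyUI/models/musetalk/musetalkV15
```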
## Additional Dependencies

MuseTalk also requires these models (auto-downloaded from HuggingFace on first use):

- SD-VAE (`stabilityai/sd-vae-ft-mse`) — ~335 MB
- Whisper-tiny (`openai/whisper-tiny`) — ~75 MB
## VRAM Requirements

- Minimum: ~4 GB
- Recommended: ~6 GB
- Inference runs in an isolated subprocess to prevent CUDA memory leaks
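The subprocess-isolation idea above can be sketched as follows. This is an illustrative pattern only, assuming a child Python process does the GPU work; it is not the actual ComfyUI-FFMPEGA implementation, and the `run_lip_sync_isolated` name is invented for this example.

```python
import subprocess
import sys

def run_lip_sync_isolated(script: str) -> str:
    """Run an inference step in a child Python process.

    When the child exits, the OS reclaims all of its memory,
    including any CUDA allocations, so leaks cannot accumulate
    across runs in the long-lived parent process.
    """
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# In practice the child script would load the model and run inference;
# here a trivial script stands in for the real workload.
output = run_lip_sync_isolated("print('inference done')")
```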
## License

- MuseTalk code: MIT License
- SD-VAE: CreativeML Open RAIL-M
- Whisper: MIT License
## Citation

```bibtex
@article{zhang2024musetalk,
  title={MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting},
  author={Zhang, Yue and Liu, Minhao and Chen, Zhaokang and Wu, Bin and others},
  journal={arXiv preprint arXiv:2410.10122},
  year={2024}
}
```