---
license: mit
pipeline_tag: image-to-video
tags:
- lip-sync
- talking-head
- face-animation
- musetalk
- safetensors
---
# MuseTalk V15 UNet — AEmotionStudio Mirror

Mirror of the MuseTalk V15 UNet weights for use with ComfyUI-FFMPEGA.

## About
MuseTalk is a real-time, high-quality lip sync model that synchronizes lip movements in video to match provided audio. It supports:
- **Video + Audio lip sync** — make a person in a video speak new dialogue
- **Image + Audio talking head** — animate a portrait photo with speech audio
- **Multi-face support** — sync multiple faces in a single video
- **Batch inference** — process multiple frames simultaneously for speed
## Files

| File | Precision | Size | Description |
|---|---|---|---|
| `musetalkV15/unet_fp16.safetensors` | fp16 | ~1.6 GB | **Recommended** — half-precision UNet weights |
| `musetalkV15/unet.safetensors` | fp32 | ~3.2 GB | Full-precision UNet weights (fallback) |
| `musetalkV15/musetalk.json` | — | < 1 KB | Model configuration |
## Usage with ComfyUI-FFMPEGA

This model is auto-downloaded when you use the `lip_sync` skill in ComfyUI-FFMPEGA.
### Example Prompts

- "Lip sync this video to the provided audio"
- "Make the person's lips match the speech"
- "Dub this video with the new voiceover"
The fp16 variant is used by default when `use_float16` is enabled (the default); the loader falls back to the fp32 weights if the fp16 file is unavailable.
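The fallback order can be sketched as follows. This is a minimal illustration of the documented behavior; `find_unet_weights` is a hypothetical helper, not the actual ComfyUI-FFMPEGA API.

```python
from pathlib import Path

def find_unet_weights(model_dir: str, use_float16: bool = True) -> Path:
    """Pick the UNet weights file, preferring fp16 when enabled.

    Hypothetical helper illustrating the documented fallback order;
    the real loader in ComfyUI-FFMPEGA may differ.
    """
    base = Path(model_dir)
    candidates = []
    if use_float16:
        candidates.append(base / "unet_fp16.safetensors")  # preferred half precision
    candidates.append(base / "unet.safetensors")           # fp32 fallback
    for path in candidates:
        if path.is_file():
            return path
    raise FileNotFoundError(f"No UNet weights found under {base}")
```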
## Manual Download

If auto-download is disabled, download the files and place them in:

```
ComfyUI/models/musetalk/musetalkV15/
```
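For example, the expected layout can be prepared like this (run from the directory containing your ComfyUI install; the comment shows where each downloaded file goes):

```shell
# Create the expected model directory.
mkdir -p ComfyUI/models/musetalk/musetalkV15

# After copying the downloaded files, the layout should be:
#   ComfyUI/models/musetalk/musetalkV15/unet_fp16.safetensors
#   ComfyUI/models/musetalk/musetalkV15/unet.safetensors
#   ComfyUI/models/musetalk/musetalkV15/musetalk.json
ls ComfyUI/models/musetalk/musetalkV15
```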
## Additional Dependencies

MuseTalk also requires these models (auto-downloaded from HuggingFace on first use):

- SD-VAE (`stabilityai/sd-vae-ft-mse`) — ~335 MB
- Whisper-tiny (`openai/whisper-tiny`) — ~75 MB
## VRAM Requirements

- Minimum: ~4 GB
- Recommended: ~6 GB
- Inference runs in an isolated subprocess to prevent CUDA memory leaks
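The subprocess-isolation idea above can be sketched as follows. This is an illustrative pattern only, assuming a child Python process does the GPU work; it is not the actual ComfyUI-FFMPEGA implementation, and the `run_lip_sync_isolated` name is invented for this example.

```python
import subprocess
import sys

def run_lip_sync_isolated(script: str) -> str:
    """Run an inference step in a child Python process.

    When the child exits, the OS reclaims all of its memory,
    including any CUDA allocations, so leaks cannot accumulate
    across runs in the long-lived parent process.
    """
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# In practice the child script would load the model and run inference;
# here a trivial script stands in for the real workload.
output = run_lip_sync_isolated("print('inference done')")
```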
## License

- MuseTalk code: MIT License
- SD-VAE: CreativeML Open RAIL-M
- Whisper: MIT License
## Citation

```bibtex
@article{zhang2024musetalk,
  title={MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting},
  author={Zhang, Yue and Liu, Minhao and Chen, Zhaokang and Wu, Bin and others},
  journal={arXiv preprint arXiv:2410.10122},
  year={2024}
}
```