---
license: mit
pipeline_tag: image-to-video
tags:
- lip-sync
- talking-head
- face-animation
- musetalk
- safetensors
---
# MuseTalk V15 UNet — AEmotionStudio Mirror
**Mirror of the MuseTalk V15 UNet weights** for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).
## About
[MuseTalk](https://github.com/TMElyralab/MuseTalk) is a real-time, high-quality lip sync model that synchronizes lip movements in video to match provided audio. It supports:
- **Video + Audio lip sync** — make a person in a video speak new dialogue
- **Image + Audio talking head** — animate a portrait photo with speech audio
- **Multi-face support** — sync multiple faces in a single video
- **Batch inference** — process multiple frames simultaneously for speed
## Files
| File | Precision | Size | Description |
|------|-----------|------|-------------|
| `musetalkV15/unet_fp16.safetensors` | fp16 | ~1.6 GB | **Recommended** — half-precision UNet weights |
| `musetalkV15/unet.safetensors` | fp32 | ~3.2 GB | Full-precision UNet weights (fallback) |
| `musetalkV15/musetalk.json` | — | < 1 KB | Model configuration |
## Usage with ComfyUI-FFMPEGA
This model is **auto-downloaded** when you use the `lip_sync` skill in [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).
### Example Prompts
```
Lip sync this video to the provided audio
```
```
Make the person's lips match the speech
```
```
Dub this video with the new voiceover
```
By default (`use_float16` enabled), the fp16 variant is used; the loader falls back to the fp32 weights if the fp16 file is unavailable.
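The fallback behavior can be sketched as a small helper. This is illustrative only — `select_unet_weights` is a hypothetical function, not FFMPEGA's actual API; the filenames come from the table above:

```python
from pathlib import Path

def select_unet_weights(model_dir: str, use_float16: bool = True) -> Path:
    """Pick the UNet weights file, preferring fp16 when enabled and present."""
    base = Path(model_dir)
    fp16 = base / "unet_fp16.safetensors"
    fp32 = base / "unet.safetensors"
    if use_float16 and fp16.exists():
        return fp16
    if fp32.exists():
        return fp32
    raise FileNotFoundError(f"No UNet weights found in {base}")
```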
### Manual Download
If auto-download is disabled, download the files and place them in:
```
ComfyUI/models/musetalk/musetalkV15/
```
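If you prefer scripting the download, here is a sketch using `huggingface_hub`. The repo id `AEmotionStudio/musetalk-models` is an assumption based on this mirror's name — adjust it to the actual repository before use:

```python
from pathlib import Path

REPO_ID = "AEmotionStudio/musetalk-models"  # assumption: this mirror's repo id
FILES = [
    "musetalkV15/unet_fp16.safetensors",
    "musetalkV15/unet.safetensors",
    "musetalkV15/musetalk.json",
]
DEST = Path("ComfyUI/models/musetalk")  # files keep their musetalkV15/ subpath

def download_all(dest: Path = DEST) -> list[Path]:
    """Fetch each file into the ComfyUI models directory."""
    # Imported lazily so the path constants above work without the package.
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=f, local_dir=dest))
        for f in FILES
    ]

if __name__ == "__main__":
    for path in download_all():
        print(path)
```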
### Additional Dependencies
MuseTalk also requires the following models, which are auto-downloaded from Hugging Face on first use:
- **SD-VAE** (`stabilityai/sd-vae-ft-mse`) — ~335 MB
- **Whisper-tiny** (`openai/whisper-tiny`) — ~75 MB
## VRAM Requirements
- **Minimum**: ~4 GB
- **Recommended**: ~6 GB
- Uses subprocess isolation to prevent CUDA memory leaks
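Subprocess isolation here means running inference in a short-lived child interpreter, so all GPU memory is returned to the OS when the child exits. A minimal sketch of the pattern — the worker below is a stand-in, not FFMPEGA's actual inference code:

```python
import json
import subprocess
import sys

# Illustrative worker: in practice this would load MuseTalk, run inference,
# and write output paths; here it just reports a result over stdout.
WORKER = """
import json
result = {"status": "ok"}  # stand-in for the lip-sync output
print(json.dumps(result))
"""

def run_isolated() -> dict:
    """Run the worker in a fresh interpreter. Any CUDA context it creates
    is destroyed when the process exits, so no memory leaks into the host."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```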
## License
- **MuseTalk code**: [MIT License](https://github.com/TMElyralab/MuseTalk/blob/main/LICENSE)
- **SD-VAE**: [CreativeML Open RAIL-M](https://huggingface.co/stabilityai/sd-vae-ft-mse/blob/main/LICENSE)
- **Whisper**: [MIT License](https://github.com/openai/whisper/blob/main/LICENSE)
## Citation
```bibtex
@article{zhang2024musetalk,
  title={MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting},
  author={Zhang, Yue and Liu, Minhao and Chen, Zhaokang and Wu, Bin and others},
  journal={arXiv preprint arXiv:2410.10122},
  year={2024}
}
```