---
license: mit
pipeline_tag: image-to-video
tags:
- lip-sync
- talking-head
- face-animation
- musetalk
- safetensors
---

# MuseTalk V15 UNet — AEmotionStudio Mirror

**Mirror of the MuseTalk V15 UNet weights** for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

## About

[MuseTalk](https://github.com/TMElyralab/MuseTalk) is a real-time, high-quality lip sync model that synchronizes lip movements in video to match provided audio. It supports:

- **Video + Audio lip sync** — make a person in a video speak new dialogue
- **Image + Audio talking head** — animate a portrait photo with speech audio
- **Multi-face support** — sync multiple faces in a single video
- **Batch inference** — process multiple frames simultaneously for speed

## Files

| File | Precision | Size | Description |
|------|-----------|------|-------------|
| `musetalkV15/unet_fp16.safetensors` | fp16 | ~1.6 GB | **Recommended** — half-precision UNet weights |
| `musetalkV15/unet.safetensors` | fp32 | ~3.2 GB | Full-precision UNet weights (fallback) |
| `musetalkV15/musetalk.json` | — | < 1 KB | Model configuration |

## Usage with ComfyUI-FFMPEGA

This model is **auto-downloaded** when you use the `lip_sync` skill in [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

### Example Prompts

```
Lip sync this video to the provided audio
```
```
Make the person's lips match the speech
```
```
Dub this video with the new voiceover
```

When `use_float16` is enabled (the default), the fp16 variant is loaded; the loader falls back to the fp32 weights if the fp16 file is unavailable.
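
That selection logic can be sketched roughly as follows (a hypothetical helper, not the extension's actual code; the filenames come from the Files table above):

```python
# Hypothetical sketch of the fp16/fp32 weight selection; not the
# extension's actual implementation.
from pathlib import Path


def pick_unet(model_dir: Path, use_float16: bool = True) -> Path:
    """Prefer the fp16 UNet weights when enabled; otherwise fall back to fp32."""
    fp16 = model_dir / "unet_fp16.safetensors"
    fp32 = model_dir / "unet.safetensors"
    if use_float16 and fp16.is_file():
        return fp16
    if fp32.is_file():
        return fp32
    raise FileNotFoundError(f"no UNet weights found in {model_dir}")
```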

### Manual Download

If auto-download is disabled, download the files and place them in:
```
ComfyUI/models/musetalk/musetalkV15/
```
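
After a manual download, a quick layout check like the one below can list anything still missing (a hypothetical helper, not part of the extension; the file list is taken from the Files table above, though only the precision variant you actually use is strictly required):

```python
# Layout check for a manual download (hypothetical helper; file list from
# the Files table).
from pathlib import Path

BASE = Path("ComfyUI/models/musetalk")
EXPECTED = [
    "musetalkV15/unet_fp16.safetensors",  # recommended fp16 weights
    "musetalkV15/unet.safetensors",       # fp32 fallback
    "musetalkV15/musetalk.json",          # model configuration
]


def missing_files(base: Path = BASE) -> list[str]:
    """Return the expected files that are not present under `base`."""
    return [name for name in EXPECTED if not (base / name).is_file()]


if __name__ == "__main__":
    gaps = missing_files()
    print("all files in place" if not gaps else f"missing: {gaps}")
```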

### Additional Dependencies

MuseTalk also requires these models (auto-downloaded from HuggingFace on first use):
- **SD-VAE** (`stabilityai/sd-vae-ft-mse`) — ~335 MB
- **Whisper-tiny** (`openai/whisper-tiny`) — ~75 MB

## VRAM Requirements

- **Minimum**: ~4 GB
- **Recommended**: ~6 GB
- Uses subprocess isolation to prevent CUDA memory leaks
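
The subprocess isolation mentioned above can be sketched as follows: the heavy GPU work runs in a separate Python process, so its CUDA context (and any leaked memory) is torn down when that process exits. This is a minimal illustration with a hypothetical `run_isolated` helper, not the extension's actual implementation.

```python
# Minimal sketch of subprocess isolation (hypothetical helper): running GPU
# work in a child process means every CUDA allocation is released when the
# process exits.
import subprocess
import sys


def run_isolated(code: str) -> str:
    """Execute `code` in a fresh Python interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# In practice the child would load MuseTalk, run inference, and write results
# to disk; here it just demonstrates the round trip.
print(run_isolated("print('inference done')"))
```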

## License

- **MuseTalk code**: [MIT License](https://github.com/TMElyralab/MuseTalk/blob/main/LICENSE)
- **SD-VAE**: [CreativeML Open RAIL-M](https://huggingface.co/stabilityai/sd-vae-ft-mse/blob/main/LICENSE)
- **Whisper**: [MIT License](https://github.com/openai/whisper/blob/main/LICENSE)

## Citation

```bibtex
@article{zhang2024musetalk,
  title={MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting},
  author={Zhang, Yue and Liu, Minhao and Chen, Zhaokang and Wu, Bin and others},
  journal={arXiv preprint arXiv:2410.10122},
  year={2024}
}
```