---
license: mit
pipeline_tag: image-to-video
tags:
- lip-sync
- talking-head
- face-animation
- musetalk
- safetensors
---

# MuseTalk V15 UNet — AEmotionStudio Mirror

**Mirror of the MuseTalk V15 UNet weights** for use with [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

## About

[MuseTalk](https://github.com/TMElyralab/MuseTalk) is a real-time, high-quality lip-sync model that synchronizes lip movements in a video to match provided audio. It supports:

- **Video + Audio lip sync** — make a person in a video speak new dialogue
- **Image + Audio talking head** — animate a portrait photo with speech audio
- **Multi-face support** — sync multiple faces in a single video
- **Batch inference** — process multiple frames simultaneously for speed

## Files

| File | Precision | Size | Description |
|------|-----------|------|-------------|
| `musetalkV15/unet_fp16.safetensors` | fp16 | ~1.6 GB | **Recommended** — half-precision UNet weights |
| `musetalkV15/unet.safetensors` | fp32 | ~3.2 GB | Full-precision UNet weights (fallback) |
| `musetalkV15/musetalk.json` | — | < 1 KB | Model configuration |

## Usage with ComfyUI-FFMPEGA

This model is **auto-downloaded** when you use the `lip_sync` skill in [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

### Example Prompts

```
Lip sync this video to the provided audio
```
```
Make the person's lips match the speech
```
```
Dub this video with the new voiceover
```

The fp16 variant is preferred when `use_float16` is enabled (the default); loading falls back to fp32 if the fp16 file is unavailable.
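
The fallback behavior described above can be sketched as follows. This is an illustrative stdlib-only sketch: `pick_unet_weights` and its `model_dir` argument are hypothetical names, not part of the actual ComfyUI-FFMPEGA API; only the filenames come from the Files table above.

```python
from pathlib import Path

def pick_unet_weights(model_dir, use_float16=True):
    """Prefer the fp16 UNet weights, falling back to fp32 if absent (sketch)."""
    names = ["unet_fp16.safetensors"] if use_float16 else []
    names.append("unet.safetensors")  # fp32 fallback
    for name in names:
        path = Path(model_dir) / name
        if path.is_file():
            return path
    raise FileNotFoundError(f"no UNet weights found in {model_dir}")
```

With `use_float16=False` the fp16 file is never considered, matching the setting's intent of forcing full precision.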

### Manual Download

If auto-download is disabled, download the files and place them in:
```
ComfyUI/models/musetalk/musetalkV15/
```

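After placing the files, a quick check can confirm the layout. `check_install` and the required-file list below are a hypothetical helper derived from the Files table, not part of ComfyUI-FFMPEGA; the fp32 UNet is treated as optional since it is only a fallback.

```python
from pathlib import Path

# Required files per the Files table; the fp32 UNet is an optional fallback.
REQUIRED = ["unet_fp16.safetensors", "musetalk.json"]

def check_install(comfy_root="ComfyUI"):
    """Return the names of required MuseTalk files missing from the ComfyUI tree."""
    model_dir = Path(comfy_root) / "models" / "musetalk" / "musetalkV15"
    return [name for name in REQUIRED if not (model_dir / name).is_file()]
```

An empty result means the expected layout is in place.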
### Additional Dependencies

MuseTalk also requires these models (auto-downloaded from HuggingFace on first use):
- **SD-VAE** (`stabilityai/sd-vae-ft-mse`) — ~335 MB
- **Whisper-tiny** (`openai/whisper-tiny`) — ~75 MB

## VRAM Requirements

- **Minimum**: ~4 GB
- **Recommended**: ~6 GB
- Uses subprocess isolation to prevent CUDA memory leaks

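The subprocess-isolation idea can be illustrated with a minimal stdlib sketch. The worker below is a stand-in for the real MuseTalk inference call (which would import torch and hold the CUDA context inside the child); `run_isolated` is a hypothetical name, not the extension's actual API.

```python
import json
import subprocess
import sys

def run_isolated(payload):
    """Run a placeholder inference step in a child Python process.

    Any CUDA context would belong to the child; when the child exits,
    the driver frees all of its GPU memory, so leaks cannot persist
    across invocations.
    """
    # Stand-in worker: a real version would load MuseTalk and run inference.
    worker = (
        "import json, sys; "
        "data = json.load(sys.stdin); "
        "json.dump({'frames': len(data['frames'])}, sys.stdout)"
    )
    proc = subprocess.run(
        [sys.executable, "-c", worker],
        input=json.dumps(payload),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

Because every call gets a fresh process, any memory leaked during one inference run is reclaimed by the OS before the next one starts.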
## License

- **MuseTalk code**: [MIT License](https://github.com/TMElyralab/MuseTalk/blob/main/LICENSE)
- **SD-VAE**: [CreativeML Open RAIL-M](https://huggingface.co/stabilityai/sd-vae-ft-mse/blob/main/LICENSE)
- **Whisper**: [MIT License](https://github.com/openai/whisper/blob/main/LICENSE)

## Citation

```bibtex
@article{zhang2024musetalk,
  title={MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting},
  author={Zhang, Yue and Liu, Minhao and Chen, Zhaokang and Wu, Bin and others},
  journal={arXiv preprint arXiv:2410.10122},
  year={2024}
}
```