| --- |
| license: mit |
| tags: |
| - lip-sync |
| - musetalk |
| - talking-head |
| - mirror |
| library_name: pytorch |
| --- |
| |
| # MuseTalk Mirror (A.I.M.I) |
|
|
| Mirror of [TMElyralab/MuseTalk](https://huggingface.co/TMElyralab/MuseTalk) V1.5 plus its inference-time dependencies, re-hosted for stable URLs inside the [A.I.M.I](https://aimi.app) desktop product. Contents are unmodified. |
|
|
| MuseTalk re-syncs the lips in an existing video to a new audio track: it edits only the mouth region and passes the rest of each frame through unchanged. It pairs with our TTS and voice-cloning stack for full "text → lip-synced video" workflows. |
|
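The "mouth region is edited, everything else passes through" behaviour can be pictured with a toy example. This is only an illustrative sketch, not MuseTalk's code: `paste_region` is a hypothetical helper, the frame is a small list of integers rather than an image, and a hard rectangular paste stands in for whatever blending the real pipeline performs.

```python
def paste_region(frame, patch, top, left):
    """Return a copy of `frame` with `patch` written at (top, left).

    `frame` and `patch` are row-major lists of rows of pixel values.
    Pixels outside the patch rectangle are copied through unchanged.
    """
    out = [list(row) for row in frame]  # copy so the input frame is not mutated
    for dy, patch_row in enumerate(patch):
        for dx, value in enumerate(patch_row):
            out[top + dy][left + dx] = value
    return out


frame = [[0] * 4 for _ in range(4)]  # 4x4 dummy "frame"
mouth = [[9, 9], [9, 9]]             # 2x2 dummy "generated mouth region"
result = paste_region(frame, mouth, top=2, left=1)
```

Only the four pixels covered by `mouth` differ between `frame` and `result`; the original `frame` is left untouched.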
|
| ## Files |
|
|
| | Folder / File | Upstream | Size | Purpose | |
| |---|---|---|---| |
| | `musetalkV15/unet.pth` | TMElyralab/MuseTalk | 3.24 GB | MuseTalk V1.5 UNet weights | |
| | `musetalkV15/musetalk.json` | TMElyralab/MuseTalk | 748 B | UNet config | |
| | `sd-vae-ft-mse/diffusion_pytorch_model.bin` | stabilityai/sd-vae-ft-mse | 319 MB | VAE for face latents | |
| | `sd-vae-ft-mse/config.json` | stabilityai/sd-vae-ft-mse | 547 B | VAE config | |
| | `whisper/pytorch_model.bin` | openai/whisper-tiny | 144 MB | Audio feature extraction (tiny) | |
| | `dwpose/dw-ll_ucoco_384.pth` | yzd-v/DWPose | 388 MB | Face bbox + pose detection | |
| | `face-parse-bisent/79999_iter.pth` | ManyOtherFunctions/face-parse-bisent | 51 MB | BiSeNet face-region parser | |
| | `face-parse-bisent/resnet18-5c106cde.pth` | pytorch.org/models | 45 MB | ResNet18 backbone for face-parser | |
|
|
| Total: ~4.2 GB (sum of the sizes listed above). |
|
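After downloading, it can be useful to confirm that the local copy has the layout shown in the Files table. The sketch below assumes the files sit under a single root directory with the same relative paths as the table. `missing_files` and `fetch_mirror` are hypothetical helper names, and the Hub ID of this mirror is left as a parameter because it is not stated here. `fetch_mirror` relies on `huggingface_hub.snapshot_download` and needs that package installed.

```python
from pathlib import Path

# Relative paths copied from the Files table.
EXPECTED_FILES = [
    "musetalkV15/unet.pth",
    "musetalkV15/musetalk.json",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "sd-vae-ft-mse/config.json",
    "whisper/pytorch_model.bin",
    "dwpose/dw-ll_ucoco_384.pth",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
]


def missing_files(root):
    """Return the expected relative paths that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


def fetch_mirror(repo_id, local_dir):
    """Download the whole repo into `local_dir` and return the local path.

    `repo_id` is the Hub ID this mirror is published under (an assumption:
    supply it yourself, it is deliberately not hard-coded).
    """
    # Imported lazily so the layout check works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

A typical flow would be to call `fetch_mirror(...)` once and then check that `missing_files(local_dir)` returns an empty list before starting inference.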
|
| ## Licenses |
|
|
| | Component | License | |
| |---|---| |
| | MuseTalk | MIT (Tencent Music Entertainment Lyra Lab) | |
| | SD-VAE-ft-MSE | CreativeML Open RAIL-M (Stability AI) | |
| | Whisper | MIT (OpenAI) | |
| | DWPose | Apache 2.0 | |
| | face-parse-bisent | MIT | |
| | ResNet18 (pretrained) | BSD-3-Clause (PyTorch / Facebook) | |
|
|
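| All listed licenses permit commercial use; note that CreativeML Open RAIL-M (SD-VAE-ft-MSE) also carries use-based restrictions. All files are redistributed unchanged. See the upstream repos for the full license texts. |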
|
|
| ## Attribution |
|
|
| - **MuseTalk**: Yue Zhang, Minhao Liu, Zhaokang Chen, Bin Wu, Yubin Zeng, Chao Zhan, Yingjie He, Junxin Huang, Wenjiang Zhou. *MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting* (2024). |
| - **Whisper**: Alec Radford et al. *Robust Speech Recognition via Large-Scale Weak Supervision* (OpenAI, 2022). |
| - **DWPose**: Zhendong Yang, Ailing Zeng, Chun Yuan, Yu Li. *Effective Whole-body Pose Estimation with Two-stages Distillation* (ICCV 2023). |
| - **BiSeNet**: Changqian Yu et al. *BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation* (ECCV 2018). |
|
|