---
title: MuseTalk
emoji: 💻
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: MuseTalk - Real-time Audio-Driven Lip Sync
---

# MuseTalk: Real-Time High-Quality Lip Synchronization

This Hugging Face Space lets you run MuseTalk for audio-driven lip synchronization experiments.

## About MuseTalk

MuseTalk is a real-time, high-quality audio-driven lip synchronization model that generates realistic lip movements from audio input. Applied to a video, it produces lip-synced content.

## Features

- **Real-time Processing**: Generates lip-synced videos efficiently
- **High Quality**: Produces natural, realistic lip movements
- **Easy to Use**: Simple Gradio interface for quick experimentation
- **Customizable**: Adjust the face bounding box position for better results

## How to Use

1. **Upload Video**: Provide an input video file (preferably with a clear face)
2. **Upload Audio**: Provide an audio file with the target speech
3. **Adjust Parameters**: (Optional) Fine-tune the `bbox_shift` parameter
4. **Generate**: Click the "Generate" button to create your lip-synced video

If you prefer to drive the Space from a script instead of the UI, see the `gradio_client` sketch at the end of this page.

## Model Information

- **Model Weights**: [TMElyralab/MuseTalk](https://huggingface.co/TMElyralab/MuseTalk)
- **GitHub Repository**: [TMElyralab/MuseTalk](https://github.com/TMElyralab/MuseTalk)

## Requirements

The Space automatically installs all necessary dependencies, including:

- PyTorch and Torchvision
- Gradio for the UI
- OpenCV for video processing
- Various ML libraries (transformers, diffusers, etc.)

## Setup Instructions

This Space is configured to:

1. Clone the MuseTalk repository on first run
2. Install all required dependencies from requirements.txt
3. Download the necessary model weights automatically (a sketch of pre-fetching them by hand appears at the end of this page)
4. Launch the Gradio interface

## Technical Details

**Required Model Components:**

- VAE: sd-vae-ft-mse from Stability AI
- Whisper: For audio feature extraction
- DWPose: For pose estimation
- Face Parsing: For face segmentation
- ResNet18: Backbone weights used by the face-parsing model

## Tips for Best Results

- Use videos with clear, well-lit faces
- Use good-quality audio for better lip sync
- Adjust the `bbox_shift` parameter if the detected face region is off-center
- Input videos should ideally be in MP4 format

## Citation

If you use MuseTalk in your research or projects, please cite the original repository:

```
@misc{musetalk2024,
  title={MuseTalk: Real-Time High-Quality Lip Synchronization},
  author={TMElyralab},
  year={2024},
  url={https://github.com/TMElyralab/MuseTalk}
}
```

## Related Projects

- [MuseV](https://github.com/TMElyralab/MuseV) - For text-to-video generation

## License

Please refer to the [original repository](https://github.com/TMElyralab/MuseTalk) for licensing information.

---

**Note**: First-time setup may take several minutes as model weights (~2GB) are downloaded automatically.
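
## Appendix: Script Sketches

The "How to Use" steps above can also be driven programmatically with the `gradio_client` package. The sketch below is illustrative only: the Space ID, argument order, and `api_name` are assumptions, not this Space's actual endpoint signature; call `client.view_api()` against the running Space to see the real one.

```python
# Hypothetical sketch of calling this Space from a script.
# The Space ID and endpoint arguments below are ASSUMPTIONS --
# inspect client.view_api() for the actual signature.
from gradio_client import Client, handle_file

client = Client("your-username/MuseTalk")  # replace with the real Space ID
client.view_api()  # prints the available endpoints and their parameters

# Assumed endpoint: video input, audio input, optional bbox_shift.
result = client.predict(
    handle_file("input_video.mp4"),  # video with a clear, well-lit face
    handle_file("speech.wav"),       # audio with the target speech
    0,                               # bbox_shift (assumed default)
    api_name="/predict",             # confirm via view_api()
)
print(result)  # path to the generated lip-synced video
```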
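
Step 3 of the Setup Instructions downloads model weights automatically. If you want to pre-fetch the Hugging Face-hosted pieces yourself (for example, to warm a cache), `huggingface_hub` can do it. The local directory layout below is an assumption; the MuseTalk repository's own download script defines the paths the app actually expects, and it fetches the Whisper, DWPose, and face-parsing weights from other sources.

```python
# Hedged sketch: pre-fetching the HF-hosted weights named in this README.
# The local_dir layout is an ASSUMPTION; check the MuseTalk repo's
# download script for the paths the app actually expects.
from huggingface_hub import snapshot_download

# Main MuseTalk weights.
snapshot_download(repo_id="TMElyralab/MuseTalk", local_dir="models")

# VAE: sd-vae-ft-mse from Stability AI.
snapshot_download(
    repo_id="stabilityai/sd-vae-ft-mse",
    local_dir="models/sd-vae-ft-mse",
)
```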
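
Of the components listed under Technical Details, the VAE is a standard `diffusers` checkpoint, so it can be loaded directly as a sanity check. This is not MuseTalk's own loading code (see the GitHub repository for that); it only verifies that the downloaded checkpoint is usable.

```python
# Illustrative sanity check: load the sd-vae-ft-mse VAE with diffusers.
# This is NOT MuseTalk's own loading path -- see the GitHub repo for that.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae = vae.to("cuda" if torch.cuda.is_available() else "cpu").eval()
print(vae.config.sample_size)  # confirm the model loaded
```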