---
title: MuseTalk
emoji: 💻
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: MuseTalk - Real-time Audio-Driven Lip Sync
---
# MuseTalk: Real-Time High-Quality Lip Synchronization

This Hugging Face Space allows you to run MuseTalk for audio-driven lip synchronization experiments.
## About MuseTalk
MuseTalk is a real-time, high-quality audio-driven lip synchronization model that generates realistic lip movements from audio input. It can be applied to videos to create lip-synced content.
## Features
- Real-time Processing: Generate lip-synced videos efficiently
- High Quality: Produces natural and realistic lip movements
- Easy to Use: Simple Gradio interface for quick experimentation
- Customizable: Adjust bounding box positions for better results
## How to Use
- Upload Video: Provide an input video file (preferably with a clear face)
- Upload Audio: Provide an audio file with the target speech
- Adjust Parameters: (Optional) Fine-tune the bbox_shift parameter
- Generate: Click the "Generate" button to create your lip-synced video
## Model Information

- Model Weights: [TMElyralab/MuseTalk](https://huggingface.co/TMElyralab/MuseTalk)
- GitHub Repository: [TMElyralab/MuseTalk](https://github.com/TMElyralab/MuseTalk)
## Requirements
The Space automatically installs all necessary dependencies including:
- PyTorch and Torchvision
- Gradio for the UI
- OpenCV for video processing
- Various ML libraries (transformers, diffusers, etc.)
## Setup Instructions
This Space is configured to:
- Clone the MuseTalk repository on first run
- Install all required dependencies from requirements.txt
- Download necessary model weights automatically
- Launch the Gradio interface
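The first-run steps above can be sketched in Python. This is an illustration, not the Space's actual startup code (which lives in `app.py`); the repository layout and the weights location on the Hub are assumptions based on the links in this README.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/TMElyralab/MuseTalk"
REPO_DIR = Path("MuseTalk")


def setup_space() -> None:
    """Clone the repo, install dependencies, and fetch model weights."""
    # 1. Clone the MuseTalk repository on first run only.
    if not REPO_DIR.exists():
        subprocess.run(["git", "clone", REPO_URL, str(REPO_DIR)], check=True)

    # 2. Install all required dependencies from requirements.txt.
    subprocess.run(
        ["pip", "install", "-r", str(REPO_DIR / "requirements.txt")], check=True
    )

    # 3. Download model weights (~2GB) from the Hugging Face Hub.
    #    The target directory is an assumption.
    subprocess.run(
        ["huggingface-cli", "download", "TMElyralab/MuseTalk",
         "--local-dir", str(REPO_DIR / "models")],
        check=True,
    )
```

After `setup_space()` completes, the Gradio interface is launched as usual.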
## Technical Details

**Required Model Components:**
- VAE: sd-vae-ft-mse from Stability AI
- Whisper: For audio processing
- DWPose: For pose estimation
- Face Parsing: For face segmentation
- ResNet18: For feature extraction
## Tips for Best Results
- Use videos with clear, well-lit faces
- Use clean, noise-free audio for more accurate lip sync
- Adjust the bbox_shift parameter if the face detection is off-center
- Input videos should ideally be in MP4 format
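To build intuition for `bbox_shift`: the idea is to nudge the detected face bounding box vertically so the mouth region is framed correctly. The function below is a toy illustration of that idea, not MuseTalk's actual implementation (which adjusts the landmark reference used for the crop).

```python
def apply_bbox_shift(bbox, bbox_shift):
    """Shift a face bounding box vertically by bbox_shift pixels.

    bbox is (x1, y1, x2, y2). A positive shift moves the box down,
    typically capturing more of the chin/mouth region; a negative
    shift moves it up. Hypothetical helper for illustration only.
    """
    x1, y1, x2, y2 = bbox
    return (x1, y1 + bbox_shift, x2, y2 + bbox_shift)


print(apply_bbox_shift((100, 50, 200, 180), 10))   # → (100, 60, 200, 190)
print(apply_bbox_shift((100, 50, 200, 180), -10))  # → (100, 40, 200, 170)
```

If generated mouths look cropped or misplaced, trying small positive or negative values of `bbox_shift` and comparing outputs is usually the quickest fix.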
## Citation
If you use MuseTalk in your research or projects, please cite the original repository:
```bibtex
@misc{musetalk2024,
  title={MuseTalk: Real-Time High-Quality Lip Synchronization},
  author={TMElyralab},
  year={2024},
  url={https://github.com/TMElyralab/MuseTalk}
}
```
## Related Projects

- [MuseV](https://github.com/TMElyralab/MuseV) - For text-to-video generation
## License
Please refer to the original repository for licensing information.
**Note**: First-time setup may take several minutes as model weights (~2GB) are downloaded automatically.