---
title: MuseTalk
emoji: 💻
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: MuseTalk - Real-time Audio-Driven Lip Sync
---

# MuseTalk: Real-Time High-Quality Lip Synchronization

This Hugging Face Space allows you to run MuseTalk for audio-driven lip synchronization experiments.

## About MuseTalk

MuseTalk is a real-time, high-quality audio-driven lip-synchronization model that generates realistic lip movements from audio input and applies them to an input video to create lip-synced content.

## Features

- **Real-time Processing**: Generate lip-synced videos efficiently
- **High Quality**: Produces natural and realistic lip movements
- **Easy to Use**: Simple Gradio interface for quick experimentation
- **Customizable**: Adjust bounding box positions for better results

## How to Use

1. **Upload Video**: Provide an input video file (preferably with a clear face)
2. **Upload Audio**: Provide an audio file with the target speech
3. **Adjust Parameters**: (Optional) Fine-tune the `bbox_shift` parameter
4. **Generate**: Click the "Generate" button to create your lip-synced video
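
Besides the web UI, the steps above can also be driven programmatically. The sketch below uses the `gradio_client` library; the Space id, endpoint name, and parameter order are assumptions (check the "Use via API" link on the Space page for the actual signature):

```python
# Hypothetical sketch of calling the Space via gradio_client.
# Space id and api_name below are assumptions, not confirmed by this README.

def generate_lipsync(video_path, audio_path, bbox_shift=0):
    """Submit a video/audio pair to the Space and return the result path."""
    from gradio_client import Client, handle_file  # pip install gradio_client

    client = Client("TMElyralab/MuseTalk")  # assumed Space id
    result = client.predict(
        handle_file(video_path),   # input video with a clear face
        handle_file(audio_path),   # target speech audio
        bbox_shift,                # optional bounding-box adjustment
        api_name="/predict",       # assumed endpoint name
    )
    return result

if __name__ == "__main__":
    print(generate_lipsync("input.mp4", "speech.wav"))
```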

## Model Information

- **Model Weights**: [TMElyralab/MuseTalk](https://huggingface.co/TMElyralab/MuseTalk)
- **GitHub Repository**: [TMElyralab/MuseTalk](https://github.com/TMElyralab/MuseTalk)

## Requirements

The Space automatically installs all necessary dependencies including:
- PyTorch and Torchvision
- Gradio for the UI
- OpenCV for video processing
- Various ML libraries (transformers, diffusers, etc.)

## Setup Instructions

This Space is configured to:
1. Clone the MuseTalk repository on first run
2. Install all required dependencies from requirements.txt
3. Download necessary model weights automatically
4. Launch the Gradio interface
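
The first-run sequence above can be sketched as a small bootstrap script. The repository URL comes from this README; everything else (step order, use of `requirements.txt` at the repo root) is an assumption about how the Space is wired:

```python
# Minimal sketch of the first-run setup sequence, assuming the MuseTalk
# repo keeps its requirements.txt at the repository root.
import subprocess

SETUP_STEPS = [
    ["git", "clone", "https://github.com/TMElyralab/MuseTalk.git"],
    ["pip", "install", "-r", "MuseTalk/requirements.txt"],
]

def run_setup(steps=SETUP_STEPS):
    """Run each setup command, stopping on the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_setup()
```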

## Technical Details

**Required Model Components:**
- VAE: sd-vae-ft-mse from Stability AI
- Whisper: For audio processing
- DWPose: For pose estimation
- Face Parsing: For face segmentation
- ResNet18: For feature extraction

## Tips for Best Results

- Use videos with clear, well-lit faces
- Use clean, high-quality audio for more accurate lip sync
- Adjust the `bbox_shift` parameter if the detected face region is off-center
- Provide input videos in MP4 format where possible
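
To illustrate the bbox_shift tip: a plausible reading is that the parameter is a vertical pixel offset applied to the detected face box. The sketch below shows that interpretation; the actual MuseTalk implementation may differ:

```python
# Illustrative sketch only: treats bbox_shift as a vertical pixel offset
# applied to a face box (x1, y1, x2, y2), clamped to the frame height.
# This is an assumption about the parameter's meaning, not MuseTalk's code.

def shift_bbox(bbox, bbox_shift, frame_height):
    """Shift a face bounding box vertically, clamped to [0, frame_height]."""
    x1, y1, x2, y2 = bbox
    y1 = min(max(y1 + bbox_shift, 0), frame_height)
    y2 = min(max(y2 + bbox_shift, 0), frame_height)
    return (x1, y1, x2, y2)

# Example: move a box 10 px down within a 720 px-tall frame
print(shift_bbox((50, 200, 250, 300), 10, 720))  # → (50, 210, 250, 310)
```

Positive values move the region one way and negative values the other, so a few small adjustments in each direction are usually enough to find a value where the mouth sits cleanly inside the box.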

## Citation

If you use MuseTalk in your research or projects, please cite the original repository:

```
@misc{musetalk2024,
  title={MuseTalk: Real-Time High-Quality Lip Synchronization},
  author={TMElyralab},
  year={2024},
  url={https://github.com/TMElyralab/MuseTalk}
}
```

## Related Projects

- [MuseV](https://github.com/TMElyralab/MuseV) - For text-to-video generation

## License

Please refer to the [original repository](https://github.com/TMElyralab/MuseTalk) for licensing information.

---

**Note**: First-time setup may take several minutes as model weights (~2GB) are downloaded automatically.