---
title: MuseTalk
emoji: 💻
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: MuseTalk - Real-time Audio-Driven Lip Sync
---

# MuseTalk: Real-Time High-Quality Lip Synchronization

This Hugging Face Space allows you to run MuseTalk for audio-driven lip synchronization experiments.

## About MuseTalk

MuseTalk is a real-time, high-quality audio-driven lip synchronization model that generates realistic lip movements from audio input. It can be applied to videos to create lip-synced content.

## Features

- **Real-time Processing**: Generate lip-synced videos efficiently
- **High Quality**: Produces natural and realistic lip movements
- **Easy to Use**: Simple Gradio interface for quick experimentation
- **Customizable**: Adjust the face bounding box (`bbox_shift`) for better results

## How to Use

1. **Upload Video**: Provide an input video file (preferably with a clear face)
2. **Upload Audio**: Provide an audio file with the target speech
3. **Adjust Parameters** (optional): Fine-tune the `bbox_shift` parameter
4. **Generate**: Click the "Generate" button to create your lip-synced video
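The same steps can be driven programmatically with `gradio_client`. The Space ID placeholder, the parameter names, and the `/predict` endpoint below are assumptions for illustration; check the Space's "Use via API" panel for the real signature.

```python
# Sketch: calling this Space from Python with gradio_client.
# The Space ID placeholder and the endpoint signature
# (video, audio, bbox_shift) are illustrative, not confirmed.

def build_inputs(video_path: str, audio_path: str, bbox_shift: int = 0) -> dict:
    """Collect the three inputs the UI asks for (names are illustrative)."""
    return {"video": video_path, "audio": audio_path, "bbox_shift": bbox_shift}

def run_lipsync(space_id: str, video_path: str, audio_path: str, bbox_shift: int = 0):
    """Call the Space's endpoint; requires `pip install gradio_client`."""
    from gradio_client import Client, handle_file  # lazy import: needs network to use
    client = Client(space_id)  # e.g. "your-username/MuseTalk" (placeholder)
    return client.predict(
        handle_file(video_path),
        handle_file(audio_path),
        bbox_shift,
        api_name="/predict",  # assumption: default Gradio endpoint name
    )
```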

## Model Information

### Requirements

The Space automatically installs all necessary dependencies, including:

- PyTorch and torchvision
- Gradio for the UI
- OpenCV for video processing
- Various ML libraries (transformers, diffusers, etc.)
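A minimal `requirements.txt` along these lines would cover the list above. Only the `gradio` pin comes from this Space's metadata; the other entries are unpinned assumptions, and the Space's actual `requirements.txt` is authoritative:

```text
torch
torchvision
gradio==5.49.1
opencv-python
transformers
diffusers
```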

### Setup Instructions

This Space is configured to:

1. Clone the MuseTalk repository on first run
2. Install all required dependencies from `requirements.txt`
3. Download necessary model weights automatically
4. Launch the Gradio interface
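Step 3 can be sketched with `huggingface_hub`. Only `stabilityai/sd-vae-ft-mse` is named in this README; the other repo ID and the local paths are assumptions, so treat the Space's own download script as the source of truth.

```python
# Sketch of the weight-download step using huggingface_hub.
# Repo IDs and local paths other than sd-vae-ft-mse are assumptions.

WEIGHTS = [
    ("stabilityai/sd-vae-ft-mse", "models/sd-vae-ft-mse"),  # VAE named in this README
    ("TMElyralab/MuseTalk", "models/musetalk"),             # assumption: main weights repo
]

def download_weights(weights=WEIGHTS):
    """Download each repo into its local directory (requires huggingface_hub)."""
    from huggingface_hub import snapshot_download  # lazy import; downloads on call
    for repo_id, local_dir in weights:
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```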

### Technical Details

**Required Model Components:**

- **VAE**: `sd-vae-ft-mse` from Stability AI
- **Whisper**: For audio processing
- **DWPose**: For pose estimation
- **Face Parsing**: For face segmentation
- **ResNet18**: For feature extraction
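Once downloaded, the upstream MuseTalk code expects these components in a `models/` tree roughly like the one below (layout follows the upstream MuseTalk README; exact file names may vary between versions):

```text
models/
├── musetalk/
│   ├── musetalk.json
│   └── pytorch_model.bin
├── sd-vae-ft-mse/
├── whisper/
│   └── tiny.pt
├── dwpose/
│   └── dw-ll_ucoco_384.pth
└── face-parse-bisent/
    ├── 79999_iter.pth
    └── resnet18-5c106cde.pth
```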

## Tips for Best Results

- Use videos with clear, well-lit faces
- Use clean, high-quality audio for better lip sync
- Adjust the `bbox_shift` parameter if the face detection is off-center
- Input videos should ideally be in MP4 format
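If an input video is not already MP4, re-encoding with ffmpeg usually fixes compatibility. The helper below only builds the command; `ffmpeg` itself must be installed, and the codec flags are a common-sense default rather than anything this Space specifically requires.

```python
import subprocess  # used only in the commented example call below

def mp4_convert_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that re-encodes `src` to an H.264 MP4 at `dst`."""
    return [
        "ffmpeg", "-y",         # overwrite dst if it exists
        "-i", src,
        "-c:v", "libx264",      # widely compatible video codec
        "-pix_fmt", "yuv420p",  # plays in most browsers/players
        dst,
    ]

# To actually run it (requires ffmpeg on PATH):
# subprocess.run(mp4_convert_cmd("input.mov", "input.mp4"), check=True)
```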

## Citation

If you use MuseTalk in your research or projects, please cite the original repository:

```bibtex
@misc{musetalk2024,
  title={MuseTalk: Real-Time High-Quality Lip Synchronization},
  author={TMElyralab},
  year={2024},
  url={https://github.com/TMElyralab/MuseTalk}
}
```

## Related Projects

- MuseV - For text-to-video generation

## License

Please refer to the original repository for licensing information.


**Note:** First-time setup may take several minutes as model weights (~2GB) are downloaded automatically.