---
title: MuseTalk
emoji: 💻
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: MuseTalk - Real-time Audio-Driven Lip Sync
---

# MuseTalk: Real-Time High-Quality Lip Synchronization

This Hugging Face Space allows you to run MuseTalk for audio-driven lip synchronization experiments.

## About MuseTalk

MuseTalk is a real-time, high-quality audio-driven lip synchronization model that generates realistic lip movements from audio input. It can be applied to videos to create lip-synced content.

## Features

- **Real-time Processing**: Generate lip-synced videos efficiently
- **High Quality**: Produces natural and realistic lip movements
- **Easy to Use**: Simple Gradio interface for quick experimentation
- **Customizable**: Adjust the face bounding box (`bbox_shift`) for better results

## How to Use

1. **Upload Video**: Provide an input video file (preferably with a clear face)
2. **Upload Audio**: Provide an audio file with the target speech
3. **Adjust Parameters** (optional): Fine-tune the `bbox_shift` parameter
4. **Generate**: Click the "Generate" button to create your lip-synced video
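The same steps can be driven programmatically with `gradio_client`. The Space ID placeholder, the parameter names, and the `/predict` endpoint below are assumptions for illustration; check the Space's "Use via API" panel for the real signature.

```python
# Sketch: calling this Space from Python with gradio_client.
# The Space ID placeholder and the endpoint signature
# (video, audio, bbox_shift) are illustrative, not confirmed.

def build_inputs(video_path: str, audio_path: str, bbox_shift: int = 0) -> dict:
    """Collect the three inputs the UI asks for (names are illustrative)."""
    return {"video": video_path, "audio": audio_path, "bbox_shift": bbox_shift}

def run_lipsync(space_id: str, video_path: str, audio_path: str, bbox_shift: int = 0):
    """Call the Space's endpoint; requires `pip install gradio_client`."""
    from gradio_client import Client, handle_file  # lazy import: needs network to use
    client = Client(space_id)  # e.g. "your-username/MuseTalk" (placeholder)
    return client.predict(
        handle_file(video_path),
        handle_file(audio_path),
        bbox_shift,
        api_name="/predict",  # assumption: default Gradio endpoint name
    )
```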

## Model Information

### Requirements

The Space automatically installs all necessary dependencies, including:

- PyTorch and torchvision
- Gradio for the UI
- OpenCV for video processing
- Various ML libraries (transformers, diffusers, etc.)
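A minimal `requirements.txt` along these lines would cover the list above. Only the `gradio` pin comes from this Space's metadata; the other entries are unpinned assumptions, and the Space's actual `requirements.txt` is authoritative:

```text
torch
torchvision
gradio==5.49.1
opencv-python
transformers
diffusers
```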

### Setup Instructions

This Space is configured to:

1. Clone the MuseTalk repository on first run
2. Install all required dependencies from `requirements.txt`
3. Download necessary model weights automatically
4. Launch the Gradio interface
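Step 3 can be sketched with `huggingface_hub`. Only `stabilityai/sd-vae-ft-mse` is named in this README; the other repo ID and the local paths are assumptions, so treat the Space's own download script as the source of truth.

```python
# Sketch of the weight-download step using huggingface_hub.
# Repo IDs and local paths other than sd-vae-ft-mse are assumptions.

WEIGHTS = [
    ("stabilityai/sd-vae-ft-mse", "models/sd-vae-ft-mse"),  # VAE named in this README
    ("TMElyralab/MuseTalk", "models/musetalk"),             # assumption: main weights repo
]

def download_weights(weights=WEIGHTS):
    """Download each repo into its local directory (requires huggingface_hub)."""
    from huggingface_hub import snapshot_download  # lazy import; downloads on call
    for repo_id, local_dir in weights:
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```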

### Technical Details

**Required Model Components:**

- **VAE**: `sd-vae-ft-mse` from Stability AI
- **Whisper**: For audio processing
- **DWPose**: For pose estimation
- **Face Parsing**: For face segmentation
- **ResNet18**: For feature extraction
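Once downloaded, the upstream MuseTalk code expects these components in a `models/` tree roughly like the one below (layout follows the upstream MuseTalk README; exact file names may vary between versions):

```text
models/
├── musetalk/
│   ├── musetalk.json
│   └── pytorch_model.bin
├── sd-vae-ft-mse/
├── whisper/
│   └── tiny.pt
├── dwpose/
│   └── dw-ll_ucoco_384.pth
└── face-parse-bisent/
    ├── 79999_iter.pth
    └── resnet18-5c106cde.pth
```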

## Tips for Best Results

- Use videos with clear, well-lit faces
- Use clean, high-quality audio for better lip sync
- Adjust the `bbox_shift` parameter if the face detection is off-center
- Input videos should ideally be in MP4 format
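If an input video is not already MP4, re-encoding with ffmpeg usually fixes compatibility. The helper below only builds the command; `ffmpeg` itself must be installed, and the codec flags are a common-sense default rather than anything this Space specifically requires.

```python
import subprocess  # used only in the commented example call below

def mp4_convert_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that re-encodes `src` to an H.264 MP4 at `dst`."""
    return [
        "ffmpeg", "-y",         # overwrite dst if it exists
        "-i", src,
        "-c:v", "libx264",      # widely compatible video codec
        "-pix_fmt", "yuv420p",  # plays in most browsers/players
        dst,
    ]

# To actually run it (requires ffmpeg on PATH):
# subprocess.run(mp4_convert_cmd("input.mov", "input.mp4"), check=True)
```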

## Citation

If you use MuseTalk in your research or projects, please cite the original repository:

```bibtex
@misc{musetalk2024,
  title={MuseTalk: Real-Time High-Quality Lip Synchronization},
  author={TMElyralab},
  year={2024},
  url={https://github.com/TMElyralab/MuseTalk}
}
```

## Related Projects

- MuseV - For text-to-video generation

## License

Please refer to the original repository for licensing information.


**Note:** First-time setup may take several minutes as model weights (~2GB) are downloaded automatically.