---
title: Lip Sync Generator
emoji: 🎵
colorFrom: indigo
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
---
# Lip-Sync Video Generator
This project contains a simple web application that lets you upload a single frame (or short clip) of an avatar together with an audio file and produce a lip-synchronised video. Internally the app uses the open-source Wav2Lip model to animate the avatar so that the mouth movements follow the uploaded audio. Everything runs on a free cloud platform, so there is no need to install anything locally.
## Features
- Upload your own avatar: any static image (PNG/JPG) or short video clip can be used as the source face.
- Upload an audio track: accepts common audio formats (MP3/WAV/M4A) between 1 and 10 minutes long.
- Self‑contained setup: on first use the application extracts the Wav2Lip source code from a zip archive (if present) and verifies that the model weights exist. If the environment allows outbound downloads, it will fetch the weights automatically; otherwise you can provide them manually. No local installation is required.
- Offline fallback: if neither the repository nor the weights are available (for example on locked‑down networks where large downloads are forbidden), the app gracefully falls back to a lightweight amplitude‑based animation. It will still produce a talking head by stretching and squashing the mouth region in sync with the loudness of the audio. This effect is simpler than full Wav2Lip but ensures you always get a video out.
- Runs on free cloud hardware: designed for deployment on Hugging Face Spaces, which provide free CPU/GPU resources for public projects.
- Extensible: advanced users can tweak padding, segmentation and other options by modifying the inference arguments in `app.py`.
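To illustrate the offline fallback, here is a minimal sketch of an amplitude-driven mouth animation: it converts per-video-frame audio loudness into vertical stretch factors and applies them to the mouth region of a frame. The function names, parameters and region handling here are illustrative, not the app's actual code:

```python
import numpy as np

def mouth_scales_from_audio(samples, sample_rate, fps=25, max_stretch=0.4):
    """Map per-video-frame audio loudness (RMS) to vertical stretch
    factors in [1.0, 1.0 + max_stretch], one per video frame."""
    hop = sample_rate // fps                  # audio samples per video frame
    n_frames = max(1, len(samples) // hop)
    rms = np.array([
        np.sqrt(np.mean(np.square(samples[i * hop:(i + 1) * hop].astype(float))))
        for i in range(n_frames)
    ])
    peak = rms.max()
    if peak > 0:
        rms = rms / peak                      # normalise loudness to [0, 1]
    return 1.0 + max_stretch * rms

def stretch_mouth(frame, scale, top, bottom):
    """Vertically stretch rows [top:bottom] of `frame` (H x W x 3 array)
    by `scale` using nearest-neighbour resampling; the stretched strip is
    pasted back and clipped at the bottom edge. Returns a new frame."""
    region = frame[top:bottom]
    new_h = max(1, int(round((bottom - top) * scale)))
    idx = np.linspace(0, region.shape[0] - 1, new_h).astype(int)
    out = frame.copy()
    end = min(top + new_h, frame.shape[0])
    out[top:end] = region[idx][: end - top]
    return out
```

Repeating `stretch_mouth` for each video frame with its corresponding scale factor yields the rudimentary "talking" effect described above.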
## How it works
When the Generate video button is pressed, the application performs the following steps:

1. If the `Wav2Lip` folder is not present, it tries to extract it from a local zip archive named `Wav2Lip-master.zip`. If the archive isn't found, it attempts a shallow clone from GitHub. On network-restricted environments you should upload the archive yourself (see Deploying to Hugging Face Spaces).
2. If the pre-trained weights (`wav2lip_gan.pth` and the face segmentation model) are not present, it attempts to download them from publicly available mirrors. These files are large (~436 MB and ~53 MB respectively). If the download fails, you can upload the files manually into the `Wav2Lip/checkpoints` folder.
3. The uploaded image/video and audio files are saved into a temporary folder. Basic validation ensures that the audio duration is between 1 and 10 minutes; otherwise an error is shown.
4. The application calls the official `inference.py` script from Wav2Lip in a subprocess. The script reads the avatar frame and audio, applies the lip-sync model, and writes an MP4 video to the `outputs` directory. If this step fails because the repository or weights are missing, the app automatically switches to a basic fallback: it computes the loudness of the audio and stretches the mouth area of the avatar up and down to create a rudimentary talking animation.
5. Once video generation completes (either via Wav2Lip or the fallback), the resulting MP4 is returned to the web UI and can be played or downloaded.
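The audio-duration validation described above can be sketched as follows. This is a minimal illustration using Python's built-in `wave` module for WAV input; the actual `app.py` may rely on a different audio library to cover MP3/M4A as well (an assumption, not the app's exact code):

```python
import wave

MIN_SECONDS = 60      # 1 minute
MAX_SECONDS = 600     # 10 minutes

def validate_wav_duration(path: str) -> float:
    """Return the duration of a WAV file in seconds, raising ValueError
    if it falls outside the 1-10 minute range the app accepts."""
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"Audio must be between 1 and 10 minutes, got {duration:.1f}s")
    return duration
```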
The heavy lifting is done by Wav2Lip; this project simply wraps it in a clean user interface with sensible defaults and handles all setup.
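The subprocess step and its fallback can be sketched roughly as below. The flag names follow the upstream Wav2Lip repository's `inference.py` (`--checkpoint_path`, `--face`, `--audio`, `--outfile`); the helper and function names are illustrative, not the app's actual code:

```python
import subprocess
from pathlib import Path

def build_wav2lip_cmd(face: str, audio: str, outfile: str,
                      checkpoint: str = "Wav2Lip/checkpoints/wav2lip_gan.pth"):
    """Assemble the argv list for Wav2Lip's official inference script."""
    return [
        "python", "Wav2Lip/inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face,
        "--audio", audio,
        "--outfile", outfile,
    ]

def amplitude_fallback(face: str, audio: str, outfile: str) -> str:
    # Placeholder for the lightweight loudness-driven animation described
    # under "Offline fallback"; implementation omitted in this sketch.
    raise NotImplementedError

def generate_video(face: str, audio: str, outfile: str) -> str:
    """Try real Wav2Lip inference; fall back to the amplitude-based
    animation if the repository or weights are missing."""
    if Path("Wav2Lip/inference.py").exists():
        result = subprocess.run(build_wav2lip_cmd(face, audio, outfile))
        if result.returncode == 0:
            return outfile
    return amplitude_fallback(face, audio, outfile)
```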
## Deploying to Hugging Face Spaces
- Create a free account: go to huggingface.co and create a free account if you don't already have one.
- Create a new Space: from your dashboard, click "New Space", choose the Gradio SDK and give your space a name (e.g. `lip-sync-app`). Set it to "Public" so that it can use free hardware.
- Upload the project files: clone or download this repository, then upload the contents of the `lipsync_app` folder (`app.py`, `requirements.txt` and this `README.md`) into your new space. The file structure should look like this:
```
├── app.py
├── requirements.txt
└── README.md
```
- (Optional) Upload the Wav2Lip source and weights: in environments without internet access you should provide two additional assets:
  - A zip of the Wav2Lip repository named `Wav2Lip-master.zip` (download it from https://github.com/Rudrabha/Wav2Lip using the Download ZIP button). Place the archive at the root of your space so the app can extract it.
  - The model checkpoints `wav2lip_gan.pth` (≈436 MB) and `face_segmentation.pth` (≈53 MB). Upload these files into a folder `Wav2Lip/checkpoints` in your space. You can obtain them from the links in `MODEL_URLS` inside `app.py`.
- Commit and build: once all files are uploaded, commit them to the Space. Hugging Face will install the Python dependencies and build the application automatically. The first build may take a few minutes.
- Use the app: after the build succeeds and the weights are available, open the App tab of your space. You can now upload an image and an audio file and click Generate video to produce a lip‑synchronised output. Longer audio clips (up to ten minutes) will take longer to process.
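Uploading the large checkpoint files through the web UI can be unreliable, so pushing them with the `huggingface_hub` Python client is an alternative. `HfApi.upload_file` is the real client API; the helper below and the repo id/filenames are placeholders for illustration:

```python
def checkpoint_upload_args(username: str, space: str,
                           local_path: str, filename: str) -> dict:
    """Build the keyword arguments for huggingface_hub.HfApi.upload_file
    to place a checkpoint under Wav2Lip/checkpoints in a Space."""
    return {
        "path_or_fileobj": local_path,
        "path_in_repo": f"Wav2Lip/checkpoints/{filename}",
        "repo_id": f"{username}/{space}",
        "repo_type": "space",
    }

# Usage (requires `pip install huggingface_hub` and `huggingface-cli login`):
# from huggingface_hub import HfApi
# HfApi().upload_file(**checkpoint_upload_args(
#     "your-username", "lip-sync-app",
#     "wav2lip_gan.pth", "wav2lip_gan.pth"))
```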
## Limitations
- Large downloads on first run: the Wav2Lip weights are hundreds of megabytes. The application caches them in the working directory so subsequent runs are faster. If the environment is reset (e.g. if the space times out), the weights will be downloaded again.
- Processing time: running Wav2Lip on CPU is slow (several minutes for a 1‑minute clip). For best performance, enable GPU hardware in your Hugging Face space settings. GPU hours are free on public spaces but limited, so plan accordingly.
- Avatar quality: Wav2Lip works best with clear, front‑facing images where the mouth is visible. Complex backgrounds or occlusions can degrade the result.
## Troubleshooting
- Error cloning repository: network-restricted environments may forbid git operations. Download the Wav2Lip source code as a zip file on your own machine via the Download ZIP option on GitHub, rename the archive to `Wav2Lip-master.zip`, and upload it into the root of your space. The app will extract it automatically.
- Model download fails: large binary files often cannot be fetched from within a Space. Download the files listed in `MODEL_URLS` in `app.py` (`wav2lip_gan.pth` and `face_segmentation.pth`) to your computer and upload them into `Wav2Lip/checkpoints` in your space. Once present, the app will skip downloading them.
- Inference error / missing FFmpeg: the Wav2Lip inference script requires the `ffmpeg` binary to combine audio and video. If your space does not have `ffmpeg` installed, consider enabling GPU hardware (which comes with ffmpeg) or add a static `ffmpeg` binary to your repository and modify `PATH` accordingly.
- App times out or runs out of memory: try reducing the audio length or using CPU hardware. Free GPU instances provide limited memory (a T4 with 16 GB of VRAM), which may not handle extremely high-resolution inputs.
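If you bundle a static `ffmpeg`, a small PATH tweak near the top of `app.py` makes it visible to subprocesses. The `bin/` directory name is an assumption, not the app's actual layout:

```python
import os
from pathlib import Path

def ensure_local_ffmpeg(bin_dir: str = "bin") -> None:
    """Prepend a bundled ffmpeg directory to PATH (idempotently) so that
    subprocess calls to `ffmpeg` resolve to the committed static binary."""
    bin_path = str(Path(bin_dir).resolve())
    current = os.environ.get("PATH", "")
    if bin_path not in current.split(os.pathsep):
        os.environ["PATH"] = bin_path + os.pathsep + current
```

Remember to make the committed binary executable (e.g. `os.chmod("bin/ffmpeg", 0o755)`), since file permissions are not always preserved on upload.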
## Acknowledgements
This project would not be possible without the original Wav2Lip authors, who released their code and pretrained models publicly. It also makes use of Gradio for the web interface.