---
title: Lip Sync Generator
emoji: "🎵"
colorFrom: indigo
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
---

# Lip-Sync Video Generator

This project contains a simple web application that lets you upload a single frame (or short clip) of an avatar together with an audio file and produce a lip-synchronised video. Internally the app uses the open-source **Wav2Lip** model to animate the avatar so that the mouth movements follow the uploaded audio. Everything runs on a free cloud platform; there is no need to install anything locally.

## Features

- **Upload your own avatar:** any static image (PNG/JPG) or short video clip can be used as the source face.
- **Upload an audio track:** accepts common audio formats (MP3/WAV/M4A) between 1 and 10 minutes long.
- **Self-contained setup:** on first use the application extracts the Wav2Lip source code from a zip archive (if present) and verifies that the model weights exist. If the environment allows outbound downloads, it will fetch the weights automatically; otherwise you can provide them manually. No local installation is required.
- **Offline fallback:** if neither the repository nor the weights are available (for example on locked-down networks where large downloads are forbidden), the app gracefully falls back to a lightweight amplitude-based animation. It still produces a talking head by stretching and squashing the mouth region in sync with the loudness of the audio. This effect is simpler than full Wav2Lip but ensures you always get a video out.
- **Runs on free cloud hardware:** designed for deployment on [Hugging Face Spaces](https://huggingface.co/spaces), which provides free CPU/GPU resources for public projects.
- **Extensible:** advanced users can tweak padding, segmentation and other options by modifying the inference arguments in ``app.py``.

## How it works

When the **Generate video** button is pressed, the application performs the following steps:
1. If the ``Wav2Lip`` folder is not present, it tries to extract it from a local zip archive named ``Wav2Lip-master.zip``. If the archive isn't found, it attempts a shallow clone from GitHub. On network-restricted environments you should upload the archive yourself (see **Deploying to Hugging Face Spaces**).
2. If the pre-trained weights (``wav2lip_gan.pth`` and the face segmentation model) are not present, it attempts to download them from publicly available mirrors. These files are large (~436 MB and ~53 MB respectively). If the download fails, you can upload the files manually into the ``Wav2Lip/checkpoints`` folder.
3. The uploaded image/video and audio files are saved into a temporary folder. Basic validation ensures that the audio duration is between 1 and 10 minutes; otherwise an error is shown.
4. The application calls the official ``inference.py`` script from Wav2Lip in a subprocess. The script reads the avatar frame and audio, applies the lip-sync model, and writes an MP4 video to the ``outputs`` directory. If this step fails because the repository or weights are missing, the app automatically switches to a basic fallback: it computes the loudness of the audio and stretches the mouth area of the avatar up and down to create a rudimentary talking animation.
5. Once video generation completes (either via Wav2Lip or the fallback), the resulting MP4 is returned to the web UI, where it can be played or downloaded.

The heavy lifting is done by Wav2Lip; this project simply wraps it in a clean user interface with sensible defaults and handles all of the setup.

## Deploying to Hugging Face Spaces

1. **Create a free account:** go to [huggingface.co](https://huggingface.co) and create a free account if you don't already have one.
2. **Create a new Space:** from your dashboard, click “New Space”, choose the **Gradio** SDK and give your Space a name (e.g. ``lip-sync-app``). Set it to “Public” so that it can use free hardware.
3. **Upload the project files:** clone or download this repository, then upload the contents of the ``lipsync_app`` folder (``app.py``, ``requirements.txt`` and this ``README.md``) into your new Space. The file structure should look like this:

   ```
   ├── app.py
   ├── requirements.txt
   └── README.md
   ```

4. **(Optional) Upload the Wav2Lip source and weights:** in environments without internet access you should provide two additional assets:
   - A zip of the Wav2Lip repository named ``Wav2Lip-master.zip`` (download it from GitHub using the **Download ZIP** button). Place the archive at the root of your Space so the app can extract it.
   - The model checkpoints ``wav2lip_gan.pth`` (≈436 MB) and ``face_segmentation.pth`` (≈53 MB). Upload these files into a ``Wav2Lip/checkpoints`` folder in your Space. You can obtain them from the links in ``MODEL_URLS`` inside ``app.py``.
5. **Commit and build:** once all files are uploaded, commit them to the Space. Hugging Face will install the Python dependencies and build the application automatically. The first build may take a few minutes.
6. **Use the app:** after the build succeeds and the weights are available, open the **App** tab of your Space. You can now upload an image and an audio file and click **Generate video** to produce a lip-synchronised output. Longer audio clips (up to ten minutes) will take longer to process.

## Limitations

- **Large downloads on first run:** the Wav2Lip weights are hundreds of megabytes. The application caches them in the working directory so subsequent runs are faster. If the environment is reset (e.g. if the Space times out), the weights will be downloaded again.
- **Processing time:** running Wav2Lip on CPU is slow (several minutes for a 1-minute clip). For best performance, enable GPU hardware in your Space settings. GPU hours are free on public Spaces but limited, so plan accordingly.
- **Avatar quality:** Wav2Lip works best with clear, front-facing images where the mouth is visible. Complex backgrounds or occlusions can degrade the result.
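For reference, the subprocess call described in step 4 of **How it works** can be sketched as below. This is an illustrative helper, not the app's actual code: the function name and default paths are made up, while the command-line flags (``--checkpoint_path``, ``--face``, ``--audio``, ``--outfile``) are the ones documented by the upstream Wav2Lip repository.

```python
# Illustrative sketch only: assemble the Wav2Lip inference command line.
# The helper name and default paths are hypothetical; the flags follow the
# upstream Wav2Lip README.
import subprocess


def build_wav2lip_command(face_path, audio_path, out_path,
                          checkpoint="Wav2Lip/checkpoints/wav2lip_gan.pth"):
    """Build the command; execute it with subprocess.run(cmd, check=True)."""
    return [
        "python", "Wav2Lip/inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face_path,    # avatar image or video
        "--audio", audio_path,  # driving audio track
        "--outfile", out_path,  # MP4 written by the script
    ]


cmd = build_wav2lip_command("avatar.png", "speech.wav", "outputs/result.mp4")
# subprocess.run(cmd, check=True)  # uncomment where the repo and weights exist
```

A wrapper like this is also where padding or resize options would be appended when experimenting with the inference arguments mentioned under **Extensible**.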
## Troubleshooting

- **Error cloning repository:** network-restricted environments may forbid git operations. Download the Wav2Lip source code as a zip file on your own machine via the **Download ZIP** option on GitHub, rename the archive to ``Wav2Lip-master.zip``, and upload it into the root of your Space. The app will extract it automatically.
- **Model download fails:** large binary files often cannot be fetched from within a Space. Download the files listed in ``MODEL_URLS`` in ``app.py`` (``wav2lip_gan.pth`` and ``face_segmentation.pth``) to your computer and upload them into ``Wav2Lip/checkpoints`` in your Space. Once they are present, the app will skip downloading them.
- **Inference error / missing FFmpeg:** the Wav2Lip inference script requires the ``ffmpeg`` binary to combine audio and video. If your Space does not have ``ffmpeg`` installed, consider enabling GPU hardware (which comes with ``ffmpeg``) or add a static ``ffmpeg`` binary to your repository and modify ``PATH`` accordingly.
- **App times out or runs out of memory:** try reducing the audio length or using CPU hardware. Free GPU instances provide limited memory (T4/8 GB), which may not handle extremely high-resolution inputs.

## Acknowledgements

This project would not be possible without the original [Wav2Lip](https://github.com/Rudrabha/Wav2Lip) authors, who make their code and pre-trained models publicly available. It also makes use of [Gradio](https://gradio.app/) for the web interface.
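As a rough illustration of the amplitude-based fallback described in **How it works**, the loudness-to-mouth mapping could look like the following sketch. The function name, frame rate and stretch range are all illustrative assumptions, not the app's actual implementation.

```python
# Hypothetical sketch of the offline fallback: map per-video-frame audio
# loudness (RMS) to a vertical stretch factor for the avatar's mouth region.
import numpy as np


def mouth_scale_factors(samples, sample_rate, fps=25, max_stretch=0.35):
    """Return one mouth-opening factor in [0, max_stretch] per video frame."""
    hop = sample_rate // fps                # audio samples per video frame
    n_frames = max(1, len(samples) // hop)
    rms = np.array([
        np.sqrt(np.mean(samples[i * hop:(i + 1) * hop] ** 2))
        for i in range(n_frames)
    ])
    peak = rms.max()
    if peak == 0:                           # silent input: mouth stays closed
        return np.zeros(n_frames)
    return (rms / peak) * max_stretch       # loudest frame opens the widest


# Example: one second of a 440 Hz tone at 16 kHz yields 25 near-equal factors.
t = np.linspace(0, 1, 16000, endpoint=False)
factors = mouth_scale_factors(np.sin(2 * np.pi * 440 * t), 16000)
```

Each factor would then drive how far the mouth region of the avatar frame is stretched before the frames are encoded into the output video.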