---
license: cc-by-4.0
tags:
- lipsync
- dubbing
- tts
- asr
- mt
- nptel
---

# Lip-Sync Video Generation

This project generates a lip-synced video from an original video and translated subtitles.

`generate_Lipsync_videos`: In this project, the audio is kept at a constant speed and the video is slowed down by interpolating video frames. This keeps the audio and video synchronized, at the cost of an output video that is longer than the original.

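
The slow-down described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes a plain linear blend between neighbouring frames, and the names `slowdown_factor` and `interpolation_plan` are hypothetical. Given the original video duration and the longer duration of the synthesized audio, it computes how many output frames are needed at an unchanged frame rate and, for each output frame, which two source frames to blend and with what weight.

```python
from typing import List, Tuple


def slowdown_factor(video_seconds: float, audio_seconds: float) -> float:
    """Ratio by which the video must be stretched to last as long as the audio."""
    if video_seconds <= 0 or audio_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / video_seconds


def interpolation_plan(num_src_frames: int, factor: float) -> List[Tuple[int, int, float]]:
    """For each output frame, return (left_index, right_index, weight_of_right).

    Output frame k samples the source at fractional position k / factor, so the
    clip keeps its frame rate while its duration grows by `factor`.
    """
    if num_src_frames < 1:
        raise ValueError("need at least one source frame")
    num_out = max(1, round(num_src_frames * factor))
    plan = []
    for k in range(num_out):
        pos = min(k / factor, num_src_frames - 1)  # clamp to the last source frame
        left = int(pos)
        right = min(left + 1, num_src_frames - 1)
        plan.append((left, right, pos - left))
    return plan


# Hypothetical numbers: a 10 s clip must last 12.5 s to match the dubbed audio.
factor = slowdown_factor(10.0, 12.5)
plan = interpolation_plan(num_src_frames=4, factor=factor)
```

Each tuple would then drive a blend such as `(1 - w) * frames[left] + w * frames[right]` on the decoded frames. A different interpolation method could be substituted without changing the timing logic.
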
## Prerequisites

- Conda
- Python 3.8

## Setup

1. Clone the repository:

```bash
git clone https://huggingface.co/smtiitm/generate_lipsynced_videos_IITM_API
cd generate_lipsynced_videos_IITM_API #generate_Lipsync_videos
```

2. Create and activate the conda environment:

```bash
conda create -n lipsync_env python=3.8
conda activate lipsync_env
```

3. Install the required packages:

```bash
pip install -r requirements.txt
```

4. The code currently uses our APIs for text-to-speech generation. To use your own TTS API, set its URL in the `srt_to_audio_original` file and make sure your requests match the payload defined in the `vtt_to_speech.py` file. The current APIs are local to the lab; access can be requested by email.

## Usage

1. Edit the `run_script.sh` file to assign the paths to your original video and translated subtitles:

```bash
original_video_path="path/to/original_video.mp4"
translated_srt_path="path/to/translated_subtitles.srt"
```

2. Assign the target language code in `run_script.sh`:

```bash
target_language_code="hin" # Example: 'hin' for Hindi
```

3. Run the script:

```bash
./run_script.sh
```

4. The currently supported sampling rate is 22.5 kHz, and the `.wav` format is preferred.
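
A quick pre-flight check of the audio format can save a failed run. The sketch below uses Python's standard `wave` module to read the sampling rate of a `.wav` file. The helper names `wav_sample_rate` and `has_expected_rate`, the demo file, and the `EXPECTED_RATE_HZ` constant are illustrative assumptions; the constant is taken literally from the 22.5 kHz figure above and should be adjusted if your TTS output uses a different rate.

```python
import os
import tempfile
import wave

# Assumption: 22.5 kHz as stated above; change this if your TTS output differs.
EXPECTED_RATE_HZ = 22500


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (Hz) of a PCM .wav file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def has_expected_rate(path: str, expected_hz: int = EXPECTED_RATE_HZ) -> bool:
    """True if the file's sampling rate equals the expected one."""
    return wav_sample_rate(path) == expected_hz


# Self-contained demo: write 0.1 s of 16-bit mono silence, then check it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
    out.setframerate(EXPECTED_RATE_HZ)
    out.writeframes(b"\x00\x00" * (EXPECTED_RATE_HZ // 10))

print(wav_sample_rate(demo_path), has_expected_rate(demo_path))
```

In practice, `has_expected_rate` would be called on the audio file you intend to feed into the pipeline instead of the generated demo file.
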

## Supported Languages

- Hindi (hin)
- Malayalam (mal)
- Kannada (kan)
- Bengali (bn)
- Urdu (ur)
- Telugu (tel)
- Punjabi (pun)
- Marathi (mar)
- Gujarati (guj)
- Tamil (tam)
- English (en)
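
Because `run_script.sh` takes a bare language code, it can help to validate the code before launching a run. The following sketch simply mirrors the list above as a lookup table; `SUPPORTED_LANGUAGES` and `language_name` are hypothetical names and are not part of the repository.

```python
# Codes copied verbatim from the list above (note the mix of 2- and 3-letter codes).
SUPPORTED_LANGUAGES = {
    "hin": "Hindi",
    "mal": "Malayalam",
    "kan": "Kannada",
    "bn": "Bengali",
    "ur": "Urdu",
    "tel": "Telugu",
    "pun": "Punjabi",
    "mar": "Marathi",
    "guj": "Gujarati",
    "tam": "Tamil",
    "en": "English",
}


def language_name(code: str) -> str:
    """Return the language name for a supported code, or raise ValueError."""
    try:
        return SUPPORTED_LANGUAGES[code.strip().lower()]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"unsupported language code {code!r}; expected one of: {supported}"
        ) from None


print(language_name("hin"))  # Hindi
```
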

### Citation

If you use this repo in your research or work, please consider citing:

"COPYRIGHT 2025, Speech Technology Consortium, Bhashini, MeiTY and by Hema A Murthy & S Umesh, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING and ELECTRICAL ENGINEERING, IIT MADRAS. ALL RIGHTS RESERVED"

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg