Install dependencies:

pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install numpy==1.19.5



Project directory: D:\CODE\Wav2Lip-HD\

Run the Wav2Lip inference command (this variant passes --no_sr --no_segmentation, skipping super-resolution and face segmentation):

python D:\CODE\Wav2Lip-HD\inference.py --checkpoint_path "D:\CODE\Wav2Lip-HD\checkpoints\wav2lip_gan.pth" --segmentation_path "D:\CODE\Wav2Lip-HD\checkpoints\face_segmentation.pth" --sr_path "D:\CODE\Wav2Lip-HD\checkpoints\esrgan_yunying.pth" --face D:\CODE\Wav2Lip-HD\input_videos\kurumi.jpg --audio "D:\CODE\Wav2Lip-HD\input_audios\TRA GUNG DEN VO DUNG.mp3" --save_frames --gt_path "D:\CODE\Wav2Lip-HD\data\gt" --pred_path "D:\CODE\Wav2Lip-HD\data\lq" --no_sr --no_segmentation --outfile D:\CODE\Wav2Lip-HD\output_videos_wav2lip\mona.mp4





Full variant (segmentation and super-resolution enabled):

python D:\CODE\Wav2Lip-HD\inference.py --checkpoint_path "D:\CODE\Wav2Lip-HD\checkpoints\wav2lip_gan.pth" --segmentation_path "D:\CODE\Wav2Lip-HD\checkpoints\face_segmentation.pth" --sr_path "D:\CODE\Wav2Lip-HD\checkpoints\esrgan_yunying.pth" --face D:\CODE\Wav2Lip-HD\input_videos\kurumi.jpg --audio D:\CODE\Wav2Lip-HD\input_audios\TRA_GUNG_DEN_VO_DUNG.mp3 --save_frames --gt_path "D:\CODE\Wav2Lip-HD\data\gt" --pred_path "D:\CODE\Wav2Lip-HD\data\lq" --outfile D:\CODE\Wav2Lip-HD\output_videos_wav2lip\mona.mp4


Run video2frames to extract frames:

python D:\CODE\Wav2Lip-HD\video2frames.py --input_video D:\CODE\Wav2Lip-HD\output_videos_wav2lip\mona.mp4 --frames_path D:\CODE\Wav2Lip-HD\frames_wav2lip\mona





Run Real-ESRGAN to upscale the frames:

python D:\CODE\Wav2Lip-HD\Real-ESRGAN\inference_realesrgan.py -n RealESRGAN_x4plus -i D:\CODE\Wav2Lip-HD\frames_wav2lip\mona --output D:\CODE\Wav2Lip-HD\frames_hd\mona --outscale 3.5 --face_enhance



Run FFmpeg to rebuild the video from the frames (optional). Inside a .bat script, % must be escaped as %%:

ffmpeg -r 20 -i D:\CODE\Wav2Lip-HD\frames_wav2lip\mona\frame_%%05d.jpg -i D:\CODE\Wav2Lip-HD\input_audios\ai.wav -vcodec libx264 -crf 25 -preset veryslow -acodec copy D:\CODE\Wav2Lip-HD\output_videos_hd\mona.mkv





Direct command-line form (single %):

ffmpeg -r 20 -i D:\CODE\Wav2Lip-HD\frames_wav2lip\mona\frame_%05d.jpg -i D:\CODE\Wav2Lip-HD\input_audios\ai.wav -vcodec libx264 -crf 25 -preset veryslow -acodec copy D:\CODE\Wav2Lip-HD\output_videos_hd\mona.mkv
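The `%05d` pattern tells ffmpeg to look for frames numbered with five zero-padded digits, so the extracted frames must be named accordingly. A minimal sketch of that naming convention (the helper name is hypothetical, for illustration only):

```python
def frame_name(index: int) -> str:
    """Return a filename matching ffmpeg's frame_%05d.jpg input pattern."""
    return f"frame_{index:05d}.jpg"

print(frame_name(0))    # frame_00000.jpg
print(frame_name(142))  # frame_00142.jpg
```

If the frame extraction step produces names in a different format, the `-i` pattern passed to ffmpeg must be adjusted to match.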

-----------------------------

# Wav2Lip-HD: Improving Wav2Lip to achieve High-Fidelity Videos

This repository contains code for achieving high-fidelity lip-syncing in videos, using the [Wav2Lip algorithm](https://github.com/Rudrabha/Wav2Lip) for lip-syncing and the [Real-ESRGAN algorithm](https://github.com/xinntao/Real-ESRGAN) for super-resolution. The combination of these two algorithms allows for the creation of lip-synced videos that are both highly accurate and visually stunning.

## Algorithm

The algorithm for achieving high-fidelity lip-syncing with Wav2Lip and Real-ESRGAN can be summarized as follows:

1. The input video and audio are given to the `Wav2Lip` algorithm.
2. A Python script extracts frames from the video generated by Wav2Lip.
3. The frames are passed to the Real-ESRGAN algorithm to improve their quality.
4. The high-quality frames are combined with the original audio into a video using ffmpeg.
5. The result is a high-quality lip-synced video.
6. The specific steps for running this algorithm are described in the [Testing Model](https://github.com/saifhassan/Wav2Lip-HD#testing-model) section of this README.
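The steps above can be sketched as the sequence of commands the pipeline issues. This is an illustrative outline only: the `build_commands` helper and its parameters are hypothetical, and the real entry point is the `run_final.sh` script described below.

```python
def build_commands(filename, input_audio, fps=20, outscale=3.5):
    """Return the four pipeline commands, in order, for one video/audio pair.

    Paths follow the repository layout described in this README; checkpoint
    names and flags mirror the commands shown in the Testing Model section.
    """
    wav2lip_out = f"output_videos_wav2lip/{filename}.mp4"
    frames_lq = f"frames_wav2lip/{filename}"
    frames_hd = f"frames_hd/{filename}"
    final_out = f"output_videos_hd/{filename}.mkv"
    return [
        # 1. Lip-sync the input video to the audio with Wav2Lip
        f"python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth "
        f"--face input_videos/{filename}.mp4 --audio {input_audio} "
        f"--outfile {wav2lip_out}",
        # 2. Extract frames from the Wav2Lip output
        f"python video2frames.py --input_video {wav2lip_out} --frames_path {frames_lq}",
        # 3. Super-resolve each frame with Real-ESRGAN
        f"python Real-ESRGAN/inference_realesrgan.py -n RealESRGAN_x4plus "
        f"-i {frames_lq} --output {frames_hd} --outscale {outscale} --face_enhance",
        # 4. Reassemble the upscaled frames and original audio with ffmpeg
        f"ffmpeg -r {fps} -i {frames_hd}/frame_%05d.jpg -i {input_audio} "
        f"-vcodec libx264 -crf 25 -preset veryslow -acodec copy {final_out}",
    ]

for cmd in build_commands("kennedy", "input_audios/ai.wav"):
    print(cmd)
```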

## Testing Model

To test the "Wav2Lip-HD" model, follow these steps:

1. Clone this repository and install the requirements using the following commands (make sure Python and CUDA are already installed):

    ```
    git clone https://github.com/saifhassan/Wav2Lip-HD.git
    cd Wav2Lip-HD
    pip install -r requirements.txt
    ```

    

2. Download the pretrained weights:


| Model          | Directory     | Download Link |
| :------------- | :------------ | :-----------: |
| Wav2Lip        | [checkpoints/](https://github.com/saifhassan/Wav2Lip-HD/tree/main/checkpoints) | [Link](https://drive.google.com/drive/folders/1tB_uz-TYMePRMZzrDMdShWUZZ0JK3SIZ?usp=sharing) |
| ESRGAN         | [experiments/001_ESRGAN_x4_f64b23_custom16k_500k_B16G1_wandb/models/](https://github.com/saifhassan/Wav2Lip-HD/tree/main/experiments/001_ESRGAN_x4_f64b23_custom16k_500k_B16G1_wandb/models) | [Link](https://drive.google.com/file/d/1Al8lEpnx2K-kDX7zL2DBcAuDnSKXACPb/view?usp=sharing) |
| Face_Detection | [face_detection/detection/sfd/](https://github.com/saifhassan/Wav2Lip-HD/tree/main/face_detection/detection/sfd) | [Link](https://drive.google.com/file/d/1uNLYCPFFmO-og3WSHyFytJQLLYOwH5uY/view?usp=sharing) |
| Real-ESRGAN    | Real-ESRGAN/gfpgan/weights/ | [Link](https://drive.google.com/drive/folders/1BLx6aMpHgFt41fJ27_cRmT8bt53kVAYG?usp=sharing) |
| Real-ESRGAN    | Real-ESRGAN/weights/ | [Link](https://drive.google.com/file/d/1qNIf8cJl_dQo3ivelPJVWFkApyEAGnLi/view?usp=sharing) |
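The Google Drive links above can also be fetched from the command line with a tool such as `gdown`, which takes the file ID embedded in each share URL. A small sketch of pulling that ID out (the helper name is hypothetical; downloading through a browser works just as well):

```python
import re

def drive_id(url: str) -> str:
    """Extract the Google Drive file/folder ID from a share URL.

    Handles the two common forms:
      https://drive.google.com/file/d/<ID>/view?usp=sharing
      https://drive.google.com/drive/folders/<ID>?usp=sharing
    """
    match = re.search(r"/(?:file/d|folders)/([^/?]+)", url)
    if match is None:
        raise ValueError(f"no Drive ID found in {url!r}")
    return match.group(1)

print(drive_id("https://drive.google.com/file/d/1uNLYCPFFmO-og3WSHyFytJQLLYOwH5uY/view?usp=sharing"))
# → 1uNLYCPFFmO-og3WSHyFytJQLLYOwH5uY
```

With `gdown` installed (`pip install gdown`), the extracted ID can be passed to it to download a file into the directory listed in the table.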





3. Put the input video in the `input_videos` directory and the input audio in the `input_audios` directory.

4. Open the `run_final.sh` file and modify the following parameters:

    `filename=kennedy` (video file name without extension)

    `input_audio=input_audios/ai.wav` (audio file name with extension)


5. Execute `run_final.sh` using the following command:

    ```
    bash run_final.sh
    ```

    

6. Outputs


- The `output_videos_wav2lip` directory contains the video output generated by the Wav2Lip algorithm.
- The `frames_wav2lip` directory contains the frames extracted from that video.
- The `frames_hd` directory contains the frames after super-resolution with Real-ESRGAN.
- The `output_videos_hd` directory contains the final high-quality video generated by Wav2Lip-HD.


## Results
The results produced by Wav2Lip-HD come in two forms: frames and videos. Both are shown below.

### Example output frames

<table>
  <tr>
    <td>Frame by Wav2Lip</td>
    <td>Optimized Frame</td>
  </tr>
  <tr>
    <td><img src="examples/1_low.jpg" width=500></td>
    <td><img src="examples/1_hd.jpg" width=500></td>
  </tr>
  <tr>
    <td><img src="examples/kennedy_low.jpg" width=500></td>
    <td><img src="examples/kennedy_hd.jpg" width=500></td>
  </tr>
  <tr>
    <td><img src="examples/mona_low.jpg" width=500></td>
    <td><img src="examples/mona_hd.jpg" width=500></td>
  </tr>
</table>


### Example output videos

| Video by Wav2Lip | Optimized Video |
| ------------- | ------------- |
| <video src="https://user-images.githubusercontent.com/11873763/229389410-56d96244-8c67-4add-a43e-a4900aa9db88.mp4" width="500"> | <video src="https://user-images.githubusercontent.com/11873763/229389414-d5cb6d33-7772-47a7-b829-9e3d5c3945a1.mp4" width="500"> |
| <video src="https://user-images.githubusercontent.com/11873763/229389751-507669f1-7772-4863-ab23-8df7f206a065.mp4" width="500"> | <video src="https://user-images.githubusercontent.com/11873763/229389962-5373b765-ce3a-4af2-bd6a-8be8543ee933.mp4" width="500"> |


## Acknowledgements

We would like to thank the following repositories and libraries for their contributions to our work:

1. The [Wav2Lip](https://github.com/Rudrabha/Wav2Lip) repository, whose model is the core lip-sync component of our algorithm.
2. The [face-parsing.PyTorch](https://github.com/zllrunning/face-parsing.PyTorch) repository, which provides the face segmentation model.
3. The [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN) repository, which provides the super-resolution component of our algorithm.
4. [ffmpeg](https://ffmpeg.org), which we use for converting frames to video.