itechmusic committed on
Commit 77df88f · unverified · 1 parent: 6f36ffc

Update README.md

Files changed (1): README.md +5 −3
README.md CHANGED
````diff
@@ -138,7 +138,7 @@ MuseTalk was trained in latent spaces, where the images were encoded by a freeze
   </tr>
   </table>
 
-* For video dubbing, we applied a self-developed tool which can detect the talking person.
+* For video dubbing, we applied a self-developed tool which can identify the talking person.
 
 ## Some interesting videos!
 <table class="center">
@@ -232,7 +232,9 @@ Here, we provide the inference script.
 python -m scripts.inference --inference_config configs/inference/test.yaml
 ```
 configs/inference/test.yaml is the path to the inference configuration file, including video_path and audio_path.
-The video_path should be either a video file or a directory of images.
+The video_path should be either a video file or a directory of images.
+
+You are recommended to input video with `25fps`, the same fps used when training the model. If your video is far less than 25fps, you are recommended to apply frame interpolation or directly convert the video to 25fps using ffmpeg.
 
 #### Use of bbox_shift to have adjustable results
 :mag_right: We have found that upper-bound of the mask has an important impact on mouth openness. Thus, to control the mask region, we suggest using the `bbox_shift` parameter. Positive values (moving towards the lower half) increase mouth openness, while negative values (moving towards the upper half) decrease mouth openness.
@@ -247,7 +249,7 @@ python -m scripts.inference --inference_config configs/inference/test.yaml --bbo
 
 #### Combining MuseV and MuseTalk
 
-As a complete solution to virtual human generation, you are suggested to first apply [MuseV](https://github.com/TMElyralab/MuseV) to generate a video (text-to-video, image-to-video or pose-to-video) by referring [this](https://github.com/TMElyralab/MuseV?tab=readme-ov-file#text2video). Then, you can use `MuseTalk` to generate a lip-sync video by referring [this](https://github.com/TMElyralab/MuseTalk?tab=readme-ov-file#inference).
+As a complete solution to virtual human generation, you are suggested to first apply [MuseV](https://github.com/TMElyralab/MuseV) to generate a video (text-to-video, image-to-video or pose-to-video) by referring [this](https://github.com/TMElyralab/MuseV?tab=readme-ov-file#text2video). Frame interpolation is suggested to increase frame rate. Then, you can use `MuseTalk` to generate a lip-sync video by referring [this](https://github.com/TMElyralab/MuseTalk?tab=readme-ov-file#inference).
 
 # Note
 
````
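The 25fps conversion the commit recommends can be done with a plain ffmpeg call; a minimal sketch is below. The synthetic `testsrc` clip only stands in for a real input video, and the filenames are placeholders:

```shell
# Stand-in input: a 1-second synthetic 10 fps clip (replace with your own video).
ffmpeg -y -loglevel error -f lavfi -i testsrc=duration=1:size=320x240:rate=10 input.mp4
# Resample to 25 fps, the rate MuseTalk was trained on; the fps video filter
# duplicates frames when the source rate is lower than the target.
ffmpeg -y -loglevel error -i input.mp4 -vf fps=25 output_25fps.mp4
```

For a real input that carries audio, add `-c:a copy` so the audio stream passes through unchanged while only the video is resampled.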