v2v based on custom audio
Does LTX2.3 support video generation driven by custom audio? The input is a video, not an image.
Not entirely sure I understood. Do you want to use the audio from an input video?
The audio doesn't come from the video. Instead, I want to use a separate audio input to drive the video generation.
This should be a digital-human video lip-sync task.
I guess you mean what @zhaoyun0071 said, @LiMuyi?
You have a video, but you want to add dialogue and lip-sync?
I've been testing that with masks, but LTX is a little different from Wan, so the masking isn't as easy or straightforward as with Wan.
LTX also mentioned they would release an inpaint LoRA, fingers crossed ;-) That would make it much easier.
But I will try some more to find a lip-sync method ;-) The main challenge has been making the audio's effect strong enough. It often ends up as a voice-over narrator instead of lips moving. But I will try again ;-)
A "fake" way would be to take the first frame of the video and use that for image-2-video.
But I guess you want to keep the video and just change the face ;-)
Here is an early rough test I did, masking the face and adding voice (to a silent Wan video). It's low resolution and all, since it was just a quick test run:
(This audio came from LTX though, from the prompt. But it would work the same way with an input audio file.)
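For anyone who wants to try the "fake" first-frame route mentioned above, here is a minimal sketch of grabbing that frame with ffmpeg from Python. It assumes ffmpeg is installed and on PATH; the function names and file names are hypothetical, and nothing here is specific to LTX. It only produces the still image you would then feed into an image-2-video workflow.

```python
import subprocess
from pathlib import Path


def first_frame_command(video_path: str, image_path: str) -> list[str]:
    """Build an ffmpeg command that writes the first video frame to an image."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", video_path,  # input video
        "-frames:v", "1",  # stop after writing one video frame
        image_path,
    ]


def extract_first_frame(video_path: str, image_path: str) -> Path:
    """Run ffmpeg (assumed to be on PATH) and return the saved frame's path."""
    subprocess.run(first_frame_command(video_path, image_path), check=True)
    return Path(image_path)
```

Usage would be something like `extract_first_frame("input.mp4", "first_frame.png")`. Keep in mind this throws away the original motion, which is why it is only a workaround and not real v2v.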
Thanks, I've basically achieved this, but the video and audio are still misaligned, most likely due to the mask issue. I need to investigate further.
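Before digging into the mask, it may be worth ruling out a plain length mismatch between the audio and the video. The sketch below is only a sanity check, not a fix for the mask problem: it reads a WAV file's duration with Python's standard `wave` module and compares it with the clip length implied by a frame count and frame rate. The helper names are hypothetical, and the frame rate is whatever your own workflow uses.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def frames_for_audio(duration_s: float, fps: float) -> int:
    """Number of video frames that covers the audio at the given frame rate."""
    return round(duration_s * fps)


def drift_seconds(num_video_frames: int, fps: float, audio_duration_s: float) -> float:
    """Video length minus audio length; positive means the video is longer."""
    return num_video_frames / fps - audio_duration_s
```

If `drift_seconds(...)` is close to zero and the lips are still off, the length is not the cause and the mask remains the more likely suspect.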