Workflow: V2V Just Dub It - lip synced multi-language dubbing with IC-Lora-LipDub

#141
by RuneXX - opened

Italian

Swedish

German

Spanish

V2V Just Dub It - lip synced multi-language dubbing with IC-Lora-LipDub

Translate any video with LTX official LipDub lora, based on the JustDubIt paper.
Lora available here: https://huggingface.co/Lightricks/LTX-2.3-22b-IC-LoRA-LipDub

And workflow to try here: https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main/Video-2-Video

You need the very latest version of ComfyUI-LTXVideo (update it in ComfyUI Manager, or install it if you don't already have it).

missing node - LTXVSetAudioRefTokens
When I click on install missing nodes in comfy manager, nothing comes up as missing. Any clue what github this is on where I can install it through python? Thanks. Also, how is this different from your recut workflow? Sounds like this does the exact same thing except for the language change.

missing node - LTXVSetAudioRefTokens

Try updating ComfyUI-LTXVideo.
They made some custom tokens for this feature.

Just search LTX in comfy manager and update, or install
I'll add a note in the original post, forgot about that one
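If the manager reports nothing missing, one way to check whether the installed copy of ComfyUI-LTXVideo is recent enough is to search its source files for the node name. This is only a minimal sketch: the `custom_nodes/ComfyUI-LTXVideo` path is an assumed install location, and the helper name `pack_defines_node` is made up for illustration, not part of any ComfyUI API.

```python
from pathlib import Path


def pack_defines_node(pack_dir, node_name="LTXVSetAudioRefTokens"):
    """Return True if any .py file under pack_dir mentions node_name."""
    root = Path(pack_dir)
    if not root.is_dir():
        return False
    for py_file in root.rglob("*.py"):
        try:
            if node_name in py_file.read_text(encoding="utf-8", errors="ignore"):
                return True
        except OSError:
            continue  # unreadable file, skip it
    return False


if __name__ == "__main__":
    # Assumed install location; adjust to your own ComfyUI folder.
    pack = Path("ComfyUI/custom_nodes/ComfyUI-LTXVideo")
    if pack_defines_node(pack):
        print("node found")
    else:
        print("node not found: update ComfyUI-LTXVideo")
```

If the name is not found, the installed copy most likely predates the LipDub release and needs the update described above.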

Hi RuneXX,

Is it possible to have custom audio with this new dub ic-lora?

I don't see any node in this workflow load the IC Lora model.

I don't see any node in this workflow load the IC Lora model.

Look at top left, the Power Lora loader

Thanks RuneXX. Updating ComfyUI-LTXVideo through the manager fixed it. In your opinion, if not using the language change, is there any benefit/difference in using this over your recut workflow which doesn't need a lora?

Both ways work; the LTX model seems to have a built-in dubbing feature. As you said, it works even without the lora.
But with the lora the result seemed a bit more natural, more faithful to the original, and perhaps more polished. And you don't need any masking or anything like that with the lora.
I only did a few test runs, though.

Thanks for the quick response. It does a great job with the lip sync but I find the voice completely changes from the original. Maybe I'm doing something wrong. Your retake nails it or at least stays very close to the original voice.

Yes, agreed, it changes a bit. Maybe I could add a voice clone at the very end, will try that later ;-)
The wf as it is, is how LTX made it, but we can always add some extras to see if it works even better

That being said, I see they updated the lora. Not sure if the updated lora works better (but you might already have the latest, depending on when you downloaded it).
It also says version 0.9, so it might get further improvements too

I have the latest version and tried many different settings. Unfortunately it fails with cloning the voice. It is great for the language change.

Got another question for you Rune. You obviously do way more testing than most when it comes to generating videos. I noticed that you tend to put all your schedulers as linear_quadratic now. Do you find this gives cleaner audio compared to the others? I also noticed you put your random noise as fixed. Is there a reason for this? I always found random for both passes to be best. Just curious about your thoughts on these two settings.

tend to put all your schedulers as linear_quadratic now

I use that a lot myself, but it's usually a "hidden option" under the regular 8 step manual sigma. I put it below the manual sigma as an option that you can easily connect to the sampler.
The main benefit of the Basic Scheduler (with linear_quadratic or another scheduler) is that you can add more steps easily.

8 steps can be a bit optimistic, perhaps.. I can't even make a decent single image in some image models with that few steps ;-)
So for higher action scenes, or just more complex scenes than close-up portraits, LTX can benefit from a few more steps.
So that's the only reason it's there ;-)

And some workflows can really benefit from more steps almost all of the time, like the masking workflows, re-take etc. That's probably why I left it there as the "default" (in almost all other workflows, I just use the 8 step manual sigma, the default LTX way, and leave the Basic Scheduler as an option)
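To make the "more steps easily" point concrete: a manual sigma list is fixed, while a scheduler regenerates the whole list from a step count. Below is a rough sketch of one linear-then-quadratic schedule. The formula, the `threshold_noise` default, and the function name are assumptions for illustration; it may not match what the Basic Scheduler actually computes for `linear_quadratic`.

```python
def linear_quadratic_sigmas(steps, threshold_noise=0.025, linear_steps=None):
    """Sketch: sigmas from 1.0 down to 0.0, linear ramp first, quadratic after."""
    if steps < 2:
        return [1.0, 0.0]
    if linear_steps is None:
        linear_steps = steps // 2
    quad_steps = steps - linear_steps

    # Linear part: the removed-noise fraction grows by a small constant per step.
    removed = [i * threshold_noise / linear_steps for i in range(linear_steps)]

    # Quadratic part: coefficients chosen so the curve continues from the
    # linear ramp and reaches exactly 1.0 at the final step.
    diff = linear_steps - threshold_noise * steps
    quad_coef = diff / (linear_steps * quad_steps ** 2)
    lin_coef = threshold_noise / linear_steps - 2 * diff / quad_steps ** 2
    const = quad_coef * linear_steps ** 2
    removed += [quad_coef * i * i + lin_coef * i + const
                for i in range(linear_steps, steps)]
    removed.append(1.0)

    # Sigma is the remaining noise level at each step boundary.
    return [1.0 - r for r in removed]


# Changing the step count is a single argument, no hand-edited list needed.
for n in (8, 12):
    print(n, [round(s, 4) for s in linear_quadratic_sigmas(n)])
```

Going from 8 to 12 or 15 steps only changes one number here, which is the convenience described above compared with retyping a manual sigma list.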

I noticed that in the official LTX workflow for the IC-LoRA LipDub, they use these manual sigmas for the 2nd pass: 0.909375, 0.725, 0.421875, 0.0 and not the "usual": 0.85, 0.7250, 0.4219, 0.0

I assume this is due to extra denoising for lipdub perhaps?
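A quick side-by-side of the two lists quoted above, as a small sketch (the variable names are just for illustration). It suggests that only the starting sigma really differs; the third value looks like the same number at a different rounding.

```python
# 2nd-pass sigmas as quoted in this thread.
lipdub_sigmas = [0.909375, 0.725, 0.421875, 0.0]  # official LipDub workflow
usual_sigmas = [0.85, 0.7250, 0.4219, 0.0]        # the "usual" values

for step, (new, old) in enumerate(zip(lipdub_sigmas, usual_sigmas)):
    print(f"step {step}: lipdub={new:<9} usual={old:<7} diff={new - old:+.6f}")

# Only the first entry differs by more than rounding.
start_diff = lipdub_sigmas[0] - usual_sigmas[0]
print(f"starting sigma is higher by {start_diff:.6f}")
```

A higher starting sigma means the 2nd pass begins from a noisier state, which would be consistent with the "extra denoising" guess, though the thread does not confirm the reason.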

I didn't notice that part, I assumed it was the standard sigmas. Nice catch ;-) Since they changed it, it's probably for a reason. Will add it to the wf

RuneXX, is it possible to use audio files instead of a prompt?

RuneXX, is it possible to use audio files instead of a prompt?

It kinda does already. It uses the audio of the video input. The prompting part is just transcribing the audio in the video to another language (dubbing)
You mean a silent video input and add sound?

Yes, a silent video input and add sound. Just like in the workflow LTX-2.3_-_V2V_Just_Talk_custom_audio_lip-synced_to_any_video.json, but only with DubLip lora and the corresponding DubLip workflow.

Yes, a silent video input and add sound. Just like in the workflow LTX-2.3_-_V2V_Just_Talk_custom_audio_lip-synced_to_any_video.json, but only with DubLip lora and the corresponding DubLip workflow.

Will give it a try. The reference audio would then have to be masked in (as with any custom audio workflow).
I suspect the DubLip lora will then not "hear" the audio, but it could work.. will try ;-)

Thank you for your feedback!

Hi, I don't really know what I am doing but I got it to work with custom audio. I just piped the custom audio as audio original and some other spaghetti work - see the attached pic for reference.
image

I'm trying to figure out how to amplify the 'Lipsync Dub Lora 0.9' because it's too weak with my current LoRA setup. My characters barely open their mouths. I tried adding a 'Latent Multiply' node after 'Set Latent Noise Mask', but it introduces artifacts.

Since the Lipsync LoRA's strength maxes out at 1.0 - does anyone have ideas on how to boost the lipsync effect?

Hi, I don't really know what I am doing but I got it to work with custom audio.

That all looks good, from a quick look... ;-) I guess it works then. I haven't had a chance to test it yet

I'm trying to figure out how to amplify the 'Lipsync Dub Lora 0.9' because it's too weak with my current LoRA setup. My characters barely open their mouths.

I don't think you can "amplify" the latent.
What you can try is increasing the steps. Under the manual sigma node I usually "hide" the Basic Scheduler node, so you can connect that as the sigma input to the sampler instead.
Then you can easily adjust the steps. Try something like 10-15 steps. That should improve it.. hopefully

The prompt also matters, so in the prompt write something like: And then the man talks, and he says: "...... the words spoken, transcribed into the language of choice.."

That should hopefully fix it. If not, you can also adjust the CFG; try setting it somewhere between 1.5 and 3.
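The prompt pattern suggested above can be templated if you dub many clips. A minimal sketch; the helper name `build_dub_prompt` and the Spanish sample line are made up for illustration, only the sentence shape comes from the suggestion above.

```python
def build_dub_prompt(translated_line, speaker="the man", pronoun="he"):
    """Format a dubbing prompt in the 'talks, and says' style suggested above."""
    line = translated_line.strip()
    if not line:
        raise ValueError("translated_line must not be empty")
    return f'And then {speaker} talks, and {pronoun} says: "{line}"'


# Example with a hypothetical Spanish line.
prompt = build_dub_prompt("Hola, bienvenidos a mi canal.")
print(prompt)
```

The resulting string goes into the workflow's text prompt; the translated line itself still has to be written or transcribed by you.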

image
