Add audio to existing silent video?
First of all, thank you for creating these workflows, yours are the first LTX2.3 workflows I tested that didn't throw insane errors at me and I'm really happy with the results.
One of the features I liked in my LTX2.0 I2V workflow was that I could input an image OR a silent video and if I input a video, LTX would generate the audio for it. I tried simply changing the input from an image to a video on the basic i2v/t2v workflow, but that didn't work.
Any advice or workflow for this?
I tried simply changing the input from an image to a video on the basic i2v/t2v workflow, but that didn't work.
Any advice or workflow for this?
Yes, that won't work ;-)
I made a "foley" workflow for LTX-2.0, perhaps the one you tried.
It can add audio to a silent video.
I'll update for LTX-2.3 and add it here ;-)
(the one on civitai is probably too old, it was made before LTX-2.3)
Thanks!
Nah, the ltx2.0 workflow I used wasn't the one linked above. It didn't involve any special lora, I just added the video input node and swapped the connections when I wanted to change input from image to video.
Looking forward to your 2.3 version!
Nah, the ltx2.0 workflow I used wasn't the one linked above
I didn't look at the Civitai workflow; it might be entirely different from mine ;-)
I just added the video input node and swapped the connections when I wanted to change input from image to video.
Could be some batch-image sort of workflow, perhaps. But a "foley" workflow would probably work better.
But I can try an image batch workflow too, one that takes all the frames of the video and uses them in an I2V workflow.
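To make the batch idea concrete, here's a rough sketch of the frame bookkeeping in plain Python (not actual ComfyUI nodes; `batch_indices` is just an illustrative name):

```python
# Sketch: grouping a clip's frames into fixed-size batches so they
# can be fed to an I2V-style pass. Purely illustrative index math.

def batch_indices(n_frames: int, batch_size: int):
    """Group frame indices [0..n_frames) into consecutive batches."""
    return [list(range(i, min(i + batch_size, n_frames)))
            for i in range(0, n_frames, batch_size)]

# An 81-frame Wan-style clip split into batches of 16 frames:
batches = batch_indices(81, 16)
```

In ComfyUI a Load Video node effectively does this for you, handing the sampler all frames of the input clip as a single image batch.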
Looking forward to your 2.3 version!
I uploaded a foley workflow now, you can try it out if you want :
https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main
Is it possible to do lipsync also with LTX for silent videos?
Is it possible to do lipsync also with LTX for silent videos?
Yes, I think so... it's on my to-do list to give it a try. I already started on an inpaint lip-sync test earlier today.
Very rough sketch, at low res, and a single pass to test the concept, so a final workflow's quality would be a lot better. But it seems to work ;-)
I'll try to see if I can make it work consistently and in a V2V workflow.
That's awesome, bro. Do share the workflow when it's ready.
I uploaded a foley workflow now, you can try it out if you want :
https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main
Thanks for your efforts! What this workflow seems to do is take the first frame of the input-video and create a 'new-video' based on that, then take the audio from that new video and attach it to the original input video? So... although audio of the correct length is attached to the video, all the timing just feels off because the sounds relate to something that happened in the 'new-video'.
It's hard to remember where I got the original LTX video workflow from, but I'm reasonably sure it was here https://comfyui.nomadoor.net/en/basic-workflows/ltx-2/
Would the image batch workflow match the timings better?
take the first frame of the input-video and create a 'new-video' based on that, then take the audio from that new video and attach it to the original input video?
It takes ALL the frames of the video, and makes audio for it.
although audio of the correct length is attached to the video, all the timing just feels off
Odd, I'll take a look in case I made a mistake. The timing should not be off, and no different from a batch-image approach (it sort of is the same, since it uses all the frames of the video).
The only thing that could make the timing off is if I forgot to keep the same FPS all the way through.
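A quick sketch of why an FPS mismatch shifts the timing (plain Python; the numbers are illustrative):

```python
# Sketch: why a frames-per-second mismatch desyncs generated audio.
# If the foley pass assumes a different FPS than the source clip,
# every sound event lands at the wrong moment in the final video.

def event_time(frame_index: int, fps: float) -> float:
    """Time (in seconds) at which a given frame is shown."""
    return frame_index / fps

# A footstep on frame 48 of a 24 fps source lands at 2.0 s ...
assert event_time(48, 24) == 2.0
# ... but if the audio was generated assuming 30 fps, the matching
# sound is placed at 1.6 s, i.e. 0.4 s early.
assert event_time(48, 30) == 1.6
```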
If you used a different workflow before, I'm a little unsure which one it was. But adapting it for LTX-2.3 should be easy.
I thought it might be the "foley" one you used before ;-)
I was just going by the little preview visible in the sampler node. There is a completely different video generated there, so I assumed that the audio was matched to that one.
It should match your prompt (most of all), as well as the input frames.
But the prompt is of course very important. It's where you describe the sequence of audio you want to add.
There are of course some limitations as to what the model can do. More "extreme" sound "editing" might be a bit out of scope ;-)
But I'll take a look in case something was missed. I made it in a hurry last night when you asked about such a workflow ;-)
Wait a bit, I'll improve it with a second pass, as well as add an "extend the video" feature to it.
Since silent video inputs are often quite short (say, a 5-second clip from Wan or similar).
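For a sense of the numbers, here's a small sketch of the frame math (assuming Wan's usual 16 fps; the exact rate depends on the model):

```python
# Sketch: frame math behind an "extend the video" feature.
# Wan-style clips are typically 81 frames; 16 fps is an assumption.

def clip_seconds(n_frames: int, fps: float) -> float:
    """Duration of a clip in seconds."""
    return n_frames / fps

def frames_to_add(extra_seconds: float, fps: float) -> int:
    """How many new frames an extension of the given length needs."""
    return round(extra_seconds * fps)

print(clip_seconds(81, 16))   # 5.0625 -> roughly the "5 s" Wan clip
print(frames_to_add(10, 16))  # 160 extra frames for +10 s
```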
WAN (Original Input Video without audio)
LTX "Foley" (only audio added)
New version uploaded.
It should work better now. I used a goofy monster video since he is walking, so that you can judge the audio timing by how it matches the footsteps.
It also features an extend mode, so if you've got a short silent video (from Wan or similar, as those are 81 frames (~5 s) or so), you can now add 5-10 seconds to the video; LTX will continue where the input video stops.
Hey, lookin' (soundin') good! This pairs the audio with the visuals much better. I did notice that when you use the node to extend the video, only the new, fully LTX-rendered video (in the LTX RENDER - VIDEO & SOUND node) comes out at the extended length; the original video remains at its original length. Can we extend the original video?
Yes, that should be doable. I'll also see if I can make the transition smoother; there were some flickering issues.
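One common way to tame seam flicker like that is a short crossfade over the overlapping frames. A minimal sketch with NumPy (the overlap length and array layout here are assumptions, not what the workflow actually does):

```python
# Sketch: linear crossfade between the end of the original clip and
# the start of the LTX extension, to soften the seam.
import numpy as np

def crossfade(tail: np.ndarray, head: np.ndarray) -> np.ndarray:
    """Blend the last frames of the original clip (`tail`) into the
    first frames of the extension (`head`). Both arrays have shape
    (n_overlap, H, W, C) with values in [0, 1]."""
    n = tail.shape[0]
    # Blend weight goes from 0 (all original) to 1 (all extension).
    alpha = np.linspace(0.0, 1.0, n)[:, None, None, None]
    return (1.0 - alpha) * tail + alpha * head

# 4 overlapping frames: output starts as the original, ends as the extension.
blend = crossfade(np.zeros((4, 8, 8, 3)), np.ones((4, 8, 8, 3)))
```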
I uploaded a foley workflow now, you can try it out if you want :
https://huggingface.co/RuneXX/LTX-2.3-Workflows/tree/main
I'm very interested in this workflow. In your v2v workflow, the audio is generated from prompts and then drives the video generation. Is it possible to input an audio clip and then use that audio to drive the generation of the target video from the original video?
Is it possible to input an audio clip and then use that audio to drive the generation of the target video from the original video?
Yeah, that should be possible: a combination of the V2V foley workflow and the custom-audio workflows.
I'll give it a try; sounds like an interesting and useful idea ;-)
Got me curious, though. How do you envision it working? Why an input video instead of an input image?
Input image + audio (I2V custom audio workflows are already added)
Do you want the input video to be "re-imagined"? Or extended? Or both ;-) In other words, take say 2-3 seconds of the input video as reference, and then start recreating the video from the audio input.
Or do you see it more as a video extension, where your input video gets extended and then follows the audio input from there...
Both are possible. And perhaps I can even make both work in one workflow; just add a little settings node for when LTX should take over, after X seconds of input video.