Integrating IC-LoRA union control with the Long Video Custom Audio loop

#134

by HokshaBald - opened 10 days ago

Hi,

Thank you for your amazing work. I am trying to add IC-LoRA union control to the Long Video Custom Audio loop workflow, but the DWPose skeleton itself gets sampled into the output. Eventually the character appears, but clearly I am doing something wrong.

Here's what I have done, if I remember correctly:

Added the IC-LoRA loader node with the union control LoRA.
Added a control image input to the loop subgraph.
Inside the subgraph, I scale the control image using ltx_latent_df (from the IC-LoRA loader) * 32, as in your other workflow with IC-LoRA.
Added 🅛🅣🅧 Add Video IC-LoRA Guide after the 🅛🅣🅧 LTXV Add Latent Guide node.
I call LTXVCropGuides on the IC-LoRA guide first, before calling it on the latent guide (I thought that would make sense because the IC-LoRA Guide was added after the Latent Guide node).

I think those are the main changes I made.
I want to use the IC-LoRA guide from the first frame to the end of the video, but I wanted to experiment with the loop first as a proof of concept.

I think the use case for such a workflow is common.
You have a person talking and gesturing, and you want to transfer the audio and body movement to a new character in LTX. I think a way to add a character sheet guide at frame -1 and then crop it off would make sense, because in my experience the character does not stay consistent if he rotates his head or makes other significant movements.

Any help is greatly appreciated.

RuneXX

Owner 10 days ago

edited 10 days ago

Should be doable.
The "tricky" part is matching the control video input (pose video) to the generated video.
But that's basically just using the pose frames for the frames generated in, say, part one... and then, in the extending group, starting the pose video from where it ended in the previous part.
And then doing it all over again, as in the first part, just starting at the correct frame.

The long video workflows basically just generate a video in the regular way, then do it again, and again... and stitch the parts together at the end.
The challenge is getting the math right, since there are overlapping frames to continue the motion and keep consistency. Those are then chopped off when the previous video is stitched to the next one.
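
The frame math described above can be sketched roughly as follows. This is only an illustration, not the workflow's actual logic: `segment_windows`, `stitched_length`, `segment_frames`, and `overlap_frames` are hypothetical names, and the sketch assumes the overlap is trimmed from the start of every part after the first. Each window gives the slice of the pose video to feed to that part.

```python
def segment_windows(total_frames, segment_frames, overlap_frames):
    """Return (start, end) pose-video frame ranges, one per generated part.

    Hypothetical helper: each part after the first starts `overlap_frames`
    before the previous part ended, so motion can be continued.
    """
    if total_frames <= 0 or segment_frames <= 0:
        raise ValueError("frame counts must be positive")
    if not 0 <= overlap_frames < segment_frames:
        raise ValueError("overlap must be smaller than the segment length")

    windows = []
    start = 0
    while True:
        end = min(start + segment_frames, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        # Next part re-uses the last `overlap_frames` frames of this one.
        start = end - overlap_frames
    return windows


def stitched_length(windows, overlap_frames):
    """Length after stitching, assuming the overlap is cut from the start
    of every part except the first (an assumption, not a documented rule)."""
    total = 0
    for index, (start, end) in enumerate(windows):
        length = end - start
        total += length if index == 0 else length - overlap_frames
    return total


if __name__ == "__main__":
    parts = segment_windows(total_frames=241, segment_frames=97, overlap_frames=9)
    print(parts)
    print(stitched_length(parts, overlap_frames=9))
```

If the stitched length does not equal the length of the pose video, the pose frames and the generated frames have drifted apart somewhere in the loop.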

What problems are you running into, though? ComfyUI errors? Or is the output video not living up to expectations?
LTX is very sensitive to the pose video's height and width being correct (but x32 is right). With regular frames it adjusts internally if the width or height is incompatible, but with a pose video it just crashes if they are wrong.

RuneXX

Owner 10 days ago

Been meaning to take a new look at those long workflows; they are a bit messier and more complicated than they need to be.
I've learned a few things since that could be applied to make them easier.

Will do as soon as I have a chance ;-)

HokshaBald

10 days ago

The output video has the skeleton itself rather than controlling the character. I have probably connected something wrong.

RuneXX

Owner 10 days ago

edited 10 days ago

> The output video has the skeleton itself rather than controlling the character. I have probably connected something wrong.

The whole video, or just parts of it?
If the skeleton appears throughout the whole output video, either something is wired wrong (hard to guess where), or the LoRA is not applied to the model that is fed to the sampler.
The model picked up inside the subgraph (probably a GET node connected to the subgraph) should be the model + LoRA.

And if it's just skeleton partly mixed with regular video output, then an LTX crop node is missing, the one that crops off the guides at the end of the workflow.

That being said, I had my fair share of struggles with the long video workflow when making it.
And an equally fair share of struggles with the IC-Union control workflow. Doing both combined is a challenge, I bet.

HokshaBald

10 days ago

It shows most of the first video (outside the iteration) then the skeleton, then back to the character but not really controlled by the skeleton.

RuneXX

Owner 10 days ago

edited 10 days ago

> It shows most of the first video (outside the iteration) then the skeleton, then back to the character but not really controlled by the skeleton.

A little hard to say, but that sounds like a combination of both:

The model link used is probably the one without the LoRA connected (the LoRA completely removes the skeleton in the sampler preview; you should not see it there if the model with the LoRA is connected).
The 2nd pass does not have an LTX crop guides node to crop off the skeleton before the upscale node (and if it's a single-pass workflow, the LTX crop guides node goes before the VAE decode).

Edit: looking at your screenshot, it does look like you have the crop guides there. But make sure the positive and negative noodles pass from one node to the next. This can be a bit tricky.

I might also try to make one when I give the long workflows some love soon.

HokshaBald

10 days ago

Thank you for your response.

I couldn't attach the JSON workflow, so I uploaded the video instead, because it should contain my version of the workflow.

RuneXX

Owner 10 days ago

Will take a look if the workflow is in the metadata.

HokshaBald

8 days ago

I found the problem.

I hadn't connected ltx_latent_df to the latent_downscale_factor input of "🅛🅣🅧 Add Video IC-LoRA Guide". After connecting it I got an error about the latent size not being divisible by 2, so I changed the video resolution to 768x448, making both dimensions divisible by 64, and it worked!
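
For anyone hitting the same error, here is a minimal sketch of snapping a resolution to a safe size. It assumes the required multiple is 32 times the latent downscale factor, which is inferred from the x32 scaling mentioned earlier in this thread and from 64 working with a factor of 2; it is not a documented rule. `snap_resolution` is a hypothetical helper, not a ComfyUI or LTX node.

```python
def snap_resolution(width, height, latent_downscale_factor=2, base_multiple=32):
    """Round width and height down to the nearest usable multiple.

    Assumption: the usable multiple is base_multiple * latent_downscale_factor
    (32 * 2 = 64 in the case described above).
    """
    multiple = base_multiple * latent_downscale_factor
    if width < multiple or height < multiple:
        raise ValueError(f"both dimensions must be at least {multiple}")
    # Integer division drops the remainder, so the result divides evenly.
    return (width // multiple) * multiple, (height // multiple) * multiple


if __name__ == "__main__":
    # 768x448 is already divisible by 64, so it is returned unchanged.
    print(snap_resolution(768, 448))
    # A size that is not divisible by 64 is rounded down.
    print(snap_resolution(800, 450))
```

Rounding down rather than up keeps the result within the original frame, at the cost of cropping or slightly rescaling the source.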

RuneXX

Owner 5 days ago

Ah good good.. I forgot about this one. But you figured it out ;-)
