Workflow - I2V & T2V - Dev model with full steps
This is a workflow that uses the Dev model closer to its full capacity, for potentially higher quality.
With 20+ steps and both a spatial 2x upscaler and an optional temporal upscaler (2x fps) in the same workflow.
And an optional LTX multimodal guider that allows independent CFG for audio and video.
Multi-functional workflow with:
- Toggle switch for Image-2-Video mode vs Text-2-Video mode
- Toggle switch for single-pass vs 2-pass workflow
- Toggle switch for regular guider vs multimodal guider
- Toggle switch to run with or without the added temporal upscaler (double fps)
As far as "best samplers and best settings" goes, feel free to experiment ;-)
RuneXX, what's the difference between the regular guider and the multimodal guider (and your labeling of the FML as guider or injection)? And... when it comes to the extend, does the extend just typically add garbled language even if you prompt specific lines? The video is extending, but I'm getting garbage vocals.
RuneXX, what's the difference between the regular guider and the multimodal guider
The regular guider is just the normal way you set the CFG value (how strongly the model should follow your prompt, to simplify...).
With the multimodal guider, you can set different CFG values for audio and video. In the LTX case that's CFG 3 for video (to allow good movement etc.) and CFG 7 for audio, so the audio follows the prompt better and with higher quality (it also has a bunch of other small tweaks that seem less impactful).
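To make that concrete, here is a minimal Python sketch of the idea (not the actual LTX/ComfyUI node code; the names, shapes, and toy usage are made up), showing one CFG scale per modality instead of a single shared one:

```python
import torch

def multimodal_cfg(cond_v, uncond_v, cond_a, uncond_a,
                   cfg_video=3.0, cfg_audio=7.0):
    # Standard CFG: out = uncond + cfg * (cond - uncond).
    # Here each modality gets its own scale, so video motion can stay
    # loose (low CFG) while audio follows the prompt tightly (high CFG).
    video = uncond_v + cfg_video * (cond_v - uncond_v)
    audio = uncond_a + cfg_audio * (cond_a - uncond_a)
    return video, audio

# Toy usage with random stand-ins for the model's predictions:
v_c, v_u = torch.randn(2, 8), torch.randn(2, 8)
a_c, a_u = torch.randn(2, 4), torch.randn(2, 4)
video, audio = multimodal_cfg(v_c, v_u, a_c, a_u)
```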
and your labeling of the FML as guider or injection)?
They are two ways of providing an image reference with LTX. You can set a frame image, let's say a first-frame input image, and the model generates from there.
This is the standard for regular I2V workflows.
You can then use the same method to put an image frame anywhere in the generated video, "injected" into the span of frames the model will generate,
and by that influence the end result. But since it injects an input image directly into a frame, the result can be a bit rough and not so polished, even glitchy, if the model struggles to make a logical transition between the input images you have placed into the frames.
A guider is smoother and more flexible. It just guides the model, and the model tries to follow, but it is less strict than "injecting" a frame.
So you can get much better results this way, since the model has more freedom. The downside is that it might not be 100% accurate on first/middle/last frames, but very close ;-)
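Roughly, as a hand-wavy Python sketch (not the real LTX implementation; the [B, C, T, H, W] latent layout and the function names are assumptions of mine), injection freezes a latent frame while guidance only nudges the prediction toward it:

```python
import torch

def inject_frame(latents, ref_latent, frame_idx):
    # "Injection": hard-write the reference into one temporal slot and
    # freeze it with a mask, so the sampler must build an exact
    # transition to/from that frame (strict; can glitch).
    latents = latents.clone()
    latents[:, :, frame_idx] = ref_latent
    mask = torch.ones(latents.shape[2])  # 1 = free to denoise
    mask[frame_idx] = 0.0                # 0 = frozen to the input
    return latents, mask

def guide_toward_frame(denoised, ref_latent, frame_idx, strength=0.5):
    # "Guidance": blend the model's own prediction toward the reference
    # instead of freezing it (softer and smoother, but not pixel-exact).
    out = denoised.clone()
    out[:, :, frame_idx] = torch.lerp(out[:, :, frame_idx],
                                      ref_latent, strength)
    return out
```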
when it comes to the extend, does the extend just typically add garbled language even if you prompt specific lines? The video is extending, but I'm getting garbage vocals.
I did an update to the extend last night, so it could be an error. I'll take a look again.
RuneXX, please tell me: I very often get anomalies, like when a person is moving her hands, part of a finger moves very quickly and disappears after 2–3 frames or so. Do you have any idea what could be the reason for that? Generation time maybe? Today I tried a 13-sec video and it happens very often during movement.
I'm using the dev workflow.
Haven't tried the Dev model much myself, since it's so painfully slow, but a guess would be to add more steps.
(Or to be honest, not that slow, but I'm spoiled by the ultra-fast distilled model ;-)
Also try the new upscaler model from LTX. It fixes a few artifacts that might show up in the upscale part:
https://huggingface.co/Lightricks/LTX-2.3/tree/main
Version 1.1 => ltx-2.3-spatial-upscaler-x2-1.1.safetensors
I'll try the Dev model some more myself, and see if I can reproduce.
In theory it could be other things like the sampler, but more steps and higher CFG are probably the cure ;-) Also do try other seeds: if you run over and over with the same seed, the same results will recur.
(And potentially artifacts from the upscaler, which have been fixed with the new model from LTX.)
Thanks! More steps just in the first pass, right? I often had the same issue in your previous "fast" workflow. It's seed-dependent but common in my case; I got some lucky seeds without anomalies.
Yes, only the first part (the 2nd part is locked to an 8-step upscaler with 8 "hardcoded" sigma values from LTX).
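For context, a fixed sigma list is why the step count can't be changed there: the schedule is passed in verbatim instead of being derived from a step count. A tiny sketch with placeholder sigma values (the real 8-step schedule ships with the LTX upscaler; these numbers are made up):

```python
import torch

# Placeholder values only; the real schedule comes from LTX.
FIXED_SIGMAS = torch.tensor(
    [1.00, 0.72, 0.52, 0.37, 0.25, 0.16, 0.09, 0.04, 0.00])

def steps_from_sigmas(sigmas: torch.Tensor) -> int:
    # N+1 sigma boundaries define N denoising steps, so 9 values -> 8 steps.
    return sigmas.numel() - 1

print(steps_from_sigmas(FIXED_SIGMAS))  # 8
```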
Odd, I never saw any artifacts (other than text on screen, now fixed with the new upscale model), so I wonder what could be different.
That being said, if you prompt for something with high motion and movement that the model struggles to complete in the set duration, or with a low step count, doing more steps might help.
I'll experiment here too, try some more "action-packed" prompts, and see where the sweet spot might be ;-)
The "default" I set in the Dev workflow might have been too optimistic; I think I set it to 20 steps.
And LTX recommends up to 40 steps, so perhaps it needs more than 20 (or you can set the distilled LoRA in that group a little higher, or both).