Workflow - I2V & T2V - Dev model with full steps
This is a workflow that uses the Dev model closer to its full capacity, for potentially higher quality.
With 20+ steps and both a spatial 2x upscaler and an optional temporal upscaler (2x fps) in the same workflow.
And an optional LTX multimodal guider that allows independent CFG for audio and video.
Multi-functional workflow with:
- Toggle switch for Image-2-Video mode vs Text-2-Video mode
- Toggle switch for single-pass vs 2-pass workflow
- Toggle switch for regular guider vs multimodal guider
- Toggle switch to run with or without the added temporal upscaler (double fps)
As far as "best samplers and best settings" goes, feel free to experiment ;-)
RuneXX, what's the difference between the regular guider and the multimodal guider (and your labeling of the FML as guider or injection)? And... when it comes to the extend, does the extend just typically add garbled language even if you prompt specific lines? The video is extending, but I'm getting garbage vocals.
RuneXX, what's the difference between the regular guider and the multimodal guider
The regular guider is just the normal way you set the CFG value (how strongly the model should follow your prompt, to simplify...).
With the multimodal guider, you can set different CFG values for audio and video. In the LTX case that's CFG 3 for video (to allow good movement etc.) and CFG 7 for audio, so the audio follows the prompt better and with higher quality (it also has a bunch of other small tweaks that seem less impactful).
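To make that concrete, here is a minimal Python sketch of the idea (not the actual LTX/ComfyUI node code; the names, shapes, and toy usage are made up), showing one CFG scale per modality instead of a single shared one:

```python
import torch

def multimodal_cfg(cond_v, uncond_v, cond_a, uncond_a,
                   cfg_video=3.0, cfg_audio=7.0):
    # Standard CFG: out = uncond + cfg * (cond - uncond).
    # Here each modality gets its own scale, so video motion can stay
    # loose (low CFG) while audio follows the prompt tightly (high CFG).
    video = uncond_v + cfg_video * (cond_v - uncond_v)
    audio = uncond_a + cfg_audio * (cond_a - uncond_a)
    return video, audio

# Toy usage with random stand-ins for the model's predictions:
v_c, v_u = torch.randn(2, 8), torch.randn(2, 8)
a_c, a_u = torch.randn(2, 4), torch.randn(2, 4)
video, audio = multimodal_cfg(v_c, v_u, a_c, a_u)
```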
and your labeling of the FML as guider or injection)?
They are two ways of providing an image reference with LTX. You can set a frame image, let's say a first-frame input image, and the model generates from there.
This is the standard for regular I2V workflows.
You can then use the same method to put an image frame anywhere in the generated video, "injected" into the span of frames the model will generate,
and by that influence the end result. But since it injects an input image directly into a frame, the result can be a bit rough and not so polished, even glitchy, if the model struggles to make a logical transition between the input images you have placed into the frames.
A guider is smoother and more flexible. It just guides the model, and the model tries to follow, but it is less strict than "injecting" a frame.
So you can get much better results this way, since the model has more freedom. The downside is that it might not be 100% accurate on first/middle/last frames, but very close ;-)
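Roughly, as a hand-wavy Python sketch (not the real LTX implementation; the [B, C, T, H, W] latent layout and the function names are assumptions of mine), injection freezes a latent frame while guidance only nudges the prediction toward it:

```python
import torch

def inject_frame(latents, ref_latent, frame_idx):
    # "Injection": hard-write the reference into one temporal slot and
    # freeze it with a mask, so the sampler must build an exact
    # transition to/from that frame (strict; can glitch).
    latents = latents.clone()
    latents[:, :, frame_idx] = ref_latent
    mask = torch.ones(latents.shape[2])  # 1 = free to denoise
    mask[frame_idx] = 0.0                # 0 = frozen to the input
    return latents, mask

def guide_toward_frame(denoised, ref_latent, frame_idx, strength=0.5):
    # "Guidance": blend the model's own prediction toward the reference
    # instead of freezing it (softer and smoother, but not pixel-exact).
    out = denoised.clone()
    out[:, :, frame_idx] = torch.lerp(out[:, :, frame_idx],
                                      ref_latent, strength)
    return out
```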
when it comes to the extend, does the extend just typically add garbled language even if you prompt specific lines? The video is extending, but I'm getting garbage vocals.
I did an update to the extend last night, so it could be an error. I'll take a look again.
RuneXX, please tell me: I very often get anomalies, like when a person is moving her hands, part of a finger moves very quickly and disappears after 2–3 frames or so. Do you have any idea what could be the reason for that? Generation time maybe? Today I tried a 13-sec video and it happens very often during movement.
I'm using the dev workflow.
Haven't tried the Dev model much myself, since it's so painfully slow, but a guess would be to add more steps.
(Or to be honest, not that slow, but I'm spoiled by the ultra-fast distilled model ;-)
Also try the new upscaler model from LTX. It fixes a few artifacts that might show up in the upscale part:
https://huggingface.co/Lightricks/LTX-2.3/tree/main
Version 1.1 => ltx-2.3-spatial-upscaler-x2-1.1.safetensors
I'll try the Dev model some more myself, and see if I can reproduce.
In theory it could be other things like the sampler, but more steps and higher CFG are probably the cure ;-) Also do try other seeds: if you run over and over with the same seed, the same results will recur.
(And potentially artifacts from the upscaler, which have been fixed with the new model from LTX.)
Thanks! More steps just in the first pass, right? I often had the same issue in your previous "fast" workflow. It's seed-dependent but common in my case; I got some lucky seeds without anomalies.
Yes, only the first part (the 2nd part is locked to an 8-step upscaler with 8 "hardcoded" sigma values from LTX).
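For context, a fixed sigma list is why the step count can't be changed there: the schedule is passed in verbatim instead of being derived from a step count. A tiny sketch with placeholder sigma values (the real 8-step schedule ships with the LTX upscaler; these numbers are made up):

```python
import torch

# Placeholder values only; the real schedule comes from LTX.
FIXED_SIGMAS = torch.tensor(
    [1.00, 0.72, 0.52, 0.37, 0.25, 0.16, 0.09, 0.04, 0.00])

def steps_from_sigmas(sigmas: torch.Tensor) -> int:
    # N+1 sigma boundaries define N denoising steps, so 9 values -> 8 steps.
    return sigmas.numel() - 1

print(steps_from_sigmas(FIXED_SIGMAS))  # 8
```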
Odd, I never saw any artifacts (other than text on screen, now fixed with the new upscale model), so I wonder what could be different.
That being said, if you prompt for something with high motion and movement that the model struggles to complete in the set duration, or with a low step count, doing more steps might help.
I'll experiment here too, try some more "action-packed" prompts, and see where the sweet spot might be ;-)
The "default" I set in the Dev workflow might have been too optimistic; I think I set it to 20 steps.
And LTX recommends up to 40 steps, so perhaps it needs more than 20 (or you can set the distilled LoRA in that group a little higher, or both).