wow new workflow

#21
by lanranjun - opened

I noticed that you are using the fp4 version of Gemma and the LTX2 NAG node. I would like to ask whether the LTX2 NAG node is effective for negative prompts during the first sampling, and whether CFG is set to 1.0. Additionally, I used Lanczos locally for the Resize Image v2 node and removed the LTX preprocess node, which affects the quality of the reference image. Furthermore, in the upscaling node, I scaled the reference image to a size of 1536 for reference.
Below is the video demonstration I recently created:
https://space.bilibili.com/60182580
or
https://www.youtube.com/@lanranjun

Yes, ComfyUI made some recent changes that "break" the old way of loading models.
So the workflows needed to be updated to reflect the new way of loading the models (this ComfyUI change seems to have been temporarily reverted to give users more time to update their workflows).

Checked out your videos, they look really good ;-)

The NAG node is made for the specific scenario where CFG is set to 1. It allows the negative prompt to have an impact even when CFG is 1; without NAG, the negative prompt would simply be ignored.
The "default" workflow is set to use the distilled LTX-2 model with CFG 1 and low steps, so in this scenario the NAG node makes the negative prompt work.
I did notice that NAG sometimes impacts the result a bit, for the worse. So I think I will disable it in the workflow and leave it optional.
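For intuition, here is a minimal sketch (my own illustration, not the NAG implementation) of why the negative prompt drops out at CFG = 1:

```python
def cfg_combine(cond, uncond, cfg):
    # Classifier-free guidance, per value: push the prediction away
    # from the negative-prompt (unconditional) branch by the CFG scale.
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]

cond = [1.0, 2.0]    # toy prediction for the positive prompt
uncond = [0.5, 0.5]  # toy prediction for the negative prompt

# At cfg = 1.0 the uncond terms cancel and only `cond` remains,
# which is why the negative prompt is ignored without NAG.
print(cfg_combine(cond, uncond, 1.0))  # [1.0, 2.0] == cond
```

As I understand it, NAG works around this by applying the negative prompt in attention space instead of through this combination step.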

Alternatively, you could of course run the dev model instead, with more steps at the 1st sampler and a higher CFG. That gives even better results, but takes a lot longer.

The fp4 Gemma was just a "default" to give low-RAM users a chance; if you can run fp8 Gemma and the fp8 main model (or even higher precision, or GGUF), the results will be better.

And yes, the LTX preprocess node is not strictly needed. As far as I can tell, it seems to help create movement in the rendered video (instead of a still image that just "slides").
This was also the case for earlier versions of LTX video. Basically it adds a bit of compression so the input appears more like a video still than a high-quality image, I suppose.
But if you get movement without it, all good. It's not a crucial node, it seems ;-)

As for resizing the reference image to 1536 at the 2nd sampler, I honestly don't know why ;-) but that's what LTX-2 has in their default workflows, so it could be a sweet spot.
I've seen others claim it works better if it's set to the widest dimension of your input image (although I have not seen that make any difference myself).
I left it at 1536, since that's what LTX-2 themselves use ;-)
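If you want to reproduce that resize outside ComfyUI, here is a small sketch of the sizing math, assuming 1536 refers to the longest side (the function name is my own):

```python
def target_size(width, height, long_side=1536):
    # Scale so the longest side becomes `long_side`, keeping aspect ratio.
    scale = long_side / max(width, height)
    return round(width * scale), round(height * scale)

# A 1920x1080 reference frame becomes 1536x864; with Pillow you could
# then resize via img.resize(target_size(*img.size), Image.LANCZOS).
print(target_size(1920, 1080))  # (1536, 864)
```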

I'll probably update my new workflows a bit; they were rushed out a little so users could keep using LTX-2 with the new model loader logic (due to the recent ComfyUI changes).

Have you ever used this format to write prompts when generating videos?
0s-2s:
2s-4s:
It controls the content of the video in each time interval.
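A hypothetical filled-in example of that format (the clip content here is invented, just to show the shape):

```
0s-2s: a woman walks into the kitchen and picks up a cup
2s-4s: she turns to the camera and smiles
```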

Yes, I've only tried prompting with timestamps a few times.
It seems to work. I've been meaning to try it out more, since it gives a bit more accurate control ;-)

It also seems to accept things like "scene cut" and "scene change", but I haven't experimented much with this yet ;-)

I've tried it a few times, but it doesn't work very well. It tends to execute the intervals in parallel. Perhaps some prompt words are needed to indicate the order of precedence.
I haven't studied scene transitions. I usually work with images to generate videos, so scene transitions require segmented video generation.
I added a NAG node based on your workflow, but the issue of generated videos containing subtitles (when generating from images and audio) still persists, and it only resolves after multiple runs.
Additionally, I tried replacing only Gemma with the FP4 format, while the LTX model uses 19b-fp8-transforms-only. The generated results felt similar, but it saved around 10% or more of RAM and VRAM.

Hello, after upgrading to the new version of the workflow, I found that the generated videos were not as clear as before and were a bit blurry. I'm using this model combination; ComfyUI was upgraded to version 0.15.1, and the KJNodes pack was also upgraded to the latest version.

Not that I have noticed, but I will look closer to see if there is a quality loss.

The latest single pass workflows have the LTXV-13B-0.9.8 VAE instead of the LTX2 bf16 VAE. I assume that's a mistake?

Is the Single Pass workflow the same, just minus the 2nd-pass detailer and upscaler? I can't spot any differences other than using LCM instead of Euler; can you confirm, please? I'm trying hard to master the LTX2 workflows, and I'm surprised how many things about this workflow differ from WAN 2.2. Have you been able to stack controlnets in series?

Workflows are very helpful, thank you.

The LTX2 bf16 VAE for sure ;-) that was a little slip (although it still works).
Will update it so it shows that.

Single pass is exactly that: one sampler pass, without the 2nd-pass upscale. You render full size once, with one sampler (instead of half size in the 1st pass and then full size in the 2nd pass).
It's a bit slower, but some prefer it. A bit more "traditional", Wan 2.1-like ;-)

And yes, LTX-2 is quite a different kind of beast, almost a jack-of-all-trades. It can do all sorts of things natively.
Even the prompting is quite different: the focus is on the sequence of actions, with less subject/scene description.

I haven't tried stacking controlnets, but I wouldn't be surprised if it works ;-)
