Why is this happening?
At first I got a woman speaking, but then all images stayed static...
It seems to me that the model has an easier time with images sourced from actual video, like TV shows or movies.
I get those "stills" a lot with photographs or AI images, but screenshots from The Sopranos clips work well almost 100% of the time.
Just early impressions; I could be wrong here.
Weird... do you think it has something to do with some DENOISE or STRENGTH setting in the nodes?
But the second image I used is a REAL photo, and the WORKING image is an AI image, lol.
I remember with the old LTX someone said adding some fake film grain/noise made things work better.
Whether that's a myth or true, I don't know ;-) and I'm not sure if it applies to LTX-2 as well.
But I had a few static images too when doing I2V. Probably just try a different seed to fix that.
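If you want to try the grain trick outside of a node graph, here is a minimal sketch of it (assuming PIL and numpy; the strength value is my own guess, tune it by eye so the grain is barely visible):

```python
# Minimal sketch of the "fake film grain" trick: add mild Gaussian noise
# to the input image before feeding it to I2V.
# strength=0.02 is an assumed value, not from the thread -- tune by eye.
import numpy as np
from PIL import Image

def add_film_grain(path_in: str, path_out: str, strength: float = 0.02) -> None:
    # Work in [0, 1] float so the noise scale is resolution-independent.
    img = np.asarray(Image.open(path_in).convert("RGB"), dtype=np.float32) / 255.0
    noise = np.random.normal(0.0, strength, img.shape).astype(np.float32)
    noisy = np.clip(img + noise, 0.0, 1.0)
    Image.fromarray((noisy * 255).astype(np.uint8)).save(path_out)

add_film_grain("input.png", "input_grain.png")
```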
Yes, but like I said, photos and AI images work poorly in my experience. Both of them.
However, this image, for example, has never produced a still for me:
I have similar Sopranos screenshots and only one of them has ever produced stills. Success rate has been extremely high with them overall.
SOMETIMES it works fine:
and sometimes it JUMPS to another person:
What happens if you add this node between your input image and the rest?
It's been ages since I used the "old" LTX, but there were some tricks there if I remember right.
(And this node above is also present in LTX-2 before the upscale.)
The theory being that if the image is too perfect (too HQ), the model treats it like a poster. But that's just pure speculation ;-) I don't really know.
The jump to a different person was a bit odd though...
It's connected already, not sure.
Yes, that's before the upscale part. I meant adding it also at the first input image.
Will try some runs here too ;-) if I see that static image happen again.
Edit:
You are right. The image compression is already there for the input image as well. Hmm, not sure then.
Do any of you guys have the TEXT-to-video version with Kijai's nodes already added?
Hugging Face doesn't allow JSON attachments it seems, but drop the video into Comfy.
Ok, thank you.
Will try the LTX-Video workflow too, see if that works ;-) (it's a bit different than the native one).
KJ nodes on the LTX-Video workflow (a little different than the native workflow).
Tried a few runs; seems to work very well.
I2V
The old LTX models, I believe, were mainly trained on images from actual videos, which a lot of the time included compression noise added by MPEG encoders - so the standing theory, I think, was that the model learned to associate that kind of compression artifacting and noise with motion.
Thus were born the "tricks" where you added this type of compression noise/artifacts in different ways.
I assume that this is what that LTXVPreprocess node does, or something similar - earlier this kind of effect was also achievable by just pre-processing the loaded image: saving it as a video with the h264 codec using https://comfyai.run/documentation/VHS_VideoCombine and then extracting the image back and feeding it into the AI. This worked like 90% of the time to get it to properly animate a picture.
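For reference, that round-trip can also be scripted directly with ffmpeg instead of going through VHS_VideoCombine; this is just a sketch of the same technique (assumes ffmpeg is on PATH, and crf=33 is an assumed value for "visibly compressed but not destroyed"):

```python
# Rough stand-in for the h264 round-trip trick: encode the still as a
# short h264 clip, then pull the first frame back out so it carries the
# codec artifacts the model associates with video.
import os
import subprocess

def h264_roundtrip(path_in: str, path_out: str, crf: int = 33) -> None:
    tmp = "roundtrip_tmp.mp4"
    # Encode the single image as a 1-second h264 clip. The scale filter
    # pads dimensions to even numbers, which yuv420p requires.
    subprocess.run(
        ["ffmpeg", "-y", "-loop", "1", "-i", path_in, "-t", "1",
         "-vf", "scale=trunc(iw/2)*2:trunc(ih/2)*2",
         "-c:v", "libx264", "-crf", str(crf), "-pix_fmt", "yuv420p", tmp],
        check=True,
    )
    # Decode the first frame back out as the preprocessed input image.
    subprocess.run(
        ["ffmpeg", "-y", "-i", tmp, "-frames:v", "1", path_out],
        check=True,
    )
    os.remove(tmp)

h264_roundtrip("input.png", "input_h264.png")
```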
GUYS, where do I change the steps? I can't find it... I want to go from 8 steps to 12... any help?
Hello friend, how do I change the steps?
Think that's in some advanced sigmas node, "hardcoded".
But if you use the full non-distilled workflow, and just change the model to the distilled one and set CFG to 1, you can try using 8 to 12 steps etc. The full model workflow has a steps node.
Think that might be fine (but I didn't try that yet).
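If you would rather script that change than click through the graph, here is a hedged sketch that edits a workflow exported with ComfyUI's "Save (API Format)"; the filenames are hypothetical, and it simply assumes the sampler node exposes steps and cfg inputs, so check what your exported JSON actually contains:

```python
# Sketch: apply the distilled-model settings described above (CFG 1,
# 8-12 steps) to an API-format workflow export. Filenames are made up.
import json

with open("ltx2_full_workflow_api.json") as f:  # hypothetical export
    wf = json.load(f)

# API-format exports map node ids to {"class_type": ..., "inputs": {...}}.
for node in wf.values():
    if not isinstance(node, dict):
        continue
    inputs = node.get("inputs", {})
    if "steps" in inputs:
        inputs["steps"] = 12   # try 8 to 12 with the distilled model
    if "cfg" in inputs:
        inputs["cfg"] = 1.0    # distilled models want CFG 1

with open("ltx2_distilled_workflow_api.json", "w") as f:
    json.dump(wf, f, indent=2)
```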
Yeah, I removed the sigma node and added the steps node!!! Thanks!
LTXVEmptyLatentAudio
'VAE' object has no attribute 'latent_frequency_bins'
Looks like you are trying to use a GGUF model?
It's not yet fully supported; the GGUF model loader node needs an update.
Either wait a little bit with patience until it's out in the public version, or use git to pull the PR.
Although you have a VAE error, so maybe you connected something wrong with the VAE loaders (if you already have the PR update for the GGUF model loader).
But that's beyond my "pay grade", not sure why; will leave that for the experts ;-)
Hello mate!!! Any idea why I get this result using YOUR video as the WF? I ONLY changed the model to the NON-distilled one, nothing else was changed, and I get these results!!!
Not sure about that one. I also get "static" or a slow zoom sometimes without the main subject talking. I haven't quite figured out why that happens, or if it's just the model; try another seed.
Thanks... ok, have a great day... seems I have to wait some more WEEKS until all these issues are resolved... bummer...
I agree... I think we need to wait and see how this evolves, because as is, it's not really useful, to be honest. Was hoping for Wan 2.5-like capabilities, but AI-Santa did not deliver ;)
do any of you guys have the TEXT to video version with kijais node already added?
Huggingface dont allow json attachment it seems, but drop the video into comfy
I MADE A JOKE COMPILATION using your example video, LOL
Hello guys, I2V workflow anyone?
For "static image" try this:
https://huggingface.co/Kijai/LTXV2_comfy/discussions/1#6963cd90bd6f599dc4547fef
(at least it worked for me; I haven't had that issue ever since)