LTX 2.3 dev transformer-only version issue?

#7
by Veritsa - opened

Generation crashes when using ltx-2.3-22b-dev_transformer_only_bf16.safetensors
"RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 4096 but got size 3840 for tensor number 1 in the list."
I used the base T2V workflow from ComfyUI and replaced the nodes to load the separate models. 1280x720 resolution at 121 frames.
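For reference, the error text above is what `torch.cat` raises when the tensors disagree in a dimension other than the concatenation one; a minimal standalone repro (the 4096/3840 sizes mirror the traceback but are otherwise arbitrary):

```python
import torch

# torch.cat along dim 1 requires every OTHER dimension to match.
# The shapes below differ in dim 2 (4096 vs 3840), which triggers
# the same message as in the traceback above.
a = torch.zeros(2, 8, 4096)
b = torch.zeros(2, 8, 3840)
try:
    torch.cat([a, b], dim=1)
except RuntimeError as e:
    print(e)  # Sizes of tensors must match except in dimension 1 ...
```

In practice this usually points to two inputs with mismatched embedding/sequence sizes being joined, e.g. conditioning produced for a different model variant.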

Did you remember the dual CLIP encoder? Not sure if it's the cause of the error, but it sounded a bit familiar to when I forgot it ;-)

I didn't forget the dual clip encoder. The workflow worked when I used the dev fp8 transformer, but crashed with that error message when I used the bf16 version.

Just tried it to double check, and I'm not getting any errors.

Got the same problem; probably a node that hasn't been updated.

Got it. I use the Gemma API encoder (free LTX API); I had to change the checkpoint to the LTX 2.3 one.

But I got a problem with the VAE preview ^^ the VAE is not updated at this moment, right?

Yes, the tiny VAE is not available for LTX 2.3 yet.

So I've been investigating to find out where the problem comes from and I couldn't really find an answer. But I have two theories: either it comes from the UnetDistorch2MultiGPU node (although, like I said before, the workflow works fine when I use the dev fp8 version of the transformer) or it has something to do with ComfyUI's latest update.
I also noticed that my original LTX2 workflow doesn't work anymore after the ComfyUI update. Now I always get an OOM crash when generating a video at 1792x1216 and 226 frames at 25fps, which never happened before, and I'm trying to get LTX 2.3 to work with the same settings (that's also the max my setup has always managed to handle).
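For scale: even ignoring the model, the latents, and all activations, the decoded pixel tensor alone at those settings is sizable. A rough back-of-envelope sketch, assuming bf16 pixels and 3 channels (this is only a lower bound, not an OOM explanation):

```python
# Rough size of a decoded bf16 video tensor at the settings above.
# Ignores latents, activations, and the model weights themselves,
# so real peak memory is far higher; this is only a lower bound.
def video_tensor_gib(width, height, frames, channels=3, bytes_per_el=2):
    return width * height * frames * channels * bytes_per_el / 2**30

print(f"{video_tensor_gib(1792, 1216, 226):.2f} GiB")  # 2.75 GiB
```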

Try disabling dynamic VRAM or smart memory management. All that new vibecoded garbage is usually the issue.
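If you want to try this, ComfyUI has a launch flag for its smart memory management; whether the newer dynamic VRAM feature has its own toggle depends on your ComfyUI version, so treat this as a sketch and check your version's help output:

```shell
# Relaunch ComfyUI with smart memory management disabled.
# --disable-smart-memory is an existing ComfyUI launch flag; run
# `python main.py --help` to see the flags your version actually has.
python main.py --disable-smart-memory
```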

I had the same problem. I used the Gemma API encoder (free LTX API) and had to change the checkpoint to the LTX 2.3 one.

Yeah, that was junk too. It considered an octopus attacking someone a reason to censor the entire prompt lol. Almost everything added this year is vibecoded junk because they can't be bothered to put in real work anymore.

Kindly take your bullshit elsewhere.

With dynamic VRAM, which is now active by default, you shouldn't use any other custom memory management such as DisTorch. It's a new feature that solves a lot of the memory management issues these custom solutions have tried to address, but as it's lower level it's far more complex too and is bound to have teething issues. Personally I did hundreds of generations yesterday with dynamic VRAM without any major issues, even when using bf16 weights on a 4090.

As for bf16... I'm on a 5090 combined with 96GB of DDR4 RAM and would appreciate a recommendation: bf16 transformer only, or fp8 input matmul?

FP8 input_scaled is at least ~20% faster even when not accounting for increased offload; depending on the resolution/frame count the speed difference may grow drastically. But both will work, so it's really your choice between speed and quality. With a new model it may be wiser to start with bf16 just to rule out any quality issues; since switching models shouldn't need any other workflow modifications, you can then simply check whether fp8 quality is acceptable when optimizing the workflow for speed.
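The offload point is easy to see from weight sizes alone; a rough estimate for a 22B-parameter transformer (ignoring fp8 scale tensors, activations, and everything else, so these are lower bounds):

```python
# Weight-only memory for a 22B-parameter transformer:
# bf16 = 2 bytes/param, fp8 = 1 byte/param. Scale tensors and
# activations are ignored, so these are lower bounds.
PARAMS = 22e9
bf16_gib = PARAMS * 2 / 2**30  # ~41 GiB: exceeds a 32 GB card
fp8_gib = PARAMS * 1 / 2**30   # ~20 GiB: fits with headroom
print(f"bf16 ~{bf16_gib:.0f} GiB, fp8 ~{fp8_gib:.0f} GiB")
```

So the bf16 transformer alone already spills past a 32 GB card's VRAM, which is where offload bandwidth starts to dominate the speed difference.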

Finally I found where the problem came from! I got rid of all the MultiGPU nodes (and also made sure I didn't mix up the LTX2 and LTX2.3 VAEs...) and that did the trick! I'm using a 5090 with a 3090; I usually use the second GPU for VAEs and text encoders, but I guess that config doesn't work anymore with dynamic VRAM? I've been awake since 8pm yesterday :')
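In plain PyTorch terms, the split described here amounts to pinning different submodels to different devices; this sketch uses placeholder modules, not the actual MultiGPU node internals, and falls back gracefully when a second GPU is absent:

```python
import torch

# Transformer on the primary GPU, VAE/text encoder on the secondary
# one if present. A tiny Linear stands in for the real models.
def pick_devices():
    n = torch.cuda.device_count()
    primary = torch.device("cuda:0") if n >= 1 else torch.device("cpu")
    secondary = torch.device("cuda:1") if n >= 2 else primary
    return primary, secondary

primary, secondary = pick_devices()
text_encoder = torch.nn.Linear(8, 8).to(secondary)  # stand-in module
cond = text_encoder(torch.randn(1, 8, device=secondary))
# Hand-offs between devices must be explicit at every boundary:
cond_for_transformer = cond.to(primary)
```

Dynamic VRAM makes its own placement/offload decisions, so a node that also moves modules between devices can end up fighting it, which would be consistent with the symptoms above.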

I believe there's ongoing discussion with the MultiGPU node author about that. I know there have been issues, but I don't know the current state of it.

Tested fp8 input matmul combined with Gemma fp4 mixes vs. the transformer-only bf16 with my bf16 norm-preserved Gemma. I'm very confused: at 161 frames, 1408x768, the bf16 combo is 60 it/s faster... The new ComfyUI VRAM management seems to fit very well with the Blackwell architecture. I assume longer videos will turn the speed advantage back to the fp8/fp4 combo.
