LTX 2.3 dev transformer-only version issue?

#7
by Veritsa - opened

Generation crashes when using ltx-2.3-22b-dev_transformer_only_bf16.safetensors:
"RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 4096 but got size 3840 for tensor number 1 in the list."
I used the base T2V workflow from ComfyUI and replaced the nodes to load the separate models. Resolution 1280x720, 121 frames.
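The message is the shape check PyTorch's torch.cat performs: every tensor being joined must have the same size in all dimensions except the one being concatenated (here dimension 1). Which tensors collide inside the model isn't stated in the thread; a 4096 vs 3840 mismatch is consistent with two inputs of different feature widths, for example conditioning from a text encoder or loader setting that doesn't match the loaded model, but that is an assumption. The plain-Python sketch below only mimics the rule (check_cat_shapes is a hypothetical helper, and the sequence lengths 256 and 128 are made up):

```python
def check_cat_shapes(shapes, dim=1):
    """Mimic torch.cat's shape rule on plain tuples (illustration only).

    All dimensions except `dim` must match; the result's `dim` is the sum.
    """
    reference = shapes[0]
    for index, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(reference):
            raise RuntimeError(f"tensor number {index} has {len(shape)} dims, expected {len(reference)}")
        for axis, (expected, got) in enumerate(zip(reference, shape)):
            if axis != dim and expected != got:
                raise RuntimeError(
                    f"Sizes of tensors must match except in dimension {dim}. "
                    f"Expected size {expected} but got size {got} "
                    f"for tensor number {index} in the list."
                )
    result = list(reference)
    result[dim] = sum(shape[dim] for shape in shapes)
    return tuple(result)


# Same feature width: joining along dimension 1 works.
print(check_cat_shapes([(1, 256, 4096), (1, 128, 4096)]))  # (1, 384, 4096)

# Feature widths differ (4096 vs 3840): reproduces the reported message.
try:
    check_cat_shapes([(1, 256, 4096), (1, 128, 3840)])
except RuntimeError as err:
    print(err)
```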

Did you remember the dual CLIP encoder? Not sure if that's the cause, but it sounded familiar from when I forgot it ;-)

I didn't forget the dual clip encoder. The workflow worked when I used the dev fp8 transformer, but crashed with that error message when I used the bf16 version.

Owner

Just tried it to double check, and I'm not getting any errors.

Got the same problem, probably a node that isn't updated.

Got it: I use the Gemma API encoder (the free LTX API), and I had to change the checkpoint to the LTX 2.3 one.

But now I have a problem with the VAE preview ^^ The VAE isn't updated yet, right?

Yes, the tiny VAE is not available for LTX-2.3 yet.

So I've been investigating where the problem comes from and couldn't really find an answer. I have two theories: either it comes from the UnetDistorch2MultiGPU node (although, as I said before, the dev fp8 version of the transformer works fine with it) or it has something to do with ComfyUI's latest update.
I also noticed that my original LTX2 workflow no longer works after the ComfyUI update. I now always get an OOM crash when generating a 1792x1216 video with 226 frames at 25 fps, which NEVER happened before. I'm trying to get LTX2.3 working with the same settings (that has always been the maximum my setup could handle).
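For scale, the 1792x1216, 226-frame setting covers roughly 4.4 times as many pixels as the 1280x720, 121-frame run in the first post. A back-of-the-envelope sketch, assuming the model's latent compression divides both settings by the same factors (so the ratio carries over to latent tokens, ignoring rounding of the frame count) and using the fact that full self-attention compute grows with the square of the token count:

```python
def pixel_volume(width: int, height: int, frames: int) -> int:
    """Raw video volume: width x height x number of frames."""
    return width * height * frames


first_post = pixel_volume(1280, 720, 121)     # setting from the first post
oom_setting = pixel_volume(1792, 1216, 226)   # setting that now hits OOM

ratio = oom_setting / first_post
# Assumption: latent tokens scale with pixel volume, so the ratio carries over.
print(f"~{ratio:.2f}x the tokens")
# Full self-attention compute grows with the square of the token count.
print(f"~{ratio ** 2:.1f}x the attention compute")
print(f"clip length at 25 fps: {226 / 25:.2f} s")
```

This says nothing about why the same setting worked before the update; it only shows how much heavier it is than the first post's run.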

Try disabling dynamic VRAM or smart memory management. All that new vibecoded garbage is usually the issue.

I had the same problem.
I used the Gemma API encoder (the free LTX API) and had to change the checkpoint to the LTX 2.3 one.

Yeah, that was junk too. It considered an octopus attacking someone a reason to censor the entire prompt, lol. Almost everything added this year is vibecoded junk because they can't be bothered to put in real work anymore.

Owner

Kindly take your bullshit elsewhere.

Owner

With dynamic VRAM, which is now active by default, you shouldn't use any other custom memory management such as DisTorch. It's a new feature that solves many of the memory management issues these custom solutions have tried to address, but because it works at a lower level it's also far more complex and is bound to have teething issues. Personally, I did hundreds of generations yesterday with dynamic VRAM without any major issues, even when using bf16 weights on a 4090.

As for bf16... I'm on a 5090 combined with 96GB of DDR4 RAM and would appreciate a recommendation: bf16 transformer-only or fp8 input_scaled?

Owner

Fp8 input_scaled is at least ~20% faster even before accounting for the extra offloading bf16 needs, and depending on resolution and frame count the difference may grow drastically. Both will work, so it's really your choice between speed and quality. With a new model it may be wiser to start with bf16 to rule out any quality issues; since switching models shouldn't require any other workflow changes, you can later check whether fp8 quality is acceptable when optimizing the workflow for speed.
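The offloading difference follows from weight size alone. A rough estimate, assuming the 22B parameter count from the model file name, 2 bytes per parameter for bf16 and 1 for fp8, and 32 GiB of VRAM on an RTX 5090. Real checkpoints differ somewhat (for example if some layers stay in higher precision), and activations, text encoder and VAE come on top:

```python
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gib(num_params: float, dtype: str) -> float:
    """Size of the weights alone, in GiB (no activations, encoders, or VAE)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


PARAMS = 22e9   # from "22b" in ltx-2.3-22b-dev_transformer_only_bf16.safetensors
VRAM_GIB = 32   # RTX 5090

for dtype in ("bf16", "fp8"):
    size = weight_gib(PARAMS, dtype)
    verdict = "exceed" if size > VRAM_GIB else "fit within"
    print(f"{dtype}: ~{size:.1f} GiB of weights, which alone {verdict} {VRAM_GIB} GiB of VRAM")
```

With 96GB of system RAM, the bf16 weights can still be held in RAM and streamed to the GPU, which is presumably the extra offloading referred to above.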

Finally I found where the problem came from! I got rid of all the MultiGPU nodes (and also made sure I hadn't mixed up the LTX2 and LTX2.3 VAEs...) and that did the trick! I'm using a 5090 with a 3090; I usually use the second GPU for VAEs and text encoders, but I guess that configuration no longer works with dynamic VRAM? I've been awake since 8 pm yesterday :')

Owner

I believe there's ongoing discussion with the MultiGPU node author about that. I know there have been issues, but I don't know their current state.

I tested fp8 input_scaled combined with the Gemma fp4 mixes against the bf16 transformer-only model with my bf16 norm-preserved Gemma. I'm very confused: at 161 frames and 1408x768, the bf16 combo is 60 it/s faster... The new ComfyUI VRAM management seems to fit very well with the Blackwell architecture. I assume longer videos will tip the speed back in favor of the fp8/fp4 combo.

For the next person ending up here: make sure the "DualCLIPLoader" type is set to 'ltx' and the VAE loaders are "VAELoader KJ Audio/Video" with wtype bf16. It's easy to miss; for me, adjusting these two nodes solved the issue.
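The text-encoder setting can also be checked outside the UI. A minimal sketch, assuming the workflow was exported in ComfyUI's API format (a JSON object mapping node ids to {"class_type", "inputs"}) and that the DualCLIPLoader's selector is stored under the input name "type"; find_clip_type_problems is a hypothetical helper, the file names are placeholders, and the VAE loader check is left out because the KJ node's internal class name isn't given here:

```python
import json


def find_clip_type_problems(workflow: dict, expected_type: str = "ltx") -> list:
    """Return (node_id, type) for DualCLIPLoader nodes not set to expected_type."""
    problems = []
    for node_id, node in workflow.items():
        if node.get("class_type") != "DualCLIPLoader":
            continue
        clip_type = node.get("inputs", {}).get("type")
        if clip_type != expected_type:
            problems.append((node_id, clip_type))
    return problems


# Placeholder API-format export: node "4" has the wrong encoder type.
example = json.loads("""
{
  "4": {"class_type": "DualCLIPLoader",
        "inputs": {"clip_name1": "text_encoder_a.safetensors",
                   "clip_name2": "text_encoder_b.safetensors",
                   "type": "flux"}},
  "7": {"class_type": "VAELoader", "inputs": {"vae_name": "video_vae.safetensors"}}
}
""")
print(find_clip_type_problems(example))  # [('4', 'flux')]
```

For a real export, load the saved file instead, e.g. `with open("workflow_api.json") as f: workflow = json.load(f)`, using whatever name you saved it under.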

Dynamic VRAM, for me at least, is nothing but a disaster. Setup: 128GB RAM, dual GPU (RTX 3080 and RTX 5080), i7, NVMe, Python 3.12, CUDA 13.2, conda env. The first run goes smoothly; 10 minutes later I have to restart the PC manually. No error message; system RAM just fills up and stays full.

Are ComfyUI and comfy-aimdo up to date, and are you using natively supported models or something like GGUF? Also, any "VRAM clean" nodes or other custom nodes touching memory management may mess it up. Otherwise it could be something specific to the dual-GPU setup; I don't have experience with those.

I have to say that I've never, ever, in all my time using ComfyUI, had to restart the PC, or even had a situation where killing the Python process wouldn't free the memory. That's definitely not a normal or common thing.
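For the version question, one way to see what the Python environment that runs ComfyUI actually has installed is the standard library's importlib.metadata. This sketch assumes comfy-aimdo is installed as a pip distribution under that name; ComfyUI itself is often run from a git checkout, so its own version is better checked with git. installed_versions is a hypothetical helper, and it only reports installed versions, not whether they are the latest:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages: list) -> dict:
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Run with the same interpreter/conda env that launches ComfyUI.
for name, ver in installed_versions(["comfy-aimdo", "torch"]).items():
    print(f"{name}: {ver or 'not installed'}")
```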

Usually when a memory leak occurs in a user-space application, you only need to restart the application: the system frees all memory allocated by that application when it is terminated or killed.

But if the memory leak is in kernel/driver space, you may need to restart the system.

So maybe try a different graphics driver 🤔 That assumes it wasn't the Python process running ComfyUI staying in memory even after you shut down ComfyUI; in that case you need to kill the process manually (check your Task Manager) to free the resources it uses.
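On Linux, a leftover ComfyUI process can also be found without a GUI task manager. A Linux-only sketch that scans /proc/<pid>/cmdline for a substring; find_processes is a hypothetical helper, and the "main.py" pattern assumes ComfyUI was started as python main.py:

```python
import os


def find_processes(pattern: str, proc_root: str = "/proc") -> list:
    """Return (pid, command line) pairs whose command line contains `pattern`.

    Linux-only: each /proc/<pid>/cmdline holds the arguments separated by NUL bytes.
    """
    if not os.path.isdir(proc_root):
        return []  # not Linux, or /proc is not available
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited meanwhile, or access was denied
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            matches.append((int(entry), cmdline))
    return matches


if __name__ == "__main__":
    for pid, cmd in find_processes("main.py"):
        # A stuck one can then be stopped with `kill <pid>` in a shell.
        print(pid, cmd)
```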

Yes, they're up to date.
No, I am not using GGUF models or any VRAM management/cleaning nodes.
I use your workflows for WAN2.2 specifically, along with the Advanced Diffusion Loader (KJ node), paired with the default ComfyUI templates for LTX, Qwen-Image-Edit, and WAN2.2 FP16.
I always start ComfyUI with these flags:
"python main.py --disable-dynamic-vram --disable-smart-memory"
Without those flags, and when using the default ComfyUI diffusion loader, I run into out-of-system-memory issues.
The previous Ubuntu LTS was smarter: it would kill the Python process about 30 seconds after the device became unresponsive. Version 26 only kills the browser, which frees some memory, but about 10 seconds later the device becomes unresponsive again unless I quickly press Ctrl+C to kill the Python process.
With those flags, everything works fine with no issues.

Yes, memory leaks were my first hunch too. It could be either a firmware issue or some misconfiguration in the OS or ComfyUI.
Checking the task manager is impossible when it happens.
Here's what happened after I wrote the previous comment: I started ComfyUI without any flags, used the LTX workflow, and tried to render a 5-second video. Suddenly everything froze. The process was still running in the background, but the screen became completely unresponsive. My monitors are connected to the integrated GPU (Intel Arc).
If Ubuntu doesn't kill the Python process or the browser, the only option is a full restart. It seems to happen when ComfyUI tries to offload the model. It could be totally unrelated to ComfyUI itself and might just be a side effect of a misconfiguration elsewhere.
My OS is pretty clean, not like an Arch setup. I only have a few default applications, a WebUI for LLMs, and that's it.
My gut tells me this is Wayland-related. I know it sounds crazy, but I'm going to try installing an X11 display manager, which Ubuntu no longer ships by default, and see.

So it happened on your first inference? 🤔 That is strange, especially for someone with 128GB of RAM, because for other people it usually happens after a few runs, with system memory usage growing on every inference.

The freezes are probably because it tried to use more memory than your RAM, so it fell back to the very slow swap/page file (even on NVMe, it is still much slower than RAM).
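Since the machine becomes unresponsive before any task manager can be checked, one option is to log memory to disk in the background and read the log after rebooting. A Linux-only sketch (it reads /proc/meminfo, whose values are in kB, i.e. KiB); parse_meminfo, log_memory and the log file name are hypothetical:

```python
import os
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def log_memory(path: str = "memlog.txt", interval_s: float = 1.0, samples=None) -> None:
    """Append MemAvailable and SwapFree (GiB) to `path` every `interval_s` seconds.

    Runs until interrupted unless `samples` limits the number of lines.
    """
    written = 0
    with open(path, "a") as log:
        while samples is None or written < samples:
            with open("/proc/meminfo") as f:
                info = parse_meminfo(f.read())
            avail_gib = info.get("MemAvailable", 0) / 1024**2    # kB -> GiB
            swap_free_gib = info.get("SwapFree", 0) / 1024**2
            log.write(f"{time.strftime('%H:%M:%S')} "
                      f"MemAvailable={avail_gib:.1f}GiB SwapFree={swap_free_gib:.1f}GiB\n")
            # flush + fsync so each line reaches the disk even if the machine
            # has to be hard-reset afterwards
            log.flush()
            os.fsync(log.fileno())
            written += 1
            time.sleep(interval_s)
```

Calling log_memory() from a second terminal before starting a generation would show whether MemAvailable drains and swap starts filling right before the freeze.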
