FP4 - Special Node to Load It?
I'm trying to figure out how to get this fp4 working properly. I do have a 50-series card with everything up to date, but I think the reason the generations are so slow is that the model loaders don't support fp4. Manager isn't really helping me find anything either.
It doesn't need a new loader, but you do need a new enough torch version compiled with CUDA 13.0 support.
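If you want to sanity-check that your install actually meets that bar, a couple of standard PyTorch introspection calls will tell you (nothing LTX-specific here; the expected values are assumptions based on cu130 wheels and a 50-series card):

```python
import torch

print("torch:", torch.__version__)               # e.g. 2.9.1
print("built against CUDA:", torch.version.cuda) # expect "13.0" for cu130 wheels

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    # Blackwell (RTX 50-series) cards report compute capability 12.x,
    # which is what the fp4 kernels target.
    print(f"compute capability: {major}.{minor}")
```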
Torch 2.9.1 and cu130 are installed here, but I'm getting matrix multiplication errors:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (5000x4096 and 2048x4096)
ERROR lora diffusion_model.transformer_blocks.20.attn1.to_v.weight shape '[4096, 2048]' is invalid for input of size 16777216
etc.
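For what it's worth, both errors are the same class of problem: a tensor that is 2048 wide where the code expects 4096 (16777216 elements is 4096 × 4096, which can't be viewed as [4096, 2048]). A minimal standalone reproduction, with the "where the shapes come from" comments being my guess rather than anything from the LTX code:

```python
import torch

mat1 = torch.randn(5000, 4096)  # e.g. embeddings arriving at the transformer
mat2 = torch.randn(2048, 4096)  # e.g. a projection weight of the "wrong" width

try:
    mat1 @ mat2
except RuntimeError as e:
    # mat1 and mat2 shapes cannot be multiplied (5000x4096 and 2048x4096)
    print(e)

# The LoRA error is the same mismatch seen from the other side:
print(4096 * 4096, "vs", 4096 * 2048)  # 16777216 vs 8388608
```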
I'm trying to load the official repo's 20GB "ltx-2-19b-dev-fp4.safetensors" in a workflow where I've been using GGUFs.
Gemini suggests downloading the "projections" file...?
Best Fix:
Use the ltx-2-19b-dev-fp4_projections_only.safetensors with a DualClipLoader (Gemma in slot 1, Projections in slot 2) to ensure the text embeddings are correctly resized before hitting the transformer.
I thought for sure the model loader not having an fp4 option under model type had something to do with it, but double-checking the dependencies:
It doesn't need a new loader, but you do need a new enough torch version compiled with CUDA 13.0 support.
I have 2.9.1 and cu130, and something just isn't right. The generations are absurdly slow; the fp8 model is approximately 30 times faster than this. I'm no expert, and the fp4 thing was all I could think of.
Sometimes certain LoRA models slow down the fp8. I haven't tried without any LoRA yet, but I'm basically shooting in the dark.
I have it working now. I tried using the full 20GB LTX-2 FP4 safetensors file but ran into those mat* errors; with the transformer-only file from Kijai's repo it now works.
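If you want to confirm what a given checkpoint actually contains before loading it, the key names inside the .safetensors file are easy to list (safetensors' safe_open is the standard API; the prefix interpretation is my assumption):

```python
from safetensors import safe_open

path = "ltx-2-19b-dev-fp4.safetensors"  # whichever file you're testing

with safe_open(path, framework="pt") as f:
    keys = list(f.keys())

print(len(keys), "tensors")
# A transformer-only file should show only diffusion-model-style prefixes;
# extra ones (text encoder, VAE, projections) suggest a bundled checkpoint
# that a plain diffusion/UNet loader may read incorrectly.
print(sorted({k.split(".")[0] for k in keys}))
```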
Here are my model loaders. I'm running a 5070 Ti, 64GB RAM, Torch 2.9.1 + cu130, Triton 3.5.x, SageAttention, etc. (although SageAttention doesn't work with fp4).
121 frames, prompt executed in 62.25 seconds
I have it working now. I tried using the full 20GB LTX-2 FP4 safetensors file but ran into those mat* errors; with the transformer-only file from Kijai's repo it now works.
Yeah, I have no idea what to say. Mine still works like dogshit, but I know it has everything to do with me missing something. I switched to the ComfyUI app, and now I have to go through the whole Triton and SageAttention installs to see if missing those somehow has something to do with it, though I don't think either is even enabled in my workflow.
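Before reinstalling everything, a quick import check in whatever Python environment the ComfyUI app actually launches with (assumption: you can run that interpreter directly) will at least tell you whether Triton and SageAttention are present:

```python
for mod in ("triton", "sageattention"):
    try:
        m = __import__(mod)
        print(mod, getattr(m, "__version__", "installed"))
    except ImportError:
        print(mod, "is NOT installed in this environment")
```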

