Two questions about NVFP4 testing

#8
by ApacheOne - opened

I have been testing NVFP4 quantization for a few months now. In my experience, keeping the first and last blocks at high precision only matters during training.

My first question: has this model been quantized as far as it can go and tested internally for unstable output, or has that not been tested, given that last blocks such as model.diffusion_model.transformer_blocks.47 have been kept at high precision?

Building on that: because this is a video model, have there been tests that keep some layers at fp32 instead of bf16 in the NVFP4 mix? Other video models, such as wan2.2, benefit from this with NVFP4. If quantizing the first/last blocks was tested and found to give unstable output, this might make it possible to decrease the overall size while still keeping the output stable.

I look forward to hearing back about this! Thank you!

a bit late, but i've been way into fp4 experiments for a while now.
i doubt any fp4 model would ever be fully in fp4; they are all mixed. the first/last blocks, adaln, proj_in/proj_out, and projection layers generally need to remain in higher precision, and even then it's best to keep the first and last 4 blocks in higher precision.
i have an 11gb version that uses fp8 for those layers instead of bf16, and it's usable until you add loras.
and that's the biggest thing i've noticed about fp4: it stays stable pretty low, until you add any sort of external adjustment to the model with loras or ic or anything, and then it starts to fall apart.
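The layer selection described above could be sketched roughly like this. It is only an illustration, not how any released checkpoint was built: the function name, the substring list, and the assumption of 48 blocks (inferred from transformer_blocks.47 being named as a last block) are all hypothetical.

```python
import re

# Matches the block index in keys such as
# "model.diffusion_model.transformer_blocks.47.attn1.to_q.weight".
BLOCK_RE = re.compile(r"transformer_blocks\.(\d+)\.")

# Hypothetical substrings for layers the post says should stay in higher
# precision; real key names depend on the checkpoint.
KEEP_HIGH_SUBSTRINGS = ("adaln", "proj_in", "proj_out")


def choose_precision(name, num_blocks, edge_blocks=4, high="bf16", low="nvfp4"):
    """Return the precision label for one weight, given its key name."""
    lowered = name.lower()
    # Sensitive layer types stay in higher precision wherever they sit.
    if any(s in lowered for s in KEEP_HIGH_SUBSTRINGS):
        return high
    match = BLOCK_RE.search(name)
    # Anything outside the transformer blocks is left in higher precision.
    if match is None:
        return high
    idx = int(match.group(1))
    # Keep the first and last `edge_blocks` blocks in higher precision.
    if idx < edge_blocks or idx >= num_blocks - edge_blocks:
        return high
    return low


# Assumed block count of 48; adjust for the actual model.
print(choose_precision("model.diffusion_model.transformer_blocks.47.attn1.to_q.weight", 48))
print(choose_precision("model.diffusion_model.transformer_blocks.20.attn1.to_q.weight", 48))
```

Passing high="fp8" gives the fp8 variant mentioned above, and edge_blocks=1 corresponds to keeping only the very first and last block.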

ltx2.3 is pretty resilient to lower bits, and with a tweaked stack you can even use loras at an 11gb base model size, but that requires crazy custom block weight strengths.
is there some loss in quality at these levels? yes.
but it's a trade-off: either take 20 minutes during sampling by using offload, or run in 3 minutes and clean it up with vsr or something.
and i think for a lot of us that's the key thing: yes, i can get better quality in 20 to 30 minutes, or get passable quality in 3 to 5 minutes.
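For the size side of this trade-off, a rough back-of-the-envelope estimate can be sketched as below. It assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-weight block (about 4.5 bits per weight, ignoring the per-tensor scale and any file metadata). The parameter counts in the example are made up for illustration and are not the real numbers for any model discussed here.

```python
# Approximate storage cost per weight, in bits.
BITS_PER_WEIGHT = {
    "fp32": 32.0,
    "bf16": 16.0,
    "fp8": 8.0,
    # 4-bit value plus one 8-bit block scale shared by 16 weights.
    "nvfp4": 4.0 + 8.0 / 16.0,
}


def estimate_size_gb(param_counts):
    """Estimate checkpoint size in GB (10^9 bytes) from {precision: parameter count}."""
    total_bits = sum(BITS_PER_WEIGHT[p] * n for p, n in param_counts.items())
    return total_bits / 8 / 1e9


# Hypothetical split: 18B weights in NVFP4, 2B kept in higher precision.
print(estimate_size_gb({"nvfp4": 18e9, "bf16": 2e9}))
print(estimate_size_gb({"nvfp4": 18e9, "fp8": 2e9}))
print(estimate_size_gb({"nvfp4": 18e9, "fp32": 2e9}))
```

Under these made-up counts, moving the high-precision layers from bf16 to fp8 saves about 2 GB, while moving them to fp32 adds about 4 GB, which is the kind of cost an fp32 mix would have to justify.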
