Gemma 3 4-bit support

#4
by ZKong - opened

This Space uses 4-bit Gemma. Very efficient. I tried several runs, both T2V and I2V, with very good results.
Can you migrate it to ComfyUI? It would be so cool. ComfyUI seems to have no 4-bit CLIP support.
https://huggingface.co/spaces/alexnasa/ltx-2-TURBO
unsloth/gemma-3-12b-it-qat-bnb-4bit

For now, this one works (fp8), quite a lot smaller than the default one
https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/blob/main/gemma_3_12B_it_fp8_e4m3fn.safetensors

4-bit is half the size of fp8. I only use 4-bit or GGUF Q4; that's good enough.
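The "half of fp8" claim is just the bits-per-weight ratio. A quick back-of-the-envelope sketch for a 12B-parameter Gemma (parameter storage only; quantization scales and metadata add a few percent on top):

```python
# Rough on-disk weight size of a 12B-parameter model at different precisions.
PARAMS = 12e9

def size_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Weight storage in GB (1e9 bytes), ignoring quantization metadata."""
    return params * bits_per_weight / 8 / 1e9

print(f"bf16 : {size_gb(16):.1f} GB")  # 24.0 GB
print(f"fp8  : {size_gb(8):.1f} GB")   # 12.0 GB
print(f"4-bit: {size_gb(4):.1f} GB")   # 6.0 GB, half of fp8
```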

That may be true for some setups, but it's not true across the board. For example, I don't think e4m3fn will work with the Ampere card (RTX 3090) that I have.

I also have a 3090 and it works fine ;-) For torch.compile / Triton you need a later version, where support for the 3xxx series was also added.
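The "works on a 3090" reports make sense if you separate loading fp8 weights from having native FP8 tensor cores: native e4m3fn hardware arrived with Ada (sm_89) and Hopper (sm_90); on Ampere the kernels upcast/emulate. A minimal check sketch (the capability thresholds are my assumption from public NVIDIA architecture docs, not from this thread):

```python
# Sketch: does a GPU have *native* FP8 (e4m3fn) tensor-core support?
# Assumption: native FP8 starts at Ada (sm_89); Ampere (sm_80/86) emulates.

def has_native_fp8(major: int, minor: int) -> bool:
    """True for Ada (sm_89), Hopper (sm_90), and newer architectures."""
    return (major, minor) >= (8, 9)

# With PyTorch: has_native_fp8(*torch.cuda.get_device_capability())
print(has_native_fp8(8, 6))   # False -> RTX 3090 (Ampere): emulated
print(has_native_fp8(8, 9))   # True  -> RTX 4090 (Ada): native
print(has_native_fp8(12, 0))  # True  -> RTX 50-series (Blackwell): native
```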

fp4 only works on 50-series cards. Good news: we finally got GGUF, smaller than int4.

ZKong changed discussion status to closed

I have a 4090 and it works fine for me. You need to make sure to install the comfy-kitchen dependency that Comfy added this week; it adds general NVFP4 kernels. The GGUFs still require a PR, I believe.

How much memory does it use? Do you still get the benefit of reduced vram usage?

I think it's a bit of a "software emulation", since hardware support only arrived on later GPUs, but I'm not noticing it being significantly slower than the others, or using much more RAM.
Most of my recent models are e4m3fn, and people uploading models also seem to just use e4m3fn by now, since support goes back as far as the 3xxx series.

The 4090 can use fp4? However, fp4 is not half the fp8 size; it's still very big. GGUF is much better 😄
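On the size comparison: a GGUF Q4 file is a bit larger than a flat 4 bits per weight because of per-block scales. Assuming the classic llama.cpp Q4_0 layout (one fp16 scale per block of 32 four-bit values; this layout is my assumption, not something stated in the thread), the arithmetic looks like:

```python
# Effective bits per weight for GGUF Q4_0 (assumed llama.cpp block layout:
# 2-byte fp16 scale + 32 four-bit quants per block).
BLOCK = 32
block_bytes = 2 + BLOCK // 2        # 2 bytes scale + 16 bytes of nibbles
bpw = block_bytes * 8 / BLOCK
print(bpw)                           # 4.5 bits per weight
print(f"12B params at Q4_0 ~= {12e9 * bpw / 8 / 1e9:.2f} GB")  # 6.75 GB
```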
