Gemma 3 4-bit support

#4
by ZKong - opened

This Space uses 4-bit Gemma. Very efficient. I tried several runs, both T2V and I2V, with very good results.
Can you migrate it to ComfyUI? It would be so cool. ComfyUI seems to have no 4-bit CLIP support.
https://huggingface.co/spaces/alexnasa/ltx-2-TURBO
unsloth/gemma-3-12b-it-qat-bnb-4bit

For now, this one works (fp8), quite a lot smaller than the default one
https://huggingface.co/GitMylo/LTX-2-comfy_gemma_fp8_e4m3fn/blob/main/gemma_3_12B_it_fp8_e4m3fn.safetensors

4-bit is half the size of fp8. I only use 4-bit or GGUF Q4; that's good enough.
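The "half of fp8" claim is just the bits-per-weight ratio. A quick back-of-the-envelope sketch for a 12B-parameter Gemma (parameter storage only; quantization scales and metadata add a few percent on top):

```python
# Rough on-disk weight size of a 12B-parameter model at different precisions.
PARAMS = 12e9

def size_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Weight storage in GB (1e9 bytes), ignoring quantization metadata."""
    return params * bits_per_weight / 8 / 1e9

print(f"bf16 : {size_gb(16):.1f} GB")  # 24.0 GB
print(f"fp8  : {size_gb(8):.1f} GB")   # 12.0 GB
print(f"4-bit: {size_gb(4):.1f} GB")   # 6.0 GB, half of fp8
```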

That may be true for some setups, but it's not true across the board. For example, I don't think e4m3fn will work with the Ampere card (RTX 3090) that I have.

I also have a 3090 and it works fine ;-) For torch.compile / Triton you need a later version, where support for the 3xxx series was also added.
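The "works on a 3090" reports make sense if you separate loading fp8 weights from having native FP8 tensor cores: native e4m3fn hardware arrived with Ada (sm_89) and Hopper (sm_90); on Ampere the kernels upcast/emulate. A minimal check sketch (the capability thresholds are my assumption from public NVIDIA architecture docs, not from this thread):

```python
# Sketch: does a GPU have *native* FP8 (e4m3fn) tensor-core support?
# Assumption: native FP8 starts at Ada (sm_89); Ampere (sm_80/86) emulates.

def has_native_fp8(major: int, minor: int) -> bool:
    """True for Ada (sm_89), Hopper (sm_90), and newer architectures."""
    return (major, minor) >= (8, 9)

# With PyTorch: has_native_fp8(*torch.cuda.get_device_capability())
print(has_native_fp8(8, 6))   # False -> RTX 3090 (Ampere): emulated
print(has_native_fp8(8, 9))   # True  -> RTX 4090 (Ada): native
print(has_native_fp8(12, 0))  # True  -> RTX 50-series (Blackwell): native
```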

fp4 only works on 50-series cards. Good news: we finally got GGUF, smaller than int4.

ZKong changed discussion status to closed

I have a 4090 and it works fine for me. You need to make sure to install the comfy-kitchen dependency that Comfy added this week; it adds general NVFP4 kernels. The GGUFs still require a PR, I believe.

How much memory does it use? Do you still get the benefit of reduced vram usage?

I think it's a bit of a "software emulation", since hardware support only arrived on later GPUs, but I'm not noticing it being significantly slower than the others, or using much more RAM.
Most of my recent models are e4m3fn, and people uploading models also seem to just use e4m3fn by now, since support goes back as far as the 3xxx series.

The 4090 can use fp4? However, fp4 is not half the fp8 size; it's still very big. GGUF is much better 😄
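On the size comparison: a GGUF Q4 file is a bit larger than a flat 4 bits per weight because of per-block scales. Assuming the classic llama.cpp Q4_0 layout (one fp16 scale per block of 32 four-bit values; this layout is my assumption, not something stated in the thread), the arithmetic looks like:

```python
# Effective bits per weight for GGUF Q4_0 (assumed llama.cpp block layout:
# 2-byte fp16 scale + 32 four-bit quants per block).
BLOCK = 32
block_bytes = 2 + BLOCK // 2        # 2 bytes scale + 16 bytes of nibbles
bpw = block_bytes * 8 / BLOCK
print(bpw)                           # 4.5 bits per weight
print(f"12B params at Q4_0 ~= {12e9 * bpw / 8 / 1e9:.2f} GB")  # 6.75 GB
```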
