Can you reduce the size from 62 GB to the 35-40 GB range in 4-bit or less?
Thanks for the amazing job of converting this to a 4-bit quantization.
However, I need an even smaller size.
Would you be kind enough to reduce the size from 62 GB to a smaller file in the 35-40 GB range?
Even a 2-bit or 1-bit quant will do.
Thanks.
Well, my quant specifically uses Nvidia's hardware-accelerated 4-bit floating-point format (NVFP4), so it can't really go any lower, but some GGUFs have already been made that run the model at lower integer bit widths:
bartowski/PrimeIntellect_INTELLECT-3-GGUF
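If you want to try one of those, here's a minimal sketch of downloading and running a GGUF with llama-cpp-python. The filename is a placeholder (browse the repo's file listing for the real names), and you'd pick whichever quant level (Q2_K, IQ2_XS, etc.) fits your memory budget:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one quant file from the repo; the filename below is
# hypothetical -- check the repo to find the actual one.
model_path = hf_hub_download(
    repo_id="bartowski/PrimeIntellect_INTELLECT-3-GGUF",
    filename="PrimeIntellect_INTELLECT-3-Q2_K.gguf",  # placeholder name
)

# Load the model; n_gpu_layers=-1 offloads every layer to the GPU
# if it fits, otherwise lower it to split between GPU and CPU.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

out = llm("Q: What does NVFP4 stand for?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```

Note that big models are often split into multiple GGUF shards; in that case you download all the pieces and point the loader at the first one.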
I'm new to all this.
I'm not sure if I should even be asking this, but can you share the setup you used, or point me to appropriate links so I can do this myself, like converting from full precision to FP4, etc.?
I shared my process in this post:
https://huggingface.co/Firworks/MiroThinker-v1.0-30B-nvfp4/discussions/1#69269c6d40ce1d3b1a6ca1cc
I think if you show that to any of the big LLMs, they should be able to walk you through converting a model to NVFP4. That said, if you're new to running and working with models, you might have better luck learning to do integer quantization to GGUFs, since that needs a lot less experimentation and has more robust tools and tutorials. NVFP4 is still rough around the edges and can need more in-depth troubleshooting to get working.
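To give a rough picture of what an NVFP4 conversion involves (the full details are in the post linked above): one route is vLLM's llm-compressor library, which ships an NVFP4 quantization scheme. This is a generic sketch, not my exact process; treat the import paths, scheme name, and calibration settings as assumptions to verify against the library's current examples:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "PrimeIntellect/INTELLECT-3"  # placeholder: model to quantize
SAVE_DIR = "INTELLECT-3-NVFP4"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize every Linear layer to NVFP4 and leave the lm_head alone;
# NVFP4 needs a short calibration pass to fit its scales.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",   # assumed built-in calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=64,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

By comparison, the GGUF route is usually just llama.cpp's convert_hf_to_gguf.py to get a full-precision GGUF, then the llama-quantize tool to drop it to Q4/Q2/whatever, with plenty of tutorials covering both steps.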
Thank you so much. I can't thank you enough.