Can you reduce the size from 62 GB to the 35-40 GB range in 4-bit or less?
Thanks for the amazing job of converting this to a 4-bit quantization.
However, I need an even smaller size.
Would you be kind enough to reduce the size from 62 GB to a smaller file in the 35-40 GB range?
Even a 2-bit or 1-bit quant will do.
Thanks.
Well, my quant specifically uses Nvidia's hardware-accelerated 4-bit floating-point format (NVFP4), so it can't really go any lower, but some GGUFs have already been made that run the model at lower integer bit widths:
bartowski/PrimeIntellect_INTELLECT-3-GGUF
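If you want to try one of those, here's a minimal sketch of downloading and running a GGUF with llama-cpp-python. The filename is a placeholder (browse the repo's file listing for the real names), and you'd pick whichever quant level (Q2_K, IQ2_XS, etc.) fits your memory budget:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one quant file from the repo; the filename below is
# hypothetical -- check the repo to find the actual one.
model_path = hf_hub_download(
    repo_id="bartowski/PrimeIntellect_INTELLECT-3-GGUF",
    filename="PrimeIntellect_INTELLECT-3-Q2_K.gguf",  # placeholder name
)

# Load the model; n_gpu_layers=-1 offloads every layer to the GPU
# if it fits, otherwise lower it to split between GPU and CPU.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1)

out = llm("Q: What does NVFP4 stand for?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```

Note that big models are often split into multiple GGUF shards; in that case you download all the pieces and point the loader at the first one.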
I'm new to all this.
I'm not sure if I should even be asking this, but can you share the setup you used, or point me to appropriate links so I can do this myself, like converting from full precision to FP4, etc.?
I shared my process in this post:
https://huggingface.co/Firworks/MiroThinker-v1.0-30B-nvfp4/discussions/1#69269c6d40ce1d3b1a6ca1cc
I think if you show that to any of the big LLMs, they should be able to walk you through converting a model to NVFP4. That said, if you're new to running and working with models, you might have better luck learning to do integer quantization to GGUFs, since that needs a lot less experimentation and has more robust tools and tutorials. NVFP4 is still rough around the edges and can need more in-depth troubleshooting to get working.
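To give a rough picture of what an NVFP4 conversion involves (the full details are in the post linked above): one route is vLLM's llm-compressor library, which ships an NVFP4 quantization scheme. This is a generic sketch, not my exact process; treat the import paths, scheme name, and calibration settings as assumptions to verify against the library's current examples:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "PrimeIntellect/INTELLECT-3"  # placeholder: model to quantize
SAVE_DIR = "INTELLECT-3-NVFP4"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize every Linear layer to NVFP4 and leave the lm_head alone;
# NVFP4 needs a short calibration pass to fit its scales.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

oneshot(
    model=model,
    dataset="open_platypus",   # assumed built-in calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=64,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```

By comparison, the GGUF route is usually just llama.cpp's convert_hf_to_gguf.py to get a full-precision GGUF, then the llama-quantize tool to drop it to Q4/Q2/whatever, with plenty of tutorials covering both steps.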
Thank you so much. I can't thank you enough.