Please release a model with native 4-bit quantization

#4
by calycekr - opened

Since Kimi currently provides an INT4 quantized model, could you deliver a model that is natively quantized to 4-bit precision?

For practical deployment scenarios, many people rely on native 4-bit models for memory and throughput efficiency. Given that Kimi-K2.5 already ships an INT4 variant, it would be helpful if GLM-5 also offered a natively quantized 4-bit model so benchmarks can be compared under equivalent conditions.

Agreed. At FP8, this model no longer fits on 8xH100 or 4xH200 setups.
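For a rough sense of the memory math, here is a back-of-envelope sketch. The parameter count below is a hypothetical placeholder (the actual GLM-5 size isn't stated in this thread), and the estimate covers weights only, ignoring KV cache and activations:

```python
# Back-of-envelope weight-memory estimate at different precisions.
# NUM_PARAMS is hypothetical; substitute the real model size.
NUM_PARAMS = 700e9  # hypothetical 700B-parameter model

def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (weights only)."""
    return num_params * bits_per_weight / 8 / 1e9

fp8_gb = weight_memory_gb(NUM_PARAMS, 8)   # 700 GB
int4_gb = weight_memory_gb(NUM_PARAMS, 4)  # 350 GB

# 8x H100 (80 GB each) = 640 GB total: FP8 weights alone would exceed it,
# while a 4-bit variant would fit with headroom left for the KV cache.
print(f"FP8: {fp8_gb:.0f} GB, INT4: {int4_gb:.0f} GB, 8xH100: {8 * 80} GB")
```

At this hypothetical size, halving the weight precision is the difference between not fitting at all and fitting comfortably.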

Indeed, I think native 4-bit QAT (as in Kimi's case) provides the best quality-to-size ratio, better than post-training quantization. Great release nonetheless! I guess I'll have to wait a bit for 4-bit quants to be able to run it on my hardware.

Would love this in nvfp4