Quantized Variants Now Available for FP8 and W8A16

#60
by Geodd - opened

We've published two quantized variants of GLM-4.7-Flash for the community, optimized for different NVIDIA GPU generations:

FP8 (Hopper-class GPUs)
👉 https://huggingface.co/Geodd/GLM-4.7-Flash-FP8

W8A16 (Ampere-class GPUs)
👉 https://huggingface.co/Geodd/GLM-4.7-Flash-W8A16
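If you're unsure which variant matches your hardware, the rule of thumb is that native FP8 tensor cores arrive with Hopper (SM 9.0), while Ampere (SM 8.x) should fall back to W8A16. A minimal sketch of that selection logic, keyed on CUDA compute capability (the `pick_variant` helper is hypothetical, not part of either repo):

```python
def pick_variant(compute_capability: tuple[int, int]) -> str:
    """Map a CUDA compute capability to the recommended quantized repo.

    Hopper (SM 9.0+) has native FP8 support, so it gets the FP8 build;
    Ampere (SM 8.x) lacks FP8 tensor cores and uses the W8A16 build.
    """
    major, _minor = compute_capability
    if major >= 9:
        return "Geodd/GLM-4.7-Flash-FP8"
    if major == 8:
        return "Geodd/GLM-4.7-Flash-W8A16"
    raise ValueError(
        f"No recommended quantized variant for compute capability {compute_capability}"
    )

# Example: on PyTorch you could pass torch.cuda.get_device_capability()
print(pick_variant((9, 0)))  # H100 → FP8 variant
print(pick_variant((8, 6)))  # A6000/RTX 30-series → W8A16 variant
```

In practice you would feed in `torch.cuda.get_device_capability()` and hand the returned repo id to whatever loader you use.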
