Quantized Variants Now Available for FP8 and W8A16
#60
by Geodd
We've published two quantized variants of GLM-4.7-Flash for the community, optimized for different NVIDIA GPU generations:
FP8 (Hopper-class GPUs)
🔗 https://huggingface.co/Geodd/GLM-4.7-Flash-FP8
W8A16 (Ampere-class GPUs)
🔗 https://huggingface.co/Geodd/GLM-4.7-Flash-W8A16
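For anyone unfamiliar with the naming: W8A16 means 8-bit integer weights with 16-bit floating-point activations, which is why it suits Ampere GPUs that lack native FP8 support. As background only (this is not code from either repo), here is a minimal NumPy sketch of symmetric per-channel int8 weight quantization, the scheme typically behind W8A16:

```python
import numpy as np

def quantize_w8(w: np.ndarray):
    # Symmetric per-output-channel quantization: scale each row
    # so its largest magnitude maps to int8's max value, 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # At inference the int8 weights are upcast and rescaled;
    # activations stay in 16-bit float throughout.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)  # toy weight matrix
q, scale = quantize_w8(w)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).max()
print(q.dtype, err)
```

The per-channel scales keep the rounding error bounded by half a quantization step per row, which is why weight-only int8 is usually close to lossless for inference.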