Warning: This model has severe quality issues. Use it only for research purposes. You can find verified quantized models of good quality here.

This is zai-org/GLM-4.7-Flash quantized with AutoRound to mixed precision with a target of 3.5 bits per weight. The model is compatible with vLLM (tested: v0.15.0). Tested with an RTX Pro 6000 WK.
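Since the card states vLLM compatibility, a minimal serving sketch may help; this assumes vLLM v0.15.0 is installed, a GPU with enough memory is available, and uses only the repo id from this card (the port value is an illustrative default):

```shell
# Serve the quantized checkpoint with vLLM's OpenAI-compatible server.
# AutoRound checkpoints are loaded from their saved quantization config,
# so no extra quantization flag should be needed.
vllm serve kaitchup/GLM-4.7-Flash-autoround-3.5bpw \
    --port 8000
```

Once the server is up, any OpenAI-compatible client can query it at `http://localhost:8000/v1`. Given the quality warning above, treat outputs as research artifacts only.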

Safetensors model size: 5B params (tensor types: I32, F16, F32, BF16).

Model tree: kaitchup/GLM-4.7-Flash-autoround-3.5bpw is one of 68 quantized versions of zai-org/GLM-4.7-Flash.