Mixed-precision quantized version of zai-org/GLM-4.7-Flash, optimised by baa.ai using a proprietary Black Sheep AI method.

Bit-widths are allocated per tensor via sensitivity analysis and budget-constrained optimisation; no calibration data is required.
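The general idea of budget-constrained per-tensor bit allocation can be sketched as follows. This is an illustrative greedy scheme, not baa.ai's actual method: the tensor names, sensitivity scores, and upgrade rule are all hypothetical, and a weight-only sensitivity proxy stands in for whatever analysis the proprietary method performs.

```python
def allocate_bits(sensitivity, params, budget_bits, choices=(3, 4, 5, 6, 8)):
    """Illustrative greedy bit allocation (not baa.ai's method).

    Start every tensor at the lowest bit-width, then repeatedly upgrade
    the tensor with the best sensitivity-per-extra-bit ratio while the
    total stays under the average-bit budget.
    """
    total_params = sum(params.values())
    budget = budget_bits * total_params          # total bit budget
    alloc = {name: choices[0] for name in params}
    used = sum(alloc[n] * params[n] for n in params)

    while True:
        best = None
        for n in params:
            i = choices.index(alloc[n])
            if i + 1 >= len(choices):
                continue                          # already at max bits
            extra = (choices[i + 1] - choices[i]) * params[n]
            if used + extra > budget:
                continue                          # upgrade would bust budget
            gain = sensitivity[n] / extra         # proxy: benefit per extra bit
            if best is None or gain > best[0]:
                best = (gain, n, extra, choices[i + 1])
        if best is None:
            break                                 # no affordable upgrade left
        _, n, extra, b = best
        alloc[n] = b
        used += extra
    return alloc
```

With hypothetical inputs, a tensor ten times more sensitive than another ends up with more bits while the average stays at the budget: `allocate_bits({"attn": 10.0, "mlp": 1.0}, {"attn": 100, "mlp": 100}, 5.0)` yields `{"attn": 6, "mlp": 4}`, i.e. exactly 5.0 average bits.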
| Metric | Value |
|---|---|
| Size on disk | 19 GB |
| Average bits per weight | 5.1 |
| WikiText-2 perplexity (median) | 8.7520 |
| Perplexity vs BF16 | +10.5% |
| MMLU vs BF16 | 107.3% |
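The reported size and average bit-width can be cross-checked with a rough estimate: size ≈ parameter count × average bits / 8. The implied parameter count below is derived from the table, not stated by baa.ai, and the estimate ignores metadata and any overhead from unquantized tensors.

```python
# Rough consistency check between reported size and average bit-width.
# Assumes decimal gigabytes and negligible non-weight overhead.
size_bytes = 19e9        # reported size: 19 GB
avg_bits = 5.1           # reported average bits per weight

# Implied number of weights: bytes -> bits -> weights
implied_params = size_bytes * 8 / avg_bits
print(f"{implied_params / 1e9:.1f}B parameters")  # ≈ 29.8B
```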
Run the model with the `mlx-lm` package:

```python
from mlx_lm import load, generate

# Download and load the quantized weights and tokenizer from the Hub
model, tokenizer = load("baa-ai/GLM-4.7-Flash-RAM-20GB-MLX")

# Generate up to 256 tokens from a simple prompt
response = generate(model, tokenizer, prompt="Hello!", max_tokens=256)
print(response)
```