GLM-5.2-MLX-mixed-3_6bit

MLX (Apple Silicon) conversion of zai-org/GLM-5.2 — a glm_moe_dsa MoE (256 experts, DeepSeek-V3.2-style sparse attention) — quantized to mixed.

Quantizations

Part of the GLM-5.2 MLX collection.

Variant Notes
8-bit 8-bit · ~800GB · needs ~1TB RAM · integrity-checked
6-bit 6-bit · ~625GB · needs ~768GB RAM · integrity-checked
5-bit 5-bit · ~530GB · needs ~640GB RAM · integrity-checked
4-bit 4-bit · ~430GB · tight on 512GB · smoke-tested
mixed (this repo) mixed · experts@3-bit / non-expert@6-bit · ~360GB · 512GB-fit · smoke-tested

Use with mlx-lm

pip install mlx-lm
python -m mlx_lm generate --model pipenetwork/GLM-5.2-MLX-mixed-3_6bit --prompt "Hello" -m 256

Validation

Smoke-tested locally (loads + generates coherent text).

License

MIT (inherited from base). Quantization config (excerpt): {"group_size": 64, "bits": 6, "mode": "affine", "model.embed_tokens": {"group_size": 64, "bits": 6}, "model.layers.0.self_attn.q_a_proj": {"group_size": 64, "bits": 6}, "model.layers.0.self_attn.q_b_p.

Downloads last month
2,760
Safetensors
Model size
743B params
Tensor type
BF16
·
U32
·
F32
·
MLX
Hardware compatibility
Log In to add your hardware

6-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for pipenetwork/GLM-5.2-MLX-mixed-3_6bit

Base model

zai-org/GLM-5.2
Quantized
(29)
this model

Collection including pipenetwork/GLM-5.2-MLX-mixed-3_6bit