Huihui GLM 5.1 abliterated optimized to run on a Mac Studio M3 512. Non-abliterated versions here: larger, smaller.

  • This is NOT a faithful recreation of the original GGUF, so much as a "I wonder if..." science project. It worked! But YMMV.
  • Converted from a Q3_K GGUF, with important layers merged with Unsloth's UD_Q3_K_XL to offset quantization loss.
  • Fits into ~360 GB memory, leaving plenty of room to run parallel models (ex: Qwen 3.6 35B).

Usage

# Start server at http://localhost:8080/chat/completions
uvx --from mlx-lm mlx_lm.server \
  --host 127.0.0.1 \
  --port 8080 \
  --model spicyneuron/Huihui-GLM-5.1-abliterated-MLX-3.9bit

Benchmarks

metric 2.9 bit 3.6 bit 3.9 bit abliterated (this model)
bpw 2.906 3.645 3.895
base memory 251.702 315.648 341.770
peak memory (1024/512) 272.358 341.020 364.299
prompt tok/s (1024) 194.216 ± 0.167 190.508 ± 0.880 192.922 ± 0.107
gen tok/s (512) 19.527 ± 0.035 17.873 ± 0.156 18.191 ± 0.062
kl mean 0.268 ± 0.009 0.117 ± 0.004 0.221 ± 0.007
kl p95 0.537 ± 0.009 0.236 ± 0.004 0.468 ± 0.007
perplexity 4.118 ± 0.016 3.945 ± 0.016 4.195 ± 0.024
piqa 0.794 ± 0.009 0.820 ± 0.017 0.826 ± 0.017

Tested on a Mac Studio M3 Ultra with:

mlx_lm.kld --baseline-model path/to/mlx-full-precision
mlx_lm.perplexity --sequence-length 2048 --seed 123
mlx_lm.benchmark --prompt-tokens 1024 --generation-tokens 512 --num-trials 5
mlx_lm.evaluate --tasks piqa --seed 123 --num-shots 0 --limit 500

Note:

  • mlx_lm.kld is approximate, based on top_k not full logits. Here's the code.
  • GLM 5.1 KL divergence calculated against the largest quant I could run locally (~495 GB), so real KL is higher.

Methodology

Created with a custom workflow that:

  1. Compared GGUF quants for similarity
  2. Merged select higher-quant layers
  3. Dequantized to F32
  4. Requantized to MLX
Downloads last month
1,461
Safetensors
Model size
754B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for spicyneuron/Huihui-GLM-5.1-abliterated-MLX-3.9bit

Base model

zai-org/GLM-5.1
Quantized
(1)
this model