Commit 713e080 by spicyneuron · verified · 1 parent: 82affb7

Update README.md

Files changed (1): README.md +18 -13
@@ -30,27 +30,32 @@ uvx --from mlx-lm mlx_lm.server \

  # Benchmarks

- metric | baa-ai/GLM-5.1-RAM-270GB-MLX | 2.9bit
- --- | --- | ---
- bpw | 3.1096 | 2.9064
- peak memory (1024/512) | 291.257 | 272.358
- prompt tok/s (1024) | 194.958 ± 0.075 | 194.216 ± 0.167
- gen tok/s (512) | 21.381 ± 0.050 | 19.527 ± 0.035
- perplexity | 4.780 ± 0.020 | 4.118 ± 0.016
- hellaswag | 0.546 ± 0.011 | 0.59 ± 0.011
- piqa | 0.776 ± 0.01 | 0.794 ± 0.009
- winogrande | 0.668 ± 0.013 | 0.695 ± 0.013

  Tested on a Mac Studio M3 Ultra with:

  ```
  mlx_lm.perplexity --sequence-length 2048 --seed 123
  mlx_lm.benchmark --prompt-tokens 1024 --generation-tokens 512 --num-trials 5
- mlx_lm.evaluate --tasks hellaswag --seed 123 --num-shots 0 --limit 2000
- mlx_lm.evaluate --tasks piqa --seed 123 --num-shots 0 --limit 2000
- mlx_lm.evaluate --tasks winogrande --seed 123 --num-shots 0 --limit 2000
  ```

  # Methodology

  Quantized with a [mlx-lm fork](https://github.com/ml-explore/mlx-lm/pull/922),
 

  # Benchmarks

+ metric | baa-ai/GLM-5.1-RAM-270GB-MLX | 2.9 bit (this model) | 3.6 bit
+ --- | --- | --- | ---
+ bpw | 3.110 | 2.906 | 3.645
+ base memory | 269.303 | 251.702 | 315.648
+ peak memory (1024/512) | 291.257 | 272.358 | 341.020
+ prompt tok/s (1024) | 194.958 ± 0.075 | 194.216 ± 0.167 | 190.508 ± 0.880
+ gen tok/s (512) | 21.381 ± 0.050 | 19.527 ± 0.035 | 17.873 ± 0.156
+ kl mean | 0.686 ± 0.054 | 0.268 ± 0.009 | 0.117 ± 0.004
+ kl p95 | 1.478 ± 0.054 | 0.537 ± 0.009 | 0.236 ± 0.004
+ perplexity | 4.780 ± 0.020 | 4.118 ± 0.016 | 3.945 ± 0.016
+ piqa | 0.776 ± 0.010 | 0.794 ± 0.009 | 0.820 ± 0.017
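
As a rough sanity check on the table, base memory should scale roughly linearly with bits per weight, since weight storage is approximately parameter count × bpw / 8. The sketch below uses only the `bpw` and `base memory` values from the table above. The dictionary keys are just labels, and the memory unit is whatever the table uses, since the README does not state it.

```python
# (bpw, base memory) pairs copied from the benchmark table.
rows = {
    "baa-ai/GLM-5.1-RAM-270GB-MLX": (3.110, 269.303),
    "2.9 bit": (2.906, 251.702),
    "3.6 bit": (3.645, 315.648),
}

# If memory is dominated by quantized weights, memory / bpw should be
# nearly the same constant for every quant of the same base model.
memory_per_bit = {name: mem / bpw for name, (bpw, mem) in rows.items()}

# Relative spread between the largest and smallest ratio.
spread = max(memory_per_bit.values()) / min(memory_per_bit.values()) - 1.0

for name, ratio in memory_per_bit.items():
    print(f"{name}: {ratio:.2f} memory units per bit-per-weight")
print(f"relative spread: {spread:.4%}")
```

The three ratios agree closely, which suggests the reported `bpw` and `base memory` figures are mutually consistent.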

  Tested on a Mac Studio M3 Ultra with:

  ```
+ mlx_lm.kld --baseline-model path/to/mlx-full-precision
  mlx_lm.perplexity --sequence-length 2048 --seed 123
  mlx_lm.benchmark --prompt-tokens 1024 --generation-tokens 512 --num-trials 5
+ mlx_lm.evaluate --tasks piqa --seed 123 --num-shots 0 --limit 500
  ```

+ Note:
+
+ - `mlx_lm.kld` is approximate: it is computed from the `top_k` logits rather than the full logits. Here's the [code](https://github.com/ml-explore/mlx-lm/pull/1146).
+ - KL divergence for GLM 5.1 was calculated against the largest quant I could run locally (~495 GB) rather than full precision, so the real KL is likely higher.
+
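
To illustrate what a `top_k`-based KL estimate means, here is a minimal pure-Python sketch. It is not the code from the linked PR. The function names (`softmax`, `kl_divergence`, `topk_kl`), the toy logits, and the choice to renormalize both distributions over the baseline's top-k tokens are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def topk_kl(baseline_logits, quant_logits, k):
    """Approximate KL using only the baseline's k highest-logit tokens.

    Assumption: both distributions are renormalized over that subset.
    """
    order = sorted(
        range(len(baseline_logits)),
        key=lambda i: baseline_logits[i],
        reverse=True,
    )
    top = order[:k]
    p = softmax([baseline_logits[i] for i in top])
    q = softmax([quant_logits[i] for i in top])
    return kl_divergence(p, q)


# Toy logits for a 6-token vocabulary (hypothetical values).
baseline = [4.0, 2.5, 1.0, -0.5, -2.0, -3.0]
quant = [3.6, 2.9, 0.4, -0.1, -2.5, -1.0]

full_kl = kl_divergence(softmax(baseline), softmax(quant))
approx_kl = topk_kl(baseline, quant, k=3)
print(f"full KL:  {full_kl:.4f}")
print(f"top-3 KL: {approx_kl:.4f}")
```

With `k` equal to the vocabulary size the estimate matches the full KL. For smaller `k` it generally differs, because probability mass outside the top-k tokens is ignored.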
  # Methodology

  Quantized with a [mlx-lm fork](https://github.com/ml-explore/mlx-lm/pull/922),