spicyneuron committed 7b96841 (verified) · 1 parent: 19d5dd6

Update README.md

Files changed (1): README.md +12 -3
````diff
@@ -32,9 +32,18 @@ MLX quantization options differ than llama.cpp, but the principles are the same:
 
 # Benchmarks
 
-(WIP)
-
-Tested with:
+metric | baa-ai/GLM-5.1-RAM-270GB-MLX | 2.9bit (this model)
+--- | --- | ---
+bpw | 3.1096 | 2.9064
+peak memory (1024/512) | 291.257 | 272.358
+prompt tok/s (1024) | 194.958 ± 0.075 | 194.216 ± 0.167
+gen tok/s (512) | 21.381 ± 0.050 | 19.527 ± 0.035
+perplexity | 4.780 ± 0.020 | 4.118 ± 0.016
+hellaswag | 0.546 ± 0.011 | 0.59 ± 0.011
+piqa | 0.776 ± 0.01 | 0.794 ± 0.009
+winogrande | 0.668 ± 0.013 | 0.695 ± 0.013
+
+Tested on a Mac Studio M3 Ultra with:
 
 ```
 mlx_lm.perplexity --sequence-length 2048 --seed 123
````
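The benchmark table in this diff compares two quantizations of the same model, so two questions follow naturally: how large is each change, and does it exceed the reported uncertainty? The minimal Python sketch below uses only the numbers from the table. It computes the percent change of the 2.9bit model relative to baa-ai/GLM-5.1-RAM-270GB-MLX, along with a rough z-score. It assumes the ± values are standard errors and treats the two runs as independent. Because both models were scored on the same items, that independence assumption is only an approximation, and a paired test would be more precise. The helper names (`relative_change`, `z_score`, `compare`) are illustrative and are not part of mlx_lm.

```python
import math

# Values copied from the README benchmark table.
# Each row: ((baa-ai/GLM-5.1-RAM-270GB-MLX value, ±), (2.9bit value, ±)).
# None means the table gives no ± term for that metric.
ROWS = {
    "bpw": ((3.1096, None), (2.9064, None)),
    "peak memory (1024/512)": ((291.257, None), (272.358, None)),
    "prompt tok/s (1024)": ((194.958, 0.075), (194.216, 0.167)),
    "gen tok/s (512)": ((21.381, 0.050), (19.527, 0.035)),
    "perplexity": ((4.780, 0.020), (4.118, 0.016)),
    "hellaswag": ((0.546, 0.011), (0.59, 0.011)),
    "piqa": ((0.776, 0.01), (0.794, 0.009)),
    "winogrande": ((0.668, 0.013), (0.695, 0.013)),
}


def relative_change(baseline: float, candidate: float) -> float:
    """Percent change of candidate relative to baseline (unit-free)."""
    return 100.0 * (candidate - baseline) / baseline


def z_score(a, b):
    """Difference b - a in units of combined uncertainty.

    Assumption: the ± values are independent standard errors.
    Returns None when either side has no ± term.
    """
    (va, sa), (vb, sb) = a, b
    if sa is None or sb is None:
        return None
    return (vb - va) / math.sqrt(sa ** 2 + sb ** 2)


def compare(rows):
    """Map each metric name to (percent change, z-score or None)."""
    return {
        name: (relative_change(a[0], b[0]), z_score(a, b))
        for name, (a, b) in rows.items()
    }


if __name__ == "__main__":
    for name, (pct, z) in compare(ROWS).items():
        z_text = "n/a" if z is None else f"{z:+.1f}"
        print(f"{name:24s} {pct:+7.2f}%  z={z_text}")
```

When run as a script, it prints one line per metric. With these inputs, the peak-memory reduction (about 6.5%) closely tracks the bits-per-weight reduction (also about 6.5%). The perplexity gap spans roughly 26 combined standard errors, while the piqa and winogrande gains are each under 1.5. Because the percent change is a ratio, it does not depend on the unit of peak memory, which the table leaves unstated.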
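The perplexity row reports each value with a ± term. The sketch below illustrates the usual textbook definition. Perplexity is the exponential of the mean per-token negative log-likelihood. A standard error can then be propagated through `exp` using the delta method. This is not taken from mlx_lm's source, and whether `mlx_lm.perplexity` derives its ± exactly this way is an assumption. The function name `perplexity_with_stderr` is hypothetical. The sketch covers only the arithmetic that follows once per-token losses have been collected.

```python
import math
from statistics import fmean, stdev


def perplexity_with_stderr(token_nll):
    """Perplexity and an approximate standard error from per-token losses.

    Assumption: token_nll holds natural-log negative log-likelihoods,
    one per predicted token.
    ppl = exp(mean NLL); by the delta method, se(ppl) ~= ppl * se(mean NLL).
    """
    if len(token_nll) < 2:
        raise ValueError("need at least two token losses")
    mean_nll = fmean(token_nll)
    # Standard error of the mean NLL (sample standard deviation / sqrt(n)).
    se_mean = stdev(token_nll) / math.sqrt(len(token_nll))
    ppl = math.exp(mean_nll)
    return ppl, ppl * se_mean
```

For example, `perplexity_with_stderr([1.0, 3.0])` returns a perplexity of e² ≈ 7.389 with a standard error of about the same size. Two tokens say very little about the mean loss, which is why a real evaluation averages over many sequences.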