Add MMLU-200 (93.5%), speed/memory benchmarks, fix variants table sizes (47→56 GB), expand topic tags

#1
by dealignai - opened
Files changed (1) hide show
  1. README.md +22 -4
README.md CHANGED
@@ -14,6 +14,12 @@ tags:
14
  - minimax-m2
15
  - moe
16
  - apple-silicon
 
 
 
 
 
 
17
  pipeline_tag: text-generation
18
  base_model: MiniMaxAI/MiniMax-M2.7
19
  base_model_relation: quantized
@@ -39,6 +45,18 @@ JANGTQ_K** quantization in JANGTQ-PRESTACK layout.
39
  - **Bundle size:** **~74 GB on-disk** (~3-bit avg routed)
40
  - **Runs on:** M3 Max 96 GB+ / M4 Max 128 GB / M5 Max 128 GB / Mac Studio
41
 
 
 
 
 
 
 
 
 
 
 
 
 
42
  ## Why mixed-bit?
43
 
44
  `down_proj`'s output enters the residual stream and accumulates across
@@ -49,10 +67,10 @@ quality close to full-4-bit (~115 GB) at **64% the size**.
49
 
50
  ## Variants in the MiniMax-M2.7 line
51
 
52
- | Variant | Routed bits (avg) | Bundle size | Use case |
53
- |---|---|---|---|
54
- | `MiniMax-M2.7-JANGTQ` | 2-bit | 47 GB | smallest, best for tight RAM |
55
- | **`MiniMax-M2.7-JANGTQ_K` (this)** | **~3-bit (mixed 2/4)** | **74 GB** | **quality close to 4-bit at 2-bit-ish size** |
56
 
57
  ## Loading
58
 
 
14
  - minimax-m2
15
  - moe
16
  - apple-silicon
17
+ - text-generation
18
+ - conversational
19
+ - reasoning
20
+ - chain-of-thought
21
+ - quantization
22
+ - 230b
23
  pipeline_tag: text-generation
24
  base_model: MiniMaxAI/MiniMax-M2.7
25
  base_model_relation: quantized
 
45
  - **Bundle size:** **~74 GB on-disk** (~3-bit avg routed)
46
  - **Runs on:** M3 Max 96 GB+ / M4 Max 128 GB / M5 Max 128 GB / Mac Studio
47
 
48
+ ## Benchmarks
49
+
50
+ | Metric | Value | Setup |
51
+ |---|---|---|
52
+ | **MMLU-200** | **93.5%** (187/200) | thinking ON, `q_per_subject=20`, 10 subjects |
53
+ | Median speed | ~37 tok/s | M4 Max 128 GB, MLX 0.31 |
54
+ | GPU memory at load | ~75 GB | warm |
55
+
56
+ MMLU eval used the standard `mmlu_jangtq_resume.py` runner with the model's
57
+ default chat template (`enable_thinking` undefined → thinking ON, which the
58
+ M2.7 template auto-opens with `<think>\n` after the assistant prefix).
59
+
60
  ## Why mixed-bit?
61
 
62
  `down_proj`'s output enters the residual stream and accumulates across
 
67
 
68
  ## Variants in the MiniMax-M2.7 line
69
 
70
+ | Variant | Routed bits (avg) | Size | MMLU-200 | Use case |
71
+ |---|---|---|---|---|
72
+ | [`MiniMax-M2.7-JANGTQ`](https://huggingface.co/OsaurusAI/MiniMax-M2.7-JANGTQ) | 2-bit | 56 GB | 91.5% | smallest, best for tight RAM |
73
+ | **`MiniMax-M2.7-JANGTQ_K` (this)** | **~3-bit (mixed 2/4)** | **74 GB** | **93.5%** | **+2.0pp MMLU vs JANGTQ for +18 GB** |
74
 
75
  ## Loading
76