Update README.md
README.md
CHANGED
@@ -10,15 +10,20 @@ tags:
 - exl3
 ---
 
-Quantization was performed using [exllama3 v0.0.29](https://github.com/turboderp-org/exllamav3).
+Quantization was performed using [exllama3 v0.0.29](https://github.com/turboderp-org/exllamav3) (commit `cb1a436`).
 
-
-
-[
-[
-[
-[
-[
+| Quant | Size (GB) | Actual bpw | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
+|---|---|---|---|---|---|---|---|
+| [2.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/2.0bpw) | 55.14 | 2.00 | 76.0% | 41.8% | 18.5% | 7.1% | 2.5% |
+| [3.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/3.0bpw) | 81.61 | 3.00 | 85.6% | 59.3% | 35.1% | 18.5% | 8.9% |
+| [4.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/4.0bpw) | 108.09 | 4.00 | 90.3% | 70.5% | 49.0% | 31.2% | 18.5% |
+| [5.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/5.0bpw) | 134.56 | 5.00 | 92.9% | 77.5% | 59.1% | 41.7% | 27.7% |
+| [6.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/6.0bpw) | 161.18 | 6.00 | 94.4% | 81.5% | 65.2% | 49.1% | 35.0% |
+| [7.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/7.0bpw) | 187.65 | 7.00 | 94.9% | 83.2% | 68.0% | 52.5% | 38.6% |
+| [8.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/8.0bpw) | 214.13 | 8.00 | 95.2% | 84.0% | 69.5% | 54.4% | 40.7% |
+| original | 214.36 | 8.00 | — | — | — | — | — |
+
+\* Original model produces inf/NaN in layer 61, making PPL and KL divergence non-computable.
 
 <details>
 <summary>Quantization Notes</summary>
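
The footnote added in this change says that inf/NaN values in the original model make KL divergence non-computable. A minimal sketch of why that happens, using only the Python standard library: once a single reference logit is NaN or inf, the softmax normaliser becomes NaN and every term of the KL sum inherits it. The helpers `softmax` and `kl_divergence` are hypothetical names written for this illustration; they are not exllamav3's evaluation code, and the logits are made-up values.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(ref_logits, test_logits):
    """KL(ref || test) in nats for a single token position.

    Assumes every test probability is non-zero, which holds for the
    small illustrative logits used below.
    """
    p = softmax(ref_logits)
    q = softmax(test_logits)
    # A zero reference probability contributes nothing to the sum.
    return sum(0.0 if pi == 0.0 else pi * math.log(pi / qi)
               for pi, qi in zip(p, q))

quantized = [1.9, 1.1, 0.1]  # illustrative logits from a quantized model

healthy = kl_divergence([2.0, 1.0, 0.1], quantized)
with_nan = kl_divergence([2.0, float("nan"), 0.1], quantized)
with_inf = kl_divergence([float("inf"), 1.0, 0.1], quantized)

print(healthy)   # small finite value
print(with_nan)  # nan
print(with_inf)  # nan
```

Perplexity is affected in the same way, because it is derived from log-probabilities that come out of the same softmax.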