NeuroSenko/MiniMax-M2.7-exl3

Text Generation


NeuroSenko committed about 19 hours ago

Commit a6e91ea (verified) · 1 parent: cd11968

Update README.md

Files changed (1)

README.md +13 -8


@@ -10,15 +10,20 @@ tags:
   - exl3
 ---
-Quantization was performed using [exllama3 v0.0.29](https://github.com/turboderp-org/exllamav3).
-[2.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/2.0bpw)
-[3.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/3.0bpw)
-[4.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/4.0bpw)
-[5.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/5.0bpw)
-[6.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/6.0bpw)
-[7.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/7.0bpw)
-[8.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/8.0bpw)
+Quantization was performed using [exllama3 v0.0.29](https://github.com/turboderp-org/exllamav3) (commit `cb1a436`).
+| Quant | Size (GB) | Actual bpw | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
+|---|---|---|---|---|---|---|---|
+| [2.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/2.0bpw) | 55.14 | 2.00 | 76.0% | 41.8% | 18.5% | 7.1% | 2.5% |
+| [3.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/3.0bpw) | 81.61 | 3.00 | 85.6% | 59.3% | 35.1% | 18.5% | 8.9% |
+| [4.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/4.0bpw) | 108.09 | 4.00 | 90.3% | 70.5% | 49.0% | 31.2% | 18.5% |
+| [5.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/5.0bpw) | 134.56 | 5.00 | 92.9% | 77.5% | 59.1% | 41.7% | 27.7% |
+| [6.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/6.0bpw) | 161.18 | 6.00 | 94.4% | 81.5% | 65.2% | 49.1% | 35.0% |
+| [7.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/7.0bpw) | 187.65 | 7.00 | 94.9% | 83.2% | 68.0% | 52.5% | 38.6% |
+| [8.0bpw](https://huggingface.co/NeuroSenko/MiniMax-M2.7-exl3/tree/8.0bpw) | 214.13 | 8.00 | 95.2% | 84.0% | 69.5% | 54.4% | 40.7% |
+| original | 214.36 | 8.00 | — | — | — | — | — |
+\* Original model produces inf/NaN in layer 61, making PPL and KL divergence non-computable.
 <details>
 <summary>Quantization Notes</summary>