Update README.md
README.md
CHANGED
@@ -15,12 +15,14 @@ tags:
 ---
 Quantization was performed using [exllamav3 v0.0.28](https://github.com/turboderp-org/exllamav3) (commit `ea87af6`).
 
-| Quant | Size (GB) | KL-div (
-|---|---|---|---|---|---|---|---|---|---|
-| [4.0bpw](https://huggingface.co/NeuroSenko/ToriiGate-0.5-exl3/tree/4.0bpw) | 4 |
-| [
-| [
-|
+| Quant | Size (GB) | Actual bpw | PPL | KL-div (q→o) | KL-div (o→q) | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
+|---|---|---|---|---|---|---|---|---|---|---|
+| [4.0bpw](https://huggingface.co/NeuroSenko/ToriiGate-0.5-exl3/tree/4.0bpw) | 3.94 | 4.00 | 29.100 | 0.0150 | 0.0150 | 93.1% | 80.3% | 64.8% | 49.4% | 35.8% |
+| [5.0bpw](https://huggingface.co/NeuroSenko/ToriiGate-0.5-exl3/tree/5.0bpw) | 4.51 | 5.00 | 28.854 | 0.0042 | 0.0042 | 96.2% | 88.6% | 78.6% | 67.2% | 55.8% |
+| [6.0bpw](https://huggingface.co/NeuroSenko/ToriiGate-0.5-exl3/tree/6.0bpw) | 4.92 | 6.00 | 28.666 | 0.0013 | 0.0013 | 97.9% | 93.7% | 87.6% | 80.1% | 71.8% |
+| [7.0bpw](https://huggingface.co/NeuroSenko/ToriiGate-0.5-exl3/tree/7.0bpw) | 5.34 | 7.00 | 28.610 | 0.0004 | 0.0004 | 98.7% | 96.0% | 92.2% | 87.2% | 81.4% |
+| [8.0bpw](https://huggingface.co/NeuroSenko/ToriiGate-0.5-exl3/tree/8.0bpw) | 5.75 | 8.00 | 28.621 | 0.0002 | 0.0002 | 99.1% | 97.2% | 94.4% | 90.8% | 86.4% |
+| original | 9.66 | 16.00 | 28.596 | — | — | — | — | — | — | — |
 
 ### Metrics
 - **PPL** (Perplexity) — how well the model predicts the next token. Lower is better. The original model's PPL is the baseline.