Update README.md
README.md
CHANGED
@@ -12,7 +12,9 @@ This repo contains specialized MoE-quants for MiniMax-M2.5. The idea being that
 | IQ4_XS | 101.10 GiB (3.80 BPW) | Q8_0 / IQ3_S / IQ3_S / IQ4_XS | 7.513587 ± 0.122746 | +6.0549% | 0.095077 ± 0.002168 |
 | IQ3_S | 78.76 GiB (2.96 BPW) | Q8_0 / IQ2_S / IQ2_S / IQ3_S | 8.284882 ± 0.135705 | +16.9418% | 0.244096 ± 0.004148 |

-Provided here as well are a couple of graphs showing the Pareto frontier for KLD and PPL for my quants vs Unsloth.
+Provided here as well are a couple of graphs showing the Pareto frontier for KLD and PPL for my quants vs Unsloth.
+
+Full graphs of all of the quants are available in the `kld_data` directory, along with the raw data broken down per quant and a CSV with the collated data.

 While the PPL between the quant methods is similar, I feel like the KLD of the quants provided here is slightly better and that these quants will offer better long context performance due to keeping the default type as Q8_0. This comes with a slight performance penalty in PP / TG due to the higher quality quantization, but I think the tradeoff is worthwhile.
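For anyone who wants to reproduce a frontier like the one in the graphs, here is a minimal Python sketch of how a size-vs-KLD Pareto frontier could be computed. It is not the script used to produce the graphs. It assumes that the last column of the table above is mean KLD and that smaller is better for both size and KLD. The `Quant` type, the `pareto_frontier` function, and the third data point are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Quant(NamedTuple):
    name: str
    size_gib: float  # file size in GiB
    kld: float       # mean KL divergence (assumed meaning of the last table column)


def pareto_frontier(quants: list[Quant]) -> list[Quant]:
    """Return the quants that no other quant beats on both size and KLD."""
    frontier: list[Quant] = []
    best_kld = float("inf")
    # Sweep from smallest to largest; a quant stays only if it lowers
    # the best KLD seen so far among all smaller-or-equal quants.
    for q in sorted(quants, key=lambda q: (q.size_gib, q.kld)):
        if q.kld < best_kld:
            frontier.append(q)
            best_kld = q.kld
    return frontier


quants = [
    Quant("IQ4_XS", 101.10, 0.095077),        # from the table above
    Quant("IQ3_S", 78.76, 0.244096),          # from the table above
    Quant("hypothetical", 90.00, 0.300000),   # made-up point, dominated by IQ3_S
]

frontier = pareto_frontier(quants)
print([q.name for q in frontier])  # ['IQ3_S', 'IQ4_XS']
```

The made-up point is dropped because IQ3_S is both smaller and has a lower KLD. The same sweep should work on rows loaded from the collated CSV, once its column names are known.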