Update README.md
README.md
CHANGED
@@ -10,6 +10,8 @@ library_name: exllamav3
 
 Quantization was performed using [exllama3 v0.0.20](https://github.com/turboderp-org/exllamav3).
 
+> **Note:** In exllamav3 v0.0.21, there were [fixes to the Qwen3-Next inference pipeline](https://github.com/turboderp-org/exllamav3/commit/d3e02500e0dac2d67ca7fc9babed5d40dcf33689). These quants still work fine, but with v0.0.21+ they should perform even better. It is recommended to use exllamav3 v0.0.21 or later for best results.
+
 | Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
 |---|---|---|---|---|---|---|---|---|---|
 | [2.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/2.0bpw) | 20 | 0.52142615 | 0.52278535 | 23.73415073 | 0.6961 | 0.3484 | 0.1402 | 0.0498 | 0.0167 |
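
The note added in this change recommends exllamav3 v0.0.21 or later. A minimal sketch of how a user might check their environment against that recommendation follows. It assumes the package is installed under the distribution name `exllamav3` (an assumption, not something stated in the README), and the helpers `parse_version`, `meets_minimum`, and `check_installed` are hypothetical names made up for this illustration. The version parsing is deliberately simple and treats pre-release suffixes such as `.dev1` as equal to the base version.

```python
from importlib import metadata

# Minimum version named in the README note.
MIN_VERSION = (0, 0, 21)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.0.21' or 'v0.0.21+cu128' into (0, 0, 21).

    Only the leading digits of each dot-separated piece are kept;
    parsing stops at the first piece that has no leading digits.
    """
    parts = []
    for piece in text.lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: tuple[int, ...] = MIN_VERSION) -> bool:
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def check_installed(dist_name: str = "exllamav3") -> str:
    """Describe whether the installed distribution satisfies the note.

    `dist_name` is an assumed distribution name; adjust it if the
    package is installed under a different name.
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name} is not installed"
    if meets_minimum(installed):
        return f"{dist_name} {installed} meets the recommended minimum"
    return f"{dist_name} {installed} is older than the recommended 0.0.21"


if __name__ == "__main__":
    print(check_installed())
```

Keeping the comparison in `meets_minimum` separate from the metadata lookup makes the check usable even when the version string comes from somewhere else, such as a lock file.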