NeuroSenko/Qwen3-Coder-Next-exl3

NeuroSenko committed 13 days ago

Commit c6587a4 · verified · 1 Parent(s): 410edb0

Update README.md

Files changed (1)

README.md +2 -0

@@ -10,6 +10,8 @@ library_name: exllamav3
 Quantization was performed using [exllamav3 v0.0.20](https://github.com/turboderp-org/exllamav3).
+> **Note:** In exllamav3 v0.0.21, there were [fixes to the Qwen3-Next inference pipeline](https://github.com/turboderp-org/exllamav3/commit/d3e02500e0dac2d67ca7fc9babed5d40dcf33689). These quants still work fine, but with v0.0.21+ they should perform even better. It is recommended to use exllamav3 v0.0.21 or later for best results.
 | Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
 |---|---|---|---|---|---|---|---|---|---|
 | [2.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/2.0bpw) | 20 | 0.52142615 | 0.52278535 | 23.73415073 | 0.6961 | 0.3484 | 0.1402 | 0.0498 | 0.0167 |
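
The note added in this commit recommends exllamav3 v0.0.21 or later. As a minimal sketch, the installed version could be checked like this. The distribution name `exllamav3` passed to `importlib.metadata` is an assumption, and the helper names (`parse_version`, `meets_minimum`, `check_exllamav3`) are hypothetical, not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version recommended by the note in this commit.
MIN_VERSION = "0.0.21"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.0.21' or 'v0.0.21+cu128' into (0, 0, 21).

    Only the leading numeric components are kept; a local suffix after '+'
    and any non-numeric component (e.g. 'dev1') are ignored.
    """
    core = text.strip().lstrip("v").split("+", 1)[0]
    parts = []
    for part in core.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Return True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def check_exllamav3() -> str:
    """Report whether the installed exllamav3 meets the recommended version."""
    try:
        # Assumption: the package is installed under the name 'exllamav3'.
        installed = version("exllamav3")
    except PackageNotFoundError:
        return "exllamav3 is not installed"
    if meets_minimum(installed):
        return f"exllamav3 {installed}: at or above {MIN_VERSION}"
    return f"exllamav3 {installed}: consider upgrading to {MIN_VERSION} or later"


if __name__ == "__main__":
    print(check_exllamav3())
```

The comparison is done on integer tuples rather than strings so that, for example, `0.0.9` sorts below `0.0.21`.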