NeuroSenko commited on
Commit
c6587a4
·
verified ·
1 Parent(s): 410edb0

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +2 -0
README.md CHANGED
@@ -10,6 +10,8 @@ library_name: exllamav3
10
 
11
  Quantization was performed using [exllama3 v0.0.20](https://github.com/turboderp-org/exllamav3).
12
 
 
 
13
  | Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
14
  |---|---|---|---|---|---|---|---|---|---|
15
  | [2.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/2.0bpw) | 20 | 0.52142615 | 0.52278535 | 23.73415073 | 0.6961 | 0.3484 | 0.1402 | 0.0498 | 0.0167 |
 
10
 
11
  Quantization was performed using [exllama3 v0.0.20](https://github.com/turboderp-org/exllamav3).
12
 
13
+ > **Note:** In exllamav3 v0.0.21, there were [fixes to the Qwen3-Next inference pipeline](https://github.com/turboderp-org/exllamav3/commit/d3e02500e0dac2d67ca7fc9babed5d40dcf33689). These quants still work fine, but with v0.0.21+ they should perform even better. It is recommended to use exllamav3 v0.0.21 or later for best results.
14
+
15
  | Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
16
  |---|---|---|---|---|---|---|---|---|---|
17
  | [2.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/2.0bpw) | 20 | 0.52142615 | 0.52278535 | 23.73415073 | 0.6961 | 0.3484 | 0.1402 | 0.0498 | 0.0167 |