---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-Next/blob/main/LICENSE
pipeline_tag: text-generation
base_model: Qwen/Qwen3-Coder-Next
base_model_relation: quantized
quantization: exl3
library_name: exllamav3
---

Quantization was performed using [exllamav3 v0.0.20](https://github.com/turboderp-org/exllamav3).

> **Note:** exllamav3 v0.0.21 includes [fixes to the Qwen3-Next inference pipeline](https://github.com/turboderp-org/exllamav3/commit/d3e02500e0dac2d67ca7fc9babed5d40dcf33689). These quants still work on v0.0.20, but v0.0.21 or later is recommended for best results.

| Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
|---|---|---|---|---|---|---|---|---|---|
| [2.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/2.0bpw) | 20 | 0.52142615 | 0.52278535 | 23.73415073 | 0.6961 | 0.3484 | 0.1402 | 0.0498 | 0.0167 |
| [3.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/3.0bpw) | 29 | 0.24568403 | 0.24622221 | 20.58547252 | 0.7866 | 0.4894 | 0.2579 | 0.1190 | 0.0513 |
| [4.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/4.0bpw) | 38 | 0.15672405 | 0.15667850 | 19.63543922 | 0.8338 | 0.5783 | 0.3511 | 0.1923 | 0.0990 |
| [5.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/5.0bpw) | 47 | 0.12297954 | 0.12280908 | 19.81022066 | 0.8562 | 0.6287 | 0.4088 | 0.2463 | 0.1388 |
| [6.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/6.0bpw) | 57 | 0.10448053 | 0.10464503 | 19.88056610 | 0.8707 | 0.6590 | 0.4502 | 0.2848 | 0.1704 |
| [7.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/7.0bpw) | 66 | 0.10106506 | 0.10081614 | 19.61846442 | 0.8730 | 0.6666 | 0.4614 | 0.2983 | 0.1821 |
| [8.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/8.0bpw) | 75 | 0.13291914 | 0.13419860 | 19.85572412 | 0.8631 | 0.6503 | 0.4468 | 0.2885 | 0.1771 |
| original | 148 | - | - | 19.78538866 | - | - | - | - | - |
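For context, the KL-divergence and Top-K columns measure how closely each quant's next-token distribution tracks the original model's. Below is a minimal, illustrative sketch of how such metrics can be computed, not the exact evaluation script used for this table; the set-match definition of Top-K agreement is an assumption:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-10):
    # D_KL(p || q), averaged over token positions.
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def top_k_agreement(p, q, k):
    # Fraction of positions where the top-k token sets of p and q coincide.
    top_p = np.argsort(-p, axis=-1)[:, :k]
    top_q = np.argsort(-q, axis=-1)[:, :k]
    matches = [set(a) == set(b) for a, b in zip(top_p, top_q)]
    return sum(matches) / len(matches)

# Synthetic example: two nearly identical distributions over a 100-token vocab.
rng = np.random.default_rng(0)
orig_logits = rng.normal(size=(64, 100))
quant_logits = orig_logits + rng.normal(scale=0.01, size=(64, 100))

p, q = softmax(quant_logits), softmax(orig_logits)
print(kl_divergence(p, q))         # small, close to 0
print(top_k_agreement(p, q, k=1))  # close to 1.0
```

Note that agreement naturally drops as K grows, since all K tokens must match, which is why the K=5 column is much lower than K=1.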

## Tool-Call Support for Qwen/GLM Models

The official tabbyAPI does not yet support tool calls for Qwen and GLM models.

If you're using Qwen-Code, OpenClaw, or similar software that needs tool-call support, you can use [my fork](https://github.com/NeuroSenko/tabbyAPI/tree/tools-support) with the `tools-support` branch:

**Clone directly:**

```bash
git clone -b tools-support https://github.com/NeuroSenko/tabbyAPI
```

**Or add to an existing tabbyAPI installation:**

```bash
git remote add neurosenko https://github.com/NeuroSenko/tabbyAPI
git fetch neurosenko
git checkout -b tools-support neurosenko/tools-support
```

This branch adds native tool-calling support for the Qwen and GLM model families.
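With the fork running, clients send tool definitions through the standard OpenAI-compatible chat-completions schema. Below is a hedged sketch of such a request; the port, model name, and `get_weather` tool are illustrative assumptions, so adjust them to your own tabbyAPI config:

```python
import json
import urllib.request

# Endpoint path follows the OpenAI-compatible API; port is an assumption,
# check the `network` section of your tabbyAPI config.yml.
URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "model": "Qwen3-Coder-Next-exl3",  # whatever model is loaded in tabbyAPI
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload).encode()
print(body[:60], b"...")  # inspect the serialized request

# Uncomment to send against a running tabbyAPI instance:
# req = urllib.request.Request(
#     URL, data=body,
#     headers={"Content-Type": "application/json",
#              "Authorization": "Bearer YOUR_API_KEY"})
# print(urllib.request.urlopen(req).read().decode())
```

If the model decides to call the tool, the response's `choices[0].message.tool_calls` field carries the function name and JSON arguments, per the OpenAI schema.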