---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-Next/blob/main/LICENSE
pipeline_tag: text-generation
base_model: Qwen/Qwen3-Coder-Next
base_model_relation: quantized
quantization: exl3
library_name: exllamav3
---

Quantization was performed using exllamav3 v0.0.20.

Note: exllamav3 v0.0.21 included fixes to the Qwen3-Next inference pipeline. These quants still work fine on earlier versions, but they should perform even better on v0.0.21+, so using exllamav3 v0.0.21 or later is recommended for best results.
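If you want to check the installed version programmatically before loading these quants, a minimal sketch (the `exllamav3.__version__` attribute in the commented usage is an assumption, not confirmed here):

```python
def meets_min_version(installed: str, required: str = "0.0.21") -> bool:
    """Return True if a dotted version string meets the required minimum."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

# Hypothetical usage -- the __version__ attribute is an assumption:
# import exllamav3
# if not meets_min_version(exllamav3.__version__):
#     print("Consider upgrading: pip install -U exllamav3")
```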

| Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
|---|---|---|---|---|---|---|---|---|---|
| 2.0bpw | 20 | 0.52142615 | 0.52278535 | 23.73415073 | 0.6961 | 0.3484 | 0.1402 | 0.0498 | 0.0167 |
| 3.0bpw | 29 | 0.24568403 | 0.24622221 | 20.58547252 | 0.7866 | 0.4894 | 0.2579 | 0.1190 | 0.0513 |
| 4.0bpw | 38 | 0.15672405 | 0.15667850 | 19.63543922 | 0.8338 | 0.5783 | 0.3511 | 0.1923 | 0.0990 |
| 5.0bpw | 47 | 0.12297954 | 0.12280908 | 19.81022066 | 0.8562 | 0.6287 | 0.4088 | 0.2463 | 0.1388 |
| 6.0bpw | 57 | 0.10448053 | 0.10464503 | 19.88056610 | 0.8707 | 0.6590 | 0.4502 | 0.2848 | 0.1704 |
| 7.0bpw | 66 | 0.10106506 | 0.10081614 | 19.61846442 | 0.8730 | 0.6666 | 0.4614 | 0.2983 | 0.1821 |
| 8.0bpw | 75 | 0.13291914 | 0.13419860 | 19.85572412 | 0.8631 | 0.6503 | 0.4468 | 0.2885 | 0.1771 |
| original | 148 | - | - | 19.78538866 | - | - | - | - | - |

Tool Calls Support for Qwen/GLM Models

The official tabbyAPI doesn't support tool calls for Qwen and GLM models yet.

If you're using Qwen-Code, OpenClaw, or similar software that needs tool call support, you can use my fork with the tools-support branch:

Clone directly:

```shell
git clone -b tools-support https://github.com/NeuroSenko/tabbyAPI
```

Or add to existing tabbyAPI installation:

```shell
git remote add neurosenko https://github.com/NeuroSenko/tabbyAPI
git fetch neurosenko
git checkout -b tools-support neurosenko/tools-support
```

This branch includes native tool calling support for Qwen and GLM model families.
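A request exercising tool calls would use the standard OpenAI-style `tools` field of the chat completions API that tabbyAPI follows. A minimal payload sketch (the endpoint URL, port, model name, and the `read_file` tool are illustrative placeholders, not confirmed here):

```python
import json

def build_tool_call_payload(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completions payload with one example tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "read_file",  # hypothetical tool, for illustration only
                "description": "Read a file from the workspace.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
    }

# The payload would be POSTed to the server's chat completions endpoint,
# e.g. http://localhost:5000/v1/chat/completions (host/port are assumptions).
payload = build_tool_call_payload("Qwen3-Coder-Next-exl3", "Open README.md")
print(json.dumps(payload, indent=2))
```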