Quantization was performed using exllama3 v0.0.20.

Note: In exllamav3 v0.0.21, there were fixes to the Qwen3-Next inference pipeline. These quants still work fine, but with v0.0.21+ they should perform even better. It is recommended to use exllamav3 v0.0.21 or later for best results.

Update: The KL-divergence anomaly observed in exllama3 v0.0.20 (where 8bpw had higher KL-div than 5bpw) has been resolved in v0.0.22. Measurements have been re-run and the table below reflects the corrected values. All quantization levels now show the expected monotonic decrease in KL-divergence as bpw increases.

Measurements table for exllama3 v0.0.22

Quant	Size (GB)	KL-div (quant, orig)	KL-div (orig, quant)	Perplexity	Top-K K=1	Top-K K=2	Top-K K=3	Top-K K=4	Top-K K=5
2.0bpw	20	0.41110482	0.45863510	8.85093561	0.7669	0.4276	0.1963	0.0789	0.0296
3.0bpw	29	0.16125607	0.16561898	8.06653676	0.8536	0.5947	0.3567	0.1923	0.0960
4.0bpw	38	0.05995151	0.06084711	7.72232643	0.9079	0.7220	0.5146	0.3383	0.2098
5.0bpw	47	0.02719813	0.02733112	7.68017339	0.9376	0.8013	0.6315	0.4682	0.3279
6.0bpw	57	0.01553572	0.01543948	7.72972846	0.9522	0.8440	0.7019	0.5538	0.4183
7.0bpw	66	0.01088568	0.01090296	7.71071654	0.9611	0.8696	0.7452	0.6116	0.4822
8.0bpw	75	0.00899026	0.00897780	7.70958606	0.9652	0.8816	0.7673	0.6398	0.5159
original	148	-	-	7.70773351	-	-	-	-	-

Measurements table for exllama3 v0.0.20 (deprecated)

Quant	Size (GB)	KL-div (quant, orig)	KL-div (orig, quant)	Perplexity	Top-K K=1	Top-K K=2	Top-K K=3	Top-K K=4	Top-K K=5
2.0bpw	20	0.52142615	0.52278535	23.73415073	0.6961	0.3484	0.1402	0.0498	0.0167
3.0bpw	29	0.24568403	0.24622221	20.58547252	0.7866	0.4894	0.2579	0.1190	0.0513
4.0bpw	38	0.15672405	0.15667850	19.63543922	0.8338	0.5783	0.3511	0.1923	0.0990
5.0bpw	47	0.12297954	0.12280908	19.81022066	0.8562	0.6287	0.4088	0.2463	0.1388
6.0bpw	57	0.10448053	0.10464503	19.88056610	0.8707	0.6590	0.4502	0.2848	0.1704
7.0bpw	66	0.10106506	0.10081614	19.61846442	0.8730	0.6666	0.4614	0.2983	0.1821
8.0bpw	75	0.13291914	0.13419860	19.85572412	0.8631	0.6503	0.4468	0.2885	0.1771
original	148	-	-	19.78538866	-	-	-	-	-

Tool Calls Support for Qwen/GLM Models

The official tabbyAPI doesn't support tool calls for Qwen and GLM models yet.

If you're using Qwen-Code, OpenClaw, or similar software that need tool call support, you can use my fork with the tools-support branch:

Clone directly:

git clone -b tools-support https://github.com/NeuroSenko/tabbyAPI

Or add to existing tabbyAPI installation:

git remote add neurosenko https://github.com/NeuroSenko/tabbyAPI
git fetch neurosenko
git checkout -b tools-support neurosenko/tools-support

This branch includes native tool calling support for Qwen and GLM model families.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for NeuroSenko/Qwen3-Coder-Next-exl3

Base model

Qwen/Qwen3-Coder-Next

Quantized

(101)

this model