---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-Next/blob/main/LICENSE
pipeline_tag: text-generation
base_model: Qwen/Qwen3-Coder-Next
base_model_relation: quantized
quantization: exl3
library_name: exllamav3
---

Quantization was performed using [exllamav3 v0.0.20](https://github.com/turboderp-org/exllamav3).

> **Note:** exllamav3 v0.0.21 includes [fixes to the Qwen3-Next inference pipeline](https://github.com/turboderp-org/exllamav3/commit/d3e02500e0dac2d67ca7fc9babed5d40dcf33689). These quants still work on v0.0.20, but v0.0.21 or later is recommended for best results.

| Quant | Size (GB) | KL-div (quant, orig) | KL-div (orig, quant) | Perplexity | Top-K K=1 | Top-K K=2 | Top-K K=3 | Top-K K=4 | Top-K K=5 |
|---|---|---|---|---|---|---|---|---|---|
| [2.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/2.0bpw) | 20 | 0.52142615 | 0.52278535 | 23.73415073 | 0.6961 | 0.3484 | 0.1402 | 0.0498 | 0.0167 |
| [3.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/3.0bpw) | 29 | 0.24568403 | 0.24622221 | 20.58547252 | 0.7866 | 0.4894 | 0.2579 | 0.1190 | 0.0513 |
| [4.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/4.0bpw) | 38 | 0.15672405 | 0.15667850 | 19.63543922 | 0.8338 | 0.5783 | 0.3511 | 0.1923 | 0.0990 |
| [5.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/5.0bpw) | 47 | 0.12297954 | 0.12280908 | 19.81022066 | 0.8562 | 0.6287 | 0.4088 | 0.2463 | 0.1388 |
| [6.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/6.0bpw) | 57 | 0.10448053 | 0.10464503 | 19.88056610 | 0.8707 | 0.6590 | 0.4502 | 0.2848 | 0.1704 |
| [7.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/7.0bpw) | 66 | 0.10106506 | 0.10081614 | 19.61846442 | 0.8730 | 0.6666 | 0.4614 | 0.2983 | 0.1821 |
| [8.0bpw](https://huggingface.co/NeuroSenko/Qwen3-Coder-Next-exl3/tree/8.0bpw) | 75 | 0.13291914 | 0.13419860 | 19.85572412 | 0.8631 | 0.6503 | 0.4468 | 0.2885 | 0.1771 |
| original | 148 | - | - | 19.78538866 | - | - | - | - | - |
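For context, the KL-divergence and Top-K columns measure how closely each quant's next-token distribution tracks the original model's. Below is a minimal, illustrative sketch of how such metrics can be computed, not the exact evaluation script used for this table; the set-match definition of Top-K agreement is an assumption:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-10):
    # D_KL(p || q), averaged over token positions.
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def top_k_agreement(p, q, k):
    # Fraction of positions where the top-k token sets of p and q coincide.
    top_p = np.argsort(-p, axis=-1)[:, :k]
    top_q = np.argsort(-q, axis=-1)[:, :k]
    matches = [set(a) == set(b) for a, b in zip(top_p, top_q)]
    return sum(matches) / len(matches)

# Synthetic example: two nearly identical distributions over a 100-token vocab.
rng = np.random.default_rng(0)
orig_logits = rng.normal(size=(64, 100))
quant_logits = orig_logits + rng.normal(scale=0.01, size=(64, 100))

p, q = softmax(quant_logits), softmax(orig_logits)
print(kl_divergence(p, q))         # small, close to 0
print(top_k_agreement(p, q, k=1))  # close to 1.0
```

Note that agreement naturally drops as K grows, since all K tokens must match, which is why the K=5 column is much lower than K=1.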

## Tool-Call Support for Qwen/GLM Models

The official tabbyAPI does not yet support tool calls for Qwen and GLM models.

If you're using Qwen-Code, OpenClaw, or similar software that needs tool-call support, you can use [my fork](https://github.com/NeuroSenko/tabbyAPI/tree/tools-support) with the `tools-support` branch:

**Clone directly:**

```bash
git clone -b tools-support https://github.com/NeuroSenko/tabbyAPI
```

**Or add to an existing tabbyAPI installation:**

```bash
git remote add neurosenko https://github.com/NeuroSenko/tabbyAPI
git fetch neurosenko
git checkout -b tools-support neurosenko/tools-support
```

This branch adds native tool-calling support for the Qwen and GLM model families.
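With the fork running, clients send tool definitions through the standard OpenAI-compatible chat-completions schema. Below is a hedged sketch of such a request; the port, model name, and `get_weather` tool are illustrative assumptions, so adjust them to your own tabbyAPI config:

```python
import json
import urllib.request

# Endpoint path follows the OpenAI-compatible API; port is an assumption,
# check the `network` section of your tabbyAPI config.yml.
URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "model": "Qwen3-Coder-Next-exl3",  # whatever model is loaded in tabbyAPI
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool for illustration
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload).encode()
print(body[:60], b"...")  # inspect the serialized request

# Uncomment to send against a running tabbyAPI instance:
# req = urllib.request.Request(
#     URL, data=body,
#     headers={"Content-Type": "application/json",
#              "Authorization": "Bearer YOUR_API_KEY"})
# print(urllib.request.urlopen(req).read().decode())
```

If the model decides to call the tool, the response's `choices[0].message.tool_calls` field carries the function name and JSON arguments, per the OpenAI schema.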