These models are exl3 quantizations of [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct), which remains the state-of-the-art non-reasoning coder model as of today. It is still my go-to FIM (fill-in-the-middle) autocompletion model, even after the Qwen3 and Gemma3 releases.
I used [exllamav3 version 0.0.2](https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.2).
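For reference, here is a minimal sketch of assembling a FIM autocompletion prompt using Qwen2.5-Coder's standard FIM special tokens (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`); the code being completed is a hypothetical example:

```python
# Build a fill-in-the-middle (FIM) prompt: the model generates the text that
# belongs between `prefix` and `suffix`.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))\n",
)
```

Send `prompt` to the model as a raw completion (not a chat-templated request) and stop generation at the end-of-text token.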
## EXL3 Quantized Models
- [4.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/4.0bpw)
- [6.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/6.0bpw)
- [8.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/8.0bpw)
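Each quantization lives on its own git branch, so you can fetch a single bpw variant with `huggingface_hub` (a sketch; `snapshot_download`'s `revision` parameter selects the branch):

```python
# Download one bpw variant of the repo; `revision` names the branch.
# Requires `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

REPO_ID = "LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3"
BRANCH = "6.0bpw"  # one of: 4.0bpw, 6.0bpw, 8.0bpw

if __name__ == "__main__":
    local_dir = snapshot_download(repo_id=REPO_ID, revision=BRANCH)
    print(local_dir)  # path to the downloaded snapshot
```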
For coding, I found that a 6.0bpw or higher model (preferably 8.0bpw), with KV cache quantization at 6 bits or more, is much better than the 4.0bpw model.
If you use these models only for short autocompletions, 4.0bpw is usable.
## Credits
Thanks to the excellent work of the exllamav3 dev team.