LLMJapan committed · Commit 2e1d3ee · verified · 1 Parent(s): a7be4a0

Update README.md

Files changed (1): README.md +9 -0
README.md CHANGED
@@ -17,6 +17,15 @@ tags:
  These models are exl3 quantization models of [Qwen2.5-Coder-32B](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct), which is still the SOTA non-reasoning coder model as of today. This model is still my go-to FIM (fill-in-the-middle) autocompletion model, even after the Qwen3 and Gemma3 releases.
  I used [exllamav3 version 0.0.2](https://github.com/turboderp-org/exllamav3/releases/tag/v0.0.2).

+ ## EXL3 Quantized Models
+
+ [4.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/4.0bpw)
+ [6.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/6.0bpw)
+ [8.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/8.0bpw)
+
+ For coding, I found that a model above 6.0bpw, preferably 8.0bpw, combined with KV cache quantization (also above 6 bits), is much better than 4.0bpw.
+ If you use these models only for short autocompletion, 4.0bpw is usable.
+
  ## Credits

  Thanks to the excellent work of the exllamav3 dev team.
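
The bpw links in the diff point at per-quantization branches of this repo, so a single variant can be pulled by passing the branch name as `revision`. A minimal sketch using `huggingface_hub` (the `local_dir` value is just an example):

```python
from huggingface_hub import snapshot_download

# Each quantization level lives on its own branch (4.0bpw / 6.0bpw / 8.0bpw),
# so selecting the branch via `revision` downloads only that variant.
local_path = snapshot_download(
    repo_id="LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3",
    revision="6.0bpw",                       # branch name from the links above
    local_dir="Qwen2.5-Coder-32B-exl3-6.0",  # example destination directory
)
print(local_path)
```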
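
To act on the KV cache quantization advice, loading the quant with exllamav3 and a quantized cache looks roughly like the sketch below. This is an assumption-heavy sketch modeled on the examples in the exllamav3 repository; class and argument names (`Config`, `Model`, `Cache`, `CacheLayer_quant`, `k_bits`, `v_bits`) may differ between versions, so check the examples shipped with your release:

```python
# Assumption: API names follow the exllamav3 repo examples and may vary by version.
from exllamav3 import Config, Model, Cache, Tokenizer, Generator
from exllamav3.cache import CacheLayer_quant

config = Config.from_directory("Qwen2.5-Coder-32B-exl3-6.0")  # dir from the download sketch
model = Model.from_config(config)

# Quantized KV cache; the card recommends keeping it above 6 bits for coding.
cache = Cache(
    model,
    max_num_tokens=32768,
    layer_type=CacheLayer_quant,
    k_bits=8,
    v_bits=8,
)

model.load()
tokenizer = Tokenizer.from_config(config)
generator = Generator(model=model, cache=cache, tokenizer=tokenizer)

print(generator.generate(prompt="# quicksort in Python\n", max_new_tokens=200))
```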
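
Since the card's main use case is FIM autocompletion: Qwen2.5-Coder is trained with the `<|fim_prefix|>` / `<|fim_suffix|>` / `<|fim_middle|>` special tokens, so a fill-in-the-middle request is a raw completion prompt assembled as below (the code fragment is only an illustration):

```python
# Qwen2.5-Coder FIM format: the model generates the span that belongs
# between prefix and suffix, i.e. the text after <|fim_middle|>.
prefix = "def fibonacci(n):\n    a, b = 0, 1\n    "
suffix = "\n    return a\n"
fim_prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# Send `fim_prompt` as a plain completion (no chat template) and stop at the
# end-of-text token; the completion is the middle chunk to splice in.
```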