LLMJapan committed (verified)
Commit df66b16 · Parent(s): 2e1d3ee

Update README.md

Files changed (1): README.md (+3 -1)
README.md CHANGED
@@ -20,10 +20,12 @@ I used [exllamav3 version 0.0.2](https://github.com/turboderp-org/exllamav3/rele
 ## EXL3 Quantized Models
 
 [4.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/4.0bpw)
+
 [6.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/6.0bpw)
+
 [8.0bpw](https://huggingface.co/LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3/tree/8.0bpw)
 
-For coding, I found >6.0bpw or preferably 8.0bpw model with KV Cache Quantization (>6.0bpw) is much better than 4.0bpw.
+For coding, I found >6.0bpw or preferably 8.0bpw model with KV Cache Quantization (>Q6) is much better than 4.0bpw.
 If you are using these models only for short Auto Completion, 4.0bpw is usable.
 
 ## Credits
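
To fetch one of the quant branches linked above, a minimal sketch using huggingface_hub follows; the branch names come from the repo's `tree/` links, and the target directory is just an example path.

```python
# Minimal sketch: download one EXL3 quant branch of this repo.
# snapshot_download() fetches a full snapshot of the given revision (branch).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3",
    revision="8.0bpw",  # or "4.0bpw" / "6.0bpw", matching the branches above
    local_dir="Qwen2.5-Coder-32B-Instruct_exl3-8.0bpw",  # example path (assumption)
)
```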