datatab/Yugo45A-GPT-Quantized-GGUF

text-generation-inference

35.8 GB

1 contributor

History: 27 commits

Latest commit 7969851 (verified, almost 2 years ago): q4_k_s: Uses Q4_K for all tensors

| File | Size | Last commit message | Updated |
| --- | --- | --- | --- |
| .gitattributes | 2.61 kB | q4_k_s: Uses Q4_K for all tensors | almost 2 years ago |
| README.md | 267 Bytes | Update README.md | almost 2 years ago |
| Yugo45A-GPT-Quantized-GGUF.Q3_K_M.gguf | 3.52 GB | q3_k_m: Uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else Q3_K | almost 2 years ago |
| Yugo45A-GPT-Quantized-GGUF.Q4_K_M.gguf | 4.37 GB | q4_k_m: Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K | almost 2 years ago |
| Yugo45A-GPT-Quantized-GGUF.Q4_K_S.gguf | 4.14 GB | q4_k_s: Uses Q4_K for all tensors | almost 2 years ago |
| Yugo45A-GPT-Quantized-GGUF.Q5_0.gguf | 5 GB | q5_0: Higher accuracy, higher resource usage and slower inference. | almost 2 years ago |
| Yugo45A-GPT-Quantized-GGUF.Q5_K_M.gguf | 5.13 GB | q5_k_m: Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K | almost 2 years ago |
| Yugo45A-GPT-Quantized-GGUF.Q6_K.gguf | 5.94 GB | q6_k: Uses Q6_K for all tensors | almost 2 years ago |
| Yugo45A-GPT-Quantized-GGUF.Q8_0.gguf | 7.7 GB | q8_0: Fast conversion. High resource use, but generally acceptable. | almost 2 years ago |
| config.json | 31 Bytes | Create config.json | almost 2 years ago |
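
The listing above offers seven quantization variants with different file sizes, so choosing one usually comes down to available memory. The following is a minimal sketch of that choice, not something the repository provides: the sizes are copied from the listing, while `pick_quant`, `gguf_filename`, and the 1 GB `headroom_gb` default (a rough allowance for context and runtime overhead) are hypothetical names and values introduced for illustration.

```python
from typing import Optional

# File sizes in GB, copied from the repository listing.
QUANT_SIZES_GB = {
    "Q3_K_M": 3.52,
    "Q4_K_M": 4.37,
    "Q4_K_S": 4.14,
    "Q5_0": 5.0,
    "Q5_K_M": 5.13,
    "Q6_K": 5.94,
    "Q8_0": 7.7,
}

# Common filename prefix used by every .gguf file in the listing.
FILE_PREFIX = "Yugo45A-GPT-Quantized-GGUF"


def gguf_filename(quant: str) -> str:
    """Build the listed filename for a quantization label such as 'Q4_K_M'."""
    if quant not in QUANT_SIZES_GB:
        raise KeyError(f"unknown quantization variant: {quant}")
    return f"{FILE_PREFIX}.{quant}.gguf"


def pick_quant(budget_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest variant whose file fits in budget_gb minus headroom_gb.

    headroom_gb is an assumed allowance, not a measured requirement.
    Returns None when no variant fits.
    """
    usable = budget_gb - headroom_gb
    fitting = [q for q, size in QUANT_SIZES_GB.items() if size <= usable]
    if not fitting:
        return None
    # Larger files generally retain more precision, so prefer the largest that fits.
    return max(fitting, key=QUANT_SIZES_GB.get)


if __name__ == "__main__":
    choice = pick_quant(6.5)
    print(choice, gguf_filename(choice))
```

Picking the largest file that fits is only one heuristic; the commit messages mark Q4_K_M and Q5_K_M as recommended, so a caller might reasonably restrict the candidates to those two. The resulting filename could then be passed to a downloader such as `huggingface_hub.hf_hub_download`, with the repository id taken from the page header.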