They run at about 3-6 t/sec on CPU only using llama.cpp
And obviously faster on computers with potent GPUs.

ALL the models were quantized in this way:

```
python llama.cpp/convert_hf_to_gguf.py --outtype f16 model --outfile model.f16.gguf
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q5.gguf q5_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q6.gguf q6_k
quantize.exe --allow-requantize --output-tensor-type f16 --token-embedding-type f16 model.f16.gguf model.f16.q8.gguf q8_0
quantize.exe --allow-requantize --pure model.f16.gguf model.f16.q8_p.gguf q8_0
```
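The q5_k, q6_k, and q8_0 targets above are llama.cpp's block-wise quantization formats: weights are split into blocks, each stored as small integers plus a per-block scale. As a rough illustration only (this is NOT the actual GGUF/k-quant bit layout), the idea can be sketched in Python:

```python
# Simplified symmetric block quantization (illustration, not the real q5_k format):
# one scale per block, values rounded to a small signed-integer range.

def quantize_block(block, bits=5):
    # Map values to integers in [-(2^(bits-1)-1), 2^(bits-1)-1] via a shared scale.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in block) / qmax or 1.0
    q = [round(v / scale) for v in block]
    return scale, q

def dequantize_block(scale, q):
    # Recover approximate floats; error per value is bounded by scale / 2.
    return [scale * v for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.91, 0.05, 0.0, -0.2]
scale, q = quantize_block(weights)
recovered = dequantize_block(scale, q)
err = max(abs(a - b) for a, b in zip(weights, recovered))
print(f"max reconstruction error: {err:.4f}")
```

The `--output-tensor-type f16 --token-embedding-type f16` flags in the commands above keep the output and embedding tensors at full f16 precision while the rest of the model is quantized, whereas `--pure` quantizes everything uniformly.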

* [ZeroWw/Mistral-Nemo-Instruct-2407-GGUF](https://huggingface.co/ZeroWw/Mistral-Nemo-Instruct-2407-GGUF)
* [ZeroWw/L3-8B-Celeste-V1.2-GGUF](https://huggingface.co/ZeroWw/L3-8B-Celeste-V1.2-GGUF)