quantized_by: bartowski
---
# <b>Heads up:</b> CUDA offloading is currently broken unless you enable flash attention

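In practice that means passing llama.cpp's flash-attention flag alongside GPU layer offloading. A minimal sketch, assuming a `llama-cli` binary built from a commit of roughly this vintage; the GGUF filename and prompt are illustrative:

```shell
# -ngl 99 offloads all layers to the GPU; -fa enables flash attention,
# which this README says is required for CUDA offloading to work here.
# The model filename below is illustrative, not a file shipped verbatim.
./llama-cli -m Qwen2-72B-Instruct-Q4_K_M.gguf -ngl 99 -fa -p "Hello"
```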
## Llamacpp imatrix Quantizations of Qwen2-72B-Instruct

Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> commit <a href="https://github.com/ggerganov/llama.cpp/commit/ee459f40f65810a810151b24eba5b8bd174ceffe">ee459f40f65810a810151b24eba5b8bd174ceffe</a> for quantization.

Original model: https://huggingface.co/Qwen/Qwen2-72B-Instruct