EXL2 quants of [gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)

My quants are meant to be a tight fit in 24 GB VRAM. The following VRAM usage numbers assume **8k context**.

bpw|head|4 bit cache|16 bit cache|Notes
--:|--:|--:|--:|:--
[**5.8**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/5.8_h8)|**8 bit**|21.85 GB|**23.69 GB**|16 bit cache, but lower BPW
👉 [**6.5**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.5_h8)|**8 bit**|**23.81 GB**|25.65 GB|👈 my recommendation
[**6.6**](https://huggingface.co/mo137/gemma-2-27b-it-exl2/tree/6.6_h6)|6 bit|**23.86 GB**|25.70 GB|slightly higher BPW, but less precise head
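
The branch names in the table (e.g. `6.5_h8`) are plain git revisions on the Hub, so a single variant can be fetched on its own. A minimal sketch using `huggingface_hub`; the local directory name is only an example:

```python
from huggingface_hub import snapshot_download

# Download one quant variant; the revision is the branch name from the table above.
local_path = snapshot_download(
    repo_id="mo137/gemma-2-27b-it-exl2",
    revision="6.5_h8",
    local_dir="gemma-2-27b-it-exl2-6.5_h8",  # example path, pick your own
)
print(local_path)
```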

For this model the difference between a 6 bit and an 8 bit head is ~300 MB, which is not huge. It could be exchanged for about 0.1 bpw in the body.
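
To reproduce the 4 bit cache numbers above, the quant can be loaded with the Q4 cache at 8k context. This is only a sketch assuming a recent exllamav2 release with the dynamic generator; the model path is a placeholder:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("gemma-2-27b-it-exl2-6.5_h8")  # placeholder: path to the downloaded quant
config.max_seq_len = 8192                               # the 8k context assumed in the table

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)             # 4 bit cache; use ExLlamaV2Cache for 16 bit
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(
    model=model, cache=cache, tokenizer=tokenizer,
    paged=False,  # skip paged attention so flash-attn is not required
)
print(generator.generate(prompt="Hello", max_new_tokens=32))
```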

---

Check out turboderp's quants & measurement.json:

[3.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.0bpw)
[3.50 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/3.5bpw)
[4.00 bits per weight](https://huggingface.co/turboderp/gemma-2-27b-it-exl2/tree/4.0bpw)