---
inference: false
base_model: TheDrummer/Behemoth-R1-123B-v2
base_model_relation: quantized
tags:
- exl2
library_name: exllamav2
pipeline_tag: text-generation
---

exllamav2 quantizations of TheDrummer's [Behemoth-R1-123B-v2](https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2)

- [2.50bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/2.50bpw_H6) (Quantizing)
- [4.25bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/4.25bpw_H6) (61.324 GiB)
- [8.00bpw h8](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/8.00bpw_H8) (114.559 GiB)
- [measurement.json](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/resolve/main/measurement.json?download=true)
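
Each quantization lives on its own branch of this repo, so download by revision. A minimal sketch using `huggingface_hub` (the branch and local directory chosen here are illustrative):

```python
from huggingface_hub import snapshot_download

# Each quant is stored on its own branch; pass the branch name as `revision`.
snapshot_download(
    repo_id="MikeRoz/Behemoth-R1-123B-v2-exl2",
    revision="4.25bpw_H6",
    local_dir="Behemoth-R1-123B-v2-exl2-4.25bpw_H6",
)
```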

The 4.25bpw quant will squeeze into three 24 GB GPUs with 16k fp16 context, but can load with more than 64k of context across four 24 GB GPUs.
The 8.00bpw quant requires six 24 GB GPUs (or equivalent).
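
For reference, loading one of these quants with exllamav2's autosplit loader looks roughly like the sketch below. The model path and context cap are illustrative choices, and the exact `ExLlamaV2Config` constructor may vary between exllamav2 versions:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer

# Point at the downloaded quant; this path is an illustrative choice.
config = ExLlamaV2Config("Behemoth-R1-123B-v2-exl2-4.25bpw_H6")
config.max_seq_len = 16384  # cap context so weights + fp16 cache fit in 3x 24 GB

model = ExLlamaV2(config)

# A lazy cache plus load_autosplit spreads layers across all visible GPUs.
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
```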