exllamav2 quantizations of TheDrummer's [Behemoth-R1-123B-v2](https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2)

[2.25bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/2.25bpw_H6) (32.964 GiB) (uploading)

[4.25bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/4.25bpw_H6) (61.324 GiB)

[8.00bpw h8](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/8.00bpw_H8) (114.559 GiB)

[measurement.json](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/resolve/main/measurement.json?download=true)

The 2.25bpw quant will load with 28k fp16 context on two 24 GB GPUs, or 89k fp16 context on three 24 GB GPUs.

The 4.25bpw quant will squeeze onto three 24 GB GPUs with 16k fp16 context, but loads with 73k fp16 context on four 24 GB GPUs.

The 8.00bpw quant requires six 24 GB GPUs (or equivalent).
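As a rough sanity check on the sizes above, an exl2 quant's on-disk footprint is approximately the parameter count times the bits per weight (bpw). This is only a sketch: the 123B parameter count and the GiB figures are taken from the list above, and the head layers' separate precision (h6/h8) plus quantization metadata make the real files deviate slightly from the estimate.

```python
def quant_size_gib(params: float, bpw: float) -> float:
    """Approximate quant size in GiB: params * bpw bits -> bytes -> GiB."""
    return params * bpw / 8 / 2**30

PARAMS = 123e9  # Behemoth-R1-123B-v2 parameter count

# Actual sizes are the repo's published figures; estimates land within ~1 GiB.
for bpw, actual in [(2.25, 32.964), (4.25, 61.324), (8.00, 114.559)]:
    est = quant_size_gib(PARAMS, bpw)
    print(f"{bpw:.2f} bpw: ~{est:.1f} GiB estimated (published: {actual} GiB)")
```

The same arithmetic is a quick way to predict whether a given bpw target will fit a VRAM budget before downloading, keeping in mind that KV cache and activations need headroom on top of the weights.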