---
inference: false
base_model: TheDrummer/Behemoth-R1-123B-v2
base_model_relation: quantized
tags:
- exl2
library_name: exllamav2
pipeline_tag: text-generation
---

exllamav2 quantizations of TheDrummer's [Behemoth-R1-123B-v2](https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2)

- [2.50bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/2.50bpw_H6) (Quantizing)
- [4.25bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/4.25bpw_H6) (61.324 GiB)
- [8.00bpw h8](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/8.00bpw_H8) (114.559 GiB)
- [measurement.json](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/resolve/main/measurement.json?download=true)
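
Each quant lives on its own branch of this repo, so download the branch you want rather than `main`. A minimal sketch using `huggingface_hub` (the local directory name is just an example):

```python
from huggingface_hub import snapshot_download

# Fetch only the 4.25bpw_H6 branch of this repo; substitute any branch name from the list above.
model_dir = snapshot_download(
    repo_id="MikeRoz/Behemoth-R1-123B-v2-exl2",
    revision="4.25bpw_H6",
    local_dir="Behemoth-R1-123B-v2-exl2-4.25bpw",  # example path
)
print(model_dir)
```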

The 4.25bpw quant will squeeze into three 24 GB GPUs with 16k of fp16 context, but can load with more than 64k of context on four 24 GB GPUs.
The 8.00bpw quant requires six 24 GB GPUs (or equivalent).
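
A minimal sketch of loading a downloaded quant with exllamav2's Python API, letting it split the weights across whatever GPUs are visible and capping the fp16 cache at 16k context (the model path and prompt are placeholders):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("Behemoth-R1-123B-v2-exl2-4.25bpw")  # local model directory (example path)
model = ExLlamaV2(config)

# 16k tokens of fp16 cache; raise max_seq_len if you have the VRAM for more context.
cache = ExLlamaV2Cache(model, max_seq_len=16384, lazy=True)
model.load_autosplit(cache)  # spread layers across all visible GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Hello,", max_new_tokens=64, add_bos=True))
```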