MikeRoz committed
Commit 7f8ca8b · verified · 1 parent: 01c7f6e

Update README.md

Files changed (1): README.md (+0 -3)

README.md CHANGED
@@ -12,10 +12,7 @@ exllamav2 quantizations of TheDrummer's [Behemoth-R1-123B-v2](https://huggingfac
 
  [2.25bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/2.25bpw_H6) (32.964 GiB)
  [4.25bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/4.25bpw_H6) (61.324 GiB)
- [5.00bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/5.00bpw_H6) (71.959 GiB)
- [8.00bpw h8](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/8.00bpw_H8) (114.559 GiB)
  [measurement.json](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/resolve/main/measurement.json?download=true)
 
  The 2.25bpw quant will load with 28k fp16 context on 2 24 GB GPUs, or 89k fp16 context on 3 24 GB GPUs.
  The 4.25bpw quant will squeeze into 3 24 GB GPUs with 16k fp16 context, but can load with 73k fp16 context on 4 24 GB GPUs.
- The 8.00bpw quant requires 6 24 GB GPUs (or equivalent).
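For reference, the branches that remain in the README follow the usual exllamav2 loading pattern. The sketch below is not from this repo's documentation: the model directory is a placeholder for a local clone of the 2.25bpw_H6 branch, and the 28k context figure is taken from the README's two-GPU claim; everything else is generic exllamav2 usage.

```python
# Minimal sketch, assuming a local copy of the 2.25bpw_H6 branch at model_dir.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer

model_dir = "/models/Behemoth-R1-123B-v2-exl2-2.25bpw_H6"  # placeholder path

config = ExLlamaV2Config(model_dir)
config.max_seq_len = 28 * 1024           # ~28k fp16 context, per the README's 2x 24 GB figure

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # fp16 KV cache, allocated as layers load
model.load_autosplit(cache)               # spread weights across all visible GPUs automatically

tokenizer = ExLlamaV2Tokenizer(config)
```

A branch can be fetched as its own directory with, for example, `huggingface-cli download MikeRoz/Behemoth-R1-123B-v2-exl2 --revision 2.25bpw_H6 --local-dir <model_dir>`.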