---
inference: false
base_model: TheDrummer/Behemoth-R1-123B-v2
base_model_relation: quantized
tags:
- exl2
library_name: exllamav2
pipeline_tag: text-generation
---
exllamav2 quantizations of TheDrummer's [Behemoth-R1-123B-v2](https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2)
- [2.25bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/2.25bpw_H6) (32.964 GiB)
- [4.25bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/4.25bpw_H6) (61.324 GiB)
- [5.00bpw h6](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/5.00bpw_H6) (71.959 GiB)
- [8.00bpw h8](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/tree/8.00bpw_H8) (114.559 GiB)

[measurement.json](https://huggingface.co/MikeRoz/Behemoth-R1-123B-v2-exl2/resolve/main/measurement.json?download=true)
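Each quant lives on its own branch, so download just the one you want. A minimal sketch using `huggingface_hub` (the `local_dir` name here is arbitrary, not part of this repo):

```python
from huggingface_hub import snapshot_download

# Fetch a single quant by branch name; the bpw links above map to branches.
snapshot_download(
    repo_id="MikeRoz/Behemoth-R1-123B-v2-exl2",
    revision="2.25bpw_H6",
    local_dir="Behemoth-R1-123B-v2-exl2-2.25bpw",
)
```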
The 2.25bpw quant will load with 28k tokens of fp16 context on two 24 GB GPUs, or 89k on three 24 GB GPUs.
The 4.25bpw quant will squeeze onto three 24 GB GPUs with 16k of fp16 context, but can load with 73k of fp16 context on four 24 GB GPUs.
The 8.00bpw quant requires six 24 GB GPUs (or equivalent).
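For reference, a minimal loading sketch using exllamav2's autosplit, assuming the 2.25bpw quant was downloaded as above. The `max_seq_len` value is an example matching the two-GPU figure; adjust it to your VRAM budget, and note the exact API may vary between exllamav2 versions:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Hypothetical local path; point this at the downloaded quant directory.
model_dir = "Behemoth-R1-123B-v2-exl2-2.25bpw"

config = ExLlamaV2Config(model_dir)
config.max_seq_len = 28672  # ~28k tokens, per the two-GPU figure above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # fp16 cache, allocated during autosplit
model.load_autosplit(cache)               # spreads layers across all visible GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

print(generator.generate(prompt="Hello,", max_new_tokens=64))
```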