exllamav3 quantizations of TheDrummer/Anubis-70B-v1.2. Quantized using commit 144d826 of the exllamav3 dev branch (the version was bumped to v0.2.23 in the following commit).
| Quant | Size | KL Divergence (KLD) | Perplexity (PPL) | GPU Requirement Hint |
|---|---|---|---|---|
| 2.00bpw h6 | 18.672 GiB | 0.785939 | 7.183849 | 1x24GB with 20,480 FP16 Context |
| 2.15bpw h6 ("Optimized") | 19.842 GiB | 0.713115 | 6.743676 | 1x24GB with 16,384 FP16 Context |
| 3.00bpw h6 | 26.641 GiB | 0.297314 | 5.491453 | 2x24GB with 69,632 FP16 Context |
| 4.00bpw h6 | 34.610 GiB | 0.097188 | 5.083009 | 2x24GB with 40,960 FP16 Context |
| 4.25bpw h6 ("Optimized") | 36.575 GiB | 0.072762 | 5.063066 | 2x24GB with 32,768 FP16 Context |
| 5.00bpw h6 | 42.579 GiB | 0.023965 | 4.975570 | 2x24GB with 16,384 FP16 Context |
| 6.00bpw h6 | 50.547 GiB | 0.023965 | 4.933002 | 3x24GB with 57,344 FP16 Context |
| 8.00bpw h8 | 66.730 GiB | 0.000850 | 4.929197 | 3x24GB with 8,192 FP16 Context or 4x24GB with 81,920 FP16 Context |
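Most of the headroom behind the context figures in the last column goes to the FP16 KV cache. Below is a rough back-of-the-envelope estimator of my own (not part of the quantization pipeline), using Llama-3.3-70B's attention geometry (80 layers, 8 KV heads, head dim 128). It ignores activation buffers and framework overhead, so treat the output as a ballpark rather than a guarantee:

```python
GIB = 1024**3

def kv_cache_gib(context_len: int,
                 n_layers: int = 80,
                 n_kv_heads: int = 8,
                 head_dim: int = 128,
                 bytes_per_elem: int = 2) -> float:
    """FP16 K+V cache size in GiB for a given context length."""
    # K and V each store n_kv_heads * head_dim values per layer per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token / GIB

def fits(weights_gib: float, context_len: int, n_gpus: int,
         gib_per_gpu: float = 24.0) -> bool:
    """Crude check: quantized weights plus KV cache vs. total VRAM."""
    return weights_gib + kv_cache_gib(context_len) <= n_gpus * gib_per_gpu

# Sanity check against the 5.00bpw row above:
print(f"{kv_cache_gib(16384):.2f} GiB")  # 5.00 GiB of cache at 16,384 tokens
print(fits(42.579, 16384, 2))            # True, just barely
```

At 16,384 tokens the cache comes to exactly 5 GiB, which together with the 42.579 GiB of 5.00bpw weights just squeezes into 2x24GB; real-world limits also depend on driver and framework overhead.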
Thought I'd try my hand at applying the optimization techniques used for some recent large MoE models to this Llama-3.3 finetune. As you can see from the table above, the quality scaling turned out to be unremarkably linear.
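To pull a single quant down for local use, here's a minimal sketch with huggingface_hub. The revision names are an assumption on my part, inferred from the bitrate labels in the table; check the repo's branch list on the Hub for the actual names:

```python
from huggingface_hub import snapshot_download

# Assumed: each quant lives in its own branch named after its bitrate,
# e.g. "4.00bpw_h6". Verify against the repo's branch list first.
local_path = snapshot_download(
    repo_id="MikeRoz/Anubis-70B-v1.2-exl3",
    revision="4.00bpw_h6",   # assumed branch name
    local_dir="Anubis-70B-v1.2-exl3-4.00bpw-h6",
)
print(f"Downloaded to {local_path}")
```

From there, point exllamav3 (or any frontend with EXL3 support) at the downloaded directory.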
Model tree for MikeRoz/Anubis-70B-v1.2-exl3: meta-llama/Llama-3.1-70B (base model) → meta-llama/Llama-3.3-70B-Instruct (finetune) → TheDrummer/Anubis-70B-v1.2 (finetune, quantized here).