exllamav3 quantizations of TheDrummer/Anubis-70B-v1.2. Quantized using commit 144d826 of the dev branch (version was bumped to v0.2.23 in the next commit).

| Quant | Size | KLD | PPL | GPU Requirement Hint |
| --- | --- | --- | --- | --- |
| 2.00bpw h6 | 18.672 GiB | 0.785939 | 7.183849 | 1x24GB with 20,480 FP16 context |
| 2.15bpw h6 ("Optimized") | 19.842 GiB | 0.713115 | 6.743676 | 1x24GB with 16,384 FP16 context |
| 3.00bpw h6 | 26.641 GiB | 0.297314 | 5.491453 | 2x24GB with 69,632 FP16 context |
| 4.00bpw h6 | 34.610 GiB | 0.097188 | 5.083009 | 2x24GB with 40,960 FP16 context |
| 4.25bpw h6 ("Optimized") | 36.575 GiB | 0.072762 | 5.063066 | 2x24GB with 32,768 FP16 context |
| 5.00bpw h6 | 42.579 GiB | 0.023965 | 4.975570 | 2x24GB with 16,384 FP16 context |
| 6.00bpw h6 | 50.547 GiB | 0.023965 | 4.933002 | 3x24GB with 57,344 FP16 context |
| 8.00bpw h8 | 66.730 GiB | 0.000850 | 4.929197 | 3x24GB with 8,192 FP16 context, or 4x24GB with 81,920 FP16 context |
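The context figures in the "GPU Requirement Hint" column follow from simple KV-cache arithmetic: whatever VRAM the weights leave free on the cards is what the FP16 cache can fill. A rough sketch of that estimate, assuming the published Llama-3 70B architecture (80 layers, 8 KV heads via GQA, head dim 128) since this is a Llama-3.3 finetune:

```python
# Rough KV-cache VRAM estimate for a Llama-3.3-70B-class model.
# Architecture numbers (80 layers, 8 KV heads, head dim 128) are
# assumptions taken from the published Llama-3 70B config.

BYTES_FP16 = 2

def kv_cache_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128,
                             dtype_bytes=BYTES_FP16):
    # Both K and V are cached for every layer.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

def cache_gib(context_len):
    return kv_cache_bytes_per_token() * context_len / 1024**3

# e.g. the 5.00bpw row: 42.579 GiB of weights plus a 16,384-token cache
print(f"{cache_gib(16_384):.2f} GiB")  # 5.00 GiB
```

Under these assumptions, 42.579 GiB of weights plus a 5 GiB cache sits just under the 48 GiB of a 2x24GB setup, matching the table row.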

Thought I'd try my hand at applying the optimization techniques used for some recent large MoE models to this Llama-3.3 finetune. As the table shows, the quality scaling turned out to be unremarkably linear.
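For readers unfamiliar with the KLD column: it is presumably the mean per-token KL divergence between the full-precision model's next-token distribution and the quant's, so lower is better and 0 would mean identical outputs. A minimal illustration of that measurement for a single token position (not exllamav3's actual evaluation code):

```python
import math

def softmax(logits):
    # Numerically stable softmax over one logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(ref_logits, quant_logits):
    # KL(P_ref || P_quant) for one next-token distribution.
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero divergence; quantization error shows up
# as a small positive value, averaged over many token positions.
print(kl_divergence([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # ~0.0
print(kl_divergence([2.0, 1.0, 0.1], [1.8, 1.1, 0.3]))  # small positive
```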

KLD Chart


Model tree for MikeRoz/Anubis-70B-v1.2-exl3
