exllamav3 quantizations of TheDrummer/Anubis-70B-v1.2. Quantized using commit 144d826 of the exllamav3 dev branch (the version was bumped to v0.2.23 in the following commit).
| Quant | Size | KL Divergence (KLD) | Perplexity (PPL) | GPU Requirement Hint |
|---|---|---|---|---|
| 2.00bpw h6 | 18.672 GiB | 0.785939 | 7.183849 | 1x24GB with 20,480 FP16 Context |
| 2.15bpw h6 ("Optimized") | 19.842 GiB | 0.713115 | 6.743676 | 1x24GB with 16,384 FP16 Context |
| 3.00bpw h6 | 26.641 GiB | 0.297314 | 5.491453 | 2x24GB with 69,632 FP16 Context |
| 4.00bpw h6 | 34.610 GiB | 0.097188 | 5.083009 | 2x24GB with 40,960 FP16 Context |
| 4.25bpw h6 ("Optimized") | 36.575 GiB | 0.072762 | 5.063066 | 2x24GB with 32,768 FP16 Context |
| 5.00bpw h6 | 42.579 GiB | 0.023965 | 4.975570 | 2x24GB with 16,384 FP16 Context |
| 6.00bpw h6 | 50.547 GiB | 0.023965 | 4.933002 | 3x24GB with 57,344 FP16 Context |
| 8.00bpw h8 | 66.730 GiB | 0.000850 | 4.929197 | 3x24GB with 8,192 FP16 Context or 4x24GB with 81,920 FP16 Context |
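Most of the headroom behind the context figures in the last column goes to the FP16 KV cache. Below is a rough back-of-the-envelope estimator of my own (not part of the quantization pipeline), using Llama-3.3-70B's attention geometry (80 layers, 8 KV heads, head dim 128). It ignores activation buffers and framework overhead, so treat the output as a ballpark rather than a guarantee:

```python
GIB = 1024**3

def kv_cache_gib(context_len: int,
                 n_layers: int = 80,
                 n_kv_heads: int = 8,
                 head_dim: int = 128,
                 bytes_per_elem: int = 2) -> float:
    """FP16 K+V cache size in GiB for a given context length."""
    # K and V each store n_kv_heads * head_dim values per layer per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token / GIB

def fits(weights_gib: float, context_len: int, n_gpus: int,
         gib_per_gpu: float = 24.0) -> bool:
    """Crude check: quantized weights plus KV cache vs. total VRAM."""
    return weights_gib + kv_cache_gib(context_len) <= n_gpus * gib_per_gpu

# Sanity check against the 5.00bpw row above:
print(f"{kv_cache_gib(16384):.2f} GiB")  # 5.00 GiB of cache at 16,384 tokens
print(fits(42.579, 16384, 2))            # True, just barely
```

At 16,384 tokens the cache comes to exactly 5 GiB, which together with the 42.579 GiB of 5.00bpw weights just squeezes into 2x24GB; real-world limits also depend on driver and framework overhead.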
Thought I'd try my hand at applying the optimization techniques used for some recent large MoE models to this Llama-3.3 finetune. As you can see from the table above, the quality scaling turned out to be unremarkably linear.
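To pull a single quant down for local use, here's a minimal sketch with huggingface_hub. The revision names are an assumption on my part, inferred from the bitrate labels in the table; check the repo's branch list on the Hub for the actual names:

```python
from huggingface_hub import snapshot_download

# Assumed: each quant lives in its own branch named after its bitrate,
# e.g. "4.00bpw_h6". Verify against the repo's branch list first.
local_path = snapshot_download(
    repo_id="MikeRoz/Anubis-70B-v1.2-exl3",
    revision="4.00bpw_h6",   # assumed branch name
    local_dir="Anubis-70B-v1.2-exl3-4.00bpw-h6",
)
print(f"Downloaded to {local_path}")
```

From there, point exllamav3 (or any frontend with EXL3 support) at the downloaded directory.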
Model tree for MikeRoz/Anubis-70B-v1.2-exl3: meta-llama/Llama-3.1-70B (base model) → meta-llama/Llama-3.3-70B-Instruct (finetune) → TheDrummer/Anubis-70B-v1.2 (finetune, quantized here).