# Model Card for ealexeev/TheDrummer-Anubis-70B-v1.2-NVFP4
This is an NVFP4 quantization of TheDrummer/Anubis-70B-v1.2.
## Quantization Details
Quantized using the script from https://github.com/ealexeev/llm-quantization.

Calibration dataset size: 1024

Calibration data:
- HuggingFaceH4/ultrachat_200k
- allenai/c4_en
- mrcedric98/fiction_books_v8
These were shuffled and mixed at a ratio of 3:2:3.
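A minimal sketch of the 3:2:3 mixing step, assuming the script samples proportionally from the three sources and shuffles with a fixed seed (function and variable names here are illustrative, not the script's actual internals):

```python
import random

def mix_calibration(ultra, c4, fiction, total=1024, weights=(3, 2, 3), seed=42):
    """Sample from three sources at the given ratio, then shuffle the result."""
    denom = sum(weights)
    counts = [total * w // denom for w in weights]
    counts[0] += total - sum(counts)  # fold any rounding remainder into the first source
    rng = random.Random(seed)
    mixed = (rng.sample(ultra, counts[0])      # ultrachat_200k share
             + rng.sample(c4, counts[1])       # c4_en share
             + rng.sample(fiction, counts[2])) # fiction_books_v8 share
    rng.shuffle(mixed)
    return mixed
```

With `total=1024` and weights `(3, 2, 3)` this yields 384 / 256 / 384 examples per source.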
### Procedure

```bash
python ./quantize_nvfp4.py \
  --model TheDrummer/Anubis-70B-v1.2 \
  --output ./TheDrummer/Anubis-70B-v1.2 \
  --size 1024 --seed 42 \
  --ultra_chat 3 --c4_en 2 --fiction_v8 3
```
The vLLM docs note that NVFP4 quantization needs very few calibration samples, so I also ran quants with 32, 64, 128, 256, and 512 samples. This 1024-sample version hit the sweet spot on these particular evals.
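The sample-size sweep can be sketched as a small command builder; the flags mirror the command above, but the output directories are illustrative:

```python
SIZES = [32, 64, 128, 256, 512, 1024]

def quantize_cmd(size):
    """Build the quantize_nvfp4.py invocation for one calibration size
    (the ./quants/ output layout is hypothetical)."""
    return ("python ./quantize_nvfp4.py "
            "--model TheDrummer/Anubis-70B-v1.2 "
            f"--output ./quants/size-{size} "
            f"--size {size} --seed 42 "
            "--ultra_chat 3 --c4_en 2 --fiction_v8 3")

for size in SIZES:
    print(quantize_cmd(size))  # pipe the output to `sh` (or use subprocess) to run the sweep
```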
## Quantization Evals
For the perplexity metrics, lower is better, so a positive delta there is a regression.

| Metric | Base Model (BF16) | NVFP4 (Quantized) | Delta |
|---|---|---|---|
| ARC Challenge (Logic/Reasoning) | 0.61 | 0.5887 | -3.5% |
| IFEval (Strict Instruction Following) | 0.57 | 0.536 | -6% |
| HellaSwag (Flow/Common Sense) | 2.813 | 2.996 | +6.5% |
| Wikitext (Word Perplexity) | 5.318 | 6.7278 | +26.5% |
| Lambada (Perplexity) | 0.6671 | 0.6464 | -3.1% |
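The Delta column is the relative change from the BF16 baseline; as a check, the values from the table above reproduce the listed percentages:

```python
def rel_delta(base, quant):
    """Relative change of the quantized score vs. the BF16 baseline, in percent."""
    return 100.0 * (quant - base) / base

print(f"{rel_delta(0.61, 0.5887):+.1f}%")   # ARC Challenge -> -3.5%
print(f"{rel_delta(5.318, 6.7278):+.1f}%")  # Wikitext perplexity -> +26.5%
```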
## Bias, Risks, and Limitations
This is already a creative-writing fine-tune, and it was quantized with that use case in mind. Probably not gonna pass any leet-coder challenges with this one.
## How To Use
```bash
# Comments can't follow a trailing backslash (they break line continuation),
# so they go here instead:
#   --tensor-parallel-size 1     -> single GPU
#   --gpu-memory-utilization 0.8 -> otherwise vLLM claims all VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Anubis-70B-v1.2-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8
```
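Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal chat-completions request body looks like this (the prompt is just an example; the endpoint path and port are vLLM's defaults):

```python
import json

# Payload for a POST to http://localhost:8000/v1/chat/completions
# (vLLM's default OpenAI-compatible endpoint).
payload = {
    "model": "ealexeev/TheDrummer-Anubis-70B-v1.2-NVFP4",
    "messages": [
        {"role": "user", "content": "Write the opening line of a noir story."}
    ],
    "max_tokens": 128,
    "temperature": 0.8,
}
print(json.dumps(payload, indent=2))
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at `http://localhost:8000/v1`) can send this request.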
## Model Tree

- Base model: meta-llama/Llama-3.1-70B
- Fine-tuned from it: meta-llama/Llama-3.3-70B-Instruct
- Fine-tuned from that: TheDrummer/Anubis-70B-v1.2