Model Card for ealexeev/TheDrummer-Anubis-70B-v1.2-NVFP4

This is an NVFP4 quantization of TheDrummer/Anubis-70B-v1.2.

Quantization Details

Quantized using the https://github.com/ealexeev/llm-quantization script.

Calibration dataset size: 1024

Calibration data:

  • HuggingFaceH4/ultrachat_200k
  • allenai/c4_en
  • mrcedric98/fiction_books_v8

These were shuffled and mixed at a 3:2:3 ratio.
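The sampling step can be sketched as follows. This is an illustrative stand-in for the linked quantization script, not its actual code; the function name, toy data, and proportional-rounding logic are assumptions.

```python
import random

def mix_calibration(sources, weights, total, seed=42):
    """Draw `total` samples from `sources` in proportion to `weights`,
    then shuffle the combined pool (hypothetical sketch of a
    shuffle-and-mix calibration set builder)."""
    rng = random.Random(seed)
    pool = []
    weight_sum = sum(weights)
    for samples, w in zip(sources, weights):
        n = round(total * w / weight_sum)
        pool.extend(rng.sample(samples, n))
    rng.shuffle(pool)
    return pool

# Toy stand-ins for ultrachat_200k, c4_en, and fiction_books_v8 samples.
ultra   = [f"ultra-{i}" for i in range(2000)]
c4      = [f"c4-{i}" for i in range(2000)]
fiction = [f"fic-{i}" for i in range(2000)]

# A 3:2:3 split of 1024 samples works out to 384 / 256 / 384.
calib = mix_calibration([ultra, c4, fiction], weights=[3, 2, 3], total=1024)
```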

Procedure

```bash
python ./quantize_nvfp4.py --model TheDrummer/Anubis-70B-v1.2 --output ./TheDrummer/Anubis-70B-v1.2 --size 1024 --seed 42 --ultra_chat 3 --c4_en 2 --fiction_v8 3
```

The vLLM docs note that NVFP4 quantization needs very few calibration samples. I ran multiple quants with 32, 64, 128, 256, and 512 samples; this 1024-sample version hit the sweet spot on these particular evals.

Quantization Evals

| Metric | Base Model (BF16) | NVFP4 (Quantized) | Delta |
|---|---|---|---|
| ARC Challenge (Logic/Reasoning) | 0.61 | 0.5887 | -3.5% |
| IFEval (Strict Instruction Following) | 0.57 | 0.536 | -6% |
| HellaSwag (Flow/Common Sense) | 2.813 | 2.996 | +6.5% |
| Wikitext (Word Perplexity) | 5.318 | 6.7278 | +26.5% |
| Lambada (Perplexity) | 0.6671 | 0.6464 | -3.1% |
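The Delta column is the relative change of the quantized score against the BF16 base. A minimal check of that arithmetic, using the numbers from the table above:

```python
def pct_delta(base, quant):
    """Relative change of the quantized score vs. the BF16 base, in percent."""
    return (quant - base) / base * 100

# Scores copied from the eval table above.
scores = {
    "arc_challenge": (0.61, 0.5887),
    "ifeval_strict": (0.57, 0.536),
    "hellaswag":     (2.813, 2.996),
    "wikitext_ppl":  (5.318, 6.7278),
    "lambada":       (0.6671, 0.6464),
}

for name, (base, quant) in scores.items():
    print(f"{name}: {pct_delta(base, quant):+.1f}%")
```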

Bias, Risks, and Limitations

This is already a creative-writing fine-tune, and it was quantized with that use case in mind. Probably not gonna pass any leet-coder challenges with this one.

How To Use

```bash
# --tensor-parallel-size 1: single GPU
# --gpu-memory-utilization 0.8: otherwise vLLM reserves nearly all VRAM for KV cache
vllm serve ealexeev/TheDrummer-Anubis-70B-v1.2-NVFP4 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.8
```
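Once the server is up, vLLM exposes an OpenAI-compatible API. A minimal sketch of querying it, assuming the default port 8000 (the `BASE_URL`, prompt, and sampling parameters are placeholders to adjust for your deployment):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # adjust host/port to your deployment

# Request body for the OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "ealexeev/TheDrummer-Anubis-70B-v1.2-NVFP4",
    "messages": [
        {"role": "user", "content": "Write the opening line of a gothic novel."}
    ],
    "max_tokens": 128,
    "temperature": 0.8,
}

def chat(payload):
    """POST the request to the running vLLM server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```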