Model Card for ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4

This is an NVFP4 quantization of TheDrummer/Behemoth-X-123B-v2.

Quantization Details

Quantized with the script from https://github.com/ealexeev/llm-quantization.

Calibration dataset size: 1024

Calibration data:

  • HuggingFaceH4/ultrachat_200k
  • allenai/c4_en
  • mrcedric98/fiction_books_v8

These were shuffled and mixed at a 3:2:3 ratio (ultrachat : c4_en : fiction), matching the --ultra_chat, --c4_en, and --fiction_v8 flags below.
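As a sketch of the mixing step (the datasets and the 3:2:3 ratio are from above; the helper and the toy sample lists are hypothetical, not the actual script's code), a seeded ratio mix could look like:

```python
import random

def mix_calibration(datasets, ratios, size, seed=42):
    """Take a proportional share of each dataset, then shuffle the
    combined pool with a fixed seed for reproducibility."""
    assert len(datasets) == len(ratios)
    total = sum(ratios)
    mixed = []
    for ds, r in zip(datasets, ratios):
        take = size * r // total          # this dataset's share of the budget
        mixed.extend(ds[:take])
    random.Random(seed).shuffle(mixed)    # seeded shuffle, as with --seed 42
    return mixed[:size]

# Toy stand-ins for ultrachat_200k, c4_en, and fiction_books_v8 samples
ultra = [f"ultra-{i}" for i in range(2000)]
c4 = [f"c4-{i}" for i in range(2000)]
fiction = [f"fic-{i}" for i in range(2000)]

samples = mix_calibration([ultra, c4, fiction], ratios=[3, 2, 3], size=1024)
```

With a 3:2:3 ratio and a 1024-sample budget, this yields 384 + 256 + 384 samples.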

Procedure

```bash
python ./quantize_nvfp4.py \
    --model TheDrummer/Behemoth-X-123B-v2 \
    --output ./TheDrummer/Behemoth-X-123B-v2 \
    --size 1024 \
    --seed 42 \
    --ultra_chat 3 \
    --c4_en 2 \
    --fiction_v8 3
```

The vLLM docs note that NVFP4 quantization needs very few calibration samples. In my experience, 1024 is a good number for larger models.
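For intuition about the target format: NVFP4 stores weights as 4-bit FP4 (E2M1) values with a shared FP8 (E4M3) scale per 16-element block. A simplified sketch of rounding one block to the E2M1 grid (scale handling reduced to a plain float; this is an illustration, not the actual quantizer):

```python
# Representable magnitudes of FP4 E2M1 (the sign bit is handled separately)
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Fake-quantize one 16-value block: pick a scale so the largest
    magnitude maps to 6.0 (the top E2M1 value), round every entry to the
    nearest grid point, then rescale back. Real NVFP4 additionally stores
    the block scale itself in FP8 E4M3."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)
    scale = amax / 6.0
    out = []
    for x in block:
        mag = min(E2M1, key=lambda g: abs(abs(x) / scale - g))
        out.append(mag * scale if x >= 0 else -mag * scale)
    return out

block = [0.01, -0.2, 0.5, 1.2, -3.0, 0.0, 2.4, 0.7,
         -0.05, 0.9, -1.5, 0.3, 2.9, -0.6, 0.15, 1.0]
deq = quantize_block(block)
```

Because the scale is chosen per small block, one outlier only distorts its own 16 neighbors, which is part of why NVFP4 calibrates well with modest sample counts.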

Quantization Evals

| Metric                                | NVFP4  |
| ------------------------------------- | ------ |
| ARC Challenge (Logic/Reasoning)       | 0.6391 |
| IFEval (Strict Instruction Following) | 0.7431 |
| HellaSwag (Flow/Common Sense)         | 0.6961 |
| Wikitext (Word Perplexity)            | 4.5030 |
| Lambada (Perplexity)                  | 2.4814 |
| Winogrande                            | 0.8089 |

Bias, Risks, and Limitations

The base model is already a creative fine-tune, and it was quantized with that use case in mind. Probably not going to pass any leet-coder challenges with this one.

How To Use

```bash
# --tensor-parallel-size 1: single GPU
# --gpu-memory-utilization 0.8: otherwise vLLM claims nearly all VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.8
```
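Once running, the server exposes an OpenAI-compatible API (on port 8000 by default). A minimal request sketch using only the standard library (the model name is from above; the prompt and sampling settings are arbitrary):

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build an OpenAI-style chat completion request for the vLLM server."""
    payload = {
        "model": "ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.8,  # creative-writing model, so a higher temperature
    }
    return Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Write the opening paragraph of a gothic novel.")
# With the server running:
# with urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```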
Safetensors

Model size: 69B params
Tensor types: F32, BF16, F8_E4M3, U8
