Model Card for ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4

This is an NVFP4 quantization of TheDrummer/Behemoth-X-123B-v2.

Quantization Details

Quantized with the script from https://github.com/ealexeev/llm-quantization.

Calibration dataset size: 1024

Calibration data:

  • HuggingFaceH4/ultrachat_200k
  • allenai/c4_en
  • mrcedric98/fiction_books_v8

These were shuffled and mixed at a 3:2:3 ratio (ultrachat : c4_en : fiction), matching the --ultra_chat, --c4_en, and --fiction_v8 flags below.
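As a sketch of the mixing step (the datasets and the 3:2:3 ratio are from above; the helper and the toy sample lists are hypothetical, not the actual script's code), a seeded ratio mix could look like:

```python
import random

def mix_calibration(datasets, ratios, size, seed=42):
    """Take a proportional share of each dataset, then shuffle the
    combined pool with a fixed seed for reproducibility."""
    assert len(datasets) == len(ratios)
    total = sum(ratios)
    mixed = []
    for ds, r in zip(datasets, ratios):
        take = size * r // total          # this dataset's share of the budget
        mixed.extend(ds[:take])
    random.Random(seed).shuffle(mixed)    # seeded shuffle, as with --seed 42
    return mixed[:size]

# Toy stand-ins for ultrachat_200k, c4_en, and fiction_books_v8 samples
ultra = [f"ultra-{i}" for i in range(2000)]
c4 = [f"c4-{i}" for i in range(2000)]
fiction = [f"fic-{i}" for i in range(2000)]

samples = mix_calibration([ultra, c4, fiction], ratios=[3, 2, 3], size=1024)
```

With a 3:2:3 ratio and a 1024-sample budget, this yields 384 + 256 + 384 samples.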

Procedure

```bash
python ./quantize_nvfp4.py \
    --model TheDrummer/Behemoth-X-123B-v2 \
    --output ./TheDrummer/Behemoth-X-123B-v2 \
    --size 1024 \
    --seed 42 \
    --ultra_chat 3 \
    --c4_en 2 \
    --fiction_v8 3
```

The vLLM docs note that NVFP4 quantization needs very few calibration samples. In my experience, 1024 is a good number for larger models.
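For intuition about the target format: NVFP4 stores weights as 4-bit FP4 (E2M1) values with a shared FP8 (E4M3) scale per 16-element block. A simplified sketch of rounding one block to the E2M1 grid (scale handling reduced to a plain float; this is an illustration, not the actual quantizer):

```python
# Representable magnitudes of FP4 E2M1 (the sign bit is handled separately)
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Fake-quantize one 16-value block: pick a scale so the largest
    magnitude maps to 6.0 (the top E2M1 value), round every entry to the
    nearest grid point, then rescale back. Real NVFP4 additionally stores
    the block scale itself in FP8 E4M3."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)
    scale = amax / 6.0
    out = []
    for x in block:
        mag = min(E2M1, key=lambda g: abs(abs(x) / scale - g))
        out.append(mag * scale if x >= 0 else -mag * scale)
    return out

block = [0.01, -0.2, 0.5, 1.2, -3.0, 0.0, 2.4, 0.7,
         -0.05, 0.9, -1.5, 0.3, 2.9, -0.6, 0.15, 1.0]
deq = quantize_block(block)
```

Because the scale is chosen per small block, one outlier only distorts its own 16 neighbors, which is part of why NVFP4 calibrates well with modest sample counts.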

Quantization Evals

| Metric                                | NVFP4  |
| ------------------------------------- | ------ |
| ARC Challenge (Logic/Reasoning)       | 0.6391 |
| IFEval (Strict Instruction Following) | 0.7431 |
| HellaSwag (Flow/Common Sense)         | 0.6961 |
| Wikitext (Word Perplexity)            | 4.5030 |
| Lambada (Perplexity)                  | 2.4814 |
| Winogrande                            | 0.8089 |

Bias, Risks, and Limitations

The base model is already a creative fine-tune, and it was quantized with that use case in mind. Probably not going to pass any leet-coder challenges with this one.

How To Use

```bash
# --tensor-parallel-size 1: single GPU
# --gpu-memory-utilization 0.8: otherwise vLLM claims nearly all VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.8
```
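Once running, the server exposes an OpenAI-compatible API (on port 8000 by default). A minimal request sketch using only the standard library (the model name is from above; the prompt and sampling settings are arbitrary):

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build an OpenAI-style chat completion request for the vLLM server."""
    payload = {
        "model": "ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.8,  # creative-writing model, so a higher temperature
    }
    return Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Write the opening paragraph of a gothic novel.")
# With the server running:
# with urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```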
Safetensors

Model size: 69B params
Tensor types: F32, BF16, F8_E4M3, U8
