# Model Card for ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4
This is an NVFP4 quantization of TheDrummer/Behemoth-X-123B-v2.
## Quantization Details
Used https://github.com/ealexeev/llm-quantization script.
Calibration dataset size: 1024

Calibration data:
- HuggingFaceH4/ultrachat_200k
- allenai/c4_en
- mrcedric98/fiction_books_v8
These were shuffled and mixed at a ratio of 3:2:3.
### Procedure

```shell
python ./quantize_nvfp4.py \
  --model TheDrummer/Behemoth-X-123B-v2 \
  --output ./TheDrummer/Behemoth-X-123B-v2 \
  --size 1024 --seed 42 \
  --ultra_chat 3 --c4_en 2 --fiction_v8 3
```
The vLLM docs note that NVFP4 quantization needs very few calibration samples; in my experience, 1024 is a good number for larger models.
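The 3:2:3 mixing described above can be sketched as a simple weighted sample-and-shuffle. This is a minimal illustration, not the actual logic of `quantize_nvfp4.py`; the helper name and signature are assumptions.

```python
import random


def mix_calibration(sources, weights, size, seed=42):
    """Hypothetical helper: draw `size` samples from the given source
    lists at an integer ratio (e.g. 3:2:3), then shuffle the result.

    sources -- list of lists of calibration texts
    weights -- per-source integer mixing weights
    size    -- total number of calibration samples to produce
    """
    rng = random.Random(seed)
    total = sum(weights)
    mixed = []
    for src, w in zip(sources, weights):
        # Each source contributes its proportional share of `size`.
        n = size * w // total
        mixed.extend(rng.sample(src, min(n, len(src))))
    rng.shuffle(mixed)
    return mixed


# With weights 3:2:3 and size 1024, ultrachat and fiction each
# contribute 384 samples and c4_en contributes 256.
```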
## Quantization Evals
| Metric | NVFP4 |
|---|---|
| ARC Challenge (Logic/Reasoning) | 0.6391 |
| IFEval (Strict Instruction Following) | 0.7431 |
| HellaSwag (Flow/Common Sense) | 0.6961 |
| Wikitext (Word Perplexity) | 4.5030 |
| Lambada (Perplexity) | 2.4814 |
| Winogrande | 0.8089 |
## Bias, Risks, and Limitations
The base model is already a creative-writing fine-tune, and it was quantized with that use case in mind. It probably won't ace any LeetCode-style coding challenges.
## How To Use
```bash
# --tensor-parallel-size 1: run on a single GPU
# --gpu-memory-utilization 0.8: otherwise vLLM claims all remaining VRAM for the KV cache
vllm serve ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8
```
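Once the server is up, it exposes vLLM's OpenAI-compatible API. A minimal Python sketch for querying it, assuming the default port 8000 (the `chat` helper and sampling parameters are illustrative, not prescribed by this card):

```python
import json
import urllib.request

# Default address for `vllm serve`; adjust if you changed --host/--port.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4"


def build_request(prompt, temperature=0.8, max_tokens=256):
    """Build the JSON body for an OpenAI-style chat completion request."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def chat(prompt):
    """Send one prompt to the local vLLM server and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```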
## Model Tree

- Base model: mistralai/Mistral-Large-Instruct-2411
- Fine-tuned: TheDrummer/Behemoth-X-123B-v2
- Quantized: ealexeev/TheDrummer-Behemoth-X-123B-v2-NVFP4