---
datasets:
- HuggingFaceH4/ultrachat_200k
base_model:
- mistralai/Mistral-Small-24B-Instruct-2501
pipeline_tag: text-generation
tags:
- mistral
- quantization
- vllm
- nvfp4
license: apache-2.0
---
# Model Card for ealexeev/Mistral-Small-24B-NVFP4

## Model Description

This is a compressed version of Mistral Small 24B Instruct, quantized to **NVFP4** (NVIDIA's 4-bit floating-point format) using `llm-compressor`.

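The conversion itself is not part of this card. For reference, a one-shot NVFP4 recipe for `llm-compressor` looks roughly like the following — a sketch under assumed defaults (the `targets` and `ignore` values are assumptions, not the exact recipe used for this checkpoint):

```yaml
# Sketch of an llm-compressor one-shot quantization recipe (assumed values).
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]   # quantize the linear layers
      scheme: "NVFP4"       # NVIDIA 4-bit floating-point format
      ignore: ["lm_head"]   # keep the output head in higher precision
```

Such a recipe is applied with `llm-compressor`'s `oneshot` entry point together with a calibration dataset (the frontmatter lists `HuggingFaceH4/ultrachat_200k`).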
**This model is optimized for NVIDIA Blackwell/Hopper GPUs (H100, B200) and vLLM.**

## Benchmarks (Run on DGX Spark)

| Metric | Base Model (FP16) | This Model (NVFP4) | Delta |
| :--- | :--- | :--- | :--- |
| **HellaSwag (Logic)** | 83.47% | 83.20% | -0.27% |
| **IFEval (Strict)** | 71.46% | 70.50% | -0.96% |
| **Throughput** | 712 tok/s | 1344 tok/s | **+1.88x** |
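The delta column follows directly from the raw numbers; a quick arithmetic sanity check (no model required):

```python
# Sanity-check the benchmark deltas reported in the table above.
fp16 = {"hellaswag": 83.47, "ifeval": 71.46, "throughput": 712}
nvfp4 = {"hellaswag": 83.20, "ifeval": 70.50, "throughput": 1344}

# Accuracy deltas are absolute percentage-point differences.
acc_deltas = {k: round(nvfp4[k] - fp16[k], 2) for k in ("hellaswag", "ifeval")}

# Throughput is reported as a speedup ratio (~1.888, shown truncated as +1.88x).
speedup = nvfp4["throughput"] / fp16["throughput"]

print(acc_deltas)  # {'hellaswag': -0.27, 'ifeval': -0.96}
```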
## Usage

**⚠️ This model requires vLLM. It will NOT work with GGUF/llama.cpp/Ollama.**

### vLLM Command

```bash
vllm serve ealexeev/Mistral-Small-24B-NVFP4 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.8 \
  --enforce-eager
```
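
Once the server is up, any OpenAI-compatible client can talk to it. A minimal stdlib-only sketch — the host and port below are vLLM's defaults (assumptions; adjust if you pass `--host`/`--port`):

```python
import json
import urllib.request

# vLLM's OpenAI-compatible server defaults (assumptions; adjust as needed).
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "ealexeev/Mistral-Small-24B-NVFP4"


def build_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query(prompt: str) -> str:
    """POST the prompt to the running vLLM server and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Example (requires the server started by the command above):
# print(query("Summarize NVFP4 quantization in one sentence."))
```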