---
datasets:
- HuggingFaceH4/ultrachat_200k
base_model:
- mistralai/Mistral-Small-24B-Instruct-2501
pipeline_tag: text-generation
tags:
- mistral
- quantization
- vllm
- nvfp4
license: apache-2.0
---
# Model Card for ealexeev/Mistral-Small-24B-NVFP4
## Model Description
This is an **NVFP4**-quantized version of [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501), produced with `llm-compressor`.
**This model is optimized for NVIDIA Hopper/Blackwell GPUs (H100, B200) and vLLM.**
## Benchmarks (Run on DGX Spark)
| Metric | Base Model (FP16) | This Model (NVFP4) | Delta |
| :--- | :--- | :--- | :--- |
| **HellaSwag (Commonsense)** | 83.47% | 83.20% | -0.27 pp |
| **IFEval (Strict)** | 71.46% | 70.50% | -0.96 pp |
| **Throughput** | 712 tok/s | 1344 tok/s | **1.89× faster** |
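The accuracy numbers above can be reproduced with `lm-evaluation-harness`; a hedged sketch of the invocation (task names and flags assume a recent `lm_eval` release with the vLLM backend installed — exact scores will vary with harness version and generation settings):

```bash
# Evaluate the quantized model on the same tasks via lm-eval's vLLM backend.
lm_eval --model vllm \
  --model_args pretrained=ealexeev/Mistral-Small-24B-NVFP4,tensor_parallel_size=1,gpu_memory_utilization=0.8 \
  --tasks hellaswag,ifeval \
  --batch_size auto
```

Throughput was measured separately under serving load, so expect different tok/s figures from an offline eval run.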
## Usage
**⚠️ This model requires vLLM. It will NOT work with GGUF/llama.cpp/Ollama.**
### vLLM Command
```bash
vllm serve ealexeev/Mistral-Small-24B-NVFP4 \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.8 \
--enforce-eager
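
# Once the server is up it exposes an OpenAI-compatible API on port 8000
# (the vLLM default; adjust host/port if you changed them).
# A minimal smoke-test request:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ealexeev/Mistral-Small-24B-NVFP4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64
  }'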