---
license: apache-2.0
---

# Deepseek-R1-W4AFP8

## Model Overview
- **Model Architecture:** DeepseekV3ForCausalLM
  - **Input:** Text
  - **Output:** Text
- **Model Optimizations:**
  - **Dense Weight quantization:** FP8
  - **MOE Weight quantization:** INT4
  - **Activation quantization:** FP8
- **Release Date:** 25/10/2025
- **Version:** 1.0

Quantized version of [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).

| Model| MMLU |
|-------|-------|
| novita/Deepseek-R1-W4AFP8 | 0.8705 | 


### Model Optimizations
This model was obtained by quantizing the weights and activations of DeepSeek-R1 to mixed-precision data types: INT4 weights with FP8 activations (W4AFP8) for the MoE layers, and FP8 weights and activations for the dense layers. This optimization reduces the weights from 16 bits to 4 or 8 bits per parameter, significantly reducing GPU memory requirements.
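The memory savings can be sketched with some back-of-the-envelope arithmetic. The dense/MoE parameter split below is an assumption for illustration only, not an official figure:

```python
# Rough weight-memory sketch: W4AFP8 vs. full-precision BF16 storage.
# The 671B total and the ~14B dense (non-MoE) share are illustrative
# assumptions, not official figures from this model card.

def weight_bytes(n_params: float, bits_per_param: int) -> float:
    """Bytes needed to store n_params weights at the given precision."""
    return n_params * bits_per_param / 8

TOTAL_PARAMS = 671e9           # assumed total parameter count
DENSE_PARAMS = 14e9            # assumed dense (non-MoE) share
MOE_PARAMS = TOTAL_PARAMS - DENSE_PARAMS

bf16_gb = weight_bytes(TOTAL_PARAMS, 16) / 1e9
w4afp8_gb = (weight_bytes(MOE_PARAMS, 4)       # INT4 MoE weights
             + weight_bytes(DENSE_PARAMS, 8)) / 1e9  # FP8 dense weights

print(f"BF16 weights:   {bf16_gb:.0f} GB")
print(f"W4AFP8 weights: {w4afp8_gb:.0f} GB")
```

Under these assumptions the weights shrink by roughly 4x, which is what makes a 4-GPU H200 deployment feasible.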

## Use with SGLang
This model can be deployed efficiently using the SGLang backend on as few as 4x NVIDIA H200 GPUs, as shown in the example below.
```bash
python -m sglang.launch_server --model novita/Deepseek-R1-W4AFP8 --mem-fraction-static 0.85 --disable-shared-experts-fusion --tp-size 4
```