---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1-0528
---
# Model Overview
- **Model Architecture:** DeepSeek-R1-0528
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm**: 7.0
- **PyTorch**: 2.8.0
- **Transformers**: 4.53.0
- **Operating System(s):** Linux
- **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.10)
- **Weight quantization:** OCP MXFP4, Static
- **Activation quantization:** OCP MXFP4, Dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)
This model was built from deepseek-ai/DeepSeek-R1-0528 by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) MXFP4 quantization.
# Model Quantization
The model was quantized from [deepseek-ai/DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations were quantized to MXFP4 format.
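To make the format concrete, here is a minimal sketch of MXFP4 fake quantization as described in the OCP Microscaling specification: each block of 32 values shares one power-of-two scale, and each element is stored as a 4-bit E2M1 float. The names `quantize_block`, `E2M1_VALUES`, and `BLOCK_SIZE` are illustrative assumptions, not part of the AMD-Quark API, and tie-breaking during rounding is simplified.

```python
import math

# Magnitudes representable by one FP4 E2M1 element (the sign is a separate bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2    # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32  # matches --group_size 32 in the quantization command


def quantize_block(block):
    """Fake-quantize one block; return (shared scale, dequantized values)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Shared power-of-two scale: aligns the block maximum with E2M1's top exponent.
    scale = 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)
    out = []
    for v in block:
        mag = min(abs(v) / scale, E2M1_VALUES[-1])        # saturate at 6.0
        q = min(E2M1_VALUES, key=lambda c: abs(c - mag))  # nearest grid point (simplified ties)
        out.append(math.copysign(q * scale, v))
    return scale, out


# Illustrative input only: 32 evenly spaced values.
scale, dequantized = quantize_block([0.1 * i for i in range(BLOCK_SIZE)])
print(scale, dequantized[:8])
```

Because the scale is restricted to powers of two, the largest element of a block can saturate, as the last value does in this example.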
**Preprocessing requirement:**
Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16.
You can either perform the dequantization manually using this [conversion script](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py), or use the pre-converted BFloat16 model available at [amd/DeepSeek-R1-0528-BF16](https://huggingface.co/amd/DeepSeek-R1-0528-BF16).
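As a rough illustration of what block-wise dequantization involves, the sketch below assumes the FP8 checkpoint stores one scale per square block of each weight matrix. The function name `dequantize_blockwise`, the argument name `scale_inv`, and the toy block size are assumptions for this example; the linked conversion script remains the authoritative implementation.

```python
def dequantize_blockwise(weight, scale_inv, block_size):
    """Multiply every weight by the scale of the square block that contains it."""
    return [
        [w * scale_inv[i // block_size][j // block_size] for j, w in enumerate(row)]
        for i, row in enumerate(weight)
    ]


# Toy 4x4 weight with 2x2 blocks; real checkpoints use much larger blocks.
weight = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
scale_inv = [[0.5, 2.0],
             [1.0, 4.0]]

for row in dequantize_blockwise(weight, scale_inv, block_size=2):
    print(row)
```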
**Quantization script:**
```
cd Quark/examples/torch/language_modeling/llm_ptq/
exclude_layers="*self_attn* *mlp.gate.* *lm_head model.layers.61.*"
python3 quantize_quark.py --model_dir $MODEL_DIR \
--quant_scheme w_mxfp4_a_mxfp4 \
--group_size 32 \
--num_calib_data 128 \
--exclude_layers $exclude_layers \
--skip_evaluation \
--multi_gpu \
--model_export hf_format \
--output_dir amd/DeepSeek-R1-0528-MXFP4
```
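The `exclude_layers` value is a space-separated list of glob patterns. The sketch below shows which names each pattern would accept under shell-style matching with Python's `fnmatch`. Whether AMD-Quark matches these patterns against module names or parameter names, and with which matcher, is an assumption here, and the example names are hypothetical.

```python
from fnmatch import fnmatchcase

EXCLUDE_PATTERNS = "*self_attn* *mlp.gate.* *lm_head model.layers.61.*".split()


def is_excluded(name, patterns=EXCLUDE_PATTERNS):
    """Return True if the name matches any exclusion glob."""
    return any(fnmatchcase(name, p) for p in patterns)


# Hypothetical names, chosen only to show what each glob accepts or rejects.
examples = [
    "model.layers.3.self_attn.kv_b_proj.weight",        # attention: excluded
    "model.layers.3.mlp.gate.weight",                   # MoE router gate: excluded
    "model.layers.3.mlp.experts.0.gate_proj.weight",    # expert projection: kept
    "lm_head",                                          # output head: excluded
    "model.layers.61.mlp.experts.0.up_proj.weight",     # layer 61: excluded
]
for name in examples:
    print(f"{name}: {'excluded' if is_excluded(name) else 'quantized'}")
```

Note that `*mlp.gate.*` requires a literal `mlp.gate.` substring, so it does not catch `gate_proj` layers inside the experts.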
# Deployment
This model can be deployed efficiently using the [SGLang](https://docs.sglang.ai/) and [vLLM](https://docs.vllm.ai/en/latest/) backends.
## Evaluation
The model was evaluated on AIME24, GPQA Diamond, and MATH-500 benchmarks using the [lighteval](https://github.com/huggingface/lighteval/tree/v0.10.0) framework. Each benchmark was run 10 times with different random seeds for reliable performance estimation.
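One way to summarize repeated runs is to report the mean and the sample standard deviation across seeds. The sketch below assumes that kind of aggregation; the per-seed scores are hypothetical placeholders, not results from this model, and `summarize` is an illustrative helper.

```python
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical scores from 10 seeded runs of a single benchmark.
per_seed_scores = [80.0, 90.0, 85.0, 85.0, 80.0, 90.0, 85.0, 85.0, 80.0, 90.0]

mean, std = summarize(per_seed_scores)
print(f"mean={mean:.2f} std={std:.2f} over {len(per_seed_scores)} runs")
```

Small benchmarks such as AIME24 have few questions, so a single run can shift by several points; the spread across seeds indicates how much of a difference is noise.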
### Accuracy
<table>
<tr>
<td><strong>Benchmark</strong>
</td>
<td><strong>DeepSeek-R1-0528 </strong>
</td>
<td><strong>DeepSeek-R1-0528-MXFP4 (this model)</strong>
</td>
<td><strong>Recovery</strong>
</td>
</tr>
<tr>
<td>AIME24
</td>
<td>88.00
</td>
<td>85.00
</td>
<td>96.59%
</td>
</tr>
<tr>
<td>GPQA Diamond
</td>
<td>79.90
</td>
<td>79.34
</td>
<td>99.31%
</td>
</tr>
<tr>
<td>MATH-500
</td>
<td>97.06
</td>
<td>97.84
</td>
<td>100.80%
</td>
</tr>
</table>
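The Recovery column appears to be the quantized score expressed as a percentage of the baseline score. A minimal sketch of that calculation, assuming this definition (the helper name `recovery` is illustrative):

```python
def recovery(baseline, quantized):
    """Quantized score as a percentage of the baseline score."""
    return 100.0 * quantized / baseline


print(f"AIME24:   {recovery(88.00, 85.00):.2f}%")
print(f"MATH-500: {recovery(97.06, 97.84):.2f}%")
```

Values recomputed from the rounded scores in the table can differ in the last digit from the reported figures if those were derived from unrounded scores.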
### Reproduction
The AIME24, MATH-500, and GPQA Diamond results were obtained using a forked [lighteval](https://github.com/zhaolin-amd/lighteval/tree/v0.10-release-custom) and the vLLM Docker image (emulation QDQ) `rocm/vllm-private:pytorch-vllm-gfx950-mxfp4-mxfp6-v3`.
```
# Set docker env
export VLLM_QUARK_F4F6_OFFLINE_DEQUANT_TMPENVVAR=1
# Set model args
OUTPUT_DIR="results/DeepSeek-R1-0528-MXFP4-Seed"
LOG="logs/deepseek_0528_maxfp4.log"
# Evaluating 10 rounds
for i in $(seq 1 10); do
    # seed in [0, 2**30 - 1]
    SEED=$(shuf -i 0-1073741823 -n 1)
    MODEL_ARGS="model_name=amd/DeepSeek-R1-0528-MXFP4,dtype=bfloat16,tensor_parallel_size=8,max_model_length=71536,max_num_batched_tokens=32768,gpu_memory_utilization=0.85,generation_parameters={max_new_tokens:65536,temperature:0.6,top_p:0.95,seed:$SEED}"
    lighteval vllm "$MODEL_ARGS" "custom|aime24_single|0|0,custom|math_500_single|0|0,custom|gpqa:diamond_single|0|0" \
        --use-chat-template \
        --output-dir "$OUTPUT_DIR/seed_$SEED" \
        2>&1 | tee -a "$LOG"
done
```
# License
Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.