Model Overview

- Model Architecture: DeepSeek-R1-0528
  - Input: Text
  - Output: Text
- Supported Hardware Microarchitecture: AMD MI350/MI355
- ROCm: 7.0
- PyTorch: 2.8.0
- Transformers: 4.53.0
- Operating System(s): Linux
- Inference Engine: SGLang/vLLM
- Model Optimizer: AMD-Quark (V0.10)
  - Weight quantization: OCP MXFP4, Static
  - Activation quantization: OCP MXFP4, Dynamic
- Calibration Dataset: Pile

This model was built from deepseek-ai/DeepSeek-R1-0528 by applying AMD-Quark MXFP4 quantization.

Model Quantization

The model was quantized from deepseek-ai/DeepSeek-R1-0528 using AMD-Quark. Both weights and activations were quantized to MXFP4 format.
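To make the format concrete, here is a minimal pure-Python sketch of MXFP4 block quantization as described in the OCP Microscaling (MX) specification: each block of 32 values shares one power-of-two scale, and each element is stored as a 4-bit E2M1 value. This is an illustration under stated assumptions, not AMD-Quark's implementation; the function name `quantize_block` and the tie-breaking rule are choices made for this sketch.

```python
import math

# Magnitudes representable by one FP4 E2M1 element (the sign is a separate bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32    # elements sharing one scale (matches --group_size 32)


def quantize_block(block):
    """Quantize one block; return (shared_exponent, dequantized_values).

    Hypothetical helper for illustration only.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale chosen so the largest value lands in E2M1 range.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    dequantized = []
    for x in block:
        # Clamp to the largest representable magnitude, then snap to the grid.
        mag = min(abs(x) / scale, E2M1_VALUES[-1])
        # Assumption of this sketch: ties resolve to the smaller magnitude.
        q = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        dequantized.append(math.copysign(q * scale, x))
    return shared_exp, dequantized


# Storage cost: 32 four-bit elements plus one 8-bit shared scale per block.
BITS_PER_ELEMENT = (BLOCK_SIZE * 4 + 8) / BLOCK_SIZE
```

With static weight quantization the shared scales are computed once offline, while dynamic activation quantization recomputes them per block at inference time.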

Preprocessing requirement:

Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16. You can either perform the dequantization manually using this conversion script, or use the pre-converted BFloat16 model available at amd/DeepSeek-R1-0528-BF16.
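As a rough illustration of what the dequantization step does, the sketch below multiplies each weight by the scale of the block it belongs to. It assumes the block-wise layout used by the original FP8 checkpoint (one scale per square tile, 128 x 128 by default) and uses plain nested lists with already-decoded values; the function name `dequantize_blockwise` is hypothetical, and a real conversion would operate on tensors and write BFloat16 output.

```python
def dequantize_blockwise(weight, scale_inv, block_size=128):
    """Toy block-wise dequantization on nested lists.

    weight:    2-D list of already-decoded low-precision values
    scale_inv: 2-D list with one multiplier per (block_size x block_size) tile
    """
    rows, cols = len(weight), len(weight[0])
    return [
        [
            # Each element is scaled by the multiplier of its enclosing tile.
            weight[r][c] * scale_inv[r // block_size][c // block_size]
            for c in range(cols)
        ]
        for r in range(rows)
    ]
```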

Quantization scripts:

cd Quark/examples/torch/language_modeling/llm_ptq/
# MODEL_DIR must point to the BFloat16 checkpoint (see the preprocessing requirement above).
# Layers matching these patterns are left unquantized.
exclude_layers=("*lm_head" "model.layers.61.*")
python3 quantize_quark.py --model_dir "$MODEL_DIR" \
                          --quant_scheme w_mxfp4_a_mxfp4 \
                          --group_size 32 \
                          --num_calib_data 128 \
                          --exclude_layers "${exclude_layers[@]}" \
                          --skip_evaluation \
                          --multi_gpu \
                          --model_export hf_format \
                          --output_dir amd/DeepSeek-R1-0528-MXFP4-V2
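The two exclusion patterns above can be checked against layer names with a short sketch. It assumes the patterns behave like shell-style globs, which is how they read; AMD-Quark's actual matching logic may differ, and the layer names in the comments are hypothetical examples. Layer 61 presumably holds the multi-token-prediction (MTP) module, though this card does not say so.

```python
from fnmatch import fnmatchcase

# Patterns taken from the quantization command.
EXCLUDE_PATTERNS = ["*lm_head", "model.layers.61.*"]


def is_excluded(layer_name, patterns=EXCLUDE_PATTERNS):
    """Return True if the layer name matches any glob-style exclusion pattern."""
    return any(fnmatchcase(layer_name, pattern) for pattern in patterns)


# Example checks with hypothetical layer names:
#   is_excluded("lm_head")                         -> excluded
#   is_excluded("model.layers.61.mlp.gate_proj")   -> excluded
#   is_excluded("model.layers.60.mlp.gate_proj")   -> quantized
```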

Deployment

This model can be deployed efficiently using the SGLang and vLLM backends.

Evaluation

The model was evaluated on the AIME24 and GSM8K benchmarks using the lm-evaluation-harness framework.

Accuracy

Benchmark	DeepSeek-R1-0528-MXFP4-V2 (non MTP)	DeepSeek-R1-0528-MXFP4-V2 (MTP=3)
AIME24	80.00	83.33
GSM8K	95.00	95.30
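When reading the AIME24 row, it helps to remember that AIME 2024 consists of 30 problems, so each problem is worth about 3.33 points. The sketch below converts the reported percentages back to problem counts, assuming each problem is scored once; the helper name `correct_count` is introduced here for illustration.

```python
AIME24_PROBLEMS = 30  # AIME I and AIME II, 15 problems each


def correct_count(accuracy_pct, num_problems):
    """Convert a percentage accuracy into the nearest whole number of problems."""
    return round(accuracy_pct / 100 * num_problems)


non_mtp_correct = correct_count(80.00, AIME24_PROBLEMS)
mtp3_correct = correct_count(83.33, AIME24_PROBLEMS)
difference = mtp3_correct - non_mtp_correct
```

Under that assumption, the gap between the two AIME24 columns corresponds to a single problem.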

Reproduction

The AIME24 and GSM8K results were obtained using a forked lm-evaluation-harness.

Launch Server

#!/bin/bash
# Serve the quantized checkpoint with SGLang using tensor parallelism across 8 GPUs.
# Server output is shown on the console and also written to $LOG.
MODEL=/models/amd/DeepSeek-R1-0528-MXFP4-V2
LOG="sglang-serving.log"

SGLANG_AITER_MLA_PERSIST=1 \
python3 -m sglang.launch_server \
    --model-path "$MODEL" \
    --tensor-parallel-size 8 \
    --trust-remote-code \
    --chunked-prefill-size 131072 \
    --host 0.0.0.0 \
    --port 8321 \
    --disable-radix-cache \
    --mem-fraction-static 0.8 \
    --max-running-requests 64 \
    --attention-backend aiter 2>&1 | tee "$LOG"
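Before starting a long evaluation run, it can be useful to send one request to the server as a smoke test. The sketch below uses only the Python standard library and targets the same `/v1/completions` endpoint and port that the evaluation commands use. The helper names, the prompt, and the default `max_tokens` are illustrative assumptions; the response parsing assumes the usual OpenAI-style completions schema.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8321/v1/completions"
MODEL = "/models/amd/DeepSeek-R1-0528-MXFP4-V2"


def build_request(prompt, max_tokens=64, temperature=0.6, top_p=0.95):
    """Build a POST request for the completions endpoint (nothing is sent here)."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, timeout=600, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage, once the server is running:
#   print(complete("What is 12 * 7? Answer:"))
```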

AIME24

lm_eval --model local-completions \
    --model_args model=/models/amd/DeepSeek-R1-0528-MXFP4-V2,base_url=http://0.0.0.0:8321/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
    --tasks aime24 \
    --num_fewshot 0 \
    --gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
    --batch_size auto 2>&1 | tee aime24.log

GSM8K

lm_eval --model local-completions \
    --model_args model=/models/amd/DeepSeek-R1-0528-MXFP4-V2,base_url=http://0.0.0.0:8321/v1/completions,num_concurrent=256,max_retries=10,max_gen_toks=2048,tokenized_requests=False \
    --tasks gsm8k \
    --num_fewshot 5 \
    --batch_size auto 2>&1 | tee gsm8k.log

License
