---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1-0528
---

# Model Overview

- **Model Architecture:** DeepSeek-R1-0528
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating System(s):** Linux
- **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (v0.10)
- **Weight quantization:** Per-channel, FP8E4M3, Static
- **Activation quantization:** Per-token, FP8E4M3, Dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from the deepseek-ai DeepSeek-R1-0528 model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for FP8E4M3 PTPC (per-token activation, per-channel weight) quantization; the scaling math is sketched below.

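Concretely, each weight row (output channel) gets one static scale computed offline, and each activation token gets one dynamic scale computed at run time. A minimal sketch of the scaling math, assuming plain absolute-maximum scaling (AMD-Quark's exact calibration may differ) and the FP8E4M3 maximum representable value of 448:

$$
s_w^{(i)} = \frac{\max_j |W_{ij}|}{448}, \qquad \hat{W}_{ij} = \mathrm{cast}_{\mathrm{FP8E4M3}}\!\left(\frac{W_{ij}}{s_w^{(i)}}\right)
$$

$$
s_x^{(t)} = \frac{\max_k |x_{tk}|}{448} \quad \text{(recomputed for each token } t \text{ at inference)}
$$

The weight scales are fixed after quantization while the activation scales track each incoming token, which is what the "Static" and "Dynamic" entries in the overview refer to.
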
# Model Quantization

The model was quantized from [deepseek-ai/DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Weights are quantized to FP8 with static per-channel scales, and activations are quantized to FP8 with dynamic per-token scales.

**Preprocessing requirement:**

Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16.
You can either perform the dequantization manually using this [conversion script](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py), or use the pre-converted BFloat16 model available at [unsloth/DeepSeek-R1-0528-BF16](https://huggingface.co/unsloth/DeepSeek-R1-0528-BF16).

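For example, using the conversion script from the DeepSeek-V3 repository (a sketch: the flag names follow that repository's documentation and the paths are placeholders, so re-check both against the script itself):

```sh
git clone https://github.com/deepseek-ai/DeepSeek-V3
cd DeepSeek-V3/inference
# Dequantize the original FP8 checkpoint to BFloat16
python fp8_cast_bf16.py \
    --input-fp8-hf-path /path/to/DeepSeek-R1-0528 \
    --output-bf16-hf-path /path/to/DeepSeek-R1-0528-bf16
```
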
**Quantization script:**
```sh
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 internal_scripts/quantize_quark.py \
    --model_dir deepseek-ai/DeepSeek-R1-0528-bf16 \
    --quant_scheme w_fp8_per_channel_static_a_fp8_per_token_dynamic \
    --exclude_layers "*lm_head" "*mlp.gate" \
    --num_calib_data 128 \
    --output_dir DeepSeek-R1-0528-ptpc \
    --model_export hf_format
```

# Deployment

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend; SGLang is also supported, per the overview above. Sample launch commands are sketched below.

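A sketch, not an official launch recipe: the tensor-parallel degree of 8 assumes a single 8-GPU MI350/MI355 node, and the model path is the quantization output directory from the script above (substitute the published checkpoint ID if serving from the Hub).

```sh
# Serve with vLLM
vllm serve DeepSeek-R1-0528-ptpc --tensor-parallel-size 8 --trust-remote-code

# Or serve with SGLang
python3 -m sglang.launch_server --model-path DeepSeek-R1-0528-ptpc --tp 8 --trust-remote-code
```
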
# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.