---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1-0528
---

# Model Overview

- **Model Architecture:** DeepSeek-R1-0528
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating System(s):** Linux
- **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (v0.10)
- **Weight quantization:** Per-channel, FP8E4M3, Static
- **Activation quantization:** Per-token, FP8E4M3, Dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from the deepseek-ai DeepSeek-R1-0528 model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for FP8E4M3 PTPC (per-token activation, per-channel weight) quantization; the scaling math is sketched below.

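Concretely, each weight row (output channel) gets one static scale computed offline, and each activation token gets one dynamic scale computed at run time. A minimal sketch of the scaling math, assuming plain absolute-maximum scaling (AMD-Quark's exact calibration may differ) and the FP8E4M3 maximum representable value of 448:

$$
s_w^{(i)} = \frac{\max_j |W_{ij}|}{448}, \qquad \hat{W}_{ij} = \mathrm{cast}_{\mathrm{FP8E4M3}}\!\left(\frac{W_{ij}}{s_w^{(i)}}\right)
$$

$$
s_x^{(t)} = \frac{\max_k |x_{tk}|}{448} \quad \text{(recomputed for each token } t \text{ at inference)}
$$

The weight scales are fixed after quantization while the activation scales track each incoming token, which is what the "Static" and "Dynamic" entries in the overview refer to.
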
# Model Quantization

The model was quantized from [deepseek-ai/DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Weights are quantized to FP8 with static per-channel scales, and activations are quantized to FP8 with dynamic per-token scales.

**Preprocessing requirement:**

Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16.
You can either perform the dequantization manually using this [conversion script](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py), or use the pre-converted BFloat16 model available at [unsloth/DeepSeek-R1-0528-BF16](https://huggingface.co/unsloth/DeepSeek-R1-0528-BF16).

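For example, using the conversion script from the DeepSeek-V3 repository (a sketch: the flag names follow that repository's documentation and the paths are placeholders, so re-check both against the script itself):

```sh
git clone https://github.com/deepseek-ai/DeepSeek-V3
cd DeepSeek-V3/inference
# Dequantize the original FP8 checkpoint to BFloat16
python fp8_cast_bf16.py \
    --input-fp8-hf-path /path/to/DeepSeek-R1-0528 \
    --output-bf16-hf-path /path/to/DeepSeek-R1-0528-bf16
```
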
**Quantization script:**
```sh
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 internal_scripts/quantize_quark.py \
    --model_dir deepseek-ai/DeepSeek-R1-0528-bf16 \
    --quant_scheme w_fp8_per_channel_static_a_fp8_per_token_dynamic \
    --exclude_layers "*lm_head" "*mlp.gate" \
    --num_calib_data 128 \
    --output_dir DeepSeek-R1-0528-ptpc \
    --model_export hf_format
```

# Deployment

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend; SGLang is also supported, per the overview above. Sample launch commands are sketched below.

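A sketch, not an official launch recipe: the tensor-parallel degree of 8 assumes a single 8-GPU MI350/MI355 node, and the model path is the quantization output directory from the script above (substitute the published checkpoint ID if serving from the Hub).

```sh
# Serve with vLLM
vllm serve DeepSeek-R1-0528-ptpc --tensor-parallel-size 8 --trust-remote-code

# Or serve with SGLang
python3 -m sglang.launch_server --model-path DeepSeek-R1-0528-ptpc --tp 8 --trust-remote-code
```
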
# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.