---
license: mit
base_model:
  - deepseek-ai/DeepSeek-R1-0528
---

## Model Overview

  • Model Architecture: DeepSeek-R1-0528
    • Input: Text
    • Output: Text
  • Supported Hardware Microarchitecture: AMD MI350/MI355
  • ROCm: 7.0
  • Operating System(s): Linux
  • Inference Engine: SGLang/vLLM
  • Model Optimizer: AMD-Quark (V0.10)
    • Weight quantization: Per-channel, FP8E4M3, Static
    • Activation quantization: Per-token, FP8E4M3, Dynamic
  • Calibration Dataset: Pile

This model was built from the deepseek-ai/DeepSeek-R1-0528 model by applying AMD-Quark for FP8E4M3 PTPC (per-channel weight, per-token activation) quantization.

## Model Quantization

The model was quantized from deepseek-ai/DeepSeek-R1-0528 using AMD-Quark. Both weights and activations are quantized to FP8 (E4M3): weights per channel with static scales, activations per token with dynamic scales.
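
The per-channel/per-token scaling idea can be illustrated with a small numpy simulation. This is a hedged sketch, not Quark's implementation: it only computes the scales and clips values into the FP8 E4M3 dynamic range (it does not round to the actual FP8 grid), and all tensor shapes are illustrative.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_weight_per_channel(w):
    # Static per-channel scales: one scale per output channel (row),
    # computed once offline from the weight tensor.
    scale = np.abs(w).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    q = np.clip(w / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def quantize_activation_per_token(x):
    # Dynamic per-token scales: one scale per token (row),
    # recomputed at runtime for every activation batch.
    scale = np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)  # [out_features, in_features]
x = rng.normal(size=(3, 8)).astype(np.float32)  # [tokens, in_features]

qw, w_scale = quantize_weight_per_channel(w)    # w_scale: [4, 1]
qx, x_scale = quantize_activation_per_token(x)  # x_scale: [3, 1]

# Matmul in the scaled domain, then rescale back to the original range.
y = (qx @ qw.T) * x_scale * w_scale.T
print(np.allclose(y, x @ w.T, atol=1e-4))  # → True (no rounding in this sketch)
```

Because the scales factor out of the matmul, the FP8 product can be computed in the scaled domain and rescaled afterwards, which is what makes this scheme cheap at inference time.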

Preprocessing requirement:

Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16. You can either perform the dequantization manually using this conversion script, or use the pre-converted BFloat16 model available at unsloth/DeepSeek-R1-0528-BF16.
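
As a rough illustration of what the dequantization step does, the sketch below expands a block-wise quantized weight back to full precision. This is an assumption-laden toy, not the referenced conversion script: it assumes one stored scale per square tile (as in DeepSeek's block-wise FP8 checkpoints), uses float32 arrays to stand in for FP8 and BFloat16, and uses a small block size for demonstration.

```python
import numpy as np

def dequantize_blockwise(w_fp8, scale_inv, block=128):
    # w_fp8: quantized weight (float32 standing in for FP8 values).
    # scale_inv: one multiplier per (block x block) tile of the weight.
    out = w_fp8.astype(np.float32).copy()
    n_rows, n_cols = scale_inv.shape
    for i in range(n_rows):
        for j in range(n_cols):
            # Rescale each tile independently with its stored scale.
            out[i * block:(i + 1) * block,
                j * block:(j + 1) * block] *= scale_inv[i, j]
    return out

# Toy example: a 4x4 weight with 2x2 tiles, one scale per tile.
w_fp8 = np.ones((4, 4), dtype=np.float32)
scale_inv = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
w_bf16 = dequantize_blockwise(w_fp8, scale_inv, block=2)
```

After this step each tile carries its scale baked into the values, so the result can be treated as an ordinary BFloat16 checkpoint by the quantization script.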

Quantization scripts:

```bash
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 internal_scripts/quantize_quark.py \
    --model_dir deepseek-ai/DeepSeek-R1-0528-bf16 \
    --quant_scheme w_fp8_per_channel_static_a_fp8_per_token_dynamic \
    --exclude_layers "*lm_head" "*mlp.gate" \
    --num_calib_data 128 \
    --output_dir DeepSeek-R1-0528-ptpc \
    --model_export hf_format
```

## Deployment

This model can be deployed efficiently using the SGLang or vLLM backends.
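
For example, the quantized checkpoint produced above can be served with vLLM. This invocation is illustrative: the model path points at the script's `--output_dir`, the parallelism degree is an assumption for an MI350/MI355 node, and vLLM is expected to pick up the quantization settings from the checkpoint's config.

```shell
# Serve the locally exported checkpoint (path and flags illustrative).
vllm serve ./DeepSeek-R1-0528-ptpc \
    --tensor-parallel-size 8 \
    --trust-remote-code
```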

## License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.