amd/DeepSeek-R1-0528-BF16

Text Generation

text-generation-inference

---
base_model:
- unsloth/DeepSeek-R1-0528-BF16
language:
- en
library_name: transformers
license: mit
---

# Model Overview

- Model Architecture: DeepSeek-R1-0528
- Input: Text
- Output: Text
- Supported Hardware Microarchitecture: AMD MI350/MI355
- ROCm: 7.0
- PyTorch: 2.8.0
- Transformers: 4.56.1
- Operating System(s): Linux
- Inference Engine: [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)

# Model Details

The original `modeling_deepseek.py` in `unsloth/DeepSeek-R1-0528-BF16` does not define or implement the MTP (Multi-Token Prediction) layer. Loading the original model therefore yields no MTP layer, and MTP-specific quantization cannot be performed.

To enable MTP layer loading and quantization, this model adapts `unsloth/DeepSeek-R1-0528-BF16` by adding an MTP layer to `modeling_deepseek.py`. With this modification, [AMD-Quark](https://quark.docs.amd.com/latest/index.html) can quantize the DeepSeek-R1-0528 model with the MTP layer included.

Important Notes:
- Load this model with `trust_remote_code=True` so that the MTP-related changes in `modeling_deepseek.py` take effect.
- Do NOT evaluate this model directly after loading it with `transformers`. The forward function of the added MTP layer in `modeling_deepseek.py` is implemented only for calibration during quantization, so its computation is not guaranteed to match the original DeepSeek-R1-0528.
- For the same reason, pass the `--skip_evaluation` option when quantizing with AMD-Quark, so that the evaluation step is skipped and only quantization is performed.
- To skip quantization for the MTP layers, set `exclude_layers="lm_head self_attn mlp.gate eh_proj shared_head.head model.layers.61."`.
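
The exclusion list in the last note can be assembled and sanity-checked in the shell before it is handed to the quantization script. The following is a minimal sketch, not part of AMD-Quark: the variable names `base_excludes` and `exclude_count` are illustrative, and it assumes the patterns are consumed as separate whitespace-split arguments.

```shell
# Patterns from the note above that stay unquantized.
base_excludes="lm_head self_attn mlp.gate eh_proj shared_head.head"

# Per the note above, adding the model.layers.61. prefix also skips the MTP layers.
exclude_layers="$base_excludes model.layers.61."

# Let the shell split the unquoted string and count the resulting arguments.
set -- $exclude_layers
exclude_count=$#

echo "$exclude_count patterns: $exclude_layers"
```

Counting the split arguments is a cheap way to catch an accidental extra space or a missing pattern before starting a long run.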

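Below is an example of how to quantize this model:

```bash
cd Quark/examples/torch/language_modeling/llm_ptq/
# MODEL_DIR (path to this model) and output_dir (export destination) must be set beforehand.
# Layer-name patterns to exclude from quantization.
exclude_layers="lm_head self_attn mlp.gate eh_proj *shared_head.head"
# $exclude_layers is left unquoted so the shell splits it into one argument per pattern.
python3 quantize_quark.py --model_dir "$MODEL_DIR" \
    --quant_scheme w_mxfp4_a_mxfp4 \
    --num_calib_data 32 \
    --output_dir "$output_dir" \
    --exclude_layers $exclude_layers \
    --dataset pileval \
    --multi_gpu \
    --model_export hf_format \
    --trust_remote_code \
    --skip_evaluation \
    --seq_len 512
```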
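
Because the command above depends on `$MODEL_DIR` pointing at a copy of this model that includes the modified `modeling_deepseek.py`, a quick pre-flight check can save a failed run. This is a minimal sketch under stated assumptions: `check_model_dir` is a hypothetical helper (not part of AMD-Quark), and the default paths are placeholders to replace with your own.

```shell
# Hypothetical helper: confirm the model directory contains the files
# that loading with trust_remote_code relies on.
check_model_dir() {
    dir="$1"
    for f in config.json modeling_deepseek.py; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing $dir/$f" >&2
            return 1
        fi
    done
    return 0
}

# Placeholder defaults, used only when the variables are not already set.
MODEL_DIR="${MODEL_DIR:-./DeepSeek-R1-0528-BF16}"
output_dir="${output_dir:-./quantized_output}"

if check_model_dir "$MODEL_DIR"; then
    echo "ok: $MODEL_DIR -> $output_dir"
else
    echo "fix MODEL_DIR before running quantize_quark.py" >&2
fi
```

The check only looks for file presence; it does not verify that `modeling_deepseek.py` is the MTP-enabled version.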

For further details or issues, please refer to the [AMD-Quark](https://quark.docs.amd.com/latest/index.html) documentation or contact the respective developers.

# License
Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.