DeepSeek-R1-Distill-Qwen-32B OpenVINO INT4

This repository contains an unofficial OpenVINO™ IR conversion of deepseek-ai/DeepSeek-R1-Distill-Qwen-32B with INT4 weight compression.

The model was converted using Optimum Intel and is intended for local inference with OpenVINO. This repository also includes an OpenVINO GenAI example; OpenVINO GenAI is the recommended runtime path for generative inference with OpenVINO-converted large language models on Intel hardware.

Original model

Original model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Original creator: DeepSeek
Architecture family: Qwen-based DeepSeek-R1 distilled reasoning model
Converted format: OpenVINO IR
Weight format: INT4
Task: text generation / reasoning

This is an unofficial converted model repository. It is not an official DeepSeek or OpenVINO release.

License and publishability

The original deepseek-ai/DeepSeek-R1-Distill-Qwen-32B model card states that both the code repository and model weights are licensed under the MIT License. It also states that the DeepSeek-R1 series supports commercial use and allows modifications and derivative works, including distillation.

Because of that, this OpenVINO INT4 conversion can be published under the same MIT License, provided the original license notice is preserved. This repository includes the original MIT LICENSE file for attribution and compliance.

Please refer to the original model card for full model details, intended use, safety notes, license terms, and limitations.

Model summary

DeepSeek-R1-Distill-Qwen-32B is a distilled reasoning model based on Qwen. It belongs to the DeepSeek-R1 family, whose distilled variants are intended to bring the reasoning behavior of DeepSeek-R1 to smaller models.

This OpenVINO version is designed for local inference on Intel hardware using OpenVINO and OpenVINO GenAI.

Conversion

This model was converted with Optimum Intel using the OpenVINO export path.

optimum-cli export openvino \
  --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
  --weight-format int4 \
  --group-size 128 \
  --ratio 1.0 \
  ov_DeepSeek-R1-Distill-Qwen-32B_int4
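
The same settings can also be expressed through the Optimum Intel Python API. The following is a minimal sketch, not the method used for this repository (the CLI command above was used). `INT4_SETTINGS` and `export_int4` are illustrative names, and the resulting files are not guaranteed to match the CLI export exactly; for example, the tokenizer here is saved in Transformers format only.

```python
# Settings mirrored from the CLI flags above (--weight-format int4,
# --group-size 128, --ratio 1.0).
INT4_SETTINGS = {"bits": 4, "group_size": 128, "ratio": 1.0}


def export_int4(model_id: str, out_dir: str) -> None:
    """Export a Hugging Face causal LM to OpenVINO IR with INT4 weights.

    Hypothetical helper for illustration; the heavy dependencies are
    imported lazily so the module can be inspected without them.
    """
    from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
    from transformers import AutoTokenizer

    quant_config = OVWeightQuantizationConfig(**INT4_SETTINGS)
    # export=True converts the original checkpoint to OpenVINO IR on load.
    model = OVModelForCausalLM.from_pretrained(
        model_id, export=True, quantization_config=quant_config
    )
    model.save_pretrained(out_dir)
    # Save the tokenizer next to the model so the directory is self-contained.
    AutoTokenizer.from_pretrained(model_id).save_pretrained(out_dir)
```

Calling `export_int4("deepseek-ai/DeepSeek-R1-Distill-Qwen-32B", "ov_DeepSeek-R1-Distill-Qwen-32B_int4")` would download the original weights and needs enough RAM and disk space to hold them during conversion.
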

Quantization

Weight format: INT4
Group size: 128
Ratio: 1.0
Export tool: Optimum Intel
Compression backend: NNCF through Optimum Intel
Runtime format: OpenVINO IR

INT4 compression is intended to reduce model size and memory usage compared with higher precision weights. As with any converted and quantized model, output quality and numerical behavior may differ from the original model and should be validated for your use case.
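
To make the size reduction concrete, here is a back-of-the-envelope estimate. It is a sketch under stated assumptions, not a measurement of this repository: it takes a nominal 32e9 parameters, assumes one 16-bit scale per group of 128 weights, and ignores zero points, layers kept at other precisions, and runtime memory such as the KV cache. The function names are illustrative.

```python
def int4_bits_per_weight(group_size: int, scale_bits: int = 16,
                         zero_point_bits: int = 0) -> float:
    """Effective bits per weight for group-wise INT4 compression.

    Each group of `group_size` weights shares one scale (and optionally
    one zero point), so the per-weight overhead is metadata / group_size.
    """
    return 4 + (scale_bits + zero_point_bits) / group_size


def estimate_weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given bit width."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: 32e9 is the nominal parameter count, not the exact one.
NOMINAL_PARAMS = 32e9

int4_gib = estimate_weight_gib(NOMINAL_PARAMS, int4_bits_per_weight(128))
bf16_gib = estimate_weight_gib(NOMINAL_PARAMS, 16)
print(f"INT4 (group size 128): ~{int4_gib:.1f} GiB")
print(f"16-bit weights:        ~{bf16_gib:.1f} GiB")
```

Under these assumptions the INT4 weights come to roughly a quarter of the 16-bit size; the actual files on disk may differ.
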

Installation

pip install -r examples/requirements.txt

Test with Optimum Intel first

Create or use the included script:

python examples/test_deepseek32b_ov_optimum.py \
  --model-dir . \
  --device CPU \
  --max-new-tokens 128 \
  --prompt "Explain OpenVINO in one short paragraph."
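
The contents of examples/test_deepseek32b_ov_optimum.py are not reproduced here. As a rough sketch of what such a test can look like, the code below loads the exported model through the Transformers-style Optimum Intel API. Only the flag names come from the command above; `build_parser` and `generate_with_optimum` are hypothetical names and may not match the included script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line flags matching the ones used in the command above."""
    parser = argparse.ArgumentParser(description="Optimum Intel smoke test")
    parser.add_argument("--model-dir", default=".")
    parser.add_argument("--device", default="CPU")
    parser.add_argument("--max-new-tokens", type=int, default=128)
    parser.add_argument("--prompt",
                        default="Explain OpenVINO in one short paragraph.")
    return parser


def generate_with_optimum(model_dir: str, device: str, prompt: str,
                          max_new_tokens: int) -> str:
    """Load the OpenVINO IR model and generate a completion."""
    # Imported lazily so argument parsing works without the dependencies.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = OVModelForCausalLM.from_pretrained(model_dir, device=device)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A script built on this would parse the flags with `build_parser().parse_args()` and print the result of `generate_with_optimum(args.model_dir, args.device, args.prompt, args.max_new_tokens)`.
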

Test with OpenVINO GenAI

OpenVINO GenAI provides a dedicated pipeline API for generative inference with OpenVINO-converted models.

Run from inside the model directory:

python examples/test_deepseek32b_ov_genai.py \
  --model-dir . \
  --device CPU \
  --max-new-tokens 128 \
  --prompt "Explain OpenVINO in one short paragraph."
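
For orientation, a minimal OpenVINO GenAI call can look like the sketch below. It assumes the `openvino_genai` package is installed and that the model directory contains the files the pipeline expects; `generate_with_genai` is a hypothetical name and is not taken from examples/test_deepseek32b_ov_genai.py.

```python
def generate_with_genai(prompt: str, model_dir: str = ".",
                        device: str = "CPU",
                        max_new_tokens: int = 128) -> str:
    """Generate text with an OpenVINO GenAI LLMPipeline.

    Defaults mirror the CPU command above. The import is lazy so this
    module can be loaded without openvino_genai installed.
    """
    import openvino_genai as ov_genai

    # The pipeline loads the model and tokenizer from the directory.
    pipe = ov_genai.LLMPipeline(model_dir, device)
    result = pipe.generate(prompt, max_new_tokens=max_new_tokens)
    # str() keeps this robust whether a string or a result object comes back.
    return str(result)
```

For example, `print(generate_with_genai("Explain OpenVINO in one short paragraph."))` corresponds to the CPU command above.
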

If CPU works, try GPU

First check that OpenVINO detects GPU devices. The included GenAI script prints available OpenVINO devices.

Then run:

python examples/test_deepseek32b_ov_genai.py \
  --model-dir . \
  --device GPU.0 \
  --max-new-tokens 64 \
  --prompt "Explain OpenVINO in one short paragraph."
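
The device check described above can also be done directly from Python. This is a sketch assuming the `openvino` package is installed; `list_openvino_devices` and `pick_device` are illustrative helpers, not part of the included scripts. It also assumes that a system with a single GPU may report it as "GPU" rather than "GPU.0".

```python
def list_openvino_devices() -> dict:
    """Map each detected OpenVINO device to its full device name."""
    import openvino as ov  # lazy import: only needed when actually probing

    core = ov.Core()
    return {name: core.get_property(name, "FULL_DEVICE_NAME")
            for name in core.available_devices}


def pick_device(available, preferred: str = "GPU.0",
                fallback: str = "CPU") -> str:
    """Return `preferred` if it was detected, otherwise `fallback`.

    A lone GPU may be listed as "GPU", so "GPU.0" is accepted when
    plain "GPU" is present.
    """
    base = preferred.split(".")[0]
    if preferred in available:
        return preferred
    if preferred.endswith(".0") and base in available:
        return preferred
    return fallback
```

A typical use is `device = pick_device(list_openvino_devices())`, then passing `device` to the `--device` flag or to the pipeline constructor.
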

Notes

This model is text-only.
This repository includes both Optimum Intel and OpenVINO GenAI examples.
The Optimum Intel path is useful for validating the exported model with Transformers-style APIs.
The OpenVINO GenAI path is recommended for generative inference with OpenVINO-converted models.
OpenVINO Model Server compatibility is not claimed unless separately validated.

Limitations

This repository inherits the limitations of the original deepseek-ai/DeepSeek-R1-Distill-Qwen-32B model. Additional differences may arise from OpenVINO conversion, INT4 compression, runtime package versions, and generation configuration.

Attribution

This is an unofficial OpenVINO conversion of the original DeepSeek model. All rights to the original model, training, and licensing remain with the original authors.

Here is how I converted the original model to OpenVINO IR:

cd ~

python3.11 -m venv deepseek32b_ov_env
source ~/deepseek32b_ov_env/bin/activate

python -m pip install --upgrade pip setuptools wheel

pip install -U \
  "openvino>=2025.1.0" \
  "optimum-intel[openvino]>=1.22.0" \
  "nncf>=2.14.0" \
  "transformers>=4.48.0" \
  "accelerate" \
  "safetensors" \
  "huggingface_hub" \
  "sentencepiece" \
  "protobuf"

mkdir -p ~/ov_models
cd ~/ov_models

MODEL_ID="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
OUT_DIR="ov_DeepSeek-R1-Distill-Qwen-32B_int4"

mkdir -p export_logs

optimum-cli export openvino \
  --model "$MODEL_ID" \
  --weight-format int4 \
  --group-size 128 \
  --ratio 1.0 \
  "$OUT_DIR" \
  2>&1 | tee export_logs/deepseek_r1_distill_qwen_32b_int4_export.log
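
After the export finishes, it can be worth checking that the output directory looks complete before running inference. The sketch below assumes the usual Optimum Intel output names (`openvino_model.xml`, `openvino_model.bin`, `config.json`); the exact file set can vary by version, so treat `EXPECTED_FILES` as an assumption and `missing_export_files` as a hypothetical helper.

```python
from pathlib import Path

# Assumed core outputs of an Optimum Intel OpenVINO export.
EXPECTED_FILES = ("openvino_model.xml", "openvino_model.bin", "config.json")


def missing_export_files(out_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from `out_dir`."""
    root = Path(out_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from `missing_export_files("ov_DeepSeek-R1-Distill-Qwen-32B_int4")` suggests the core files are present; it does not validate their contents.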
