DeepSeek-R1-Distill-Llama-70B OpenVINO INT4

This repository contains an unofficial OpenVINO™ IR conversion of deepseek-ai/DeepSeek-R1-Distill-Llama-70B with INT4 weight compression.

The model was converted using Optimum Intel and is intended for local inference with OpenVINO. For generative inference, this repository also includes an OpenVINO GenAI example, which is the recommended runtime path for getting strong performance from OpenVINO-converted large language models on Intel hardware.

Original model

  • Original model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
  • Original creator: DeepSeek
  • Base lineage: distilled from DeepSeek-R1 and derived from Llama 3.3 70B Instruct
  • Converted format: OpenVINO IR
  • Weight format: INT4
  • Task: text generation / reasoning

This is an unofficial converted model repository. It is not an official DeepSeek, Meta, or OpenVINO release.

License and publishability

The original deepseek-ai/DeepSeek-R1-Distill-Llama-70B model card states that the code repository and model weights are licensed under the MIT License. It also states that the DeepSeek-R1 series supports commercial use, modifications, and derivative works.

The original model card further notes that DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct, which was originally licensed under the Llama 3.3 license.

This OpenVINO INT4 conversion is published under the same MIT license as the DeepSeek model repository. The original MIT LICENSE file is included in this repository for attribution and compliance.

Please refer to the original model card for full model details, intended use, safety notes, license terms, and limitations.

Model summary

DeepSeek-R1-Distill-Llama-70B is a distilled reasoning model from the DeepSeek-R1 family. It is based on Llama 3.3 70B Instruct and distilled from DeepSeek-R1 outputs to provide strong reasoning capability in an open model.

This OpenVINO version is designed for local inference on Intel hardware using OpenVINO and OpenVINO GenAI.

Conversion

This model was converted with Optimum Intel using the OpenVINO export path.

optimum-cli export openvino \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --weight-format int4 \
  --group-size 128 \
  --ratio 1.0 \
  ov_DeepSeek-R1-Distill-Llama-70B_int4
Quantization
Weight format: INT4
Group size: 128
Ratio: 1.0
Export tool: Optimum Intel
Compression backend: NNCF through Optimum Intel
Runtime format: OpenVINO IR

INT4 compression is intended to reduce model size and memory usage compared with higher precision weights. As with any converted and quantized model, output quality and numerical behavior may differ from the original model and should be validated for your use case.
Test with Optimum Intel first

Run from inside the model directory:

python examples/test_deepseek70b_llama_ov_optimum.py \
  --model-dir . \
  --device CPU \
  --max-new-tokens 128 \
  --prompt "Explain OpenVINO in one short paragraph."
Test with OpenVINO GenAI

OpenVINO GenAI provides a clean runtime path for generative inference with OpenVINO-converted models.

Run from inside the model directory:

python examples/test_deepseek70b_llama_ov_genai.py \
  --model-dir . \
  --device CPU \
  --max-new-tokens 64 \
  --prompt "Explain OpenVINO in one short paragraph."
For GPU

First confirm that OpenVINO detects GPU devices. The included OpenVINO GenAI script prints all available OpenVINO devices.

Then run:

python examples/test_deepseek70b_llama_ov_genai.py \
  --model-dir . \
  --device GPU.0 \
  --max-new-tokens 64 \
  --prompt "Explain OpenVINO in one short paragraph."
Here is my full setup and export flow:

cd ~

python3.11 -m venv deepseek70b_llama_ov_env
source ~/deepseek70b_llama_ov_env/bin/activate

python -m pip install --upgrade pip setuptools wheel

pip install -U \
  "openvino>=2025.1.0" \
  "optimum-intel[openvino]>=1.22.0" \
  "nncf>=2.14.0" \
  "transformers>=4.48.0" \
  "accelerate" \
  "safetensors" \
  "huggingface_hub" \
  "sentencepiece" \
  "protobuf"

cd ~/ov_models

MODEL_ID="deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
OUT_DIR="ov_DeepSeek-R1-Distill-Llama-70B_int4"

mkdir -p export_logs

optimum-cli export openvino \
  --model "$MODEL_ID" \
  --weight-format int4 \
  --group-size 128 \
  --ratio 1.0 \
  "$OUT_DIR" \
  2>&1 | tee export_logs/deepseek_r1_distill_llama_70b_int4_export.log
Notes
This model is text-only.
This repository uses both Optimum Intel and OpenVINO GenAI examples.
The Optimum Intel path is useful for validating the exported model with Transformers-style APIs.
The OpenVINO GenAI path is recommended for generative inference with OpenVINO-converted models.
OpenVINO Model Server compatibility is not claimed unless separately validated.
Limitations

This repository inherits the limitations of the original deepseek-ai/DeepSeek-R1-Distill-Llama-70B model. Additional differences may arise from OpenVINO conversion, INT4 compression, runtime package versions, and generation configuration.

Attribution

This is an unofficial OpenVINO conversion of the original DeepSeek model. All rights to the original model, training, and licensing remain with the original authors.
Downloads last month
16
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Morteza89/DeepSeek-R1-Distill-Llama-70B-int4-ov

Finetuned
(21)
this model