Instructions for using deepseek-ai/DeepSeek-R1-Distill-Qwen-7B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use deepseek-ai/DeepSeek-R1-Distill-Qwen-7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
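R1-style distilled models write their reasoning first and close it with a </think> tag before the final answer, so with max_new_tokens=40 the decoded text above will usually stop partway through the reasoning. The helper below is a minimal sketch for separating the two parts. split_reasoning is a hypothetical name, not a Transformers API, and the sketch assumes the decoded text keeps the </think> tag and that the opening <think> tag may or may not appear, depending on the chat template.

```python
from typing import Tuple


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split decoded R1-style output into (reasoning, answer).

    Assumption: the reasoning ends with a closing </think> tag; the opening
    <think> tag may or may not be present depending on the chat template.
    """
    closing = "</think>"
    if closing not in text:
        # Generation stopped before the reasoning finished (e.g. the token
        # budget ran out), so everything decoded so far is reasoning.
        return text.replace("<think>", "", 1).strip(), ""
    reasoning, answer = text.split(closing, 1)
    # Drop the opening tag if the model emitted it.
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()
```

For example, reasoning, answer = split_reasoning(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])), ideally with a much larger max_new_tokens so the reasoning has room to finish.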

Local Apps

vLLM

How to use deepseek-ai/DeepSeek-R1-Distill-Qwen-7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
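The same OpenAI-compatible request can be sent from Python. This is a minimal sketch using only the standard library (urllib and json); the helper names build_chat_request, extract_content, and chat are hypothetical, and the sketch assumes the server returns the usual choices[0].message.content response shape. Pointing base_url at http://localhost:30000 should work the same way for the SGLang server below, which exposes the same endpoint.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, content: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_content(response: dict) -> str:
    # OpenAI-compatible responses carry the generated text here.
    return response["choices"][0]["message"]["content"]


def chat(base_url: str, model: str, content: str, timeout: float = 600.0) -> str:
    # Sends the request; requires a running server at base_url.
    req = build_chat_request(base_url, model, content)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_content(json.load(resp))
```

With the vLLM server running, chat("http://localhost:8000", "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", "What is the capital of France?") mirrors the curl call above. The generous default timeout is a guess to leave room for long reasoning outputs.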

Use Docker

docker model run hf.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

SGLang

How to use deepseek-ai/DeepSeek-R1-Distill-Qwen-7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use deepseek-ai/DeepSeek-R1-Distill-Qwen-7B with Docker Model Runner:
```
docker model run hf.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
```

System Prompt

by Wanfq - opened Jan 20, 2025

Discussion

Wanfq

Jan 20, 2025

We tested the following system prompt with a temperature of 0.7:

You are a helpful and harmless assistant. You should think step-by-step.

Here are the evaluation results.

Models	AIME24	MATH500	GSM8K	GPQA-Diamond	ARC-Challenge	MMLU-Pro	MMLU	LiveCodeBench
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	46.67	88.20	-	57.58	-	-	-	-

More evaluation results can be found at https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview

MonolithFoundation

Jan 21, 2025

The result seems biased.

Wanfq

Jan 21, 2025

I'm also confused about getting results this much lower than the reported ones, especially on AIME24...

MonolithFoundation

Jan 21, 2025

Wanfq

Jan 21, 2025


The evaluation code is modified from SkyThought. In our evaluation, we set the temperature to 0.7 and max_tokens to 32768. We provide an example for reproducing our results in the evaluation code.

The system prompt for evaluation is set to:

You are a helpful and harmless assistant. You should think step-by-step.
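The evaluation settings described here (this system prompt, temperature 0.7, max_tokens 32768) can be expressed as an OpenAI-compatible chat request roughly as follows. This is a sketch, not the SkyThought-derived evaluation code; build_eval_payload is a hypothetical name, and it assumes the serving engine's context length can accommodate the prompt plus 32768 generated tokens.

```python
SYSTEM_PROMPT = "You are a helpful and harmless assistant. You should think step-by-step."


def build_eval_payload(
    question: str,
    model: str = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    temperature: float = 0.7,
    max_tokens: int = 32768,
) -> dict:
    """Return a chat-completions payload using the evaluation settings above."""
    return {
        "model": model,
        "messages": [
            # System prompt as stated in the evaluation setup.
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```

The resulting dict can be JSON-encoded and posted to /v1/chat/completions on the vLLM or SGLang servers shown earlier.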

We are currently attempting to reproduce the results reported in the DeepSeek-R1 paper by experimenting with different system prompts. We will update our findings once we have acquired the original system prompt used in their study.

The updated evaluation results are presented here:

Models	AIME24	MATH500	GSM8K	GPQA-Diamond	ARC-Challenge	MMLU-Pro	MMLU	LiveCodeBench
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B	46.67	88.20	93.71	57.58	95.90	68.70	82.17	59.69

More evaluation results can be found at https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview

Bennixzp2024

Jan 21, 2025


edited Jan 21, 2025

How would you evaluate this?

guoday

Jan 21, 2025

Please refer to the following link:
https://github.com/deepseek-ai/DeepSeek-R1#usage-recommendations
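Based on our reading of the linked usage recommendations (a temperature in the 0.5 to 0.7 range, with 0.6 suggested; no system prompt, with all instructions placed in the user prompt; and, for math problems, a directive such as "Please reason step by step, and put your final answer within \boxed{}."), a request can be adapted as sketched below. The helper names are hypothetical, and folding any system text into the first user turn is one reasonable way to follow the advice, not something the recommendations prescribe.

```python
from typing import Dict, List

MATH_DIRECTIVE = "Please reason step by step, and put your final answer within \\boxed{}."


def to_user_only(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Move any system messages into the first user message (no system prompt)."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    # Copy the remaining messages so the caller's list is not mutated.
    rest = [dict(m) for m in messages if m["role"] != "system"]
    if system_parts:
        for m in rest:
            if m["role"] == "user":
                m["content"] = "\n".join(system_parts + [m["content"]])
                break
        else:
            # No user turn at all: turn the system text into one.
            rest.insert(0, {"role": "user", "content": "\n".join(system_parts)})
    return rest


def math_request(question: str, temperature: float = 0.6) -> dict:
    """User-only math request with the step-by-step directive appended."""
    return {
        "messages": [{"role": "user", "content": f"{question}\n{MATH_DIRECTIVE}"}],
        "temperature": temperature,
    }
```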

x5wow

Jan 22, 2025

Show me an example of a Telegram bot in Python (aiogram) that receives a video from a user, processes it, and sends it back. The database is MongoDB.

x5wow

Jan 22, 2025

Do it.

urtuuuu

Jan 22, 2025

Can anyone confirm that this model can't answer a coding question that even the standard qwen-7b-instruct answers?

Explain the bug in the following code:

from time import sleep
from multiprocessing.pool import ThreadPool
 
def task():
    sleep(1)
    return 'all done'

if __name__ == '__main__':
    with ThreadPool() as pool:
        result = pool.apply_async(task())
        value = result.get()
        print(value)

After thinking for a long time, it always answers that there is no bug. But the bug is on the line result = pool.apply_async(task()): it should be result = pool.apply_async(task). Almost all recent models of similar size answer it easily.
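A corrected version, for reference. With task(), the function runs immediately in the main thread and its return value, the string 'all done', is what gets submitted; the worker then tries to call that string, and result.get() re-raises the resulting TypeError. Passing the function object (task) lets the pool call it in a worker thread. The run_fixed and run_buggy wrappers are our additions so both behaviours can be compared side by side.

```python
from time import sleep
from multiprocessing.pool import ThreadPool


def task():
    sleep(1)
    return 'all done'


def run_fixed():
    with ThreadPool() as pool:
        # Pass the callable itself; the pool invokes it in a worker thread.
        result = pool.apply_async(task)
        return result.get()


def run_buggy():
    with ThreadPool() as pool:
        # task() runs right here in the main thread and returns a str;
        # the worker then tries to call that str, which raises TypeError.
        result = pool.apply_async(task())
        return result.get()


if __name__ == '__main__':
    print(run_fixed())
    try:
        run_buggy()
    except TypeError as exc:
        print('buggy version raised:', exc)
```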

Wanfq

Jan 23, 2025


edited Jan 23, 2025

We found that the math and code evaluation results in our current version were incorrect. To address this, we switched to the evaluation code from Qwen2.5-Math and Qwen2.5-Coder for math and code, respectively. With this approach, we successfully reproduced the results reported in the DeepSeek-R1 paper.

We have finished all the evaluations and updated the results here:

The reproduction details can be found in our blog: https://huggingface.co/blog/Wanfq/fuseo1-preview

We also provide the code in our GitHub repo: https://github.com/fanqiwan/FuseAI/tree/main/FuseO1-Preview

Our models are available at: https://huggingface.co/collections/FuseAI/fuseo1-preview-678eb56093649b2688bc9977

Have fun!
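Math evaluation harnesses of this kind usually grade the final answer placed inside \boxed{...} rather than the whole reasoning trace. The function below is a minimal illustrative sketch of that extraction step; it is not the Qwen2.5-Math code, and last_boxed is a hypothetical name. It takes the last \boxed{ occurrence and matches braces so nested LaTeX such as \frac{1}{2} stays intact.

```python
from typing import Optional


def last_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    for j in range(i, len(text)):
        ch = text[j]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    # Unbalanced braces: the answer was probably cut off by the token limit.
    return None
```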

haili-tian

Feb 5, 2025

They do not recommend using a system prompt.

mathcrazyy

Feb 6, 2025


edited Feb 6, 2025

I used this code: https://github.com/TIGER-AI-Lab/CritiqueFineTuning, and got the results below:

mathcrazyy

Feb 8, 2025


edited Feb 8, 2025

Prompt:
"Please reason step by step, and put your final answer within \boxed{{}}.(Don't make your reasoning too long)\nUser: {input}\nAssistant: 《think》\n"
(Replace 《》 with <>; angle brackets get swallowed when I type them here.)
max_length=16384
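The prompt above appears to be a Python str.format template, which is why the braces in \boxed{{}} are doubled (formatting turns them into a literal \boxed{}). The sketch below assumes that, writes <think> with real angle brackets as the author asks, and escapes the backslash so that \b is not read as a backspace character in a normal Python string. build_prompt is a hypothetical helper; max_length=16384 is a generation setting and is not part of the prompt string.

```python
# Template from the post, with 《think》 replaced by <think> and the
# backslash escaped so "\b" does not become a backspace character.
PROMPT_TEMPLATE = (
    "Please reason step by step, and put your final answer within \\boxed{{}}."
    "(Don't make your reasoning too long)\n"
    "User: {input}\n"
    "Assistant: <think>\n"
)


def build_prompt(question: str) -> str:
    """Fill the {input} slot; doubled braces collapse to a literal {}."""
    return PROMPT_TEMPLATE.format(input=question)
```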
