Instructions to use deepseek-ai/DeepSeek-R1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use deepseek-ai/DeepSeek-R1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use deepseek-ai/DeepSeek-R1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "deepseek-ai/DeepSeek-R1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-R1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
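
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_payload` and `chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

# Assumption: the vLLM server from the commands above is running locally.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "deepseek-ai/DeepSeek-R1"


def build_chat_payload(prompt, model=MODEL):
    """Build the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, url=BASE_URL, timeout=600):
    """POST the prompt to the server and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return result["choices"][0]["message"]["content"]
```

With the server up, `print(chat("What is the capital of France?"))` should print the model's answer. For the SGLang server shown further down, passing `url="http://localhost:30000/v1/chat/completions"` should work the same way.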

Use Docker

docker model run hf.co/deepseek-ai/DeepSeek-R1

SGLang

How to use deepseek-ai/DeepSeek-R1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "deepseek-ai/DeepSeek-R1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-R1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "deepseek-ai/DeepSeek-R1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepseek-ai/DeepSeek-R1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use deepseek-ai/DeepSeek-R1 with Docker Model Runner:
```
docker model run hf.co/deepseek-ai/DeepSeek-R1
```

Thoughts on deepseek-r1. Correct me if I'm wrong

#69

by pkms - opened Jan 28, 2025

Discussion

pkms

Jan 28, 2025

Both reputable and clickbait-driven news outlets are missing key technical details regarding deepseek-r1.

Sensationalist news will capitalize on this, enhancing emotional responses and market volatility.

It's built on top of deepseek-V3, a 671B parameter model. For comparison, GPT-3 is 175B, and GPT-4 is rumored to be about 1.76T parameters (OpenAI has not published a figure).

Training an LLM is different from using one. Training a 671B model still requires datacenter-grade compute.

An LLM can be used via the web (what most people do) or run on your own machine, physical or cloud, where the constraint is usually how much RAM the system has.

For web use, the backend compute a provider needs for uptime and low latency can be greatly reduced.

On your own machine, whether cloud (e.g., hosting your own LLM for your company) or physical (e.g., a zero-cost code assistant AI running locally), it allows for lower cloud compute costs and makes it possible to run a better-performing, smarter LLM in the same limited amount of RAM.
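
To put a rough number on the RAM constraint, here is a back-of-the-envelope sketch. It counts only the memory needed to hold the weights of a 671B parameter model and ignores the KV cache, activations, and runtime overhead, so real requirements are higher. The precision names and the function `weight_memory_gb` are assumptions made for illustration.

```python
# Assumption: bytes needed to store one parameter in each weight format.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, precision):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


NUM_PARAMS = 671e9  # parameter count quoted above

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(NUM_PARAMS, precision):,.1f} GB")
```

That works out to about 1,342 GB at 16-bit, 671 GB at 8-bit, and 335.5 GB at 4-bit, which is why running the full model locally is limited by memory long before anything else.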

Only the model weights and the model's white paper were published. The training code, architecture, and datasets were not, so it shouldn't be called open source.

If (and most likely when) it is reproduced successfully, deepseek-r1 will allow LLM deployments to become even more ubiquitous, because the freed-up compute will allow more processing to be done.

Deepseek is not going to crash global demand for AI chips. It was perhaps more of a well-timed ice bath for recent US announcements involving AI. There is also an argument being pushed that the deepseek-r1 results were falsified and that it was actually trained on US chips, in breach of the US embargo on exports of top-grade AI chips.

AinoSoft

Jan 28, 2025 (edited Jan 28, 2025)

Update note -
the GitHub repo shared in the link below is actually an open-source, contributor-driven effort to reconstruct a trainer codebase that can be used to train with the deepseek model architecture. So it is not code from the deepseek founding team itself.

Deepseek R1 - if the code available in the repo is assumed to be the definitive and complete model code, it's quite a simple iteration on Deepseek V2. But confirming that the code is complete is going to be a pain. From the way the repo is prepared, my assumption is that the team behind it intends to prevent complete replication from scratch, would rather have you depend on their final model files, and may support fine-tuning with additional data.

Early observations ---
It seems the code for deepseek-R1 is indeed open source, though some of it is hidden within the 200+ model files in the Hugging Face repo, i.e. this one.

The rest of the code seems to be here: https://github.com/huggingface/open-r1 . I am still trying to wrap my head around these two topics.

Not sure why an open-source repo is presented in such an obscured way. If you wish to train your own from scratch, you will need to aggregate your own data, which makes sense considering all the copyright issues that might follow.

Since the core codebase seems to be open sourced across these two repos, it might make sense to fine-tune the model files in the absence of large-scale compute rather than train one from scratch.
