Thoughts on deepseek-r1. Correct me if I'm wrong
Both reputable and clickbait-driven news outlets are missing key technical details regarding deepseek-r1.
Sensationalist news will capitalize on this, amplifying emotional responses and market volatility.
It's built on top of DeepSeek-V3, a 671B-parameter model. For comparison, GPT-3 is 175B, and GPT-4 is rumored to be around 1.76T parameters (OpenAI has not confirmed a figure).
Training an LLM is different from using an LLM. Training a 671B model still requires datacenter-grade compute.
Using an LLM can be done via the web (most people) or by running it on your own machine (physical or cloud; the constraint is usually how much RAM the system has).
Via the web, the backend compute required for uptime and low latency can be greatly reduced.
On your own machine, whether cloud (e.g. hosting your own LLM for your company) or physical (e.g. a zero-cost code assistant running locally), it allows for reduced cloud compute costs and the possibility of running a better-performing, smarter LLM in the same limited amount of RAM.
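To put the RAM constraint in numbers, here is a rough back-of-the-envelope sketch. It only counts the memory needed to hold the weights, and the bytes-per-parameter values are my own assumptions for common precisions, not figures published by DeepSeek. Real usage will be higher once you add the KV cache, activations, and runtime overhead.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: KV cache, activations, and runtime overhead are ignored.
# Assumption: the precisions below are illustrative, not DeepSeek's own numbers.

BYTES_PER_PARAM = {"fp16/bf16": 2.0, "fp8/int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # 671e9 is the parameter count quoted above for the full model.
    for label, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(671e9, nbytes)
        print(f"671B params at {label}: ~{gb:,.0f} GB for weights alone")
```

Even at 4 bits per parameter the weights alone are in the hundreds of gigabytes, which is why the full model is out of reach for a typical desktop and why the amount of RAM is the usual limiting factor.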
Only the model weights and the model's white paper were published. The training code, architecture, and datasets were not, so it shouldn't be called open source.
If (and most likely when) it is reproduced successfully, deepseek-r1 will allow LLM deployments to become even more ubiquitous, because the compute it frees up can be used for more processing.
DeepSeek is not going to crash global demand for AI chips. It was perhaps more of a well-timed ice bath for recent US announcements involving AI, and there is an argument being pushed that the deepseek-r1 results were falsified and that it was actually trained on US chips in breach of the US embargo on exporting top-grade AI chips.
Update note -
The GitHub repo shared in the link below is actually an open-source, contributor-driven effort to reconstruct a training codebase that can be used to train with the DeepSeek model architecture. So it is not code from the DeepSeek founding team itself.
DeepSeek R1: if the code available in the repo is assumed to be the definitive and complete model code, it's quite a simple iteration on DeepSeek V2. But confirming that the code is complete is going to be a pain. From the way the repo is prepared, my assumption is that the team behind it intends to prevent complete replication from scratch and would rather have you depend on their final model files, though they may support fine-tuning with additional data.
Early observations ---
It seems the code for DeepSeek-R1 is indeed open source, though some of it is hidden among the 200+ model files in the Hugging Face repo, i.e. this one.
The rest of the code seems to be here: https://github.com/huggingface/open-r1 . I am still trying to wrap my head around these two topics.
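If you want to check for yourself which of those repo files are code rather than weight shards, something like the sketch below might help. The helper names and the sample file names are hypothetical, made up for illustration. The only real library call is `list_repo_files` from `huggingface_hub`, which needs the package installed and network access, so it is kept in a separate function that is not called by default.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_files(filenames):
    """Count files by extension; files without one go under '(none)'."""
    return Counter(PurePosixPath(name).suffix or "(none)" for name in filenames)


def code_files(filenames):
    """Return only the Python source files, sorted by name."""
    return sorted(name for name in filenames if name.endswith(".py"))


def fetch_repo_files(repo_id="deepseek-ai/DeepSeek-R1"):
    """Fetch the real file list from the Hub (needs network + huggingface_hub)."""
    from huggingface_hub import list_repo_files  # imported lazily on purpose

    return list_repo_files(repo_id)


if __name__ == "__main__":
    # Hypothetical sample names, NOT the actual repo listing.
    # Swap in fetch_repo_files() to inspect the real repo.
    sample = [
        "config.json",
        "modeling_example.py",
        "model-00001.safetensors",
        "model-00002.safetensors",
        "README",
    ]
    print(summarize_files(sample))
    print(code_files(sample))
```

Grouping by extension makes it easy to separate the handful of code and config files from the large number of weight shards.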
Not sure why an open-source repo would be presented in such an obscured way. If you wish to train your own model from scratch, you will need to aggregate your own data, which makes sense considering all the copyright issues that might follow.
Since the core codebase seems to be open sourced across these two repos, it might make sense to fine-tune the released model files in the absence of large-scale compute rather than train one from scratch.