Instructions for using amd/DeepSeek-OCR with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use amd/DeepSeek-OCR with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="amd/DeepSeek-OCR", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("amd/DeepSeek-OCR", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use amd/DeepSeek-OCR with vLLM:
Install from pip and serve the model:
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "amd/DeepSeek-OCR"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "amd/DeepSeek-OCR",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/amd/DeepSeek-OCR
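Because the server exposes an OpenAI-compatible API, it can also be called from Python. A minimal sketch using the openai client package (an extra dependency, pip install openai), assuming the vLLM server above is running on its default port 8000:
# Query the vLLM server through its OpenAI-compatible completions endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM does not require a real key by default

completion = client.completions.create(
    model="amd/DeepSeek-OCR",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)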
- SGLang
How to use amd/DeepSeek-OCR with SGLang:
Install from pip and serve the model:
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "amd/DeepSeek-OCR" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "amd/DeepSeek-OCR",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images:
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "amd/DeepSeek-OCR" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "amd/DeepSeek-OCR",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use amd/DeepSeek-OCR with Docker Model Runner:
docker model run hf.co/amd/DeepSeek-OCR
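Any of the OpenAI-compatible servers above can also be called from Python without an extra client library. A sketch using requests against the SGLang endpoint from the previous section (port 30000; adjust the URL for vLLM on port 8000):
# POST to the OpenAI-compatible completions endpoint started above.
import requests

resp = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "amd/DeepSeek-OCR",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])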
Model Overview
- Model Architecture: DeepSeek-OCR
- Input: Image/Text
- Output: Text
- Supported Hardware Microarchitecture: AMD MI300/MI350/MI355
- ROCm: 7.1.0
- PyTorch: 2.8.0
- Transformers: 4.57.3
- Operating System(s): Linux
Model Details
The official deepseek-ai/DeepSeek-OCR release pins the transformers library to version 4.46.3 and has not been updated to support newer releases. In this community edition, modeling_deepseekocr.py has been updated for improved usability, and modeling_deepseekv2.py has been removed in favor of the DeepSeekV2 model definitions shipped with the transformers library, eliminating the need to downgrade transformers.
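As a quick check that no downgrade is needed, the model should load under a current transformers release. A minimal sketch; the printed version is whatever you have installed (e.g. the 4.57.3 listed in the Model Overview):
# Verify the community build loads on a recent transformers release.
import transformers
from transformers import AutoModel

print(transformers.__version__)  # no 4.46.3 pin required
model = AutoModel.from_pretrained("amd/DeepSeek-OCR", trust_remote_code=True, dtype="auto")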
This model can be quantized by using AMD-Quark, and the resulting quantized model is available at amd/DeepSeek-OCR-MXFP4.
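Loading the quantized checkpoint should follow the same pattern as the base model. A sketch, assuming amd/DeepSeek-OCR-MXFP4 exposes the same trust_remote_code path; see that model card for authoritative instructions:
# Assumption: the MXFP4 checkpoint loads through the same AutoModel path as the base model.
from transformers import AutoModel, AutoTokenizer

quant_name = "amd/DeepSeek-OCR-MXFP4"
tokenizer = AutoTokenizer.from_pretrained(quant_name, trust_remote_code=True)
model = AutoModel.from_pretrained(quant_name, trust_remote_code=True, dtype="auto")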
Usage
from transformers import AutoModel, AutoTokenizer
import torch
import os

# Run on the first ROCm GPU
os.environ["HIP_VISIBLE_DEVICES"] = '0'

model_name = 'amd/DeepSeek-OCR'

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'

# Signature: infer(self, tokenizer, prompt='', image_file='', output_path=' ', base_size=1024, image_size=640, crop_mode=True, test_compress=False, save_results=False)
# Resolution presets:
#   Tiny:   base_size = 512,  image_size = 512,  crop_mode = False
#   Small:  base_size = 640,  image_size = 640,  crop_mode = False
#   Base:   base_size = 1024, image_size = 1024, crop_mode = False
#   Large:  base_size = 1280, image_size = 1280, crop_mode = False
#   Gundam: base_size = 1024, image_size = 640,  crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path,
                  base_size=1024, image_size=640, crop_mode=True, save_results=True, test_compress=True)
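The same call accepts any of the presets listed above; for example, the Tiny preset trades resolution for speed (values taken from the preset table):
# "Tiny" preset from the list above: lowest resolution, no cropping.
res_tiny = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path,
                       base_size=512, image_size=512, crop_mode=False, save_results=True)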
License
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.