Universal DeepSeek-OCR 2 – CPU, MPS, CUDA Support
This repository uses the weights of the original DeepSeek-OCR 2 and modifies the model code to support inference on additional devices such as CPU and MPS (Apple Metal GPU). By default, it runs on CPU.
Explore more human-like visual encoding.
Usage
Sample code is available at: https://github.com/Dogacel/Universal-DeepSeek-OCR-2
mamba create -n deepseek-ocr-2 python=3.12.9
mamba activate deepseek-ocr-2
pip install torch==2.6.0 torchvision Pillow transformers==4.46.3 tokenizers==0.20.3 einops addict easydict
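Since the install command pins exact versions, it can help to confirm the environment matches before loading the model. The following is a minimal sketch using only the standard library; the helper name `check_pins` and the `PINNED` table are illustrative and not part of this repository. It assumes a local build suffix such as `+cu124` on the torch version should be ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the pip command above.
PINNED = {"torch": "2.6.0", "transformers": "4.46.3", "tokenizers": "0.20.3"}


def check_pins(pinned, get_version=version):
    """Return {package: found_version_or_None} for every pin that does not match."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        # Ignore a local build suffix such as "+cu124" (assumption, see lead-in).
        if found is None or found.split("+")[0] != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    mismatches = check_pins(PINNED)
    print("environment matches the pins" if not mismatches else mismatches)
```

An empty result means all three pinned packages are installed at the expected versions.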
from transformers import AutoModel, AutoTokenizer
import torch

# '.' loads the model from the current directory (a local clone of this repository)
model_name = '.'

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)
# CPU is the default device
model = model.eval().to("cpu").to(torch.float16)

# Alternative prompt for plain OCR:
# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'samples/paper.png'
output_path = 'tmp'

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=768,
    crop_mode=True,
    save_results=True,
    test_compress=True,
)
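To process a folder of images rather than a single file, the call above can be wrapped in a loop. This is a sketch under stated assumptions, not part of the repository: `plan_jobs`, `run_batch`, `PROMPTS`, and `IMAGE_SUFFIXES` are hypothetical names, the list of file suffixes is a guess at what you may want to pick up, and the return value of `model.infer` is stored as-is without assuming its type.

```python
from pathlib import Path

# The two prompts shown in the snippet above.
PROMPTS = {
    "free_ocr": "<image>\nFree OCR. ",
    "markdown": "<image>\n<|grounding|>Convert the document to markdown. ",
}

# Assumption: which file suffixes count as input images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def plan_jobs(input_dir, output_root):
    """Return sorted (image_file, output_path) pairs, one output folder per image."""
    jobs = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            jobs.append((str(path), str(Path(output_root) / path.stem)))
    return jobs


def run_batch(model, tokenizer, input_dir, output_root, mode="markdown"):
    """Call model.infer once per image, reusing the settings from the snippet above."""
    results = {}
    for image_file, output_path in plan_jobs(input_dir, output_root):
        Path(output_path).mkdir(parents=True, exist_ok=True)
        results[image_file] = model.infer(
            tokenizer,
            prompt=PROMPTS[mode],
            image_file=image_file,
            output_path=output_path,
            base_size=1024,
            image_size=768,
            crop_mode=True,
            save_results=True,
            test_compress=True,
        )
    return results
```

With `model` and `tokenizer` loaded as above, `run_batch(model, tokenizer, "samples", "tmp")` would write each image's results to its own subfolder of `tmp`.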
To change the device, update two things: the device and dtype the model is moved to, and the device and dtype arguments passed to model.infer.
- model = model.eval().to("cpu").to(torch.float16)
+ model = model.eval().to("mps").to(torch.float16)

  res = model.infer(
      tokenizer,
      prompt=prompt,
      image_file=image_file,
      output_path=output_path,
      base_size=1024,
      image_size=768,
      crop_mode=True,
      save_results=True,
      test_compress=True,
+     device="mps",
+     dtype=torch.float16,
  )
For CUDA, you should also use bfloat16 to get as close as possible to the original implementation.
- model = model.eval().to("cpu").to(torch.float16)
+ model = model.eval().to("cuda").to(torch.bfloat16)

  res = model.infer(
      tokenizer,
      prompt=prompt,
      image_file=image_file,
      output_path=output_path,
      base_size=1024,
      image_size=768,
      crop_mode=True,
      save_results=True,
      test_compress=True,
+     device="cuda",
+     dtype=torch.bfloat16,
  )
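The device and dtype pairs described above (CPU and MPS with float16, CUDA with bfloat16) can also be selected at runtime. The helper names `choose_device_and_dtype` and `detect_device_and_dtype` below are hypothetical, and preferring CUDA over MPS over CPU is an assumption rather than something the repository prescribes.

```python
def choose_device_and_dtype(cuda_available, mps_available):
    """Map availability flags to the (device, dtype name) pairs used in this README."""
    if cuda_available:
        return "cuda", "bfloat16"  # bfloat16 stays closest to the original implementation
    if mps_available:
        return "mps", "float16"
    return "cpu", "float16"


def detect_device_and_dtype():
    """Probe the local PyTorch install and return (device, torch dtype)."""
    import torch  # imported lazily so the pure helper above has no dependencies

    device, dtype_name = choose_device_and_dtype(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
    return device, getattr(torch, dtype_name)


# Usage with the snippets above:
#   device, dtype = detect_device_and_dtype()
#   model = model.eval().to(device).to(dtype)
#   then pass device=device and dtype=dtype to model.infer
```

Keeping the selection logic separate from the torch probe makes the mapping easy to check without a GPU.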
Base model: deepseek-ai/DeepSeek-OCR-2