---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- deepseek
- vision-language
- ocr
- custom_code
license: mit
library_name: transformers
---
# Model Overview

- **Model Architecture:** DeepSeek-OCR
  - **Input:** Image/Text
  - **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355
- **ROCm**: 7.1.0
- **PyTorch**: 2.8.0
- **Transformers**: 4.57.3
- **Operating System(s):** Linux

# Model Details
The official [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) repository pins the transformers library to version 4.46.3 and has not been updated to support newer releases. In this community edition, `modeling_deepseekocr.py` has been updated for improved usability, and `modeling_deepseekv2.py` has been removed in favor of the DeepSeek-V2 model definitions shipped with the transformers library, so there is no need to downgrade transformers.

This model can be quantized by using [AMD-Quark](https://quark.docs.amd.com/latest/index.html), and the resulting quantized model is available at [amd/DeepSeek-OCR-MXFP4](https://huggingface.co/amd/DeepSeek-OCR-MXFP4).


# Usage
```python
import os
os.environ["HIP_VISIBLE_DEVICES"] = "0"  # select the ROCm device

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "amd/DeepSeek-OCR"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    _attn_implementation="flash_attention_2",
    trust_remote_code=True,
    use_safetensors=True,
)
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = "your_image.jpg"
output_path = "your/output/dir"

# infer(self, tokenizer, prompt='', image_file='', output_path=' ',
#       base_size=1024, image_size=640, crop_mode=True,
#       test_compress=False, save_results=False)
#
# Resolution presets:
#   Tiny:   base_size=512,  image_size=512,  crop_mode=False
#   Small:  base_size=640,  image_size=640,  crop_mode=False
#   Base:   base_size=1024, image_size=1024, crop_mode=False
#   Large:  base_size=1280, image_size=1280, crop_mode=False
#   Gundam: base_size=1024, image_size=640,  crop_mode=True

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=640,
    crop_mode=True,
    save_results=True,
    test_compress=True,
)
```
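The resolution presets listed in the comments above can be collected into a small lookup so the sizing arguments stay consistent across calls. This is a hypothetical convenience helper, not part of the model repository; the preset values themselves are taken from the comments in the usage example.

```python
# Hypothetical helper: map the named resolution presets from the model card
# to the corresponding infer() sizing keyword arguments.
PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def infer_kwargs(preset: str) -> dict:
    """Return the infer() sizing arguments for a named preset."""
    try:
        return dict(PRESETS[preset.lower()])
    except KeyError:
        raise ValueError(
            f"Unknown preset {preset!r}; choose from {sorted(PRESETS)}"
        )
```

With this helper, the call above could be written as `model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, save_results=True, test_compress=True, **infer_kwargs("gundam"))`.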

# License
Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.