---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- deepseek
- vision-language
- ocr
- custom_code
license: mit
library_name: transformers
---
# Model Overview
- **Model Architecture:** DeepSeek-OCR
- **Input:** Image/Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355
- **ROCm:** 7.1.0
- **PyTorch:** 2.8.0
- **Transformers:** 4.57.3
- **Operating System(s):** Linux
# Model Details
The official version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) pins the transformers library to version 4.46.3 and has not been updated to support newer releases. In this community edition, the `modeling_deepseekocr.py` file has been updated for improved usability, and `modeling_deepseekv2.py` has been removed in favor of the DeepSeekV2 model definitions shipped with the transformers library, so no transformers downgrade is required.
This model can be quantized using [AMD-Quark](https://quark.docs.amd.com/latest/index.html); the resulting quantized model is available at [amd/DeepSeek-OCR-MXFP4](https://huggingface.co/amd/DeepSeek-OCR-MXFP4).
# Usage
```python
from transformers import AutoModel, AutoTokenizer
import torch
import os
os.environ["HIP_VISIBLE_DEVICES"] = '0'
model_name = 'amd/DeepSeek-OCR'
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)
# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'
# Signature: infer(self, tokenizer, prompt='', image_file='', output_path=' ', base_size=1024, image_size=640, crop_mode=True, test_compress=False, save_results=False)
# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640, crop_mode = True
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1024, image_size=640, crop_mode=True, save_results=True, test_compress=True)
```
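The preset resolution modes listed in the comments above can be gathered into a small lookup table. This is a convenience sketch, not part of the model's API; the mode names and parameter values are taken directly from the comments in the snippet:

```python
# Preset resolution modes for model.infer(), as documented in the comments
# above. The mode names are informal labels, not an official API.
INFER_MODES = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def infer_kwargs(mode: str) -> dict:
    """Return a copy of the infer() keyword arguments for a preset mode."""
    return dict(INFER_MODES[mode])
```

With this helper, `model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, **infer_kwargs("gundam"), save_results=True, test_compress=True)` reproduces the call shown above.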
# License
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.