---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- deepseek
- vision-language
- ocr
- custom_code
license: mit
library_name: transformers
---
# Model Overview

- **Model Architecture:** DeepSeek-OCR
- **Input:** Image/Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355
- **ROCm:** 7.1.0
- **PyTorch:** 2.8.0
- **Transformers:** 4.57.3
- **Operating System(s):** Linux

# Model Details
The official version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) pins the transformers library to version 4.46.3 and has not been updated to support newer releases. In this community edition, `modeling_deepseekocr.py` has been updated for improved usability, and `modeling_deepseekv2.py` has been removed in favor of the DeepSeekV2 model definitions shipped with the transformers library, eliminating the need to downgrade transformers.

This model can be quantized with [AMD-Quark](https://quark.docs.amd.com/latest/index.html); the resulting quantized model is available at [amd/DeepSeek-OCR-MXFP4](https://huggingface.co/amd/DeepSeek-OCR-MXFP4).

# Usage
```python
import os

import torch
from transformers import AutoModel, AutoTokenizer

# Select the ROCm GPU to use.
os.environ["HIP_VISIBLE_DEVICES"] = "0"

model_name = "amd/DeepSeek-OCR"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    _attn_implementation="flash_attention_2",
    trust_remote_code=True,
    use_safetensors=True,
)
# On ROCm, torch's `cuda` device API maps to the HIP backend.
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = "your_image.jpg"
output_path = "your/output/dir"

# infer(self, tokenizer, prompt='', image_file='', output_path=' ', base_size=1024, image_size=640, crop_mode=True, test_compress=False, save_results=False)

# Resolution presets:
# Tiny:   base_size = 512,  image_size = 512,  crop_mode = False
# Small:  base_size = 640,  image_size = 640,  crop_mode = False
# Base:   base_size = 1024, image_size = 1024, crop_mode = False
# Large:  base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640,  crop_mode = True

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=640,
    crop_mode=True,
    save_results=True,
    test_compress=True,
)
```
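
The resolution presets listed in the comments above can be collected into a small lookup table so a mode can be selected by name. This helper is a convenience sketch, not part of the released model's API; `RESOLUTION_PRESETS` and `infer_kwargs` are names introduced here for illustration.

```python
# Resolution presets for DeepSeek-OCR's infer(), taken from the mode table above.
# This is a hypothetical convenience helper, not part of the model's API.
RESOLUTION_PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def infer_kwargs(mode: str) -> dict:
    """Return the base_size/image_size/crop_mode kwargs for a named preset."""
    return dict(RESOLUTION_PRESETS[mode.lower()])
```

With this helper, the call above would become `model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, save_results=True, test_compress=True, **infer_kwargs("gundam"))`.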

# License
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.