---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- deepseek
- vision-language
- ocr
- custom_code
license: mit
library_name: transformers
---

# Model Overview

- **Model Architecture:** DeepSeek-OCR
- **Input:** Image/Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355
- **ROCm:** 7.1.0
- **PyTorch:** 2.8.0
- **Transformers:** 4.57.3
- **Operating System(s):** Linux

# Model Details

The official [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) release pins the transformers library to version 4.46.3 and has not been updated to support later releases. In this community edition, `modeling_deepseekocr.py` has been updated for improved usability, and `modeling_deepseekv2.py` has been removed in favor of the DeepSeekV2 model definitions that ship with the transformers library, so there is no need to downgrade transformers.

This model can be quantized with [AMD-Quark](https://quark.docs.amd.com/latest/index.html). The resulting quantized model is available at [amd/DeepSeek-OCR-MXFP4](https://huggingface.co/amd/DeepSeek-OCR-MXFP4).

# Usage

```python
import os

# Select the GPU before torch is imported so the setting is in place
# when the device runtime initializes.
os.environ["HIP_VISIBLE_DEVICES"] = '0'

import torch
from transformers import AutoModel, AutoTokenizer

model_name = 'amd/DeepSeek-OCR'

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
# ROCm builds of PyTorch expose AMD GPUs through the .cuda() API.
model = model.eval().cuda().to(torch.bfloat16)

# The <image> placeholder marks where the image is inserted into the prompt.
# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'

# infer(self, tokenizer, prompt='', image_file='', output_path = ' ', base_size = 1024, image_size = 640, crop_mode = True, test_compress = False, save_results = False):

# Tiny: base_size = 512, image_size = 512, crop_mode = False
# Small: base_size = 640, image_size = 640, crop_mode = False
# Base: base_size = 1024, image_size = 1024, crop_mode = False
# Large: base_size = 1280, image_size = 1280, crop_mode = False

# Gundam: base_size = 1024, image_size = 640, crop_mode = True

res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path = output_path, base_size = 1024, image_size = 640, crop_mode=True, save_results = True, test_compress = True)
```

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.