---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- deepseek
- vision-language
- ocr
- custom_code
license: mit
library_name: transformers
---
# Model Overview

- **Model Architecture:** DeepSeek-OCR
- **Input:** Image/Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355
- **ROCm:** 7.1.0
- **PyTorch:** 2.8.0
- **Transformers:** 4.57.3
- **Operating System(s):** Linux

# Model Details
The official version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) pins the transformers library to version 4.46.3 and has not been updated to support newer releases. In this community edition, `modeling_deepseekocr.py` has been updated for improved usability, and `modeling_deepseekv2.py` has been removed in favor of the DeepSeekV2 model definitions shipped with the transformers library, eliminating the need to downgrade transformers.

This model can be quantized with [AMD-Quark](https://quark.docs.amd.com/latest/index.html); the resulting quantized model is available at [amd/DeepSeek-OCR-MXFP4](https://huggingface.co/amd/DeepSeek-OCR-MXFP4).

# Usage
```python
import os

import torch
from transformers import AutoModel, AutoTokenizer

# Select the ROCm GPU to use.
os.environ["HIP_VISIBLE_DEVICES"] = "0"

model_name = "amd/DeepSeek-OCR"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    _attn_implementation="flash_attention_2",
    trust_remote_code=True,
    use_safetensors=True,
)
# On ROCm, torch's `cuda` device API maps to the HIP backend.
model = model.eval().cuda().to(torch.bfloat16)

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = "your_image.jpg"
output_path = "your/output/dir"

# infer(self, tokenizer, prompt='', image_file='', output_path=' ', base_size=1024, image_size=640, crop_mode=True, test_compress=False, save_results=False)

# Resolution presets:
# Tiny:   base_size = 512,  image_size = 512,  crop_mode = False
# Small:  base_size = 640,  image_size = 640,  crop_mode = False
# Base:   base_size = 1024, image_size = 1024, crop_mode = False
# Large:  base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640,  crop_mode = True

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=640,
    crop_mode=True,
    save_results=True,
    test_compress=True,
)
```
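
The resolution presets listed in the comments above can be collected into a small lookup table so a mode can be selected by name. This helper is a convenience sketch, not part of the released model's API; `RESOLUTION_PRESETS` and `infer_kwargs` are names introduced here for illustration.

```python
# Resolution presets for DeepSeek-OCR's infer(), taken from the mode table above.
# This is a hypothetical convenience helper, not part of the model's API.
RESOLUTION_PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def infer_kwargs(mode: str) -> dict:
    """Return the base_size/image_size/crop_mode kwargs for a named preset."""
    return dict(RESOLUTION_PRESETS[mode.lower()])
```

With this helper, the call above would become `model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, save_results=True, test_compress=True, **infer_kwargs("gundam"))`.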

# License
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.