jiaxwang committed on
Commit eb8d4d0 · verified · 1 Parent(s): 671e91e

Update README.md

Files changed (1): README.md (+28 −52)
README.md CHANGED
@@ -15,76 +15,52 @@ library_name: transformers
  - **Model Architecture:** DeepSeek-OCR
  - **Input:** Image/Text
  - **Output:** Text
- - **Supported Hardware Microarchitecture:** AMD MI350/MI355
  - **ROCm**: 7.1.0
  - **PyTorch**: 2.8.0
  - **Transformers**: 4.57.3
  - **Operating System(s):** Linux

  # Model Details
- The official version of DeepSeek-OCR restricts the transformers library to version 4.46.3 and has not been updated to support the latest release. As a result, this community edition has adjusted the modeling.py module to improve usability, eliminating the need to downgrade transformers. Furthermore, by following the steps below, you can quantize the model and obtain the perplexity value for the text-to-text generation component.

- # Model Quantization
- **Quantization scripts:**
-
- Before quantization, please install flash-attn in the following way:
- ```
- pip install flash-attn --no-build-isolation
  ```
- Below is an example of how to quantize this model:

  ```python
  import torch
- from transformers import AutoModel, AutoTokenizer, AutoProcessor
- from quark.torch import LLMTemplate, ModelQuantizer, export_safetensors
- from datasets import load_dataset
- from quark.contrib.llm_eval import ppl_eval
-
- # Register a template describing DeepSeek-OCR's layer layout for Quark
- deepseek_ocr_template = LLMTemplate(
-     model_type="deepseek_vl_v2",
-     kv_layers_name=["*k_proj", "*v_proj"],
-     q_layer_name="*q_proj",
-     exclude_layers_name=["lm_head", "model.sam_model*", "model.vision_model*", "model.projector*"],
- )
- LLMTemplate.register_template(deepseek_ocr_template)
-
- # Configuration
- ckpt_path = "amd/DeepSeek-OCR"
- output_dir = "amd/DeepSeek-OCR-MXFP4"
- quant_scheme = "mxfp4"
- exclude_layers = ["*self_attn*", "*mlp.gate", "lm_head", "*mlp.gate_proj", "*mlp.up_proj",
-                   "*mlp.down_proj", "*shared_experts.*", "*sam_model*", "*vision_model*", "*projector*"]
-
- # Load model
- model = AutoModel.from_pretrained(ckpt_path, use_safetensors=True, trust_remote_code=True,
-                                   _attn_implementation='flash_attention_2', device_map="cuda:0", torch_dtype=torch.bfloat16)
- model.eval()
- tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
- processor = AutoProcessor.from_pretrained(ckpt_path, trust_remote_code=True)
-
- # Get the quantization config from the registered template
- template = LLMTemplate.get(model.config.model_type)
- quant_config = template.get_config(scheme=quant_scheme, exclude_layers=exclude_layers)
-
- # Quantize and freeze the model
- quantizer = ModelQuantizer(quant_config)
- model = quantizer.quantize_model(model)
- model = quantizer.freeze(model)
-
- # Export safetensors in HF format
- export_safetensors(model, output_dir, custom_mode="quark")
- tokenizer.save_pretrained(output_dir)
- processor.save_pretrained(output_dir)
-
- # Evaluate perplexity on wikitext-2 (optional)
- testdata = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
- testenc = tokenizer("\n\n".join(testdata["text"]), return_tensors="pt")
- ppl = ppl_eval(model, testenc, model.device)
- print(f"Perplexity: {ppl.item()}")
  ```
- For further details or issues, please refer to the [AMD-Quark](https://quark.docs.amd.com/latest/index.html) documentation or contact the respective developers.

  # License
  Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.
 
  - **Model Architecture:** DeepSeek-OCR
  - **Input:** Image/Text
  - **Output:** Text
+ - **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355
  - **ROCm**: 7.1.0
  - **PyTorch**: 2.8.0
  - **Transformers**: 4.57.3
  - **Operating System(s):** Linux

  # Model Details
+ The official version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) pins the transformers library to version 4.46.3 and has not been updated to support the latest release. In this community edition, the modeling_deepseekocr.py file has been updated for improved usability, and modeling_deepseekv2.py has been removed in favor of the DeepSeekV2 model definitions provided by the transformers library, eliminating the need to downgrade transformers. Furthermore, using [AMD-Quark](https://quark.docs.amd.com/latest/index.html) and the commands below, you can obtain the perplexity value for the text-to-text generation component.
+ ```bash
+ cd Quark/examples/torch/language_modeling/llm_ptq
+
+ python3 quantize_quark.py \
+     --model_dir amd/DeepSeek-OCR \
+     --skip_quantization
  ```
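For reference, the perplexity that this flow reports is the exponential of the mean per-token negative log-likelihood. Below is a minimal, framework-free sketch of that computation; the per-token probabilities are hypothetical illustration values, not outputs of the model:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood over tokens)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token probabilities assigned by a language model.
probs = [0.5, 0.25, 0.125]
ppl = perplexity([math.log(p) for p in probs])
print(round(ppl, 6))  # 4.0: the inverse geometric mean of the probabilities
```

Lower perplexity means the model assigns higher probability to the reference text, which is what the wikitext-2 evaluation measures for the text-to-text component.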
 
+ A quantized version based on this model is available at [amd/DeepSeek-OCR-MXFP4](https://huggingface.co/amd/DeepSeek-OCR-MXFP4).
+
+ # Usage
  ```python
+ from transformers import AutoModel, AutoTokenizer
  import torch
+ import os
+
+ os.environ["HIP_VISIBLE_DEVICES"] = '0'
+ model_name = 'amd/DeepSeek-OCR'
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
+ model = model.eval().cuda().to(torch.bfloat16)
+
+ # prompt = "<image>\nFree OCR. "
+ prompt = "<image>\n<|grounding|>Convert the document to markdown. "
+ image_file = 'your_image.jpg'
+ output_path = 'your/output/dir'
+
+ # Signature: infer(tokenizer, prompt='', image_file='', output_path=' ', base_size=1024, image_size=640, crop_mode=True, test_compress=False, save_results=False)
+
+ # Resolution modes:
+ # Tiny:   base_size = 512,  image_size = 512,  crop_mode = False
+ # Small:  base_size = 640,  image_size = 640,  crop_mode = False
+ # Base:   base_size = 1024, image_size = 1024, crop_mode = False
+ # Large:  base_size = 1280, image_size = 1280, crop_mode = False
+ # Gundam: base_size = 1024, image_size = 640,  crop_mode = True
+
+ res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1024, image_size=640, crop_mode=True, save_results=True, test_compress=True)
  ```
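The resolution modes listed in the comments above can be collected into a small lookup so the choice becomes a single argument; `PRESETS` and `infer_kwargs` are hypothetical helper names, with the values copied from the model card comments:

```python
# Resolution presets from the model card: mode -> base_size, image_size, crop_mode.
PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def infer_kwargs(mode):
    """Return a fresh kwargs dict for model.infer() for a named preset."""
    return dict(PRESETS[mode])

print(infer_kwargs("gundam"))  # {'base_size': 1024, 'image_size': 640, 'crop_mode': True}
```

With such a helper, the call above could be written as `model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, save_results=True, test_compress=True, **infer_kwargs("gundam"))`.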
 
  # License
  Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.