license: mit
library_name: transformers
---

# Model Overview

- **Model Architecture:** DeepSeek-OCR
- **Input:** Image/Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.1.0
- **PyTorch:** 2.8.0
- **Transformers:** 4.57.3
- **Operating System(s):** Linux
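The pinned package versions above can be verified at runtime. Below is a minimal sketch (the exact-match policy is an assumption; ROCm is a system-level install and is not checked here):

```python
import importlib.metadata

# Versions pinned by this model card (assumption: exact matches are wanted)
EXPECTED = {"torch": "2.8.0", "transformers": "4.57.3"}

def check_versions(expected):
    """Return {package: (installed_version_or_None, expected_version)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            have = None
        report[pkg] = (have, want)
    return report

for pkg, (have, want) in check_versions(EXPECTED).items():
    print(f"{pkg}: installed={have}, expected={want}")
```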

# Model Details

The official release of DeepSeek-OCR pins the transformers version to 4.46.3 and has not been adapted to newer releases. This community edition therefore modifies the modeling.py module so that users can run the model without downgrading transformers. In addition, following the quantization steps below also produces a perplexity value for the text-to-text generation component.
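The perplexity reported by the evaluation step is the exponential of the mean per-token negative log-likelihood over the evaluation text. A minimal sketch of that arithmetic (independent of Quark's `ppl_eval` implementation):

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Three tokens, each with a cross-entropy loss of 2.0 nats
print(perplexity([2.0, 2.0, 2.0]))  # exp(2.0) ≈ 7.389
```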

# Model Quantization

**Quantization scripts:**

Before quantization, please install flash-attn in the following way:

```shell
pip install flash-attn --no-build-isolation
```

Below is an example of how to quantize this model:

```python
import torch
from transformers import AutoModel, AutoTokenizer, AutoProcessor
from quark.torch import LLMTemplate, ModelQuantizer, export_safetensors
from datasets import load_dataset
from quark.contrib.llm_eval import ppl_eval

# Register DeepSeek-OCR template
deepseek_ocr_template = LLMTemplate(
    model_type="deepseek_vl_v2",
    kv_layers_name=["*k_proj", "*v_proj"],
    q_layer_name="*q_proj",
    exclude_layers_name=["lm_head", "model.sam_model*", "model.vision_model*", "model.projector*"],
)
LLMTemplate.register_template(deepseek_ocr_template)

# Configuration
ckpt_path = "amd/DeepSeek-OCR"
output_dir = "amd/DeepSeek-OCR-MXFP4"
quant_scheme = "mxfp4"
exclude_layers = ["*self_attn*", "*mlp.gate", "lm_head", "*mlp.gate_proj", "*mlp.up_proj",
                  "*mlp.down_proj", "*shared_experts.*", "*sam_model*", "*vision_model*", "*projector*"]

# Load model
model = AutoModel.from_pretrained(ckpt_path, use_safetensors=True, trust_remote_code=True,
                                  _attn_implementation='flash_attention_2', device_map="cuda:0",
                                  torch_dtype=torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(ckpt_path, trust_remote_code=True)

# Get quant config from template
template = LLMTemplate.get(model.config.model_type)
quant_config = template.get_config(scheme=quant_scheme, exclude_layers=exclude_layers)

# Quantize
quantizer = ModelQuantizer(quant_config)
model = quantizer.quantize_model(model)
model = quantizer.freeze(model)

# Export hf_format
export_safetensors(model, output_dir, custom_mode="quark")
tokenizer.save_pretrained(output_dir)
processor.save_pretrained(output_dir)

# Evaluate PPL (optional)
testdata = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
testenc = tokenizer("\n\n".join(testdata["text"]), return_tensors="pt")
ppl = ppl_eval(model, testenc, model.device)
print(f"Perplexity: {ppl.item()}")
```
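The `exclude_layers` entries are shell-style glob patterns matched against fully qualified module names. The sketch below illustrates this matching with Python's `fnmatch`; treating the patterns as `fnmatch`-style globs is an assumption about Quark's matcher, not a documented guarantee:

```python
from fnmatch import fnmatch

# Sample exclusion globs, in the same style as the quantization config above
patterns = ["*self_attn*", "*mlp.gate", "lm_head", "*sam_model*"]

def is_excluded(layer_name, patterns):
    """Return True if the layer name matches any exclusion glob."""
    return any(fnmatch(layer_name, p) for p in patterns)

print(is_excluded("model.layers.0.self_attn.q_proj", patterns))  # True
print(is_excluded("model.layers.0.mlp.gate", patterns))          # True (suffix match)
print(is_excluded("model.embed_tokens", patterns))               # False
```

Note that `*mlp.gate` matches only names ending in `mlp.gate`, which is why the config lists `*mlp.gate_proj` separately.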

For further details or issues, please refer to the [AMD-Quark](https://quark.docs.amd.com/latest/index.html) documentation or contact the respective developers.

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.