jiaxwang committed on
Commit eb8d4d0 · verified · 1 Parent(s): 671e91e

Update README.md

Files changed (1): README.md (+28 −52)
README.md CHANGED
@@ -15,76 +15,52 @@ library_name: transformers
  - **Model Architecture:** DeepSeek-OCR
  - **Input:** Image/Text
  - **Output:** Text
- - **Supported Hardware Microarchitecture:** AMD MI350/MI355
  - **ROCm**: 7.1.0
  - **PyTorch**: 2.8.0
  - **Transformers**: 4.57.3
  - **Operating System(s):** Linux

  # Model Details
- The official version of DeepSeek-OCR restricts the transformers library to version 4.46.3 and has not been updated to support the latest release. As a result, this community edition has adjusted the modeling.py module to improve usability, eliminating the need to downgrade transformers. Furthermore, by following the steps below, you can quantize the model and obtain the perplexity value for the text-to-text generation component.

- # Model Quantization
- **Quantization scripts:**
-
- Before quantization, please install flash-attn in the following way:
- ```
- pip install flash-attn --no-build-isolation
  ```
- Below is an example of how to quantize this model:

  ```python
  import torch
- from transformers import AutoModel, AutoTokenizer, AutoProcessor
- from quark.torch import LLMTemplate, ModelQuantizer, export_safetensors
- from datasets import load_dataset
- from quark.contrib.llm_eval import ppl_eval
-
- # Register a template describing DeepSeek-OCR's layer layout for Quark
- deepseek_ocr_template = LLMTemplate(
-     model_type="deepseek_vl_v2",
-     kv_layers_name=["*k_proj", "*v_proj"],
-     q_layer_name="*q_proj",
-     exclude_layers_name=["lm_head", "model.sam_model*", "model.vision_model*", "model.projector*"],
- )
- LLMTemplate.register_template(deepseek_ocr_template)
-
- # Configuration
- ckpt_path = "amd/DeepSeek-OCR"
- output_dir = "amd/DeepSeek-OCR-MXFP4"
- quant_scheme = "mxfp4"
- exclude_layers = ["*self_attn*", "*mlp.gate", "lm_head", "*mlp.gate_proj", "*mlp.up_proj",
-                   "*mlp.down_proj", "*shared_experts.*", "*sam_model*", "*vision_model*", "*projector*"]
-
- # Load model
- model = AutoModel.from_pretrained(ckpt_path, use_safetensors=True, trust_remote_code=True,
-                                   _attn_implementation='flash_attention_2', device_map="cuda:0", torch_dtype=torch.bfloat16)
- model.eval()
- tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
- processor = AutoProcessor.from_pretrained(ckpt_path, trust_remote_code=True)
-
- # Get the quantization config from the registered template
- template = LLMTemplate.get(model.config.model_type)
- quant_config = template.get_config(scheme=quant_scheme, exclude_layers=exclude_layers)
-
- # Quantize and freeze the model
- quantizer = ModelQuantizer(quant_config)
- model = quantizer.quantize_model(model)
- model = quantizer.freeze(model)
-
- # Export safetensors in HF format
- export_safetensors(model, output_dir, custom_mode="quark")
- tokenizer.save_pretrained(output_dir)
- processor.save_pretrained(output_dir)
-
- # Evaluate perplexity on wikitext-2 (optional)
- testdata = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
- testenc = tokenizer("\n\n".join(testdata["text"]), return_tensors="pt")
- ppl = ppl_eval(model, testenc, model.device)
- print(f"Perplexity: {ppl.item()}")
  ```
- For further details or issues, please refer to the [AMD-Quark](https://quark.docs.amd.com/latest/index.html) documentation or contact the respective developers.

  # License
  Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.
 
  - **Model Architecture:** DeepSeek-OCR
  - **Input:** Image/Text
  - **Output:** Text
+ - **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355
  - **ROCm**: 7.1.0
  - **PyTorch**: 2.8.0
  - **Transformers**: 4.57.3
  - **Operating System(s):** Linux

  # Model Details
+ The official version of [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR) pins the transformers library to version 4.46.3 and has not been updated to support the latest release. In this community edition, the modeling_deepseekocr.py file has been updated for improved usability, and modeling_deepseekv2.py has been removed in favor of the DeepSeekV2 model definitions provided by the transformers library, eliminating the need to downgrade transformers. Furthermore, using [AMD-Quark](https://quark.docs.amd.com/latest/index.html) and the commands below, you can obtain the perplexity value for the text-to-text generation component.
+ ```bash
+ cd Quark/examples/torch/language_modeling/llm_ptq
+
+ python3 quantize_quark.py \
+     --model_dir amd/DeepSeek-OCR \
+     --skip_quantization
  ```
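For reference, the perplexity that this flow reports is the exponential of the mean per-token negative log-likelihood. Below is a minimal, framework-free sketch of that computation; the per-token probabilities are hypothetical illustration values, not outputs of the model:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood over tokens)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token probabilities assigned by a language model.
probs = [0.5, 0.25, 0.125]
ppl = perplexity([math.log(p) for p in probs])
print(round(ppl, 6))  # 4.0: the inverse geometric mean of the probabilities
```

Lower perplexity means the model assigns higher probability to the reference text, which is what the wikitext-2 evaluation measures for the text-to-text component.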
 
+ A quantized version based on this model is available at [amd/DeepSeek-OCR-MXFP4](https://huggingface.co/amd/DeepSeek-OCR-MXFP4).
+
+ # Usage
  ```python
+ from transformers import AutoModel, AutoTokenizer
  import torch
+ import os
+
+ os.environ["HIP_VISIBLE_DEVICES"] = '0'
+ model_name = 'amd/DeepSeek-OCR'
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+ model = AutoModel.from_pretrained(model_name, _attn_implementation='flash_attention_2', trust_remote_code=True, use_safetensors=True)
+ model = model.eval().cuda().to(torch.bfloat16)
+
+ # prompt = "<image>\nFree OCR. "
+ prompt = "<image>\n<|grounding|>Convert the document to markdown. "
+ image_file = 'your_image.jpg'
+ output_path = 'your/output/dir'
+
+ # Signature: infer(tokenizer, prompt='', image_file='', output_path=' ', base_size=1024, image_size=640, crop_mode=True, test_compress=False, save_results=False)
+
+ # Resolution modes:
+ # Tiny:   base_size = 512,  image_size = 512,  crop_mode = False
+ # Small:  base_size = 640,  image_size = 640,  crop_mode = False
+ # Base:   base_size = 1024, image_size = 1024, crop_mode = False
+ # Large:  base_size = 1280, image_size = 1280, crop_mode = False
+ # Gundam: base_size = 1024, image_size = 640,  crop_mode = True
+
+ res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, base_size=1024, image_size=640, crop_mode=True, save_results=True, test_compress=True)
  ```
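The resolution modes listed in the comments above can be collected into a small lookup so the choice becomes a single argument; `PRESETS` and `infer_kwargs` are hypothetical helper names, with the values copied from the model card comments:

```python
# Resolution presets from the model card: mode -> base_size, image_size, crop_mode.
PRESETS = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def infer_kwargs(mode):
    """Return a fresh kwargs dict for model.infer() for a named preset."""
    return dict(PRESETS[mode])

print(infer_kwargs("gundam"))  # {'base_size': 1024, 'image_size': 640, 'crop_mode': True}
```

With such a helper, the call above could be written as `model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path, save_results=True, test_compress=True, **infer_kwargs("gundam"))`.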
 
  # License
  Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.