jiaxwang committed bd8a462 (verified) · Parent(s): 83d8f75

Update README.md

Files changed (1): README.md (+78 −2)
Removed in this commit:

# Disclaimer

This model is provided for experimental purposes only. Its accuracy, stability, and suitability for deployment are not guaranteed. Users are advised to independently evaluate the model before any practical or production use.

Current README.md:
license: mit
library_name: transformers
---
# Model Overview

- **Model Architecture:** DeepSeek-OCR
- **Input:** Image/Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.1.0
- **PyTorch:** 2.8.0
- **Transformers:** 4.57.3
- **Operating System(s):** Linux

# Model Details

The official release of DeepSeek-OCR pins transformers to 4.46.3 and has not been adapted to the latest version. This community edition therefore modifies the modeling.py module so that the model runs without a transformers downgrade. In addition, the quantization steps below also produce a perplexity value for the text-to-text generation part.

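Since the upstream release pins transformers to 4.46.3 while this edition is validated against 4.57.3, a quick environment check can save a failed model load. The helper below is a hypothetical sketch (not part of this repository's scripts) and only handles plain numeric release strings:

```python
# Hypothetical helper: warn when the installed transformers release is older
# than the one this community edition was validated against (4.57.3).
VALIDATED_TRANSFORMERS = (4, 57, 3)

def parse_version(v: str) -> tuple:
    """'4.46.3' -> (4, 46, 3); plain numeric release strings only."""
    return tuple(int(part) for part in v.split("."))

def needs_upgrade(installed: str) -> bool:
    """True if the installed version predates the validated one."""
    return parse_version(installed) < VALIDATED_TRANSFORMERS

print(needs_upgrade("4.46.3"))  # True  -- the upstream pin is too old
print(needs_upgrade("4.57.3"))  # False -- matches the validated release
```

In practice you would pass `transformers.__version__` to `needs_upgrade` before loading the checkpoint.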
# Model Quantization

**Quantization scripts:**

Before quantization, please install flash-attn as follows:

```
pip install flash-attn --no-build-isolation
```

Below is an example of how to quantize this model:

```python
import torch
from transformers import AutoModel, AutoTokenizer, AutoProcessor
from quark.torch import LLMTemplate, ModelQuantizer, export_safetensors
from datasets import load_dataset
from quark.contrib.llm_eval import ppl_eval

# Register the DeepSeek-OCR template
deepseek_ocr_template = LLMTemplate(
    model_type="deepseek_vl_v2",
    kv_layers_name=["*k_proj", "*v_proj"],
    q_layer_name="*q_proj",
    exclude_layers_name=["lm_head", "model.sam_model*", "model.vision_model*", "model.projector*"],
)
LLMTemplate.register_template(deepseek_ocr_template)

# Configuration
ckpt_path = "amd/DeepSeek-OCR"
output_dir = "amd/DeepSeek-OCR-MXFP4"
quant_scheme = "mxfp4"
exclude_layers = ["*self_attn*", "*mlp.gate", "lm_head", "*mlp.gate_proj", "*mlp.up_proj",
                  "*mlp.down_proj", "*shared_experts.*", "*sam_model*", "*vision_model*", "*projector*"]

# Load the model, tokenizer, and processor
model = AutoModel.from_pretrained(ckpt_path, use_safetensors=True, trust_remote_code=True,
                                  _attn_implementation='flash_attention_2', device_map="cuda:0",
                                  torch_dtype=torch.bfloat16)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(ckpt_path, trust_remote_code=True)

# Get the quantization config from the registered template
template = LLMTemplate.get(model.config.model_type)
quant_config = template.get_config(scheme=quant_scheme, exclude_layers=exclude_layers)

# Quantize
quantizer = ModelQuantizer(quant_config)
model = quantizer.quantize_model(model)
model = quantizer.freeze(model)

# Export in Hugging Face safetensors format
export_safetensors(model, output_dir, custom_mode="quark")
tokenizer.save_pretrained(output_dir)
processor.save_pretrained(output_dir)

# Evaluate PPL (optional)
testdata = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
testenc = tokenizer("\n\n".join(testdata["text"]), return_tensors="pt")
ppl = ppl_eval(model, testenc, model.device)
print(f"Perplexity: {ppl.item()}")
```
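The `exclude_layers` entries in the script above are glob-style wildcards. As an illustration only (assuming `fnmatch` semantics and made-up layer names; Quark's internal matcher may differ), this sketch shows which layers such patterns would keep out of quantization:

```python
# Illustration: which layer names the glob-style exclude patterns would match,
# assuming fnmatch semantics. Layer names below are assumptions for the demo.
from fnmatch import fnmatch

exclude_layers = ["*self_attn*", "*mlp.gate", "lm_head", "*mlp.gate_proj", "*mlp.up_proj",
                  "*mlp.down_proj", "*shared_experts.*", "*sam_model*", "*vision_model*", "*projector*"]

def is_excluded(layer_name: str) -> bool:
    """True if any exclude pattern matches the layer name."""
    return any(fnmatch(layer_name, pat) for pat in exclude_layers)

print(is_excluded("model.layers.0.self_attn.q_proj"))      # True: attention stays in bf16
print(is_excluded("model.sam_model.blocks.0.attn.qkv"))    # True: vision stack stays in bf16
print(is_excluded("model.layers.3.mlp.experts.0.up_proj")) # False: not matched, so quantized
```

Under these patterns, attention, dense MLP and shared-expert projections, and the vision stack would all be skipped, so quantization would fall mainly on the remaining projection layers.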
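The optional evaluation step reports perplexity, i.e. the exponential of the mean per-token negative log-likelihood. A minimal, dependency-free sketch of that formula (illustrative only; `ppl_eval` computes the log-likelihoods from real model logits):

```python
# Sketch of the perplexity formula: ppl = exp(mean NLL per token).
import math

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model assigning probability 1/2 to every token has perplexity ~2:
nlls = [math.log(2.0)] * 100
print(perplexity(nlls))  # ~2.0
```

Lower perplexity on the wikitext-2 test split indicates that the quantized model's text generation quality is closer to the unquantized baseline.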
For further details or issues, please refer to the [AMD-Quark](https://quark.docs.amd.com/latest/index.html) documentation or contact the respective developers.

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.