lvyufeng committed
Commit 80eafd7 · verified · 1 Parent(s): 95a3681

update pytorch usage and prompts

Files changed (1): README.md +61 -1
README.md CHANGED
@@ -2,6 +2,8 @@
 license: apache-2.0
 pipeline_tag: image-text-to-text
 tags:
+- pytorch
+- transformers
 - mindspore
 - mindnlp
 - ERNIE4.5
@@ -69,7 +71,7 @@ PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vi
 * ```2025.10.19``` 🚀 MindNLP supports [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR) — multilingual document parsing via a 0.9B ultra-compact vision-language model with SOTA performance.
 
 
-## Usage
+## MindSpore Usage
 
 ### Install Dependencies
 
@@ -116,6 +118,64 @@ decoded_output = processor.decode(
 print(decoded_output)
 ```
 
+### Prompts
+
+Besides OCR, PaddleOCR-VL also supports table recognition, chart recognition, and formula recognition.
+You can replace the prompt with one of the following:
+
+```python
+query = "OCR:"
+query = "Table Recognition:"
+query = "Chart Recognition:"
+query = "Formula Recognition:"
+```
+
+## PyTorch Usage
+
+You can also run PaddleOCR-VL with PyTorch.
+
+### Install Dependencies
+
+```bash
+pip install torch
+pip install transformers==4.57.1
+```
+
+
+### Basic Usage
+
+```python
+import torch
+from transformers import AutoModel, AutoProcessor, AutoTokenizer
+from transformers.image_utils import load_image
+
+
+model = AutoModel.from_pretrained("lvyufeng/PaddleOCR-VL-0.9B", trust_remote_code=True, dtype=torch.bfloat16, device_map='auto')
+tokenizer = AutoTokenizer.from_pretrained("lvyufeng/PaddleOCR-VL-0.9B")
+processor = AutoProcessor.from_pretrained("lvyufeng/PaddleOCR-VL-0.9B", trust_remote_code=True)
+
+image = load_image(
+    "https://hf-mirror.com/datasets/hf-internal-testing/fixtures_got_ocr/resolve/main/image_ocr.jpg"
+)
+
+query = 'OCR:'
+messages = [
+    {
+        "role": "user",
+        "content": query,
+    }
+]
+
+text = tokenizer.apply_chat_template(messages, tokenize=False)
+inputs = processor(image, text=text, return_tensors="pt", format=True).to('cuda')
+generate_ids = model.generate(**inputs, do_sample=False, num_beams=1, max_new_tokens=1024)
+print(generate_ids.shape)
+decoded_output = processor.decode(
+    generate_ids[0], skip_special_tokens=True
+)
+print(decoded_output)
+```
+
 ## Performance
 
 ### Page-Level Document Parsing
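
The task prompts added in this commit map one-to-one onto recognition modes. As a minimal sketch of how a caller might select between them (the `TASK_PROMPTS` dict and `task_prompt` helper below are illustrative only, not part of PaddleOCR-VL or transformers):

```python
# Illustrative mapping of task names to the prompt strings documented above.
# Only the prompt strings themselves come from the README; the helper is hypothetical.
TASK_PROMPTS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "chart": "Chart Recognition:",
    "formula": "Formula Recognition:",
}


def task_prompt(task: str) -> str:
    """Return the query string for a supported task, or raise for unknown ones."""
    try:
        return TASK_PROMPTS[task]
    except KeyError:
        raise ValueError(
            f"unsupported task {task!r}; choose one of {sorted(TASK_PROMPTS)}"
        ) from None
```

The returned string would be used as the `query` passed into `messages` in the Basic Usage snippet above.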