Update PyTorch usage and prompts
README.md
CHANGED
@@ -2,6 +2,8 @@
 license: apache-2.0
 pipeline_tag: image-text-to-text
 tags:
+- pytorch
+- transformers
 - mindspore
 - mindnlp
 - ERNIE4.5
@@ -69,7 +71,7 @@ PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vi
 * ```2025.10.19``` 🚀 MindNLP supports [PaddleOCR-VL](https://github.com/PaddlePaddle/PaddleOCR), a multilingual document parsing model built on a 0.9B Ultra-Compact Vision-Language Model with SOTA performance.


-## Usage
+## MindSpore Usage

 ### Install Dependencies

@@ -116,6 +118,64 @@ decoded_output = processor.decode(
 print(decoded_output)
 ```

+### Prompts
+
+Besides OCR, PaddleOCR-VL also supports table recognition, chart recognition, and formula recognition.
+To switch tasks, replace the prompt with one of the following:
+
+```python
+query = "OCR:"
+query = "Table Recognition:"
+query = "Chart Recognition:"
+query = "Formula Recognition:"
+```
+
+## PyTorch Usage
+
+You can also run PaddleOCR-VL with PyTorch.
+
+### Install Dependencies
+
+```bash
+pip install torch
+pip install transformers==4.57.1
+```
+
+
+### Basic Usage
+
+```python
+import torch
+from transformers import AutoModel, AutoProcessor, AutoTokenizer
+from transformers.image_utils import load_image
+
+
+model = AutoModel.from_pretrained("lvyufeng/PaddleOCR-VL-0.9B", trust_remote_code=True, dtype=torch.bfloat16, device_map='auto')
+tokenizer = AutoTokenizer.from_pretrained("lvyufeng/PaddleOCR-VL-0.9B")
+processor = AutoProcessor.from_pretrained("lvyufeng/PaddleOCR-VL-0.9B", trust_remote_code=True)
+
+image = load_image(
+    "https://hf-mirror.com/datasets/hf-internal-testing/fixtures_got_ocr/resolve/main/image_ocr.jpg"
+)
+
+query = 'OCR:'
+messages = [
+    {
+        "role": "user",
+        "content": query,
+    }
+]
+
+text = tokenizer.apply_chat_template(messages, tokenize=False)
+inputs = processor(image, text=text, return_tensors="pt", format=True).to('cuda')
+generate_ids = model.generate(**inputs, do_sample=False, num_beams=1, max_new_tokens=1024)
+print(generate_ids.shape)
+decoded_output = processor.decode(
+    generate_ids[0], skip_special_tokens=True
+)
+print(decoded_output)
+```
+
 ## Performance

 ### Page-Level Document Parsing
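The Prompts section added in this commit switches tasks by changing the `query` string. Both usage examples wrap that string in a single user message. The sketch below shows that selection step in isolation. `TASK_PROMPTS`, `build_messages`, and the short task keys (`"ocr"`, `"table"`, and so on) are hypothetical names introduced here. They are not part of the README, PaddleOCR-VL, or any library. Only the four prompt strings and the message shape come from the diff.

```python
# Hypothetical helper (not part of the PaddleOCR-VL README): maps an assumed
# short task key to one of the four prompt strings listed under "Prompts".
TASK_PROMPTS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "chart": "Chart Recognition:",
    "formula": "Formula Recognition:",
}


def build_messages(task: str) -> list[dict[str, str]]:
    """Return a single-turn message list for the given task key.

    The structure mirrors the `messages` list in the Basic Usage example:
    one user turn whose content is the task prompt.
    """
    try:
        query = TASK_PROMPTS[task]
    except KeyError:
        # Fail loudly on an unknown key instead of silently defaulting to OCR.
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_PROMPTS)}"
        ) from None
    return [{"role": "user", "content": query}]


if __name__ == "__main__":
    # Print the message list produced for each supported task key.
    for task in TASK_PROMPTS:
        print(task, build_messages(task))
```

The returned list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False)` in the same way as the Basic Usage example. Raising `ValueError` for an unrecognized key is a design choice of this sketch, so that a mistyped task name surfaces immediately rather than producing OCR output by accident.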