dadadasadad Tingquan committed on
Commit
5b4115f
·
0 Parent(s):

Duplicate from PaddlePaddle/PP-DocLayoutV2

Co-authored-by: Tingquan Gao <Tingquan@users.noreply.huggingface.co>

Files changed (6)
  1. .gitattributes +37 -0
  2. README.md +79 -0
  3. config.json +176 -0
  4. inference.json +0 -0
  5. inference.pdiparams +3 -0
  6. inference.yml +100 -0
.gitattributes ADDED
@@ -0,0 +1,37 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ inference.pdiparams filter=lfs diff=lfs merge=lfs -text
+ inference.pdmodel filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,79 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: object-detection
+ tags:
+ - PaddleOCR
+ - PaddlePaddle
+ - ocr
+ - layout
+ - layout_detection
+ language:
+ - en
+ - zh
+ - multilingual
+ library_name: PaddleOCR
+ ---
+
+ ## Introduction
+
+ **PP-DocLayoutV2** is a dedicated lightweight model for layout analysis, focusing specifically on element detection, classification, and reading order
+ prediction.
+
+
+ ## **Model Architecture**
+
+ PP-DocLayoutV2 is composed of two sequentially connected networks. The first is an RT-DETR-based detection model that performs layout element detection and classification. The detected bounding boxes and class labels are then passed to a subsequent pointer network, which is responsible for ordering these layout elements.
+
+ <div align="center">
+ <img src="https://huggingface.co/datasets/PaddlePaddle/PaddleOCR-VL_demo/resolve/main/imgs/PP-DocLayoutV2.png" width="800"/>
+ </div>
+
+
+ ## Usage
+
+ ### Install Dependencies
+
+ Install [PaddlePaddle](https://www.paddlepaddle.org.cn/install/quick) and [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR):
+
+ ```bash
+ python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+ python -m pip install -U "paddleocr[doc-parser]"
+ python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
+ ```
+
+ > For Windows users, please use WSL or a Docker container.
+
+
+ ### Basic Usage
+
+ Python API usage:
+
+ ```python
+ from paddleocr import LayoutDetection
+
+ model = LayoutDetection(model_name="PP-DocLayoutV2")
+ output = model.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/layout.jpg", batch_size=1, layout_nms=True)
+ for res in output:
+     res.print()
+     res.save_to_img(save_path="./output/")
+     res.save_to_json(save_path="./output/res.json")
+ ```
+
+ **For more usage details and parameter explanations, see the [documentation](https://www.paddleocr.ai/latest/en/version3.x/module_usage/layout_analysis.html).**
+
+
+ ## Citation
+
+ If you find PaddleOCR-VL helpful, feel free to give us a star and citation.
+
+ ```bibtex
+ @misc{cui2025paddleocrvlboostingmultilingualdocument,
+       title={PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model},
+       author={Cheng Cui and Ting Sun and Suyin Liang and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Xueqing Wang and Changda Zhou and Hongen Liu and Manhui Lin and Yue Zhang and Yubo Zhang and Handong Zheng and Jing Zhang and Jun Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
+       year={2025},
+       eprint={2510.14528},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2510.14528},
+ }
+ ```
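Reviewer note: the Basic Usage snippet in README.md writes detections to `./output/res.json`. The sketch below shows one way such detections might be filtered and grouped afterwards. It assumes each detection is a dict with `cls_id`, `label`, `score`, and `coordinate` keys; that schema is an assumption about the saved output, not something this commit states. The sample boxes are invented for illustration, and the default cutoff simply reuses the `draw_threshold` value of 0.5 from config.json.

```python
from collections import defaultdict


def group_detections(boxes, min_score=0.5):
    """Keep boxes scoring at least min_score and group their coordinates by label."""
    grouped = defaultdict(list)
    for box in boxes:
        if box["score"] >= min_score:
            grouped[box["label"]].append(box["coordinate"])
    return dict(grouped)


# Hypothetical detections, invented for illustration only.
# cls_id values follow the label_list order in config.json (21 = table, 22 = text).
sample_boxes = [
    {"cls_id": 22, "label": "text", "score": 0.97, "coordinate": [34.0, 120.0, 560.0, 310.0]},
    {"cls_id": 21, "label": "table", "score": 0.91, "coordinate": [40.0, 330.0, 555.0, 600.0]},
    {"cls_id": 22, "label": "text", "score": 0.32, "coordinate": [10.0, 10.0, 50.0, 20.0]},
]

for label, coords in group_detections(sample_boxes).items():
    print(label, len(coords))
```

Grouping by label keeps downstream handling (for example, sending tables and text to different recognizers) separate from the detection call itself.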
config.json ADDED
@@ -0,0 +1,176 @@
+ {
+   "mode": "paddle",
+   "draw_threshold": 0.5,
+   "metric": "COCO",
+   "use_dynamic_shape": false,
+   "Global": {
+     "model_name": "PP-DocLayoutV2"
+   },
+   "arch": "DETR",
+   "min_subgraph_size": 3,
+   "Preprocess": [
+     {
+       "interp": 2,
+       "keep_ratio": false,
+       "target_size": [
+         800,
+         800
+       ],
+       "type": "Resize"
+     },
+     {
+       "mean": [
+         0.0,
+         0.0,
+         0.0
+       ],
+       "norm_type": "none",
+       "std": [
+         1.0,
+         1.0,
+         1.0
+       ],
+       "type": "NormalizeImage"
+     },
+     {
+       "type": "Permute"
+     }
+   ],
+   "label_list": [
+     "abstract",
+     "algorithm",
+     "aside_text",
+     "chart",
+     "content",
+     "display_formula",
+     "doc_title",
+     "figure_title",
+     "footer",
+     "footer_image",
+     "footnote",
+     "formula_number",
+     "header",
+     "header_image",
+     "image",
+     "inline_formula",
+     "number",
+     "paragraph_title",
+     "reference",
+     "reference_content",
+     "seal",
+     "table",
+     "text",
+     "vertical_text",
+     "vision_footnote"
+   ],
+   "Hpi": {
+     "backend_configs": {
+       "paddle_infer": {
+         "trt_dynamic_shapes": {
+           "image": [
+             [
+               1,
+               3,
+               800,
+               800
+             ],
+             [
+               1,
+               3,
+               800,
+               800
+             ],
+             [
+               8,
+               3,
+               800,
+               800
+             ]
+           ],
+           "scale_factor": [
+             [
+               1,
+               2
+             ],
+             [
+               1,
+               2
+             ],
+             [
+               8,
+               2
+             ]
+           ]
+         },
+         "trt_dynamic_shape_input_data": {
+           "scale_factor": [
+             [
+               2,
+               2
+             ],
+             [
+               1,
+               1
+             ],
+             [
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67,
+               0.67
+             ]
+           ]
+         }
+       },
+       "tensorrt": {
+         "dynamic_shapes": {
+           "image": [
+             [
+               1,
+               3,
+               800,
+               800
+             ],
+             [
+               1,
+               3,
+               800,
+               800
+             ],
+             [
+               8,
+               3,
+               800,
+               800
+             ]
+           ],
+           "scale_factor": [
+             [
+               1,
+               2
+             ],
+             [
+               1,
+               2
+             ],
+             [
+               8,
+               2
+             ]
+           ]
+         }
+       }
+     }
+   }
+ }
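Reviewer note: the `Preprocess` list in config.json describes three steps: a fixed 800x800 resize with `keep_ratio` disabled, a normalization that is effectively a no-op (zero mean, unit std, `norm_type` of `none`), and a `Permute`. The sketch below traces only the tensor shape and the resulting scale factor through those steps; it does not touch pixel data. The function `plan_preprocess` is a hypothetical helper, and two conventions are assumed rather than confirmed by this commit: `target_size` is ordered `[height, width]`, and `Permute` converts HWC to CHW, with the scale factor reported as `(scale_y, scale_x)`.

```python
import json

# Preprocess section copied from config.json in this commit.
PREPROCESS = json.loads("""
[
  {"interp": 2, "keep_ratio": false, "target_size": [800, 800], "type": "Resize"},
  {"mean": [0.0, 0.0, 0.0], "norm_type": "none", "std": [1.0, 1.0, 1.0], "type": "NormalizeImage"},
  {"type": "Permute"}
]
""")


def plan_preprocess(ops, height, width, channels=3):
    """Trace the tensor shape and (scale_y, scale_x) through the configured ops.

    Hypothetical helper: it models shapes only and never loads an image.
    """
    shape = (height, width, channels)  # HWC layout is assumed for the decoded image
    scale_factor = (1.0, 1.0)
    for op in ops:
        kind = op["type"]
        if kind == "Resize":
            if op["keep_ratio"]:
                raise NotImplementedError("this sketch only covers keep_ratio=false")
            target_h, target_w = op["target_size"]  # assumed [height, width] order
            scale_factor = (target_h / shape[0], target_w / shape[1])
            shape = (target_h, target_w, shape[2])
        elif kind == "NormalizeImage":
            continue  # normalization changes values, not the shape
        elif kind == "Permute":
            shape = (shape[2], shape[0], shape[1])  # assumed HWC -> CHW
        else:
            raise ValueError(f"unsupported op: {kind}")
    return shape, scale_factor


print(plan_preprocess(PREPROCESS, height=1600, width=400))
```

Under these assumptions every input ends up as a 3x800x800 tensor, which matches the per-image `image` shapes listed under `trt_dynamic_shapes`.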
inference.json ADDED
The diff for this file is too large to render. See raw diff
 
inference.pdiparams ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45404a84c9fdf91d7bbc94bd47ac4c03649bda84167de04c62bff4726657869a
+ size 212170944
inference.yml ADDED
@@ -0,0 +1,100 @@
+ mode: paddle
+ draw_threshold: 0.5
+ metric: COCO
+ use_dynamic_shape: false
+ Global:
+   model_name: PP-DocLayoutV2
+ arch: DETR
+ min_subgraph_size: 3
+ Preprocess:
+ - interp: 2
+   keep_ratio: false
+   target_size:
+   - 800
+   - 800
+   type: Resize
+ - mean:
+   - 0.0
+   - 0.0
+   - 0.0
+   norm_type: none
+   std:
+   - 1.0
+   - 1.0
+   - 1.0
+   type: NormalizeImage
+ - type: Permute
+ label_list:
+ - abstract
+ - algorithm
+ - aside_text
+ - chart
+ - content
+ - display_formula
+ - doc_title
+ - figure_title
+ - footer
+ - footer_image
+ - footnote
+ - formula_number
+ - header
+ - header_image
+ - image
+ - inline_formula
+ - number
+ - paragraph_title
+ - reference
+ - reference_content
+ - seal
+ - table
+ - text
+ - vertical_text
+ - vision_footnote
+ Hpi:
+   backend_configs:
+     paddle_infer:
+       trt_dynamic_shapes: &id001
+         image:
+         - - 1
+           - 3
+           - 800
+           - 800
+         - - 1
+           - 3
+           - 800
+           - 800
+         - - 8
+           - 3
+           - 800
+           - 800
+         scale_factor:
+         - - 1
+           - 2
+         - - 1
+           - 2
+         - - 8
+           - 2
+       trt_dynamic_shape_input_data:
+         scale_factor:
+         - - 2
+           - 2
+         - - 1
+           - 1
+         - - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+           - 0.67
+     tensorrt:
+       dynamic_shapes: *id001
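Reviewer note: in inference.yml the `*id001` alias makes `tensorrt.dynamic_shapes` reuse the mapping anchored at `paddle_infer.trt_dynamic_shapes`, which is why config.json spells out the same values twice. Each input lists three shapes. The sketch below assumes these are ordered `[min, opt, max]`, which is the usual TensorRT optimization-profile layout but is not stated in this commit, and checks that every axis satisfies min <= opt <= max. `check_dynamic_shapes` is a hypothetical helper written for this note.

```python
def check_dynamic_shapes(shapes):
    """Return a list of problems found in a {input_name: [min, opt, max]} mapping."""
    problems = []
    for name, triple in shapes.items():
        if len(triple) != 3:
            problems.append(f"{name}: expected 3 shapes, got {len(triple)}")
            continue
        lo, opt, hi = triple
        if not (len(lo) == len(opt) == len(hi)):
            problems.append(f"{name}: min/opt/max ranks differ")
            continue
        for axis, (a, b, c) in enumerate(zip(lo, opt, hi)):
            if not a <= b <= c:
                problems.append(f"{name}: axis {axis} violates min <= opt <= max ({a}, {b}, {c})")
    return problems


# Values copied from trt_dynamic_shapes in this commit.
TRT_DYNAMIC_SHAPES = {
    "image": [[1, 3, 800, 800], [1, 3, 800, 800], [8, 3, 800, 800]],
    "scale_factor": [[1, 2], [1, 2], [8, 2]],
}

print(check_dynamic_shapes(TRT_DYNAMIC_SHAPES))
```

Read this way, only the batch axis varies (1 to 8) while the spatial size stays fixed at 800x800.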