How to use aoiandroid/PP-DocLayoutV3 with PaddleOCR:
```python
# 1. See https://www.paddlepaddle.org.cn/en/install to install paddlepaddle
# 2. pip install paddleocr
from paddleocr import LayoutDetection

model = LayoutDetection(model_name="PP-DocLayoutV3")
output = model.predict(input="path/to/image.png", batch_size=1)
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")
    res.save_to_json(save_path="./output/res.json")
```
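Once the PaddleOCR snippet above has produced detections, a common next step is to drop low-confidence regions and summarize what was found. The sketch below is not part of the PaddleOCR API: it assumes each detection is available as a plain dict with `label`, `score`, and `coordinate` keys, which is an assumption about the result layout that should be checked against the saved `res.json`. The helper names and the sample data are hypothetical.

```python
from collections import Counter

def filter_boxes(boxes, min_score=0.5):
    """Keep detections whose score is at least min_score."""
    return [box for box in boxes if box["score"] >= min_score]

def count_labels(boxes):
    """Count how many detections carry each layout label."""
    return Counter(box["label"] for box in boxes)

# Hypothetical detections; the key names are assumed, not taken from the source.
sample_boxes = [
    {"label": "doc_title", "score": 0.97, "coordinate": [40, 30, 760, 90]},
    {"label": "text", "score": 0.91, "coordinate": [40, 120, 760, 400]},
    {"label": "text", "score": 0.32, "coordinate": [40, 420, 300, 450]},
    {"label": "table", "score": 0.88, "coordinate": [40, 470, 760, 700]},
]

kept = filter_boxes(sample_boxes, min_score=0.5)
summary = count_labels(kept)
```

The default of 0.5 mirrors the `draw_threshold` value in `inference.yml`; whether that threshold is the right cut-off for a given downstream task is a separate judgment call.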
Commit 3bb22ea (0 parents)
Duplicate from PaddlePaddle/PP-DocLayoutV3
Co-authored-by: Yue Zhang <xiaohei66@users.noreply.huggingface.co>
- .gitattributes +36 -0
- README.md +103 -0
- inference.json +0 -0
- inference.pdiparams +3 -0
- inference.yml +100 -0
.gitattributes
ADDED
@@ -0,0 +1,36 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+inference.pdiparams filter=lfs diff=lfs merge=lfs -text
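Each `.gitattributes` entry above routes files matching a pattern through Git LFS. As a rough illustration of which files in this commit that affects, the sketch below matches basenames against a handful of those patterns with Python's `fnmatch`. This is only an approximation of Git's own pattern rules (it ignores directory patterns such as `saved_model/**/*`), and `is_lfs_tracked` is a hypothetical helper, not a Git or Hugging Face API.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns from the .gitattributes diff; slash-free patterns
# are matched against the file's basename, as Git does for such patterns.
LFS_PATTERNS = [
    "*.bin",
    "*.onnx",
    "*.safetensors",
    "*.tar.*",
    "*tfevents*",
    "inference.pdiparams",
]

def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Return True if the basename of path matches any of the LFS patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)

# The five files touched by this commit.
commit_files = [
    ".gitattributes",
    "README.md",
    "inference.json",
    "inference.pdiparams",
    "inference.yml",
]
lfs_files = [name for name in commit_files if is_lfs_tracked(name)]
```

Under these assumptions only `inference.pdiparams` is routed through LFS, which agrees with the file list, where it appears as a three-line pointer rather than as binary content.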
README.md
ADDED
@@ -0,0 +1,103 @@
+---
+license: apache-2.0
+pipeline_tag: image-segmentation
+tags:
+- PaddleOCR
+- PaddlePaddle
+- image-segmentation
+- ocr
+- layout
+- layout_detection
+language:
+- en
+- zh
+- multilingual
+library_name: PaddleOCR
+---
+
+<div align="center">
+
+
+<h1 align="center">
+
+Layout Analysis Module of PaddleOCR-VL-1.5
+
+</h1>
+
+[](https://github.com/PaddlePaddle/PaddleOCR)
+[](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3)
+[](https://modelscope.cn/models/PaddlePaddle/PP-DocLayoutV3)
+[](https://huggingface.co/spaces/PaddlePaddle/PaddleOCR-VL-1.5_Online_Demo)
+[](https://modelscope.cn/studios/PaddlePaddle/PaddleOCR-VL-1.5_Online_Demo/summary)
+[](https://discord.gg/JPmZXDsEEK)
+[](https://x.com/PaddlePaddle)
+[](./LICENSE)
+
+**🔥 [Official Website](https://www.paddleocr.com)** |
+**📝 [Technical Report](https://arxiv.org/abs/2601.21957)**
+
+</div>
+
+
+
+
+## Introduction
+
+These are the PP-DocLayoutV3 model weights for the PaddlePaddle framework. The safetensors weights are available at [PP-DocLayoutV3_safetensors](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors).
+
+**PP-DocLayoutV3 is specifically engineered to handle non-planar document images. It directly predicts multi-point bounding boxes for layout elements (as opposed to standard two-point boxes) and determines the logical reading order for skewed and curved surfaces within a single forward pass, significantly reducing cascading errors.** The model is an essential component of PaddleOCR-VL-1.5, where it provides the layout analysis needed for high-precision parsing of a wide range of real-world documents.
+
+
+### **Model Architecture**
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl_1_5/PP-DocLayoutV3.png" width="800"/>
+</div>
+
+
+## Visualization
+
+
+### Light Variation
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl_1_5/layout_lighting.jpg" width="800"/>
+</div>
+
+
+### Skewing
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl_1_5/layout_skew.jpg" width="800"/>
+</div>
+
+
+### Screen-photo
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl_1_5/layout_screen.jpg" width="800"/>
+</div>
+
+
+### Curving
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/paddleocr_vl_1_5/layout_curv.jpg" width="800"/>
+</div>
+
+
+## Citation
+
+If you find PP-DocLayoutV3 helpful, please consider giving us a star and citing our work.
+
+```bibtex
+@misc{cui2026paddleocrvl15multitask09bvlm,
+      title={PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing},
+      author={Cheng Cui and Ting Sun and Suyin Liang and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Xueqing Wang and Changda Zhou and Hongen Liu and Manhui Lin and Yue Zhang and Yubo Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
+      year={2026},
+      eprint={2601.21957},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2601.21957},
+}
+```
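The README's introduction contrasts multi-point bounding boxes with standard two-point boxes and mentions a predicted reading order. The sketch below illustrates why a multi-point outline can be tighter on a skewed region and how an order index could be used to sort regions. The `LayoutRegion` type, its field names, and the sample coordinates are all hypothetical; they are not the model's actual output format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class LayoutRegion:
    """Hypothetical container for one detected layout element."""
    label: str
    polygon: List[Point]  # multi-point outline, vertices in order
    order: int            # assumed reading-order index, lower reads first

def two_point_box(polygon):
    """Collapse a polygon into an axis-aligned (x_min, y_min, x_max, y_max) box."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))

def polygon_area(polygon):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def in_reading_order(regions):
    """Sort regions by their order index."""
    return sorted(regions, key=lambda region: region.order)

# A made-up skewed paragraph and a title that should be read first.
skewed_text = LayoutRegion("text", [(10, 20), (110, 40), (105, 90), (5, 70)], order=1)
title = LayoutRegion("doc_title", [(10, 0), (110, 0), (110, 15), (10, 15)], order=0)

box = two_point_box(skewed_text.polygon)
box_area = (box[2] - box[0]) * (box[3] - box[1])
poly_area = polygon_area(skewed_text.polygon)
ordered_labels = [region.label for region in in_reading_order([skewed_text, title])]
```

For this made-up quadrilateral the axis-aligned box is noticeably larger than the polygon it encloses, so a downstream crop based on the box would pull in neighbouring content that the polygon excludes.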
inference.json
ADDED
The diff for this file is too large to render.
inference.pdiparams
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:70bd316b0582769ec968829fd1feb1a6a58b7c941b938327e551b6b12b45c137
+size 130806572
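The three lines committed for `inference.pdiparams` are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer and checking a downloaded file against it follows; the function names are hypothetical, and the local path passed to `file_matches_pointer` would be wherever the weights were actually downloaded.

```python
import hashlib
import os

# The pointer text exactly as it appears in the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:70bd316b0582769ec968829fd1feb1a6a58b7c941b938327e551b6b12b45c137
size 130806572
"""

def parse_lfs_pointer(text):
    """Split 'key value' lines into a dict with a typed size and split oid."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def file_matches_pointer(path, pointer):
    """Check a local file's size and hash against a parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["hash_algo"])
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size"] / 2**20
```

Comparing the size first is a cheap early exit; the hash is only computed when the byte count already matches.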
inference.yml
ADDED
@@ -0,0 +1,100 @@
+mode: paddle
+draw_threshold: 0.5
+metric: COCO
+use_dynamic_shape: false
+Global:
+  model_name: PP-DocLayoutV3
+arch: DETR
+min_subgraph_size: 3
+Preprocess:
+- interp: 2
+  keep_ratio: false
+  target_size:
+  - 800
+  - 800
+  type: Resize
+- mean:
+  - 0.0
+  - 0.0
+  - 0.0
+  norm_type: none
+  std:
+  - 1.0
+  - 1.0
+  - 1.0
+  type: NormalizeImage
+- type: Permute
+label_list:
+- abstract
+- algorithm
+- aside_text
+- chart
+- content
+- display_formula
+- doc_title
+- figure_title
+- footer
+- footer_image
+- footnote
+- formula_number
+- header
+- header_image
+- image
+- inline_formula
+- number
+- paragraph_title
+- reference
+- reference_content
+- seal
+- table
+- text
+- vertical_text
+- vision_footnote
+Hpi:
+  backend_configs:
+    paddle_infer:
+      trt_dynamic_shapes: &id001
+        image:
+        - - 1
+          - 3
+          - 800
+          - 800
+        - - 1
+          - 3
+          - 800
+          - 800
+        - - 8
+          - 3
+          - 800
+          - 800
+        scale_factor:
+        - - 1
+          - 2
+        - - 1
+          - 2
+        - - 8
+          - 2
+      trt_dynamic_shape_input_data:
+        scale_factor:
+        - - 2
+          - 2
+        - - 1
+          - 1
+        - - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+          - 0.67
+    tensorrt:
+      dynamic_shapes: *id001
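The `inference.yml` above fixes the input to 800x800 without preserving aspect ratio, lists 25 layout labels, and declares TensorRT shape ranges with a batch dimension from 1 to 8. The sketch below restates those values in plain Python to show how they relate. Several points are assumptions rather than facts from the source: that the three shape entries are ordered minimum, optimal, maximum; that `scale_factor` is the pair (height ratio, width ratio); and that class IDs index `label_list` in order. All helper names are hypothetical.

```python
TARGET_SIZE = (800, 800)  # (height, width) from the Resize step

LABEL_LIST = [
    "abstract", "algorithm", "aside_text", "chart", "content",
    "display_formula", "doc_title", "figure_title", "footer", "footer_image",
    "footnote", "formula_number", "header", "header_image", "image",
    "inline_formula", "number", "paragraph_title", "reference",
    "reference_content", "seal", "table", "text", "vertical_text",
    "vision_footnote",
]

# Assumed ordering of the three trt_dynamic_shapes entries: min, opt, max.
IMAGE_SHAPES = {
    "min": (1, 3, 800, 800),
    "opt": (1, 3, 800, 800),
    "max": (8, 3, 800, 800),
}

def scale_factor(orig_height, orig_width, target=TARGET_SIZE):
    """Per-axis resize ratios when keep_ratio is false (assumed h, w order)."""
    return (target[0] / orig_height, target[1] / orig_width)

def shape_in_range(shape, shapes=IMAGE_SHAPES):
    """True if every dimension lies between the min and max shape."""
    return all(
        low <= dim <= high
        for low, dim, high in zip(shapes["min"], shape, shapes["max"])
    )

def label_for(class_id, labels=LABEL_LIST):
    """Map a class index to its label, assuming IDs follow list order."""
    return labels[class_id]

# A 1600x400 portrait page is squashed vertically and stretched horizontally.
portrait_scale = scale_factor(1600, 400)
```

Because `keep_ratio` is false, the two ratios generally differ, which is why predicted coordinates need both values to be mapped back to the original image.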