---
license: apache-2.0
tags:
  - paddleocr-vl
  - pp-doclayoutv3
  - onnxruntime
  - document-layout-analysis
  - rocm
  - vllm
  - llama-cpp
pipeline_tag: object-detection
library_name: onnxruntime
---

# PP-DocLayoutV3 ONNX for PaddleOCR-VL-ROCm

This repository hosts the verified `PP-DocLayoutV3` ONNX layout model used by the open-source project [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm).

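## Overview

This repository provides the verified `PP-DocLayoutV3-onnx` model files for direct download by [PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm).

Users do not need to install Paddle or Paddle2ONNX, and do not need to export ONNX from the Paddle model themselves. After cloning the open-source project, running the download script is enough to prepare the layout model.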

## Files

- `inference.onnx`: PP-DocLayoutV3 ONNX layout detection model.
- `inference.yml`: model configuration used by the ONNXRuntime pipeline.

Verified checksums:

| File | SHA256 |
|---|---|
| `inference.onnx` | `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61` |
| `inference.yml` | `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC` |
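
To check downloaded files against the digests in the table, a minimal sketch using only the Python standard library could look like the following. The helper names `sha256_of` and `verify_model_dir` are illustrative and not part of `PaddleOCR-VL-ROCm`; the directory path assumes the layout produced by the project's download script.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the checksum table in this model card.
EXPECTED_SHA256 = {
    "inference.onnx": "BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61",
    "inference.yml": "506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir, expected=EXPECTED_SHA256):
    """Map each expected file name to True if present with a matching digest."""
    results = {}
    for name, want in expected.items():
        path = Path(model_dir) / name
        # hexdigest() is lowercase and the table is uppercase, so normalize case.
        results[name] = path.is_file() and sha256_of(path) == want.lower()
    return results


if __name__ == "__main__":
    # Assumed location: the directory prepared by the download script.
    for name, ok in verify_model_dir("models/PP-DocLayoutV3-onnx").items():
        print(f"{name}: {'OK' if ok else 'MISMATCH or missing'}")
```

A file that is missing is reported the same way as a file with the wrong digest, which keeps the check usable as a single pre-flight step.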

## Open-Source Project

Recommended runtime project:

[https://github.com/AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)

`PaddleOCR-VL-ROCm` is a lightweight No-Paddle inference implementation for PaddleOCR-VL-style document parsing:

- Layout detection runs with ONNXRuntime and this `PP-DocLayoutV3-onnx` model.
- Visual language recognition is served by an OpenAI-compatible ROCm endpoint, such as vLLM or llama.cpp server.
- The project exposes both CLI and Python APIs.
- Outputs are saved as PaddleOCR-VL-style JSON and Markdown.
- The code repository is open source and uses the MIT license.

## Why This Helps Users

This model repository removes the most error-prone setup step: exporting the layout model from Paddle to ONNX.

Before this model repository, users often had to:

1. Install Paddle or PaddleX dependencies.
2. Install and configure Paddle2ONNX.
3. Export PP-DocLayoutV3 by themselves.
4. Debug model file names, model config files, and ONNXRuntime input compatibility.

With this repository, users can directly download the verified ONNX model used by `PaddleOCR-VL-ROCm`:

```powershell
pip install -e .[download]
python scripts/download_ppdoclayoutv3_onnx.py
```

The script downloads from this Hugging Face repository by default and prepares:

```text
models/PP-DocLayoutV3-onnx/
  inference.onnx
  inference.yml
```

This gives users a simpler path:

- No PaddlePaddle runtime is required for inference.
- No Paddle2ONNX conversion is required.
- No large model files are stored in the GitHub repo.
- The same verified model artifact is shared by all users.
- The GitHub repo stays small, clean, and easy to clone.
- ROCm acceleration can be handled by the VLM server while layout remains portable through ONNXRuntime.

## Validation Result

The ONNXRuntime layout path used by `PaddleOCR-VL-ROCm` has been validated against the Paddle native pipeline on 1355 images.

| Item | Result |
|---|---:|
| Full-run success | 1355 / 1355 |
| Payload alignment | 1355 / 1355 |
| Layout, crop, request order, request payload | Strictly aligned |

This means the open-source runtime can use this ONNX layout model as a practical replacement for the Paddle layout stage in the validated inference path.
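
This card does not describe the comparison harness itself. As an illustration only, one way to read "strictly aligned" is that, for each image, both pipelines produce the same ordered sequence of records with identical content. The sketch below assumes that reading; the record fields (`label`, `bbox`) and the function names are hypothetical and are not taken from the project.

```python
def first_mismatch(reference, candidate):
    """Return the index of the first differing record, or None if strictly aligned."""
    for index, (ref, cand) in enumerate(zip(reference, candidate)):
        if ref != cand:
            return index
    if len(reference) != len(candidate):
        # One sequence is a strict prefix of the other.
        return min(len(reference), len(candidate))
    return None


def alignment_summary(runs):
    """Count aligned images; runs maps image name -> (reference, candidate)."""
    aligned = sum(
        1 for ref, cand in runs.values() if first_mismatch(ref, cand) is None
    )
    return aligned, len(runs)


if __name__ == "__main__":
    # Hypothetical records for two images; real payloads will differ.
    box = {"label": "text", "bbox": [10, 20, 110, 60]}
    runs = {
        "page_1.png": ([box], [dict(box)]),
        "page_2.png": ([box], [{"label": "title", "bbox": [10, 20, 110, 60]}]),
    }
    aligned, total = alignment_summary(runs)
    print(f"{aligned} / {total} images strictly aligned")
```

Because the comparison covers order as well as content, two runs that emit the same regions in a different sequence count as misaligned under this reading.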

## Quick Start With PaddleOCR-VL-ROCm

```powershell
git clone https://github.com/AIwork4me/PaddleOCR-VL-ROCm.git
cd PaddleOCR-VL-ROCm
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[download]
python scripts/download_ppdoclayoutv3_onnx.py
```
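
Before running inference, it can help to confirm that the VLM endpoint is reachable. The sketch below assumes the server implements the OpenAI-style model listing route at `<base URL>/models`, which OpenAI-compatible servers generally provide; the helper names are illustrative and not part of the project. The base URL is the one used in the command in this section.

```python
import json
import urllib.request


def models_url(server_url):
    """Build the model-listing URL for an OpenAI-compatible base URL."""
    return server_url.rstrip("/") + "/models"


def list_served_models(server_url, timeout=5.0):
    """Return the model ids reported by the server's model-listing route."""
    with urllib.request.urlopen(models_url(server_url), timeout=timeout) as response:
        payload = json.load(response)
    # Assumes the OpenAI-style response shape: {"data": [{"id": ...}, ...]}.
    return [entry["id"] for entry in payload.get("data", [])]


if __name__ == "__main__":
    try:
        print(list_served_models("http://127.0.0.1:8000/v1"))
    except OSError as error:  # URLError and timeouts are OSError subclasses
        print(f"Endpoint not reachable: {error}")
```

If the call succeeds, the value passed to `--api-model-name` would normally be expected to appear among the returned ids.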

Then run inference with your OpenAI-compatible ROCm VLM endpoint:

```powershell
paddleocr-vl-rocm `
  --input examples/input/handwrite_ch_demo.png `
  --output outputs/smoke `
  --layout-model models/PP-DocLayoutV3-onnx `
  --server-url http://127.0.0.1:8000/v1 `
  --api-model-name PaddleOCR-VL-1.5-0.9B `
  --vlm-backend vllm-server
```

Expected output files:

```text
outputs/smoke/handwrite_ch_demo_res.json
outputs/smoke/handwrite_ch_demo.md
```

## Python API Example

```python
from paddleocr_vl_rocm import PaddleOCRVLROCm

pipeline = PaddleOCRVLROCm(
    layout_model_dir="models/PP-DocLayoutV3-onnx",
    vlm_server_url="http://127.0.0.1:8000/v1",
    api_model_name="PaddleOCR-VL-1.5-0.9B",
)

result = pipeline.predict("examples/input/handwrite_ch_demo.png")
result.save_to_json("outputs")
result.save_to_markdown("outputs", pretty=False)
```
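
To process a folder of images, the same calls can be wrapped in a loop. This is a minimal sketch: `run_batch` and `IMAGE_SUFFIXES` are hypothetical names, and the set of accepted file extensions is an assumption. It relies only on the `predict`, `save_to_json`, and `save_to_markdown` calls shown in this section, and takes the pipeline as an argument.

```python
from pathlib import Path

# Assumed input types; adjust to whatever the pipeline actually accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def run_batch(pipeline, input_dir, output_dir):
    """Run the pipeline on each image in input_dir; return (processed, failed)."""
    processed, failed = [], []
    for path in sorted(Path(input_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        try:
            result = pipeline.predict(str(path))
            result.save_to_json(output_dir)
            result.save_to_markdown(output_dir, pretty=False)
            processed.append(path.name)
        except Exception as error:
            # Record the failure and keep going so one bad page does not stop the batch.
            failed.append((path.name, str(error)))
    return processed, failed
```

With the `pipeline` object constructed as in the example, `run_batch(pipeline, "examples/input", "outputs")` returns the names of the processed files and a list of `(name, error message)` pairs for any that failed.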

## Scope

This repository only contains the layout model files for the ONNXRuntime stage. It does not include PaddleOCR-VL VLM weights. For the complete inference pipeline, use [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) together with a ROCm-backed OpenAI-compatible VLM service.

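## Summary

This Hugging Face repository provides `PaddleOCR-VL-ROCm` with a directly downloadable, verified `PP-DocLayoutV3-onnx` layout model. After cloning the GitHub project, users only need to run the download script to prepare the model. They do not need to install Paddle2ONNX or convert the model themselves.

Open-source project: [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)

Main benefits:

- Lowers the installation barrier.
- Avoids differences introduced by Paddle2ONNX conversion.
- Keeps the GitHub repository lightweight, with no large model files committed.
- ONNXRuntime handles layout, while ROCm with vLLM or llama.cpp handles VLM inference.
- Validated on 1355 images, with full-run success and payload alignment both at `1355 / 1355`.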