---
license: apache-2.0
tags:
- paddleocr-vl
- pp-doclayoutv3
- onnxruntime
- document-layout-analysis
- rocm
- vllm
- llama-cpp
pipeline_tag: object-detection
library_name: onnxruntime
---
# PP-DocLayoutV3 ONNX for PaddleOCR-VL-ROCm
This repository hosts the verified `PP-DocLayoutV3` ONNX layout model used by the open-source project [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm).
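## Overview
This repository provides verified `PP-DocLayoutV3-onnx` model files that [PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) can download and use directly.
Users do not need to install Paddle or Paddle2ONNX, and do not need to export ONNX from the Paddle model themselves. After cloning the open-source project, running the download script is enough to prepare the layout model.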
## Files
- `inference.onnx`: PP-DocLayoutV3 ONNX layout detection model.
- `inference.yml`: model configuration used by the ONNXRuntime pipeline.
Verified checksums:
| File | SHA256 |
|---|---|
| `inference.onnx` | `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61` |
| `inference.yml` | `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC` |
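
The checksums above can be checked locally after download. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify_model_dir` are illustrative and not part of `PaddleOCR-VL-ROCm`, and the directory path assumes the default location prepared by the download script.

```python
import hashlib
from pathlib import Path

# Checksums copied from the table above (hex digests are case-insensitive).
EXPECTED_SHA256 = {
    "inference.onnx": "BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61",
    "inference.yml": "506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the upper-case SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_model_dir(model_dir, expected=EXPECTED_SHA256):
    """Map each expected file name to True if it exists and its checksum matches."""
    results = {}
    for name, want in expected.items():
        path = Path(model_dir) / name
        results[name] = path.is_file() and sha256_of(path) == want.upper()
    return results


if __name__ == "__main__":
    # Assumed location: the directory the download script prepares.
    for name, ok in verify_model_dir("models/PP-DocLayoutV3-onnx").items():
        print(f"{name}: {'OK' if ok else 'MISMATCH or missing'}")
```

Reading in chunks keeps memory use flat regardless of the size of `inference.onnx`.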
## Open-Source Project
Recommended runtime project:
[https://github.com/AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
`PaddleOCR-VL-ROCm` is a lightweight No-Paddle inference implementation for PaddleOCR-VL-style document parsing:
- Layout detection runs with ONNXRuntime and this `PP-DocLayoutV3-onnx` model.
- Visual language recognition is served by an OpenAI-compatible ROCm endpoint, such as vLLM or llama.cpp server.
- The project exposes both CLI and Python APIs.
- Outputs are saved as PaddleOCR-VL-style JSON and Markdown.
- The code repository is open source and uses the MIT license.
## Why This Helps Users
This model repository removes the need to export the layout model yourself, which is the most painful part of setup.
Without a prebuilt model, users often had to:
1. Install Paddle or PaddleX dependencies.
2. Install and configure Paddle2ONNX.
3. Export PP-DocLayoutV3 to ONNX themselves.
4. Debug model file names, model config files, and ONNXRuntime input compatibility.
With this repository, users can directly download the verified ONNX model used by `PaddleOCR-VL-ROCm`:
```powershell
pip install -e .[download]
python scripts/download_ppdoclayoutv3_onnx.py
```
The script downloads from this Hugging Face repository by default and prepares:
```text
models/PP-DocLayoutV3-onnx/
  inference.onnx
  inference.yml
```
This gives users a simpler path:
- No PaddlePaddle runtime is required for inference.
- No Paddle2ONNX conversion is required.
- No large model files are stored in the GitHub repo.
- The same verified model artifact is shared by all users.
- The GitHub repo stays small, clean, and easy to clone.
- ROCm acceleration can be handled by the VLM server while layout remains portable through ONNXRuntime.
## Validation Result
The ONNXRuntime layout path used by `PaddleOCR-VL-ROCm` has been validated against the Paddle native pipeline on 1355 images.
| Item | Result |
|---|---:|
| Full-run success | 1355 / 1355 |
| Payload alignment | 1355 / 1355 |
| Layout, crop, request order, request payload | Strictly aligned |
This means the open-source runtime can use this ONNX layout model as a practical replacement for the Paddle layout stage in the validated inference path.
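
The project's validation harness is not reproduced here. As a rough illustration of what a strict per-image alignment count could look like, the sketch below assumes each run is available as a dict mapping an image id to its ordered list of JSON-serializable request payloads. That data layout and the names `canonical` and `count_aligned` are assumptions made for this example only.

```python
import json


def canonical(payload):
    """Serialize a payload deterministically so dict key order does not matter."""
    return json.dumps(payload, sort_keys=True, ensure_ascii=False)


def count_aligned(reference, candidate):
    """Return (aligned, total) over the image ids in the reference run.

    Both arguments map an image id to the list of request payloads produced
    for that image, in request order. An image counts as aligned only when
    the two lists match element by element, in the same order.
    """
    aligned = 0
    for image_id, ref_payloads in reference.items():
        cand_payloads = candidate.get(image_id)
        if cand_payloads is not None and (
            [canonical(p) for p in ref_payloads]
            == [canonical(p) for p in cand_payloads]
        ):
            aligned += 1
    return aligned, len(reference)


# Toy data (hypothetical): one image whose two requests match in content and order.
paddle_run = {"page_1": [{"crop": [0, 0, 10, 10]}, {"crop": [0, 10, 10, 20]}]}
onnx_run = {"page_1": [{"crop": [0, 0, 10, 10]}, {"crop": [0, 10, 10, 20]}]}
print(count_aligned(paddle_run, onnx_run))
```

Comparing ordered lists rather than sets means a change in request order alone counts as a mismatch, which matches the "request order" row in the table.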
## Quick Start With PaddleOCR-VL-ROCm
```powershell
git clone https://github.com/AIwork4me/PaddleOCR-VL-ROCm.git
cd PaddleOCR-VL-ROCm
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[download]
python scripts/download_ppdoclayoutv3_onnx.py
```
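
Before running inference, it can help to confirm that the VLM endpoint is reachable and to see which model names it serves. This is a sketch, assuming the server implements the `GET /v1/models` route of the OpenAI API (vLLM's OpenAI-compatible server does). The helper names are illustrative and not part of `PaddleOCR-VL-ROCm`.

```python
import json
import urllib.request


def models_url(server_url):
    """Build the model-listing URL from a base URL such as http://host:8000/v1."""
    return server_url.rstrip("/") + "/models"


def list_model_ids(server_url, timeout=5.0):
    """Return the model ids reported by an OpenAI-compatible server.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    with urllib.request.urlopen(models_url(server_url), timeout=timeout) as resp:
        body = json.load(resp)
    return [entry["id"] for entry in body.get("data", [])]


# Example usage (requires a running server):
#   ids = list_model_ids("http://127.0.0.1:8000/v1")
#   print(ids)  # the value passed to --api-model-name should appear here
```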
Then run inference with your OpenAI-compatible ROCm VLM endpoint:
```powershell
paddleocr-vl-rocm `
  --input examples/input/handwrite_ch_demo.png `
  --output outputs/smoke `
  --layout-model models/PP-DocLayoutV3-onnx `
  --server-url http://127.0.0.1:8000/v1 `
  --api-model-name PaddleOCR-VL-1.5-0.9B `
  --vlm-backend vllm-server
```
Expected output files:
```text
outputs/smoke/handwrite_ch_demo_res.json
outputs/smoke/handwrite_ch_demo.md
```
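
The two paths above suggest a `<stem>_res.json` and `<stem>.md` naming pattern. The sketch below checks for both files after a run; the pattern is inferred from this single example rather than taken from the project's documentation, and the helper names are illustrative.

```python
from pathlib import Path


def expected_outputs(input_path, output_dir):
    """Return the (JSON, Markdown) paths expected for one input image."""
    stem = Path(input_path).stem
    out = Path(output_dir)
    return out / f"{stem}_res.json", out / f"{stem}.md"


def missing_outputs(input_path, output_dir):
    """Return the expected output paths that do not exist on disk."""
    return [p for p in expected_outputs(input_path, output_dir) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_outputs("examples/input/handwrite_ch_demo.png", "outputs/smoke")
    print("all outputs present" if not missing else f"missing: {missing}")
```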
## Python API Example
```python
from paddleocr_vl_rocm import PaddleOCRVLROCm

# Layout detection runs locally with ONNXRuntime; recognition is sent to the
# OpenAI-compatible VLM endpoint.
pipeline = PaddleOCRVLROCm(
    layout_model_dir="models/PP-DocLayoutV3-onnx",
    vlm_server_url="http://127.0.0.1:8000/v1",
    api_model_name="PaddleOCR-VL-1.5-0.9B",
)

result = pipeline.predict("examples/input/handwrite_ch_demo.png")

# Save PaddleOCR-VL-style JSON and Markdown under outputs/.
result.save_to_json("outputs")
result.save_to_markdown("outputs", pretty=False)
```
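
To process a folder of images, the same calls can be wrapped in a loop. The sketch below uses only the methods shown in the example above (`predict`, `save_to_json`, `save_to_markdown`). The helper `run_batch` and the set of accepted file suffixes are assumptions for illustration, not part of the project's API.

```python
from pathlib import Path

# Assumed set of input suffixes; adjust to whatever the pipeline accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def run_batch(pipeline, input_dir, output_dir):
    """Run pipeline.predict on every image in input_dir and save both outputs.

    `pipeline` is any object exposing predict(path) whose result offers
    save_to_json(dir) and save_to_markdown(dir, pretty=...), as in the
    example above. Returns the processed image paths in sorted order.
    """
    processed = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        result = pipeline.predict(str(path))
        result.save_to_json(output_dir)
        result.save_to_markdown(output_dir, pretty=False)
        processed.append(path)
    return processed


# Example usage with the pipeline constructed above:
#   run_batch(pipeline, "examples/input", "outputs")
```

Passing the pipeline in as an argument keeps the helper independent of how the pipeline is configured.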
## Scope
This repository only contains the layout model files for the ONNXRuntime stage. It does not include PaddleOCR-VL VLM weights. For the complete inference pipeline, use [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) together with a ROCm-backed OpenAI-compatible VLM service.
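## Summary
This Hugging Face repository provides `PaddleOCR-VL-ROCm` with a verified, directly downloadable `PP-DocLayoutV3-onnx` layout model. After cloning the GitHub project, users only need to run the download script to prepare the model; there is no need to install Paddle2ONNX or convert the model themselves.
Open-source project: [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
Main benefits:
- Lowers the installation barrier.
- Avoids discrepancies introduced by Paddle2ONNX conversion.
- Keeps the GitHub repository lightweight, with no large model files committed.
- ONNXRuntime handles layout, while ROCm/vLLM or llama.cpp handles VLM inference.
- Validated on 1355 images, with full-run success and payload alignment both at `1355 / 1355`.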