AlexTransformer/PP-DocLayoutV3-onnx

@@ -1,156 +1,160 @@
----
-license: apache-2.0
-tags:
-- paddleocr-vl
-- pp-doclayoutv3
-- onnxruntime
-- document-layout-analysis
-- rocm
-- vllm
-- llama-cpp
-pipeline_tag: object-detection
-library_name: onnxruntime
----
-# PP-DocLayoutV3 ONNX for PaddleOCR-VL-ROCm
-This repository hosts the verified `PP-DocLayoutV3` ONNX layout model used by the open-source project [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm).
-This repository provides the verified `PP-DocLayoutV3-onnx` model for [PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm). Users do not need to install Paddle or Paddle2ONNX, or export ONNX from the Paddle model themselves, to prepare the layout model.
-## Files
-- `inference.onnx`: PP-DocLayoutV3 ONNX layout detection model.
-- `inference.yml`: model configuration used by the ONNXRuntime pipeline.
-Verified checksums:
-| File | SHA256 |
-|---|---|
-| `inference.onnx` | `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61` |
-| `inference.yml` | `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC` |
-## Open-Source Project
-The recommended runtime project is:
-[https://github.com/AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
-`PaddleOCR-VL-ROCm` is a lightweight No-Paddle inference implementation for PaddleOCR-VL-style document parsing:
-- Layout detection runs with ONNXRuntime and this `PP-DocLayoutV3-onnx` model.
-- Visual language recognition is served by an OpenAI-compatible ROCm endpoint, such as vLLM or llama.cpp server.
-- The project exposes both CLI and Python APIs.
-- Outputs are saved as PaddleOCR-VL-style JSON and Markdown.
-- The code repository is open source and uses the MIT license.
-## Why This Helps Users
-This model repository is designed to remove the most painful setup step for users.
-Before this model card existed, users often had to:
-1. Install Paddle/PaddleX dependencies.
-2. Install and configure Paddle2ONNX.
-3. Export PP-DocLayoutV3 by themselves.
-4. Debug model file names, model config files, and ONNXRuntime input compatibility.
-With this repository, users can directly download the verified ONNX model used by `PaddleOCR-VL-ROCm`:
-```powershell
-pip install -e .[download]
-python scripts/download_ppdoclayoutv3_onnx.py
-```
-The script downloads from this Hugging Face repository by default and prepares:
-```text
-models/PP-DocLayoutV3-onnx/
-  inference.onnx
-  inference.yml
-```
-This gives users a simpler path:
-- No PaddlePaddle runtime is required for inference.
-- No Paddle2ONNX conversion is required.
-- No large model files are stored in the GitHub repo.
-- The same verified model artifact is shared by all users.
-- The GitHub repo stays small, clean, and easy to clone.
-- ROCm acceleration can be handled by the VLM server while layout remains portable through ONNXRuntime.
-## Validation Result
-The ONNXRuntime layout path used by `PaddleOCR-VL-ROCm` has been validated against the Paddle native pipeline on 1355 images.
-| Item | Result |
-|---|---:|
-| Full-run success | 1355 / 1355 |
-| Payload alignment | 1355 / 1355 |
-| Layout, crop, request order, request payload | Strictly aligned |
-This means the open-source runtime can use this ONNX layout model as a practical replacement for the Paddle layout stage in the validated inference path.
-## Quick Start With PaddleOCR-VL-ROCm
-```powershell
-git clone https://github.com/AIwork4me/PaddleOCR-VL-ROCm.git
-cd PaddleOCR-VL-ROCm
-python -m venv .venv
-.\.venv\Scripts\Activate.ps1
-pip install -e .[download]
-python scripts/download_ppdoclayoutv3_onnx.py
-```
-Then run inference with your OpenAI-compatible ROCm VLM endpoint:
-```powershell
-paddleocr-vl-rocm `
-  --input examples/input/handwrite_ch_demo.png `
-  --output outputs/smoke `
-  --layout-model models/PP-DocLayoutV3-onnx `
-  --server-url http://127.0.0.1:8000/v1 `
-  --api-model-name PaddleOCR-VL-1.5-0.9B `
-  --vlm-backend vllm-server
-```
-Expected output files:
-```text
-outputs/smoke/handwrite_ch_demo_res.json
-outputs/smoke/handwrite_ch_demo.md
-```
-## Python API Example
-```python
-from paddleocr_vl_rocm import PaddleOCRVLROCm
-pipeline = PaddleOCRVLROCm(
-    layout_model_dir="models/PP-DocLayoutV3-onnx",
-    vlm_server_url="http://127.0.0.1:8000/v1",
-    api_model_name="PaddleOCR-VL-1.5-0.9B",
-)
-result = pipeline.predict("examples/input/handwrite_ch_demo.png")
-result.save_to_json("outputs")
-result.save_to_markdown("outputs", pretty=False)
-```
-## Scope
-This repository only contains the layout model files for the ONNXRuntime stage. It does not include PaddleOCR-VL VLM weights. For the complete inference pipeline, use [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) together with a ROCm-backed OpenAI-compatible VLM service.
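-## Summary
-This Hugging Face repository provides `PaddleOCR-VL-ROCm` with a directly downloadable, verified `PP-DocLayoutV3-onnx` layout model. After cloning the GitHub project, users only need to run the download script to prepare the model; they do not need to install Paddle2ONNX or convert the model themselves.
-Open-source project: [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
-Main benefits:
-- Lowers the installation barrier.
-- Avoids Paddle2ONNX conversion discrepancies.
-- Keeps the GitHub repository lightweight, with no large model files committed.
-- ONNXRuntime handles layout; ROCm/vLLM or llama.cpp handles VLM inference.
-- Validated on 1355 images; full-run success and payload alignment are both `1355 / 1355`.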

+---
+license: apache-2.0
+tags:
+  - paddleocr-vl
+  - pp-doclayoutv3
+  - onnxruntime
+  - document-layout-analysis
+  - rocm
+  - vllm
+  - llama-cpp
+pipeline_tag: object-detection
+library_name: onnxruntime
+---
+# PP-DocLayoutV3 ONNX for PaddleOCR-VL-ROCm
+This repository hosts the verified `PP-DocLayoutV3` ONNX layout model used by the open-source project [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm).
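+## Overview
+This repository provides the verified `PP-DocLayoutV3-onnx` model files for [PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) to download and use directly.
+Users no longer need to install Paddle or Paddle2ONNX, or export ONNX from the Paddle model themselves. After cloning the open-source project, running the download script is enough to prepare the layout model.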
+## Files
+- `inference.onnx`: PP-DocLayoutV3 ONNX layout detection model.
+- `inference.yml`: model configuration used by the ONNXRuntime pipeline.
+Verified checksums:
+| File | SHA256 |
+|---|---|
+| `inference.onnx` | `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61` |
+| `inference.yml` | `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC` |
+## Open-Source Project
+Recommended runtime project:
+[https://github.com/AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
+`PaddleOCR-VL-ROCm` is a lightweight No-Paddle inference implementation for PaddleOCR-VL-style document parsing:
+- Layout detection runs with ONNXRuntime and this `PP-DocLayoutV3-onnx` model.
+- Visual language recognition is served by an OpenAI-compatible ROCm endpoint, such as vLLM or llama.cpp server.
+- The project exposes both CLI and Python APIs.
+- Outputs are saved as PaddleOCR-VL-style JSON and Markdown.
+- The code repository is open source and uses the MIT license.
+## Why This Helps Users
+This model repository removes the most painful setup step for users.
+Before this model repository, users often had to:
+1. Install Paddle or PaddleX dependencies.
+2. Install and configure Paddle2ONNX.
+3. Export PP-DocLayoutV3 by themselves.
+4. Debug model file names, model config files, and ONNXRuntime input compatibility.
+With this repository, users can directly download the verified ONNX model used by `PaddleOCR-VL-ROCm`:
+```powershell
+pip install -e .[download]
+python scripts/download_ppdoclayoutv3_onnx.py
+```
+The script downloads from this Hugging Face repository by default and prepares:
+```text
+models/PP-DocLayoutV3-onnx/
+  inference.onnx
+  inference.yml
+```
+This gives users a simpler path:
+- No PaddlePaddle runtime is required for inference.
+- No Paddle2ONNX conversion is required.
+- No large model files are stored in the GitHub repo.
+- The same verified model artifact is shared by all users.
+- The GitHub repo stays small, clean, and easy to clone.
+- ROCm acceleration can be handled by the VLM server while layout remains portable through ONNXRuntime.
+## Validation Result
+The ONNXRuntime layout path used by `PaddleOCR-VL-ROCm` has been validated against the Paddle native pipeline on 1355 images.
+| Item | Result |
+|---|---:|
+| Full-run success | 1355 / 1355 |
+| Payload alignment | 1355 / 1355 |
+| Layout, crop, request order, request payload | Strictly aligned |
+This means the open-source runtime can use this ONNX layout model as a practical replacement for the Paddle layout stage in the validated inference path.
+## Quick Start With PaddleOCR-VL-ROCm
+```powershell
+git clone https://github.com/AIwork4me/PaddleOCR-VL-ROCm.git
+cd PaddleOCR-VL-ROCm
+python -m venv .venv
+.\.venv\Scripts\Activate.ps1
+pip install -e .[download]
+python scripts/download_ppdoclayoutv3_onnx.py
+```
+Then run inference with your OpenAI-compatible ROCm VLM endpoint:
+```powershell
+paddleocr-vl-rocm `
+  --input examples/input/handwrite_ch_demo.png `
+  --output outputs/smoke `
+  --layout-model models/PP-DocLayoutV3-onnx `
+  --server-url http://127.0.0.1:8000/v1 `
+  --api-model-name PaddleOCR-VL-1.5-0.9B `
+  --vlm-backend vllm-server
+```
+Expected output files:
+```text
+outputs/smoke/handwrite_ch_demo_res.json
+outputs/smoke/handwrite_ch_demo.md
+```
+## Python API Example
+```python
+from paddleocr_vl_rocm import PaddleOCRVLROCm
+pipeline = PaddleOCRVLROCm(
+    layout_model_dir="models/PP-DocLayoutV3-onnx",
+    vlm_server_url="http://127.0.0.1:8000/v1",
+    api_model_name="PaddleOCR-VL-1.5-0.9B",
+)
+result = pipeline.predict("examples/input/handwrite_ch_demo.png")
+result.save_to_json("outputs")
+result.save_to_markdown("outputs", pretty=False)
+```
+## Scope
+This repository only contains the layout model files for the ONNXRuntime stage. It does not include PaddleOCR-VL VLM weights. For the complete inference pipeline, use [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) together with a ROCm-backed OpenAI-compatible VLM service.
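+## Summary
+This Hugging Face repository provides `PaddleOCR-VL-ROCm` with a directly downloadable, verified `PP-DocLayoutV3-onnx` layout model. After cloning the GitHub project, users only need to run the download script to prepare the model; they do not need to install Paddle2ONNX or convert the model themselves.
+Open-source project: [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
+Main benefits:
+- Lowers the installation barrier.
+- Avoids Paddle2ONNX conversion discrepancies.
+- Keeps the GitHub repository lightweight, with no large model files committed.
+- ONNXRuntime handles layout; ROCm/vLLM or llama.cpp handles VLM inference.
+- Validated on 1355 images; full-run success and payload alignment are both `1355 / 1355`.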
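
The model card lists SHA256 digests under "Verified checksums" but does not show how to check them. A minimal sketch of one way to do that with the Python standard library follows. The function names `sha256_of` and `verify_model_dir` are hypothetical helpers introduced here for illustration and are not part of `PaddleOCR-VL-ROCm`; the digests and the `models/PP-DocLayoutV3-onnx` path are copied from the card.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "Verified checksums" table in the model card.
EXPECTED_SHA256 = {
    "inference.onnx": "BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61",
    "inference.yml": "506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the uppercase hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so a large model file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_model_dir(model_dir, expected=EXPECTED_SHA256) -> dict:
    """Map each expected file name to "ok", "missing", or "mismatch".

    Hypothetical helper for illustration; not part of PaddleOCR-VL-ROCm.
    """
    results = {}
    for name, want in expected.items():
        path = Path(model_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif sha256_of(path) == want.upper():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Directory name taken from the model card; adjust if the files live elsewhere.
    for name, status in verify_model_dir("models/PP-DocLayoutV3-onnx").items():
        print(f"{name}: {status}")
```

Running the script after the download step prints one status per file. Any result other than `ok` suggests the download should be repeated before the layout model is used.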