AlexTransformer
/

PP-DocLayoutV3-onnx

@@ -6,26 +6,151 @@ tags:
 - onnxruntime
 - document-layout-analysis
 - rocm
 pipeline_tag: object-detection
 ---
-# PP-DocLayoutV3 ONNX
-Verified PP-DocLayoutV3 ONNX layout model for PaddleOCR-VL-ROCm.
-Files:
-- `inference.onnx`
-- `inference.yml`
-Checksums:
-- `inference.onnx`: `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61`
-- `inference.yml`: `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC`
-Usage:
 ```powershell
 pip install -e .[download]
-python scripts/download_ppdoclayoutv3_onnx.py --repo-id AlexTransformer/PP-DocLayoutV3-onnx
 ```

 - onnxruntime
 - document-layout-analysis
 - rocm
+- vllm
+- llama-cpp
 pipeline_tag: object-detection
+library_name: onnxruntime
 ---
+# PP-DocLayoutV3 ONNX for PaddleOCR-VL-ROCm
+This repository hosts the verified `PP-DocLayoutV3` ONNX layout model used by the open-source project [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm).
+???????????????? `PP-DocLayoutV3-onnx` ?????? [PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) ??????????????? Paddle?Paddle2ONNX?????? Paddle ???? ONNX???????????????? layout ???
+## Files
+- `inference.onnx`: PP-DocLayoutV3 ONNX layout detection model.
+- `inference.yml`: model configuration used by the ONNXRuntime pipeline.
+Verified checksums:
+| File | SHA256 |
+|---|---|
+| `inference.onnx` | `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61` |
+| `inference.yml` | `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC` |
+## Open-Source Project
+The recommended runtime project is:
+[https://github.com/AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
+`PaddleOCR-VL-ROCm` is a lightweight No-Paddle inference implementation for PaddleOCR-VL-style document parsing:
+- Layout detection runs with ONNXRuntime and this `PP-DocLayoutV3-onnx` model.
+- Visual language recognition is served by an OpenAI-compatible ROCm endpoint, such as vLLM or llama.cpp server.
+- The project exposes both CLI and Python APIs.
+- Outputs are saved as PaddleOCR-VL-style JSON and Markdown.
+- The code repository is open source and uses the MIT license.
+## Why This Helps Users
+This model repository is designed to remove the most painful setup step for users.
+Before this model card existed, users often had to:
+1. Install Paddle/PaddleX dependencies.
+2. Install and configure Paddle2ONNX.
+3. Export PP-DocLayoutV3 by themselves.
+4. Debug model file names, model config files, and ONNXRuntime input compatibility.
+With this repository, users can directly download the verified ONNX model used by `PaddleOCR-VL-ROCm`:
 ```powershell
 pip install -e .[download]
+python scripts/download_ppdoclayoutv3_onnx.py
+```
+The script downloads from this Hugging Face repository by default and prepares:
+```text
+models/PP-DocLayoutV3-onnx/
+  inference.onnx
+  inference.yml
 ```
+This gives users a simpler path:
+- No PaddlePaddle runtime is required for inference.
+- No Paddle2ONNX conversion is required.
+- No large model files are stored in the GitHub repo.
+- The same verified model artifact is shared by all users.
+- The GitHub repo stays small, clean, and easy to clone.
+- ROCm acceleration can be handled by the VLM server while layout remains portable through ONNXRuntime.
+## Validation Result
+The ONNXRuntime layout path used by `PaddleOCR-VL-ROCm` has been validated against the Paddle native pipeline on 1355 images.
+| Item | Result |
+|---|---:|
+| Full-run success | 1355 / 1355 |
+| Payload alignment | 1355 / 1355 |
+| Layout, crop, request order, request payload | Strictly aligned |
+This means the open-source runtime can use this ONNX layout model as a practical replacement for the Paddle layout stage in the validated inference path.
+## Quick Start With PaddleOCR-VL-ROCm
+```powershell
+git clone https://github.com/AIwork4me/PaddleOCR-VL-ROCm.git
+cd PaddleOCR-VL-ROCm
+python -m venv .venv
+.\.venv\Scripts\Activate.ps1
+pip install -e .[download]
+python scripts/download_ppdoclayoutv3_onnx.py
+```
+Then run inference with your OpenAI-compatible ROCm VLM endpoint:
+```powershell
+paddleocr-vl-rocm `
+  --input examples/input/handwrite_ch_demo.png `
+  --output outputs/smoke `
+  --layout-model models/PP-DocLayoutV3-onnx `
+  --server-url http://127.0.0.1:8000/v1 `
+  --api-model-name PaddleOCR-VL-1.5-0.9B `
+  --vlm-backend vllm-server
+```
+Expected output files:
+```text
+outputs/smoke/handwrite_ch_demo_res.json
+outputs/smoke/handwrite_ch_demo.md
+```
+## Python API Example
+```python
+from paddleocr_vl_rocm import PaddleOCRVLROCm
+pipeline = PaddleOCRVLROCm(
+    layout_model_dir="models/PP-DocLayoutV3-onnx",
+    vlm_server_url="http://127.0.0.1:8000/v1",
+    api_model_name="PaddleOCR-VL-1.5-0.9B",
+)
+result = pipeline.predict("examples/input/handwrite_ch_demo.png")
+result.save_to_json("outputs")
+result.save_to_markdown("outputs", pretty=False)
+```
+## Scope
+This repository only contains the layout model files for the ONNXRuntime stage. It does not include PaddleOCR-VL VLM weights. For the complete inference pipeline, use [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) together with a ROCm-backed OpenAI-compatible VLM service.
+## ????
+?? Hugging Face ??????? `PaddleOCR-VL-ROCm` ????????????? `PP-DocLayoutV3-onnx` layout ??????? GitHub ????????????????????????? Paddle2ONNX????????????
+???????[AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
+?????
+- ???????
+- ?? Paddle2ONNX ?????
+- GitHub ??????????????
+- ONNXRuntime ?? layout?ROCm/vLLM ? llama.cpp ?? VLM ???
+- ?? 1355 ?????????full-run success ? payload alignment ?? `1355 / 1355`?