AlexTransformer committed on
Commit
ce8d96e
·
verified ·
1 Parent(s): 7bad576

Update model card with PaddleOCR-VL-ROCm project details

Files changed (1)
  1. README.md +135 -10
README.md CHANGED
@@ -6,26 +6,151 @@ tags:
6
  - onnxruntime
7
  - document-layout-analysis
8
  - rocm
 
 
9
  pipeline_tag: object-detection
 
10
  ---
11
 
12
- # PP-DocLayoutV3 ONNX
13
 
14
- Verified PP-DocLayoutV3 ONNX layout model for PaddleOCR-VL-ROCm.
15
 
16
- Files:
17
 
18
- - `inference.onnx`
19
- - `inference.yml`
20
 
21
- Checksums:
 
22
 
23
- - `inference.onnx`: `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61`
24
- - `inference.yml`: `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC`
25
 
26
- Usage:
27
 
28
  ```powershell
29
  pip install -e .[download]
30
- python scripts/download_ppdoclayoutv3_onnx.py --repo-id AlexTransformer/PP-DocLayoutV3-onnx
31
  ```
 
6
  - onnxruntime
7
  - document-layout-analysis
8
  - rocm
9
+ - vllm
10
+ - llama-cpp
11
  pipeline_tag: object-detection
12
+ library_name: onnxruntime
13
  ---
14
 
15
+ # PP-DocLayoutV3 ONNX for PaddleOCR-VL-ROCm
16
 
17
+ This repository hosts the verified `PP-DocLayoutV3` ONNX layout model used by the open-source project [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm).
18
 
19
+ With `PP-DocLayoutV3-onnx`, users of [PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) do not need to install Paddle or Paddle2ONNX, or export the Paddle model to ONNX themselves; the layout model can be downloaded directly.
20
 
21
+ ## Files
 
22
 
23
+ - `inference.onnx`: PP-DocLayoutV3 ONNX layout detection model.
24
+ - `inference.yml`: model configuration used by the ONNXRuntime pipeline.
25
 
26
+ Verified checksums:
 
27
 
28
+ | File | SHA256 |
29
+ |---|---|
30
+ | `inference.onnx` | `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61` |
31
+ | `inference.yml` | `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC` |
32
+
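The checksums in the table can be checked locally after download. The following is a minimal sketch using only the Python standard library; the helper names (`sha256_of_file`, `verify_model_dir`) are illustrative and not part of `PaddleOCR-VL-ROCm`, and the directory passed at the bottom assumes the default location that the download script prepares.

```python
import hashlib
from pathlib import Path

# Checksums copied from the table above (uppercase hex, as published).
EXPECTED_SHA256 = {
    "inference.onnx": "BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61",
    "inference.yml": "506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_model_dir(model_dir, expected=EXPECTED_SHA256):
    """Map each expected file name to True (match) or False (missing or mismatch)."""
    results = {}
    for name, want in expected.items():
        path = Path(model_dir) / name
        results[name] = path.is_file() and sha256_of_file(path) == want.upper()
    return results


if __name__ == "__main__":
    # Assumed location: the directory the download script prepares.
    for name, ok in verify_model_dir("models/PP-DocLayoutV3-onnx").items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```

Reading in chunks keeps memory use flat regardless of model size; comparing in uppercase matches the casing used in the table.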
33
+ ## Open-Source Project
34
+
35
+ The recommended runtime project is:
36
+
37
+ [https://github.com/AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
38
+
39
+ `PaddleOCR-VL-ROCm` is a lightweight No-Paddle inference implementation for PaddleOCR-VL-style document parsing:
40
+
41
+ - Layout detection runs with ONNXRuntime and this `PP-DocLayoutV3-onnx` model.
42
+ - Visual language recognition is served by an OpenAI-compatible ROCm endpoint, such as vLLM or llama.cpp server.
43
+ - The project exposes both CLI and Python APIs.
44
+ - Outputs are saved as PaddleOCR-VL-style JSON and Markdown.
45
+ - The code repository is open source and uses the MIT license.
46
+
47
+ ## Why This Helps Users
48
+
49
+ This model repository removes the most painful setup step for users: exporting PP-DocLayoutV3 to ONNX themselves.
50
+
51
+ Before this repository existed, users often had to:
52
+
53
+ 1. Install Paddle/PaddleX dependencies.
54
+ 2. Install and configure Paddle2ONNX.
55
+ 3. Export PP-DocLayoutV3 by themselves.
56
+ 4. Debug model file names, model config files, and ONNXRuntime input compatibility.
57
+
58
+ With this repository, users can directly download the verified ONNX model used by `PaddleOCR-VL-ROCm`:
59
 
60
  ```powershell
61
  pip install -e .[download]
62
+ python scripts/download_ppdoclayoutv3_onnx.py
63
+ ```
64
+
65
+ The script downloads from this Hugging Face repository by default and prepares:
66
+
67
+ ```text
68
+ models/PP-DocLayoutV3-onnx/
69
+   inference.onnx
70
+   inference.yml
71
  ```
72
+
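A quick way to confirm that the layout directory shown above is complete before pointing the pipeline at it is sketched below. It assumes only the two file names listed in this model card; `missing_layout_files` is a hypothetical helper, not a function provided by the project.

```python
from pathlib import Path

# File names taken from the directory listing above.
REQUIRED_FILES = ("inference.onnx", "inference.yml")


def missing_layout_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_layout_files("models/PP-DocLayoutV3-onnx")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Layout model directory looks complete.")
```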
73
+ This gives users a simpler path:
74
+
75
+ - No PaddlePaddle runtime is required for inference.
76
+ - No Paddle2ONNX conversion is required.
77
+ - No large model files are stored in the GitHub repo.
78
+ - The same verified model artifact is shared by all users.
79
+ - The GitHub repo stays small, clean, and easy to clone.
80
+ - ROCm acceleration can be handled by the VLM server while layout remains portable through ONNXRuntime.
81
+
82
+ ## Validation Result
83
+
84
+ The ONNXRuntime layout path used by `PaddleOCR-VL-ROCm` has been validated against the Paddle native pipeline on 1355 images.
85
+
86
+ | Item | Result |
87
+ |---|---:|
88
+ | Full-run success | 1355 / 1355 |
89
+ | Payload alignment | 1355 / 1355 |
90
+ | Layout, crop, request order, request payload | Strictly aligned |
91
+
92
+ This means the open-source runtime can use this ONNX layout model as a practical replacement for the Paddle layout stage in the validated inference path.
93
+
94
+ ## Quick Start With PaddleOCR-VL-ROCm
95
+
96
+ ```powershell
97
+ git clone https://github.com/AIwork4me/PaddleOCR-VL-ROCm.git
98
+ cd PaddleOCR-VL-ROCm
99
+ python -m venv .venv
100
+ .\.venv\Scripts\Activate.ps1
101
+ pip install -e .[download]
102
+ python scripts/download_ppdoclayoutv3_onnx.py
103
+ ```
104
+
105
+ Then run inference with your OpenAI-compatible ROCm VLM endpoint:
106
+
107
+ ```powershell
108
+ paddleocr-vl-rocm `
109
+   --input examples/input/handwrite_ch_demo.png `
110
+   --output outputs/smoke `
111
+   --layout-model models/PP-DocLayoutV3-onnx `
112
+   --server-url http://127.0.0.1:8000/v1 `
113
+   --api-model-name PaddleOCR-VL-1.5-0.9B `
114
+   --vlm-backend vllm-server
115
+ ```
116
+
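Before running the command above, it can be useful to confirm that the endpoint is reachable and serves the model name you plan to pass as `--api-model-name`. The sketch below assumes the server implements the OpenAI-style model listing at `<server-url>/models` (vLLM's OpenAI-compatible server does; other backends may differ). The helper names are illustrative and not part of `PaddleOCR-VL-ROCm`.

```python
import json
import urllib.request


def models_url(server_url):
    """Build the model-listing URL from a base URL such as http://host:8000/v1."""
    return server_url.rstrip("/") + "/models"


def extract_model_ids(payload):
    """Pull model ids out of an OpenAI-style {"data": [{"id": ...}, ...]} response."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_served_models(server_url, timeout=5.0):
    """Query the endpoint and return the ids of the models it reports serving."""
    with urllib.request.urlopen(models_url(server_url), timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

For example, `"PaddleOCR-VL-1.5-0.9B" in list_served_models("http://127.0.0.1:8000/v1")` should be true when the server from the command above is running with that model name; a connection error here usually means the server is not up yet.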
117
+ Expected output files:
118
+
119
+ ```text
120
+ outputs/smoke/handwrite_ch_demo_res.json
121
+ outputs/smoke/handwrite_ch_demo.md
122
+ ```
123
+
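The two paths above suggest a naming pattern of `<input stem>_res.json` and `<input stem>.md` inside the output directory. The sketch below encodes that pattern for use in scripts that post-process results; the pattern is inferred from this single example and is an assumption, not documented behavior, and `expected_outputs` is a hypothetical helper.

```python
from pathlib import Path


def expected_outputs(input_path, output_dir):
    """Return (json_path, markdown_path) under the assumed naming pattern."""
    stem = Path(input_path).stem  # file name without directory or extension
    out = Path(output_dir)
    return out / f"{stem}_res.json", out / f"{stem}.md"


if __name__ == "__main__":
    json_path, md_path = expected_outputs(
        "examples/input/handwrite_ch_demo.png", "outputs/smoke"
    )
    print(json_path.as_posix())
    print(md_path.as_posix())
```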
124
+ ## Python API Example
125
+
126
+ ```python
127
+ from paddleocr_vl_rocm import PaddleOCRVLROCm
128
+
129
+ pipeline = PaddleOCRVLROCm(
130
+     layout_model_dir="models/PP-DocLayoutV3-onnx",
131
+     vlm_server_url="http://127.0.0.1:8000/v1",
132
+     api_model_name="PaddleOCR-VL-1.5-0.9B",
133
+ )
134
+
135
+ result = pipeline.predict("examples/input/handwrite_ch_demo.png")
136
+ result.save_to_json("outputs")
137
+ result.save_to_markdown("outputs", pretty=False)
138
+ ```
139
+
140
+ ## Scope
141
+
142
+ This repository only contains the layout model files for the ONNXRuntime stage. It does not include PaddleOCR-VL VLM weights. For the complete inference pipeline, use [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) together with a ROCm-backed OpenAI-compatible VLM service.
143
+
144
+ ## Summary
145
+
146
+ This Hugging Face repository provides the verified `PP-DocLayoutV3-onnx` layout model used by the `PaddleOCR-VL-ROCm` open-source project. The model files are hosted here rather than in the GitHub repository, so users do not need to run Paddle2ONNX themselves.
147
+
148
+ Open-source project: [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
149
+
150
+ Benefits:
151
+
152
+ - Download the model directly.
153
+ - No Paddle2ONNX conversion is required.
154
+ - The GitHub repository stays small and easy to clone.
155
+ - ONNXRuntime handles layout; ROCm/vLLM or llama.cpp handles VLM inference.
156
+ - Validated on 1355 images: full-run success and payload alignment are both `1355 / 1355`.