AlexTransformer committed on
Commit 504ba2b · verified
1 Parent(s): ce8d96e

Fix UTF-8 Chinese text in model card
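
The commit does not state how the Chinese text was lost. One common cause, assumed here purely for illustration, is writing the file through a codec that cannot represent CJK characters while the encoder substitutes `?` for each one. A minimal Python sketch of that failure mode and a simple check for it (the helper names `lossy_roundtrip` and `looks_mangled` are hypothetical):

```python
def lossy_roundtrip(text, codec="ascii"):
    """Encode with a codec that lacks CJK support, replacing each
    unencodable character with "?", then decode the bytes back."""
    return text.encode(codec, errors="replace").decode(codec)


def looks_mangled(line, min_run=4):
    """Heuristic: a run of consecutive "?" usually signals lost characters."""
    return "?" * min_run in line


if __name__ == "__main__":
    heading = "## 中文摘要"
    damaged = lossy_roundtrip(heading)
    print(damaged)                  # ASCII parts survive, CJK becomes "?"
    print(looks_mangled(damaged))   # the damaged heading is flagged
    print(looks_mangled(heading))   # the intact heading is not
```

Once the characters have been replaced the original text cannot be recovered from the file, which is consistent with this commit rewriting the Chinese passages rather than re-decoding them.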

Files changed (1)
  1. README.md +160 -156
README.md CHANGED
@@ -1,156 +1,160 @@
- ---
- license: apache-2.0
- tags:
- - paddleocr-vl
- - pp-doclayoutv3
- - onnxruntime
- - document-layout-analysis
- - rocm
- - vllm
- - llama-cpp
- pipeline_tag: object-detection
- library_name: onnxruntime
- ---
-
- # PP-DocLayoutV3 ONNX for PaddleOCR-VL-ROCm
-
- This repository hosts the verified `PP-DocLayoutV3` ONNX layout model used by the open-source project [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm).
-
- ???????????????? `PP-DocLayoutV3-onnx` ?????? [PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) ??????????????? Paddle?Paddle2ONNX?????? Paddle ???? ONNX???????????????? layout ???
-
- ## Files
-
- - `inference.onnx`: PP-DocLayoutV3 ONNX layout detection model.
- - `inference.yml`: model configuration used by the ONNXRuntime pipeline.
-
- Verified checksums:
-
- | File | SHA256 |
- |---|---|
- | `inference.onnx` | `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61` |
- | `inference.yml` | `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC` |
-
- ## Open-Source Project
-
- The recommended runtime project is:
-
- [https://github.com/AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
-
- `PaddleOCR-VL-ROCm` is a lightweight No-Paddle inference implementation for PaddleOCR-VL-style document parsing:
-
- - Layout detection runs with ONNXRuntime and this `PP-DocLayoutV3-onnx` model.
- - Visual language recognition is served by an OpenAI-compatible ROCm endpoint, such as vLLM or llama.cpp server.
- - The project exposes both CLI and Python APIs.
- - Outputs are saved as PaddleOCR-VL-style JSON and Markdown.
- - The code repository is open source and uses the MIT license.
-
- ## Why This Helps Users
-
- This model repository is designed to remove the most painful setup step for users.
-
- Before this model card existed, users often had to:
-
- 1. Install Paddle/PaddleX dependencies.
- 2. Install and configure Paddle2ONNX.
- 3. Export PP-DocLayoutV3 by themselves.
- 4. Debug model file names, model config files, and ONNXRuntime input compatibility.
-
- With this repository, users can directly download the verified ONNX model used by `PaddleOCR-VL-ROCm`:
-
- ```powershell
- pip install -e .[download]
- python scripts/download_ppdoclayoutv3_onnx.py
- ```
-
- The script downloads from this Hugging Face repository by default and prepares:
-
- ```text
- models/PP-DocLayoutV3-onnx/
- inference.onnx
- inference.yml
- ```
-
- This gives users a simpler path:
-
- - No PaddlePaddle runtime is required for inference.
- - No Paddle2ONNX conversion is required.
- - No large model files are stored in the GitHub repo.
- - The same verified model artifact is shared by all users.
- - The GitHub repo stays small, clean, and easy to clone.
- - ROCm acceleration can be handled by the VLM server while layout remains portable through ONNXRuntime.
-
- ## Validation Result
-
- The ONNXRuntime layout path used by `PaddleOCR-VL-ROCm` has been validated against the Paddle native pipeline on 1355 images.
-
- | Item | Result |
- |---|---:|
- | Full-run success | 1355 / 1355 |
- | Payload alignment | 1355 / 1355 |
- | Layout, crop, request order, request payload | Strictly aligned |
-
- This means the open-source runtime can use this ONNX layout model as a practical replacement for the Paddle layout stage in the validated inference path.
-
- ## Quick Start With PaddleOCR-VL-ROCm
-
- ```powershell
- git clone https://github.com/AIwork4me/PaddleOCR-VL-ROCm.git
- cd PaddleOCR-VL-ROCm
- python -m venv .venv
- .\.venv\Scripts\Activate.ps1
- pip install -e .[download]
- python scripts/download_ppdoclayoutv3_onnx.py
- ```
-
- Then run inference with your OpenAI-compatible ROCm VLM endpoint:
-
- ```powershell
- paddleocr-vl-rocm `
- --input examples/input/handwrite_ch_demo.png `
- --output outputs/smoke `
- --layout-model models/PP-DocLayoutV3-onnx `
- --server-url http://127.0.0.1:8000/v1 `
- --api-model-name PaddleOCR-VL-1.5-0.9B `
- --vlm-backend vllm-server
- ```
-
- Expected output files:
-
- ```text
- outputs/smoke/handwrite_ch_demo_res.json
- outputs/smoke/handwrite_ch_demo.md
- ```
-
- ## Python API Example
-
- ```python
- from paddleocr_vl_rocm import PaddleOCRVLROCm
-
- pipeline = PaddleOCRVLROCm(
- layout_model_dir="models/PP-DocLayoutV3-onnx",
- vlm_server_url="http://127.0.0.1:8000/v1",
- api_model_name="PaddleOCR-VL-1.5-0.9B",
- )
-
- result = pipeline.predict("examples/input/handwrite_ch_demo.png")
- result.save_to_json("outputs")
- result.save_to_markdown("outputs", pretty=False)
- ```
-
- ## Scope
-
- This repository only contains the layout model files for the ONNXRuntime stage. It does not include PaddleOCR-VL VLM weights. For the complete inference pipeline, use [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) together with a ROCm-backed OpenAI-compatible VLM service.
-
- ## ????
-
- ?? Hugging Face ??????? `PaddleOCR-VL-ROCm` ????????????? `PP-DocLayoutV3-onnx` layout ??????? GitHub ????????????????????????? Paddle2ONNX????????????
-
- ???????[AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
-
- ?????
-
- - ???????
- - ?? Paddle2ONNX ?????
- - GitHub ??????????????
- - ONNXRuntime ?? layout?ROCm/vLLM ? llama.cpp ?? VLM ???
- - ?? 1355 ?????????full-run success ? payload alignment ?? `1355 / 1355`?
+ ---
+ license: apache-2.0
+ tags:
+ - paddleocr-vl
+ - pp-doclayoutv3
+ - onnxruntime
+ - document-layout-analysis
+ - rocm
+ - vllm
+ - llama-cpp
+ pipeline_tag: object-detection
+ library_name: onnxruntime
+ ---
+
+ # PP-DocLayoutV3 ONNX for PaddleOCR-VL-ROCm
+
+ This repository hosts the verified `PP-DocLayoutV3` ONNX layout model used by the open-source project [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm).
+
+ ## 中文说明
+
+ 本仓库提供已经验证过的 `PP-DocLayoutV3-onnx` 模型文件,供 [PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) 直接下载使用。
+
+ 用户不需要再安装 Paddle、Paddle2ONNX,也不需要自己从 Paddle 模型导出 ONNX。克隆开源项目后,只需运行下载脚本即可准备 layout 模型。
+
+ ## Files
+
+ - `inference.onnx`: PP-DocLayoutV3 ONNX layout detection model.
+ - `inference.yml`: model configuration used by the ONNXRuntime pipeline.
+
+ Verified checksums:
+
+ | File | SHA256 |
+ |---|---|
+ | `inference.onnx` | `BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61` |
+ | `inference.yml` | `506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC` |
+
+ ## Open-Source Project
+
+ Recommended runtime project:
+
+ [https://github.com/AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
+
+ `PaddleOCR-VL-ROCm` is a lightweight No-Paddle inference implementation for PaddleOCR-VL-style document parsing:
+
+ - Layout detection runs with ONNXRuntime and this `PP-DocLayoutV3-onnx` model.
+ - Visual language recognition is served by an OpenAI-compatible ROCm endpoint, such as vLLM or llama.cpp server.
+ - The project exposes both CLI and Python APIs.
+ - Outputs are saved as PaddleOCR-VL-style JSON and Markdown.
+ - The code repository is open source and uses the MIT license.
+
+ ## Why This Helps Users
+
+ This model repository removes the most painful setup step for users.
+
+ Before this model repository, users often had to:
+
+ 1. Install Paddle or PaddleX dependencies.
+ 2. Install and configure Paddle2ONNX.
+ 3. Export PP-DocLayoutV3 by themselves.
+ 4. Debug model file names, model config files, and ONNXRuntime input compatibility.
+
+ With this repository, users can directly download the verified ONNX model used by `PaddleOCR-VL-ROCm`:
+
+ ```powershell
+ pip install -e .[download]
+ python scripts/download_ppdoclayoutv3_onnx.py
+ ```
+
+ The script downloads from this Hugging Face repository by default and prepares:
+
+ ```text
+ models/PP-DocLayoutV3-onnx/
+ inference.onnx
+ inference.yml
+ ```
+
+ This gives users a simpler path:
+
+ - No PaddlePaddle runtime is required for inference.
+ - No Paddle2ONNX conversion is required.
+ - No large model files are stored in the GitHub repo.
+ - The same verified model artifact is shared by all users.
+ - The GitHub repo stays small, clean, and easy to clone.
+ - ROCm acceleration can be handled by the VLM server while layout remains portable through ONNXRuntime.
+
+ ## Validation Result
+
+ The ONNXRuntime layout path used by `PaddleOCR-VL-ROCm` has been validated against the Paddle native pipeline on 1355 images.
+
+ | Item | Result |
+ |---|---:|
+ | Full-run success | 1355 / 1355 |
+ | Payload alignment | 1355 / 1355 |
+ | Layout, crop, request order, request payload | Strictly aligned |
+
+ This means the open-source runtime can use this ONNX layout model as a practical replacement for the Paddle layout stage in the validated inference path.
+
+ ## Quick Start With PaddleOCR-VL-ROCm
+
+ ```powershell
+ git clone https://github.com/AIwork4me/PaddleOCR-VL-ROCm.git
+ cd PaddleOCR-VL-ROCm
+ python -m venv .venv
+ .\.venv\Scripts\Activate.ps1
+ pip install -e .[download]
+ python scripts/download_ppdoclayoutv3_onnx.py
+ ```
+
+ Then run inference with your OpenAI-compatible ROCm VLM endpoint:
+
+ ```powershell
+ paddleocr-vl-rocm `
+ --input examples/input/handwrite_ch_demo.png `
+ --output outputs/smoke `
+ --layout-model models/PP-DocLayoutV3-onnx `
+ --server-url http://127.0.0.1:8000/v1 `
+ --api-model-name PaddleOCR-VL-1.5-0.9B `
+ --vlm-backend vllm-server
+ ```
+
+ Expected output files:
+
+ ```text
+ outputs/smoke/handwrite_ch_demo_res.json
+ outputs/smoke/handwrite_ch_demo.md
+ ```
+
+ ## Python API Example
+
+ ```python
+ from paddleocr_vl_rocm import PaddleOCRVLROCm
+
+ pipeline = PaddleOCRVLROCm(
+ layout_model_dir="models/PP-DocLayoutV3-onnx",
+ vlm_server_url="http://127.0.0.1:8000/v1",
+ api_model_name="PaddleOCR-VL-1.5-0.9B",
+ )
+
+ result = pipeline.predict("examples/input/handwrite_ch_demo.png")
+ result.save_to_json("outputs")
+ result.save_to_markdown("outputs", pretty=False)
+ ```
+
+ ## Scope
+
+ This repository only contains the layout model files for the ONNXRuntime stage. It does not include PaddleOCR-VL VLM weights. For the complete inference pipeline, use [AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm) together with a ROCm-backed OpenAI-compatible VLM service.
+
+ ## 中文摘要
+
+ 这个 Hugging Face 仓库的作用是给 `PaddleOCR-VL-ROCm` 提供可直接下载的、已验证的 `PP-DocLayoutV3-onnx` layout 模型。用户克隆 GitHub 项目后,只需要运行下载脚本即可准备模型,不需要安装 Paddle2ONNX,也不需要自己转换模型。
+
+ 开源项目地址:[AIwork4me/PaddleOCR-VL-ROCm](https://github.com/AIwork4me/PaddleOCR-VL-ROCm)
+
+ 主要好处:
+
+ - 降低安装门槛。
+ - 避免 Paddle2ONNX 转换差异。
+ - GitHub 仓库保持轻量,不提交大模型。
+ - ONNXRuntime 负责 layout,ROCm/vLLM 或 llama.cpp 负责 VLM 推理。
+ - 已在 1355 张图片上完成验证,full-run success 和 payload alignment 均为 `1355 / 1355`。
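
The model card's "Verified checksums" table lists SHA256 digests for both files, so a download can be checked locally. The following is a minimal sketch, assuming the files sit in `models/PP-DocLayoutV3-onnx/` as the README describes. The helper names `sha256_of` and `verify_model_dir` are hypothetical and are not part of `PaddleOCR-VL-ROCm`.

```python
import hashlib
from pathlib import Path

# Digests copied from the "Verified checksums" table in the README.
EXPECTED_SHA256 = {
    "inference.onnx": "BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61",
    "inference.yml": "506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large model is never fully held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def verify_model_dir(model_dir, expected=EXPECTED_SHA256):
    """Return {filename: bool}; a missing file counts as a failed check."""
    results = {}
    for name, want in expected.items():
        path = Path(model_dir) / name
        results[name] = path.is_file() and sha256_of(path) == want.upper()
    return results


if __name__ == "__main__":
    # Assumed location, taken from the directory layout shown in the README.
    for name, ok in verify_model_dir("models/PP-DocLayoutV3-onnx").items():
        print(f"{name}: {'OK' if ok else 'MISMATCH or missing'}")
```

The comparison upper-cases both sides because the README prints the digests in upper case, while `hashlib` returns lower-case hex.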