GradientDescentMe committed on
Commit
abac692
·
verified ·
1 Parent(s): fc08c11

Initial open-source release

Files changed (4)
  1. .gitattributes +0 -2
  2. README.md +64 -9
  3. README_zh.md +64 -9
  4. layout模型benchmark.xlsx +0 -0
.gitattributes CHANGED
@@ -5,5 +5,3 @@
5
  *.safetensors filter=lfs diff=lfs merge=lfs -text
6
  *.onnx filter=lfs diff=lfs merge=lfs -text
7
  *.xlsx filter=lfs diff=lfs merge=lfs -text
8
- python=3.12/lib/python3.12/site-packages/hf_xet/hf_xet.abi3.so filter=lfs diff=lfs merge=lfs -text
9
- python=3.12/lib/python3.12/site-packages/yaml/_yaml.cpython-312-darwin.so filter=lfs diff=lfs merge=lfs -text
 
5
  *.safetensors filter=lfs diff=lfs merge=lfs -text
6
  *.onnx filter=lfs diff=lfs merge=lfs -text
7
  *.xlsx filter=lfs diff=lfs merge=lfs -text
 
 
README.md CHANGED
@@ -26,16 +26,11 @@ English | [简体中文](README_zh.md)
26
 
27
  Hiro-Layout is a document layout analysis model for patent and technical PDF pages. It detects and classifies page regions such as text, titles, headers, footers, tables, formulas, chemical structures, figures, captions, search reports, bibliographies, and other patent-specific layout elements.
28
 
29
- This repository is prepared for open release in the same style as PatSnap open model cards such as [Hiro-MOSS-OCR-0.3B](https://huggingface.co/PatSnap/Hiro-MOSS-OCR-0.3B) and [TranslationGPT-1.2](https://huggingface.co/PatSnap/TranslationGPT-1.2).
30
-
31
- > Release note: model weights, inference code, exact architecture details, and dataset release permissions should be confirmed before publishing.
32
-
33
  ## Highlights
34
 
35
  - Patent-focused layout understanding: covers common patent PDF regions and patent-specific structures.
36
  - Technical document coverage: evaluated on both patent PDFs and NPD PDFs.
37
  - Fine-grained taxonomy: 25 layout categories across figure, text, and complex document elements.
38
- - Open evaluation summary: benchmark results are included in `layout模型benchmark.xlsx` and summarized in [EVALUATION.md](EVALUATION.md).
39
 
40
  ## Model Overview
41
 
@@ -82,14 +77,74 @@ This repository is prepared for open release in the same style as PatSnap open m
82
 
83
  ## Benchmarks
84
 
85
- The benchmark workbook contains two sheets: `Patent PDF` and `NPD PDF`. The `ALL` rows from the workbook are shown below.
86
 
87
  | Benchmark | Labels | Precision | Recall | F1 |
88
  | --- | ---: | ---: | ---: | ---: |
89
  | Patent PDF | 33,054 | 0.8144 | 0.7711 | 0.7922 |
90
  | NPD PDF | 17,769 | 0.7090 | 0.6983 | 0.7036 |
91
 
92
- See [EVALUATION.md](EVALUATION.md) for the per-class results.
93
 
94
  ## Usage
95
 
@@ -99,7 +154,7 @@ The current model artifact is an ONNX export:
99
  layout_model/RT-DETR_25.onnx
100
  ```
101
 
102
- The exact preprocessing and postprocessing code should be published with the model. A minimal runtime will typically use ONNXRuntime:
103
 
104
  ```python
105
  import onnxruntime as ort
@@ -109,7 +164,7 @@ print("inputs:", [i.name for i in session.get_inputs()])
109
  print("outputs:", [o.name for o in session.get_outputs()])
110
  ```
111
 
112
- Before publishing, replace this section with the final image preprocessing, class-id mapping, confidence thresholding, non-maximum suppression or decoder logic, and bounding box format.
113
 
114
  ## Repository Files
115
 
 
26
 
27
  Hiro-Layout is a document layout analysis model for patent and technical PDF pages. It detects and classifies page regions such as text, titles, headers, footers, tables, formulas, chemical structures, figures, captions, search reports, bibliographies, and other patent-specific layout elements.
28
 
 
 
 
 
29
  ## Highlights
30
 
31
  - Patent-focused layout understanding: covers common patent PDF regions and patent-specific structures.
32
  - Technical document coverage: evaluated on both patent PDFs and NPD PDFs.
33
  - Fine-grained taxonomy: 25 layout categories across figure, text, and complex document elements.
 
34
 
35
  ## Model Overview
36
 
 
77
 
78
  ## Benchmarks
79
 
80
+ Metrics are reported as Precision, Recall, and F1.
81
 
82
  | Benchmark | Labels | Precision | Recall | F1 |
83
  | --- | ---: | ---: | ---: | ---: |
84
  | Patent PDF | 33,054 | 0.8144 | 0.7711 | 0.7922 |
85
  | NPD PDF | 17,769 | 0.7090 | 0.6983 | 0.7036 |
86
 
87
+ ### Patent PDF
88
+
89
+ | # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
90
+ |---:|---|---|---|---|---:|---:|---:|---:|
91
+ | 1 | figure | graph | graph | 图表 | 215 | 0.7611 | 0.8000 | 0.7800 |
92
+ | 2 | figure | draw | drawing | 绘制图 | 420 | 0.8649 | 0.3048 | 0.4507 |
93
+ | 3 | figure | struc | structure diagram | 结构图 | 626 | 0.6579 | 0.8355 | 0.7361 |
94
+ | 4 | figure | photo | photograph | 照片 | 147 | 0.8378 | 0.8435 | 0.8407 |
95
+ | 5 | figure | tab | table | 表格 | 198 | 0.7759 | 0.9091 | 0.8372 |
96
+ | 6 | figure | eqn | math equation | 数学公式 | 399 | 0.7762 | 0.6692 | 0.7187 |
97
+ | 7 | figure | chem | chemical formula | 化学式 | 1,099 | 0.8792 | 0.8944 | 0.8868 |
98
+ | 8 | figure | noise | noise | 噪声 | 1,241 | 0.7025 | 0.7687 | 0.7341 |
99
+ | 9 | text | text | text | 文本 | 17,668 | 0.8182 | 0.8062 | 0.8122 |
100
+ | 10 | text | title | title | 标题 | 601 | 0.9117 | 0.8070 | 0.8561 |
101
+ | 11 | text | sec | section title | 章节标题 | 1,394 | 0.7968 | 0.7088 | 0.7502 |
102
+ | 12 | text | head | page header | 页眉 | 3,074 | 0.8187 | 0.7788 | 0.7983 |
103
+ | 13 | text | foot | page footer | 页脚 | 1,012 | 0.7432 | 0.6433 | 0.6896 |
104
+ | 14 | text | mnote | marginal note | 边注 | 421 | 0.7794 | 0.5202 | 0.6239 |
105
+ | 15 | text | cap | caption | 说明 | 80 | 0.6842 | 0.4875 | 0.5693 |
106
+ | 16 | text | figno | figure number | 编号 | 1,389 | 0.8955 | 0.7466 | 0.8143 |
107
+ | 17 | text | lineno | line number | 行号 | 341 | 0.7759 | 0.6598 | 0.7132 |
108
+ | 18 | text | colno | column number | 栏号 | 449 | 0.6964 | 0.4699 | 0.5612 |
109
+ | 19 | text | seq | sequence | 序列表 | 136 | 0.4430 | 0.2574 | 0.3256 |
110
+ | 20 | complex | figcx | figure complex | 图片组 | 1,416 | 0.8657 | 0.7373 | 0.7963 |
111
+ | 21 | complex | rxn | chemical reaction | 反应式 | 150 | 0.8898 | 0.7000 | 0.7836 |
112
+ | 22 | complex | bib | bibliography | 著录页 | 470 | 0.9615 | 0.7979 | 0.8721 |
113
+ | 23 | complex | srep | search report | 搜索报告 | 106 | 0.9052 | 0.9906 | 0.9459 |
114
+ | 24 | complex | toc | Table of Contents | 目录 | 0 | 0.0000 | 0.0000 | 0.0000 |
115
+ | 25 | complex | ref | reference | 参考文献 | 2 | 0.0000 | 0.0000 | 0.0000 |
116
+ | ALL | | | | | 33,054 | 0.8144 | 0.7711 | 0.7922 |
117
+
118
+ ### NPD PDF
119
+
120
+ | # | Group | Abbr. | Class | Chinese | Labels | Precision | Recall | F1 |
121
+ |---:|---|---|---|---|---:|---:|---:|---:|
122
+ | 1 | figure | graph | graph | 图表 | 248 | 0.6838 | 0.6976 | 0.6906 |
123
+ | 2 | figure | draw | drawing | 绘制图 | 9 | 0.0000 | 0.0000 | 0.0000 |
124
+ | 3 | figure | struc | structure diagram | 结构图 | 341 | 0.7454 | 0.7126 | 0.7286 |
125
+ | 4 | figure | photo | photograph | 照片 | 82 | 0.6071 | 0.6220 | 0.6145 |
126
+ | 5 | figure | tab | table | 表格 | 209 | 0.7533 | 0.8182 | 0.7844 |
127
+ | 6 | figure | eqn | math equation | 数学公式 | 298 | 0.6789 | 0.5604 | 0.6140 |
128
+ | 7 | figure | chem | chemical formula | 化学式 | 388 | 0.7324 | 0.8325 | 0.7793 |
129
+ | 8 | figure | noise | noise | 噪声 | 695 | 0.4823 | 0.4302 | 0.4548 |
130
+ | 9 | text | text | text | 文本 | 9,119 | 0.6943 | 0.7625 | 0.7268 |
131
+ | 10 | text | title | title | 标题 | 304 | 0.7130 | 0.5395 | 0.6142 |
132
+ | 11 | text | sec | section title | 章节标题 | 1,539 | 0.7337 | 0.6160 | 0.6697 |
133
+ | 12 | text | head | page header | 页眉 | 1,246 | 0.7464 | 0.7111 | 0.7283 |
134
+ | 13 | text | foot | page footer | 页脚 | 1,339 | 0.7711 | 0.6468 | 0.7035 |
135
+ | 14 | text | mnote | marginal note | 边注 | 190 | 0.5714 | 0.2947 | 0.3889 |
136
+ | 15 | text | cap | caption | 说明 | 573 | 0.8711 | 0.5899 | 0.7034 |
137
+ | 16 | text | figno | figure number | 编号 | 149 | 0.6078 | 0.4161 | 0.4940 |
138
+ | 17 | text | lineno | line number | 行号 | 41 | 0.6667 | 0.9268 | 0.7755 |
139
+ | 18 | text | colno | column number | 栏号 | 0 | 0.0000 | 0.0000 | 0.0000 |
140
+ | 19 | text | seq | sequence | 序列表 | 18 | 0.7000 | 0.3889 | 0.5000 |
141
+ | 20 | complex | figcx | figure complex | 图片组 | 734 | 0.7657 | 0.7480 | 0.7567 |
142
+ | 21 | complex | rxn | chemical reaction | 反应式 | 36 | 0.8947 | 0.4722 | 0.6182 |
143
+ | 22 | complex | bib | bibliography | 著录页 | 0 | 0.0000 | 0.0000 | 0.0000 |
144
+ | 23 | complex | srep | search report | 搜索报告 | 3 | 0.4286 | 1.0000 | 0.6000 |
145
+ | 24 | complex | toc | Table of Contents | 目录 | 76 | 0.8475 | 0.6579 | 0.7407 |
146
+ | 25 | complex | ref | reference | 参考文献 | 132 | 0.8148 | 0.3333 | 0.4731 |
147
+ | ALL | | | | | 17,769 | 0.7090 | 0.6983 | 0.7036 |
148
 
149
  ## Usage
150
 
 
154
  layout_model/RT-DETR_25.onnx
155
  ```
156
 
157
+ The model can be loaded with ONNXRuntime:
158
 
159
  ```python
160
  import onnxruntime as ort
 
164
  print("outputs:", [o.name for o in session.get_outputs()])
165
  ```
166
 
167
+ Use `labels.json` for the 25-class label mapping.
168
 
169
  ## Repository Files
170
 
README_zh.md CHANGED
@@ -26,16 +26,11 @@ library_name: transformers
26
 
27
  Hiro-Layout 是一个面向专利和技术 PDF 页面图像的文档版面分析模型,用于检测并分类页面区域,包括正文、标题、页眉、页脚、表格、公式、化学式、图片、图注、搜索报告、著录页、参考文献等专利场景常见版面元素。
28
 
29
- 本仓库按 PatSnap 已发布开源模型卡的结构准备,例如 [Hiro-MOSS-OCR-0.3B](https://huggingface.co/PatSnap/Hiro-MOSS-OCR-0.3B) 和 [TranslationGPT-1.2](https://huggingface.co/PatSnap/TranslationGPT-1.2)。
30
-
31
- > 发布前请确认:模型权重、推理代码、准确架构信息、数据集和评测结果是否满足公开发布要求。
32
-
33
  ## 亮点
34
 
35
  - 面向专利文档:覆盖专利 PDF 中常见的正文、图片、表格、公式、著录页、搜索报告等元素。
36
  - 覆盖技术文档:在 Patent PDF 和 NPD PDF 两类数据上评测。
37
  - 细粒度类别体系:共 25 个版面类别,覆盖 figure、text、complex 三组元素。
38
- - 评测结果可追溯:原始评测数据保存在 `layout模型benchmark.xlsx`,详细结果见 [EVALUATION.md](EVALUATION.md)。
39
 
40
  ## 模型概览
41
 
@@ -82,14 +77,74 @@ Hiro-Layout 是一个面向专利和技术 PDF 页面图像的文档版面分析
82
 
83
  ## 评测结果
84
 
85
- 评测 Excel 包含 `Patent PDF` 和 `NPD PDF` 两张表,下表摘录 Excel 中的 `ALL` 汇总行。
86
 
87
  | 数据集 | 人工标签数 | Precision | Recall | F1 |
88
  | --- | ---: | ---: | ---: | ---: |
89
  | Patent PDF | 33,054 | 0.8144 | 0.7711 | 0.7922 |
90
  | NPD PDF | 17,769 | 0.7090 | 0.6983 | 0.7036 |
91
 
92
- 各类别指标见 [EVALUATION.md](EVALUATION.md)。
93
 
94
  ## 使用方式
95
 
@@ -99,7 +154,7 @@ Hiro-Layout 是一个面向专利和技术 PDF 页面图像的文档版面分析
99
  layout_model/RT-DETR_25.onnx
100
  ```
101
 
102
- 正式开源时建议同步发布图像预处理和后处理代码。最小运行环境通常可使用 ONNXRuntime 加载模型:
103
 
104
  ```python
105
  import onnxruntime as ort
@@ -109,7 +164,7 @@ print("inputs:", [i.name for i in session.get_inputs()])
109
  print("outputs:", [o.name for o in session.get_outputs()])
110
  ```
111
 
112
- 发布前请补充最终的图像预处理、类别 ID 映射、置信度阈值、NMS 或 decoder 逻辑,以及 bbox 坐标格式。
113
 
114
  ## 文件说明
115
 
 
26
 
27
  Hiro-Layout 是一个面向专利和技术 PDF 页面图像的文档版面分析模型,用于检测并分类页面区域,包括正文、标题、页眉、页脚、表格、公式、化学式、图片、图注、搜索报告、著录页、参考文献等专利场景常见版面元素。
28
 
 
 
 
 
29
  ## 亮点
30
 
31
  - 面向专利文档:覆盖专利 PDF 中常见的正文、图片、表格、公式、著录页、搜索报告等元素。
32
  - 覆盖技术文档:在 Patent PDF 和 NPD PDF 两类数据上评测。
33
  - 细粒度类别体系:共 25 个版面类别,覆盖 figure、text、complex 三组元素。
 
34
 
35
  ## 模型概览
36
 
 
77
 
78
  ## 评测结果
79
 
80
+ 评测指标为 Precision、Recall 和 F1。
81
 
82
  | 数据集 | 人工标签数 | Precision | Recall | F1 |
83
  | --- | ---: | ---: | ---: | ---: |
84
  | Patent PDF | 33,054 | 0.8144 | 0.7711 | 0.7922 |
85
  | NPD PDF | 17,769 | 0.7090 | 0.6983 | 0.7036 |
86
 
87
+ ### Patent PDF
88
+
89
+ | # | 大类 | 缩写 | 类别全称 | 中文名 | 人工标签数 | Precision | Recall | F1 |
90
+ |---:|---|---|---|---|---:|---:|---:|---:|
91
+ | 1 | figure | graph | graph | 图表 | 215 | 0.7611 | 0.8000 | 0.7800 |
92
+ | 2 | figure | draw | drawing | 绘制图 | 420 | 0.8649 | 0.3048 | 0.4507 |
93
+ | 3 | figure | struc | structure diagram | 结构图 | 626 | 0.6579 | 0.8355 | 0.7361 |
94
+ | 4 | figure | photo | photograph | 照片 | 147 | 0.8378 | 0.8435 | 0.8407 |
95
+ | 5 | figure | tab | table | 表格 | 198 | 0.7759 | 0.9091 | 0.8372 |
96
+ | 6 | figure | eqn | math equation | 数学公式 | 399 | 0.7762 | 0.6692 | 0.7187 |
97
+ | 7 | figure | chem | chemical formula | 化学式 | 1,099 | 0.8792 | 0.8944 | 0.8868 |
98
+ | 8 | figure | noise | noise | 噪声 | 1,241 | 0.7025 | 0.7687 | 0.7341 |
99
+ | 9 | text | text | text | 文本 | 17,668 | 0.8182 | 0.8062 | 0.8122 |
100
+ | 10 | text | title | title | 标题 | 601 | 0.9117 | 0.8070 | 0.8561 |
101
+ | 11 | text | sec | section title | 章节标题 | 1,394 | 0.7968 | 0.7088 | 0.7502 |
102
+ | 12 | text | head | page header | 页眉 | 3,074 | 0.8187 | 0.7788 | 0.7983 |
103
+ | 13 | text | foot | page footer | 页脚 | 1,012 | 0.7432 | 0.6433 | 0.6896 |
104
+ | 14 | text | mnote | marginal note | 边注 | 421 | 0.7794 | 0.5202 | 0.6239 |
105
+ | 15 | text | cap | caption | 说明 | 80 | 0.6842 | 0.4875 | 0.5693 |
106
+ | 16 | text | figno | figure number | 编号 | 1,389 | 0.8955 | 0.7466 | 0.8143 |
107
+ | 17 | text | lineno | line number | 行号 | 341 | 0.7759 | 0.6598 | 0.7132 |
108
+ | 18 | text | colno | column number | 栏号 | 449 | 0.6964 | 0.4699 | 0.5612 |
109
+ | 19 | text | seq | sequence | 序列表 | 136 | 0.4430 | 0.2574 | 0.3256 |
110
+ | 20 | complex | figcx | figure complex | 图片组 | 1,416 | 0.8657 | 0.7373 | 0.7963 |
111
+ | 21 | complex | rxn | chemical reaction | 反应式 | 150 | 0.8898 | 0.7000 | 0.7836 |
112
+ | 22 | complex | bib | bibliography | 著录页 | 470 | 0.9615 | 0.7979 | 0.8721 |
113
+ | 23 | complex | srep | search report | 搜索报告 | 106 | 0.9052 | 0.9906 | 0.9459 |
114
+ | 24 | complex | toc | Table of Contents | 目录 | 0 | 0.0000 | 0.0000 | 0.0000 |
115
+ | 25 | complex | ref | reference | 参考文献 | 2 | 0.0000 | 0.0000 | 0.0000 |
116
+ | ALL | | | | | 33,054 | 0.8144 | 0.7711 | 0.7922 |
117
+
118
+ ### NPD PDF
119
+
120
+ | # | 大类 | 缩写 | 类别全称 | 中文名 | 人工标签数 | Precision | Recall | F1 |
121
+ |---:|---|---|---|---|---:|---:|---:|---:|
122
+ | 1 | figure | graph | graph | 图表 | 248 | 0.6838 | 0.6976 | 0.6906 |
123
+ | 2 | figure | draw | drawing | 绘制图 | 9 | 0.0000 | 0.0000 | 0.0000 |
124
+ | 3 | figure | struc | structure diagram | 结构图 | 341 | 0.7454 | 0.7126 | 0.7286 |
125
+ | 4 | figure | photo | photograph | 照片 | 82 | 0.6071 | 0.6220 | 0.6145 |
126
+ | 5 | figure | tab | table | 表格 | 209 | 0.7533 | 0.8182 | 0.7844 |
127
+ | 6 | figure | eqn | math equation | 数学公式 | 298 | 0.6789 | 0.5604 | 0.6140 |
128
+ | 7 | figure | chem | chemical formula | 化学式 | 388 | 0.7324 | 0.8325 | 0.7793 |
129
+ | 8 | figure | noise | noise | 噪声 | 695 | 0.4823 | 0.4302 | 0.4548 |
130
+ | 9 | text | text | text | 文本 | 9,119 | 0.6943 | 0.7625 | 0.7268 |
131
+ | 10 | text | title | title | 标题 | 304 | 0.7130 | 0.5395 | 0.6142 |
132
+ | 11 | text | sec | section title | 章节标题 | 1,539 | 0.7337 | 0.6160 | 0.6697 |
133
+ | 12 | text | head | page header | 页眉 | 1,246 | 0.7464 | 0.7111 | 0.7283 |
134
+ | 13 | text | foot | page footer | 页脚 | 1,339 | 0.7711 | 0.6468 | 0.7035 |
135
+ | 14 | text | mnote | marginal note | 边注 | 190 | 0.5714 | 0.2947 | 0.3889 |
136
+ | 15 | text | cap | caption | 说明 | 573 | 0.8711 | 0.5899 | 0.7034 |
137
+ | 16 | text | figno | figure number | 编号 | 149 | 0.6078 | 0.4161 | 0.4940 |
138
+ | 17 | text | lineno | line number | 行号 | 41 | 0.6667 | 0.9268 | 0.7755 |
139
+ | 18 | text | colno | column number | 栏号 | 0 | 0.0000 | 0.0000 | 0.0000 |
140
+ | 19 | text | seq | sequence | 序列表 | 18 | 0.7000 | 0.3889 | 0.5000 |
141
+ | 20 | complex | figcx | figure complex | 图片组 | 734 | 0.7657 | 0.7480 | 0.7567 |
142
+ | 21 | complex | rxn | chemical reaction | 反应式 | 36 | 0.8947 | 0.4722 | 0.6182 |
143
+ | 22 | complex | bib | bibliography | 著录页 | 0 | 0.0000 | 0.0000 | 0.0000 |
144
+ | 23 | complex | srep | search report | 搜索报告 | 3 | 0.4286 | 1.0000 | 0.6000 |
145
+ | 24 | complex | toc | Table of Contents | 目录 | 76 | 0.8475 | 0.6579 | 0.7407 |
146
+ | 25 | complex | ref | reference | 参考文献 | 132 | 0.8148 | 0.3333 | 0.4731 |
147
+ | ALL | | | | | 17,769 | 0.7090 | 0.6983 | 0.7036 |
148
 
149
  ## 使用方式
150
 
 
154
  layout_model/RT-DETR_25.onnx
155
  ```
156
 
157
+ 模型可使用 ONNXRuntime 加载:
158
 
159
  ```python
160
  import onnxruntime as ort
 
164
  print("outputs:", [o.name for o in session.get_outputs()])
165
  ```
166
 
167
+ 25 类标签映射见 `labels.json`。
168
 
169
  ## 文件说明
170
 
layout模型benchmark.xlsx CHANGED
Binary files "a/layout\346\250\241\345\236\213benchmark.xlsx" and "b/layout\346\250\241\345\236\213benchmark.xlsx" differ
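
The benchmark tables added in this commit report Precision, Recall, and F1 without stating how F1 is derived. The sketch below assumes the standard definition (harmonic mean of precision and recall) and recomputes F1 for the two `ALL` rows as a quick consistency check. The helper name `f1_score` is illustrative and is not part of the repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the ALL rows of the README tables.
reported = {
    "Patent PDF": (0.8144, 0.7711, 0.7922),
    "NPD PDF": (0.7090, 0.6983, 0.7036),
}

for name, (p, r, f1) in reported.items():
    # Round to four decimals to match the precision used in the tables.
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported F1 = {f1:.4f}")
```

Rows with zero precision and recall (for example classes with no labels in a benchmark) fall into the guarded branch and yield 0.0, which matches how those rows appear in the tables.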