Update README.md

README.md
@@ -68,6 +68,7 @@ To stop the server:
 - [Models](#models)
 - [Data](#data)
 - [Usage](#usage)
+- [Benchmark](#benchmark)
 
 ## Dependencies
 * Docker Desktop 4.25.0 [install link](https://www.docker.com/products/docker-desktop/)

@@ -79,7 +80,7 @@ To stop the server:
 
 ## Models
 
-There are two kinds of models in the project. The default model is a visual model which has been trained by
+There are two kinds of models in the project. The default model is a visual model (specifically, the Vision Grid Transformer, VGT) which has been trained by
 Alibaba Research Group. If you would like to take a look at their original project, you can visit
 [this](https://github.com/AlibabaResearch/AdvancedLiterateMachinery) link. There are various models published by them
 and according to our benchmarks the best performing model is the one trained with the [DocLayNet](https://github.com/DS4SD/DocLayNet)

@@ -161,3 +162,30 @@ excluding "footers" and "footnotes," which are positioned at the end of the output:
 Occasionally, we encounter segments like pictures that might not contain text. Since Poppler cannot assign a reading order to these non-text segments,
 we process them after sorting all segments with content. To determine their reading order, we rely on the reading order of the nearest "non-empty" segment,
 using distance as a criterion.
+
+
+## Benchmark
+
+These are the benchmark results for the VGT model on the PubLayNet dataset:
+
+<table>
+<tr>
+<th>Overall</th>
+<th>Text</th>
+<th>Title</th>
+<th>List</th>
+<th>Table</th>
+<th>Figure</th>
+</tr>
+<tr>
+<td>0.962</td>
+<td>0.950</td>
+<td>0.939</td>
+<td>0.968</td>
+<td>0.981</td>
+<td>0.971</td>
+</tr>
+</table>
+
+You can check this link to see the comparison with other models:
+https://paperswithcode.com/sota/document-layout-analysis-on-publaynet-val