Update README.md
README.md
CHANGED
@@ -3,39 +3,47 @@ library_name: transformers
tags: []
---

# DIT-base-layout-detection

The model cmarkea/dit-base-layout-detection extracts layout elements (Text, Picture, Caption, Footnote, etc.) from an image of a document.
It is a fine-tuned version of [dit-base](https://huggingface.co/microsoft/dit-base), trained on the [DocLayNet](https://huggingface.co/datasets/ds4sd/DocLayNet)
dataset. The model can jointly predict masks and bounding boxes for document objects, which makes it well suited to processing document corpora for ingestion into an
open-domain question answering (ODQA) system.

The model detects 11 entity classes: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title.

## Performance

This section assesses the model's performance on semantic segmentation and object detection separately. In both cases, no post-processing was
applied to the predictions.

Semantic segmentation is evaluated with the F1-score of the per-pixel classification. Object detection is evaluated with the
Generalized Intersection over Union (GIoU) and the accuracy of the predicted bounding-box class. The evaluation is conducted on 500 pages from the PDF evaluation
dataset of DocLayNet.
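
To make the segmentation metric concrete, here is a minimal sketch of a per-class, pixel-wise F1-score in plain Python. It is only an illustration: the function name `pixel_f1`, the toy label maps, and the class ids (0 for Background, 1 for Text) are assumptions, and the card does not say how scores were aggregated across the 500 evaluation pages.

```python
def pixel_f1(pred, target, class_id):
    """F1-score of one class over two equally sized label maps (lists of rows)."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == class_id and t == class_id:
                tp += 1  # pixel correctly assigned to the class
            elif p == class_id:
                fp += 1  # pixel wrongly assigned to the class
            elif t == class_id:
                fn += 1  # pixel of the class that was missed
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when the class appears in neither map.
    return 2 * tp / denom if denom else 0.0


# Toy 2x4 label maps with hypothetical ids: 0 = Background, 1 = Text.
target = [[0, 1, 1, 0],
          [0, 1, 1, 0]]
pred = [[0, 1, 1, 1],
        [0, 0, 1, 0]]

f1_text = pixel_f1(pred, target, class_id=1)  # 3 TP, 1 FP, 1 FN -> 0.75
```

Multiplying the result by 100 gives a value on the same scale as the "x100" columns used in this card.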

| Class          | F1-score (x100) | GIoU (x100) | Accuracy (x100) |
|:--------------:|:---------------:|:-----------:|:---------------:|
| Background     | 94.98           | NA          | NA              |
| Caption        | 75.54           | 55.61       | 72.62           |
| Footnote       | 72.29           | 50.08       | 70.97           |
| Formula        | 82.29           | 49.91       | 94.48           |
| List-item      | 67.56           | 35.19       | 69              |
| Page-footer    | 83.93           | 57.99       | 94.06           |
| Page-header    | 62.33           | 65.25       | 79.39           |
| Picture        | 78.32           | 58.22       | 92.71           |
| Section-header | 69.55           | 56.64       | 78.29           |
| Table          | 83.69           | 63.03       | 90.13           |
| Text           | 90.94           | 51.89       | 88.09           |
| Title          | 61.19           | 52.64       | 70              |
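
For readers unfamiliar with the detection metrics, the sketch below computes the GIoU of two axis-aligned boxes and a simple class accuracy in plain Python. It rests on assumptions: boxes are taken to be `(x_min, y_min, x_max, y_max)` tuples, the helper names `giou` and `class_accuracy` are hypothetical, and the card does not describe how predicted boxes were matched to reference boxes, so the accuracy helper expects pairs that are already matched.

```python
def giou(box_a, box_b):
    """Generalized IoU of two (x_min, y_min, x_max, y_max) boxes, in [-1, 1]."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    # Intersection rectangle; width or height is 0 when the boxes do not overlap.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax1, bx1) - min(ax0, bx0)) * (max(ay1, by1) - min(ay0, by0))
    if union <= 0 or hull <= 0:
        return 0.0  # degenerate boxes; convention chosen for this sketch
    return inter / union - (hull - union) / hull


def class_accuracy(pred_labels, true_labels):
    """Share of already matched box pairs whose predicted class is correct."""
    if not true_labels:
        return 0.0
    hits = sum(p == t for p, t in zip(pred_labels, true_labels))
    return hits / len(true_labels)


overlapping = giou((0, 0, 2, 2), (1, 1, 3, 3))  # 1/7 - 2/9 = -5/63
disjoint = giou((0, 0, 1, 1), (2, 0, 3, 1))     # 0 - 1/3
acc = class_accuracy(["Text", "Table", "Title"], ["Text", "Table", "Text"])
```

Unlike plain IoU, GIoU can be negative: boxes that do not overlap are still penalized according to how far apart they sit inside their enclosing box.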

## Benchmark

The table below compares this model with another layout detection model on the same three metrics.

| Model                                                                                 | F1-score (x100) | GIoU (x100) | Accuracy (x100) |
|:-------------------------------------------------------------------------------------:|:---------------:|:-----------:|:---------------:|
| cmarkea/dit-base-layout-detection                                                     | 90.77           | 56.29       | 85.26           |
| [cmarkea/detr-layout-detection](https://huggingface.co/cmarkea/detr-layout-detection) | 84.23           | 43.84       | 71.98           |

### Direct Use