---
license: apache-2.0
tags:
- object-detection
- document-layout-analysis
- historical-documents
- layoutparser
- detectron2
- mask-rcnn
language:
- sv
pipeline_tag: object-detection
base_model:
- layoutparser/detectron2
---

# Historical Document Layout Detection Model

A fine-tuned Mask R-CNN model (via LayoutParser/Detectron2) for detecting layout elements in historical Swedish medical journal pages. This is the lighter variant; the larger, more complex model is available as [SweMPer-layout](https://huggingface.co/cdhu-uu/SweMPer-layout).

This model was developed as part of the research project:  
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**  
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

## Project page
https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper

## Model Details

- **Model type:** Mask R-CNN (ResNet backbone)  
- **Framework:** Detectron2 / LayoutParser  
- **Fine-tuned for:** Historical document layout analysis  
- **Language of source documents:** Swedish  

## Label Map

| ID | Label            |
|----|------------------|
| 0  | Advertisement    |
| 1  | Author           |
| 2  | Header or Footer |
| 3  | Image            |
| 4  | List             |
| 5  | Page Number      |
| 6  | Table            |
| 7  | Text             |
| 8  | Title            |
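
The Inference section uses lowercase identifiers (for example `page_no` and `header_or_footer`) rather than the display labels in the table above. The following is a minimal sketch of how the two naming schemes could be reconciled. The names `ID_TO_KEY`, `KEY_TO_DISPLAY`, and `display_label` are hypothetical helpers introduced for illustration and are not part of the model or of LayoutParser.

```python
# Class IDs mapped to the lowercase keys used in the inference snippet.
ID_TO_KEY = {
    0: "advertisement",
    1: "author",
    2: "header_or_footer",
    3: "image",
    4: "list",
    5: "page_no",
    6: "table",
    7: "text",
    8: "title",
}

# Lowercase keys mapped to the display labels from the Label Map table.
KEY_TO_DISPLAY = {
    "advertisement": "Advertisement",
    "author": "Author",
    "header_or_footer": "Header or Footer",
    "image": "Image",
    "list": "List",
    "page_no": "Page Number",
    "table": "Table",
    "text": "Text",
    "title": "Title",
}


def display_label(class_id: int) -> str:
    """Return the human-readable label for a class ID (hypothetical helper)."""
    return KEY_TO_DISPLAY[ID_TO_KEY[class_id]]
```

Keeping both mappings in one place makes it harder for the table and the code to drift apart if a class is added later.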

## Evaluation Metrics

The model achieves the following COCO-style average precision (AP) scores:

|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 64.325 | 88.948 | 69.214 | 40.350 | 55.117 | 67.543 |
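
To help read these columns: in the COCO convention, AP50 and AP75 count a detection as correct when its intersection over union (IoU) with a ground-truth region is at least 0.50 or 0.75 respectively, AP averages over IoU thresholds from 0.50 to 0.95, and APs/APm/APl restrict the evaluation to small, medium, and large objects. The sketch below illustrates the IoU computation and the size buckets for boxes. It is an illustration only, not the evaluation code used for this model; `box_iou` and `coco_size_bucket` are hypothetical helper names, and the bucket function simplifies the boundary handling of the official tooling.

```python
from typing import Tuple

# A box is (x1, y1, x2, y2) in pixels, with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def coco_size_bucket(area: float) -> str:
    """Approximate COCO size bucket from an area in square pixels."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


ground_truth = (0.0, 0.0, 100.0, 100.0)
prediction = (0.0, 0.0, 100.0, 60.0)
iou = box_iou(ground_truth, prediction)
matches_at_50 = iou >= 0.50  # counted as a hit for AP50
matches_at_75 = iou >= 0.75  # counted as a hit for AP75
```

In this example the prediction covers 60% of the ground-truth box, so it would count as a match at the 0.50 threshold but not at 0.75, which is why AP75 is typically lower than AP50.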

## Usage

### Installation

Follow instructions at:  
https://detectron2.readthedocs.io/en/latest/tutorials/install.html

### Finetuning

Follow instructions at:  
https://detectron2.readthedocs.io/en/latest/tutorials/training.html

### Inference

```python
import cv2
import layoutparser as lp
import matplotlib.pyplot as plt

# Configuration
model_config_path = "config_mask_rcnn_resized.yaml"
model_path = "SweMPer-layout-lite.pth"

label_map = {
    0: "advertisement",
    1: "author",
    2: "header_or_footer",
    3: "image",
    4: "list",
    5: "page_no",
    6: "table",
    7: "text",
    8: "title",
}

# Load model
model = lp.models.Detectron2LayoutModel(
    config_path=model_config_path,
    model_path=model_path,
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map=label_map,
)

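# Load and process image
image = cv2.imread("<path_to_image>")
if image is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError("Could not read the image; check the path")
image = image[..., ::-1]  # BGR to RGB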

# Detect layout
layout = model.detect(image)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
```
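
After detection, a common next step is to keep only certain element types and put them in a rough top-to-bottom order. The sketch below shows one way this could be done in plain Python. It assumes each block exposes `type`, `score`, and `coordinates` as `(x1, y1, x2, y2)`, as printed in the snippet above; `Detection` is a hypothetical stand-in so the example runs without a model, and `select_blocks` is an illustrative helper, not a LayoutParser function.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class Detection:
    """Hypothetical stand-in for a detected layout block."""
    type: str
    score: float
    coordinates: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def select_blocks(
    blocks: Iterable[Detection],
    wanted_types: Set[str],
    min_score: float = 0.8,
) -> List[Detection]:
    """Keep blocks of the wanted types above min_score, sorted top-to-bottom."""
    kept = [b for b in blocks if b.type in wanted_types and b.score >= min_score]
    # Sort by top edge, then left edge. This is a naive reading order and
    # will not handle multi-column pages correctly.
    return sorted(kept, key=lambda b: (b.coordinates[1], b.coordinates[0]))


detections = [
    Detection("text", 0.95, (50.0, 400.0, 500.0, 700.0)),
    Detection("title", 0.91, (50.0, 100.0, 500.0, 150.0)),
    Detection("advertisement", 0.99, (50.0, 800.0, 500.0, 900.0)),
    Detection("text", 0.60, (50.0, 200.0, 500.0, 350.0)),
]

reading_blocks = select_blocks(detections, {"title", "text"})
for block in reading_blocks:
    print(block.type, block.score, block.coordinates)
```

The same function should work on the `layout` returned by `model.detect(image)` if its blocks expose these three attributes. Historical journal pages are often set in columns, so a column-aware ordering would likely be needed in practice.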

## Acknowledgements

This work was carried out within the project:  
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**  
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

We gratefully acknowledge the support of the funder and project collaborators.

This model builds upon the excellent work of:

- [Detectron2](https://github.com/facebookresearch/detectron2/tree/main)  
- [LayoutParser](https://github.com/Layout-Parser/layout-parser?tab=readme-ov-file)  

We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.