---
license: apache-2.0
tags:
- object-detection
- document-layout-analysis
- historical-documents
- layoutparser
- mmdetection
- co-dino
- vision-transformer
language:
- sv
pipeline_tag: object-detection
---

# Historical Document Layout Detection Model (Co-DETR / DINO)

A fine-tuned Co-DINO (Vision Transformer-based detector via MMDetection) model for detecting layout elements in historical Swedish medical journal pages.

This model is a more advanced successor to the earlier Mask R-CNN-based [cdhu-uu/SweMPer-layout-lite](https://huggingface.co/cdhu-uu/SweMPer-layout-lite), offering improved detection performance and robustness on complex layouts.

This model was developed as part of the research project:  
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**  
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

Project page:  
https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper

## Model Details

- **Model type:** Co-DINO (Vision Transformer backbone)  
- **Framework:** MMDetection  
- **Fine-tuned for:** Historical document layout analysis  
- **Language of source documents:** Swedish  
- **Strengths:** Improved detection precision on complex layouts

## Supported Labels

| Label            |
|------------------|
| Advertisement    |
| Author           |
| Header or Footer |
| Image            |
| List             |
| Page Number      |
| Table            |
| Text             |
| Title            |

## Evaluation Metrics
COCO-style average precision (AP) for this model on the evaluation set:
| AP     | AP50  | AP75  | APs   | APm   | APl   |
|--------|-------|-------|-------|-------|-------|
| 80.7   | 98.4  | 87.4  | 51.5  | 69.6  | 88.2  |

## Usage

### Installation

Installation and fine-tuning instructions are available in the Co-DETR repository:  
https://github.com/Sense-X/Co-DETR?tab=readme-ov-file

### Inference

```python
import cv2
import layoutparser as lp
import matplotlib.pyplot as plt
from mmdet.apis import init_detector, inference_detector

# Configuration
config_file = "co_dino_5scale_vit_large_coco.py"
checkpoint_file = "SweMPer-layout.pth"
score_thr = 0.50
device = "cuda:0"

# Initialize model
model = init_detector(config_file, checkpoint_file, device=device)

# Get class names from model
def get_classes(model):
    m = getattr(model, "module", model)
    classes = getattr(m, "CLASSES", None)
    if classes:
        return list(classes)
    meta = getattr(m, "dataset_meta", None)
    if meta and isinstance(meta, dict) and "classes" in meta:
        return list(meta["classes"])
    return None

classes = get_classes(model)

# Convert MMDet results to LayoutParser layout
def mmdet_to_layout(result, classes, thr=0.50):
    bbox_result = result[0] if isinstance(result, tuple) else result
    blocks = []
    for cls_id, dets in enumerate(bbox_result):
        if dets is None or len(dets) == 0:
            continue
        cls_name = classes[cls_id].lower() if classes else str(cls_id)
        for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
            rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
            blocks.append(
                lp.TextBlock(block=rect, type=cls_name, score=float(score))
            )
    return lp.Layout(blocks)

# Run inference
image_path = "<path_to_image>"
result = inference_detector(model, image_path)
layout = mmdet_to_layout(result, classes, thr=score_thr)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results
image = cv2.imread(image_path)[..., ::-1]  # BGR to RGB
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
```
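For downstream OCR or text extraction, detected blocks usually need to be arranged in reading order. A minimal sketch of a top-to-bottom, left-to-right sort, assuming `(x1, y1, x2, y2, label)` tuples; with layoutparser the same coordinates come from `block.coordinates`, and the `row_tol` grouping parameter is an assumption you would tune per scan resolution:

```python
# Sort boxes into rough reading order: group boxes whose top edges fall in the
# same row_tol-pixel band into one visual row, then order each row left-to-right.
def reading_order(boxes, row_tol=20):
    return sorted(boxes, key=lambda b: (b[1] // row_tol, b[0]))

# Hypothetical detections on one page: (x1, y1, x2, y2, label)
boxes = [
    (300, 12, 420, 40, "page number"),
    (30, 10, 280, 60, "title"),
    (30, 80, 420, 400, "text"),
]

print([b[4] for b in reading_order(boxes)])  # -> ['title', 'page number', 'text']
```

This simple banding works for single-column pages; multi-column historical layouts would need a column split before the row sort.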

## Acknowledgements

This work was carried out within the project:  
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**  
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

We gratefully acknowledge the support of the funder and project collaborators.

This model builds upon the excellent work of:

- [Sense-X/Co-DETR](https://github.com/Sense-X/Co-DETR?tab=readme-ov-file)
- [MMDetection](https://github.com/open-mmlab/mmdetection)  

We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.