---
license: apache-2.0
tags:
- object-detection
- document-layout-analysis
- historical-documents
- layoutparser
- mmdetection
- co-dino
- vision-transformer
language:
- sv
pipeline_tag: object-detection
---
# Historical Document Layout Detection Model (Co-DETR / DINO)
A Co-DINO model (a Vision Transformer-based detector implemented in MMDetection) fine-tuned to detect layout elements on pages of historical Swedish medical journals.
This model is a more advanced successor to the earlier Mask R-CNN-based model [cdhu-uu/SweMPer-layout-lite](https://huggingface.co/cdhu-uu/SweMPer-layout-lite), offering improved detection performance and robustness on complex layouts.
This model was developed as part of the research project:
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.
Project page:
https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper
## Model Details
- **Model type:** Co-DINO (Vision Transformer backbone)
- **Framework:** MMDetection
- **Fine-tuned for:** Historical document layout analysis
- **Language of source documents:** Swedish
- **Strengths:** Improved detection precision on complex layouts
## Supported Labels
| Label |
|------------------|
| Advertisement |
| Author |
| Header or Footer |
| Image |
| List |
| Page Number |
| Table |
| Text |
| Title |
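In the inference code, each detection's `block.type` is the checkpoint's class name in lower case. Assuming the checkpoint stores the names exactly as listed above, the types are therefore strings such as `header or footer` and `page number`. Below is a minimal sketch of a fixed colour map keyed on those strings, which can be passed to `lp.draw_box(..., color_map=COLOR_MAP)` so that each label keeps the same colour across pages. The `LABELS`, `PALETTE` and `COLOR_MAP` names and the colour choices are illustrative, not part of the model.

```python
# Labels from the table above. Assumption: the checkpoint stores its class
# names exactly as written there (check with get_classes(model)).
LABELS = [
    "Advertisement", "Author", "Header or Footer", "Image", "List",
    "Page Number", "Table", "Text", "Title",
]

# mmdet_to_layout() lowercases class names, so block.type uses these strings.
BLOCK_TYPES = [label.lower() for label in LABELS]

# Illustrative colours; any colour name or hex string PIL accepts will do.
PALETTE = ["red", "blue", "green", "orange", "purple",
           "brown", "magenta", "teal", "olive"]
COLOR_MAP = dict(zip(BLOCK_TYPES, PALETTE))

if __name__ == "__main__":
    print(COLOR_MAP["header or footer"])  # → green
```

Usage: `lp.draw_box(image, layout, box_width=3, show_element_type=True, color_map=COLOR_MAP)`.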
## Evaluation Metrics
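The table reports COCO-style bounding-box average precision (AP, in %). AP is averaged over IoU thresholds from 0.50 to 0.95, AP50 and AP75 are measured at IoU 0.50 and 0.75, and APs, APm and APl cover small, medium and large objects respectively.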
| AP | AP50 | AP75 | APs | APm | APl |
|--------|-------|-------|-------|-------|-------|
| 80.7 | 98.4 | 87.4 | 51.5 | 69.6 | 88.2 |
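Two quantities underlie these columns. The IoU between a predicted and a ground-truth box decides whether a detection counts as a match at a given threshold. The object's area assigns it to the small (below 32² px), medium or large (above 96² px) bucket under the COCO convention. Below is a minimal pure-Python sketch, assuming boxes are given as `(x1, y1, x2, y2)` pixel coordinates. `box_iou`, `size_bucket` and `IOU_THRESHOLDS` are illustrative names, not MMDetection or pycocotools APIs. The bucket uses box area as an approximation of the annotation area, with simplified boundary handling.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def size_bucket(box: Box) -> str:
    """Approximate COCO size bucket, using the box area as the object area."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


# The ten IoU thresholds that AP averages over: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    gt, pred = (0, 0, 100, 100), (10, 0, 110, 100)
    iou = box_iou(gt, pred)
    print(f"IoU = {iou:.3f}")  # → IoU = 0.818
    print(sum(iou >= t for t in IOU_THRESHOLDS), "of 10 thresholds matched")  # → 7 of 10 thresholds matched
    print(size_bucket(gt))  # → large
```

In the example, the shifted prediction counts as correct for AP50 and AP75 but fails the thresholds from 0.85 upward. Averaging over those stricter thresholds is why AP (80.7) sits well below AP50 (98.4).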
## Usage
### Installation
Installation and fine-tuning instructions are available in the Co-DETR repository:
https://github.com/Sense-X/Co-DETR?tab=readme-ov-file
### Inference
```python
import cv2
import layoutparser as lp
import matplotlib.pyplot as plt
from mmdet.apis import init_detector, inference_detector

# Configuration (adjust paths to your local config and checkpoint)
config_file = "co_dino_5scale_vit_large_coco.py"
checkpoint_file = "SweMPer-layout.pth"
score_thr = 0.50
device = "cuda:0"  # use "cpu" if no GPU is available (much slower)

# Initialize model
model = init_detector(config_file, checkpoint_file, device=device)

# Get class names from the model (MMDet 2.x: model.CLASSES; 3.x: model.dataset_meta)
def get_classes(model):
    m = getattr(model, "module", model)
    classes = getattr(m, "CLASSES", None)
    if classes:
        return list(classes)
    meta = getattr(m, "dataset_meta", None)
    if meta and isinstance(meta, dict) and "classes" in meta:
        return list(meta["classes"])
    return None

classes = get_classes(model)

# Convert MMDet 2.x results (one (N, 5) array of x1, y1, x2, y2, score per class)
# to a LayoutParser layout
def mmdet_to_layout(result, classes, thr=0.50):
    # Models with a mask head return (bbox_result, segm_result); keep the boxes
    bbox_result = result[0] if isinstance(result, tuple) else result
    blocks = []
    for cls_id, dets in enumerate(bbox_result):
        if dets is None or len(dets) == 0:
            continue
        cls_name = classes[cls_id].lower() if classes else str(cls_id)
        # Keep only detections whose score (last column) reaches the threshold
        for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
            rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
            blocks.append(
                lp.TextBlock(block=rect, type=cls_name, score=float(score))
            )
    return lp.Layout(blocks)

# Run inference
image_path = "<path_to_image>"
result = inference_detector(model, image_path)
layout = mmdet_to_layout(result, classes, thr=score_thr)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results
image = cv2.imread(image_path)
if image is None:
    raise FileNotFoundError(f"Could not read image: {image_path}")
image = image[..., ::-1]  # BGR to RGB
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
```
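`mmdet_to_layout` expects the MMDetection 2.x result format, which is one `(N, 5)` array per class. If you run the model through MMDetection 3.x instead, `inference_detector` returns a `DetDataSample` whose `pred_instances` holds `bboxes`, `scores` and `labels`. Below is a minimal sketch of an adaptation, assuming that 3.x structure. `det_sample_to_blocks` is a hypothetical helper, not part of MMDetection or this model. It returns plain `(type, score, (x1, y1, x2, y2))` tuples so that it can be tested without a GPU. The fake sample in the demo uses illustrative values and assumes class ids follow the label table order.

```python
from types import SimpleNamespace
from typing import List, Optional, Sequence, Tuple

Block = Tuple[str, float, Tuple[float, float, float, float]]


def _as_list(x) -> list:
    # torch tensors and NumPy arrays both provide .tolist(); plain lists pass through
    return x.tolist() if hasattr(x, "tolist") else list(x)


def det_sample_to_blocks(sample, classes: Optional[Sequence[str]],
                         thr: float = 0.50) -> List[Block]:
    """Hypothetical helper: convert an MMDetection 3.x DetDataSample into
    (type, score, box) tuples, keeping detections with score >= thr."""
    inst = sample.pred_instances
    blocks = []
    for box, score, label in zip(_as_list(inst.bboxes),
                                 _as_list(inst.scores),
                                 _as_list(inst.labels)):
        if score < thr:
            continue
        # Lowercase class names, mirroring mmdet_to_layout()
        name = classes[int(label)].lower() if classes else str(int(label))
        x1, y1, x2, y2 = (float(v) for v in box)
        blocks.append((name, float(score), (x1, y1, x2, y2)))
    return blocks


if __name__ == "__main__":
    # Fake sample standing in for a real DetDataSample (illustrative values)
    fake = SimpleNamespace(pred_instances=SimpleNamespace(
        bboxes=[[10, 20, 200, 60], [15, 80, 400, 900]],
        scores=[0.91, 0.30],
        labels=[8, 7],
    ))
    labels = ["Advertisement", "Author", "Header or Footer", "Image", "List",
              "Page Number", "Table", "Text", "Title"]
    print(det_sample_to_blocks(fake, labels))
    # → [('title', 0.91, (10.0, 20.0, 200.0, 60.0))]
```

Wrapping each tuple as `lp.TextBlock(block=lp.Rectangle(*box), type=name, score=score)` and collecting the blocks in `lp.Layout(...)` gives the same kind of layout that `mmdet_to_layout` produces.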
## Acknowledgements
This work was carried out within the project:
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.
We gratefully acknowledge the support of the funder and project collaborators.
This model builds upon the excellent work of:
- [Sense-X/Co-DETR](https://github.com/Sense-X/Co-DETR?tab=readme-ov-file)
- [MMDetection](https://github.com/open-mmlab/mmdetection)
We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.