---
license: apache-2.0
tags:
- object-detection
- document-layout-analysis
- historical-documents
- layoutparser
- mmdetection
- co-dino
- vision-transformer
language:
- sv
pipeline_tag: object-detection
---
# Historical Document Layout Detection Model (Co-DETR / DINO)
A fine-tuned Co-DINO model (a Vision Transformer-based detector implemented in MMDetection) for detecting layout elements in historical Swedish medical journal pages.
It is a more advanced successor to the earlier Mask R-CNN-based [cdhu-uu/SweMPer-layout-lite](https://huggingface.co/cdhu-uu/SweMPer-layout-lite), offering improved detection performance and robustness on complex layouts.
This model was developed as part of the research project:
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.
Project page:
https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper
## Model Details
- **Model type:** Co-DINO (Vision Transformer backbone)
- **Framework:** MMDetection
- **Fine-tuned for:** Historical document layout analysis
- **Language of source documents:** Swedish
- **Strengths:** Improved detection precision and robustness on complex layouts
## Supported Labels
| Label |
|------------------|
| Advertisement |
| Author |
| Header or Footer |
| Image |
| List |
| Page Number |
| Table |
| Text |
| Title |
## Evaluation Metrics
COCO-style bounding-box average precision for this model (APs, APm, and APl denote AP on small, medium, and large objects, respectively):
| AP | AP50 | AP75 | APs | APm | APl |
|--------|-------|-------|-------|-------|-------|
| 80.7 | 98.4 | 87.4 | 51.5 | 69.6 | 88.2 |
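For readers less familiar with these metrics: AP50 and AP75 report average precision at IoU (intersection over union) thresholds of 0.50 and 0.75, while AP averages over thresholds from 0.50 to 0.95. A minimal sketch of the IoU computation that underlies them, for axis-aligned boxes in `(x1, y1, x2, y2)` form:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two equal boxes offset by half their width: intersection 50, union 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```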
## Usage
### Installation
Installation and fine-tuning instructions are available in the Co-DETR repository:
https://github.com/Sense-X/Co-DETR?tab=readme-ov-file
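As a rough sketch, a typical setup looks like the following; the exact mmcv and PyTorch versions depend on your CUDA environment, so treat these commands as assumptions and defer to the version pins in the Co-DETR README:

```shell
# Clone the Co-DETR codebase (provides the co_dino_* config files)
git clone https://github.com/Sense-X/Co-DETR.git
cd Co-DETR

# Co-DETR builds on MMDetection 2.x; mmcv-full must match your
# PyTorch/CUDA versions -- see the Co-DETR README for exact pins
pip install -U openmim
mim install mmcv-full
pip install -e .

# Extras used in the inference example
pip install layoutparser opencv-python matplotlib
```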
### Inference
```python
import cv2
import layoutparser as lp
import matplotlib.pyplot as plt
from mmdet.apis import init_detector, inference_detector

# Configuration
config_file = "co_dino_5scale_vit_large_coco.py"
checkpoint_file = "SweMPer-layout.pth"
score_thr = 0.50
device = "cuda:0"

# Initialize model
model = init_detector(config_file, checkpoint_file, device=device)

# Get class names from the model (handles wrapped models and both the
# MMDetection 2.x `CLASSES` and 3.x `dataset_meta` conventions)
def get_classes(model):
    m = getattr(model, "module", model)
    classes = getattr(m, "CLASSES", None)
    if classes:
        return list(classes)
    meta = getattr(m, "dataset_meta", None)
    if meta and isinstance(meta, dict) and "classes" in meta:
        return list(meta["classes"])
    return None

classes = get_classes(model)

# Convert MMDetection 2.x-style results (one detection array per class)
# into a LayoutParser Layout
def mmdet_to_layout(result, classes, thr=0.50):
    bbox_result = result[0] if isinstance(result, tuple) else result
    blocks = []
    for cls_id, dets in enumerate(bbox_result):
        if dets is None or len(dets) == 0:
            continue
        cls_name = classes[cls_id].lower() if classes else str(cls_id)
        for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
            rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
            blocks.append(
                lp.TextBlock(block=rect, type=cls_name, score=float(score))
            )
    return lp.Layout(blocks)

# Run inference
image_path = "<path_to_image>"
result = inference_detector(model, image_path)
layout = mmdet_to_layout(result, classes, thr=score_thr)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results
image = cv2.imread(image_path)[..., ::-1]  # BGR to RGB
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
```
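The layout returned by `mmdet_to_layout` is iterable, so downstream selection, such as keeping only confident text regions in top-to-bottom reading order, is plain Python over blocks exposing `type`, `score`, and `coordinates`. A minimal sketch; the `Block` dataclass below is a hypothetical stand-in for that interface so the example is self-contained, not part of layoutparser:

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Stand-in mimicking lp.TextBlock's type/score/coordinates fields."""
    type: str
    score: float
    coordinates: tuple  # (x1, y1, x2, y2)

def select_blocks(layout, wanted_type, min_score=0.5):
    """Keep blocks of one type above a score threshold, sorted top-to-bottom."""
    kept = [b for b in layout if b.type == wanted_type and b.score >= min_score]
    return sorted(kept, key=lambda b: b.coordinates[1])  # sort by y1

demo = [
    Block("text", 0.91, (50, 400, 600, 700)),
    Block("title", 0.97, (50, 80, 600, 150)),
    Block("text", 0.42, (50, 800, 600, 900)),   # below threshold, dropped
    Block("text", 0.88, (50, 180, 600, 380)),
]
reading_order = select_blocks(demo, "text")
for b in reading_order:
    print(b.score, b.coordinates)  # two confident text blocks, top to bottom
```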
## Acknowledgements
This work was carried out within the project:
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.
We gratefully acknowledge the support of the funder and project collaborators.
This model builds upon the excellent work of:
- [Sense-X/Co-DETR](https://github.com/Sense-X/Co-DETR?tab=readme-ov-file)
- [MMDetection](https://github.com/open-mmlab/mmdetection)
We thank the contributors and maintainers of these projects for making their tools publicly available and for supporting open research.