---
license: apache-2.0
tags:
- object-detection
- document-layout-analysis
- historical-documents
- layoutparser
- mmdetection
- co-dino
- vision-transformer
language:
- sv
pipeline_tag: object-detection
---

# Historical Document Layout Detection Model (Co-DETR / DINO)

A fine-tuned Co-DINO (Vision Transformer-based detector via MMDetection) model for detecting layout elements in historical Swedish medical journal pages.

This model is a more advanced successor to the earlier Mask R-CNN-based [cdhu-uu/SweMPer-layout-lite](https://huggingface.co/cdhu-uu/SweMPer-layout-lite), offering improved detection performance and robustness on complex layouts.

This model was developed as part of the research project:  
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**  
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

Project page:  
https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper

## Model Details

- **Model type:** Co-DINO (Vision Transformer backbone)  
- **Framework:** MMDetection  
- **Fine-tuned for:** Historical document layout analysis  
- **Language of source documents:** Swedish  
- **Strengths:** Improved detection precision on complex layouts

## Supported Labels

| Label            |
|------------------|
| Advertisement    |
| Author           |
| Header or Footer |
| Image            |
| List             |
| Page Number      |
| Table            |
| Text             |
| Title            |

## Evaluation Metrics
COCO-style average precision (AP) for this model on the evaluation set:
| AP     | AP50  | AP75  | APs   | APm   | APl   |
|--------|-------|-------|-------|-------|-------|
| 80.7   | 98.4  | 87.4  | 51.5  | 69.6  | 88.2  |

## Usage

### Installation

Installation and fine-tuning instructions are available in the Co-DETR repository:  
https://github.com/Sense-X/Co-DETR?tab=readme-ov-file

### Inference

```python
import cv2
import layoutparser as lp
import matplotlib.pyplot as plt
from mmdet.apis import init_detector, inference_detector

# Configuration
config_file = "co_dino_5scale_vit_large_coco.py"
checkpoint_file = "SweMPer-layout.pth"
score_thr = 0.50
device = "cuda:0"

# Initialize model
model = init_detector(config_file, checkpoint_file, device=device)

# Get class names from model
def get_classes(model):
    m = getattr(model, "module", model)
    classes = getattr(m, "CLASSES", None)
    if classes:
        return list(classes)
    meta = getattr(m, "dataset_meta", None)
    if meta and isinstance(meta, dict) and "classes" in meta:
        return list(meta["classes"])
    return None

classes = get_classes(model)

# Convert MMDet results to LayoutParser layout
def mmdet_to_layout(result, classes, thr=0.50):
    bbox_result = result[0] if isinstance(result, tuple) else result
    blocks = []
    for cls_id, dets in enumerate(bbox_result):
        if dets is None or len(dets) == 0:
            continue
        cls_name = classes[cls_id].lower() if classes else str(cls_id)
        for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
            rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
            blocks.append(
                lp.TextBlock(block=rect, type=cls_name, score=float(score))
            )
    return lp.Layout(blocks)

# Run inference
image_path = "<path_to_image>"
result = inference_detector(model, image_path)
layout = mmdet_to_layout(result, classes, thr=score_thr)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results
image = cv2.imread(image_path)[..., ::-1]  # BGR to RGB
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
```
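For downstream OCR or text extraction, detected blocks usually need to be arranged in reading order. A minimal sketch of a top-to-bottom, left-to-right sort, assuming `(x1, y1, x2, y2, label)` tuples; with layoutparser the same coordinates come from `block.coordinates`, and the `row_tol` grouping parameter is an assumption you would tune per scan resolution:

```python
# Sort boxes into rough reading order: group boxes whose top edges fall in the
# same row_tol-pixel band into one visual row, then order each row left-to-right.
def reading_order(boxes, row_tol=20):
    return sorted(boxes, key=lambda b: (b[1] // row_tol, b[0]))

# Hypothetical detections on one page: (x1, y1, x2, y2, label)
boxes = [
    (300, 12, 420, 40, "page number"),
    (30, 10, 280, 60, "title"),
    (30, 80, 420, 400, "text"),
]

print([b[4] for b in reading_order(boxes)])  # -> ['title', 'page number', 'text']
```

This simple banding works for single-column pages; multi-column historical layouts would need a column split before the row sort.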

## Acknowledgements

This work was carried out within the project:  
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**  
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

We gratefully acknowledge the support of the funder and project collaborators.

This model builds upon the excellent work of:

- [Sense-X/Co-DETR](https://github.com/Sense-X/Co-DETR?tab=readme-ov-file)
- [MMDetection](https://github.com/open-mmlab/mmdetection)  

We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.