cdhu-uu/SweMPer-layout

+---
+license: apache-2.0
+tags:
+- object-detection
+- document-layout-analysis
+- historical-documents
+- layoutparser
+- mmdetection
+- co-dino
+- vision-transformer
+language:
+- sv
+pipeline_tag: object-detection
+---
+# Historical Document Layout Detection Model (Co-DETR / DINO)
+A fine-tuned Co-DINO model (a Vision Transformer-based detector implemented in MMDetection) for detecting layout elements in pages of historical Swedish medical journals.
+It succeeds earlier Mask R-CNN-based approaches and offers improved detection performance and robustness on complex layouts.
+This model was developed as part of the research project:
+**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
+(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.
+Project page:
+https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper
+## Model Details
+- **Model type:** Co-DINO (Vision Transformer backbone)
+- **Framework:** MMDetection
+- **Fine-tuned for:** Historical document layout analysis
+- **Language of source documents:** Swedish
+- **Strengths:** Improved detection accuracy on complex layouts and multi-scale elements
+## Supported Labels
+| Label            |
+|------------------|
+| Advertisement    |
+| Author           |
+| Header or Footer |
+| Image            |
+| List             |
+| Page Number      |
+| Table            |
+| Text             |
+| Title            |
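The inference example in this card lowercases class names when it builds `lp.TextBlock` objects, so downstream filters would compare against lowercase strings. The sketch below shows one way to validate and normalize label names. It assumes the checkpoint's class names match the table above exactly; `SUPPORTED_LABELS` and `normalize_label` are hypothetical names introduced here for illustration.

```python
# Labels copied from the table above; assumed to match the checkpoint's class names.
SUPPORTED_LABELS = (
    "Advertisement",
    "Author",
    "Header or Footer",
    "Image",
    "List",
    "Page Number",
    "Table",
    "Text",
    "Title",
)

# Lowercase lookup, since the inference example stores block types in lowercase.
_BY_LOWERCASE = {label.lower(): label for label in SUPPORTED_LABELS}


def normalize_label(name):
    """Return the canonical label for a case-insensitive name, or raise ValueError."""
    # Collapse runs of whitespace and ignore case before the lookup.
    key = " ".join(name.split()).lower()
    try:
        return _BY_LOWERCASE[key]
    except KeyError:
        raise ValueError(f"Unknown layout label: {name!r}") from None
```

For example, `normalize_label("header or footer")` maps the lowercase block type back to the table's spelling.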
+## Usage
+### Installation
+Installation and fine-tuning instructions are available in the Co-DETR repository:
+https://github.com/Sense-X/Co-DETR?tab=readme-ov-file
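Because the inference code in this card depends on several separately installed packages, it can help to record which versions are present before debugging. The following is a minimal sketch using only the Python standard library. The helper name `installed_versions` and the list of distribution names are illustrative assumptions, not part of the Co-DETR instructions.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust them to match your environment.
    names = ["torch", "mmcv-full", "mmdet", "layoutparser", "opencv-python"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

The sketch only reports what is installed; which versions are required is defined by the linked installation instructions.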
+### Inference
+```python
+import cv2
+import layoutparser as lp
+import matplotlib.pyplot as plt
+from mmdet.apis import init_detector, inference_detector
+# Configuration: adjust the paths to where the config and checkpoint are stored
+config_file = "co_dino_5scale_vit_large_coco.py"
+checkpoint_file = "SweMPer-layout.pth"
+score_thr = 0.50
+device = "cuda:0"
+# Initialize model
+model = init_detector(config_file, checkpoint_file, device=device)
+# Get class names from the model (MMDetection 2.x exposes `CLASSES`, 3.x exposes `dataset_meta`)
+def get_classes(model):
+    m = getattr(model, "module", model)
+    classes = getattr(m, "CLASSES", None)
+    if classes:
+        return list(classes)
+    meta = getattr(m, "dataset_meta", None)
+    if meta and isinstance(meta, dict) and "classes" in meta:
+        return list(meta["classes"])
+    return None
+classes = get_classes(model)
+# Convert MMDet results (one array per class, rows of [x1, y1, x2, y2, score]) to a LayoutParser layout
+def mmdet_to_layout(result, classes, thr=0.50):
+    bbox_result = result[0] if isinstance(result, tuple) else result
+    blocks = []
+    for cls_id, dets in enumerate(bbox_result):
+        if dets is None or len(dets) == 0:
+            continue
+        cls_name = classes[cls_id].lower() if classes else str(cls_id)
+        for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
+            rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
+            blocks.append(
+                lp.TextBlock(block=rect, type=cls_name, score=float(score))
+            )
+    return lp.Layout(blocks)
+# Run inference
+image_path = "<path_to_image>"
+result = inference_detector(model, image_path)
+layout = mmdet_to_layout(result, classes, thr=score_thr)
+# Print detected elements
+for block in layout:
+    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")
+# Visualize results
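+image = cv2.imread(image_path)
+if image is None:  # cv2.imread returns None instead of raising on a bad path
+    raise FileNotFoundError(f"Could not read image: {image_path}")
+image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB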
+viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
+plt.figure(figsize=(12, 16))
+plt.imshow(viz)
+plt.axis("off")
+plt.show()
+```
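Detections are often saved for later processing rather than only displayed. The following is a sketch of serializing results to JSON with the standard library. It works on plain `(type, score, (x1, y1, x2, y2))` tuples, which could be built from the `block.type`, `block.score`, and `block.coordinates` attributes used above. The function name `detections_to_json` and the record fields are illustrative assumptions.

```python
import json


def detections_to_json(detections, min_score=0.0):
    """Serialize (label, score, (x1, y1, x2, y2)) tuples to a JSON string.

    Records below min_score are dropped, the rest are sorted by descending
    score, and box coordinates are rounded to one decimal place.
    """
    records = [
        {
            "type": label,
            "score": round(float(score), 3),
            "box": [round(float(v), 1) for v in box],
        }
        for label, score, box in detections
        if score >= min_score
    ]
    records.sort(key=lambda r: r["score"], reverse=True)
    return json.dumps(records, ensure_ascii=False, indent=2)
```

With the `layout` object from the example above, the input could be built as `[(b.type, b.score, b.coordinates) for b in layout]`.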
+## Acknowledgements
+This work was carried out within the project:
+**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
+(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.
+We gratefully acknowledge the support of the funder and project collaborators.
+This model builds upon the excellent work of:
+- [MMDetection](https://github.com/open-mmlab/mmdetection)
+- [Sense-X/Co-DETR](https://github.com/Sense-X/Co-DETR?tab=readme-ov-file)
+We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.