---
license: apache-2.0
tags:
- object-detection
- document-layout-analysis
- historical-documents
- layoutparser
- mmdetection
- co-dino
- vision-transformer
language:
- sv
pipeline_tag: object-detection
---

# Historical Document Layout Detection Model (Co-DETR / DINO)

A fine-tuned Co-DINO (Vision Transformer-based detector via MMDetection) model for detecting layout elements in historical Swedish medical journal pages. This model is a more advanced successor to the earlier Mask R-CNN-based model [cdhu-uu/SweMPer-layout-lite](https://huggingface.co/cdhu-uu/SweMPer-layout-lite), offering improved detection performance and robustness on complex layouts.

This model was developed as part of the research project: **Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011** (Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

Project page: https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper

## Model Details

- **Model type:** Co-DINO (Vision Transformer backbone)
- **Framework:** MMDetection
- **Fine-tuned for:** Historical document layout analysis
- **Language of source documents:** Swedish
- **Strengths:** Improved detection precision on complex layouts

## Supported Labels

| Label            |
|------------------|
| Advertisement    |
| Author           |
| Header or Footer |
| Image            |
| List             |
| Page Number      |
| Table            |
| Text             |
| Title            |

## Evaluation Metrics

The evaluation metrics (COCO-style average precision) for this model are as follows:

| AP   | AP50 | AP75 | APs  | APm  | APl  |
|------|------|------|------|------|------|
| 80.7 | 98.4 | 87.4 | 51.5 | 69.6 | 88.2 |

## Usage

### Installation

Installation and fine-tuning instructions are available at: https://github.com/Sense-X/Co-DETR?tab=readme-ov-file

### Inference

```python
import cv2
import layoutparser as lp
import matplotlib.pyplot as plt
from mmdet.apis import init_detector, inference_detector

# Configuration
config_file = "co_dino_5scale_vit_large_coco.py"
checkpoint_file = "SweMPer-layout.pth"
score_thr = 0.50
device = "cuda:0"

# Initialize model
model = init_detector(config_file, checkpoint_file, device=device)


# Get class names from the model (handles both the MMDetection 2.x
# `CLASSES` attribute and the 3.x `dataset_meta` convention)
def get_classes(model):
    m = getattr(model, "module", model)
    classes = getattr(m, "CLASSES", None)
    if classes:
        return list(classes)
    meta = getattr(m, "dataset_meta", None)
    if meta and isinstance(meta, dict) and "classes" in meta:
        return list(meta["classes"])
    return None


classes = get_classes(model)


# Convert MMDet results (per-class arrays of [x1, y1, x2, y2, score])
# to a LayoutParser Layout
def mmdet_to_layout(result, classes, thr=0.50):
    bbox_result = result[0] if isinstance(result, tuple) else result
    blocks = []
    for cls_id, dets in enumerate(bbox_result):
        if dets is None or len(dets) == 0:
            continue
        cls_name = classes[cls_id].lower() if classes else str(cls_id)
        for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
            rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
            blocks.append(
                lp.TextBlock(block=rect, type=cls_name, score=float(score))
            )
    return lp.Layout(blocks)


# Run inference
image_path = ""
result = inference_detector(model, image_path)
layout = mmdet_to_layout(result, classes, thr=score_thr)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results
image = cv2.imread(image_path)[..., ::-1]  # BGR to RGB
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
```

## Acknowledgements

This work was carried out within the project: **Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011** (Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**. We gratefully acknowledge the support of the funder and project collaborators.
This model builds upon the excellent work of:

- [Sense-X/Co-DETR](https://github.com/Sense-X/Co-DETR?tab=readme-ov-file)
- [MMDetection](https://github.com/open-mmlab/mmdetection)

We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.
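Downstream of detection, historical journal pages are often multi-column, so detected blocks may need a reading-order sort before OCR. Below is a minimal, library-independent sketch of one simple strategy (group blocks into rows by vertical center, then read rows top-to-bottom and blocks left-to-right); the `reading_order` helper, its dict-based block format, and the `row_tol` value are illustrative assumptions, not part of this model's tooling:

```python
# Sort detected blocks into a rough reading order: blocks whose vertical
# centers fall within `row_tol` pixels of a row's running-mean center are
# grouped into that row; rows are read top-to-bottom, blocks within a row
# left-to-right. (Helper and block format are illustrative.)

def reading_order(blocks, row_tol=20):
    """blocks: list of dicts with 'box' = (x1, y1, x2, y2) in pixels."""
    by_center = sorted(blocks, key=lambda b: (b["box"][1] + b["box"][3]) / 2)
    rows = []
    for b in by_center:
        cy = (b["box"][1] + b["box"][3]) / 2
        if rows and abs(cy - rows[-1]["cy"]) <= row_tol:
            rows[-1]["items"].append(b)
            # update the row's running-mean center
            n = len(rows[-1]["items"])
            rows[-1]["cy"] += (cy - rows[-1]["cy"]) / n
        else:
            rows.append({"cy": cy, "items": [b]})
    ordered = []
    for row in rows:
        ordered.extend(sorted(row["items"], key=lambda b: b["box"][0]))
    return ordered


blocks = [
    {"type": "text",  "box": (300, 100, 500, 200)},  # right column
    {"type": "title", "box": (50, 10, 550, 60)},     # full-width title
    {"type": "text",  "box": (50, 100, 250, 200)},   # left column
]
print([b["type"] for b in reading_order(blocks)])
# → ['title', 'text', 'text'], with the left column before the right
```

For genuinely complex layouts (straddling headlines, nested advertisements), a fixed pixel tolerance is too crude; treat this only as a starting point.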