---
license: apache-2.0
tags:
- object-detection
- document-layout-analysis
- historical-documents
- layoutparser
- mmdetection
- co-dino
- vision-transformer
language:
- sv
pipeline_tag: object-detection
---

# Historical Document Layout Detection Model (Co-DETR / DINO)

A fine-tuned Co-DINO model (a Vision Transformer-based detector, run via MMDetection) for detecting layout elements in historical Swedish medical journal pages.

This model succeeds the earlier Mask R-CNN-based model [cdhu-uu/SweMPer-layout-lite](https://huggingface.co/cdhu-uu/SweMPer-layout-lite) and offers improved detection performance and robustness on complex layouts.

This model was developed as part of the research project:
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

Project page:
https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper

## Model Details

- **Model type:** Co-DINO (Vision Transformer backbone)
- **Framework:** MMDetection
- **Fine-tuned for:** Historical document layout analysis
- **Language of source documents:** Swedish
- **Strengths:** Improved detection precision on complex layouts compared with the earlier Mask R-CNN-based model

## Supported Labels

| Label            |
|------------------|
| Advertisement    |
| Author           |
| Header or Footer |
| Image            |
| List             |
| Page Number      |
| Table            |
| Text             |
| Title            |

## Evaluation Metrics

The model's detection performance, reported as COCO-style average precision (AP) scores:

| AP   | AP50 | AP75 | APs  | APm  | APl  |
|------|------|------|------|------|------|
| 80.7 | 98.4 | 87.4 | 51.5 | 69.6 | 88.2 |
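
## Usage

### Installation

Installation and fine-tuning instructions are available in the Co-DETR repository:
https://github.com/Sense-X/Co-DETR?tab=readme-ov-file

### Inference

The example below loads the model through MMDetection, converts the detections to a LayoutParser `Layout`, and visualizes them. It assumes that the config and checkpoint files named in the script are available locally.
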
```python
import cv2
import layoutparser as lp
import matplotlib.pyplot as plt
from mmdet.apis import init_detector, inference_detector

# Configuration
config_file = "co_dino_5scale_vit_large_coco.py"
checkpoint_file = "SweMPer-layout.pth"
score_thr = 0.50
device = "cuda:0"

# Initialize model
model = init_detector(config_file, checkpoint_file, device=device)


# Get class names from model
def get_classes(model):
    # Unwrap parallel wrappers, which keep the actual model in `.module`
    m = getattr(model, "module", model)
    classes = getattr(m, "CLASSES", None)
    if classes:
        return list(classes)
    # Fall back to the metadata dictionary if `CLASSES` is not set
    meta = getattr(m, "dataset_meta", None)
    if meta and isinstance(meta, dict) and "classes" in meta:
        return list(meta["classes"])
    return None


classes = get_classes(model)


# Convert MMDet results to LayoutParser layout
def mmdet_to_layout(result, classes, thr=0.50):
    # If the result is a (bbox, segm) tuple, keep only the bounding boxes
    bbox_result = result[0] if isinstance(result, tuple) else result
    blocks = []
    # Expects one (N, 5) array per class: x1, y1, x2, y2, score
    for cls_id, dets in enumerate(bbox_result):
        if dets is None or len(dets) == 0:
            continue
        cls_name = classes[cls_id].lower() if classes else str(cls_id)
        # Keep only detections at or above the score threshold
        for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
            rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
            blocks.append(
                lp.TextBlock(block=rect, type=cls_name, score=float(score))
            )
    return lp.Layout(blocks)


# Run inference
image_path = "<path_to_image>"
result = inference_detector(model, image_path)
layout = mmdet_to_layout(result, classes, thr=score_thr)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results
image = cv2.imread(image_path)[..., ::-1]  # BGR to RGB
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
```
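
As an optional follow-up, detections can be filtered by label and serialized for downstream processing. The sketch below is a minimal, dependency-free illustration and not part of the original pipeline. `detections_to_records` and `filter_by_label` are hypothetical helpers, the input is assumed to be the same per-class list of `[x1, y1, x2, y2, score]` rows that `mmdet_to_layout` consumes, and the class names and coordinates in the example are made up.

```python
import json


def detections_to_records(bbox_result, classes, thr=0.50):
    """Flatten per-class detections into a list of JSON-serializable dicts."""
    records = []
    for cls_id, dets in enumerate(bbox_result):
        if dets is None or len(dets) == 0:
            continue
        label = classes[cls_id].lower() if classes else str(cls_id)
        for x1, y1, x2, y2, score in dets:
            if score < thr:
                continue  # drop low-confidence detections
            records.append({
                "type": label,
                "score": round(float(score), 3),
                "box": [float(x1), float(y1), float(x2), float(y2)],
            })
    # Highest-confidence detections first
    records.sort(key=lambda r: r["score"], reverse=True)
    return records


def filter_by_label(records, wanted):
    """Keep only records whose type is in `wanted` (case-insensitive)."""
    wanted = {w.lower() for w in wanted}
    return [r for r in records if r["type"] in wanted]


# Made-up example: two classes, three candidate boxes
example_classes = ["Text", "Table"]
example_result = [
    [[50, 60, 400, 300, 0.97], [50, 320, 400, 380, 0.31]],  # "Text" rows
    [[60, 400, 390, 700, 0.88]],                            # "Table" rows
]

records = detections_to_records(example_result, example_classes, thr=0.50)
print(json.dumps(filter_by_label(records, ["table"]), indent=2))
```

In practice, `example_result` and `example_classes` would be replaced by the output of `inference_detector` and the class names read from the model.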

## Acknowledgements

This work was carried out within the project:
**Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
(Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

We gratefully acknowledge the support of the funder and project collaborators.

This model builds upon the excellent work of:

- [Sense-X/Co-DETR](https://github.com/Sense-X/Co-DETR?tab=readme-ov-file)
- [MMDetection](https://github.com/open-mmlab/mmdetection)

We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.