---
license: apache-2.0
tags:
- object-detection
- document-layout-analysis
- historical-documents
- layoutparser
- detectron2
- mask-rcnn
language:
- sv
pipeline_tag: object-detection
base_model:
- layoutparser/detectron2
---

# Historical Document Layout Detection Model

A fine-tuned Mask R-CNN model (via LayoutParser/Detectron2) for detecting layout elements in historical Swedish medical journal pages. This is the lighter variant; the more complex model is available at [SweMPer-layout](https://huggingface.co/cdhu-uu/SweMPer-layout).

This model was developed as part of the research project: **Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011** (Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

## Project page:

https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper

## Model Details

- **Model type:** Mask R-CNN (ResNet backbone)
- **Framework:** Detectron2 / LayoutParser
- **Fine-tuned for:** Historical document layout analysis
- **Language of source documents:** Swedish

## Label Map

| ID | Label            |
|----|------------------|
| 0  | Advertisement    |
| 1  | Author           |
| 2  | Header or Footer |
| 3  | Image            |
| 4  | List             |
| 5  | Page Number      |
| 6  | Table            |
| 7  | Text             |
| 8  | Title            |

## Evaluation Metrics

The evaluation metrics for this model are as follows:

|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 64.325 | 88.948 | 69.214 | 40.350 | 55.117 | 67.543 |

## Usage

### Installation

Follow the instructions at: https://detectron2.readthedocs.io/en/latest/tutorials/install.html

### Finetuning

Follow the instructions at: https://detectron2.readthedocs.io/en/latest/tutorials/training.html

### Inference

```python
import cv2
import layoutparser as lp
import matplotlib.pyplot as plt

# Configuration: paths to the model config and the fine-tuned weights
model_config_path = "config_mask_rcnn_resized.yaml"
model_path = "SweMPer-layout-lite.pth"
label_map = {
    0: "advertisement",
    1: "author",
    2: "header_or_footer",
    3: "image",
    4: "list",
    5: "page_no",
    6: "table",
    7: "text",
    8: "title",
}

# Load model; detections scoring below 0.8 are discarded
model = lp.models.Detectron2LayoutModel(
    config_path=model_config_path,
    model_path=model_path,
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map=label_map,
)

# Load and process image
image = cv2.imread("")  # set this to the path of your page image
if image is None:
    # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError("Could not read the input image; check the path.")
image = image[..., ::-1]  # BGR to RGB

# Detect layout
layout = model.detect(image)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
```

## Acknowledgements

This work was carried out within the project: **Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011** (Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.

We gratefully acknowledge the support of the funder and project collaborators.

This model builds upon the excellent work of:

- [Detectron2](https://github.com/facebookresearch/detectron2/tree/main)
- [LayoutParser](https://github.com/Layout-Parser/layout-parser?tab=readme-ov-file)

We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.
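
## Post-processing Sketch

The inference example prints every detected block as it comes. In practice you may want only some element types, in a rough top-to-bottom order. The sketch below illustrates one way to do that in plain Python, with no LayoutParser dependency. `Detection` and `select_blocks` are hypothetical names introduced here for illustration, not part of LayoutParser or Detectron2; the default `min_score` of 0.8 simply mirrors the `MODEL.ROI_HEADS.SCORE_THRESH_TEST` value used above, and the sample boxes are made up. Sorting by the top-left corner is a naive ordering that assumes a single-column page; multi-column pages would need more care.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple

# Label map copied from the inference example.
LABEL_MAP = {
    0: "advertisement", 1: "author", 2: "header_or_footer",
    3: "image", 4: "list", 5: "page_no",
    6: "table", 7: "text", 8: "title",
}


class Detection(NamedTuple):
    """Hypothetical plain-Python stand-in for one detected layout block."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def select_blocks(
    detections: Iterable[Detection],
    keep_labels: Set[str],
    min_score: float = 0.8,
) -> List[Detection]:
    """Keep detections with a wanted label and sufficient score,
    sorted top-to-bottom, then left-to-right."""
    unknown = set(keep_labels) - set(LABEL_MAP.values())
    if unknown:
        raise ValueError(f"Labels not in the label map: {sorted(unknown)}")
    kept = [
        d for d in detections
        if d.label in keep_labels and d.score >= min_score
    ]
    # Naive ordering: by the y coordinate of the top edge, then the left edge.
    return sorted(kept, key=lambda d: (d.box[1], d.box[0]))


# Made-up detections for a single page, for illustration only.
sample = [
    Detection("text", 0.97, (60.0, 400.0, 540.0, 900.0)),
    Detection("title", 0.93, (60.0, 120.0, 540.0, 180.0)),
    Detection("advertisement", 0.99, (60.0, 950.0, 540.0, 1200.0)),
    Detection("text", 0.55, (60.0, 200.0, 540.0, 380.0)),
]

selected = select_blocks(sample, keep_labels={"title", "text"})
for d in selected:
    print(f"{d.label:<6} score={d.score:.2f} box={d.box}")
```

With LayoutParser, each block returned by `model.detect(image)` could presumably be converted with `Detection(block.type, block.score, block.coordinates)`, since the inference example reads those same three attributes; treat that mapping as an assumption to verify against your installed version.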