Commit d976b1f by sushruthb (verified) · 1 parent: fb1d548

Update README.md

  1. README.md +135 -3
README.md CHANGED
@@ -1,3 +1,135 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ tags:
+ - object-detection
+ - document-layout-analysis
+ - historical-documents
+ - layoutparser
+ - mmdetection
+ - co-dino
+ - vision-transformer
+ language:
+ - sv
+ pipeline_tag: object-detection
+ ---
+
+ # Historical Document Layout Detection Model (Co-DETR / DINO)
+
+ A Co-DINO object detector with a Vision Transformer backbone, fine-tuned with MMDetection to detect layout
+ elements in pages of historical Swedish medical journals.
+
+ This model succeeds earlier Mask R-CNN-based approaches, offering improved detection performance and robustness on complex layouts.
+
+ This model was developed as part of the research project
+ **Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
+ (Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.
+
+ Project page:
+ https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper
+
+ ## Model Details
+
+ - **Model type:** Co-DINO (Vision Transformer backbone)
+ - **Framework:** MMDetection
+ - **Fine-tuned for:** Historical document layout analysis
+ - **Language of source documents:** Swedish
+ - **Strengths:** Improved detection accuracy on complex layouts and multi-scale elements
+
+ ## Supported Labels
+
+ | Label            |
+ |------------------|
+ | Advertisement    |
+ | Author           |
+ | Header or Footer |
+ | Image            |
+ | List             |
+ | Page Number      |
+ | Table            |
+ | Text             |
+ | Title            |
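
The inference snippet in this README lowercases class names before attaching them to blocks, so downstream code sees `header or footer` rather than `Header or Footer`. The following is a minimal sketch of a helper that validates label strings against the table. `SUPPORTED_LABELS` and `normalize_label` are hypothetical names introduced here for illustration, and the tuple follows the table's order, which is not confirmed to match the checkpoint's class indices.

```python
# Label names copied from the Supported Labels table. The order is the
# table's order and is NOT confirmed to match the checkpoint's class indices.
SUPPORTED_LABELS = (
    "Advertisement",
    "Author",
    "Header or Footer",
    "Image",
    "List",
    "Page Number",
    "Table",
    "Text",
    "Title",
)

# Lowercased lookup set, mirroring the lowercasing done at inference time.
_KNOWN_LABELS = {label.lower() for label in SUPPORTED_LABELS}


def normalize_label(name: str) -> str:
    """Return the lowercase form of a label, or raise ValueError if unknown."""
    key = name.strip().lower()
    if key not in _KNOWN_LABELS:
        raise ValueError(f"Unknown layout label: {name!r}")
    return key


print(normalize_label("Header or Footer"))  # header or footer
```

A check like this can catch typos when filtering detections by type, for example when keeping only `table` blocks for later processing.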
+
+ ## Usage
+
+ ### Installation
+
+ Installation and fine-tuning instructions are available in the Co-DETR repository:
+ https://github.com/Sense-X/Co-DETR?tab=readme-ov-file
+
61
+ ```python
62
+ import cv2
63
+ import layoutparser as lp
64
+ import matplotlib.pyplot as plt
65
+ from mmdet.apis import init_detector, inference_detector
66
+
67
+ # Configuration
68
+ config_file = "co_dino_5scale_vit_large_coco.py"
69
+ checkpoint_file = "SweMPer-layout.pth"
70
+ score_thr = 0.50
71
+ device = "cuda:0"
72
+
73
+ # Initialize model
74
+ model = init_detector(config_file, checkpoint_file, device=device)
75
+
76
+ # Get class names from model
77
+ def get_classes(model):
78
+ m = getattr(model, "module", model)
79
+ classes = getattr(m, "CLASSES", None)
80
+ if classes:
81
+ return list(classes)
82
+ meta = getattr(m, "dataset_meta", None)
83
+ if meta and isinstance(meta, dict) and "classes" in meta:
84
+ return list(meta["classes"])
85
+ return None
86
+
87
+ classes = get_classes(model)
88
+
89
+ # Convert MMDet results to LayoutParser layout
90
+ def mmdet_to_layout(result, classes, thr=0.50):
91
+ bbox_result = result[0] if isinstance(result, tuple) else result
92
+ blocks = []
93
+ for cls_id, dets in enumerate(bbox_result):
94
+ if dets is None or len(dets) == 0:
95
+ continue
96
+ cls_name = classes[cls_id].lower() if classes else str(cls_id)
97
+ for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
98
+ rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
99
+ blocks.append(
100
+ lp.TextBlock(block=rect, type=cls_name, score=float(score))
101
+ )
102
+ return lp.Layout(blocks)
103
+
104
+ # Run inference
105
+ image_path = "<path_to_image>"
106
+ result = inference_detector(model, image_path)
107
+ layout = mmdet_to_layout(result, classes, thr=score_thr)
108
+
109
+ # Print detected elements
110
+ for block in layout:
111
+ print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")
112
+
113
+ # Visualize results
114
+ image = cv2.imread(image_path)[..., ::-1] # BGR to RGB
115
+ viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
116
+ plt.figure(figsize=(12, 16))
117
+ plt.imshow(viz)
118
+ plt.axis("off")
119
+ plt.show()
120
+ ```
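
The conversion step in the snippet above depends on how the detector lays out its results. As a minimal sketch of that filtering logic, the dependency-free version below uses plain Python lists in place of NumPy arrays. It assumes the MMDetection 2.x convention of one list of `[x1, y1, x2, y2, score]` rows per class. `filter_detections`, `toy_result`, and `toy_classes` are hypothetical names introduced for illustration, and the class order shown is not the checkpoint's actual order.

```python
from typing import Dict, List, Optional, Sequence


def filter_detections(
    bbox_result: Sequence[Sequence[Sequence[float]]],
    classes: Optional[Sequence[str]] = None,
    thr: float = 0.5,
) -> List[Dict[str, object]]:
    """Flatten per-class detections, keeping those with score >= thr."""
    kept = []
    for cls_id, dets in enumerate(bbox_result):
        # Fall back to the numeric class id when no names are available
        label = classes[cls_id].lower() if classes else str(cls_id)
        for x1, y1, x2, y2, score in dets:
            if score >= thr:
                kept.append(
                    {"type": label, "box": (x1, y1, x2, y2), "score": score}
                )
    # Highest-confidence detections first
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept


# Toy data: the index in the outer list is the class id (hypothetical order).
toy_classes = ["Title", "Table", "Text"]
toy_result = [
    [[10.0, 20.0, 200.0, 60.0, 0.92]],        # class 0: one confident box
    [],                                       # class 1: no detections
    [[15.0, 80.0, 400.0, 300.0, 0.71],        # class 2: one box kept,
     [15.0, 310.0, 400.0, 330.0, 0.32]],      #          one below threshold
]

detections = filter_detections(toy_result, toy_classes, thr=0.5)
for det in detections:
    print(det["type"], det["score"], det["box"])
```

Raising `thr` trades recall for precision: with `thr=0.8` only the `title` box in the toy data would remain.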
+
+ ## Acknowledgements
+
+ This work was carried out within the project:
+ **Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011**
+ (Project ID: **IN22-0017**), funded by **Riksbankens Jubileumsfond**.
+
+ We gratefully acknowledge the support of the funder and project collaborators.
+
+ This model builds upon the excellent work of:
+
+ - [MMDetection](https://github.com/open-mmlab/mmdetection)
+ - [Sense-X/Co-DETR](https://github.com/Sense-X/Co-DETR?tab=readme-ov-file)
+
+ We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.