
	---
	license: apache-2.0
	tags:
	- object-detection
	- document-layout-analysis
	- historical-documents
	- layoutparser
	- mmdetection
	- co-dino
	- vision-transformer
	language:
	- sv
	pipeline_tag: object-detection
	---

	# Historical Document Layout Detection Model (Co-DETR / DINO)

A Co-DINO model (a Vision Transformer-based detector implemented in MMDetection), fine-tuned to detect layout elements on pages of historical Swedish medical journals.

This model succeeds the earlier Mask R-CNN-based model [cdhu-uu/SweMPer-layout-lite](https://huggingface.co/cdhu-uu/SweMPer-layout-lite) and offers improved detection performance and robustness on complex layouts.

	This model was developed as part of the research project:
	Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011
	(Project ID: IN22-0017), funded by Riksbankens Jubileumsfond.

	Project page:
	https://www.uu.se/en/department/history-of-science-and-ideas/research/research-projects-and-programmes/communicating-medicine-swemper

	## Model Details

	- Model type: Co-DINO (Vision Transformer backbone)
	- Framework: MMDetection
	- Fine-tuned for: Historical document layout analysis
	- Language of source documents: Swedish
- Strengths: Improved detection precision on complex layouts

	## Supported Labels

| Label            |
|------------------|
| Advertisement    |
| Author           |
| Header or Footer |
| Image            |
| List             |
| Page Number      |
| Table            |
| Text             |
| Title            |
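The inference example in this card lowercases class names, so, assuming the model's class names match the labels above, detections carry types such as `"header or footer"` and `"page number"`. A common post-processing step is to separate running page furniture from body content. The sketch below is a hypothetical helper (`split_furniture` is not part of the model or any library) that works on plain `(type, score, box)` tuples; which labels count as furniture is an illustrative choice.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[str, float, Box]  # (type, score, (x1, y1, x2, y2))

# Labels from the table above, lowercased as in the inference example.
SUPPORTED_LABELS = {
    "advertisement", "author", "header or footer", "image",
    "list", "page number", "table", "text", "title",
}

# Illustrative assumption: headers/footers and page numbers are page furniture.
PAGE_FURNITURE = {"header or footer", "page number"}


def split_furniture(detections: List[Detection]) -> Tuple[List[Detection], List[Detection]]:
    """Split detections into (content, furniture); reject labels not in the table."""
    content: List[Detection] = []
    furniture: List[Detection] = []
    for det in detections:
        label = det[0].lower()
        if label not in SUPPORTED_LABELS:
            raise ValueError(f"Unexpected label: {det[0]!r}")
        (furniture if label in PAGE_FURNITURE else content).append(det)
    return content, furniture


# Hypothetical detections for one page.
dets = [
    ("Title", 0.97, (50.0, 40.0, 600.0, 90.0)),
    ("page number", 0.91, (300.0, 1500.0, 330.0, 1530.0)),
    ("text", 0.88, (50.0, 120.0, 600.0, 700.0)),
]
content, furniture = split_furniture(dets)
print([d[0] for d in content])    # ['Title', 'text']
print([d[0] for d in furniture])  # ['page number']
```

Raising on unknown labels makes a mismatch between the model's class names and this table visible early instead of silently dropping blocks.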

	## Evaluation Metrics
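The model achieves the following COCO-style bounding-box metrics (in %):

| AP   | AP50 | AP75 | APs  | APm  | APl  |
|------|------|------|------|------|------|
| 80.7 | 98.4 | 87.4 | 51.5 | 69.6 | 88.2 |

AP is averaged over IoU thresholds from 0.50 to 0.95; AP50 and AP75 use single IoU thresholds of 0.50 and 0.75; APs, APm, and APl report AP for small, medium, and large objects.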
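AP50 and AP75 differ only in how much a predicted box must overlap a ground-truth box, measured as intersection over union (IoU). The minimal sketch below computes IoU for boxes given as `(x1, y1, x2, y2)` coordinates and shows a prediction that would count as a match at the 0.50 threshold but not at 0.75. It illustrates the matching criterion only; it is not the evaluation code behind the table.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0.0, 0.0, 100.0, 100.0)
prediction = (20.0, 0.0, 120.0, 100.0)  # same size, shifted 20 px to the right

overlap = iou(ground_truth, prediction)
print(f"IoU = {overlap:.3f}")            # IoU = 0.667
print("match at 0.50:", overlap >= 0.50)  # match at 0.50: True
print("match at 0.75:", overlap >= 0.75)  # match at 0.75: False
```

A 20 px shift on a 100 px box is enough to fail the stricter threshold, which is one reason AP75 sits well below AP50 in the table.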

	## Usage

	### Installation

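Installation and fine-tuning instructions are available in the Co-DETR repository:
https://github.com/Sense-X/Co-DETR?tab=readme-ov-file

The inference example below additionally requires `layoutparser`, `opencv-python`, and `matplotlib`.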

	### Inference

```python
import cv2
import layoutparser as lp
import matplotlib.pyplot as plt
from mmdet.apis import init_detector, inference_detector

# Configuration
config_file = "co_dino_5scale_vit_large_coco.py"  # Co-DINO config file
checkpoint_file = "SweMPer-layout.pth"  # fine-tuned checkpoint
score_thr = 0.50  # minimum confidence for a detection to be kept
device = "cuda:0"

# Initialize model
model = init_detector(config_file, checkpoint_file, device=device)


# Get class names from the model (MMDetection 2.x: CLASSES, 3.x: dataset_meta)
def get_classes(model):
    m = getattr(model, "module", model)
    classes = getattr(m, "CLASSES", None)
    if classes:
        return list(classes)
    meta = getattr(m, "dataset_meta", None)
    if meta and isinstance(meta, dict) and "classes" in meta:
        return list(meta["classes"])
    return None


classes = get_classes(model)


# Convert MMDetection 2.x results (one [x1, y1, x2, y2, score] array per class)
# to a LayoutParser layout
def mmdet_to_layout(result, classes, thr=0.50):
    # Models with a mask head return (bbox_result, segm_result); keep the boxes
    bbox_result = result[0] if isinstance(result, tuple) else result
    blocks = []
    for cls_id, dets in enumerate(bbox_result):
        if dets is None or len(dets) == 0:
            continue
        cls_name = classes[cls_id].lower() if classes else str(cls_id)
        # Keep only rows whose score (last column) reaches the threshold
        for x1, y1, x2, y2, score in dets[dets[:, -1] >= thr]:
            rect = lp.Rectangle(float(x1), float(y1), float(x2), float(y2))
            blocks.append(
                lp.TextBlock(block=rect, type=cls_name, score=float(score))
            )
    return lp.Layout(blocks)


# Run inference
image_path = "<path_to_image>"
result = inference_detector(model, image_path)
layout = mmdet_to_layout(result, classes, thr=score_thr)

# Print detected elements
for block in layout:
    print(f"Type: {block.type}, Score: {block.score:.3f}, Box: {block.coordinates}")

# Visualize results (OpenCV loads images as BGR; convert to RGB for display)
image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
viz = lp.draw_box(image, layout, box_width=3, show_element_type=True)
plt.figure(figsize=(12, 16))
plt.imshow(viz)
plt.axis("off")
plt.show()
```
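The `mmdet_to_layout` function above assumes the MMDetection 2.x result layout: one array per class, each row holding `[x1, y1, x2, y2, score]`, optionally wrapped in a `(bbox, segm)` tuple. To make that structure concrete without a GPU, `numpy`, or `layoutparser`, the sketch below replays the same thresholding logic on a hand-made mock result in plain Python. The mock values, the three-class list, and the `filter_detections` name are hypothetical; MMDetection 3.x returns a `DetDataSample` instead, which this sketch does not cover.

```python
from typing import List, Optional, Sequence, Tuple, Union

Row = Sequence[float]        # [x1, y1, x2, y2, score]
PerClass = List[List[Row]]   # one list of rows per class index
Detection = Tuple[str, float, Tuple[float, float, float, float]]


def filter_detections(
    result: Union[PerClass, Tuple[PerClass, object]],
    classes: Optional[Sequence[str]],
    thr: float = 0.50,
) -> List[Detection]:
    """Plain-Python mirror of mmdet_to_layout's thresholding logic."""
    # Mask models return (bbox_result, segm_result); keep only the boxes.
    bbox_result = result[0] if isinstance(result, tuple) else result
    detections: List[Detection] = []
    for cls_id, rows in enumerate(bbox_result):
        name = classes[cls_id].lower() if classes else str(cls_id)
        for x1, y1, x2, y2, score in rows:
            if score >= thr:
                detections.append((name, float(score), (x1, y1, x2, y2)))
    return detections


# Hypothetical mock result for a model with three classes.
mock_classes = ["Text", "Title", "Page Number"]
mock_result = [
    [[50, 120, 600, 700, 0.93], [50, 720, 600, 1400, 0.41]],  # Text
    [[50, 40, 600, 90, 0.97]],                                 # Title
    [],                                                        # Page Number
]

for det in filter_detections(mock_result, mock_classes, thr=0.50):
    print(det)
# ('text', 0.93, (50, 120, 600, 700))
# ('title', 0.97, (50, 40, 600, 90))
```

Lowering `thr` to 0.40 would also keep the second `Text` row (score 0.41); this is the trade-off the `score_thr` setting controls in the inference script.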

	## Acknowledgements

	This work was carried out within the project:
	Communicating Medicine (SweMPer): Digitalisation of Swedish Medical Periodicals, 1781–2011
	(Project ID: IN22-0017), funded by Riksbankens Jubileumsfond.

	We gratefully acknowledge the support of the funder and project collaborators.

	This model builds upon the excellent work of:

	- [Sense-X/Co-DETR](https://github.com/Sense-X/Co-DETR?tab=readme-ov-file)
	- [MMDetection](https://github.com/open-mmlab/mmdetection)

	We thank the contributors and maintainers of these projects for making their tools publicly available and supporting research.