	---
	license: apache-2.0
	tags:
	- document-layout
	- object-detection
	- yolo
	- document-analysis
	library_name: ultralytics
	---

	# DocLayout-YOLO - DocStructBench

	A document layout detection model based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO), provided as PyTorch (`model.pt`) and ONNX (`model.onnx`) weights.

	## Model Description

	- Architecture: YOLOv10m with G2L_CRM (Global-to-Local Controllable Receptive Module)
	- Classes: 10 document layout elements
	- Input Size: 1024x1024
	- Paper: [DocLayout-YOLO](https://arxiv.org/abs/2410.12628)

	### Classes

	- `title`
	- `plain_text`
	- `abandon`
	- `figure`
	- `figure_caption`
	- `table`
	- `table_caption`
	- `table_footnote`
	- `isolate_formula`
	- `formula_caption`

	## Usage

	### PyTorch

	```python
	from huggingface_hub import snapshot_download
	import sys

	# Download model (includes code + weights)
	repo_path = snapshot_download("anyformat-ai/doclayout-yolo-docstructbench")

	# Import and use
	sys.path.insert(0, repo_path)
	from doclayout_yolo import DocLayoutModel

	model = DocLayoutModel(f"{repo_path}/model.pt")
	results = model.predict("document.png")

	# Each detection is a dict with the class name, confidence, and bounding box
	for det in results:
	    print(f"{det['class_name']}: {det['confidence']:.2f} at {det['bbox']}")
	```
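Downstream code usually wants detections filtered and grouped rather than printed. The following is a minimal sketch, assuming each detection is a dict with the `class_name`, `confidence`, and `bbox` keys used in the loop above. The helper name `group_by_class`, the `0.25` threshold, the `(x1, y1, x2, y2)` box layout, and the sample data are all assumptions made for illustration and are not part of this repository.

```python
from collections import defaultdict


def group_by_class(detections, min_confidence=0.25):
    """Group detections by class name, dropping low-confidence ones.

    Hypothetical helper: assumes each detection is a dict with
    'class_name', 'confidence', and 'bbox' keys.
    """
    grouped = defaultdict(list)
    for det in detections:
        if det["confidence"] >= min_confidence:
            grouped[det["class_name"]].append(det)
    # Sort each group top-to-bottom, assuming bbox = (x1, y1, x2, y2).
    for dets in grouped.values():
        dets.sort(key=lambda d: d["bbox"][1])
    return dict(grouped)


# Made-up detections, for illustration only.
sample = [
    {"class_name": "title", "confidence": 0.95, "bbox": [50, 40, 900, 120]},
    {"class_name": "plain_text", "confidence": 0.88, "bbox": [50, 600, 900, 800]},
    {"class_name": "plain_text", "confidence": 0.91, "bbox": [50, 150, 900, 500]},
    {"class_name": "abandon", "confidence": 0.10, "bbox": [0, 0, 1000, 30]},
]

grouped = group_by_class(sample)
for name, dets in grouped.items():
    print(name, [d["bbox"] for d in dets])
```

If the real `bbox` layout differs (for example `(x, y, width, height)`), only the sort key needs to change.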

	### ONNX

	```python
	import onnxruntime as ort
	import numpy as np
	from huggingface_hub import hf_hub_download
	import json

	# Download ONNX model and config
	model_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "model.onnx")
	config_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "config.json")

	with open(config_path) as f:
	    config = json.load(f)

	session = ort.InferenceSession(model_path)
	# Preprocess image to (1, 3, 1024, 1024) float32, normalized to [0, 1]
	# Run inference and post-process outputs
	```
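The preprocessing step named in the comments above can be sketched as follows. This is one possible approach, not necessarily the one the exported model was validated with: it assumes aspect-preserving "letterbox" resizing with gray padding (value `114`, a common YOLO convention) and uses a NumPy-only nearest-neighbour resize as a stand-in for a proper resize from PIL or OpenCV. The function names `letterbox` and `to_original_coords` are hypothetical.

```python
import numpy as np

INPUT_SIZE = 1024  # from the model card: input is 1024x1024


def letterbox(image, size=INPUT_SIZE, pad_value=114):
    """Fit an HxWx3 uint8 image into a size x size canvas, keeping aspect ratio.

    Returns a (1, 3, size, size) float32 tensor in [0, 1], plus the scale
    and (left, top) padding needed to map boxes back to the original image.
    """
    h, w = image.shape[:2]
    scale = min(size / h, size / w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))

    # Nearest-neighbour resize via index sampling (stand-in for cv2/PIL).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows[:, None], cols[None, :]]

    # Centre the resized image on a constant-colour canvas.
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    canvas[top:top + new_h, left:left + new_w] = resized

    # HWC uint8 -> NCHW float32 in [0, 1].
    tensor = canvas.transpose(2, 0, 1)[None].astype(np.float32) / 255.0
    return tensor, scale, (left, top)


def to_original_coords(box, scale, pad):
    """Map an (x1, y1, x2, y2) box from letterboxed to original pixels."""
    left, top = pad
    x1, y1, x2, y2 = box
    return ((x1 - left) / scale, (y1 - top) / scale,
            (x2 - left) / scale, (y2 - top) / scale)


# Synthetic 500x1000 black page, for illustration only.
page = np.zeros((500, 1000, 3), dtype=np.uint8)
tensor, scale, pad = letterbox(page)
print(tensor.shape, tensor.dtype, scale, pad)
```

The resulting tensor can be passed to the session with `session.run(None, {session.get_inputs()[0].name: tensor})`. The layout of the output array is not documented here, so inspect `session.get_outputs()` and `config.json` before writing the post-processing step.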

	## Requirements

	```
	ultralytics
	huggingface-hub
	onnxruntime # for ONNX inference
	```

	## Citation

	```bibtex
	@article{zhao2024doclayout,
	  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
	  author={Zhao, Zhiyuan and Kang, Hengrui and Wang, Bin and He, Conghui},
	  journal={arXiv preprint arXiv:2410.12628},
	  year={2024}
	}
	```

	## License

	Apache 2.0