basiliskan/YOLOMH


	---
	license: apache-2.0
	library_name: doclayout-yolo
	tags:
	- document-layout-analysis
	- object-detection
	- yolo
	- endpoints-template
	pipeline_tag: object-detection
	---

	# DocLayout-YOLO for Document Layout Analysis

	This model is based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO), fine-tuned on DocStructBench for document layout detection.

	## Model Description

	DocLayout-YOLO is a real-time, robust layout detection model for diverse document types, built on YOLOv10. It detects the following ten layout classes:

	- `title` - Document and section titles
	- `plain_text` - Regular text blocks
	- `figure` - Images and graphics
	- `figure_caption` - Captions for figures
	- `table` - Tables
	- `table_caption` - Captions for tables
	- `table_footnote` - Footnotes attached to tables
	- `isolate_formula` - Standalone (display) mathematical formulas
	- `formula_caption` - Captions or labels for formulas
	- `abandon` - Elements to discard, such as page headers, footers, and page numbers
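
When post-processing results, it can help to keep the label names in one place. The snippet below is a minimal sketch, assuming each detection is a dict with a `label` key as in the response format documented in this card. The names `LAYOUT_LABELS`, `summarize_labels`, and `drop_abandoned` are hypothetical helpers, not part of the doclayout-yolo library, and the tuple order is not meant to reflect the model's internal class indices.

```python
from collections import Counter

# Label names as listed in this model card.
# Assumption: this order is for readability only, NOT the model's class index order.
LAYOUT_LABELS = (
    "title",
    "plain_text",
    "figure",
    "figure_caption",
    "table",
    "table_caption",
    "table_footnote",
    "isolate_formula",
    "formula_caption",
    "abandon",
)


def summarize_labels(detections):
    """Count detections per label, rejecting labels outside the documented set."""
    counts = Counter()
    for det in detections:
        label = det["label"]
        if label not in LAYOUT_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return dict(counts)


def drop_abandoned(detections):
    """Return only the detections that are not marked as 'abandon'."""
    return [det for det in detections if det["label"] != "abandon"]
```

Dropping `abandon` regions before text extraction is a common choice, but whether it suits a given pipeline depends on whether headers and footers carry useful information.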

	## Usage via Inference Endpoint

	```python
	import requests
	import base64

	API_URL = "https://your-endpoint-url.huggingface.cloud"
	headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

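	# Load the image and base64-encode it for the JSON payload
	with open("document.png", "rb") as f:
	    image_b64 = base64.b64encode(f.read()).decode()

	# Send the image and detection parameters to the endpoint
	response = requests.post(
	    API_URL,
	    headers=headers,
	    json={
	        "inputs": image_b64,
	        "parameters": {
	            "confidence": 0.2,
	            "iou_threshold": 0.45,
	        },
	    },
	)
	response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body

	detections = response.json()
	print(detections)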
	```
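
The request body can also be built and checked separately from the HTTP call, which makes it easier to test without a live endpoint. The following is a sketch under assumptions: `build_payload` is a hypothetical helper, and the card does not define the two parameters. Their names suggest a minimum detection score and a box-overlap threshold, and the `[0, 1]` range check is an assumption rather than documented endpoint behavior.

```python
import base64


def build_payload(image_bytes, confidence=0.2, iou_threshold=0.45):
    """Build the JSON body used in the request example above.

    Assumption: both parameters are fractions in [0, 1]; the model card
    does not state the valid range.
    """
    for name, value in (("confidence", confidence), ("iou_threshold", iou_threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        # The endpoint example sends the raw image bytes as a base64 string.
        "inputs": base64.b64encode(image_bytes).decode("ascii"),
        "parameters": {
            "confidence": confidence,
            "iou_threshold": iou_threshold,
        },
    }
```

The returned dict can be passed directly as the `json=` argument of `requests.post`.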

	## Response Format

	```json
	[
	  {
	    "label": "title",
	    "score": 0.95,
	    "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}
	  },
	  {
	    "label": "plain_text",
	    "score": 0.92,
	    "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}
	  }
	]
	```
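
A response in this shape can be turned into typed objects for downstream processing. This is a minimal sketch, not the endpoint's own client code: the `Detection` class and `parse_detections` function are hypothetical names, and it assumes that `(x1, y1)` is the top-left corner and `(x2, y2)` the bottom-right corner in image pixel coordinates, which the card does not state explicitly. Sorting by `(y1, x1)` is a naive top-to-bottom order that only approximates reading order on single-column pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str
    score: float
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self):
        # Assumes (x1, y1) is the top-left and (x2, y2) the bottom-right corner.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def parse_detections(raw, min_score=0.0):
    """Convert the decoded JSON list into Detection objects, sorted top to bottom."""
    detections = []
    for item in raw:
        box = item["box"]
        det = Detection(
            label=item["label"],
            score=float(item["score"]),
            x1=float(box["x1"]),
            y1=float(box["y1"]),
            x2=float(box["x2"]),
            y2=float(box["y2"]),
        )
        if det.score >= min_score:
            detections.append(det)
    return sorted(detections, key=lambda d: (d.y1, d.x1))


# The sample response from this section, as decoded by response.json()
sample = [
    {"label": "title", "score": 0.95,
     "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}},
    {"label": "plain_text", "score": 0.92,
     "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}},
]

for det in parse_detections(sample):
    print(det.label, det.score, det.area)
```

Passing a higher `min_score` applies a second, client-side score filter on top of whatever the endpoint already applied.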

	## Credits

	Based on [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO). If you use this model, cite the original paper:

	```bibtex
	@misc{zhao2024doclayoutyoloenhancingdocumentlayout,
	  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
	  author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
	  year={2024},
	  eprint={2410.12628},
	  archivePrefix={arXiv},
	  primaryClass={cs.CV}
	}
	```