---
license: apache-2.0
library_name: doclayout-yolo
tags:
- document-layout-analysis
- object-detection
- yolo
- endpoints-template
pipeline_tag: object-detection
---
# DocLayout-YOLO for Document Layout Analysis
This model is based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO), fine-tuned on DocStructBench for document layout detection.
## Model Description
DocLayout-YOLO is a real-time, robust layout detection model for diverse documents, based on YOLOv10. It detects the following ten layout classes:
- **title** - Document titles
- **plain_text** - Regular text blocks
- **figure** - Images and graphics
- **figure_caption** - Captions for figures
- **table** - Tables
- **table_caption** - Captions for tables
- **table_footnote** - Footnotes in tables
- **isolate_formula** - Standalone (display) mathematical formulas
- **formula_caption** - Captions for formulas
- **abandon** - Elements to ignore, such as page headers, footers, and page numbers
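
Downstream code often needs to treat these classes differently, for example dropping `abandon` regions before text extraction. The following is a minimal sketch, assuming detections arrive as dictionaries with a `label` key as in the Response Format section. `LAYOUT_LABELS` and `group_by_label` are hypothetical names for illustration and are not part of DocLayout-YOLO.

```python
from collections import defaultdict

# The ten class names listed above (hypothetical constant, not exported by any library).
LAYOUT_LABELS = (
    "title", "plain_text", "figure", "figure_caption", "table",
    "table_caption", "table_footnote", "isolate_formula",
    "formula_caption", "abandon",
)


def group_by_label(detections, drop=("abandon",)):
    """Group detection dicts by label, skipping any label listed in `drop`."""
    groups = defaultdict(list)
    for det in detections:
        label = det["label"]
        if label not in LAYOUT_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        if label in drop:
            continue
        groups[label].append(det)
    return dict(groups)
```

Passing `drop=()` keeps every class, which may be useful when inspecting what the model marks as `abandon`.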
## Usage via Inference Endpoint
```python
import base64

import requests

API_URL = "https://your-endpoint-url.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

# Load the image and base64-encode it
with open("document.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Send the encoded image together with the detection parameters
response = requests.post(
    API_URL,
    headers=headers,
    json={
        "inputs": image_b64,
        "parameters": {
            "confidence": 0.2,
            "iou_threshold": 0.45,
        },
    },
)
response.raise_for_status()  # fail early on HTTP errors

detections = response.json()
print(detections)
```
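
When sending many pages, it can help to factor the payload construction out of the request. The sketch below assumes the endpoint accepts exactly the JSON shape used in the request above. `build_payload` is a hypothetical helper, and the range check assumes both thresholds are fractions between 0 and 1, which this README does not state.

```python
import base64


def build_payload(image_bytes, confidence=0.2, iou_threshold=0.45):
    """Build the JSON body for the endpoint from raw image bytes."""
    for name, value in (("confidence", confidence), ("iou_threshold", iou_threshold)):
        # Assumption: both thresholds are fractions in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {
        "inputs": base64.b64encode(image_bytes).decode("ascii"),
        "parameters": {
            "confidence": confidence,
            "iou_threshold": iou_threshold,
        },
    }
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.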
## Response Format
```json
[
  {
    "label": "title",
    "score": 0.95,
    "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}
  },
  {
    "label": "plain_text",
    "score": 0.92,
    "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}
  }
]
```
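
A response in this shape can be converted into typed objects for filtering and ordering. The sketch below assumes every item carries the `label`, `score`, and `box` keys shown above. The `Detection` class and `parse_detections` function are hypothetical names, and sorting by the top-left corner is only a naive approximation of reading order. Coordinate units are not stated in this README, so the code treats them as plain numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str
    score: float
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        """Box area in squared coordinate units; zero for degenerate boxes."""
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def parse_detections(payload, min_score=0.0):
    """Convert the response list into Detection objects, sorted top to bottom, then left to right."""
    parsed = []
    for item in payload:
        if item["score"] < min_score:
            continue  # skip low-confidence detections
        box = item["box"]
        parsed.append(
            Detection(
                label=item["label"],
                score=float(item["score"]),
                x1=float(box["x1"]),
                y1=float(box["y1"]),
                x2=float(box["x2"]),
                y2=float(box["y2"]),
            )
        )
    return sorted(parsed, key=lambda d: (d.y1, d.x1))
```

For example, `parse_detections(response.json(), min_score=0.5)` would keep only detections scoring at least 0.5.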
## Credits
Based on [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
```bibtex
@misc{zhao2024doclayoutyoloenhancingdocumentlayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
  year={2024},
  eprint={2410.12628},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```