---
license: apache-2.0
library_name: doclayout-yolo
tags:
- document-layout-analysis
- object-detection
- yolo
- endpoints-template
pipeline_tag: object-detection
---
# DocLayout-YOLO for Document Layout Analysis
This model is based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO) and fine-tuned on DocStructBench for document layout detection.
## Model Description
DocLayout-YOLO is a real-time, robust layout detection model for diverse document types, built on YOLOv10. It detects the following layout classes:
- **title** - Document titles
- **plain_text** - Regular text blocks
- **figure** - Images and graphics
- **figure_caption** - Captions for figures
- **table** - Tables
- **table_caption** - Captions for tables
- **table_footnote** - Footnotes in tables
- **isolate_formula** - Standalone (display) mathematical formulas
- **formula_caption** - Captions for formulas
- **abandon** - Elements to discard, such as page headers, footers, and page numbers
## Usage via Inference Endpoint
```python
import requests
import base64
API_URL = "https://your-endpoint-url.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}
# Load the image and encode it as a base64 string
with open("document.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Send the image and detection parameters to the endpoint
response = requests.post(
    API_URL,
    headers=headers,
    json={
        "inputs": image_b64,
        "parameters": {
            "confidence": 0.2,
            "iou_threshold": 0.45,
        },
    },
)
response.raise_for_status()  # fail early on HTTP errors
detections = response.json()
print(detections)
```
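
The `iou_threshold` parameter presumably refers to intersection-over-union (IoU), the standard overlap measure between two bounding boxes; how the endpoint handler applies it is not documented here. As a minimal sketch of the measure itself, assuming boxes are dicts with `x1`, `y1`, `x2`, `y2` corner coordinates (the helper name `box_iou` is hypothetical):

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as dicts with x1, y1, x2, y2."""
    # Width and height of the overlap rectangle; clamped to 0 when boxes are disjoint.
    inter_w = max(0, min(a["x2"], b["x2"]) - max(a["x1"], b["x1"]))
    inter_h = max(0, min(a["y2"], b["y2"]) - max(a["y1"], b["y1"]))
    inter = inter_w * inter_h

    area_a = (a["x2"] - a["x1"]) * (a["y2"] - a["y1"])
    area_b = (b["x2"] - b["x1"]) * (b["y2"] - b["y1"])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes yield 0.0 instead of dividing by zero.
    return inter / union if union > 0 else 0.0


left = {"x1": 0, "y1": 0, "x2": 10, "y2": 10}
right = {"x1": 5, "y1": 0, "x2": 15, "y2": 10}
print(box_iou(left, right))
```

Two boxes whose IoU exceeds a threshold such as 0.45 overlap heavily and are typically treated as candidates for the same region.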
## Response Format
```json
[
  {
    "label": "title",
    "score": 0.95,
    "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}
  },
  {
    "label": "plain_text",
    "score": 0.92,
    "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}
  }
]
```
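
A response in this shape can be post-processed on the client. The sketch below filters detections by label and score, then orders them top to bottom. The helper name `select_detections` and its parameters are hypothetical, and sorting by the top-left corner is only a rough approximation of reading order for single-column pages.

```python
def select_detections(detections, labels=None, min_score=0.0):
    """Keep detections matching the given labels and score, sorted top to bottom."""
    kept = [
        d for d in detections
        if d["score"] >= min_score and (labels is None or d["label"] in labels)
    ]
    # Sort by the top edge first, then by the left edge.
    return sorted(kept, key=lambda d: (d["box"]["y1"], d["box"]["x1"]))


# Sample data copied from the response format above, in reversed order.
sample = [
    {"label": "plain_text", "score": 0.92,
     "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}},
    {"label": "title", "score": 0.95,
     "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}},
]

for det in select_detections(sample, min_score=0.5):
    print(det["label"], det["score"], det["box"])
```

Passing `labels={"table", "figure"}`, for example, would restrict the result to those two classes.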
## Credits
Based on [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)
```bibtex
@misc{zhao2024doclayoutyoloenhancingdocumentlayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
  year={2024},
  eprint={2410.12628},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```