---
license: apache-2.0
library_name: doclayout-yolo
tags:
- document-layout-analysis
- object-detection
- yolo
- endpoints-template
pipeline_tag: object-detection
---

# DocLayout-YOLO for Document Layout Analysis

This model is based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO), fine-tuned on DocStructBench for document layout detection.

## Model Description

DocLayout-YOLO is a real-time and robust layout detection model for diverse documents, based on YOLO-v10. It can detect:

- **title** - Document titles
- **plain_text** - Regular text blocks
- **figure** - Images and graphics
- **figure_caption** - Captions for figures
- **table** - Tables
- **table_caption** - Captions for tables
- **table_footnote** - Footnotes in tables
- **isolate_formula** - Mathematical formulas
- **formula_caption** - Captions for formulas
- **abandon** - Elements to ignore

## Usage via Inference Endpoint

```python
import requests
import base64

API_URL = "https://your-endpoint-url.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

# Load and encode image
with open("document.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Make request
response = requests.post(
    API_URL,
    headers=headers,
    json={
        "inputs": image_b64,
        "parameters": {
            "confidence": 0.2,
            "iou_threshold": 0.45
        }
    }
)

detections = response.json()
print(detections)
```
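
If the endpoint is called from several places, it can help to build the request body in one function. The sketch below is one way to do that, not part of the model's API: `build_payload` is a hypothetical helper name, the field names (`inputs`, `confidence`, `iou_threshold`) are taken from the request example above, and the range check assumes both thresholds are meant to lie in [0, 1].

```python
import base64


def build_payload(image_bytes: bytes,
                  confidence: float = 0.2,
                  iou_threshold: float = 0.45) -> dict:
    """Build the JSON body used in the request example (hypothetical helper)."""
    # Assumption: both thresholds are fractions between 0 and 1.
    for name, value in (("confidence", confidence),
                        ("iou_threshold", iou_threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    return {
        # The image is sent as a base64-encoded string.
        "inputs": base64.b64encode(image_bytes).decode("ascii"),
        "parameters": {
            "confidence": confidence,
            "iou_threshold": iou_threshold,
        },
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, for example `requests.post(API_URL, headers=headers, json=build_payload(image_bytes))`.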

## Response Format

```json
[
  {
    "label": "title",
    "score": 0.95,
    "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}
  },
  {
    "label": "plain_text",
    "score": 0.92,
    "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}
  }
]
```
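
A response in this shape can be post-processed with plain Python. The following is a minimal sketch under stated assumptions: `filter_detections` and `box_area` are hypothetical helper names, the default `min_score` of 0.5 is arbitrary, and the box coordinates are assumed to be pixels with the origin at the top-left corner of the image. Sorting by `y1` and then `x1` is only a rough top-to-bottom ordering and will not recover reading order on multi-column pages.

```python
def box_area(box: dict) -> int:
    """Area of a box given as {"x1", "y1", "x2", "y2"}; 0 if degenerate."""
    width = max(0, box["x2"] - box["x1"])
    height = max(0, box["y2"] - box["y1"])
    return width * height


def filter_detections(detections: list,
                      min_score: float = 0.5,
                      drop_labels: tuple = ("abandon",)) -> list:
    """Keep confident detections, drop unwanted labels, sort top to bottom."""
    kept = [
        d for d in detections
        if d["score"] >= min_score and d["label"] not in drop_labels
    ]
    # Assumption: smaller y1 means higher on the page.
    return sorted(kept, key=lambda d: (d["box"]["y1"], d["box"]["x1"]))


# Example input in the documented response format (values are illustrative).
sample = [
    {"label": "plain_text", "score": 0.92,
     "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}},
    {"label": "abandon", "score": 0.99,
     "box": {"x1": 100, "y1": 10, "x2": 500, "y2": 30}},
    {"label": "title", "score": 0.95,
     "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}},
    {"label": "figure", "score": 0.30,
     "box": {"x1": 100, "y1": 420, "x2": 500, "y2": 700}},
]

for det in filter_detections(sample):
    print(det["label"], det["score"], box_area(det["box"]))
```

Dropping `abandon` by default follows the class list above, which describes it as elements to ignore; pass `drop_labels=()` to keep them.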

## Credits

Based on [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)

```bibtex
@misc{zhao2024doclayoutyoloenhancingdocumentlayout,
      title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
      author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
      year={2024},
      eprint={2410.12628},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```