---
license: apache-2.0
library_name: doclayout-yolo
tags:
  - document-layout-analysis
  - object-detection
  - yolo
  - endpoints-template
pipeline_tag: object-detection
---

# DocLayout-YOLO for Document Layout Analysis

This model is based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO), fine-tuned on DocStructBench for document layout detection.

## Model Description

DocLayout-YOLO is a real-time, robust layout detection model for diverse document types, built on YOLOv10. It detects the following 10 classes:

- **title** - Document titles
- **plain_text** - Regular text blocks
- **figure** - Images and graphics
- **figure_caption** - Captions for figures
- **table** - Tables
- **table_caption** - Captions for tables
- **table_footnote** - Footnotes in tables
- **isolate_formula** - Mathematical formulas
- **formula_caption** - Captions for formulas
- **abandon** - Page elements to discard, such as headers, footers, and page numbers

## Usage via Inference Endpoint

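```python
import base64

import requests

API_URL = "https://your-endpoint-url.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

# Read the page image and encode it as a base64 string
with open("document.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Send the request; "parameters" are forwarded to the endpoint handler
response = requests.post(
    API_URL,
    headers=headers,
    json={
        "inputs": image_b64,
        "parameters": {
            "confidence": 0.2,      # minimum detection score to keep
            "iou_threshold": 0.45   # overlap threshold between boxes
        }
    },
    timeout=60,
)
response.raise_for_status()  # fail loudly on HTTP errors (e.g. 401, 503)

detections = response.json()
print(detections)
```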
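The `iou_threshold` parameter is an overlap threshold measured as intersection over union (IoU) between two boxes. How the endpoint handler applies it (for example, whether it drives a non-maximum suppression step) is not documented here, so the snippet below only illustrates the metric itself. It is a minimal sketch assuming boxes follow the Response Format below, with `(x1, y1)` as the top-left and `(x2, y2)` as the bottom-right corner in pixels; `box_area` and `iou` are hypothetical helper names.

```python
from typing import Mapping

# Hypothetical alias: a box as returned by the endpoint, {"x1", "y1", "x2", "y2"},
# assumed to be top-left (x1, y1) and bottom-right (x2, y2) corners in pixels.
Box = Mapping[str, float]


def box_area(box: Box) -> float:
    """Area of a corner-format box; inverted or empty boxes have zero area."""
    width = max(0.0, box["x2"] - box["x1"])
    height = max(0.0, box["y2"] - box["y1"])
    return width * height


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, a value in [0, 1]."""
    # Corners of the overlapping rectangle (empty if the boxes do not touch).
    inter = {
        "x1": max(a["x1"], b["x1"]),
        "y1": max(a["y1"], b["y1"]),
        "x2": min(a["x2"], b["x2"]),
        "y2": min(a["y2"], b["y2"]),
    }
    inter_area = box_area(inter)
    union = box_area(a) + box_area(b) - inter_area
    return inter_area / union if union > 0 else 0.0
```

For the two sample boxes in the Response Format section (the title and the text block), `iou` returns `0.0` because they do not overlap; identical boxes give `1.0`.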

## Response Format

```json
[
  {
    "label": "title",
    "score": 0.95,
    "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}
  },
  {
    "label": "plain_text",
    "score": 0.92,
    "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}
  }
]
```
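A typical next step is to drop `abandon` regions and low-scoring hits, then put the remaining blocks in a rough reading order. The following is a minimal sketch assuming the response is a JSON list shaped like the example above; `Detection`, `parse_detections`, `content_blocks`, and the `0.5` default cutoff are hypothetical names and values chosen for illustration. Sorting by top edge, then left edge, is a simplification that will misorder multi-column pages.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Detection:
    """One detected layout region (hypothetical client-side type)."""
    label: str
    score: float
    x1: float
    y1: float
    x2: float
    y2: float


def parse_detections(payload: Iterable[dict]) -> List[Detection]:
    """Convert the endpoint's JSON list into Detection objects."""
    return [
        Detection(
            label=item["label"],
            score=float(item["score"]),
            x1=float(item["box"]["x1"]),
            y1=float(item["box"]["y1"]),
            x2=float(item["box"]["x2"]),
            y2=float(item["box"]["y2"]),
        )
        for item in payload
    ]


def content_blocks(detections: Iterable[Detection], min_score: float = 0.5) -> List[Detection]:
    """Drop 'abandon' regions and low-confidence hits, then sort
    top-to-bottom, left-to-right (naive single-column reading order)."""
    kept = [d for d in detections if d.label != "abandon" and d.score >= min_score]
    return sorted(kept, key=lambda d: (d.y1, d.x1))


if __name__ == "__main__":
    # Sample input: the two documented detections in reverse order, plus a
    # hypothetical page-footer ("abandon") detection added for illustration.
    sample = json.loads("""[
      {"label": "plain_text", "score": 0.92,
       "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}},
      {"label": "title", "score": 0.95,
       "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}},
      {"label": "abandon", "score": 0.90,
       "box": {"x1": 100, "y1": 750, "x2": 500, "y2": 770}}
    ]""")
    for block in content_blocks(parse_detections(sample)):
        print(block.label, block.score)
```

Run as a script, this prints `title 0.95` followed by `plain_text 0.92`: the footer is discarded and the title moves ahead of the text block because its top edge is higher on the page.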

## Credits

Based on [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)

```bibtex
@misc{zhao2024doclayoutyoloenhancingdocumentlayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
  year={2024},
  eprint={2410.12628},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```