---
license: apache-2.0
tags:
  - document-layout
  - object-detection
  - yolo
  - document-analysis
library_name: ultralytics
---

# DocLayout-YOLO - DocStructBench

Document layout detection model based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO).

## Model Description

- **Architecture**: YOLOv10m with G2L_CRM (Global-to-Local Controllable Receptive Module)
- **Classes**: 10 document layout elements
- **Input Size**: 1024x1024
- **Paper**: [DocLayout-YOLO](https://arxiv.org/abs/2410.12628)

### Classes

- `title`
- `plain_text`
- `abandon`
- `figure`
- `figure_caption`
- `table`
- `table_caption`
- `table_footnote`
- `isolate_formula`
- `formula_caption`

## Usage

### PyTorch

```python
from huggingface_hub import snapshot_download
import sys

# Download model (includes code + weights)
repo_path = snapshot_download("anyformat-ai/doclayout-yolo-docstructbench")

# Import and use
sys.path.insert(0, repo_path)
from doclayout_yolo import DocLayoutModel

model = DocLayoutModel(f"{repo_path}/model.pt")
results = model.predict("document.png")

for det in results:
    print(f"{det['class_name']}: {det['confidence']:.2f} at {det['bbox']}")
```
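The loop above prints every detection. In practice it often helps to drop low-confidence boxes and group the rest by class. The following is a minimal sketch, assuming `results` is an iterable of dicts with the `class_name`, `confidence`, and `bbox` keys used above. The helper name `group_detections` and the 0.25 threshold are illustrative choices, not part of this repository.

```python
from collections import defaultdict


def group_detections(results, min_confidence=0.25):
    """Group detections by class name, dropping low-confidence ones.

    Hypothetical helper: assumes each detection is a dict with
    'class_name', 'confidence', and 'bbox' keys, as in the loop above.
    """
    grouped = defaultdict(list)
    for det in results:
        if det["confidence"] >= min_confidence:
            grouped[det["class_name"]].append(det)
    # Sort each group so the most confident detection comes first.
    for dets in grouped.values():
        dets.sort(key=lambda d: d["confidence"], reverse=True)
    return dict(grouped)


# Hand-written detections standing in for real model output.
sample = [
    {"class_name": "title", "confidence": 0.91, "bbox": [50, 40, 900, 120]},
    {"class_name": "table", "confidence": 0.12, "bbox": [60, 300, 880, 700]},
    {"class_name": "table", "confidence": 0.78, "bbox": [60, 300, 880, 700]},
]
tables = group_detections(sample).get("table", [])
```

With real output, pass `results` in place of `sample` and tune `min_confidence` to the documents at hand.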

### ONNX

```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
import json

# Download ONNX model and config
model_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "model.onnx")
config_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "config.json")

with open(config_path) as f:
    config = json.load(f)

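session = ort.InferenceSession(model_path)
input_name = session.get_inputs()[0].name

# Preprocess the image to a (1, 3, 1024, 1024) float32 array scaled to [0, 1].
# A zero array stands in for a real preprocessed image here.
image = np.zeros((1, 3, 1024, 1024), dtype=np.float32)

# Run inference; `outputs` is a list of NumPy arrays.
outputs = session.run(None, {input_name: image})
# Post-process the raw outputs into boxes, scores, and class IDs.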
```
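The preprocessing step needs to fit a page of arbitrary size into the 1024x1024 input, and the predicted boxes then need mapping back to the original page. The sketch below shows one common way to do the geometry, an aspect-preserving "letterbox" resize with centered padding. Whether this export expects letterboxing or a plain resize is an assumption here, as is the model returning boxes in input-pixel coordinates, so check the preprocessing code bundled in the repository. Both function names are hypothetical.

```python
def letterbox_params(width, height, size=1024):
    """Return (scale, new_w, new_h, pad_x, pad_y) for an aspect-preserving resize.

    Assumption: the image is scaled to fit inside a size x size square
    and centered, with padding filling the remaining space.
    """
    scale = min(size / width, size / height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    pad_x = (size - new_w) // 2
    pad_y = (size - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y


def to_original_coords(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from model-input pixels to source-image pixels."""
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


# A 2048x1024 landscape page is scaled by 0.5 and padded above and below.
scale, new_w, new_h, pad_x, pad_y = letterbox_params(2048, 1024)
box = to_original_coords((100, 356, 300, 456), scale, pad_x, pad_y)
```

Keeping the scale and padding from preprocessing lets the same values be reused when converting detections back to page coordinates.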

## Requirements

```
ultralytics
huggingface-hub
onnxruntime  # for ONNX inference
```

## Citation

```bibtex
@article{zhao2024doclayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhao, Zhiyuan and Kang, Hengrui and Wang, Bin and He, Conghui},
  journal={arXiv preprint arXiv:2410.12628},
  year={2024}
}
```

## License

Apache 2.0