---
license: apache-2.0
tags:
- document-layout
- object-detection
- yolo
- document-analysis
library_name: ultralytics
---
# DocLayout-YOLO - DocStructBench
Document layout detection model based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO).
## Model Description
- **Architecture**: YOLOv10m with G2L_CRM (Global-to-Local Controllable Receptive Module)
- **Classes**: 10 document layout elements
- **Input Size**: 1024x1024
- **Paper**: [DocLayout-YOLO](https://arxiv.org/abs/2410.12628)
### Classes
- `title`
- `plain_text`
- `abandon`
- `figure`
- `figure_caption`
- `table`
- `table_caption`
- `table_footnote`
- `isolate_formula`
- `formula_caption`
## Usage
### PyTorch
```python
from huggingface_hub import snapshot_download
import sys
# Download model (includes code + weights)
repo_path = snapshot_download("anyformat-ai/doclayout-yolo-docstructbench")
# Import and use
sys.path.insert(0, repo_path)
from doclayout_yolo import DocLayoutModel
model = DocLayoutModel(f"{repo_path}/model.pt")
results = model.predict("document.png")
for det in results:
    print(f"{det['class_name']}: {det['confidence']:.2f} at {det['bbox']}")
```
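The detections printed above are dictionaries with `class_name`, `confidence`, and `bbox` keys. A minimal sketch of filtering and grouping them, assuming every entry has that shape. The `filter_detections` helper, the 0.5 threshold, the choice to skip `abandon` regions, and the sample data are illustrative and not part of the repository:

```python
from collections import defaultdict


def filter_detections(detections, min_confidence=0.5, drop_classes=("abandon",)):
    """Group detections by class name, keeping confident ones only.

    Hypothetical helper: assumes each detection is a dict with
    'class_name', 'confidence', and 'bbox' keys, as in the loop above.
    """
    grouped = defaultdict(list)
    for det in detections:
        if det["confidence"] < min_confidence:
            continue  # below the (illustrative) confidence threshold
        if det["class_name"] in drop_classes:
            continue  # class the caller chose to ignore
        grouped[det["class_name"]].append(det)
    return dict(grouped)


# Made-up detections standing in for `model.predict(...)` output;
# the bbox values are placeholders and no coordinate format is implied.
sample = [
    {"class_name": "title", "confidence": 0.93, "bbox": [50, 40, 900, 110]},
    {"class_name": "plain_text", "confidence": 0.88, "bbox": [50, 130, 900, 600]},
    {"class_name": "plain_text", "confidence": 0.31, "bbox": [50, 620, 900, 700]},
    {"class_name": "abandon", "confidence": 0.97, "bbox": [400, 980, 620, 1010]},
]

grouped = filter_detections(sample)
for name, dets in grouped.items():
    print(f"{name}: {len(dets)} detection(s)")
```

Passing `drop_classes=()` keeps every class, and the threshold can be tuned per document type.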
### ONNX
```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
import json
# Download ONNX model and config
model_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "model.onnx")
config_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "config.json")
with open(config_path) as f:
    config = json.load(f)
session = ort.InferenceSession(model_path)
# Preprocess image to (1, 3, 1024, 1024) float32, normalized to [0, 1]
# Run inference and post-process outputs
```
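The ONNX snippet leaves preprocessing and post-processing as comments. The following is a rough sketch of both steps, not the repository's own pipeline. It makes several assumptions that should be checked against `config.json` and the exported model: letterbox resizing with gray padding (value 114, a common Ultralytics convention), RGB channel order, and an output whose rows are `[x1, y1, x2, y2, score, class_id]` in input-image pixels. The `letterbox` and `scale_boxes` helpers are hypothetical names, and the nearest-neighbour resize is a NumPy-only simplification of what an image library would do:

```python
import numpy as np


def letterbox(image, size=1024, pad_value=114):
    """Fit an HxWx3 uint8 image into a size x size canvas, keeping aspect ratio.

    Returns the (1, 3, size, size) float32 tensor in [0, 1], the resize
    scale, and the (pad_x, pad_y) offsets needed to undo the padding.
    """
    h, w = image.shape[:2]
    scale = min(size / h, size / w)
    new_h = min(size, max(1, round(h * scale)))
    new_w = min(size, max(1, round(w * scale)))

    # Nearest-neighbour resize via index lookup (simplification).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]

    # Center the resized image on a gray canvas.
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    pad_y, pad_x = (size - new_h) // 2, (size - new_w) // 2
    canvas[pad_y:pad_y + new_h, pad_x:pad_x + new_w] = resized

    # Normalize to [0, 1] and reorder HWC -> NCHW.
    tensor = (canvas.astype(np.float32) / 255.0).transpose(2, 0, 1)[None]
    return np.ascontiguousarray(tensor), scale, (pad_x, pad_y)


def scale_boxes(rows, scale, pad, min_confidence=0.25):
    """Map assumed [x1, y1, x2, y2, score, class_id] rows back to the original image."""
    rows = np.asarray(rows, dtype=np.float32).reshape(-1, 6)
    rows = rows[rows[:, 4] >= min_confidence]
    boxes = rows[:, :4].copy()
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - pad[0]) / scale  # undo x padding and resize
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - pad[1]) / scale  # undo y padding and resize
    return boxes, rows[:, 4], rows[:, 5].astype(int)


# Blank stand-in for a page image loaded as an RGB array.
image = np.zeros((500, 1000, 3), dtype=np.uint8)
tensor, scale, pad = letterbox(image)
print(tensor.shape, tensor.dtype)  # (1, 3, 1024, 1024) float32

# With the `session` created above, inference would then look like:
# outputs = session.run(None, {session.get_inputs()[0].name: tensor})
# boxes, scores, class_ids = scale_boxes(outputs[0], scale, pad)
```

If the exported model expects a plain resize instead of letterboxing, or emits a different output layout, both helpers need adjusting. Inspecting `session.get_outputs()[0].shape` is a quick way to check the layout before relying on `scale_boxes`.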
## Requirements
```
ultralytics
huggingface-hub
onnxruntime # for ONNX inference
```
## Citation
```bibtex
@article{zhao2024doclayout,
title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
author={Zhao, Zhiyuan and Kang, Hengrui and Wang, Bin and He, Conghui},
journal={arXiv preprint arXiv:2410.12628},
year={2024}
}
```
## License
Apache 2.0