---
license: apache-2.0
tags:
- document-layout
- object-detection
- yolo
- document-analysis
library_name: ultralytics
---

# DocLayout-YOLO - DocStructBench

Document layout detection model based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO).

## Model Description

- **Architecture**: YOLOv10m with G2L_CRM (Global-to-Local Controllable Receptive Module)
- **Classes**: 10 document layout elements
- **Input Size**: 1024x1024
- **Paper**: [DocLayout-YOLO](https://arxiv.org/abs/2410.12628)

### Classes

- `title`
- `plain_text`
- `abandon`
- `figure`
- `figure_caption`
- `table`
- `table_caption`
- `table_footnote`
- `isolate_formula`
- `formula_caption`
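
As a rough illustration of how these labels might be consumed downstream, the sketch below counts detections per class and skips `abandon` regions. It assumes each detection is a dictionary with a `class_name` key, as in the PyTorch usage example in this README. The helper name `summarize_by_class`, the sample data, and the reading of `abandon` as discardable page furniture (headers, footers, page numbers) are assumptions, not part of this repository.

```python
from collections import Counter

# The ten class names listed above.
CLASS_NAMES = [
    "title", "plain_text", "abandon", "figure", "figure_caption",
    "table", "table_caption", "table_footnote",
    "isolate_formula", "formula_caption",
]


def summarize_by_class(detections, skip=("abandon",)):
    """Count detections per class, ignoring classes listed in `skip`.

    Hypothetical helper: `detections` is assumed to be a list of dicts
    that each carry a `class_name` key.
    """
    unknown = {d["class_name"] for d in detections} - set(CLASS_NAMES)
    if unknown:
        raise ValueError(f"Unexpected class names: {sorted(unknown)}")
    return Counter(
        d["class_name"] for d in detections if d["class_name"] not in skip
    )


# Made-up detections, for illustration only.
sample = [
    {"class_name": "title"},
    {"class_name": "plain_text"},
    {"class_name": "plain_text"},
    {"class_name": "abandon"},
]
print(summarize_by_class(sample))
```

Passing `skip=()` keeps every class, which may be preferable when `abandon` regions are needed for debugging.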

## Usage

### PyTorch

```python
import sys

from huggingface_hub import snapshot_download

# Download model (includes code + weights)
repo_path = snapshot_download("anyformat-ai/doclayout-yolo-docstructbench")

# Make the bundled doclayout_yolo module importable, then import it
sys.path.insert(0, repo_path)
from doclayout_yolo import DocLayoutModel

model = DocLayoutModel(f"{repo_path}/model.pt")
results = model.predict("document.png")

# Each detection exposes class_name, confidence, and bbox
for det in results:
    print(f"{det['class_name']}: {det['confidence']:.2f} at {det['bbox']}")
```
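
The detections returned by `model.predict` usually need some filtering before further processing. The sketch below keeps detections above a confidence threshold and sorts them top to bottom, then left to right. It assumes that `bbox` is `[x1, y1, x2, y2]` in pixels, which this README does not state, so check the actual output first. The helper name `select_detections`, the threshold, and the sample data are hypothetical.

```python
def select_detections(detections, min_confidence=0.5):
    """Drop low-confidence detections and sort the rest in a naive reading order.

    Assumes each detection is a dict with `confidence` (float) and
    `bbox` laid out as [x1, y1, x2, y2]; that layout is an assumption.
    """
    kept = [d for d in detections if d["confidence"] >= min_confidence]
    # Sort by top edge first, then by left edge.
    return sorted(kept, key=lambda d: (d["bbox"][1], d["bbox"][0]))


# Made-up detections, for illustration only.
sample_results = [
    {"class_name": "plain_text", "confidence": 0.91, "bbox": [50, 300, 900, 600]},
    {"class_name": "title", "confidence": 0.97, "bbox": [50, 40, 900, 120]},
    {"class_name": "figure", "confidence": 0.30, "bbox": [50, 650, 900, 950]},
]

for det in select_detections(sample_results):
    print(det["class_name"], det["bbox"])
```

Sorting by the top edge is only reasonable for single-column pages; multi-column layouts need a proper reading-order step.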

### ONNX

```python
import json

import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download ONNX model and config
model_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "model.onnx")
config_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "config.json")

with open(config_path) as f:
    config = json.load(f)

session = ort.InferenceSession(model_path)
# Preprocess image to (1, 3, 1024, 1024) float32, normalized to [0, 1]
# Run inference and post-process outputs
```
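
The preprocessing step mentioned in the comments above can be sketched as follows. This is a NumPy-only illustration, not the repository's own preprocessing: it assumes YOLO-style letterboxing (aspect-preserving resize plus gray padding with value 114) and RGB channel order, neither of which this README confirms. The function names `letterbox` and `unletterbox_boxes` are hypothetical.

```python
import numpy as np

INPUT_SIZE = 1024  # model input side length, from the model description


def letterbox(image, size=INPUT_SIZE, pad_value=114):
    """Fit an HxWx3 uint8 image into a size x size canvas.

    Returns a (1, 3, size, size) float32 array in [0, 1], the resize
    scale, and the (left, top) padding offsets.
    """
    h, w = image.shape[:2]
    scale = min(size / h, size / w)
    new_h = min(size, max(1, round(h * scale)))
    new_w = min(size, max(1, round(w * scale)))
    # Nearest-neighbour resize via index lookup keeps the sketch NumPy-only.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows[:, None], cols]
    # Centre the resized image on a padded canvas.
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    canvas[top:top + new_h, left:left + new_w] = resized
    # HWC uint8 -> NCHW float32 in [0, 1].
    tensor = canvas.transpose(2, 0, 1)[None].astype(np.float32) / 255.0
    return tensor, scale, (left, top)


def unletterbox_boxes(boxes, scale, pad):
    """Map [x1, y1, x2, y2] boxes from model-input pixels to original pixels."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4).copy()
    left, top = pad
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - left) / scale
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - top) / scale
    return boxes


# Synthetic 200x100 black page, for illustration only.
page = np.zeros((200, 100, 3), dtype=np.uint8)
tensor, scale, pad = letterbox(page)
print(tensor.shape, tensor.dtype, scale, pad)
```

With a real session, the tensor could then be passed as `session.run(None, {session.get_inputs()[0].name: tensor})`. The layout of the returned arrays depends on how the model was exported, so inspect the output shapes before writing post-processing code.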

## Requirements

```
ultralytics
huggingface-hub
onnxruntime  # for ONNX inference
```

## Citation

```bibtex
@article{zhao2024doclayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhao, Zhiyuan and Kang, Hengrui and Wang, Bin and He, Conghui},
  journal={arXiv preprint arXiv:2410.12628},
  year={2024}
}
```

## License

Apache 2.0