DocLayout-YOLO - DocStructBench

Document layout detection model based on DocLayout-YOLO.

Model Description

  • Architecture: YOLOv10m with G2L_CRM (Global-to-Local Controllable Receptive Module)
  • Classes: 10 document layout elements
  • Input Size: 1024x1024
  • Paper: DocLayout-YOLO (arXiv:2410.12628)

Classes

  • title
  • plain_text
  • abandon
  • figure
  • figure_caption
  • table
  • table_caption
  • table_footnote
  • isolate_formula
  • formula_caption
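
When working with raw model outputs, it helps to have the class list as a lookup table. The sketch below assumes that class ids follow the order of the list above (0 for title through 9 for formula_caption); that ordering is an assumption, and the names `CLASS_NAMES`, `ID_TO_NAME`, `NAME_TO_ID`, and `class_name` are illustrative. If config.json in the repository ships its own mapping, prefer that one.

```python
# ASSUMPTION: class ids follow the order of the list above (0 = title ... 9 = formula_caption).
CLASS_NAMES = [
    "title",
    "plain_text",
    "abandon",
    "figure",
    "figure_caption",
    "table",
    "table_caption",
    "table_footnote",
    "isolate_formula",
    "formula_caption",
]

# Lookup tables in both directions.
ID_TO_NAME = dict(enumerate(CLASS_NAMES))
NAME_TO_ID = {name: idx for idx, name in ID_TO_NAME.items()}


def class_name(class_id: int) -> str:
    """Return the label for a class id, or 'unknown' if the id is out of range."""
    return ID_TO_NAME.get(int(class_id), "unknown")
```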

Usage

PyTorch

from huggingface_hub import snapshot_download
import sys

# Download model (includes code + weights)
repo_path = snapshot_download("anyformat-ai/doclayout-yolo-docstructbench")

# Import and use
sys.path.insert(0, repo_path)
from doclayout_yolo import DocLayoutModel

model = DocLayoutModel(f"{repo_path}/model.pt")
results = model.predict("document.png")

for det in results:
    print(f"{det['class_name']}: {det['confidence']:.2f} at {det['bbox']}")
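
The detections printed above are dictionaries with `class_name`, `confidence`, and `bbox` keys, so they can be filtered and grouped with plain Python. The following is a minimal sketch, not part of the model's API: `filter_detections` and `group_by_class` are hypothetical helpers, the sample detections are made up for illustration, and the `bbox` layout is assumed to be `(x1, y1, x2, y2)` in pixels.

```python
from collections import defaultdict


def filter_detections(detections, min_confidence=0.5, keep_classes=None):
    """Keep confident detections, optionally restricted to some classes.

    ASSUMPTION: each bbox is (x1, y1, x2, y2); results are sorted top to
    bottom, then left to right, as a rough reading order.
    """
    kept = [
        det for det in detections
        if det["confidence"] >= min_confidence
        and (keep_classes is None or det["class_name"] in keep_classes)
    ]
    return sorted(kept, key=lambda det: (det["bbox"][1], det["bbox"][0]))


def group_by_class(detections):
    """Group detections into a dict keyed by class name."""
    groups = defaultdict(list)
    for det in detections:
        groups[det["class_name"]].append(det)
    return dict(groups)


# Hypothetical detections, shaped like the dictionaries printed above.
sample = [
    {"class_name": "table", "confidence": 0.91, "bbox": [50, 400, 900, 700]},
    {"class_name": "title", "confidence": 0.88, "bbox": [50, 40, 600, 90]},
    {"class_name": "abandon", "confidence": 0.30, "bbox": [0, 990, 1024, 1024]},
]

confident = filter_detections(sample, min_confidence=0.5)
by_class = group_by_class(confident)
```

Dropping the `abandon` class (page headers, footers, and similar noise) via `keep_classes` is a common choice when only body content matters.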

ONNX

import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
import json

# Download ONNX model and config
model_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "model.onnx")
config_path = hf_hub_download("anyformat-ai/doclayout-yolo-docstructbench", "config.json")

with open(config_path) as f:
    config = json.load(f)

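session = ort.InferenceSession(model_path)
input_name = session.get_inputs()[0].name

# Preprocess the image to a float32 array of shape (1, 3, 1024, 1024), scaled to [0, 1],
# then run inference and post-process the raw outputs:
# outputs = session.run(None, {input_name: input_tensor})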
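
The preprocessing and post-processing steps left as comments in the ONNX example could look like the sketch below. It rests on several assumptions that the model card does not confirm: that the exported model expects a letterboxed image (aspect ratio kept, gray padding with value 114, as is conventional for Ultralytics-style models), and that the output tensor has shape `(1, N, 6)` with rows `[x1, y1, x2, y2, score, class_id]`. The helpers `letterbox`, `preprocess`, and `decode` are illustrative names; inspect `session.get_outputs()` before relying on the decoding step.

```python
import numpy as np


def letterbox(image, size=1024, pad_value=114):
    """Fit an HxWx3 uint8 image into a size x size canvas, keeping aspect ratio."""
    h, w = image.shape[:2]
    scale = min(size / h, size / w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbour resize via NumPy indexing (keeps the sketch dependency-free).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows[:, None], cols[None, :]]
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (left, top)


def preprocess(image, size=1024):
    """Return a (1, 3, size, size) float32 tensor in [0, 1], plus scale and padding."""
    canvas, scale, pad = letterbox(image, size)
    tensor = canvas.astype(np.float32) / 255.0
    tensor = np.transpose(tensor, (2, 0, 1))[None]  # HWC -> NCHW
    return np.ascontiguousarray(tensor), scale, pad


def decode(output, scale, pad, conf_threshold=0.25):
    """Map raw rows back to original image coordinates.

    ASSUMPTION: output has shape (1, N, 6) with rows
    [x1, y1, x2, y2, score, class_id] in letterboxed pixel coordinates.
    """
    rows = np.asarray(output)[0]
    rows = rows[rows[:, 4] >= conf_threshold]
    boxes = rows[:, :4].copy()
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - pad[0]) / scale  # undo horizontal padding
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - pad[1]) / scale  # undo vertical padding
    return [
        {"bbox": box.tolist(), "confidence": float(row[4]), "class_id": int(row[5])}
        for box, row in zip(boxes, rows)
    ]
```

With the session from the example above, inference would then be `outputs = session.run(None, {session.get_inputs()[0].name: tensor})`, followed by `decode(outputs[0], scale, pad)` if the output layout matches the assumption.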

Requirements

ultralytics
huggingface-hub
onnxruntime  # for ONNX inference

Citation

@article{zhao2024doclayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhao, Zhiyuan and Kang, Hengrui and Wang, Bin and He, Conghui},
  journal={arXiv preprint arXiv:2410.12628},
  year={2024}
}

License

Apache 2.0
