---
license: apache-2.0
library_name: doclayout-yolo
tags:
- document-layout-analysis
- object-detection
- yolo
- endpoints-template
pipeline_tag: object-detection
---

# DocLayout-YOLO for Document Layout Analysis

This model is based on [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO), fine-tuned on DocStructBench for document layout detection.

## Model Description

DocLayout-YOLO is a real-time and robust layout detection model for diverse documents, based on YOLO-v10. It can detect:

- **title** - Document titles
- **plain_text** - Regular text blocks
- **figure** - Images and graphics
- **figure_caption** - Captions for figures
- **table** - Tables
- **table_caption** - Captions for tables
- **table_footnote** - Footnotes in tables
- **isolate_formula** - Mathematical formulas
- **formula_caption** - Captions for formulas
- **abandon** - Elements to ignore

## Usage via Inference Endpoint

```python
import requests
import base64

API_URL = "https://your-endpoint-url.huggingface.cloud"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

# Load and encode image
with open("document.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Make request
response = requests.post(
    API_URL,
    headers=headers,
    json={
        "inputs": image_b64,
        "parameters": {
            "confidence": 0.2,
            "iou_threshold": 0.45
        }
    }
)

detections = response.json()
print(detections)
```

## Response Format

```json
[
  {
    "label": "title",
    "score": 0.95,
    "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}
  },
  {
    "label": "plain_text",
    "score": 0.92,
    "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}
  }
]
```

## Credits

Based on [opendatalab/DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)

```bibtex
@misc{zhao2024doclayoutyoloenhancingdocumentlayout,
  title={DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception},
  author={Zhiyuan Zhao and Hengrui Kang and Bin Wang and Conghui He},
  year={2024},
  eprint={2410.12628},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
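
## Example: Post-processing Detections

The list described under Response Format can be filtered and ordered on the client side. The following is a minimal sketch, not part of the endpoint itself. The helper names `filter_detections`, `reading_order`, and `box_area` are hypothetical. It assumes `box` holds pixel coordinates with the origin at the top-left corner, which the response example suggests but does not state.

```python
def filter_detections(detections, labels=None, min_score=0.0):
    """Keep detections with an allowed label (if given) and score >= min_score."""
    kept = []
    for det in detections:
        if labels is not None and det["label"] not in labels:
            continue
        if det["score"] < min_score:
            continue
        kept.append(det)
    return kept


def reading_order(detections):
    """Sort top-to-bottom, then left-to-right (naive; ignores multi-column layouts)."""
    return sorted(detections, key=lambda d: (d["box"]["y1"], d["box"]["x1"]))


def box_area(box):
    """Area of a box in square pixels; degenerate boxes count as zero."""
    width = max(0, box["x2"] - box["x1"])
    height = max(0, box["y2"] - box["y1"])
    return width * height


# Sample data copied from the Response Format section, listed out of order.
sample = [
    {"label": "plain_text", "score": 0.92,
     "box": {"x1": 100, "y1": 100, "x2": 500, "y2": 400}},
    {"label": "title", "score": 0.95,
     "box": {"x1": 100, "y1": 50, "x2": 500, "y2": 80}},
]

for det in reading_order(filter_detections(sample, min_score=0.5)):
    print(det["label"], det["score"], box_area(det["box"]))
```

Sorting by `y1` and then `x1` is only a rough approximation of reading order; pages with several columns would likely need column grouping first.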