---
title: PP-DocLayoutV3 Empirical Parser
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit
---

# PDF Layout Detection with PP-DocLayoutV3

Upload any PDF and get a structured breakdown of every element on the page (titles, body text, tables, figures, formulas, headers, footers, footnotes, and more), powered by PaddlePaddle's PP-DocLayoutV3 model via the [docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout) plugin.

Results are displayed as interactive JSON in the browser and can be downloaded as a `.json` file with one click.

## How to use

1. Click **Source Document** and upload a PDF.
2. Click **Run Layout Detection**.
3. Inspect the extracted elements in the JSON panel.
4. Click **Download JSON** to save the results.

## Output format

Each detected region is returned as an object with two fields:

```json
{
  "type": "SectionHeaderItem",
  "content": "Introduction"
}
```

`type` reflects the docling document-model class.
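
As a rough sketch of how the downloaded file might be consumed, the snippet below assumes the JSON is a top-level array of the `type`/`content` objects described in this section. This README does not state the top-level shape, so treat that as an assumption; the helper names and the `layout.json` filename are hypothetical.

```python
import json
from collections import Counter


def parse_regions(text):
    """Parse the downloaded JSON text into a list of region dicts.

    Assumption: the file is a top-level JSON array of objects that each
    carry a "type" and a "content" field.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of regions")
    for region in data:
        if "type" not in region or "content" not in region:
            raise ValueError(f"region is missing 'type' or 'content': {region!r}")
    return data


def count_by_type(regions):
    """Count how many regions of each output type were detected."""
    return Counter(region["type"] for region in regions)


def section_headers(regions):
    """Return the text of every section header, in document order."""
    return [r["content"] for r in regions if r["type"] == "SectionHeaderItem"]


if __name__ == "__main__":
    # Hypothetical filename; use whatever name the download was saved under.
    with open("layout.json", encoding="utf-8") as handle:
        regions = parse_regions(handle.read())
    print(count_by_type(regions))
    print(section_headers(regions))
```

Filtering on `type` like this is usually enough to rebuild an outline or to pull out only the body text.
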
The table below maps the model's raw labels to the types you will see:

| Detected region | Output type |
|---|---|
| `doc_title` | `TitleItem` |
| `paragraph_title` | `SectionHeaderItem` |
| `text`, `content`, `abstract`, `aside_text` | `TextItem` |
| `table` | `TableItem` |
| `image`, `chart`, `seal` | `PictureItem` |
| `formula` | `TextItem` (formula) |
| `footnote`, `vision_footnote` | `TextItem` (footnote) |
| `header` | `TextItem` (page header) |
| `footer` | `TextItem` (page footer) |
| `reference`, `reference_content` | `TextItem` |
| `algorithm` | `TextItem` (code) |

## Infrastructure

| Component | Detail |
|---|---|
| Hardware | ZeroGPU: NVIDIA H200 (70 GB VRAM, shared) |
| Layout model | [`PaddlePaddle/PP-DocLayoutV3_safetensors`](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) |
| Pipeline | [docling](https://github.com/docling-project/docling) ≥ 2.73 + [docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout) |
| SDK | Gradio 6.9.0, Python 3.10 |
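
When running the app outside the Space, it can help to confirm that the installed docling meets the version floor listed in the Infrastructure table. The following is a minimal sketch, not part of the app: the helper names are hypothetical, the comparison only looks at leading numeric version components (pre-release suffixes are ignored), and it assumes the package is installed under the distribution name `docling`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Floor taken from the Infrastructure table (docling >= 2.73).
MINIMUMS = {"docling": "2.73"}


def parse_version(text):
    """Keep only the leading numeric components: '2.73.1rc1' -> (2, 73, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report_environment(minimums=MINIMUMS):
    """Return a short status string for each required distribution."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[name] = f"{installed} (ok)"
        else:
            report[name] = f"{installed} (older than {minimum})"
    return report


if __name__ == "__main__":
    for name, status in report_environment().items():
        print(f"{name}: {status}")
```

A numeric comparison matters here because a plain string comparison would rank `2.100` below `2.73`.
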