---
title: PP-DocLayoutV3 Empirical Parser
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit
---

# PDF Layout Detection with PP-DocLayoutV3

Upload any PDF and get a structured breakdown of every element on the page:
titles, body text, tables, figures, formulas, headers, footers, footnotes, and
more. Detection is powered by PaddlePaddle's PP-DocLayoutV3 model via the
[docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout)
plugin.

Results are displayed as interactive JSON in the browser and can be downloaded
as a `.json` file with one click.

## How to use

1. Click **Source Document** and upload a PDF.
2. Click **Run Layout Detection**.
3. Inspect the extracted elements in the JSON panel.
4. Click **Download JSON** to save the results.
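
Once the file is saved, the results can be post-processed offline. The sketch below assumes the download is a JSON array of objects carrying the `type` and `content` fields described under Output format. The file name `layout.json` and the helper names are hypothetical and not part of the app.

```python
import json
from collections import Counter


def load_elements(path):
    """Read the downloaded results (assumed: a JSON array of objects)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_by_type(elements):
    """Count how many detected elements there are of each output type."""
    return Counter(element["type"] for element in elements)


def section_headers(elements):
    """Return the text of every section header, in document order."""
    return [e["content"] for e in elements if e["type"] == "SectionHeaderItem"]


# Inline sample in the assumed shape; swap in load_elements("layout.json")
# to work on a real download.
sample = [
    {"type": "SectionHeaderItem", "content": "Introduction"},
    {"type": "TextItem", "content": "Layout analysis splits a page into regions."},
    {"type": "TextItem", "content": "Each region is labelled with a type."},
]

print(count_by_type(sample))
print(section_headers(sample))
```

If the real file turns out to wrap the list in another object (for example, keyed by page), only `load_elements` would need to change.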

## Output format

Each detected region is returned as an object with two fields:

```json
{
  "type": "SectionHeaderItem",
  "content": "Introduction"
}
```

`type` reflects the docling document-model class. The table below maps the
model's raw labels to the types you will see:

| Detected region | Output type |
|---|---|
| `doc_title` | `TitleItem` |
| `paragraph_title` | `SectionHeaderItem` |
| `text`, `content`, `abstract`, `aside_text` | `TextItem` |
| `table` | `TableItem` |
| `image`, `chart`, `seal` | `PictureItem` |
| `formula` | `TextItem` (formula) |
| `footnote`, `vision_footnote` | `TextItem` (footnote) |
| `header` | `TextItem` (page header) |
| `footer` | `TextItem` (page footer) |
| `reference`, `reference_content` | `TextItem` |
| `algorithm` | `TextItem` (code) |
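
For scripts that need the same mapping, the table can be written out as a lookup. This is only an illustrative transcription of the table, not the plugin's actual code. The names `LABEL_TO_TYPE` and `output_type` are hypothetical, and representing the parenthesised qualifier as a second tuple element is an assumption made here.

```python
# Raw model label -> (output type, qualifier). The qualifier is the note shown
# in parentheses in the table, or None where the table gives none.
LABEL_TO_TYPE = {
    "doc_title": ("TitleItem", None),
    "paragraph_title": ("SectionHeaderItem", None),
    "text": ("TextItem", None),
    "content": ("TextItem", None),
    "abstract": ("TextItem", None),
    "aside_text": ("TextItem", None),
    "table": ("TableItem", None),
    "image": ("PictureItem", None),
    "chart": ("PictureItem", None),
    "seal": ("PictureItem", None),
    "formula": ("TextItem", "formula"),
    "footnote": ("TextItem", "footnote"),
    "vision_footnote": ("TextItem", "footnote"),
    "header": ("TextItem", "page header"),
    "footer": ("TextItem", "page footer"),
    "reference": ("TextItem", None),
    "reference_content": ("TextItem", None),
    "algorithm": ("TextItem", "code"),
}


def output_type(label):
    """Look up a raw label; returns None for labels the table does not list."""
    return LABEL_TO_TYPE.get(label)


print(output_type("chart"))
print(output_type("footer"))
```

Returning `None` for unlisted labels keeps the lookup from guessing at how the pipeline treats labels outside the table.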

## Infrastructure

| Component | Detail |
|---|---|
| Hardware | ZeroGPU (NVIDIA H200, 70 GB VRAM, shared) |
| Layout model | [`PaddlePaddle/PP-DocLayoutV3_safetensors`](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) |
| Pipeline | [docling](https://github.com/docling-project/docling) ≥ 2.73 + [docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout) |
| SDK | Gradio 6.9.0, Python 3.10 |