---
title: PP-DocLayoutV3 Empirical Parser
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit
---
# PDF Layout Detection with PP-DocLayoutV3
Upload any PDF and get a structured breakdown of every element on the page
(titles, body text, tables, figures, formulas, headers, footers, footnotes, and
more), powered by PaddlePaddle's PP-DocLayoutV3 model via the
[docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout)
plugin.
Results are displayed as interactive JSON in the browser and can be downloaded
as a `.json` file with one click.
## How to use
1. Click **Source Document** and upload a PDF.
2. Click **Run Layout Detection**.
3. Inspect the extracted elements in the JSON panel.
4. Click **Download JSON** to save the results.
## Output format
Each detected region is returned as an object with two fields:
```json
{
"type": "SectionHeaderItem",
"content": "Introduction"
}
```
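A downloaded result can be post-processed with a few lines of Python. The sketch below assumes the file holds a JSON array of objects shaped like the one above; the sample data and the helper name `group_by_type` are illustrative and not part of the app.

```python
import json
from collections import defaultdict


def group_by_type(elements):
    """Group element contents by their `type` field, keeping document order."""
    grouped = defaultdict(list)
    for element in elements:
        grouped[element["type"]].append(element["content"])
    return dict(grouped)


# Illustrative sample; assumes the download is a JSON array of such objects.
sample = json.loads("""
[
  {"type": "SectionHeaderItem", "content": "Introduction"},
  {"type": "TextItem", "content": "Layout analysis splits a page into regions."},
  {"type": "SectionHeaderItem", "content": "Method"}
]
""")

grouped = group_by_type(sample)
print(grouped["SectionHeaderItem"])  # ['Introduction', 'Method']
```

To run this on a real download, replace the inline sample with `json.load(open(path))`, where `path` points at the saved `.json` file.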
`type` reflects the docling document-model class. The table below maps the
model's raw labels to the types you will see:
| Raw model label | Output type |
|---|---|
| `doc_title` | `TitleItem` |
| `paragraph_title` | `SectionHeaderItem` |
| `text`, `content`, `abstract`, `aside_text` | `TextItem` |
| `table` | `TableItem` |
| `image`, `chart`, `seal` | `PictureItem` |
| `formula` | `TextItem` (formula) |
| `footnote`, `vision_footnote` | `TextItem` (footnote) |
| `header` | `TextItem` (page header) |
| `footer` | `TextItem` (page footer) |
| `reference`, `reference_content` | `TextItem` |
| `algorithm` | `TextItem` (code) |
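
For downstream scripts, the same mapping can be kept as a lookup table. This is a transcription of the table above into a Python dictionary, not the plugin's actual implementation; the names `LABEL_TO_TYPE` and `output_type` are hypothetical, and the second tuple element simply carries the parenthesised qualifier from the table.

```python
# Raw PP-DocLayoutV3 label -> (output type, qualifier or None),
# transcribed from the README table. Names here are illustrative.
LABEL_TO_TYPE = {
    "doc_title": ("TitleItem", None),
    "paragraph_title": ("SectionHeaderItem", None),
    "text": ("TextItem", None),
    "content": ("TextItem", None),
    "abstract": ("TextItem", None),
    "aside_text": ("TextItem", None),
    "table": ("TableItem", None),
    "image": ("PictureItem", None),
    "chart": ("PictureItem", None),
    "seal": ("PictureItem", None),
    "formula": ("TextItem", "formula"),
    "footnote": ("TextItem", "footnote"),
    "vision_footnote": ("TextItem", "footnote"),
    "header": ("TextItem", "page header"),
    "footer": ("TextItem", "page footer"),
    "reference": ("TextItem", None),
    "reference_content": ("TextItem", None),
    "algorithm": ("TextItem", "code"),
}


def output_type(raw_label):
    """Return (output type, qualifier) for a raw label, or None if unlisted."""
    return LABEL_TO_TYPE.get(raw_label)


print(output_type("chart"))      # ('PictureItem', None)
print(output_type("algorithm"))  # ('TextItem', 'code')
```

Returning `None` for unlisted labels is a design choice for this sketch; a stricter script might raise an error instead.
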
## Infrastructure
| Component | Detail |
|---|---|
| Hardware | ZeroGPU: NVIDIA H200 (70 GB VRAM, shared) |
| Layout model | [`PaddlePaddle/PP-DocLayoutV3_safetensors`](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) |
| Pipeline | [docling](https://github.com/docling-project/docling) ≥ 2.73 + [docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout) |
| SDK | Gradio 6.9.0, Python 3.10 |