---
title: PP-DocLayoutV3 Empirical Parser
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit
---
# PDF Layout Detection with PP-DocLayoutV3
Upload any PDF and get a structured breakdown of every element on the page
(titles, body text, tables, figures, formulas, headers, footers, footnotes, and
more), powered by PaddlePaddle's PP-DocLayoutV3 model via the
[docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout)
plugin.
Results are displayed as interactive JSON in the browser and can be downloaded
as a `.json` file with one click.
## How to use
1. Click **Source Document** and upload a PDF.
2. Click **Run Layout Detection**.
3. Inspect the extracted elements in the JSON panel.
4. Click **Download JSON** to save the results.
## Output format
Each detected region is returned as an object with two fields:
```json
{
"type": "SectionHeaderItem",
"content": "Introduction"
}
```
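As a minimal sketch of working with the results, the snippet below tallies regions by `type` and pulls out the section headings. It assumes the downloaded file is a JSON array of objects shaped like the one above; the function name `summarize_regions` and the filename `layout.json` are hypothetical, and the sample data is made up for illustration.

```python
import json
from collections import Counter


def summarize_regions(regions):
    """Count regions per `type` and collect the text of section headers."""
    counts = Counter(region["type"] for region in regions)
    headers = [
        region["content"]
        for region in regions
        if region["type"] == "SectionHeaderItem"
    ]
    return counts, headers


# Inline sample standing in for the downloaded file. With a real download,
# load it instead, e.g. json.load(open("layout.json")) (hypothetical filename,
# and assuming the file is a JSON array of {"type", "content"} objects).
sample = json.loads("""
[
  {"type": "TitleItem", "content": "A Sample Paper"},
  {"type": "SectionHeaderItem", "content": "Introduction"},
  {"type": "TextItem", "content": "Body text of the first paragraph."}
]
""")

counts, headers = summarize_regions(sample)
print(dict(counts))
print(headers)
```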
`type` reflects the docling document-model class. The table below maps the
model's raw labels to the types you will see:
| Raw model label | Output type |
|---|---|
| `doc_title` | `TitleItem` |
| `paragraph_title` | `SectionHeaderItem` |
| `text`, `content`, `abstract`, `aside_text` | `TextItem` |
| `table` | `TableItem` |
| `image`, `chart`, `seal` | `PictureItem` |
| `formula` | `TextItem` (formula) |
| `footnote`, `vision_footnote` | `TextItem` (footnote) |
| `header` | `TextItem` (page header) |
| `footer` | `TextItem` (page footer) |
| `reference`, `reference_content` | `TextItem` |
| `algorithm` | `TextItem` (code) |
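
For scripts that need the same mapping, the table can be transcribed into a lookup. This is only a restatement of the table above, not the plugin's internal code; the names `LABEL_TO_OUTPUT` and `output_type` are hypothetical, and the second tuple element simply copies the qualifier shown in parentheses.

```python
# Raw PP-DocLayoutV3 label -> (output type, qualifier from the table or None).
# Transcribed by hand from the table; the plugin may represent this differently.
LABEL_TO_OUTPUT = {
    "doc_title": ("TitleItem", None),
    "paragraph_title": ("SectionHeaderItem", None),
    "text": ("TextItem", None),
    "content": ("TextItem", None),
    "abstract": ("TextItem", None),
    "aside_text": ("TextItem", None),
    "table": ("TableItem", None),
    "image": ("PictureItem", None),
    "chart": ("PictureItem", None),
    "seal": ("PictureItem", None),
    "formula": ("TextItem", "formula"),
    "footnote": ("TextItem", "footnote"),
    "vision_footnote": ("TextItem", "footnote"),
    "header": ("TextItem", "page header"),
    "footer": ("TextItem", "page footer"),
    "reference": ("TextItem", None),
    "reference_content": ("TextItem", None),
    "algorithm": ("TextItem", "code"),
}


def output_type(raw_label):
    """Return the output type for a raw label; raises KeyError if unknown."""
    return LABEL_TO_OUTPUT[raw_label][0]


print(output_type("chart"))
print(LABEL_TO_OUTPUT["algorithm"])
```

Because several raw labels collapse into `TextItem`, the output `type` alone cannot tell you which raw label produced a region.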
## Infrastructure
| Component | Detail |
|---|---|
| Hardware | ZeroGPU: NVIDIA H200 (70 GB VRAM, shared) |
| Layout model | [`PaddlePaddle/PP-DocLayoutV3_safetensors`](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) |
| Pipeline | [docling](https://github.com/docling-project/docling) ≥ 2.73 + [docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout) |
| SDK | Gradio 6.9.0, Python 3.10 |