title: PP-DocLayoutV3 Empirical Parser
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit
PDF Layout Detection with PP-DocLayoutV3
Upload any PDF and get a structured breakdown of every element on the page (titles, body text, tables, figures, formulas, headers, footers, footnotes, and more), powered by PaddlePaddle's PP-DocLayoutV3 model via the docling-pp-doc-layout plugin.
Results are displayed as interactive JSON in the browser and can be downloaded
as a .json file with one click.
How to use
- Click Source Document and upload a PDF.
- Click Run Layout Detection.
- Inspect the extracted elements in the JSON panel.
- Click Download JSON to save the results.
Output format
Each detected region is returned as an object with two fields:
{
  "type": "SectionHeaderItem",
  "content": "Introduction"
}
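Objects of this shape are straightforward to post-process once downloaded. The following is a minimal sketch that counts regions per type. It assumes the downloaded file is a JSON array of such objects; the sample data and the `results.json` file name mentioned in the comment are hypothetical, not taken from the app.

```python
import json
from collections import Counter


def summarize_elements(elements):
    """Count detected regions by their "type" field."""
    return Counter(item["type"] for item in elements)


# Hypothetical sample in the two-field shape shown above. A real download
# could be loaded with json.load(open("results.json")) instead; that file
# name is an assumption.
sample = json.loads("""
[
  {"type": "TitleItem", "content": "A Study of Layouts"},
  {"type": "SectionHeaderItem", "content": "Introduction"},
  {"type": "TextItem", "content": "Documents contain many regions."},
  {"type": "TextItem", "content": "Each region has a type."}
]
""")

counts = summarize_elements(sample)
for type_name, n in counts.most_common():
    print(f"{type_name}: {n}")
```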
type reflects the docling document-model class. The table below maps the
model's raw labels to the types you will see:
| Detected region | Output type |
|---|---|
| doc_title | TitleItem |
| paragraph_title | SectionHeaderItem |
| text, content, abstract, aside_text | TextItem |
| table | TableItem |
| image, chart, seal | PictureItem |
| formula | TextItem (formula) |
| footnote, vision_footnote | TextItem (footnote) |
| header | TextItem (page header) |
| footer | TextItem (page footer) |
| reference, reference_content | TextItem |
| algorithm | TextItem (code) |
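For downstream scripts, the same mapping can be written as a lookup table. This is a sketch transcribed from the table above, not the plugin's own code: the names `LABEL_GROUPS`, `LABEL_TO_TYPE`, and `output_type` are hypothetical. The values are copied verbatim, including the parenthesised qualifiers. Whether the app emits those qualifiers inside the `type` field is not stated here, so treat that as an assumption.

```python
# Raw model labels grouped by output type, transcribed from the table above.
LABEL_GROUPS = [
    (("doc_title",), "TitleItem"),
    (("paragraph_title",), "SectionHeaderItem"),
    (("text", "content", "abstract", "aside_text"), "TextItem"),
    (("table",), "TableItem"),
    (("image", "chart", "seal"), "PictureItem"),
    (("formula",), "TextItem (formula)"),
    (("footnote", "vision_footnote"), "TextItem (footnote)"),
    (("header",), "TextItem (page header)"),
    (("footer",), "TextItem (page footer)"),
    (("reference", "reference_content"), "TextItem"),
    (("algorithm",), "TextItem (code)"),
]

# Flatten the groups into a single raw-label -> output-type dictionary.
LABEL_TO_TYPE = {
    label: out_type for labels, out_type in LABEL_GROUPS for label in labels
}


def output_type(raw_label: str) -> str:
    """Return the output type for a raw label; raises KeyError if unlisted."""
    return LABEL_TO_TYPE[raw_label]


print(output_type("chart"))
print(output_type("paragraph_title"))
```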
Infrastructure
| Component | Detail |
|---|---|
| Hardware | ZeroGPU: NVIDIA H200 (70 GB VRAM, shared) |
| Layout model | PaddlePaddle/PP-DocLayoutV3_safetensors |
| Pipeline | docling ≥ 2.73 + docling-pp-doc-layout |
| SDK | Gradio 6.9.0, Python 3.10 |
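When reproducing this setup locally, it may help to confirm that the installed docling meets the minimum in the Pipeline row. A minimal sketch using only the standard library follows; the helper `meets_minimum` is hypothetical and compares only the leading numeric components of the version string, which is a simplification of full version parsing.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_DOCLING = (2, 73)  # minimum taken from the Pipeline row above


def meets_minimum(version_string, minimum=MIN_DOCLING):
    """Compare leading numeric version components against a minimum tuple."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits
        parts.append(int(digits))
    return tuple(parts[: len(minimum)]) >= minimum


try:
    installed = version("docling")
    status = "ok" if meets_minimum(installed) else "older than 2.73"
    print(f"docling {installed}: {status}")
except PackageNotFoundError:
    print("docling is not installed in this environment")
```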