Spaces: adelevett / docling_pp_layout_demo
metadata

title: PP-DocLayoutV3 Empirical Parser
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit

PDF Layout Detection with PP-DocLayoutV3

Upload a PDF to get a structured breakdown of every element on each page: titles, body text, tables, figures, formulas, headers, footers, footnotes, and more. Detection is powered by PaddlePaddle's PP-DocLayoutV3 model via the docling-pp-doc-layout plugin.

Results are displayed as interactive JSON in the browser and can be downloaded as a .json file with one click.

How to use

1. Click Source Document and upload a PDF.
2. Click Run Layout Detection.
3. Inspect the extracted elements in the JSON panel.
4. Click Download JSON to save the results.

Output format

Each detected region is returned as an object with two fields:

{
  "type": "SectionHeaderItem",
  "content": "Introduction"
}
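
As a rough illustration of how the downloaded results might be consumed, the sketch below counts elements per type and collects section headings. It assumes the .json file holds a list of objects in the shape shown above; that top-level structure is an assumption, as are the `summarize` helper and the sample data, none of which come from the app itself.

```python
import json
from collections import Counter


def summarize(elements):
    """Count elements per type and collect section header text in document order."""
    counts = Counter(e["type"] for e in elements)
    headers = [e["content"] for e in elements if e["type"] == "SectionHeaderItem"]
    return counts, headers


# Hypothetical sample in the documented {"type", "content"} shape.
# With a real download you would instead do something like:
#     with open("results.json", encoding="utf-8") as f:
#         elements = json.load(f)
sample = json.loads("""
[
  {"type": "TitleItem", "content": "A Sample Paper"},
  {"type": "SectionHeaderItem", "content": "Introduction"},
  {"type": "TextItem", "content": "First paragraph."},
  {"type": "SectionHeaderItem", "content": "Methods"},
  {"type": "TextItem", "content": "Second paragraph."}
]
""")

counts, headers = summarize(sample)
print(dict(counts))
print(headers)
```

If the real file wraps the list in another object (for example, keyed by page), only the loading step would need to change.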

type reflects the docling document-model class. The table below maps the model's raw labels to the types you will see:

| Detected region | Output type |
| --- | --- |
| `doc_title` | `TitleItem` |
| `paragraph_title` | `SectionHeaderItem` |
| `text`, `content`, `abstract`, `aside_text` | `TextItem` |
| `table` | `TableItem` |
| `image`, `chart`, `seal` | `PictureItem` |
| `formula` | `TextItem` (formula) |
| `footnote`, `vision_footnote` | `TextItem` (footnote) |
| `header` | `TextItem` (page header) |
| `footer` | `TextItem` (page footer) |
| `reference`, `reference_content` | `TextItem` |
| `algorithm` | `TextItem` (code) |
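
The mapping above can also be written as a lookup table, which may be handy when post-processing results. This is a minimal sketch that transcribes the table only: the names `OutputType`, `LABEL_TO_OUTPUT`, and `lookup` are hypothetical, and the plugin's actual internals may look nothing like this.

```python
from typing import NamedTuple, Optional


class OutputType(NamedTuple):
    item_class: str             # docling document-model class name from the table
    role: Optional[str] = None  # parenthesised qualifier from the table, if any


# One row per line of the table: (raw labels, output class, qualifier).
_ROWS = [
    (("doc_title",), "TitleItem", None),
    (("paragraph_title",), "SectionHeaderItem", None),
    (("text", "content", "abstract", "aside_text"), "TextItem", None),
    (("table",), "TableItem", None),
    (("image", "chart", "seal"), "PictureItem", None),
    (("formula",), "TextItem", "formula"),
    (("footnote", "vision_footnote"), "TextItem", "footnote"),
    (("header",), "TextItem", "page header"),
    (("footer",), "TextItem", "page footer"),
    (("reference", "reference_content"), "TextItem", None),
    (("algorithm",), "TextItem", "code"),
]

# Flatten the rows so each raw label maps directly to its output type.
LABEL_TO_OUTPUT = {
    label: OutputType(item_class, role)
    for labels, item_class, role in _ROWS
    for label in labels
}


def lookup(label: str) -> Optional[OutputType]:
    """Return the output type for a raw model label, or None if it is not in the table."""
    return LABEL_TO_OUTPUT.get(label)


print(lookup("vision_footnote"))
print(lookup("seal"))
```

Returning `None` for unknown labels, rather than raising, leaves it to the caller to decide how to treat labels the table does not cover.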

Infrastructure

| Component | Detail |
| --- | --- |
| Hardware | ZeroGPU: NVIDIA H200 (70 GB VRAM, shared) |
| Layout model | `PaddlePaddle/PP-DocLayoutV3_safetensors` |
| Pipeline | docling ≥ 2.73 + docling-pp-doc-layout |
| SDK | Gradio 6.9.0, Python 3.10 |