metadata
title: PP-DocLayoutV3 Empirical Parser
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit

PDF Layout Detection with PP-DocLayoutV3

Upload any PDF and get a structured breakdown of every element on the page (titles, body text, tables, figures, formulas, headers, footers, footnotes, and more), powered by PaddlePaddle's PP-DocLayoutV3 model via the docling-pp-doc-layout plugin.

Results are displayed as interactive JSON in the browser and can be downloaded as a .json file with one click.

How to use

  1. Click Source Document and upload a PDF.
  2. Click Run Layout Detection.
  3. Inspect the extracted elements in the JSON panel.
  4. Click Download JSON to save the results.

Output format

Each detected region is returned as an object with two fields:

{
  "type": "SectionHeaderItem",
  "content": "Introduction"
}
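As a rough illustration of working with the downloaded file, the sketch below tallies regions by type. It assumes the file's top level is a JSON array of such two-field objects, which the description above suggests but does not state outright. The helper name `summarize_elements` and the sample contents are hypothetical.

```python
import json
from collections import Counter


def summarize_elements(elements):
    """Count detected regions per output type (hypothetical helper)."""
    return Counter(item["type"] for item in elements)


# Stand-in for the contents of a downloaded .json file.
# Assumption: the top level is a list of {"type", "content"} objects.
sample = json.loads("""
[
  {"type": "TitleItem", "content": "A Sample Paper"},
  {"type": "SectionHeaderItem", "content": "Introduction"},
  {"type": "TextItem", "content": "Body text of the first paragraph."}
]
""")

counts = summarize_elements(sample)
for type_name, n in counts.most_common():
    print(f"{type_name}: {n}")
```

To run this against a real download, replace the inline sample with `json.load(open(path))`, adjusting the access pattern if the actual file wraps the list in an outer object.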

type reflects the docling document-model class. The table below maps the model's raw labels to the types you will see:

| Detected region | Output type |
| --- | --- |
| doc_title | TitleItem |
| paragraph_title | SectionHeaderItem |
| text, content, abstract, aside_text | TextItem |
| table | TableItem |
| image, chart, seal | PictureItem |
| formula | TextItem (formula) |
| footnote, vision_footnote | TextItem (footnote) |
| header | TextItem (page header) |
| footer | TextItem (page footer) |
| reference, reference_content | TextItem |
| algorithm | TextItem (code) |
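For downstream scripts, the table above can be transcribed into a lookup. This is only a sketch of the mapping as documented here, not the plugin's actual implementation. The names `LABEL_TO_TYPE` and `output_type` are hypothetical, and the second tuple element simply carries the parenthetical qualifier from the table.

```python
# Raw PP-DocLayoutV3 label -> (output type, qualifier from the table or None).
# Transcribed from the table above; not taken from the plugin's source code.
LABEL_TO_TYPE = {
    "doc_title": ("TitleItem", None),
    "paragraph_title": ("SectionHeaderItem", None),
    "text": ("TextItem", None),
    "content": ("TextItem", None),
    "abstract": ("TextItem", None),
    "aside_text": ("TextItem", None),
    "table": ("TableItem", None),
    "image": ("PictureItem", None),
    "chart": ("PictureItem", None),
    "seal": ("PictureItem", None),
    "formula": ("TextItem", "formula"),
    "footnote": ("TextItem", "footnote"),
    "vision_footnote": ("TextItem", "footnote"),
    "header": ("TextItem", "page header"),
    "footer": ("TextItem", "page footer"),
    "reference": ("TextItem", None),
    "reference_content": ("TextItem", None),
    "algorithm": ("TextItem", "code"),
}


def output_type(raw_label):
    """Return the (type, qualifier) pair for a raw label, or raise ValueError."""
    try:
        return LABEL_TO_TYPE[raw_label]
    except KeyError:
        raise ValueError(f"unknown layout label: {raw_label!r}") from None


print(output_type("paragraph_title"))
print(output_type("vision_footnote"))
```

Raising on unknown labels, rather than silently defaulting to TextItem, makes it obvious when a model update introduces a label the table does not cover.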

Infrastructure

| Component | Detail |
| --- | --- |
| Hardware | ZeroGPU, NVIDIA H200 (70 GB VRAM, shared) |
| Layout model | PaddlePaddle/PP-DocLayoutV3_safetensors |
| Pipeline | docling ≥ 2.73 + docling-pp-doc-layout |
| SDK | Gradio 6.9.0, Python 3.10 |