---
title: PP-DocLayoutV3 Empirical Parser
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit
---

# PDF Layout Detection with PP-DocLayoutV3

Upload any PDF and get a structured breakdown of the layout regions detected on
each page (titles, body text, tables, figures, formulas, headers, footers,
footnotes, and more), powered by PaddlePaddle's PP-DocLayoutV3 model via the
[docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout)
plugin for docling.

Results are displayed as interactive JSON in the browser and can be downloaded
as a `.json` file with one click.

## How to use

1. Click **Source Document** and upload a PDF.
2. Click **Run Layout Detection**.
3. Inspect the extracted elements in the JSON panel.
4. Click **Download JSON** to save the results.
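
The downloaded file can also be processed offline. This is a minimal sketch that assumes the download is a JSON array of the `{type, content}` objects described in the Output format section below; `load_regions` and `summarize_regions` are illustrative names, not part of this Space. It counts regions per output type and collects the title and section-header text as a rough outline.

```python
import json
from collections import Counter
from pathlib import Path

# Output types treated as the document outline (title plus section headers).
OUTLINE_TYPES = ("TitleItem", "SectionHeaderItem")


def load_regions(path: Path) -> list[dict]:
    """Read the downloaded results file.

    Assumption: the file is a JSON array of {"type": ..., "content": ...} objects.
    """
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize_regions(regions: list[dict]) -> tuple[Counter, list[str]]:
    """Count regions per output type and collect outline text in file order."""
    counts = Counter(region["type"] for region in regions)
    outline = [r["content"] for r in regions if r["type"] in OUTLINE_TYPES]
    return counts, outline
```

For example, `summarize_regions(load_regions(Path("layout.json")))` returns the per-type counts and the outline, where `layout.json` stands in for whatever name your download has.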

## Output format

Each detected region is returned as an object with two fields:

```json
{
  "type": "SectionHeaderItem",
  "content": "Introduction"
}
```
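
The two-field shape can be derived from a parsed docling document. The sketch below is hypothetical and not taken from this Space's `app.py`: it assumes the item's class name becomes `type` and its `text` attribute (if any) becomes `content`. With docling, the pairs would typically come from `DoclingDocument.iterate_items()`, which yields `(item, level)` tuples; here two stand-in classes take their place so the example runs without docling installed.

```python
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Any


def to_region(item: Any) -> dict[str, str]:
    """Serialize one document item into the {"type", "content"} shape.

    Hypothetical: `type` is the item's class name and `content` is its
    `text` attribute, or "" when the item has none (e.g. a picture).
    """
    text = getattr(item, "text", None)
    return {"type": type(item).__name__, "content": text or ""}


def to_regions(pairs: Iterable[tuple[Any, int]]) -> list[dict[str, str]]:
    """Convert (item, level) pairs, keeping their order and dropping the level."""
    return [to_region(item) for item, _level in pairs]


# Stand-ins for docling's item classes, for illustration only.
@dataclass
class SectionHeaderItem:
    text: str


class PictureItem:
    pass


regions = to_regions([(SectionHeaderItem("Introduction"), 1), (PictureItem(), 1)])
# regions == [{"type": "SectionHeaderItem", "content": "Introduction"},
#             {"type": "PictureItem", "content": ""}]
```

Under this assumption a table would also come out with empty `content`; the README does not say how the Space fills `content` for tables or pictures.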

`type` is the name of the docling document-model class for the region. The
table below maps the model's raw labels to the types you will see:

| Raw model label | Output type |
|---|---|
| `doc_title` | `TitleItem` |
| `paragraph_title` | `SectionHeaderItem` |
| `text`, `content`, `abstract`, `aside_text` | `TextItem` |
| `table` | `TableItem` |
| `image`, `chart`, `seal` | `PictureItem` |
| `formula` | `TextItem` (formula) |
| `footnote`, `vision_footnote` | `TextItem` (footnote) |
| `header` | `TextItem` (page header) |
| `footer` | `TextItem` (page footer) |
| `reference`, `reference_content` | `TextItem` |
| `algorithm` | `TextItem` (code) |
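
When post-processing raw PP-DocLayoutV3 labels yourself, the table above can be encoded as a lookup. This sketch copies the table row by row; `RAW_LABEL_TO_TYPE` and `output_type` are illustrative names, not part of docling or the plugin, and the parenthetical qualifiers are kept as a separate hint string.

```python
from typing import Optional

# Rows copied from the table: (raw labels, output type, parenthetical qualifier).
_ROWS: list[tuple[tuple[str, ...], str, Optional[str]]] = [
    (("doc_title",), "TitleItem", None),
    (("paragraph_title",), "SectionHeaderItem", None),
    (("text", "content", "abstract", "aside_text"), "TextItem", None),
    (("table",), "TableItem", None),
    (("image", "chart", "seal"), "PictureItem", None),
    (("formula",), "TextItem", "formula"),
    (("footnote", "vision_footnote"), "TextItem", "footnote"),
    (("header",), "TextItem", "page header"),
    (("footer",), "TextItem", "page footer"),
    (("reference", "reference_content"), "TextItem", None),
    (("algorithm",), "TextItem", "code"),
]

# Flatten to one entry per raw label.
RAW_LABEL_TO_TYPE: dict[str, tuple[str, Optional[str]]] = {
    label: (out_type, qualifier)
    for labels, out_type, qualifier in _ROWS
    for label in labels
}


def output_type(raw_label: str) -> str:
    """Return the output type for a raw label; raises KeyError if it is not in the table."""
    return RAW_LABEL_TO_TYPE[raw_label][0]
```

Labels not listed in the table raise `KeyError` rather than silently falling back to `TextItem`, since the table does not say how any other labels are handled.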

## Infrastructure

| Component | Detail |
|---|---|
| Hardware | ZeroGPU on NVIDIA H200 (70 GB VRAM, shared) |
| Layout model | [`PaddlePaddle/PP-DocLayoutV3_safetensors`](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) |
| Pipeline | [docling](https://github.com/docling-project/docling) ≥ 2.73 + [docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout) |
| SDK | Gradio 6.9.0, Python 3.10 |