adelevett committed on
Commit 8ac770e · verified · 1 Parent(s): fabca3e

Upload 2 files

Files changed (2)
  1. README.md +43 -29
  2. app.py +22 -5
README.md CHANGED
@@ -10,43 +10,57 @@ pinned: false
  license: mit
  ---

- # PP-DocLayoutV3 Pipeline: Empirical Iteration Guide
+ # PDF Layout Detection with PP-DocLayoutV3

- This application provides an extraction pipeline using `docling-pp-doc-layout`
- running on Hugging Face's ZeroGPU infrastructure (70 GB VRAM NVIDIA H200).
- Because instance-segmentation-based layout parsing exhibits high variance in
- memory utilisation based on polygon density and image resolution, this Space is
- engineered for iterative, data-driven optimisation.
+ Upload any PDF and get a structured breakdown of every element on the page
+ (titles, body text, tables, figures, formulas, headers, footers, footnotes, and
+ more), powered by PaddlePaddle's PP-DocLayoutV3 model via the
+ [docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout)
+ plugin.

- ## Architecture
+ Results are displayed as interactive JSON in the browser and can be downloaded
+ as a `.json` file with one click.

- | Component | Value |
- |---|---|
- | Hardware | Hugging Face ZeroGPU (`@spaces.GPU`, large tier — half H200) |
- | SDK | Gradio 6.9.0 |
- | Python | 3.12 (ZeroGPU supports 3.12.12 and 3.10.13; 3.13 is **not** supported) |
- | Layout model | `PaddlePaddle/PP-DocLayoutV3_safetensors` |
- | GPU timeout | 120 s (`duration=120`) |
+ ## How to use

- ## Iterative Deployment Protocol
+ 1. Click **Source Document** and upload a PDF.
+ 2. Click **Run Layout Detection**.
+ 3. Inspect the extracted elements in the JSON panel.
+ 4. Click **Download JSON** to save the results.

- ### 1. Memory Profiling and Batch Optimisation
+ ## Output format

- `PPDocLayoutV3Options` is initialised with `batch_size=2` as a conservative
- baseline. Monitor ZeroGPU hardware logs for OOM evictions. The large tier
- provides 70 GB VRAM, so `batch_size` can be incremented sequentially until
- utilisation approaches the ceiling.
+ Each detected region is returned as an object with two fields:

- ### 2. Confidence Threshold Calibration
+ ```json
+ {
+   "type": "SectionHeaderItem",
+   "content": "Introduction"
+ }
+ ```

- `confidence_threshold=0.5` is the default decision boundary. Evaluate output
- classifications against a validation set:
+ `type` reflects the docling document-model class. The table below maps the
+ model's raw labels to the types you will see:

- - **Higher threshold** → higher precision, fewer false positives
- - **Lower threshold** → higher recall, fewer missed bounding boxes
+ | Detected region | Output type |
+ |---|---|
+ | `doc_title` | `TitleItem` |
+ | `paragraph_title` | `SectionHeaderItem` |
+ | `text`, `content`, `abstract`, `aside_text` | `TextItem` |
+ | `table` | `TableItem` |
+ | `image`, `chart`, `seal` | `PictureItem` |
+ | `formula` | `TextItem` (formula) |
+ | `footnote`, `vision_footnote` | `TextItem` (footnote) |
+ | `header` | `TextItem` (page header) |
+ | `footer` | `TextItem` (page footer) |
+ | `reference`, `reference_content` | `TextItem` |
+ | `algorithm` | `TextItem` (code) |

- ### 3. Queue Latency and Hardware Timeouts
+ ## Infrastructure

- ZeroGPU enforces a 60 s default GPU lease. The `@spaces.GPU(duration=120)`
- annotation extends this to 120 s. If empirical data shows consistent sub-60 s
- inference, reduce `duration` to improve queue priority for Space visitors.
+ | Component | Detail |
+ |---|---|
+ | Hardware | ZeroGPU NVIDIA H200 (70 GB VRAM, shared) |
+ | Layout model | [`PaddlePaddle/PP-DocLayoutV3_safetensors`](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) |
+ | Pipeline | [docling](https://github.com/docling-project/docling) ≥ 2.73 + [docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout) |
+ | SDK | Gradio 6.9.0, Python 3.10 |
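The output format documented in the new README (a list of objects, each with a `type` and a `content` field) can be post-processed with a few lines of Python. The following is a minimal sketch, assuming the downloaded file holds a JSON array of such objects; the helper name `group_by_type` and the sample data are hypothetical and not part of this commit.

```python
import json
from collections import defaultdict


def group_by_type(elements):
    """Collect the `content` strings of layout elements under their `type`."""
    grouped = defaultdict(list)
    for element in elements:
        # Fall back to placeholders if a field is missing (assumption:
        # well-formed entries always carry both keys).
        grouped[element.get("type", "Unknown")].append(element.get("content", ""))
    return dict(grouped)


# Hypothetical sample shaped like the README's documented output.
sample = json.loads("""
[
  {"type": "TitleItem", "content": "A Sample Paper"},
  {"type": "SectionHeaderItem", "content": "Introduction"},
  {"type": "TextItem", "content": "First paragraph."},
  {"type": "TextItem", "content": "Second paragraph."}
]
""")

grouped = group_by_type(sample)
for item_type, contents in grouped.items():
    print(f"{item_type}: {len(contents)} element(s)")
```

Because several raw labels map to `TextItem` in the table above, grouping on `type` alone will not separate, say, footnotes from body text.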
app.py CHANGED
@@ -90,7 +90,7 @@ converter = DocumentConverter(
  @spaces.GPU(duration=120)
  def infer_layout(file_path: str | None):
      if not file_path:
-         return {"error": "No file uploaded"}
+         return {"error": "No file uploaded"}, None
      try:
          result = converter.convert(file_path)
          structured_data = []
@@ -99,9 +99,16 @@ def infer_layout(file_path: str | None):
                  "type": type(item).__name__,
                  "content": getattr(item, "text", "No text mapping"),
              })
-         return structured_data
+         # Write to a temp file so Gradio can serve it as a download.
+         import json, tempfile, os
+         tmp = tempfile.NamedTemporaryFile(
+             mode="w", suffix=".json", delete=False, encoding="utf-8"
+         )
+         json.dump(structured_data, tmp, ensure_ascii=False, indent=2)
+         tmp.close()
+         return structured_data, tmp.name
      except Exception as e:
-         return {"runtime_exception": str(e)}
+         return {"runtime_exception": str(e)}, None


  with gr.Blocks(title="PP-DocLayoutV3 Empirical Parser") as interface:
@@ -113,8 +120,18 @@ with gr.Blocks(title="PP-DocLayoutV3 Empirical Parser") as interface:
      with gr.Row():
          pdf_input = gr.File(label="Source Document", file_types=[".pdf"])
          json_output = gr.JSON(label="Structured Extraction Matrix")
-     execute_btn = gr.Button("Initialize Inference")
-     execute_btn.click(fn=infer_layout, inputs=pdf_input, outputs=json_output)
+     download_btn = gr.DownloadButton(label="Download JSON", visible=False)
+     execute_btn = gr.Button("Run Layout Detection")
+
+     def run_and_reveal(file_path):
+         data, path = infer_layout(file_path)
+         return data, gr.DownloadButton(value=path, visible=path is not None)
+
+     execute_btn.click(
+         fn=run_and_reveal,
+         inputs=pdf_input,
+         outputs=[json_output, download_btn],
+     )

  if __name__ == "__main__":
      interface.launch()
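The temp-file step added to `infer_layout` can be tried on its own, outside Gradio and the GPU decorator. The following is a minimal sketch of the same technique (a `NamedTemporaryFile` created with `delete=False` so the path outlives the handle), assuming only the standard library; the helper name `write_json_download` is hypothetical, and the context manager and module-level imports are a stylistic variation rather than the code in this commit.

```python
import json
import os
import tempfile


def write_json_download(data):
    """Serialise `data` to a temporary .json file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so another component can later read it by path.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=False, encoding="utf-8"
    ) as tmp:
        json.dump(data, tmp, ensure_ascii=False, indent=2)
    return tmp.name


# Round-trip check with a hypothetical element containing non-ASCII text.
path = write_json_download([{"type": "TitleItem", "content": "Résumé"}])
with open(path, encoding="utf-8") as fh:
    round_trip = json.load(fh)

# Files created with delete=False are never removed automatically.
os.remove(path)
```

Since nothing deletes these files automatically, a long-running process that writes one per request may want its own cleanup policy.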