Spaces: adelevett/docling_pp_layout_demo

adelevett committed on Mar 8

Commit 8ac770e (verified) · 1 Parent(s): fabca3e

Upload 2 files

Files changed (2)

README.md +43 -29
app.py +22 -5

README.md CHANGED

@@ -10,43 +10,57 @@ pinned: false
 license: mit
 ---
-# PP-DocLayoutV3 Pipeline: Empirical Iteration Guide
-This application provides an extraction pipeline using `docling-pp-doc-layout`
-running on Hugging Face's ZeroGPU infrastructure (70 GB VRAM NVIDIA H200).
-Because instance-segmentation-based layout parsing exhibits high variance in
-memory utilisation based on polygon density and image resolution, this Space is
-engineered for iterative, data-driven optimisation.
-## Architecture
-| Component | Value |
-|---|---|
-| Hardware | Hugging Face ZeroGPU (`@spaces.GPU`, large tier — half H200) |
-| SDK | Gradio 6.9.0 |
-| Python | 3.12 (ZeroGPU supports 3.12.12 and 3.10.13; 3.13 is **not** supported) |
-| Layout model | `PaddlePaddle/PP-DocLayoutV3_safetensors` |
-| GPU timeout | 120 s (`duration=120`) |
-## Iterative Deployment Protocol
-### 1. Memory Profiling and Batch Optimisation
-`PPDocLayoutV3Options` is initialised with `batch_size=2` as a conservative
-baseline. Monitor ZeroGPU hardware logs for OOM evictions. The large tier
-provides 70 GB VRAM, so `batch_size` can be incremented sequentially until
-utilisation approaches the ceiling.
-### 2. Confidence Threshold Calibration
-`confidence_threshold=0.5` is the default decision boundary. Evaluate output
-classifications against a validation set:
-- **Higher threshold** → higher precision, fewer false positives
-- **Lower threshold** → higher recall, fewer missed bounding boxes
-### 3. Queue Latency and Hardware Timeouts
-ZeroGPU enforces a 60 s default GPU lease. The `@spaces.GPU(duration=120)`
-annotation extends this to 120 s. If empirical data shows consistent sub-60 s
-inference, reduce `duration` to improve queue priority for Space visitors.

+# PDF Layout Detection with PP-DocLayoutV3
+Upload any PDF and get a structured breakdown of every element on the page —
+titles, body text, tables, figures, formulas, headers, footers, footnotes, and
+more — powered by PaddlePaddle's PP-DocLayoutV3 model via the
+[docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout)
+plugin.
+Results are displayed as interactive JSON in the browser and can be downloaded
+as a `.json` file with one click.
+## How to use
+1. Click **Source Document** and upload a PDF.
+2. Click **Run Layout Detection**.
+3. Inspect the extracted elements in the JSON panel.
+4. Click **Download JSON** to save the results.
+## Output format
+Each detected region is returned as an object with two fields:
+```json
+{
+  "type": "SectionHeaderItem",
+  "content": "Introduction"
+}
+```
+`type` reflects the docling document-model class. The table below maps the
+model's raw labels to the types you will see:
+| Detected region | Output type |
+|---|---|
+| `doc_title` | `TitleItem` |
+| `paragraph_title` | `SectionHeaderItem` |
+| `text`, `content`, `abstract`, `aside_text` | `TextItem` |
+| `table` | `TableItem` |
+| `image`, `chart`, `seal` | `PictureItem` |
+| `formula` | `TextItem` (formula) |
+| `footnote`, `vision_footnote` | `TextItem` (footnote) |
+| `header` | `TextItem` (page header) |
+| `footer` | `TextItem` (page footer) |
+| `reference`, `reference_content` | `TextItem` |
+| `algorithm` | `TextItem` (code) |
+## Infrastructure
+| Component | Detail |
+|---|---|
+| Hardware | ZeroGPU — NVIDIA H200 (70 GB VRAM, shared) |
+| Layout model | [`PaddlePaddle/PP-DocLayoutV3_safetensors`](https://huggingface.co/PaddlePaddle/PP-DocLayoutV3_safetensors) |
+| Pipeline | [docling](https://github.com/docling-project/docling) ≥ 2.73 + [docling-pp-doc-layout](https://github.com/DCC-BS/docling-pp-doc-layout) |
+| SDK | Gradio 6.9.0, Python 3.10 |
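
Reviewer note: for anyone post-processing the downloaded file, here is a minimal sketch. It assumes the file is a JSON array of objects carrying the `type` and `content` fields that the new README's "Output format" section describes. The helper names `summarize_layout` and `texts_of_type` and the sample values are hypothetical and not part of the Space.

```python
import json
from collections import Counter


def summarize_layout(records):
    """Count detected regions per docling type (hypothetical helper)."""
    return Counter(r["type"] for r in records)


def texts_of_type(records, type_name):
    """Return the `content` string of every region with the given type."""
    return [r["content"] for r in records if r["type"] == type_name]


# Sample data in the shape the README describes; the values are made up.
sample = json.loads("""
[
  {"type": "TitleItem", "content": "A Sample Paper"},
  {"type": "SectionHeaderItem", "content": "Introduction"},
  {"type": "TextItem", "content": "First paragraph."},
  {"type": "TextItem", "content": "Second paragraph."}
]
""")

counts = summarize_layout(sample)
headers = texts_of_type(sample, "SectionHeaderItem")
print(counts)
print(headers)
```

Because formulas, footnotes, page headers, and page footers all surface as `TextItem` in the mapping table, filtering on `type` alone cannot tell them apart; that would need an extra field that this output does not currently include.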

app.py CHANGED

@@ -90,7 +90,7 @@ converter = DocumentConverter(
 @spaces.GPU(duration=120)
 def infer_layout(file_path: str | None):
     if not file_path:
-        return {"error": "No file uploaded"}
+        return {"error": "No file uploaded"}, None
     try:
         result = converter.convert(file_path)
         structured_data = []
@@ -99,9 +99,16 @@ def infer_layout(file_path: str | None):
                 "type": type(item).__name__,
                 "content": getattr(item, "text", "No text mapping"),
             })
-        return structured_data
+        # Write to a temp file so Gradio can serve it as a download.
+        import json, tempfile, os
+        tmp = tempfile.NamedTemporaryFile(
+            mode="w", suffix=".json", delete=False, encoding="utf-8"
+        )
+        json.dump(structured_data, tmp, ensure_ascii=False, indent=2)
+        tmp.close()
+        return structured_data, tmp.name
     except Exception as e:
-        return {"runtime_exception": str(e)}
+        return {"runtime_exception": str(e)}, None
 with gr.Blocks(title="PP-DocLayoutV3 Empirical Parser") as interface:
@@ -113,8 +120,18 @@ with gr.Blocks(title="PP-DocLayoutV3 Empirical Parser") as interface:
     with gr.Row():
         pdf_input = gr.File(label="Source Document", file_types=[".pdf"])
         json_output = gr.JSON(label="Structured Extraction Matrix")
-    execute_btn = gr.Button("Initialize Inference")
-    execute_btn.click(fn=infer_layout, inputs=pdf_input, outputs=json_output)
+    download_btn = gr.DownloadButton(label="Download JSON", visible=False)
+    execute_btn = gr.Button("Run Layout Detection")
+    def run_and_reveal(file_path):
+        data, path = infer_layout(file_path)
+        return data, gr.DownloadButton(value=path, visible=path is not None)
+    execute_btn.click(
+        fn=run_and_reveal,
+        inputs=pdf_input,
+        outputs=[json_output, download_btn],
+    )
 if __name__ == "__main__":
     interface.launch()
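
Reviewer note: the temp-file step added to `infer_layout` can be tried on its own, without Gradio or a GPU. The sketch below isolates it under stated assumptions: the helper name `write_json_download` and the sample record are hypothetical, and the `with` block is a stylistic substitute for the explicit `tmp.close()` in the commit.

```python
import json
import os
import tempfile


def write_json_download(records):
    """Serialize records to a temporary .json file and return its path."""
    # delete=False keeps the file on disk after it is closed, so another
    # component (here, a download button) can still read it by path.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=False, encoding="utf-8"
    ) as tmp:
        # ensure_ascii=False writes non-ASCII text as-is instead of \u escapes.
        json.dump(records, tmp, ensure_ascii=False, indent=2)
    return tmp.name


records = [{"type": "SectionHeaderItem", "content": "Einführung"}]
path = write_json_download(records)

# Read the file back to confirm the round trip preserves the data.
with open(path, encoding="utf-8") as fh:
    round_trip = json.load(fh)

# With delete=False nothing removes the file automatically.
os.remove(path)
```

Two observations from the diff as shown: `os` is imported but not used in the added lines, and files created with `delete=False` are never removed, so they may accumulate for as long as the Space process runs.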