---
title: Structured Docling
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app_hf_spaces.py
pinned: false
license: gpl-3.0
---
# Docling Structured Extraction Demo
A Gradio-based demo application for extracting structured data from documents using Docling's beta structured extraction feature.
## Features
- Support for PDF and image files (PNG, JPG, JPEG, TIFF, BMP)
- URL input for remote documents
- Customizable JSON templates for extraction
- Optimized for Hugging Face Spaces with Zero GPU support
- Clean JSON output with extracted data
## Files
- `app.py` - Standard Gradio application
- `app_hf_spaces.py` - Version optimized for Hugging Face Spaces with Zero GPU decorator
- `requirements.txt` - Python dependencies
## Installation
```bash
pip install -r requirements.txt
```
## Usage
### Local Development
Run the standard version:
```bash
python app.py
```
### Hugging Face Spaces
The `app_hf_spaces.py` file is specifically designed for deployment on Hugging Face Spaces with Zero GPU support.
To deploy:
1. Create a new Space on Hugging Face
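2. Upload `app_hf_spaces.py` (either rename it to `app.py`, or keep the name and set `app_file: app_hf_spaces.py` in the Space metadata, as this README does)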
3. Upload `requirements.txt`
4. Enable Zero GPU in Space settings
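
The Zero GPU decorator comes from the `spaces` package, which is normally only installed on Hugging Face Spaces. As a minimal sketch (not the actual contents of `app_hf_spaces.py`), one file can serve both local runs and Spaces by falling back to a no-op decorator. The function `run_extraction` and its body are hypothetical placeholders.

```python
# Sketch: use spaces.GPU when available, otherwise a no-op decorator.
try:
    import spaces  # installed on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:

    def gpu(func):
        """No-op stand-in for spaces.GPU during local development."""
        return func


@gpu
def run_extraction(source: str, template: str) -> str:
    # Hypothetical placeholder: the real app would call Docling here.
    return f"would extract from {source} using template {template}"
```

With this pattern the GPU is requested only while the decorated function runs on Spaces, and the same function runs unchanged on a local machine.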
## How to Use the Demo
1. **Input Source**: Either upload a document file or provide a URL to a document
2. **Define Template**: Create a JSON template specifying the fields you want to extract
   - Use `"string"` for text fields
   - Use `"float"` for decimal numbers
   - Use `"int"` for whole numbers
3. **Extract**: Click the "Extract" button to process the document
4. **View Results**: The extracted data will appear in JSON format in the output box
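
Because the template is typed in by hand, it can help to check it before starting a slow extraction. The following is a minimal sketch, assuming the demo accepts only flat templates with the three type names listed above. The helper `validate_template` is hypothetical and not part of Docling or of this demo.

```python
import json

# Type names listed in this README; an assumption about what the demo accepts.
ALLOWED_TYPES = {"string", "float", "int"}


def validate_template(template_text: str) -> dict:
    """Parse a JSON template and reject unknown type names."""
    template = json.loads(template_text)  # raises ValueError on invalid JSON
    if not isinstance(template, dict):
        raise ValueError("Template must be a JSON object")
    for field, type_name in template.items():
        if not isinstance(type_name, str) or type_name not in ALLOWED_TYPES:
            raise ValueError(
                f"Field {field!r} has unsupported type {type_name!r}"
            )
    return template
```

Calling `validate_template('{"bill_no": "string", "total": "float"}')` returns the parsed dictionary, while a template such as `{"total": "decimal"}` raises `ValueError`.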
## Template Examples
### Simple Invoice Extraction
```json
{
  "bill_no": "string",
  "total": "float",
  "date": "string"
}
```
### Detailed Invoice Extraction
```json
{
  "bill_no": "string",
  "total": "float",
  "sender_name": "string",
  "receiver_name": "string",
  "postal_code": "string",
  "city": "string"
}
```
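
Since extraction relies on a VLM, the returned values may not always match the requested types. The sketch below checks a result against its template, assuming the output is a flat JSON object keyed by the template's field names. That output shape, the helper `mismatched_fields`, and the sample values are all illustrative assumptions.

```python
# Map the README's type names to Python types.
PYTHON_TYPES = {"string": str, "float": float, "int": int}


def mismatched_fields(template: dict, extracted: dict) -> list:
    """Return names of fields that are missing or have an unexpected type."""
    problems = []
    for field, type_name in template.items():
        value = extracted.get(field)
        expected = PYTHON_TYPES[type_name]
        if isinstance(value, bool):
            ok = False  # bool is a subclass of int, so reject it explicitly
        elif expected is float:
            ok = isinstance(value, (int, float))  # whole numbers pass as floats
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(field)
    return problems


template = {"bill_no": "string", "total": "float", "date": "string"}
sample = {"bill_no": "A-17", "total": "42.00"}  # made-up result for illustration
print(mismatched_fields(template, sample))
```

Here `total` is flagged because it arrived as text rather than a number, and `date` is flagged because it is missing.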
## Notes
- The structured extraction API is currently in **beta** and may change
- Only PDF and image formats are supported
- The extraction uses Vision Language Models (VLM) for understanding document content
- Processing time depends on document complexity and size
## Requirements
- Python 3.9+
- gradio >= 4.0.0
- docling[vlm] >= 2.0.0
- spaces >= 0.19.0 (for Hugging Face Spaces deployment)
## License
This demo is provided as-is for demonstration purposes. The Space metadata declares the license as GPL-3.0.