---
title: Structured Docling
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app_hf_spaces.py
pinned: false
license: gpl-3.0
---
# Docling Structured Extraction Demo

A Gradio-based demo application for extracting structured data from documents using Docling's beta structured extraction feature.

## Features

- Support for PDF and image files (PNG, JPG, JPEG, TIFF, BMP)
- URL input for remote documents
- Customizable JSON templates for extraction
- Optimized for Hugging Face Spaces with Zero GPU support
- Clean JSON output with extracted data
## Files

- `app.py` - Standard Gradio application
- `app_hf_spaces.py` - Version optimized for Hugging Face Spaces with Zero GPU decorator
- `requirements.txt` - Python dependencies

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Local Development

Run the standard version:

```bash
python app.py
```
### Hugging Face Spaces

The `app_hf_spaces.py` file is designed for deployment on Hugging Face Spaces with Zero GPU support.

To deploy:

1. Create a new Space on Hugging Face
2. Upload `app_hf_spaces.py`, either renamed to `app.py` or referenced through `app_file: app_hf_spaces.py` in the Space metadata
3. Upload `requirements.txt`
4. Enable Zero GPU in the Space settings
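
The contents of `app_hf_spaces.py` are not shown here, so the following is only a minimal sketch of how a Zero GPU decorator is typically applied. It assumes the `spaces` package provides the `spaces.GPU` decorator; `zero_gpu` and `extract_on_gpu` are hypothetical names, and the function body is a placeholder rather than the demo's real extraction logic.

```python
def zero_gpu(func):
    """Wrap `func` with spaces.GPU when the `spaces` package is installed.

    Outside Hugging Face Spaces (for example, during local development)
    the package may be missing, in which case the function is returned
    unchanged.
    """
    try:
        import spaces  # only needed for Hugging Face Spaces deployment
    except ImportError:
        return func
    return spaces.GPU(func)


@zero_gpu
def extract_on_gpu(source: str, template: str) -> dict:
    # Placeholder body (assumption): the real app would run the
    # VLM-based extraction here and return its result.
    return {"source": source, "template": template}
```

Falling back to the undecorated function lets the same file run locally, where no Zero GPU allocation is available.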
## How to Use the Demo

1. **Input Source**: Either upload a document file or provide a URL to a document
2. **Define Template**: Create a JSON template specifying the fields you want to extract
   - Use `"string"` for text fields
   - Use `"float"` for decimal numbers
   - Use `"int"` for whole numbers
3. **Extract**: Click the "Extract" button to process the document
4. **View Results**: The extracted data will appear in JSON format in the output box
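
The first two steps can be checked before any model runs. The sketch below is illustrative and not taken from the demo's source: `parse_template` and `pick_source` are hypothetical helpers, and it assumes that only the three type names listed above and the file extensions from the Features list are accepted.

```python
import json
from pathlib import Path
from typing import Optional

# Assumption: only the type names documented in this README are valid.
ALLOWED_TYPES = {"string", "float", "int"}
# Assumption: extensions derived from the Features list.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp"}


def parse_template(template_text: str) -> dict:
    """Parse a JSON template and check that every field uses a known type name."""
    template = json.loads(template_text)  # raises ValueError on invalid JSON
    if not isinstance(template, dict) or not template:
        raise ValueError("Template must be a non-empty JSON object")
    for field, type_name in template.items():
        if not isinstance(type_name, str) or type_name not in ALLOWED_TYPES:
            raise ValueError(f"Field {field!r} has unsupported type {type_name!r}")
    return template


def pick_source(file_path: Optional[str], url: Optional[str]) -> str:
    """Prefer an uploaded file; otherwise fall back to the URL."""
    if file_path:
        if Path(file_path).suffix.lower() not in SUPPORTED_SUFFIXES:
            raise ValueError(f"Unsupported file type: {file_path}")
        return file_path
    if url and url.strip():
        return url.strip()
    raise ValueError("Provide either a file or a URL")
```

Rejecting a malformed template early gives the user a clear error message instead of a failed extraction.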
## Template Examples

### Simple Invoice Extraction

```json
{
  "bill_no": "string",
  "total": "float",
  "date": "string"
}
```

### Detailed Invoice Extraction

```json
{
  "bill_no": "string",
  "total": "float",
  "sender_name": "string",
  "receiver_name": "string",
  "postal_code": "string",
  "city": "string"
}
```
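
For readers who want to try a template outside the Gradio UI, here is a hedged sketch of passing one to Docling directly. It assumes the beta API exposes `DocumentExtractor.extract(source=..., template=...)` and that each result page carries an `extracted_data` mapping, as described in Docling's documentation at the time of writing; the API is in beta and may differ. `extract_fields` is a hypothetical helper, not a function from this demo.

```python
import json

# Template from the "Simple Invoice Extraction" example above.
SIMPLE_INVOICE = {"bill_no": "string", "total": "float", "date": "string"}


def extract_fields(source: str, template: dict) -> list:
    """Run Docling's beta extraction on `source` and return per-page data.

    Assumption: the beta API matches the names used below; check the
    installed Docling version before relying on them.
    """
    # Imported lazily so this module loads even where docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.document_extractor import DocumentExtractor

    extractor = DocumentExtractor(
        allowed_formats=[InputFormat.IMAGE, InputFormat.PDF]
    )
    # The template is serialized to a JSON string, mirroring the demo's JSON input.
    result = extractor.extract(source=source, template=json.dumps(template))
    return [page.extracted_data for page in result.pages]
```

A call such as `extract_fields("invoice.pdf", SIMPLE_INVOICE)` (with a hypothetical local file) would return one dictionary per page, keyed by the template's field names.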
## Notes

- The structured extraction API is currently in **beta** and may change
- Only PDF and image formats are supported
- The extraction uses Vision Language Models (VLM) for understanding document content
- Processing time depends on document complexity and size

## Requirements

- Python 3.9+
- gradio >= 4.0.0
- docling[vlm] >= 2.0.0
- spaces >= 0.19.0 (for Hugging Face Spaces deployment)
## License

The Space metadata declares the license as GPL-3.0. The demo is provided as-is for demonstration purposes.