
	---
	title: Structured Docling
	emoji: 📄
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.49.1
	app_file: app_hf_spaces.py
	pinned: false
	license: gpl-3.0
	---

	# Docling Structured Extraction Demo

	A Gradio-based demo application for extracting structured data from documents using Docling's beta structured extraction feature.

	## Features

	- 📄 Support for PDF and image files (PNG, JPG, JPEG, TIFF, BMP)
	- 🌐 URL input for remote documents
	- 🎯 Customizable JSON templates for extraction
	- 🚀 Optimized for Hugging Face Spaces with Zero GPU support
	- 📊 Clean JSON output with extracted data
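
As a rough illustration of the two input modes listed above, the following sketch tells a URL apart from a local file and rejects unsupported extensions. The function name `classify_source` and the constant `SUPPORTED_EXTENSIONS` are hypothetical and are not taken from the demo's source code. The extension set simply mirrors the feature list.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions copied from the feature list (PDF plus common image formats).
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".bmp"}


def classify_source(source: str) -> str:
    """Return "url" or "file" for a usable source, else raise ValueError.

    Hypothetical helper: the demo itself may decide this differently.
    """
    parsed = urlparse(source)
    # Treat anything with an http(s) scheme and a host as a remote document.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Otherwise accept local paths only when the extension is supported.
    if Path(source).suffix.lower() in SUPPORTED_EXTENSIONS:
        return "file"
    raise ValueError(f"Unsupported source: {source!r}")
```

Remote documents are accepted without an extension check here, since a URL does not always end in a file name.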

	## Files

	- `app.py` - Standard Gradio application
	- `app_hf_spaces.py` - Version optimized for Hugging Face Spaces with Zero GPU decorator
	- `requirements.txt` - Python dependencies

	## Installation

	```bash
	pip install -r requirements.txt
	```

	## Usage

	### Local Development

	Run the standard version:
	```bash
	python app.py
	```

	### Hugging Face Spaces

	The `app_hf_spaces.py` file is specifically designed for deployment on Hugging Face Spaces with Zero GPU support.

	To deploy:
	1. Create a new Space on Hugging Face
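	2. Upload `app_hf_spaces.py` (either rename it to `app.py`, or keep the name and set `app_file: app_hf_spaces.py` in the Space's README front matter, as this repository does)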
	3. Upload `requirements.txt`
	4. Select Zero GPU as the hardware in the Space settings
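
The difference between the two app files is described only as a "Zero GPU decorator". One possible way to keep a single code path for both environments is sketched below: use `spaces.GPU` when the `spaces` package is installed, and fall back to a no-op decorator otherwise. The names `gpu` and `run_extraction` are hypothetical, and the function body is a placeholder rather than the demo's actual extraction code.

```python
try:
    import spaces  # provided on Hugging Face Spaces (see Requirements)

    gpu = spaces.GPU
except ImportError:

    def gpu(func=None, **kwargs):
        """No-op stand-in for spaces.GPU when running locally."""
        if func is None:
            # Called with arguments, e.g. @gpu(duration=60)
            return lambda f: f
        # Called bare, e.g. @gpu
        return func


@gpu
def run_extraction(source: str, template: str) -> dict:
    # Placeholder: a real app would run the Docling extraction here.
    return {"source": source, "template": template}
```

With this pattern the same module runs locally without `spaces` installed, while on a Zero GPU Space the decorated function requests a GPU for the duration of the call.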

	## How to Use the Demo

	1. Input Source: Either upload a document file or provide a URL to a document
	2. Define Template: Create a JSON template specifying the fields you want to extract
	   - Use `"string"` for text fields
	   - Use `"float"` for decimal numbers
	   - Use `"int"` for whole numbers
	3. Extract: Click the "Extract" button to process the document
	4. View Results: The extracted data will appear in JSON format in the output box
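
Before pasting a template into the demo, it can help to check that it is valid JSON and uses only the three type names listed in step 2. The sketch below does that for flat templates. `parse_template` and `ALLOWED_TYPES` are hypothetical names, and the demo may accept richer templates than this check allows.

```python
import json

# Type names documented in the steps above.
ALLOWED_TYPES = {"string", "float", "int"}


def parse_template(text: str) -> dict:
    """Parse a flat JSON template and validate its type names.

    Hypothetical helper: assumes every value is one of ALLOWED_TYPES.
    """
    template = json.loads(text)  # raises json.JSONDecodeError on bad JSON
    if not isinstance(template, dict) or not template:
        raise ValueError("Template must be a non-empty JSON object")
    for field, type_name in template.items():
        if not isinstance(type_name, str) or type_name not in ALLOWED_TYPES:
            raise ValueError(
                f"Field {field!r} has unsupported type {type_name!r}; "
                f"expected one of {sorted(ALLOWED_TYPES)}"
            )
    return template
```

For example, `parse_template('{"bill_no": "string", "total": "float"}')` returns the parsed dictionary, while a value such as `"number"` raises `ValueError`.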

	## Template Examples

	### Simple Invoice Extraction
	```json
	{
	  "bill_no": "string",
	  "total": "float",
	  "date": "string"
	}
	```

	### Detailed Invoice Extraction
	```json
	{
	  "bill_no": "string",
	  "total": "float",
	  "sender_name": "string",
	  "receiver_name": "string",
	  "postal_code": "string",
	  "city": "string"
	}
	```
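
Once the demo returns JSON, the output can be compared against the template that produced it. The following sketch reports missing fields and type mismatches. `check_result` and `PYTHON_TYPES` are hypothetical names, the mapping from template types to Python types is an assumption, and any result passed in is whatever the demo returned for your document.

```python
# Assumed mapping from template type names to Python types.
# An integer value is accepted where the template asks for "float".
PYTHON_TYPES = {"string": str, "float": (int, float), "int": int}


def check_result(template: dict, result: dict) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for field, type_name in template.items():
        if field not in result:
            problems.append(f"missing field: {field}")
            continue
        value = result[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[type_name]):
            problems.append(
                f"{field}: expected {type_name}, got {type(value).__name__}"
            )
    return problems
```

An empty list means every templated field is present with a compatible type. It says nothing about whether the extracted values are correct.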

	## Notes

	- The structured extraction API is currently in beta and may change
	- Only PDF and image formats are supported
	- The extraction uses Vision Language Models (VLM) for understanding document content
	- Processing time depends on document complexity and size

	## Requirements

	- Python 3.9+
	- gradio >= 4.0.0
	- docling[vlm] >= 2.0.0
	- spaces >= 0.19.0 (for Hugging Face Spaces deployment)

	## License

	This demo is provided as-is for demonstration purposes. The Space metadata at the top of this README declares the license as `gpl-3.0`.