---
title: Document Layout Detection
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---
# Document Layout, Table Structure & Signature Detection
An AI-powered tool that automatically detects document layout and table structure, with an optional dedicated detector for handwritten signatures.
## 🎯 What Does This Do?
This Space automatically analyzes your documents (PDFs, images, scanned documents) to:
- 🏷️ **Detect Layout Elements**: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
- **Extract Tables**: Recognizes table structure and extracts the cell contents
- 🖼️ **Visual Output**: Draws bounding boxes around detected elements with color-coded labels
- **Export Formats**: Provides Markdown, JSON, and visual outputs
- **OCR Support**: Automatically processes scanned documents and images
- **Signature Detection**: Uses a fine-tuned YOLOv8s model to find handwritten signatures, either as an overlay on the layout view or as a dedicated tool
## How to Use
1. **Upload** your document (PDF, JPG, PNG, etc.)
2. **Choose** a processing mode:
   - **Fast**: Quick processing for simple documents
   - **Accurate**: Better quality for complex tables (slower)
3. **Configure** options:
   - Enable/disable OCR
   - Enable/disable table detection
4. **Process** and view the results!
## Use Cases
Perfect for analyzing:
- **ID Documents**: Aadhaar cards, passports, driver's licenses
- **Forms & Applications**: Government forms, surveys, questionnaires
- 🧾 **Invoices & Receipts**: Business documents with tables
- **Research Papers**: Academic documents with complex layouts
- **Reports**: Annual reports, financial statements
- **Articles & Documents**: Any structured document
## 🛠️ Technology
This Space combines the following components:
- **Layout Model**: Neural layout-analysis model (provided through Docling) for document layout analysis
- **Table Structure Model**: TableFormer architecture for table structure recognition and extraction
- **OCR Engine**: RapidOCR (ONNX Runtime backend) for text recognition in scanned documents
- **Framework**: Docling document-conversion pipeline
- **Signature Model (Optional)**: Fine-tuned signature detector (`tech4humans/yolov8s-signature-detector`) from Hugging Face
## 🎨 Output Formats
### 1. Visualization
- Bounding boxes drawn on the document
- Color-coded by element type
- Labels showing detected elements
### 2. Markdown Export
- Clean, structured text output
- Preserves document hierarchy
- Ready for further processing
### 3. JSON Data
- Complete layout predictions
- Bounding box coordinates
- Element types and confidence scores
- Machine-readable format
## Features
This tool offers:
- AI models for layout detection
- Support for multiple input formats (PDF, images)
- Accurate table structure extraction
- Handling of both digital and scanned documents
- Export to multiple formats (Markdown, JSON)
- Fast and Accurate processing modes
## Deployment on Hugging Face Spaces
This app is ready to deploy on Hugging Face Spaces!
### Setup HF_TOKEN Secret
The signature detector model is gated and requires authentication:
1. Go to your Space settings: `Settings` → `Repository secrets`
2. Add a new secret:
   - **Name**: `HF_TOKEN`
   - **Value**: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
3. Click `Add Secret`
The app will automatically use this token to download the signature model on startup.
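To catch a missing or mistyped secret before the model download fails, a preflight check along these lines could run before `python app.py`. This is a minimal sketch: `check_hf_token` is a hypothetical helper, not part of this repo, and it only checks the `hf_` prefix that Hugging Face user access tokens carry, not whether the token actually grants access to the gated model.

```bash
# Hypothetical preflight helper (not shipped with this repo).
# Succeeds when HF_TOKEN is set and has the "hf_" prefix of a user access token.
# It does not contact the Hub, so a revoked or under-scoped token still passes.
check_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; the gated signature model cannot be downloaded" >&2
    return 1
  fi
  case "$HF_TOKEN" in
    hf_*)
      echo "HF_TOKEN is set"
      ;;
    *)
      echo "HF_TOKEN does not start with hf_; check the secret value" >&2
      return 1
      ;;
  esac
}

# Usage: check_hf_token && python app.py
```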
### Requirements
- SDK: Gradio 5.x
- Python: 3.11+
- Hardware: CPU (2 cores, 16GB RAM on Spaces)
- Runtime: ~2-3 minutes on first load (model downloads), then ~1-3 s per inference
All dependencies are in `requirements.txt` and will be installed automatically.
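Because the app targets Python 3.11+, a local setup script might verify the interpreter version before running `pip install -r requirements.txt`. The sketch below compares dotted version strings in plain bash; `version_at_least` is a hypothetical helper, and it assumes purely numeric components (a suffix such as `rc1` is not handled).

```bash
# Hypothetical helper (not shipped with this repo): succeeds when dotted
# version $1 is >= dotted version $2, comparing numeric components in order.
version_at_least() {
  local IFS=.
  local -a have=($1) want=($2)
  local i h w
  for ((i = 0; i < ${#want[@]}; i++)); do
    h=${have[i]:-0}   # a missing component counts as 0, so 3.11 equals 3.11.0
    w=${want[i]}
    if ((10#$h > 10#$w)); then return 0; fi
    if ((10#$h < 10#$w)); then return 1; fi
  done
  return 0
}

# Usage: check the local interpreter before installing requirements.
# py_version=$(python -c 'import platform; print(platform.python_version())')
# version_at_least "$py_version" 3.11 || echo "Python 3.11+ is required (found $py_version)"
```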
## 🧪 Local Testing
Want to test locally?
```bash
# Install dependencies
pip install -r requirements.txt
# Set HF token (required: the signature model is gated)
export HF_TOKEN=hf_xxx
# Run the app locally
python app.py
```
### Test Scripts
```bash
# Test signature detection only
python test_signature.py
# Test full document analysis
python test_analyze.py
```
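If you want both test scripts in one command that stops at the first failure (for example in CI), a small wrapper like the following would do. It is a sketch: `run_test_scripts` and the `PYTHON` override are hypothetical, and it relies only on exit status, which assumes each test script exits non-zero when it fails.

```bash
# Hypothetical wrapper (not shipped with this repo): runs each script with
# $PYTHON (default: python) in order and stops at the first non-zero exit.
run_test_scripts() {
  local script
  for script in "$@"; do
    echo "==> ${script}"
    if ! "${PYTHON:-python}" "$script"; then
      echo "FAILED: ${script}" >&2
      return 1
    fi
  done
  echo "All $# scripts passed"
}

# Usage: run_test_scripts test_signature.py test_analyze.py
```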
### Signature Detector Notes
- The signature model weights are hosted on Hugging Face (`tech4humans/yolov8s-signature-detector`)
- CPU inference is supported; no GPU required
- The app processes at most 2 jobs concurrently (further requests wait in the queue), matching the 2 CPU cores on Spaces
- The first run downloads the ~12MB model checkpoint
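To keep the one-time model download out of the first user request, you could warm the Hugging Face cache ahead of time. A minimal sketch, assuming the `huggingface_hub` CLI (`huggingface-cli download`) is installed and that the app loads the weights through the standard Hugging Face cache; `prefetch_signature_model` is a hypothetical helper, not part of this repo. Recent `huggingface_hub` releases also expose the same command as `hf download`.

```bash
# Hypothetical warm-up helper (not shipped with this repo).
# Downloads the gated signature weights into the local Hugging Face cache;
# huggingface_hub reads the token from the HF_TOKEN environment variable.
SIGNATURE_REPO="tech4humans/yolov8s-signature-detector"

prefetch_signature_model() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN must be set to download ${SIGNATURE_REPO}" >&2
    return 1
  fi
  huggingface-cli download "$SIGNATURE_REPO"
}

# Usage: prefetch_signature_model && python app.py
```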
## 📸 Examples
Signature-only examples live under `sample_signature/`. Try them in the "Signature Detection (Only)" tab.
### OCR Engine
- This app uses RapidOCR with the ONNX Runtime backend by default when OCR is enabled, for fast and accurate CPU inference.
- If ONNX Runtime is not installed, Docling may fall back to another OCR engine; this repo therefore includes `onnxruntime` in `requirements.txt` and configures `RapidOcrOptions(backend="onnxruntime")` to enforce the preferred engine.
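Because a missing `onnxruntime` can silently change the OCR engine, it can help to confirm the package is importable in your environment before launching. A minimal sketch using only the Python standard library (`importlib.util.find_spec`); `python_module_available` and the `PYTHON` override are hypothetical, not part of this repo.

```bash
# Hypothetical helper (not shipped with this repo): succeeds when the given
# top-level module can be found by $PYTHON (default: python3) without
# actually importing it.
python_module_available() {
  "${PYTHON:-python3}" -c '
import importlib.util
import sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) is not None else 1)
' "$1" 2>/dev/null
}

# Usage:
# python_module_available onnxruntime ||
#   echo "onnxruntime is missing; Docling may fall back to another OCR engine"
```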
## 🤝 Contributing
Found a bug or have a suggestion? Feel free to open an issue or contribute!
## License
- App code: MIT License
- Signature weights: AGPL-3.0 (see the model card on Hugging Face). Serving the model over a network may require making the corresponding source code available under the terms of the AGPL.
---
**Made with ❤️ for better document understanding**