---
title: Document Layout Detection
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---
# 📄 Document Layout, Table Structure & Signature Detection
An AI-powered tool that automatically detects document layout and table structure, with an optional specialized detector for handwritten signatures.
## 🎯 What Does This Do?
This Space automatically analyzes your documents (PDFs, images, scanned documents) to:
- 🏷️ **Detect Layout Elements**: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
- πŸ“Š **Extract Tables**: Recognizes table structures and extracts data
- πŸ–ΌοΈ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
- πŸ“ **Export Formats**: Provides Markdown, JSON, and visual outputs
- πŸ” **OCR Support**: Automatically processes scanned documents and images
- ✍️ **Signature Detection**: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on layout view or run as a dedicated tool)
## πŸš€ How to Use
1. **Upload** your document (PDF, JPG, PNG, etc.)
2. **Choose** processing mode:
   - **Fast**: Quick processing for simple documents
   - **Accurate**: Better quality for complex tables (slower)
3. **Configure** options:
   - Enable/disable OCR
   - Enable/disable table detection
4. **Process** and view results!
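The mode and option toggles above map naturally onto a Docling pipeline configuration. The following is a minimal sketch of such a mapping, not the code in `app.py`: `Settings`, `build_settings`, and `convert_document` are hypothetical names, and the sketch only wires up PDF input. The Docling imports are deferred so the settings helper works without Docling installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """UI choices from the steps above (hypothetical container)."""
    table_mode: str   # "fast" or "accurate"
    do_ocr: bool
    do_tables: bool


def build_settings(mode: str, enable_ocr: bool, enable_tables: bool) -> Settings:
    """Normalize the UI values and reject unknown modes early."""
    normalized = mode.strip().lower()
    if normalized not in {"fast", "accurate"}:
        raise ValueError(f"unknown processing mode: {mode!r}")
    return Settings(normalized, enable_ocr, enable_tables)


def convert_document(path: str, settings: Settings):
    """Run Docling on a PDF with the chosen settings (requires `docling`)."""
    # Imported lazily so build_settings() is usable without Docling.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = settings.do_ocr
    options.do_table_structure = settings.do_tables
    # TableFormerMode is a string enum with the values "fast" and "accurate".
    options.table_structure_options.mode = TableFormerMode(settings.table_mode)

    converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
    return converter.convert(path).document
```

The returned document can then be exported, for example with `export_to_markdown()`.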
## 📚 Use Cases
Perfect for analyzing:
- 🆔 **ID Documents**: Aadhaar cards, passports, driver's licenses
- 📄 **Forms & Applications**: Government forms, surveys, questionnaires
- 🧾 **Invoices & Receipts**: Business documents with tables
- 📖 **Research Papers**: Academic documents with complex layouts
- 📊 **Reports**: Annual reports, financial statements
- 📰 **Articles & Documents**: Any structured document
## 🛠️ Technology
This Space combines the following components:
- **Layout Model**: Neural network for document layout analysis
- **Table Structure Model**: TableFormer architecture for table detection and extraction
- **OCR Engine**: RapidOCR with the ONNX Runtime backend for text recognition in scanned documents
- **Framework**: Docling document processing pipeline
- **Signature Model (Optional)**: Fine-tuned YOLOv8s signature detector (`tech4humans/yolov8s-signature-detector`) from Hugging Face
## 🎨 Output Formats
### 1. Visual Visualization
- Bounding boxes drawn on the document
- Color-coded by element type
- Labels showing detected elements
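As an illustration of the color-coded overlay described above, here is a minimal sketch using Pillow. It is an assumption about how such a rendering could be done, not the app's actual drawing code: `PALETTE`, `color_for`, and `draw_boxes` are hypothetical names, and the colors are arbitrary.

```python
PALETTE = ["#1f77b4", "#2ca02c", "#d62728", "#9467bd", "#ff7f0e"]


def color_for(label: str, known_labels: list[str]) -> str:
    """Pick a stable color per element type from its position in known_labels."""
    return PALETTE[known_labels.index(label) % len(PALETTE)]


def draw_boxes(image_path: str, elements, out_path: str) -> None:
    """Draw labeled boxes; elements are (label, (x1, y1, x2, y2)) pairs."""
    from PIL import Image, ImageDraw  # lazy import; requires Pillow

    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    labels = sorted({label for label, _ in elements})
    for label, box in elements:
        color = color_for(label, labels)
        draw.rectangle(box, outline=color, width=3)
        draw.text((box[0] + 4, box[1] + 4), label, fill=color)
    image.save(out_path)
```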
### 2. Markdown Export
- Clean, structured text output
- Preserves document hierarchy
- Ready for further processing
### 3. JSON Data
- Complete layout predictions
- Bounding box coordinates
- Element types and confidence scores
- Machine-readable format
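To make the JSON output concrete, the sketch below serializes a list of predictions with the fields listed above. The exact schema produced by the app is not documented here, so the field names (`label`, `confidence`, `bbox`, `page`) and the `LayoutElement` class are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LayoutElement:
    label: str                               # element type, e.g. "table" or "title"
    confidence: float                        # model score in [0, 1]
    bbox: tuple[float, float, float, float]  # (left, top, right, bottom) in pixels
    page: int = 1


def to_json(elements: list[LayoutElement]) -> str:
    """Serialize predictions to a machine-readable JSON string."""
    return json.dumps([asdict(e) for e in elements], indent=2)


example = [LayoutElement("title", 0.98, (72.0, 40.0, 540.0, 88.0))]
print(to_json(example))
```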
## 🌟 Features
This tool offers:
- Advanced AI models for layout detection
- Supports multiple input formats (PDF, images)
- Accurate table structure extraction
- Handles both digital and scanned documents
- Exports to various formats (Markdown, JSON)
- Fast and accurate processing modes
## 🚀 Deployment on Hugging Face Spaces
This app is ready to deploy on Hugging Face Spaces!
### Setup HF_TOKEN Secret
The signature detector model is gated and requires authentication:
1. Go to your Space settings: `Settings` → `Repository secrets`
2. Add a new secret:
   - **Name**: `HF_TOKEN`
   - **Value**: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
3. Click `Add Secret`
The app will automatically use this token to download the signature model on startup.
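A minimal sketch of how the token could be read and passed to `huggingface_hub` at startup. The helper names are hypothetical, and the weight file name `yolov8s.pt` is an assumption; check the model repository for the actual file.

```python
import os


def get_hf_token(env=os.environ):
    """Return the token from HF_TOKEN, or None if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def download_signature_weights(filename: str = "yolov8s.pt") -> str:
    """Download the weights and return the local path (requires `huggingface_hub`)."""
    from huggingface_hub import hf_hub_download  # lazy import

    return hf_hub_download(
        repo_id="tech4humans/yolov8s-signature-detector",
        filename=filename,       # assumed file name; verify on the model page
        token=get_hf_token(),    # None falls back to any cached login
    )
```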
### Requirements
- SDK: Gradio 5.x
- Python: 3.11+
- Hardware: CPU (2 cores, 16 GB RAM on Spaces)
- Runtime: ~2-3 minutes on first load (model downloads), then ~1-3 s per inference
All dependencies are in `requirements.txt` and will be installed automatically.
## 🧪 Local Testing
Want to test locally?
```bash
# Install dependencies
pip install -r requirements.txt
# Set HF token (if signature model is gated)
export HF_TOKEN=hf_xxx
# Run the app locally
python app.py
```
### Test Scripts
```bash
# Test signature detection only
python test_signature.py
# Test full document analysis
python test_analyze.py
```
### Signature Detector Notes
- The signature model weights are hosted on Hugging Face (`tech4humans/yolov8s-signature-detector`)
- CPU inference is supported; no GPU required
- The app queues up to 2 concurrent jobs to align with Spaces CPU (2 cores)
- First run downloads ~12MB model checkpoint
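For reference, CPU inference with the downloaded weights can be sketched with the `ultralytics` package as follows. This is an illustrative assumption rather than the app's own code: `filter_detections` and `detect_signatures` are hypothetical helpers, and the 0.5 score threshold is arbitrary.

```python
def filter_detections(boxes, scores, min_score=0.5):
    """Keep (box, score) pairs at or above min_score, highest score first."""
    kept = [(tuple(b), float(s)) for b, s in zip(boxes, scores) if s >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def detect_signatures(weights_path: str, image_path: str, min_score: float = 0.5):
    """Run the detector on CPU (requires the `ultralytics` package)."""
    from ultralytics import YOLO  # lazy import; heavy dependency

    model = YOLO(weights_path)
    result = model.predict(image_path, device="cpu", verbose=False)[0]
    boxes = result.boxes.xyxy.tolist()    # [[x1, y1, x2, y2], ...]
    scores = result.boxes.conf.tolist()   # one confidence per box
    return filter_detections(boxes, scores, min_score)
```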
## 📸 Examples
Signature-only examples live under `sample_signature/`. Try them in the "Signature Detection (Only)" tab.
### OCR Engine
- This app uses RapidOCR with the ONNX Runtime backend by default when OCR is enabled, for fast and accurate CPU inference.
- If ONNX Runtime is missing, Docling may fall back to other engines; this repo includes `onnxruntime` in `requirements.txt` and configures `RapidOcrOptions(backend="onnxruntime")` to enforce the preferred engine.
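One way to fail fast instead of silently falling back is to check for `onnxruntime` before building the OCR options. This is a sketch under that assumption; `onnxruntime_available` and `make_ocr_options` are hypothetical helper names, while the `RapidOcrOptions(backend="onnxruntime")` call is the one mentioned above.

```python
import importlib.util


def onnxruntime_available() -> bool:
    """True if the `onnxruntime` package can be found in this environment."""
    return importlib.util.find_spec("onnxruntime") is not None


def make_ocr_options():
    """Build RapidOCR options pinned to ONNX Runtime (requires `docling`)."""
    if not onnxruntime_available():
        raise RuntimeError("onnxruntime is not installed; run `pip install onnxruntime`")
    from docling.datamodel.pipeline_options import RapidOcrOptions  # lazy import

    return RapidOcrOptions(backend="onnxruntime")
```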
## 🀝 Contributing
Found a bug or have a suggestion? Feel free to open an issue or contribute!
## 📝 License
- App code: MIT License
- Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making corresponding source available per AGPL.
---
**Made with ❤️ for better document understanding**