---
title: Document Layout Detection
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---

# 📄 Document Layout, Table Structure & Signature Detection

An AI-powered tool that automatically detects document layout and structure, with an optional specialized detector for handwritten signatures.

## 🎯 What Does This Do?

This Space automatically analyzes your documents (PDFs, images, scanned documents) to:

- 🏷️ **Detect Layout Elements**: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
- 📊 **Extract Tables**: Recognizes table structures and extracts data
- 🖼️ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
- 📤 **Export Formats**: Provides Markdown, JSON, and visual outputs
- 🔍 **OCR Support**: Automatically processes scanned documents and images
- ✍️ **Signature Detection**: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on the layout view or run as a dedicated tool)

## 🚀 How to Use

1. **Upload** your document (PDF, JPG, PNG, etc.)
2. **Choose** a processing mode:
   - **Fast**: Quick processing for simple documents
   - **Accurate**: Better quality for complex tables (slower)
3. **Configure** options:
   - Enable/disable OCR
   - Enable/disable table detection
4. **Process** and view the results!

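The UI choices above can be mirrored by a small options object when driving the pipeline from code. This is a minimal sketch with hypothetical names (`ProcessingOptions`, `mode`, `do_ocr`, `do_tables` are illustrative, not the app's actual API):

```python
from dataclasses import dataclass

# Hypothetical options object mirroring the UI choices above;
# the app's real parameter names may differ.
@dataclass
class ProcessingOptions:
    mode: str = "fast"        # "fast" or "accurate"
    do_ocr: bool = True       # enable OCR for scanned documents
    do_tables: bool = True    # enable table structure detection

    def __post_init__(self) -> None:
        if self.mode not in ("fast", "accurate"):
            raise ValueError(f"unknown mode: {self.mode!r}")

# Example: accurate mode with OCR disabled
opts = ProcessingOptions(mode="accurate", do_ocr=False)
```

Validating the mode up front keeps a typo from silently falling back to a default.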
## 📋 Use Cases

Perfect for analyzing:

- 🆔 **ID Documents**: Aadhaar cards, passports, driver's licenses
- 📝 **Forms & Applications**: Government forms, surveys, questionnaires
- 🧾 **Invoices & Receipts**: Business documents with tables
- 📚 **Research Papers**: Academic documents with complex layouts
- 📊 **Reports**: Annual reports, financial statements
- 📰 **Articles & Documents**: Any structured document

## 🛠️ Technology

This Space uses state-of-the-art AI models:

- **Layout Model**: Neural networks for document layout analysis
- **Table Structure Model**: TableFormer architecture for table detection and extraction
- **OCR Engine**: Integrated OCR for text recognition in scanned documents
- **Framework**: Docling document processing pipeline
- **Signature Model (Optional)**: Fine-tuned signature detector (tech4humans/yolov8s-signature-detector) from Hugging Face

## 🎨 Output Formats

### 1. Annotated Visualization

- Bounding boxes drawn on the document
- Color-coded by element type
- Labels showing detected elements

### 2. Markdown Export

- Clean, structured text output
- Preserves document hierarchy
- Ready for further processing

### 3. JSON Data

- Complete layout predictions
- Bounding box coordinates
- Element types and confidence scores
- Machine-readable format

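To illustrate what consuming the JSON export might look like, here is a sketch that filters predictions by confidence score. The field names (`label`, `bbox`, `score`) and the sample data are assumptions for illustration, not the app's guaranteed schema:

```python
import json

# Hypothetical sample of the JSON export; real field names may differ.
raw = """
[
  {"label": "title",   "bbox": [72, 40, 520, 80],   "score": 0.97},
  {"label": "table",   "bbox": [60, 300, 540, 620], "score": 0.88},
  {"label": "caption", "bbox": [60, 630, 540, 655], "score": 0.42}
]
"""

def confident(predictions, min_score=0.5):
    """Keep only predictions at or above the confidence threshold."""
    return [p for p in predictions if p["score"] >= min_score]

kept = confident(json.loads(raw))
print([p["label"] for p in kept])  # → ['title', 'table']
```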
## 🌟 Features

This tool offers:

- Advanced AI models for layout detection
- Support for multiple input formats (PDF, images)
- Accurate table structure extraction
- Handling of both digital and scanned documents
- Export to multiple formats (Markdown, JSON)
- Fast and accurate processing modes

## 🚀 Deployment on Hugging Face Spaces

This app is ready to deploy on Hugging Face Spaces!

### Set Up the HF_TOKEN Secret

The signature detector model is gated and requires authentication:

1. Go to your Space settings: `Settings` → `Repository secrets`
2. Add a new secret:
   - **Name**: `HF_TOKEN`
   - **Value**: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
3. Click `Add Secret`

The app automatically uses this token to download the signature model on startup.

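Inside the app, token handling can be as simple as reading the environment. A minimal sketch follows; the download call is left as a comment because the exact checkpoint filename in the model repo is an assumption:

```python
import os

def resolve_hf_token(explicit=None):
    """Prefer an explicitly passed token, else fall back to the HF_TOKEN env var."""
    return explicit or os.environ.get("HF_TOKEN")

# The startup download would then look roughly like this
# (filename is hypothetical -- check the model repo for the real one):
#
# from huggingface_hub import hf_hub_download
# weights = hf_hub_download(
#     repo_id="tech4humans/yolov8s-signature-detector",
#     filename="yolov8s.onnx",
#     token=resolve_hf_token(),
# )
```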
### Requirements

- SDK: Gradio 5.x
- Python: 3.11+
- Hardware: CPU (2 cores, 18 GB RAM on Spaces)
- Runtime: ~2-3 minutes on first load (model downloads), then ~1-3s per inference

All dependencies are listed in `requirements.txt` and will be installed automatically.

## 🧪 Local Testing

Want to test locally?

```bash
# Install dependencies
pip install -r requirements.txt

# Set your HF token (needed if the signature model is gated)
export HF_TOKEN=hf_xxx

# Run the app locally
python app.py
```

### Test Scripts

```bash
# Test signature detection only
python test_signature.py

# Test full document analysis
python test_analyze.py
```

### Signature Detector Notes

- The signature model weights are hosted on Hugging Face (`tech4humans/yolov8s-signature-detector`)
- CPU inference is supported; no GPU is required
- The app queues up to 2 concurrent jobs to match the Spaces CPU tier (2 cores)
- The first run downloads a ~12 MB model checkpoint

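The "overlay on the layout view" behavior amounts to appending signature boxes to the layout predictions. A minimal sketch, assuming a `{label, bbox, score}` record shape that may differ from the app's real schema:

```python
def overlay_signatures(layout_preds, signature_boxes, label="signature"):
    """Append signature detections to layout predictions without mutating either input.

    Assumes an illustrative {label, bbox, score} record shape;
    the app's real schema may differ.
    """
    merged = list(layout_preds)
    for bbox, score in signature_boxes:
        merged.append({"label": label, "bbox": list(bbox), "score": score})
    return merged

layout = [{"label": "paragraph", "bbox": [50, 100, 500, 200], "score": 0.95}]
sigs = [((400, 700, 550, 760), 0.81)]
merged = overlay_signatures(layout, sigs)
```

Copying the input list first keeps the layout-only view reusable after the overlay is built.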
## 📸 Examples

Signature-only examples live under `sample_signature/`. Try them in the "Signature Detection (Only)" tab.

### OCR Engine

- When OCR is enabled, this app uses RapidOCR with the ONNX Runtime backend by default, for fast and accurate CPU inference.
- If ONNX Runtime is missing, Docling may fall back to other engines; this repo therefore includes `onnxruntime` in `requirements.txt` and configures `RapidOcrOptions(backend="onnxruntime")` to enforce the preferred engine.

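The fallback behavior described above boils down to an availability check. The function below is illustrative only (Docling's real engine selection is internal to the library), but it shows the idea:

```python
import importlib.util

def pick_ocr_backend(preferred="onnxruntime"):
    """Return the preferred backend if its package is importable, else signal a fallback.

    Illustrative sketch; Docling's actual engine selection is internal.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return "default"  # let the pipeline fall back to its default OCR engine

backend = pick_ocr_backend()  # "onnxruntime" if installed, else "default"
```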
## 🤝 Contributing

Found a bug or have a suggestion? Feel free to open an issue or contribute!

## 📄 License

- App code: MIT License
- Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making the corresponding source available under the AGPL.

---

**Made with ❤️ for better document understanding**