Ayaan Sharif
Add signature detection with finetuned model and UI improvements
9434a85

metadata
title: Document Layout Detection
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit

📄 Document Layout, Table Structure & Signature Detection

An AI-powered tool that detects document layout and table structure, with an optional detector specialized for handwritten signatures.

🎯 What Does This Do?

This Space automatically analyzes your documents (PDFs, images, scanned documents) to:

  • 🏷️ Detect Layout Elements: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
  • 📊 Extract Tables: Recognizes table structures and extracts data
  • 🖼️ Visual Output: Shows bounding boxes around detected elements with color-coded labels
  • 📝 Export Formats: Provides Markdown, JSON, and visual outputs
  • 🔍 OCR Support: Automatically processes scanned documents and images
  • ✍️ Signature Detection: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on layout view or run as a dedicated tool)

🚀 How to Use

  1. Upload your document (PDF, JPG, PNG, etc.)
  2. Choose processing mode:
    • Fast: Quick processing for simple documents
    • Accurate: Better quality for complex tables (slower)
  3. Configure options:
    • Enable/disable OCR
    • Enable/disable table detection
  4. Process and view results!
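
The choices in steps 2 and 3 can be thought of as a small settings object that is handed to the processing pipeline. The sketch below is only an illustration: `ProcessingSettings` and `settings_from_ui` are hypothetical names, not taken from app.py. In a Docling-based pipeline these fields would plausibly map to options such as `do_ocr`, `do_table_structure`, and the TableFormer mode, but that mapping is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingSettings:
    """Hypothetical container for the UI choices (not from app.py)."""
    table_mode: str           # "fast" or "accurate"
    do_ocr: bool              # run OCR on scanned pages and images
    do_table_structure: bool  # run table detection and extraction


def settings_from_ui(mode: str, enable_ocr: bool, enable_tables: bool) -> ProcessingSettings:
    """Validate the selected mode and bundle the toggles into one object."""
    normalized = mode.strip().lower()
    if normalized not in {"fast", "accurate"}:
        raise ValueError(f"Unknown processing mode: {mode!r}")
    return ProcessingSettings(
        table_mode=normalized,
        do_ocr=enable_ocr,
        do_table_structure=enable_tables,
    )


# Example: "Accurate" mode with OCR on and table detection off.
settings = settings_from_ui("Accurate", enable_ocr=True, enable_tables=False)
```

Validating the mode up front keeps a mistyped or unexpected value from silently falling back to a default.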

📚 Use Cases

Perfect for analyzing:

  • 🆔 ID Documents: Aadhaar cards, passports, driver's licenses
  • 📄 Forms & Applications: Government forms, surveys, questionnaires
  • 🧾 Invoices & Receipts: Business documents with tables
  • 📖 Research Papers: Academic documents with complex layouts
  • 📊 Reports: Annual reports, financial statements
  • 📰 Articles & Documents: Any structured document

🛠️ Technology

This Space combines the following components:

  • Layout Model: Neural network for document layout analysis
  • Table Structure Model: TableFormer architecture for table structure recognition and extraction
  • OCR Engine: RapidOCR with the ONNX Runtime backend for text recognition in scanned documents
  • Framework: Docling document processing pipeline
  • Signature Model (Optional): Fine-tuned YOLOv8s signature detector (tech4humans/yolov8s-signature-detector) from Hugging Face

🎨 Output Formats

1. Visualization

  • Bounding boxes drawn on the document
  • Color-coded by element type
  • Labels showing detected elements

2. Markdown Export

  • Clean, structured text output
  • Preserves document hierarchy
  • Ready for further processing

3. JSON Data

  • Complete layout predictions
  • Bounding box coordinates
  • Element types and confidence scores
  • Machine-readable format
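
As a rough illustration of how the JSON output could be consumed downstream, the sketch below filters elements by confidence and by type. The field names (`label`, `bbox`, `confidence`) and the sample values are made up for this example; the actual schema produced by the app may differ.

```python
import json

# Made-up sample data; the real export may use different keys and nesting.
sample_json = """
[
  {"label": "title",  "bbox": [50, 40, 560, 80],   "confidence": 0.97},
  {"label": "table",  "bbox": [48, 300, 562, 520], "confidence": 0.91},
  {"label": "figure", "bbox": [60, 540, 300, 700], "confidence": 0.42}
]
"""


def filter_elements(elements, min_confidence=0.5, labels=None):
    """Keep elements at or above min_confidence, optionally restricted to labels."""
    kept = []
    for element in elements:
        if element["confidence"] < min_confidence:
            continue
        if labels is not None and element["label"] not in labels:
            continue
        kept.append(element)
    return kept


elements = json.loads(sample_json)
confident = filter_elements(elements)                 # drops the low-confidence figure
tables = filter_elements(elements, labels={"table"})  # only table regions
```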

🌟 Features

This tool offers:

  • AI models for layout detection
  • Multiple input formats (PDF, images)
  • Table structure extraction
  • Support for both digital and scanned documents
  • Export to Markdown and JSON
  • Fast and Accurate processing modes

🚀 Deployment on Hugging Face Spaces

This app is ready to deploy on Hugging Face Spaces!

Set Up the HF_TOKEN Secret

The signature detector model is gated and requires authentication:

  1. Go to your Space settings: Settings → Repository secrets
  2. Add a new secret named HF_TOKEN and set its value to your Hugging Face access token
  3. Click Add Secret

The app will automatically use this token to download the signature model on startup.
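
A minimal sketch of how a token from the environment could be passed to the download step, assuming the weights are fetched with `hf_hub_download` from the `huggingface_hub` package. The helper names and the weights filename (`yolov8s.pt`) are assumptions for illustration; check the model card for the actual file and app.py for the actual loading code.

```python
import os

REPO_ID = "tech4humans/yolov8s-signature-detector"
WEIGHTS_FILE = "yolov8s.pt"  # assumed filename; verify on the model card


def resolve_hf_token(env=None):
    """Return the HF_TOKEN value, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return token or None


def download_signature_weights(filename=WEIGHTS_FILE):
    """Download the checkpoint and return its local path (requires network access)."""
    # Imported lazily so the token helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=filename,
        token=resolve_hf_token(),
    )
```

Returning None for a blank token lets the download fall back to anonymous access, which fails with a clear authorization error if the repository is gated.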

Requirements

  • SDK: Gradio 5.x
  • Python: 3.11+
  • Hardware: CPU (2 cores, 16 GB RAM on Spaces)
  • Runtime: ~2-3 minutes first load (model downloads), then ~1-3s per inference

All dependencies are in requirements.txt and will be installed automatically.

🧪 Local Testing

Want to test locally?

# Install dependencies
pip install -r requirements.txt

# Set HF token (if signature model is gated)
export HF_TOKEN=hf_xxx

# Run the app locally
python app.py

Test Scripts

# Test signature detection only
python test_signature.py

# Test full document analysis
python test_analyze.py

Signature Detector Notes

  • The signature model weights are hosted on Hugging Face (tech4humans/yolov8s-signature-detector)
  • CPU inference is supported; no GPU required
  • The app runs at most 2 jobs concurrently, matching the 2 CPU cores available on Spaces; further requests wait in the queue
  • The first run downloads a ~12MB model checkpoint

📸 Examples

Signature-only examples live under sample_signature/. Try them in the "Signature Detection (Only)" tab.

OCR Engine

  • When OCR is enabled, this app uses RapidOCR with the ONNX Runtime backend by default, for fast and accurate CPU inference.
  • If ONNX Runtime is missing, Docling may fall back to other engines. To enforce the preferred engine, this repo includes onnxruntime in requirements.txt and configures RapidOcrOptions(backend="onnxruntime").
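
To make a silent fallback visible, a startup check can confirm that the preferred backend is importable before OCR is configured. This is a sketch using only the standard library; `module_available` and `pick_ocr_backend` are hypothetical helpers, not part of Docling or this repo.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_ocr_backend(preferred: str = "onnxruntime"):
    """Return the preferred backend name if its module is installed, else None."""
    if module_available(preferred):
        return preferred
    return None


backend = pick_ocr_backend()
if backend is None:
    print("onnxruntime not found; OCR may fall back to another engine")
```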

🤝 Contributing

Found a bug or have a suggestion? Feel free to open an issue or contribute!

📝 License

  • App code: MIT License
  • Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making corresponding source available per AGPL.

Made with ❤️ for better document understanding