
Ayaan Sharif

Add signature detection with finetuned model and UI improvements

9434a85 about 1 month ago

metadata

title: Document Layout Detection
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit

📄 Document Layout, Table Structure & Signature Detection

An AI-powered tool that automatically detects document layout and table structure, with an optional detector specialized for handwritten signatures.

🎯 What Does This Do?

This Space automatically analyzes your documents (PDFs, images, scanned documents) to:

🏷️ Detect Layout Elements: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
📊 Extract Tables: Recognizes table structures and extracts data
🖼️ Visual Output: Shows bounding boxes around detected elements with color-coded labels
📝 Export Formats: Provides Markdown, JSON, and visual outputs
🔍 OCR Support: Automatically processes scanned documents and images
✍️ Signature Detection: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on layout view or run as a dedicated tool)

🚀 How to Use

1. Upload your document (PDF, JPG, PNG, etc.)
2. Choose a processing mode:
   - Fast: Quick processing for simple documents
   - Accurate: Better quality for complex tables (slower)
3. Configure options:
   - Enable/disable OCR
   - Enable/disable table detection
4. Process the document and view the results

📚 Use Cases

Perfect for analyzing:

🆔 ID Documents: Aadhaar cards, passports, driver's licenses
📄 Forms & Applications: Government forms, surveys, questionnaires
🧾 Invoices & Receipts: Business documents with tables
📖 Research Papers: Academic documents with complex layouts
📊 Reports: Annual reports, financial statements
📰 Articles & Documents: Any structured document

🛠️ Technology

This Space uses state-of-the-art AI models:

Layout Model: Advanced neural networks for document layout analysis
Table Structure Model: TableFormer architecture for table detection and extraction
OCR Engine: RapidOCR with the ONNX Runtime backend for text recognition in scanned documents
Framework: Docling document processing pipeline
Signature Model (Optional): Fine-tuned signature detector (tech4humans/yolov8s-signature-detector) from Hugging Face

🎨 Output Formats

1. Visual Output

Bounding boxes drawn on the document
Color-coded by element type
Labels showing detected elements

2. Markdown Export

Clean, structured text output
Preserves document hierarchy
Ready for further processing

3. JSON Data

Complete layout predictions
Bounding box coordinates
Element types and confidence scores
Machine-readable format
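
As an illustration of how the JSON output might be consumed downstream, here is a minimal sketch. The field names (`elements`, `label`, `confidence`, `bbox`) are assumptions made for this example and may not match the schema the app actually emits; check a real export before relying on them.

```python
import json
from collections import Counter

# Hypothetical sample payload. The real export may use different keys.
SAMPLE_JSON = """
{
  "elements": [
    {"label": "title", "confidence": 0.97, "bbox": [50, 40, 560, 80]},
    {"label": "table", "confidence": 0.91, "bbox": [50, 120, 560, 400]},
    {"label": "text",  "confidence": 0.42, "bbox": [50, 420, 560, 460]},
    {"label": "text",  "confidence": 0.88, "bbox": [50, 480, 560, 700]}
  ]
}
"""


def summarize(payload: str, min_confidence: float = 0.5) -> Counter:
    """Count detected elements per label, dropping low-confidence ones."""
    data = json.loads(payload)
    kept = [e for e in data["elements"] if e["confidence"] >= min_confidence]
    return Counter(e["label"] for e in kept)


def bbox_area(bbox: list) -> float:
    """Area of an [x0, y0, x1, y1] box (the coordinate order is assumed)."""
    x0, y0, x1, y1 = bbox
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)
```

Calling `summarize(SAMPLE_JSON)` counts only the elements at or above the threshold, which is a simple way to filter noisy detections before further processing.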

🌟 Features

This tool offers:

Advanced AI models for layout detection
Support for multiple input formats (PDF, images)
Accurate table structure extraction
Handling of both digital and scanned documents
Export to multiple formats (Markdown, JSON)
Fast and Accurate processing modes

🚀 Deployment on Hugging Face Spaces

This app is ready to deploy on Hugging Face Spaces!

Setup HF_TOKEN Secret

The signature detector model is gated and requires authentication:

1. Go to your Space settings: Settings → Repository secrets
2. Add a new secret:
   - Name: HF_TOKEN
   - Value: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
3. Click Add Secret

The app will automatically use this token to download the signature model on startup.
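
The token lookup described above could be implemented roughly as follows. This is a sketch, not the app's actual code: the helper names are hypothetical, and the weights filename `yolov8s.pt` is an assumption that should be checked against the model repository. `hf_hub_download` is the standard download function from the `huggingface_hub` package.

```python
import os

REPO_ID = "tech4humans/yolov8s-signature-detector"
WEIGHTS_FILENAME = "yolov8s.pt"  # assumption: verify the filename on the model page


def resolve_hf_token(env=None):
    """Return the HF_TOKEN value, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return token or None


def download_signature_weights() -> str:
    """Download the signature weights and return the local file path."""
    # Imported lazily so the token helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    token = resolve_hf_token()
    if token is None:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a Space secret or export it locally."
        )
    return hf_hub_download(repo_id=REPO_ID, filename=WEIGHTS_FILENAME, token=token)
```

Failing early with a clear message when the secret is missing makes a misconfigured Space easier to diagnose than a generic authentication error from the download call.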

Requirements

SDK: Gradio 5.x
Python: 3.11+
Hardware: CPU (2 vCPUs, 16 GB RAM on the Spaces CPU basic tier)
Runtime: ~2-3 minutes first load (model downloads), then ~1-3s per inference

All dependencies are in requirements.txt and will be installed automatically.

🧪 Local Testing

Want to test locally?

# Install dependencies
pip install -r requirements.txt

# Set HF token (if signature model is gated)
export HF_TOKEN=hf_xxx

# Run the app locally
python app.py

Test Scripts

# Test signature detection only
python test_signature.py

# Test full document analysis
python test_analyze.py

Signature Detector Notes

The signature model weights are hosted on Hugging Face (tech4humans/yolov8s-signature-detector)
CPU inference is supported; no GPU required
The app's queue runs at most 2 jobs concurrently, matching the 2 CPU cores available on Spaces
The first run downloads a ~12 MB model checkpoint
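
A concurrency cap like the one described in these notes could be set through Gradio's queue. The sketch below is an assumption about how this might be wired, not a copy of `app.py`: `concurrency_limit`, `build_demo`, and `process_fn` are hypothetical names, while `queue(default_concurrency_limit=...)` is part of the Gradio API.

```python
import os


def concurrency_limit(max_jobs: int = 2) -> int:
    """Cap concurrent jobs at max_jobs, never exceeding the visible CPU count."""
    cpus = os.cpu_count() or 1
    return max(1, min(max_jobs, cpus))


def build_demo(process_fn):
    """Wrap a processing function in a Gradio interface with a capped queue."""
    # Imported lazily so the helper above can be used without Gradio installed.
    import gradio as gr

    demo = gr.Interface(
        fn=process_fn,
        inputs=gr.Image(type="pil"),
        outputs=gr.JSON(),
    )
    # Events beyond the limit wait in the queue instead of running in parallel.
    demo.queue(default_concurrency_limit=concurrency_limit())
    return demo
```

Launching would then be `build_demo(my_fn).launch()`, where `my_fn` stands in for the actual detection function.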

📸 Examples

Signature-only examples live under sample_signature/. Try them in the "Signature Detection (Only)" tab.

OCR Engine

This app uses RapidOCR with the ONNX Runtime backend by default when OCR is enabled, for fast and accurate CPU inference.
If ONNX Runtime is missing, Docling may fall back to other engines. To enforce the preferred engine, this repo includes onnxruntime in requirements.txt and configures RapidOcrOptions(backend="onnxruntime").
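
For context, the OCR option above might fit into a Docling pipeline along these lines. This is a sketch under assumptions: mapping the UI's Fast/Accurate modes onto `TableFormerMode` is a guess at how the app works, and `normalize_mode` and `build_converter` are hypothetical helpers. It also assumes a Docling version whose `RapidOcrOptions` accepts the `backend` argument mentioned above.

```python
VALID_MODES = ("fast", "accurate")


def normalize_mode(mode: str) -> str:
    """Normalize a UI mode label such as ' Accurate ' to 'accurate'."""
    cleaned = mode.strip().lower()
    if cleaned not in VALID_MODES:
        raise ValueError(f"Unknown processing mode: {mode!r}")
    return cleaned


def build_converter(mode: str = "fast", do_ocr: bool = True, do_tables: bool = True):
    """Build a Docling converter for PDFs with the given options."""
    # Imported lazily so normalize_mode is usable without Docling installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        PdfPipelineOptions,
        RapidOcrOptions,
        TableFormerMode,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = do_ocr
    options.do_table_structure = do_tables
    # Assumed mapping from the UI modes to TableFormer modes.
    if normalize_mode(mode) == "accurate":
        options.table_structure_options.mode = TableFormerMode.ACCURATE
    else:
        options.table_structure_options.mode = TableFormerMode.FAST
    if do_ocr:
        # Pin RapidOCR to ONNX Runtime, as described above.
        options.ocr_options = RapidOcrOptions(backend="onnxruntime")

    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```

A typical call would be `build_converter("accurate").convert("document.pdf").document.export_to_markdown()`, where `document.pdf` is a placeholder path.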

🤝 Contributing

Found a bug or have a suggestion? Feel free to open an issue or contribute!

📝 License

App code: MIT License
Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making corresponding source available per AGPL.

Made with ❤️ for better document understanding