title: Document Layout Detection
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
Document Layout, Table Structure & Signature Detection
An AI-powered tool that automatically detects document layout and structure, with an optional specialized detector for handwritten signatures.
What Does This Do?
This Space automatically analyzes your documents (PDFs, images, scanned documents) to:
- Detect Layout Elements: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
- Extract Tables: Recognizes table structures and extracts data
- Visual Output: Shows bounding boxes around detected elements with color-coded labels
- Export Formats: Provides Markdown, JSON, and visual outputs
- OCR Support: Automatically processes scanned documents and images
- Signature Detection: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on layout view or run as a dedicated tool)
How to Use
- Upload your document (PDF, JPG, PNG, etc.)
- Choose processing mode:
  - Fast: Quick processing for simple documents
  - Accurate: Better quality for complex tables (slower)
- Configure options:
  - Enable/disable OCR
  - Enable/disable table detection
- Process and view results!
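The mode and toggles above could map onto Docling pipeline settings roughly as follows. This is only a sketch, not the app's actual code: ProcessingChoices, normalize_choices, and to_docling_options are hypothetical names, and treating Fast/Accurate as the TableFormer mode is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingChoices:
    """Hypothetical container for the options chosen in the UI."""
    mode: str = "fast"           # "fast" or "accurate"
    enable_ocr: bool = True
    enable_tables: bool = True


def normalize_choices(mode, enable_ocr, enable_tables):
    """Validate raw UI values and return a ProcessingChoices instance."""
    cleaned = mode.strip().lower()
    if cleaned not in ("fast", "accurate"):
        raise ValueError(f"Unknown processing mode: {mode!r}")
    return ProcessingChoices(cleaned, bool(enable_ocr), bool(enable_tables))


def to_docling_options(choices):
    """Translate the choices into Docling PDF pipeline options.

    Docling is imported lazily so normalize_choices() works without it.
    Assumption: Fast/Accurate corresponds to the TableFormer mode.
    """
    from docling.datamodel.pipeline_options import (
        PdfPipelineOptions,
        RapidOcrOptions,
        TableFormerMode,
    )

    options = PdfPipelineOptions()
    options.do_ocr = choices.enable_ocr
    options.do_table_structure = choices.enable_tables
    options.table_structure_options.mode = (
        TableFormerMode.ACCURATE
        if choices.mode == "accurate"
        else TableFormerMode.FAST
    )
    if choices.enable_ocr:
        # Same OCR backend named in the OCR Engine section of this README.
        options.ocr_options = RapidOcrOptions(backend="onnxruntime")
    return options


choices = normalize_choices("Accurate", enable_ocr=True, enable_tables=True)
print(choices)
```

Keeping validation separate from the Docling call means bad input is rejected before any model is loaded.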
Use Cases
Perfect for analyzing:
- ID Documents: Aadhaar cards, passports, driver's licenses
- Forms & Applications: Government forms, surveys, questionnaires
- Invoices & Receipts: Business documents with tables
- Research Papers: Academic documents with complex layouts
- Reports: Annual reports, financial statements
- Articles & Documents: Any structured document
Technology
This Space combines the following components:
- Layout Model: Neural network for document layout analysis
- Table Structure Model: TableFormer architecture for table structure recognition and extraction
- OCR Engine: RapidOCR (ONNX Runtime backend) for text recognition in scanned documents
- Framework: Docling document processing pipeline
- Signature Model (Optional): Fine-tuned signature detector (tech4humans/yolov8s-signature-detector) from Hugging Face
Output Formats
1. Visualization
- Bounding boxes drawn on the document
- Color-coded by element type
- Labels showing detected elements
2. Markdown Export
- Clean, structured text output
- Preserves document hierarchy
- Ready for further processing
3. JSON Data
- Complete layout predictions
- Bounding box coordinates
- Element types and confidence scores
- Machine-readable format
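As an illustration of working with the JSON output, the sketch below counts detected element types above a confidence threshold. The schema is hypothetical: the field names elements, label, confidence, and bbox are assumptions, so check them against the JSON the app actually produces.

```python
import json
from collections import Counter


def summarize_layout(json_text, min_confidence=0.5):
    """Count element labels whose confidence meets the threshold.

    Assumes a hypothetical schema: {"elements": [{"label", "confidence", "bbox"}]}.
    """
    data = json.loads(json_text)
    kept = [
        element
        for element in data["elements"]
        if element.get("confidence", 0.0) >= min_confidence
    ]
    return Counter(element["label"] for element in kept)


# Made-up sample data, only to show the shape the function expects.
sample = json.dumps({
    "elements": [
        {"label": "title", "confidence": 0.97, "bbox": [40, 32, 560, 80]},
        {"label": "table", "confidence": 0.88, "bbox": [40, 120, 560, 400]},
        {"label": "table", "confidence": 0.31, "bbox": [40, 420, 560, 500]},
        {"label": "signature", "confidence": 0.76, "bbox": [380, 700, 540, 760]},
    ]
})

counts = summarize_layout(sample)
print(dict(counts))
```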
Features
This tool offers:
- AI models for layout detection
- Support for multiple input formats (PDF, images)
- Table structure extraction
- Handling of both digital and scanned documents
- Export to Markdown and JSON
- Fast and Accurate processing modes
Deployment on Hugging Face Spaces
This app is ready to deploy on Hugging Face Spaces!
Setup HF_TOKEN Secret
The signature detector model is gated and requires authentication:
- Go to your Space settings: Settings → Repository secrets
- Add a new secret:
  - Name: HF_TOKEN
  - Value: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
- Click Add Secret
The app will automatically use this token to download the signature model on startup.
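A minimal sketch of how an app might read this secret and use it to fetch the weights. The helper names are hypothetical, and the weights filename is not stated in this README, so it is left as a parameter. hf_hub_download is part of the huggingface_hub library.

```python
import os

MODEL_REPO = "tech4humans/yolov8s-signature-detector"


def read_hf_token(env=None):
    """Return the HF_TOKEN value with whitespace stripped, or None if unset."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return token or None


def download_signature_weights(filename, env=None):
    """Download one file from the signature model repo and return its local path.

    `filename` is an assumption: look up the real name on the model card.
    """
    token = read_hf_token(env)
    if token is None:
        raise RuntimeError("HF_TOKEN is not set; add it as a repository secret.")
    # Imported lazily so read_hf_token() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=MODEL_REPO, filename=filename, token=token)


print("Token configured:", read_hf_token() is not None)
```

Failing early with a clear message when the secret is missing is easier to debug than an authentication error raised during the download.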
Requirements
- SDK: Gradio 5.x
- Python: 3.11+
- Hardware: CPU (2 cores, 16GB RAM on Spaces)
- Runtime: ~2-3 minutes first load (model downloads), then ~1-3s per inference
All dependencies are in requirements.txt and will be installed automatically.
Local Testing
Want to test locally?
```bash
# Install dependencies
pip install -r requirements.txt

# Set HF token (if signature model is gated)
export HF_TOKEN=hf_xxx

# Run the app locally
python app.py
```
Test Scripts
```bash
# Test signature detection only
python test_signature.py

# Test full document analysis
python test_analyze.py
```
Signature Detector Notes
- The signature model weights are hosted on Hugging Face (tech4humans/yolov8s-signature-detector)
- CPU inference is supported; no GPU required
- The app queues up to 2 concurrent jobs to align with Spaces CPU (2 cores)
- First run downloads ~12MB model checkpoint
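The two-job limit can be illustrated with a generic bounded worker pool. This is a standard-library sketch of the idea, not the app's queueing code (which presumably relies on Gradio's own queue settings); run_jobs and the dummy workload are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 2  # matches the 2 CPU cores mentioned above


def run_jobs(n_jobs, limit=MAX_CONCURRENT_JOBS):
    """Run n_jobs dummy tasks with at most `limit` running at once.

    Returns the results in submission order and the peak concurrency observed.
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def job(index):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for model inference
        with lock:
            active -= 1
        return index * index

    # Extra jobs wait inside the pool until a worker thread is free.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        results = list(pool.map(job, range(n_jobs)))
    return results, peak


results, peak = run_jobs(6)
print(results, "peak concurrency:", peak)
```

Capping workers at the core count avoids oversubscribing the CPU when several users submit documents at the same time.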
Examples
Signature-only examples live under sample_signature/. Try them in the "Signature Detection (Only)" tab.
OCR Engine
- This app uses RapidOCR with the ONNX Runtime backend by default when OCR is enabled, for fast and accurate CPU inference.
- If ONNX Runtime is missing, Docling may fall back to other engines; this repo includes onnxruntime in requirements.txt and configures RapidOcrOptions(backend="onnxruntime") to enforce the preferred engine.
Contributing
Found a bug or have a suggestion? Feel free to open an issue or contribute!
License
- App code: MIT License
- Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making corresponding source available per AGPL.
Made with ❤️ for better document understanding