---
title: Document Layout Detection
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---

# 📄 Document Layout, Table Structure & Signature Detection

An AI-powered tool that automatically detects document layout and structure, with an optional specialized detector for handwritten signatures.

## 🎯 What Does This Do?

This Space automatically analyzes your documents (PDFs, images, scanned documents) to:

- 🏷️ **Detect Layout Elements**: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
- 📊 **Extract Tables**: Recognizes table structures and extracts data
- 🖼️ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
- 📝 **Export Formats**: Provides Markdown, JSON, and visual outputs
- 🔍 **OCR Support**: Automatically processes scanned documents and images
- ✍️ **Signature Detection**: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on layout view or run as a dedicated tool)

## 🚀 How to Use

1. **Upload** your document (PDF, JPG, PNG, etc.)
2. **Choose** processing mode:
   - **Fast**: Quick processing for simple documents
   - **Accurate**: Better quality for complex tables (slower)
3. **Configure** options:
   - Enable/disable OCR
   - Enable/disable table detection
4. **Process** and view results!

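
The three choices in the steps above (processing mode, OCR, table detection) can be thought of as one small settings object. The sketch below is illustrative only: `PipelineSettings` and `settings_from_ui` are hypothetical names that do not come from `app.py`, and the field names are assumptions about how such flags might be represented, not the app's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineSettings:
    """Hypothetical container for the three choices offered in the UI."""

    table_mode: str           # "fast" or "accurate"
    do_ocr: bool              # run OCR on scanned pages and images
    do_table_structure: bool  # run table structure recognition


def settings_from_ui(mode: str, enable_ocr: bool, enable_tables: bool) -> PipelineSettings:
    """Validate the UI choices and normalize them into one settings object."""
    normalized = mode.strip().lower()
    if normalized not in ("fast", "accurate"):
        raise ValueError(f"Unknown processing mode: {mode!r}")
    return PipelineSettings(
        table_mode=normalized,
        do_ocr=enable_ocr,
        do_table_structure=enable_tables,
    )


if __name__ == "__main__":
    # Example: accurate table handling on a scanned document.
    print(settings_from_ui("Accurate", enable_ocr=True, enable_tables=True))
```

Validating the mode up front keeps an unsupported value from reaching the models, where the failure would be slower and harder to read.
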
## 📚 Use Cases

Well suited to analyzing:

- 🆔 **ID Documents**: Aadhaar cards, passports, driver's licenses
- 📄 **Forms & Applications**: Government forms, surveys, questionnaires
- 🧾 **Invoices & Receipts**: Business documents with tables
- 📖 **Research Papers**: Academic documents with complex layouts
- 📊 **Reports**: Annual reports, financial statements
- 📰 **Articles & Documents**: Any structured document

## 🛠️ Technology

This Space combines several models and components:

- **Layout Model**: Neural network for document layout analysis
- **Table Structure Model**: TableFormer architecture for table detection and extraction
- **OCR Engine**: RapidOCR with the ONNX Runtime backend, for text recognition in scanned documents
- **Framework**: Docling document processing pipeline
- **Signature Model (Optional)**: Fine-tuned signature detector (`tech4humans/yolov8s-signature-detector`) from Hugging Face

## 🎨 Output Formats

### 1. Visualization

- Bounding boxes drawn on the document
- Color-coded by element type
- Labels showing detected elements

### 2. Markdown Export

- Clean, structured text output
- Preserves document hierarchy
- Ready for further processing

### 3. JSON Data

- Complete layout predictions
- Bounding box coordinates
- Element types and confidence scores
- Machine-readable format

## 🌟 Features

This tool offers:

- AI models for layout detection
- Support for multiple input formats (PDF, images)
- Accurate table structure extraction
- Handling of both digital and scanned documents
- Export to Markdown and JSON
- Fast and accurate processing modes

## 🚀 Deployment on Hugging Face Spaces

This app is ready to deploy on Hugging Face Spaces!

### Set Up the HF_TOKEN Secret

The signature detector model is gated and requires authentication:

1. Go to your Space settings: `Settings` → `Repository secrets`
2. Add a new secret:
   - **Name**: `HF_TOKEN`
   - **Value**: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
3. Click `Add Secret`

The app will automatically use this token to download the signature model on startup.

### Requirements

- SDK: Gradio 5.x
- Python: 3.11+
- Hardware: CPU (2 cores, 16GB RAM on Spaces)
- Runtime: ~2-3 minutes on first load (model downloads), then ~1-3s per inference

All dependencies are listed in `requirements.txt` and are installed automatically.

## 🧪 Local Testing

Want to test locally?

```bash
# Install dependencies
pip install -r requirements.txt

# Set HF token (if signature model is gated)
export HF_TOKEN=hf_xxx

# Run the app locally
python app.py
```

### Test Scripts

```bash
# Test signature detection only
python test_signature.py

# Test full document analysis
python test_analyze.py
```

### Signature Detector Notes

- The signature model weights are hosted on Hugging Face (`tech4humans/yolov8s-signature-detector`)
- CPU inference is supported; no GPU required
- The app queues up to 2 concurrent jobs to match the 2 CPU cores available on Spaces
- The first run downloads a ~12MB model checkpoint

## 📸 Examples

Signature-only examples live under `sample_signature/`. Try them in the "Signature Detection (Only)" tab.

### OCR Engine

- When OCR is enabled, this app uses RapidOCR with the ONNX Runtime backend by default, for fast and accurate CPU inference.
- If ONNX Runtime is missing, Docling may fall back to other engines. To enforce the preferred engine, this repo includes `onnxruntime` in `requirements.txt` and configures `RapidOcrOptions(backend="onnxruntime")`.

## 🤝 Contributing

Found a bug or have a suggestion? Feel free to open an issue or contribute!

## 📝 License

- App code: MIT License
- Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making the corresponding source available under the AGPL.

---

**Made with ❤️ for better document understanding**