---
title: Document Layout Detection
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---

# 📄 Document Layout, Table Structure & Signature Detection

An AI-powered tool that automatically detects document layout and structure, with an optional specialized detector for handwritten signatures.

## 🎯 What Does This Do?

This Space automatically analyzes your documents (PDFs, images, scanned documents) to:

- 🏷️ **Detect Layout Elements**: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
- 📊 **Extract Tables**: Recognizes table structures and extracts data
- 🖼️ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
- 📤 **Export Formats**: Provides Markdown, JSON, and visual outputs
- 🔍 **OCR Support**: Automatically processes scanned documents and images
- ✍️ **Signature Detection**: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on the layout view or run as a dedicated tool)

## 🚀 How to Use

1. **Upload** your document (PDF, JPG, PNG, etc.)
2. **Choose** a processing mode:
   - **Fast**: Quick processing for simple documents
   - **Accurate**: Better quality for complex tables (slower)
3. **Configure** options:
   - Enable/disable OCR
   - Enable/disable table detection
4. **Process** and view the results!

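The UI choices above can be mirrored by a small options object when driving the pipeline from code. This is a minimal sketch with hypothetical names (`ProcessingOptions`, `mode`, `do_ocr`, `do_tables` are illustrative, not the app's actual API):

```python
from dataclasses import dataclass

# Hypothetical options object mirroring the UI choices above;
# the app's real parameter names may differ.
@dataclass
class ProcessingOptions:
    mode: str = "fast"        # "fast" or "accurate"
    do_ocr: bool = True       # enable OCR for scanned documents
    do_tables: bool = True    # enable table structure detection

    def __post_init__(self) -> None:
        if self.mode not in ("fast", "accurate"):
            raise ValueError(f"unknown mode: {self.mode!r}")

# Example: accurate mode with OCR disabled
opts = ProcessingOptions(mode="accurate", do_ocr=False)
```

Validating the mode up front keeps a typo from silently falling back to a default.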
## 📋 Use Cases

Perfect for analyzing:

- 🆔 **ID Documents**: Aadhaar cards, passports, driver's licenses
- 📝 **Forms & Applications**: Government forms, surveys, questionnaires
- 🧾 **Invoices & Receipts**: Business documents with tables
- 📚 **Research Papers**: Academic documents with complex layouts
- 📊 **Reports**: Annual reports, financial statements
- 📰 **Articles & Documents**: Any structured document

## 🛠️ Technology

This Space uses state-of-the-art AI models:

- **Layout Model**: Neural networks for document layout analysis
- **Table Structure Model**: TableFormer architecture for table detection and extraction
- **OCR Engine**: Integrated OCR for text recognition in scanned documents
- **Framework**: Docling document processing pipeline
- **Signature Model (Optional)**: Fine-tuned signature detector (tech4humans/yolov8s-signature-detector) from Hugging Face

## 🎨 Output Formats

### 1. Annotated Visualization

- Bounding boxes drawn on the document
- Color-coded by element type
- Labels showing detected elements

### 2. Markdown Export

- Clean, structured text output
- Preserves document hierarchy
- Ready for further processing

### 3. JSON Data

- Complete layout predictions
- Bounding box coordinates
- Element types and confidence scores
- Machine-readable format

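To illustrate what consuming the JSON export might look like, here is a sketch that filters predictions by confidence score. The field names (`label`, `bbox`, `score`) and the sample data are assumptions for illustration, not the app's guaranteed schema:

```python
import json

# Hypothetical sample of the JSON export; real field names may differ.
raw = """
[
  {"label": "title",   "bbox": [72, 40, 520, 80],   "score": 0.97},
  {"label": "table",   "bbox": [60, 300, 540, 620], "score": 0.88},
  {"label": "caption", "bbox": [60, 630, 540, 655], "score": 0.42}
]
"""

def confident(predictions, min_score=0.5):
    """Keep only predictions at or above the confidence threshold."""
    return [p for p in predictions if p["score"] >= min_score]

kept = confident(json.loads(raw))
print([p["label"] for p in kept])  # → ['title', 'table']
```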
## 🌟 Features

This tool offers:

- Advanced AI models for layout detection
- Support for multiple input formats (PDF, images)
- Accurate table structure extraction
- Handling of both digital and scanned documents
- Export to multiple formats (Markdown, JSON)
- Fast and accurate processing modes

## 🚀 Deployment on Hugging Face Spaces

This app is ready to deploy on Hugging Face Spaces!

### Set Up the HF_TOKEN Secret

The signature detector model is gated and requires authentication:

1. Go to your Space settings: `Settings` → `Repository secrets`
2. Add a new secret:
   - **Name**: `HF_TOKEN`
   - **Value**: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
3. Click `Add Secret`

The app automatically uses this token to download the signature model on startup.

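Inside the app, token handling can be as simple as reading the environment. A minimal sketch follows; the download call is left as a comment because the exact checkpoint filename in the model repo is an assumption:

```python
import os

def resolve_hf_token(explicit=None):
    """Prefer an explicitly passed token, else fall back to the HF_TOKEN env var."""
    return explicit or os.environ.get("HF_TOKEN")

# The startup download would then look roughly like this
# (filename is hypothetical -- check the model repo for the real one):
#
# from huggingface_hub import hf_hub_download
# weights = hf_hub_download(
#     repo_id="tech4humans/yolov8s-signature-detector",
#     filename="yolov8s.onnx",
#     token=resolve_hf_token(),
# )
```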
### Requirements

- SDK: Gradio 5.x
- Python: 3.11+
- Hardware: CPU (2 cores, 18 GB RAM on Spaces)
- Runtime: ~2-3 minutes on first load (model downloads), then ~1-3s per inference

All dependencies are listed in `requirements.txt` and will be installed automatically.

## 🧪 Local Testing

Want to test locally?

```bash
# Install dependencies
pip install -r requirements.txt

# Set your HF token (needed if the signature model is gated)
export HF_TOKEN=hf_xxx

# Run the app locally
python app.py
```

### Test Scripts

```bash
# Test signature detection only
python test_signature.py

# Test full document analysis
python test_analyze.py
```

### Signature Detector Notes

- The signature model weights are hosted on Hugging Face (`tech4humans/yolov8s-signature-detector`)
- CPU inference is supported; no GPU is required
- The app queues up to 2 concurrent jobs to match the Spaces CPU tier (2 cores)
- The first run downloads a ~12 MB model checkpoint

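The "overlay on the layout view" behavior amounts to appending signature boxes to the layout predictions. A minimal sketch, assuming a `{label, bbox, score}` record shape that may differ from the app's real schema:

```python
def overlay_signatures(layout_preds, signature_boxes, label="signature"):
    """Append signature detections to layout predictions without mutating either input.

    Assumes an illustrative {label, bbox, score} record shape;
    the app's real schema may differ.
    """
    merged = list(layout_preds)
    for bbox, score in signature_boxes:
        merged.append({"label": label, "bbox": list(bbox), "score": score})
    return merged

layout = [{"label": "paragraph", "bbox": [50, 100, 500, 200], "score": 0.95}]
sigs = [((400, 700, 550, 760), 0.81)]
merged = overlay_signatures(layout, sigs)
```

Copying the input list first keeps the layout-only view reusable after the overlay is built.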
## 📸 Examples

Signature-only examples live under `sample_signature/`. Try them in the "Signature Detection (Only)" tab.

### OCR Engine

- When OCR is enabled, this app uses RapidOCR with the ONNX Runtime backend by default, for fast and accurate CPU inference.
- If ONNX Runtime is missing, Docling may fall back to other engines; this repo therefore includes `onnxruntime` in `requirements.txt` and configures `RapidOcrOptions(backend="onnxruntime")` to enforce the preferred engine.

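The fallback behavior described above boils down to an availability check. The function below is illustrative only (Docling's real engine selection is internal to the library), but it shows the idea:

```python
import importlib.util

def pick_ocr_backend(preferred="onnxruntime"):
    """Return the preferred backend if its package is importable, else signal a fallback.

    Illustrative sketch; Docling's actual engine selection is internal.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return "default"  # let the pipeline fall back to its default OCR engine

backend = pick_ocr_backend()  # "onnxruntime" if installed, else "default"
```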
## 🤝 Contributing

Found a bug or have a suggestion? Feel free to open an issue or contribute!

## 📄 License

- App code: MIT License
- Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making the corresponding source available under the AGPL.

---

**Made with ❤️ for better document understanding**