---
title: Document Layout Detection
emoji: 📄
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
license: mit
---

# 📄 Document Layout, Table Structure & Signature Detection

An AI-powered tool that detects document layout and table structure, with an optional detector specialized for handwritten signatures.

## 🎯 What Does This Do?

This Space automatically analyzes your documents (PDFs, images, scanned documents) to:

- 🏷️ **Detect Layout Elements**: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
- πŸ“Š **Extract Tables**: Recognizes table structures and extracts data
- πŸ–ΌοΈ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
- πŸ“ **Export Formats**: Provides Markdown, JSON, and visual outputs
- πŸ” **OCR Support**: Automatically processes scanned documents and images
 - ✍️ **Signature Detection**: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on layout view or run as a dedicated tool)

## 🚀 How to Use

1. **Upload** your document (PDF, JPG, PNG, etc.)
2. **Choose** processing mode:
   - **Fast**: Quick processing for simple documents
   - **Accurate**: Better quality for complex tables (slower)
3. **Configure** options:
   - Enable/disable OCR
   - Enable/disable table detection
4. **Process** and view results!
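
The choices made in steps 2 and 3 can be thought of as a small options object handed to the processing code. The sketch below is only an illustration of that idea: `ProcessingOptions` and `parse_options` are hypothetical names and are not taken from `app.py`.

```python
from dataclasses import dataclass

# The two processing modes named in the steps above.
VALID_MODES = ("fast", "accurate")


@dataclass(frozen=True)
class ProcessingOptions:
    """Hypothetical container for the choices made in steps 2 and 3."""

    mode: str = "fast"
    use_ocr: bool = True
    detect_tables: bool = True

    def __post_init__(self) -> None:
        # Reject anything other than the two documented modes.
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {self.mode!r}")


def parse_options(mode_label: str, use_ocr: bool, detect_tables: bool) -> ProcessingOptions:
    """Normalize UI values (for example the label "Accurate") into options."""
    return ProcessingOptions(
        mode=mode_label.strip().lower(),
        use_ocr=use_ocr,
        detect_tables=detect_tables,
    )


options = parse_options("Accurate", use_ocr=True, detect_tables=True)
```

Validating the mode up front means an unsupported value fails before any model is loaded.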

## 📚 Use Cases

Perfect for analyzing:
- 🆔 **ID Documents**: Aadhaar cards, passports, driver's licenses
- 📄 **Forms & Applications**: Government forms, surveys, questionnaires
- 🧾 **Invoices & Receipts**: Business documents with tables
- 📖 **Research Papers**: Academic documents with complex layouts
- 📊 **Reports**: Annual reports, financial statements
- 📰 **Articles & Documents**: Any structured document

## 🛠️ Technology

This Space combines several models:

- **Layout Model**: Neural network for document layout analysis
- **Table Structure Model**: TableFormer architecture for table structure recognition and extraction
- **OCR Engine**: Integrated OCR for text recognition in scanned documents
- **Framework**: Docling document processing pipeline
- **Signature Model (Optional)**: Fine-tuned signature detector (`tech4humans/yolov8s-signature-detector`) from Hugging Face

## 🎨 Output Formats

### 1. Visualization
- Bounding boxes drawn on the document
- Color-coded by element type
- Labels showing detected elements

### 2. Markdown Export
- Clean, structured text output
- Preserves document hierarchy
- Ready for further processing

### 3. JSON Data
- Complete layout predictions
- Bounding box coordinates
- Element types and confidence scores
- Machine-readable format
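
As a rough illustration of consuming the JSON output, the sketch below filters elements by confidence and label. The field names (`label`, `confidence`, `bbox`, `page`) and the sample values are assumptions made for this example; check the actual export for the real schema.

```python
import json

# Hypothetical sample; the real export may use different keys and nesting.
sample_json = """
[
  {"label": "title",  "confidence": 0.97, "bbox": [72, 40, 540, 88],   "page": 1},
  {"label": "table",  "confidence": 0.91, "bbox": [72, 300, 540, 620], "page": 1},
  {"label": "figure", "confidence": 0.42, "bbox": [80, 640, 300, 760], "page": 1}
]
"""


def filter_elements(raw, min_confidence=0.5, labels=None):
    """Keep elements at or above min_confidence, optionally restricted to labels."""
    elements = json.loads(raw)
    return [
        element
        for element in elements
        if element["confidence"] >= min_confidence
        and (labels is None or element["label"] in labels)
    ]


def bbox_area(bbox):
    """Area of an [x0, y0, x1, y1] box; degenerate boxes count as zero."""
    x0, y0, x1, y1 = bbox
    return max(0, x1 - x0) * max(0, y1 - y0)


confident = filter_elements(sample_json)
tables = filter_elements(sample_json, labels={"table"})
for element in confident:
    print(element["label"], element["confidence"], bbox_area(element["bbox"]))
```

Filtering on the confidence score is a simple way to drop low-quality detections before further processing.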

## 🌟 Features

This tool offers:
- Advanced AI models for layout detection
- Supports multiple input formats (PDF, images)
- Accurate table structure extraction
- Handles both digital and scanned documents
- Exports to various formats (Markdown, JSON)
- Fast and accurate processing modes

## 🚀 Deployment on Hugging Face Spaces

This app is ready to deploy on Hugging Face Spaces!

### Setup HF_TOKEN Secret

The signature detector model is gated and requires authentication:

1. Go to your Space settings: `Settings` → `Repository secrets`
2. Add a new secret:
   - **Name**: `HF_TOKEN`
   - **Value**: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
3. Click `Add Secret`

The app will automatically use this token to download the signature model on startup.
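
Spaces expose repository secrets to the app as environment variables, so reading the token can be as simple as the sketch below. The helper name `resolve_hf_token` is hypothetical and not necessarily what `app.py` does.

```python
import os


def resolve_hf_token(env=None):
    """Return the HF_TOKEN value, or None if it is unset or blank.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return token or None


token = resolve_hf_token()
if token is None:
    print("HF_TOKEN is not set; downloading a gated model will likely fail.")
```

The resulting value can then be passed to whichever download call the app uses, for example the `token` argument of `huggingface_hub.hf_hub_download`.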

### Requirements

- SDK: Gradio 5.x
- Python: 3.11+
- Hardware: CPU (2 cores, 16GB RAM on Spaces)
- Runtime: ~2-3 minutes first load (model downloads), then ~1-3s per inference

All dependencies are in `requirements.txt` and will be installed automatically.

## 🧪 Local Testing

Want to test locally?

```bash
# Install dependencies
pip install -r requirements.txt

# Set HF token (needed to download the gated signature model)
export HF_TOKEN=hf_xxx

# Run the app locally
python app.py
```

### Test Scripts

```bash
# Test signature detection only
python test_signature.py

# Test full document analysis
python test_analyze.py
```

### Signature Detector Notes

- The signature model weights are hosted on Hugging Face (`tech4humans/yolov8s-signature-detector`)
- CPU inference is supported; no GPU required
- The app queues up to 2 concurrent jobs to align with Spaces CPU (2 cores)
- First run downloads ~12MB model checkpoint
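
To illustrate what capping concurrency at 2 means, here is a standard-library sketch. It is not the app's code: in a Gradio app this is normally handled by the queue's concurrency settings, and `run_job` and `fake_inference` are hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 2  # matches the 2 CPU cores mentioned above

_slots = threading.BoundedSemaphore(MAX_CONCURRENT_JOBS)
_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of jobs observed running at once


def run_job(job, *args):
    """Run job(*args), waiting first if MAX_CONCURRENT_JOBS are already running."""
    global _active, peak_active
    with _slots:
        with _lock:
            _active += 1
            peak_active = max(peak_active, _active)
        try:
            return job(*args)
        finally:
            with _lock:
                _active -= 1


def fake_inference(n):
    """Stand-in for a model call; sleeps briefly and returns n squared."""
    time.sleep(0.05)
    return n * n


# Eight worker threads submit work, but only two jobs execute at any moment.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda n: run_job(fake_inference, n), range(6)))
```

Extra requests simply wait for a free slot, which keeps CPU-bound inference from oversubscribing the two available cores.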

## 📸 Examples

Signature-only examples live under `sample_signature/`. Try them in the "Signature Detection (Only)" tab.

### OCR Engine

- This app uses RapidOCR with the ONNX Runtime backend by default when OCR is enabled, for fast and accurate CPU inference.
- If ONNX Runtime is missing, Docling may fall back to other engines. To enforce the preferred engine, this repo includes `onnxruntime` in `requirements.txt` and configures `RapidOcrOptions(backend="onnxruntime")`.

## 🀝 Contributing

Found a bug or have a suggestion? Feel free to open an issue or contribute!

## 📝 License

- App code: MIT License
- Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making corresponding source available per AGPL.

---

**Made with ❤️ for better document understanding**