Ayaan Sharif committed on
Commit
9434a85
·
1 Parent(s): 1d76058

Add signature detection with finetuned model and UI improvements

- Integrate tech4humans/yolov8s-signature-detector
- Add signature overlay in Analyze tab
- Add dedicated Signature Detection tab
- Add input preview images (240px fixed height)
- Enforce RapidOCR ONNX backend
- Reorganize UI with top-level tabs
- Add sample_signature/ folder with 3 examples
- Update README with deployment instructions

README.md CHANGED
@@ -10,9 +10,9 @@ pinned: false
10
  license: mit
11
  ---
12
 
13
- # 📄 Document Layout & Table Structure Detection
14
 
15
- A powerful AI-powered tool for automatically detecting document layout and structure.
16
 
17
  ## 🎯 What Does This Do?
18
 
@@ -23,6 +23,7 @@ This Space automatically analyzes your documents (PDFs, images, scanned document
23
  - 🖼️ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
24
  - 📁 **Export Formats**: Provides Markdown, JSON, and visual outputs
25
  - 🔍 **OCR Support**: Automatically processes scanned documents and images
 
26
 
27
  ## 🚀 How to Use
28
 
@@ -53,6 +54,7 @@ This Space uses state-of-the-art AI models:
53
  - **Table Structure Model**: TableFormer architecture for table detection and extraction
54
  - **OCR Engine**: Integrated OCR for text recognition in scanned documents
55
  - **Framework**: Modern document processing pipeline
 
56
 
57
  ## 🎨 Output Formats
58
 
@@ -82,28 +84,80 @@ This tool offers:
82
  - Exports to various formats (Markdown, JSON)
83
  - Fast and accurate processing modes
84
 
85
  ## 🧪 Local Testing
86
 
87
- Want to test locally? Check out `test_local.py` in this repository.
88
 
89
  ```bash
90
  # Install dependencies
91
  pip install -r requirements.txt
92
 
93
  # Run the app locally
94
  python app.py
95
 
96
- # Or test on a specific file
97
- python test_local.py path/to/your/document.pdf
98
  ```
99
 
100
  ## 🀝 Contributing
101
 
102
  Found a bug or have a suggestion? Feel free to open an issue or contribute!
103
 
104
  ## 📝 License
105
 
106
- MIT License - Feel free to use and modify for your projects.
 
107
 
108
  ---
109
 
 
10
  license: mit
11
  ---
12
 
13
+ # 📄 Document Layout, Table Structure & Signature Detection
14
 
15
+ An AI-powered tool that automatically detects document layout and structure, with an optional, specialized detector for handwritten signatures.
16
 
17
  ## 🎯 What Does This Do?
18
 
 
23
  - 🖼️ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
24
  - 📁 **Export Formats**: Provides Markdown, JSON, and visual outputs
25
  - 🔍 **OCR Support**: Automatically processes scanned documents and images
26
+ - ✍️ **Signature Detection**: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on layout view or run as a dedicated tool)
27
 
28
  ## 🚀 How to Use
29
 
 
54
  - **Table Structure Model**: TableFormer architecture for table detection and extraction
55
  - **OCR Engine**: Integrated OCR for text recognition in scanned documents
56
  - **Framework**: Modern document processing pipeline
57
+ - **Signature Model (Optional)**: Fine-tuned YOLOv8s signature detector (`tech4humans/yolov8s-signature-detector`), downloaded from Hugging Face
58
 
59
  ## 🎨 Output Formats
60
 
 
84
  - Exports to various formats (Markdown, JSON)
85
  - Fast and accurate processing modes
86
 
87
+ ## 🚀 Deployment on Hugging Face Spaces
88
+
89
+ This app is ready to deploy on Hugging Face Spaces!
90
+
91
+ ### Setup HF_TOKEN Secret
92
+
93
+ If the signature detector model is gated, downloading it requires authentication:
94
+
95
+ 1. Go to your Space settings: `Settings` → `Repository secrets`
96
+ 2. Add a new secret:
97
+ - **Name**: `HF_TOKEN`
98
+ - **Value**: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
99
+ 3. Click `Add Secret`
100
+
101
+ The app will automatically use this token to download the signature model on startup.
102
+
103
+ ### Requirements
104
+
105
+ - SDK: Gradio 5.x
106
+ - Python: 3.11+
107
+ - Hardware: CPU (2 cores, 16GB RAM on Spaces)
108
+ - Runtime: ~2-3 minutes first load (model downloads), then ~1-3s per inference
109
+
110
+ All dependencies are in `requirements.txt` and will be installed automatically.
111
+
112
  ## 🧪 Local Testing
113
 
114
+ Want to test locally?
115
 
116
  ```bash
117
  # Install dependencies
118
  pip install -r requirements.txt
119
 
120
+ # Set HF token (if signature model is gated)
121
+ export HF_TOKEN=hf_xxx
122
+
123
  # Run the app locally
124
  python app.py
125
+ ```
126
+
127
+ ### Test Scripts
128
 
129
+ ```bash
130
+ # Test signature detection only
131
+ python test_signature.py
132
+
133
+ # Test full document analysis
134
+ python test_analyze.py
135
  ```
136
 
137
+ ### Signature Detector Notes
138
+
139
+ - The signature model weights are hosted on Hugging Face (`tech4humans/yolov8s-signature-detector`)
140
+ - CPU inference is supported; no GPU required
141
+ - The app runs at most 2 jobs concurrently and queues the rest, matching the 2 CPU cores available on Spaces
142
+ - First run downloads ~12MB model checkpoint
143
+
144
+ ## 📸 Examples
145
+
146
+ Signature-only examples live under `sample_signature/`. Try them in the "Signature Detection (Only)" tab.
147
+
148
+ ### OCR Engine
149
+
150
+ - This app uses RapidOCR with the ONNX Runtime backend by default when OCR is enabled, for fast and accurate CPU inference.
151
+ - If ONNXRuntime is missing, Docling may fall back to other engines; this repo includes `onnxruntime` in `requirements.txt` and configures `RapidOcrOptions(backend="onnxruntime")` to enforce the preferred engine.
152
+
153
  ## 🀝 Contributing
154
 
155
  Found a bug or have a suggestion? Feel free to open an issue or contribute!
156
 
157
  ## 📝 License
158
 
159
+ - App code: MIT License
160
+ - Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making corresponding source available per AGPL.
161
 
162
  ---
163
 
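The README changes above rely on `HF_TOKEN` being present in the environment, and `load_signature_model()` in `app.py` passes whatever `os.environ.get("HF_TOKEN")` returns to `hf_hub_download`. A minimal sketch of checking that value before startup is shown here. The helper name `resolve_hf_token` is hypothetical and not part of this commit; it only normalizes blank values to `None` so an empty secret is not treated as a token.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a cleaned HF_TOKEN value, or None when it is unset or blank.

    Hypothetical helper for illustration; the commit itself reads
    os.environ.get("HF_TOKEN") directly.
    """
    source = os.environ if env is None else env
    token = (source.get("HF_TOKEN") or "").strip()
    return token or None


if __name__ == "__main__":
    # Report whether a token is configured without printing its value.
    status = "set" if resolve_hf_token() else "not set"
    print(f"HF_TOKEN is {status}")
```

Running this once in the Space (or locally after `export HF_TOKEN=...`) is a quick way to confirm the secret is visible to the process before the model download is attempted.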
app.py CHANGED
@@ -1,11 +1,27 @@
1
  import gradio as gr
2
  from docling.document_converter import DocumentConverter
3
  from docling.datamodel.base_models import InputFormat
4
- from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
5
  from docling.document_converter import PdfFormatOption
6
  from PIL import Image, ImageDraw, ImageFont
7
  import json
8
  import fitz # PyMuPDF
9
 
10
  # Color mapping for different layout elements
11
  COLORS = {
@@ -37,6 +53,91 @@ COLORS = {
37
  "other": "#CCCCCC",
38
  }
39
 
40
  def draw_layout_boxes(image_path, layout_data, scale_x=1.0, scale_y=1.0):
41
  """Draw bounding boxes on the image based on layout predictions"""
42
  # Open the image
@@ -92,7 +193,7 @@ def draw_layout_boxes(image_path, layout_data, scale_x=1.0, scale_y=1.0):
92
 
93
  return img
94
 
95
- def process_document(file_path, mode, enable_ocr, enable_tables):
96
  """Process document with Docling and return results"""
97
  try:
98
  # Configure pipeline options
@@ -106,6 +207,12 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
106
  pipeline_options.table_structure_options.mode = TableFormerMode.FAST
107
 
108
  pipeline_options.do_ocr = enable_ocr
109
  pipeline_options.generate_page_images = True
110
  pipeline_options.generate_picture_images = True
111
  pipeline_options.do_picture_classification = True # Enable classification
@@ -198,6 +305,7 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
198
 
199
  # Create visualization for first page
200
  visualization = None
 
201
  if result.pages and layout_info:
202
  # Draw boxes on first page only
203
  first_page_layout = [item for item in layout_info if item["page"] == 1]
@@ -210,6 +318,7 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
210
  # For images: Open directly, coordinates should match 1:1
211
  first_page_image = Image.open(file_path).convert("RGB")
212
  # No scaling needed for images - coordinates are already in pixels
 
213
  visualization = draw_layout_boxes(first_page_image, first_page_layout,
214
  scale_x=1.0, scale_y=1.0)
215
  else:
@@ -234,6 +343,7 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
234
 
235
  doc.close()
236
 
 
237
  # Draw boxes with calculated scale
238
  visualization = draw_layout_boxes(first_page_image, first_page_layout,
239
  scale_x=scale_x, scale_y=scale_y)
@@ -241,6 +351,25 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
241
  print(f"Could not create visualization: {e}")
242
  import traceback
243
  traceback.print_exc()
244
 
245
  # Create summary
246
  summary = f"""## Document Analysis Summary
@@ -269,12 +398,117 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
269
  error_msg = f"Error processing document: {str(e)}"
270
  return None, error_msg, error_msg, error_msg
271
 
272
- def gradio_interface(file, mode, enable_ocr, enable_tables):
273
  """Gradio interface function"""
274
  if file is None:
275
  return None, "Please upload a document", "", ""
276
 
277
- return process_document(file.name, mode, enable_ocr, enable_tables)
278
 
279
  # Create Gradio interface
280
  with gr.Blocks(title="Document Layout Detection", theme=gr.themes.Soft()) as demo:
@@ -289,51 +523,158 @@ with gr.Blocks(title="Document Layout Detection", theme=gr.themes.Soft()) as dem
289
  - **OCR Support**: Reads text from scanned documents and images
290
  """)
291
 
292
- with gr.Row():
293
- with gr.Column(scale=1):
294
- file_input = gr.File(
295
- label="Upload Document",
296
- file_types=[".pdf", ".jpg", ".jpeg", ".png", ".tiff", ".bmp"]
297
- )
298
 
299
- mode_dropdown = gr.Dropdown(
300
- choices=["Fast", "Accurate"],
301
- value="Fast",
302
- label="Processing Mode",
303
- info="Accurate mode is slower but better for complex tables"
304
- )
305
 
306
- ocr_checkbox = gr.Checkbox(
307
- label="Enable OCR",
308
- value=True,
309
- info="Use OCR for scanned documents and images"
310
- )
311
 
312
- tables_checkbox = gr.Checkbox(
313
- label="Enable Table Detection",
314
- value=True,
315
- info="Detect and extract table structures"
316
  )
317
 
318
- process_btn = gr.Button("🚀 Process Document", variant="primary", size="lg")
319
-
320
- with gr.Column(scale=2):
321
- visualization_output = gr.Image(label="Layout Visualization (First Page)")
322
- summary_output = gr.Markdown(label="Summary")
323
-
324
- with gr.Tabs():
325
- with gr.Tab("📝 Markdown Output"):
326
- markdown_output = gr.Textbox(
327
- label="Extracted Content (Markdown)",
328
- lines=20,
329
- max_lines=30
330
  )
331
-
332
- with gr.Tab("🔧 JSON Layout Data"):
333
- json_output = gr.Code(
334
- label="Layout Predictions (JSON)",
335
- language="json",
336
- lines=20
337
  )
338
 
339
  gr.Markdown("""
@@ -357,36 +698,19 @@ with gr.Blocks(title="Document Layout Detection", theme=gr.themes.Soft()) as dem
357
  Click on any example document to see instant results on different document types.
358
  """)
359
 
360
- # Add examples with image previews
361
- with gr.Row():
362
- gr.Examples(
363
- examples=[
364
- ["sample/Screenshot 2025-10-13 114010.png", "Fast", True, True],
365
- ["sample/Screenshot 2025-10-13 114606.png", "Fast", True, True],
366
- ["sample/Screenshot 2025-10-15 191615.png", "Fast", True, True],
367
- ],
368
- inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox],
369
- outputs=[visualization_output, summary_output, markdown_output, json_output],
370
- fn=gradio_interface,
371
- cache_examples=False,
372
- label="📚 Example Documents",
373
- examples_per_page=3
374
- )
375
-
376
- # Connect the button
377
- process_btn.click(
378
- fn=gradio_interface,
379
- inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox],
380
- outputs=[visualization_output, summary_output, markdown_output, json_output]
381
- )
382
-
383
- # Auto-process on file upload (optional)
384
- file_input.change(
385
- fn=gradio_interface,
386
- inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox],
387
- outputs=[visualization_output, summary_output, markdown_output, json_output]
388
- )
389
 
390
  # Launch the app
391
  if __name__ == "__main__":
392
  demo.launch()
 
1
  import gradio as gr
2
  from docling.document_converter import DocumentConverter
3
  from docling.datamodel.base_models import InputFormat
4
+ from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode, RapidOcrOptions
5
  from docling.document_converter import PdfFormatOption
6
  from PIL import Image, ImageDraw, ImageFont
7
  import json
8
  import fitz # PyMuPDF
9
+ import os
10
+ from dotenv import load_dotenv
11
+ import io
12
+ import numpy as np
13
+ import cv2
14
+ from typing import List, Tuple, Optional
15
+
16
+ # Optional imports for signature detection
17
+ try:
18
+     import supervision as sv
19
+     from ultralytics import YOLO
20
+     from huggingface_hub import hf_hub_download
21
+ except Exception:
22
+     sv = None
23
+     YOLO = None
24
+     hf_hub_download = None
25
 
26
  # Color mapping for different layout elements
27
  COLORS = {
 
53
  "other": "#CCCCCC",
54
  }
55
 
56
+ # Load environment variables from .env if present (useful for HF_TOKEN)
57
+ try:
58
+     load_dotenv()
59
+ except Exception:
60
+     pass
61
+
62
+ # ------------- Signature Model Utilities -------------
63
+ _SIGNATURE_MODEL = None
64
+
65
+
66
+ def load_signature_model() -> Optional["YOLO"]:
67
+     """Load and cache the YOLOv8s signature model from Hugging Face.
68
+
69
+     Returns None if dependencies are missing.
70
+     """
71
+     global _SIGNATURE_MODEL
72
+     if _SIGNATURE_MODEL is not None:
73
+         return _SIGNATURE_MODEL
74
+     if YOLO is None or hf_hub_download is None:
75
+         return None
76
+     try:
77
+         # Use token from env if model is gated
78
+         model_path = hf_hub_download(
79
+             repo_id="tech4humans/yolov8s-signature-detector",
80
+             filename="yolov8s.pt",
81
+             token=os.environ.get("HF_TOKEN")
82
+         )
83
+         _SIGNATURE_MODEL = YOLO(model_path)
84
+         return _SIGNATURE_MODEL
85
+     except Exception as e:
86
+         print(f"Could not load signature model: {e}")
87
+         return None
88
+
89
+
90
+ def yolo_detect_signatures(
91
+     image_bgr: np.ndarray,
92
+     imgsz: int = 1280,
93
+     conf: float = 0.05,
94
+     iou: float = 0.45,
95
+     augment: bool = True,
96
+ ) -> List[Tuple[np.ndarray, float, int]]:
97
+     """Run YOLO signature detection on a BGR image.
98
+
99
+     Returns list of (xyxy np.array[4], score float, class_idx int)
100
+     """
101
+     model = load_signature_model()
102
+     if model is None:
103
+         return []
104
+     try:
105
+         results = model(image_bgr, imgsz=imgsz, conf=conf, iou=iou, augment=augment)
106
+         r = results[0]
107
+         boxes = []
108
+         if hasattr(r, "boxes") and r.boxes is not None:
109
+             xyxy = r.boxes.xyxy.cpu().numpy()
110
+             scores = r.boxes.conf.cpu().numpy()
111
+             classes = r.boxes.cls.cpu().numpy().astype(int)
112
+             for b, s, c in zip(xyxy, scores, classes):
113
+                 boxes.append((b, float(s), int(c)))
114
+         return boxes
115
+     except Exception as e:
116
+         print(f"YOLO detection failed: {e}")
117
+         return []
118
+
119
+
120
+ def annotate_signature_boxes_on_pil(img_pil: Image.Image, boxes: List[Tuple[np.ndarray, float, int]]) -> Image.Image:
121
+     """Draw signature boxes on a PIL image and return annotated copy."""
122
+     if not boxes:
123
+         return img_pil
124
+     img = img_pil.copy()
125
+     draw = ImageDraw.Draw(img)
126
+     # Try fonts
127
+     try:
128
+         font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 16)
129
+     except Exception:
130
+         font = ImageFont.load_default()
131
+     color = COLORS.get("signature", "#9D4EDD")
132
+     for (xyxy, score, cls) in boxes:
133
+         x1, y1, x2, y2 = map(int, xyxy)
134
+         draw.rectangle([x1, y1, x2, y2], outline=color, width=3)
135
+         label = f"Signature {score*100:.0f}%"
136
+         bbox_text = draw.textbbox((x1, y1 - 22), label, font=font)
137
+         draw.rectangle([bbox_text[0] - 2, bbox_text[1] - 2, bbox_text[2] + 2, bbox_text[3] + 2], fill=color)
138
+         draw.text((x1, y1 - 22), label, fill="white", font=font)
139
+     return img
140
+
141
  def draw_layout_boxes(image_path, layout_data, scale_x=1.0, scale_y=1.0):
142
  """Draw bounding boxes on the image based on layout predictions"""
143
  # Open the image
 
193
 
194
  return img
195
 
196
+ def process_document(file_path, mode, enable_ocr, enable_tables, run_signature_yolo=False, signature_conf=0.05):
197
  """Process document with Docling and return results"""
198
  try:
199
  # Configure pipeline options
 
207
  pipeline_options.table_structure_options.mode = TableFormerMode.FAST
208
 
209
          pipeline_options.do_ocr = enable_ocr
210
+         if enable_ocr:
211
+             # Force RapidOCR with ONNX backend for fast & accurate CPU inference
212
+             pipeline_options.ocr_options = RapidOcrOptions(
213
+                 backend="onnxruntime",
214
+                 force_full_page_ocr=True,
215
+             )
216
          pipeline_options.generate_page_images = True
217
          pipeline_options.generate_picture_images = True
218
          pipeline_options.do_picture_classification = True # Enable classification
 
305
 
306
  # Create visualization for first page
307
  visualization = None
308
+ first_page_base_image = None # PIL image in pixel space used for overlays
309
  if result.pages and layout_info:
310
  # Draw boxes on first page only
311
  first_page_layout = [item for item in layout_info if item["page"] == 1]
 
318
  # For images: Open directly, coordinates should match 1:1
319
  first_page_image = Image.open(file_path).convert("RGB")
320
  # No scaling needed for images - coordinates are already in pixels
321
+ first_page_base_image = first_page_image
322
  visualization = draw_layout_boxes(first_page_image, first_page_layout,
323
  scale_x=1.0, scale_y=1.0)
324
  else:
 
343
 
344
  doc.close()
345
 
346
+ first_page_base_image = first_page_image
347
  # Draw boxes with calculated scale
348
  visualization = draw_layout_boxes(first_page_image, first_page_layout,
349
  scale_x=scale_x, scale_y=scale_y)
 
351
  print(f"Could not create visualization: {e}")
352
  import traceback
353
  traceback.print_exc()
354
+
355
+ # Optionally run YOLO signature detection on the same first-page image and overlay
356
+ if run_signature_yolo and first_page_base_image is not None:
357
+ try:
358
+ # Convert PIL RGB to BGR numpy for YOLO
359
+ img_bgr = cv2.cvtColor(np.array(first_page_base_image), cv2.COLOR_RGB2BGR)
360
+ sig_boxes = yolo_detect_signatures(
361
+ img_bgr,
362
+ imgsz=1280,
363
+ conf=float(signature_conf),
364
+ iou=0.45,
365
+ augment=True,
366
+ )
367
+ if sig_boxes:
368
+ # Overlay signature boxes on top of visualization
369
+ base_for_overlay = visualization if visualization is not None else first_page_base_image
370
+ visualization = annotate_signature_boxes_on_pil(base_for_overlay, sig_boxes)
371
+ except Exception as e:
372
+ print(f"Signature overlay failed: {e}")
373
 
374
  # Create summary
375
  summary = f"""## Document Analysis Summary
 
398
  error_msg = f"Error processing document: {str(e)}"
399
  return None, error_msg, error_msg, error_msg
400
 
401
+ def gradio_interface(file, mode, enable_ocr, enable_tables, run_signature_yolo=False, signature_conf=0.05):
402
      """Gradio interface function"""
403
      if file is None:
404
          return None, "Please upload a document", "", ""
405
 
406
+     return process_document(file.name, mode, enable_ocr, enable_tables, run_signature_yolo, signature_conf)
407
+
408
+
409
+ # -------- Small preview helper (first page / image) --------
410
+ def preview_first_page(file: gr.File):
411
+     """Return filepath for preview. For PDFs, extract first page as temp image."""
412
+     if file is None:
413
+         return None
414
+     try:
415
+         path = file.name
416
+         ext = (os.path.splitext(path)[1] or "").lower()
417
+         if ext in (".pdf",):
418
+             # For PDF, render first page to temp image
419
+             import tempfile
420
+             doc = fitz.open(path)
421
+             page = doc[0]
422
+             pix = page.get_pixmap(matrix=fitz.Matrix(1.5, 1.5))
423
+             img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
424
+             doc.close()
425
+             tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
426
+             img.save(tmp.name)
427
+             return tmp.name
428
+         else:
429
+             # For images, return path directly
430
+             return path
431
+     except Exception:
432
+         return None
433
+
434
+
435
+ def analyze_with_preview(file, mode, enable_ocr, enable_tables, run_signature_yolo=False, signature_conf=0.05):
436
+     """Wrapper to also return an input preview for Examples clicks."""
437
+     preview = preview_first_page(file)
438
+     vis, summ, md, js = gradio_interface(file, mode, enable_ocr, enable_tables, run_signature_yolo, signature_conf)
439
+     return preview, vis, summ, md, js
440
+
441
+
442
+ def signature_only_with_preview(file, try_scales, conf, iou, augment):
443
+     """Wrapper to also return an input preview for Examples clicks."""
444
+     preview = preview_first_page(file)
445
+     img, summ, js = signature_only_infer(file, try_scales, conf, iou, augment)
446
+     return preview, img, summ, js
447
+
448
+ # -------- Signature-only utilities (full-image, no ROI) --------
449
+ def signature_only_infer(
450
+     file: gr.File,
451
+     try_scales: bool,
452
+     conf: float,
453
+     iou: float,
454
+     augment: bool,
455
+ ):
456
+     if file is None:
457
+         return None, "Upload an image or PDF", "[]"
458
+
459
+     # Load source image (first page for PDFs)
460
+     path = file.name
461
+     ext = (os.path.splitext(path)[1] or "").lower()
462
+     if ext in (".pdf",):
463
+         doc = fitz.open(path)
464
+         page = doc[0]
465
+         pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))
466
+         base_rgb = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
467
+         doc.close()
468
+     else:
469
+         base_rgb = Image.open(path).convert("RGB")
470
+
471
+     base_bgr = cv2.cvtColor(np.array(base_rgb), cv2.COLOR_RGB2BGR)
472
+
473
+     scales = [1.0, 1.5, 2.0] if try_scales else [1.0]
474
+     best = None
475
+     all_boxes_mapped = []
476
+     rh, rw = base_bgr.shape[:2]
477
+
478
+     for s in scales:
479
+         tw, th = int(rw * s), int(rh * s)
480
+         resized = cv2.resize(base_bgr, (tw, th), interpolation=cv2.INTER_CUBIC)
481
+         boxes = yolo_detect_signatures(resized, imgsz=1280, conf=conf, iou=iou, augment=augment)
482
+         if not boxes:
483
+             continue
484
+         sx, sy = rw / max(1, tw), rh / max(1, th)
485
+         for (xyxy, score, cls) in boxes:
486
+             xb1, yb1, xb2, yb2 = xyxy
487
+             # Map back to original image coords
488
+             x1o = xb1 * sx
489
+             y1o = yb1 * sy
490
+             x2o = xb2 * sx
491
+             y2o = yb2 * sy
492
+             mapped = (np.array([x1o, y1o, x2o, y2o]), float(score), int(cls))
493
+             all_boxes_mapped.append(mapped)
494
+             if best is None or score > best[1]:
495
+                 best = mapped
496
+
497
+     # Annotate and prepare outputs
498
+     annotated = annotate_signature_boxes_on_pil(base_rgb, all_boxes_mapped)
499
+     det_json = [
500
+         {
501
+             "bbox": list(map(lambda v: float(v), xyxy.tolist() if hasattr(xyxy, "tolist") else list(xyxy))),
502
+             "score": float(score),
503
+             "class": int(cls)
504
+         }
505
+         for (xyxy, score, cls) in all_boxes_mapped
506
+     ]
507
+     summary = (
508
+         f"Detections: {len(all_boxes_mapped)}" +
509
+         (f" | Best score: {best[1]:.3f}" if best else " | No detections above threshold")
510
+     )
511
+     return annotated, summary, json.dumps(det_json, indent=2)
512
 
513
  # Create Gradio interface
514
  with gr.Blocks(title="Document Layout Detection", theme=gr.themes.Soft()) as demo:
 
523
  - **OCR Support**: Reads text from scanned documents and images
524
  """)
525
 
526
+ # Top-level tabs: Analyze and Signature Detection
527
+ with gr.Tabs() as top_tabs:
528
+ with gr.Tab("📄 Analyze"):
529
+ with gr.Row():
530
+ with gr.Column(scale=1):
531
+ file_input = gr.File(
532
+ label="Upload Document",
533
+ file_types=[".pdf", ".jpg", ".jpeg", ".png", ".tiff", ".bmp"]
534
+ )
535
+ input_preview = gr.Image(label="Input Preview", type="filepath", height=240, interactive=False, show_label=True)
536
+
537
+ mode_dropdown = gr.Dropdown(
538
+ choices=["Fast", "Accurate"],
539
+ value="Fast",
540
+ label="Processing Mode",
541
+ info="Accurate mode is slower but better for complex tables"
542
+ )
543
+
544
+ ocr_checkbox = gr.Checkbox(
545
+ label="Enable OCR",
546
+ value=True,
547
+ info="Use OCR for scanned documents and images"
548
+ )
549
+
550
+ tables_checkbox = gr.Checkbox(
551
+ label="Enable Table Detection",
552
+ value=True,
553
+ info="Detect and extract table structures"
554
+ )
555
+
556
+ process_btn = gr.Button("🚀 Process Document", variant="primary", size="lg")
557
+ run_sig_chk = gr.Checkbox(label="Also detect signatures (Finetuned Signature Model)", value=False)
558
+ sig_conf_slider = gr.Slider(minimum=0.01, maximum=0.5, step=0.01, value=0.05, label="Signature confidence")
559
+
560
+ with gr.Column(scale=2):
561
+ visualization_output = gr.Image(label="Layout Visualization (First Page)")
562
+ summary_output = gr.Markdown(label="Summary")
563
+
564
+ with gr.Tabs():
565
+ with gr.Tab("📝 Markdown Output"):
566
+ markdown_output = gr.Textbox(
567
+ label="Extracted Content (Markdown)",
568
+ lines=20,
569
+ max_lines=30
570
+ )
571
+
572
+ with gr.Tab("🔧 JSON Layout Data"):
573
+ json_output = gr.Code(
574
+ label="Layout Predictions (JSON)",
575
+ language="json",
576
+ lines=20
577
+ )
578
+
579
+ gr.Markdown("""
580
+ ### Legend
581
+ Different colors represent different document elements:
582
 
583
+ **Layout Elements:**
584
+ - 🔴 Title • 🔵 Text • 🟢 Section Header • 🟠 Table • 🟣 List/Figure/Formula
585
 
586
+ **Picture Classifications (AI-detected):**
587
+ - 🟣 Signature β€’ 🟒 QR Code β€’ 🟒 Barcode β€’ 🟑 Logo β€’ πŸ”΄ Stamp
588
+ - 🟦 Charts (Bar/Pie/Line) β€’ 🟣 Flow Chart β€’ 🟠 Screenshot β€’ βšͺ Other
589
+
590
+ ### How to Use
591
+ 1. Upload your document (PDF or image of ID card, invoice, report, etc.)
592
+ 2. Choose processing options (Fast mode recommended for quick results)
593
+ 3. Click "Process Document"
594
+ 4. View the visualization with bounding boxes and explore the outputs
595
 
596
+ ### 💡 Try Examples Below!
597
+ Click on any example document to see instant results on different document types.
598
+ """)
599
+
600
+ # Add examples; clicking a row will also show a small input preview
601
+ with gr.Row():
602
+ gr.Examples(
603
+ examples=[
604
+ ["sample/Screenshot 2025-10-13 114010.png", "Fast", True, True, False, 0.05],
605
+ ["sample/Screenshot 2025-10-13 114606.png", "Fast", True, True, False, 0.05],
606
+ ["sample/Screenshot 2025-10-15 191615.png", "Fast", True, True, False, 0.05],
607
+ ],
608
+ inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox, run_sig_chk, sig_conf_slider],
609
+ outputs=[input_preview, visualization_output, summary_output, markdown_output, json_output],
610
+ fn=analyze_with_preview,
611
+ cache_examples=False,
612
+ label="📚 Example Documents",
613
+ examples_per_page=3
614
+ )
615
+
616
+ # Connect the button
617
+ process_btn.click(
618
+ fn=gradio_interface,
619
+ inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox, run_sig_chk, sig_conf_slider],
620
+ outputs=[visualization_output, summary_output, markdown_output, json_output]
621
  )
622
 
623
+ # Preview on file selection
624
+ file_input.change(
625
+ fn=preview_first_page,
626
+ inputs=[file_input],
627
+ outputs=[input_preview]
628
  )
629
+
630
+ # Auto-process on file upload (optional)
631
+ file_input.change(
632
+ fn=gradio_interface,
633
+ inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox, run_sig_chk, sig_conf_slider],
634
+ outputs=[visualization_output, summary_output, markdown_output, json_output]
635
+ )
636
+
637
+ with gr.Tab("✍️ Signature Detection (Only)"):
638
+ gr.Markdown("""
639
+ Run the finetuned signature model on an image or the first page of a PDF. Simple controls, no ROI.
640
+ """)
641
+ with gr.Row():
642
+ with gr.Column(scale=1):
643
+ sig_file_input = gr.File(
644
+ label="Upload Image or PDF (first page processed)",
645
+ file_types=[".pdf", ".jpg", ".jpeg", ".png", ".tiff", ".bmp"]
646
+ )
647
+ sig_input_preview = gr.Image(label="Input Preview", type="filepath", height=240, interactive=False, show_label=True)
648
+ try_scales = gr.Checkbox(label="Try multiscale (1.0, 1.5, 2.0)", value=True)
649
+ sig_only_conf = gr.Slider(0.01, 0.5, value=0.03, step=0.01, label="Confidence")
650
+ sig_only_iou = gr.Slider(0.1, 0.9, value=0.45, step=0.05, label="IoU")
651
+ sig_only_aug = gr.Checkbox(label="Augment (slower, more recall)", value=True)
652
+ sig_run_btn = gr.Button("🔎 Detect Signatures", variant="primary")
653
+ with gr.Column(scale=2):
654
+ sig_only_image = gr.Image(label="Annotated Signatures")
655
+ sig_only_summary = gr.Markdown(label="Signature Summary")
656
+ sig_only_json = gr.Code(label="Detections JSON", language="json", lines=16)
657
+
658
+ gr.Examples(
659
+ examples=[["sample_signature/X_074.jpeg"], ["sample_signature/X_014.jpeg"], ["sample_signature/X_081.jpeg"]],
660
+ inputs=[sig_file_input, try_scales, sig_only_conf, sig_only_iou, sig_only_aug],
661
+ outputs=[sig_input_preview, sig_only_image, sig_only_summary, sig_only_json],
662
+ fn=signature_only_with_preview,
663
+ label="✍️ Signature Examples"
664
+ )
665
+
666
+ # Wire signature-only button
667
+ sig_run_btn.click(
668
+ fn=signature_only_infer,
669
+ inputs=[sig_file_input, try_scales, sig_only_conf, sig_only_iou, sig_only_aug],
670
+ outputs=[sig_only_image, sig_only_summary, sig_only_json]
671
+ )
672
+
673
+ # Preview for signature-only selection
674
+ sig_file_input.change(
675
+ fn=preview_first_page,
676
+ inputs=[sig_file_input],
677
+ outputs=[sig_input_preview]
678
  )
679
 
680
  gr.Markdown("""
 
698
  Click on any example document to see instant results on different document types.
699
  """)
700
 
701
+ # Events are now scoped within tabs above
702
 
703
  # Launch the app
704
  if __name__ == "__main__":
705
+     # Queue with up to 2 concurrent workers (fits Spaces CPU with 2 cores)
706
+     # Optional: pre-load signature model to reduce first-run latency (requires HF access)
707
+     try:
708
+         load_signature_model()
709
+     except Exception:
710
+         pass
711
+     # Gradio v5 uses default_concurrency_limit; fallback to concurrency_count for older versions
712
+     try:
713
+         demo.queue(default_concurrency_limit=2)
714
+     except TypeError:
715
+         demo.queue(concurrency_count=2)
716
      demo.launch()
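The multiscale loop in `signature_only_infer` runs the detector on resized copies of the page and multiplies each box by `original / resized` to return to original-image pixels. The sketch below isolates that mapping in dependency-free Python. The names `map_box_to_original`, `box_iou`, and `dedupe_detections` are hypothetical, and the cross-scale de-duplication step is an assumption added for illustration only: the commit itself keeps every box from every scale, so the same signature can be reported more than once.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float]            # (box, confidence score)


def map_box_to_original(box: Box, orig_size: Tuple[int, int],
                        resized_size: Tuple[int, int]) -> Box:
    """Scale a box predicted on a resized image back to original pixels.

    Sizes are (width, height). Mirrors the sx/sy arithmetic in the diff.
    """
    ow, oh = orig_size
    rw, rh = resized_size
    sx = ow / max(1, rw)
    sy = oh / max(1, rh)
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dedupe_detections(detections: List[Detection],
                      iou_thresh: float = 0.5) -> List[Detection]:
    """Greedy suppression: keep the highest-scoring box of each overlap group.

    Not part of the commit; shown as one possible way to merge scales.
    """
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(box_iou(det[0], k[0]) < iou_thresh for k in kept):
            kept.append(det)
    return kept
```

For example, a box found on a 2.0x upscale of a 1000x800 page is mapped with `map_box_to_original(box, (1000, 800), (2000, 1600))`, which halves every coordinate.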
requirements.txt CHANGED
@@ -7,3 +7,8 @@ torchvision
7
  docling>=2.0
8
  gradio>=5.0
9
  pymupdf>=1.24
7
  docling>=2.0
8
  gradio>=5.0
9
  pymupdf>=1.24
10
+ ultralytics>=8.3
11
+ supervision>=0.24
12
+ huggingface_hub>=0.23
13
+ opencv-python-headless>=4.10
14
+ onnxruntime>=1.20
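Because `app.py` wraps the signature-detection imports in `try/except` and silently disables the feature when they fail, it can help to check which of the newly required packages are importable. A minimal sketch using only the standard library follows. The helper name `missing_modules` is hypothetical; the module names are the import names of the packages added above (`opencv-python-headless` is imported as `cv2`).

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages added to requirements.txt in this commit.
SIGNATURE_MODULES = ["ultralytics", "supervision", "huggingface_hub", "cv2", "onnxruntime"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(SIGNATURE_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All signature-detection modules are importable.")
```

`find_spec` only locates a top-level module without importing it, so the check is cheap even for heavy packages.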
sample_signature/X_014.jpeg ADDED

Git LFS Details

  • SHA256: 596df710b942fe868a87eaa16caaaf7be271c11417916bc0ecd6ae724606d2ed
  • Pointer size: 131 Bytes
  • Size of remote file: 479 kB
sample_signature/X_074.jpeg ADDED

Git LFS Details

  • SHA256: 97600eb4a0727646891b2202416dfe18a94d799946da5cbadfcc28ed7f9c7623
  • Pointer size: 131 Bytes
  • Size of remote file: 966 kB
sample_signature/X_081.jpeg ADDED

Git LFS Details

  • SHA256: 49f0f0647b58def20e0b44c432ec5af2d47ef7e76a8e59bf4938addc8ecb2fe1
  • Pointer size: 131 Bytes
  • Size of remote file: 830 kB