Space: Ayaan-Sharif/ocr-layout-detection-poc
Ayaan Sharif committed on Nov 2

Commit 9434a85 (1 parent: 1d76058)

Add signature detection with finetuned model and UI improvements

- Integrate tech4humans/yolov8s-signature-detector
- Add signature overlay in Analyze tab
- Add dedicated Signature Detection tab
- Add input preview images (240px fixed height)
- Enforce RapidOCR ONNX backend
- Reorganize UI with top-level tabs
- Add sample_signature/ folder with 3 examples
- Update README with deployment instructions

Files changed (6)

README.md +60 -6
app.py +396 -72
requirements.txt +5 -0
sample_signature/X_014.jpeg +3 -0
sample_signature/X_074.jpeg +3 -0
sample_signature/X_081.jpeg +3 -0

README.md CHANGED

@@ -10,9 +10,9 @@ pinned: false
 license: mit
 ---
-# 📄 Document Layout & Table Structure Detection
-A powerful AI-powered tool for automatically detecting document layout and structure.
 ## 🎯 What Does This Do?
@@ -23,6 +23,7 @@ This Space automatically analyzes your documents (PDFs, images, scanned document
 - 🖼️ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
 - 📝 **Export Formats**: Provides Markdown, JSON, and visual outputs
 - 🔍 **OCR Support**: Automatically processes scanned documents and images
 ## 🚀 How to Use
@@ -53,6 +54,7 @@ This Space uses state-of-the-art AI models:
 - **Table Structure Model**: TableFormer architecture for table detection and extraction
 - **OCR Engine**: Integrated OCR for text recognition in scanned documents
 - **Framework**: Modern document processing pipeline
 ## 🎨 Output Formats
@@ -82,28 +84,80 @@ This tool offers:
 - Exports to various formats (Markdown, JSON)
 - Fast and accurate processing modes
 ## 🧪 Local Testing
-Want to test locally? Check out `test_local.py` in this repository.
 ```bash
 # Install dependencies
 pip install -r requirements.txt
 # Run the app locally
 python app.py
-# Or test on a specific file
-python test_local.py path/to/your/document.pdf
 ```
 ## 🤝 Contributing
 Found a bug or have a suggestion? Feel free to open an issue or contribute!
 ## 📝 License
-MIT License - Feel free to use and modify for your projects.
 ---

 license: mit
 ---
+# 📄 Document Layout, Table Structure & Signature Detection
+A powerful AI-powered tool for automatically detecting document layout and structure, with an optional specialized handwritten signature detector.
 ## 🎯 What Does This Do?
 - 🖼️ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
 - 📝 **Export Formats**: Provides Markdown, JSON, and visual outputs
 - 🔍 **OCR Support**: Automatically processes scanned documents and images
+ - ✍️ **Signature Detection**: Uses a fine-tuned YOLOv8s model to find handwritten signatures (overlay on layout view or run as a dedicated tool)
 ## 🚀 How to Use
 - **Table Structure Model**: TableFormer architecture for table detection and extraction
 - **OCR Engine**: Integrated OCR for text recognition in scanned documents
 - **Framework**: Modern document processing pipeline
+- **Signature Model (Optional)**: Finetuned signature detector (tech4humans/yolov8s-signature-detector) from Hugging Face
 ## 🎨 Output Formats
 - Exports to various formats (Markdown, JSON)
 - Fast and accurate processing modes
+## 🚀 Deployment on Hugging Face Spaces
+This app is ready to deploy on Hugging Face Spaces!
+### Setup HF_TOKEN Secret
+The signature detector model is gated and requires authentication:
+1. Go to your Space settings: `Settings` → `Repository secrets`
+2. Add a new secret:
+   - **Name**: `HF_TOKEN`
+   - **Value**: Your Hugging Face token (get it from https://huggingface.co/settings/tokens)
+3. Click `Add Secret`
+The app will automatically use this token to download the signature model on startup.
+### Requirements
+- SDK: Gradio 5.x
+- Python: 3.11+
+- Hardware: CPU (2 cores, 18GB RAM on Spaces)
+- Runtime: ~2-3 minutes first load (model downloads), then ~1-3s per inference
+All dependencies are in `requirements.txt` and will be installed automatically.
 ## 🧪 Local Testing
+Want to test locally?
 ```bash
 # Install dependencies
 pip install -r requirements.txt
+# Set HF token (if signature model is gated)
+export HF_TOKEN=hf_xxx
 # Run the app locally
 python app.py
+```
+### Test Scripts
+```bash
+# Test signature detection only
+python test_signature.py
+# Test full document analysis
+python test_analyze.py
 ```
+### Signature Detector Notes
+- The signature model weights are hosted on Hugging Face (`tech4humans/yolov8s-signature-detector`)
+- CPU inference is supported; no GPU required
+- The app queues up to 2 concurrent jobs to align with Spaces CPU (2 cores)
+- First run downloads ~12MB model checkpoint
+## 📸 Examples
+Signature-only examples live under `sample_signature/`. Try them in the "Signature Detection (Only)" tab.
+### OCR Engine
+- This app uses RapidOCR with the ONNX Runtime backend by default when OCR is enabled, for fast and accurate CPU inference.
+- If ONNXRuntime is missing, Docling may fall back to other engines; this repo includes `onnxruntime` in `requirements.txt` and configures `RapidOcrOptions(backend="onnxruntime")` to enforce the preferred engine.
 ## 🤝 Contributing
 Found a bug or have a suggestion? Feel free to open an issue or contribute!
 ## 📝 License
+- App code: MIT License
+- Signature weights: AGPL-3.0 (see the model card on Hugging Face). Using the model in a network service may require making corresponding source available per AGPL.
 ---
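
The README's `HF_TOKEN` setup and the `hf_hub_download(..., token=os.environ.get("HF_TOKEN"))` call in `app.py` can be illustrated with a small helper. This is a minimal sketch, not code from the commit: the function name `signature_download_kwargs` and the whitespace/blank normalization are assumptions added for illustration. Only the repo id, the filename, and the `HF_TOKEN` variable name come from the diff.

```python
import os
from typing import Dict, Mapping, Optional

# Values taken from the diff; everything else below is illustrative.
REPO_ID = "tech4humans/yolov8s-signature-detector"
WEIGHTS_FILE = "yolov8s.pt"


def signature_download_kwargs(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Build keyword arguments for huggingface_hub.hf_hub_download.

    Hypothetical helper: a missing, empty, or whitespace-only HF_TOKEN
    is normalized to None so the Hub client can fall back to its own
    defaults (for example a locally saved login).
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip() or None
    return {"repo_id": REPO_ID, "filename": WEIGHTS_FILE, "token": token}


if __name__ == "__main__":
    kwargs = signature_download_kwargs()
    # Report only whether a token is present; never print the secret itself.
    print({**kwargs, "token": "set" if kwargs["token"] else None})
```

In the app, the result would presumably be passed as `hf_hub_download(**signature_download_kwargs())`; that call needs network access and is left out of the sketch.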

app.py CHANGED

@@ -1,11 +1,27 @@
 import gradio as gr
 from docling.document_converter import DocumentConverter
 from docling.datamodel.base_models import InputFormat
-from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
 from docling.document_converter import PdfFormatOption
 from PIL import Image, ImageDraw, ImageFont
 import json
 import fitz  # PyMuPDF
 # Color mapping for different layout elements
 COLORS = {
@@ -37,6 +53,91 @@ COLORS = {
     "other": "#CCCCCC",
 }
 def draw_layout_boxes(image_path, layout_data, scale_x=1.0, scale_y=1.0):
     """Draw bounding boxes on the image based on layout predictions"""
     # Open the image
@@ -92,7 +193,7 @@ def draw_layout_boxes(image_path, layout_data, scale_x=1.0, scale_y=1.0):
     return img
-def process_document(file_path, mode, enable_ocr, enable_tables):
     """Process document with Docling and return results"""
     try:
         # Configure pipeline options
@@ -106,6 +207,12 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
                 pipeline_options.table_structure_options.mode = TableFormerMode.FAST
         pipeline_options.do_ocr = enable_ocr
         pipeline_options.generate_page_images = True
         pipeline_options.generate_picture_images = True
         pipeline_options.do_picture_classification = True  # Enable classification
@@ -198,6 +305,7 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
         # Create visualization for first page
         visualization = None
         if result.pages and layout_info:
             # Draw boxes on first page only
             first_page_layout = [item for item in layout_info if item["page"] == 1]
@@ -210,6 +318,7 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
                     # For images: Open directly, coordinates should match 1:1
                     first_page_image = Image.open(file_path).convert("RGB")
                     # No scaling needed for images - coordinates are already in pixels
                     visualization = draw_layout_boxes(first_page_image, first_page_layout,
                                                      scale_x=1.0, scale_y=1.0)
                 else:
@@ -234,6 +343,7 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
                     doc.close()
                     # Draw boxes with calculated scale
                     visualization = draw_layout_boxes(first_page_image, first_page_layout,
                                                      scale_x=scale_x, scale_y=scale_y)
@@ -241,6 +351,25 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
                 print(f"Could not create visualization: {e}")
                 import traceback
                 traceback.print_exc()
         # Create summary
         summary = f"""## Document Analysis Summary
@@ -269,12 +398,117 @@ def process_document(file_path, mode, enable_ocr, enable_tables):
         error_msg = f"Error processing document: {str(e)}"
         return None, error_msg, error_msg, error_msg
-def gradio_interface(file, mode, enable_ocr, enable_tables):
     """Gradio interface function"""
     if file is None:
         return None, "Please upload a document", "", ""
-    return process_document(file.name, mode, enable_ocr, enable_tables)
 # Create Gradio interface
 with gr.Blocks(title="Document Layout Detection", theme=gr.themes.Soft()) as demo:
@@ -289,51 +523,158 @@ with gr.Blocks(title="Document Layout Detection", theme=gr.themes.Soft()) as dem
     - **OCR Support**: Reads text from scanned documents and images
     """)
-    with gr.Row():
-        with gr.Column(scale=1):
-            file_input = gr.File(
-                label="Upload Document",
-                file_types=[".pdf", ".jpg", ".jpeg", ".png", ".tiff", ".bmp"]
-            )
-            mode_dropdown = gr.Dropdown(
-                choices=["Fast", "Accurate"],
-                value="Fast",
-                label="Processing Mode",
-                info="Accurate mode is slower but better for complex tables"
-            )
-            ocr_checkbox = gr.Checkbox(
-                label="Enable OCR",
-                value=True,
-                info="Use OCR for scanned documents and images"
-            )
-            tables_checkbox = gr.Checkbox(
-                label="Enable Table Detection",
-                value=True,
-                info="Detect and extract table structures"
             )
-            process_btn = gr.Button("🚀 Process Document", variant="primary", size="lg")
-        with gr.Column(scale=2):
-            visualization_output = gr.Image(label="Layout Visualization (First Page)")
-            summary_output = gr.Markdown(label="Summary")
-    with gr.Tabs():
-        with gr.Tab("📝 Markdown Output"):
-            markdown_output = gr.Textbox(
-                label="Extracted Content (Markdown)",
-                lines=20,
-                max_lines=30
             )
-        with gr.Tab("🔧 JSON Layout Data"):
-            json_output = gr.Code(
-                label="Layout Predictions (JSON)",
-                language="json",
-                lines=20
             )
     gr.Markdown("""
@@ -357,36 +698,19 @@ with gr.Blocks(title="Document Layout Detection", theme=gr.themes.Soft()) as dem
     Click on any example document to see instant results on different document types.
     """)
-    # Add examples with image previews
-    with gr.Row():
-        gr.Examples(
-            examples=[
-                ["sample/Screenshot 2025-10-13 114010.png", "Fast", True, True],
-                ["sample/Screenshot 2025-10-13 114606.png", "Fast", True, True],
-                ["sample/Screenshot 2025-10-15 191615.png", "Fast", True, True],
-            ],
-            inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox],
-            outputs=[visualization_output, summary_output, markdown_output, json_output],
-            fn=gradio_interface,
-            cache_examples=False,
-            label="📚 Example Documents",
-            examples_per_page=3
-        )
-    # Connect the button
-    process_btn.click(
-        fn=gradio_interface,
-        inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox],
-        outputs=[visualization_output, summary_output, markdown_output, json_output]
-    )
-    # Auto-process on file upload (optional)
-    file_input.change(
-        fn=gradio_interface,
-        inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox],
-        outputs=[visualization_output, summary_output, markdown_output, json_output]
-    )
 # Launch the app
 if __name__ == "__main__":
     demo.launch()

 import gradio as gr
 from docling.document_converter import DocumentConverter
 from docling.datamodel.base_models import InputFormat
+from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode, RapidOcrOptions
 from docling.document_converter import PdfFormatOption
 from PIL import Image, ImageDraw, ImageFont
 import json
 import fitz  # PyMuPDF
+import os
+from dotenv import load_dotenv
+import io
+import numpy as np
+import cv2
+from typing import List, Tuple, Optional
+# Optional imports for signature detection
+try:
+    import supervision as sv
+    from ultralytics import YOLO
+    from huggingface_hub import hf_hub_download
+except Exception:
+    sv = None
+    YOLO = None
+    hf_hub_download = None
 # Color mapping for different layout elements
 COLORS = {
     "other": "#CCCCCC",
 }
+# Load environment variables from .env if present (useful for HF_TOKEN)
+try:
+    load_dotenv()
+except Exception:
+    pass
+# ------------- Signature Model Utilities -------------
+_SIGNATURE_MODEL = None
+def load_signature_model() -> Optional["YOLO"]:
+    """Load and cache the YOLOv8s signature model from Hugging Face.
+    Returns None if dependencies are missing.
+    """
+    global _SIGNATURE_MODEL
+    if _SIGNATURE_MODEL is not None:
+        return _SIGNATURE_MODEL
+    if YOLO is None or hf_hub_download is None:
+        return None
+    try:
+        # Use token from env if model is gated
+        model_path = hf_hub_download(
+            repo_id="tech4humans/yolov8s-signature-detector",
+            filename="yolov8s.pt",
+            token=os.environ.get("HF_TOKEN")
+        )
+        _SIGNATURE_MODEL = YOLO(model_path)
+        return _SIGNATURE_MODEL
+    except Exception as e:
+        print(f"Could not load signature model: {e}")
+        return None
+def yolo_detect_signatures(
+    image_bgr: np.ndarray,
+    imgsz: int = 1280,
+    conf: float = 0.05,
+    iou: float = 0.45,
+    augment: bool = True,
+) -> List[Tuple[np.ndarray, float, int]]:
+    """Run YOLO signature detection on a BGR image.
+    Returns list of (xyxy np.array[4], score float, class_idx int)
+    """
+    model = load_signature_model()
+    if model is None:
+        return []
+    try:
+        results = model(image_bgr, imgsz=imgsz, conf=conf, iou=iou, augment=augment)
+        r = results[0]
+        boxes = []
+        if hasattr(r, "boxes") and r.boxes is not None:
+            xyxy = r.boxes.xyxy.cpu().numpy()
+            scores = r.boxes.conf.cpu().numpy()
+            classes = r.boxes.cls.cpu().numpy().astype(int)
+            for b, s, c in zip(xyxy, scores, classes):
+                boxes.append((b, float(s), int(c)))
+        return boxes
+    except Exception as e:
+        print(f"YOLO detection failed: {e}")
+        return []
+def annotate_signature_boxes_on_pil(img_pil: Image.Image, boxes: List[Tuple[np.ndarray, float, int]]) -> Image.Image:
+    """Draw signature boxes on a PIL image and return annotated copy."""
+    if not boxes:
+        return img_pil
+    img = img_pil.copy()
+    draw = ImageDraw.Draw(img)
+    # Try fonts
+    try:
+        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 16)
+    except Exception:
+        font = ImageFont.load_default()
+    color = COLORS.get("signature", "#9D4EDD")
+    for (xyxy, score, cls) in boxes:
+        x1, y1, x2, y2 = map(int, xyxy)
+        draw.rectangle([x1, y1, x2, y2], outline=color, width=3)
+        label = f"Signature {score*100:.0f}%"
+        bbox_text = draw.textbbox((x1, y1 - 22), label, font=font)
+        draw.rectangle([bbox_text[0] - 2, bbox_text[1] - 2, bbox_text[2] + 2, bbox_text[3] + 2], fill=color)
+        draw.text((x1, y1 - 22), label, fill="white", font=font)
+    return img
 def draw_layout_boxes(image_path, layout_data, scale_x=1.0, scale_y=1.0):
     """Draw bounding boxes on the image based on layout predictions"""
     # Open the image
     return img
+def process_document(file_path, mode, enable_ocr, enable_tables, run_signature_yolo=False, signature_conf=0.05):
     """Process document with Docling and return results"""
     try:
         # Configure pipeline options
                 pipeline_options.table_structure_options.mode = TableFormerMode.FAST
         pipeline_options.do_ocr = enable_ocr
+        if enable_ocr:
+            # Force RapidOCR with ONNX backend for fast & accurate CPU inference
+            pipeline_options.ocr_options = RapidOcrOptions(
+                backend="onnxruntime",
+                force_full_page_ocr=True,
+            )
         pipeline_options.generate_page_images = True
         pipeline_options.generate_picture_images = True
         pipeline_options.do_picture_classification = True  # Enable classification
         # Create visualization for first page
         visualization = None
+        first_page_base_image = None  # PIL image in pixel space used for overlays
         if result.pages and layout_info:
             # Draw boxes on first page only
             first_page_layout = [item for item in layout_info if item["page"] == 1]
                     # For images: Open directly, coordinates should match 1:1
                     first_page_image = Image.open(file_path).convert("RGB")
                     # No scaling needed for images - coordinates are already in pixels
+                    first_page_base_image = first_page_image
                     visualization = draw_layout_boxes(first_page_image, first_page_layout,
                                                      scale_x=1.0, scale_y=1.0)
                 else:
                     doc.close()
+                    first_page_base_image = first_page_image
                     # Draw boxes with calculated scale
                     visualization = draw_layout_boxes(first_page_image, first_page_layout,
                                                      scale_x=scale_x, scale_y=scale_y)
                 print(f"Could not create visualization: {e}")
                 import traceback
                 traceback.print_exc()
+        # Optionally run YOLO signature detection on the same first-page image and overlay
+        if run_signature_yolo and first_page_base_image is not None:
+            try:
+                # Convert PIL RGB to BGR numpy for YOLO
+                img_bgr = cv2.cvtColor(np.array(first_page_base_image), cv2.COLOR_RGB2BGR)
+                sig_boxes = yolo_detect_signatures(
+                    img_bgr,
+                    imgsz=1280,
+                    conf=float(signature_conf),
+                    iou=0.45,
+                    augment=True,
+                )
+                if sig_boxes:
+                    # Overlay signature boxes on top of visualization
+                    base_for_overlay = visualization if visualization is not None else first_page_base_image
+                    visualization = annotate_signature_boxes_on_pil(base_for_overlay, sig_boxes)
+            except Exception as e:
+                print(f"Signature overlay failed: {e}")
         # Create summary
         summary = f"""## Document Analysis Summary
         error_msg = f"Error processing document: {str(e)}"
         return None, error_msg, error_msg, error_msg
+def gradio_interface(file, mode, enable_ocr, enable_tables, run_signature_yolo=False, signature_conf=0.05):
     """Gradio interface function"""
     if file is None:
         return None, "Please upload a document", "", ""
+    return process_document(file.name, mode, enable_ocr, enable_tables, run_signature_yolo, signature_conf)
+# -------- Small preview helper (first page / image) --------
+def preview_first_page(file: gr.File):
+    """Return filepath for preview. For PDFs, extract first page as temp image."""
+    if file is None:
+        return None
+    try:
+        path = file.name
+        ext = (os.path.splitext(path)[1] or "").lower()
+        if ext in (".pdf",):
+            # For PDF, render first page to temp image
+            import tempfile
+            doc = fitz.open(path)
+            page = doc[0]
+            pix = page.get_pixmap(matrix=fitz.Matrix(1.5, 1.5))
+            img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
+            doc.close()
+            tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
+            img.save(tmp.name)
+            return tmp.name
+        else:
+            # For images, return path directly
+            return path
+    except Exception:
+        return None
+def analyze_with_preview(file, mode, enable_ocr, enable_tables, run_signature_yolo=False, signature_conf=0.05):
+    """Wrapper to also return an input preview for Examples clicks."""
+    preview = preview_first_page(file)
+    vis, summ, md, js = gradio_interface(file, mode, enable_ocr, enable_tables, run_signature_yolo, signature_conf)
+    return preview, vis, summ, md, js
+def signature_only_with_preview(file, try_scales, conf, iou, augment):
+    """Wrapper to also return an input preview for Examples clicks."""
+    preview = preview_first_page(file)
+    img, summ, js = signature_only_infer(file, try_scales, conf, iou, augment)
+    return preview, img, summ, js
+# -------- Signature-only utilities (full-image, no ROI) --------
+def signature_only_infer(
+    file: gr.File,
+    try_scales: bool,
+    conf: float,
+    iou: float,
+    augment: bool,
+):
+    if file is None:
+        return None, "Upload an image or PDF", "[]"
+    # Load source image (first page for PDFs)
+    path = file.name
+    ext = (os.path.splitext(path)[1] or "").lower()
+    if ext in (".pdf",):
+        doc = fitz.open(path)
+        page = doc[0]
+        pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))
+        base_rgb = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
+        doc.close()
+    else:
+        base_rgb = Image.open(path).convert("RGB")
+    base_bgr = cv2.cvtColor(np.array(base_rgb), cv2.COLOR_RGB2BGR)
+    scales = [1.0, 1.5, 2.0] if try_scales else [1.0]
+    best = None
+    all_boxes_mapped = []
+    rh, rw = base_bgr.shape[:2]
+    for s in scales:
+        tw, th = int(rw * s), int(rh * s)
+        resized = cv2.resize(base_bgr, (tw, th), interpolation=cv2.INTER_CUBIC)
+        boxes = yolo_detect_signatures(resized, imgsz=1280, conf=conf, iou=iou, augment=augment)
+        if not boxes:
+            continue
+        sx, sy = rw / max(1, tw), rh / max(1, th)
+        for (xyxy, score, cls) in boxes:
+            xb1, yb1, xb2, yb2 = xyxy
+            # Map back to original image coords
+            x1o = xb1 * sx
+            y1o = yb1 * sy
+            x2o = xb2 * sx
+            y2o = yb2 * sy
+            mapped = (np.array([x1o, y1o, x2o, y2o]), float(score), int(cls))
+            all_boxes_mapped.append(mapped)
+            if best is None or score > best[1]:
+                best = mapped
+    # Annotate and prepare outputs
+    annotated = annotate_signature_boxes_on_pil(base_rgb, all_boxes_mapped)
+    det_json = [
+        {
+            "bbox": list(map(lambda v: float(v), xyxy.tolist() if hasattr(xyxy, "tolist") else list(xyxy))),
+            "score": float(score),
+            "class": int(cls)
+        }
+        for (xyxy, score, cls) in all_boxes_mapped
+    ]
+    summary = (
+        f"Detections: {len(all_boxes_mapped)}" +
+        (f" | Best score: {best[1]:.3f}" if best else " | No detections above threshold")
+    )
+    return annotated, summary, json.dumps(det_json, indent=2)
 # Create Gradio interface
 with gr.Blocks(title="Document Layout Detection", theme=gr.themes.Soft()) as demo:
     - **OCR Support**: Reads text from scanned documents and images
     """)
+    # Top-level tabs: Analyze and Signature Detection
+    with gr.Tabs() as top_tabs:
+        with gr.Tab("📄 Analyze"):
+            with gr.Row():
+                with gr.Column(scale=1):
+                    file_input = gr.File(
+                        label="Upload Document",
+                        file_types=[".pdf", ".jpg", ".jpeg", ".png", ".tiff", ".bmp"]
+                    )
+                    input_preview = gr.Image(label="Input Preview", type="filepath", height=240, interactive=False, show_label=True)
+                    mode_dropdown = gr.Dropdown(
+                        choices=["Fast", "Accurate"],
+                        value="Fast",
+                        label="Processing Mode",
+                        info="Accurate mode is slower but better for complex tables"
+                    )
+                    ocr_checkbox = gr.Checkbox(
+                        label="Enable OCR",
+                        value=True,
+                        info="Use OCR for scanned documents and images"
+                    )
+                    tables_checkbox = gr.Checkbox(
+                        label="Enable Table Detection",
+                        value=True,
+                        info="Detect and extract table structures"
+                    )
+                    process_btn = gr.Button("🚀 Process Document", variant="primary", size="lg")
+                    run_sig_chk = gr.Checkbox(label="Also detect signatures (Finetuned Signature Model)", value=False)
+                    sig_conf_slider = gr.Slider(minimum=0.01, maximum=0.5, step=0.01, value=0.05, label="Signature confidence")
+                with gr.Column(scale=2):
+                    visualization_output = gr.Image(label="Layout Visualization (First Page)")
+                    summary_output = gr.Markdown(label="Summary")
+            with gr.Tabs():
+                with gr.Tab("📝 Markdown Output"):
+                    markdown_output = gr.Textbox(
+                        label="Extracted Content (Markdown)",
+                        lines=20,
+                        max_lines=30
+                    )
+                with gr.Tab("🔧 JSON Layout Data"):
+                    json_output = gr.Code(
+                        label="Layout Predictions (JSON)",
+                        language="json",
+                        lines=20
+                    )
+            gr.Markdown("""
+            ### Legend
+            Different colors represent different document elements:
+            **Layout Elements:**
+            - 🔴 Title • 🔵 Text • 🟢 Section Header • 🟠 Table • 🟣 List/Figure/Formula
+            **Picture Classifications (AI-detected):**
+            - 🟣 Signature • 🟢 QR Code • 🟢 Barcode • 🟡 Logo • 🔴 Stamp
+            - 🟦 Charts (Bar/Pie/Line) • 🟣 Flow Chart • 🟠 Screenshot • ⚪ Other
+            ### How to Use
+            1. Upload your document (PDF or image of ID card, invoice, report, etc.)
+            2. Choose processing options (Fast mode recommended for quick results)
+            3. Click "Process Document"
+            4. View the visualization with bounding boxes and explore the outputs
+            ### 💡 Try Examples Below!
+            Click on any example document to see instant results on different document types.
+            """)
+            # Add examples; clicking a row will also show a small input preview
+            with gr.Row():
+                gr.Examples(
+                    examples=[
+                        ["sample/Screenshot 2025-10-13 114010.png", "Fast", True, True, False, 0.05],
+                        ["sample/Screenshot 2025-10-13 114606.png", "Fast", True, True, False, 0.05],
+                        ["sample/Screenshot 2025-10-15 191615.png", "Fast", True, True, False, 0.05],
+                    ],
+                    inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox, run_sig_chk, sig_conf_slider],
+                    outputs=[input_preview, visualization_output, summary_output, markdown_output, json_output],
+                    fn=analyze_with_preview,
+                    cache_examples=False,
+                    label="📚 Example Documents",
+                    examples_per_page=3
+                )
+            # Connect the button
+            process_btn.click(
+                fn=gradio_interface,
+                inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox, run_sig_chk, sig_conf_slider],
+                outputs=[visualization_output, summary_output, markdown_output, json_output]
             )
+            # Preview on file selection
+            file_input.change(
+                fn=preview_first_page,
+                inputs=[file_input],
+                outputs=[input_preview]
             )
+            # Auto-process on file upload (optional)
+            file_input.change(
+                fn=gradio_interface,
+                inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox, run_sig_chk, sig_conf_slider],
+                outputs=[visualization_output, summary_output, markdown_output, json_output]
+            )
+        with gr.Tab("✍️ Signature Detection (Only)"):
+            gr.Markdown("""
+            Run the finetuned signature model on an image or the first page of a PDF. Simple controls, no ROI.
+            """)
+            with gr.Row():
+                with gr.Column(scale=1):
+                    sig_file_input = gr.File(
+                        label="Upload Image or PDF (first page processed)",
+                        file_types=[".pdf", ".jpg", ".jpeg", ".png", ".tiff", ".bmp"]
+                    )
+                    sig_input_preview = gr.Image(label="Input Preview", type="filepath", height=240, interactive=False, show_label=True)
+                    try_scales = gr.Checkbox(label="Try multiscale (1.0, 1.5, 2.0)", value=True)
+                    sig_only_conf = gr.Slider(0.01, 0.5, value=0.03, step=0.01, label="Confidence")
+                    sig_only_iou = gr.Slider(0.1, 0.9, value=0.45, step=0.05, label="IoU")
+                    sig_only_aug = gr.Checkbox(label="Augment (slower, more recall)", value=True)
+                    sig_run_btn = gr.Button("🔎 Detect Signatures", variant="primary")
+                with gr.Column(scale=2):
+                    sig_only_image = gr.Image(label="Annotated Signatures")
+                    sig_only_summary = gr.Markdown(label="Signature Summary")
+                    sig_only_json = gr.Code(label="Detections JSON", language="json", lines=16)
+            gr.Examples(
+                examples=[["sample_signature/X_074.jpeg"], ["sample_signature/X_014.jpeg"], ["sample_signature/X_081.jpeg"]],
+                inputs=[sig_file_input, try_scales, sig_only_conf, sig_only_iou, sig_only_aug],
+                outputs=[sig_input_preview, sig_only_image, sig_only_summary, sig_only_json],
+                fn=signature_only_with_preview,
+                label="✍️ Signature Examples"
+            )
+            # Wire signature-only button
+            sig_run_btn.click(
+                fn=signature_only_infer,
+                inputs=[sig_file_input, try_scales, sig_only_conf, sig_only_iou, sig_only_aug],
+                outputs=[sig_only_image, sig_only_summary, sig_only_json]
+            )
+            # Preview for signature-only selection
+            sig_file_input.change(
+                fn=preview_first_page,
+                inputs=[sig_file_input],
+                outputs=[sig_input_preview]
             )
     gr.Markdown("""
     Click on any example document to see instant results on different document types.
     """)
+    # Events are now scoped within tabs above
 # Launch the app
 if __name__ == "__main__":
+    # Queue with up to 2 concurrent workers (fits Spaces CPU with 2 cores)
+    # Optional: pre-load signature model to reduce first-run latency (requires HF access)
+    try:
+        load_signature_model()
+    except Exception:
+        pass
+    # Gradio v5 uses default_concurrency_limit; fallback to concurrency_count for older versions
+    try:
+        demo.queue(default_concurrency_limit=2)
+    except TypeError:
+        demo.queue(concurrency_count=2)
     demo.launch()
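
In `signature_only_infer`, boxes from every scale (1.0, 1.5, 2.0) are mapped back to original coordinates and appended to `all_boxes_mapped` as they are, so the same signature may be reported once per scale. The visible diff shows no merging step. The sketch below is one possible way to collapse such duplicates with a greedy IoU-based suppression. The names `iou_xyxy` and `merge_across_scales` and the 0.5 threshold are assumptions for illustration, not part of the commit.

```python
from typing import List, Sequence, Tuple

# Same tuple layout as in the diff: (xyxy, score, class_idx).
# xyxy may be a list, tuple, or a NumPy array of four numbers.
Box = Tuple[Sequence[float], float, int]


def iou_xyxy(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return float(inter / union) if union > 0 else 0.0


def merge_across_scales(boxes: List[Box], iou_thresh: float = 0.5) -> List[Box]:
    """Greedy suppression: keep the highest-scoring box of each overlapping group."""
    kept: List[Box] = []
    for cand in sorted(boxes, key=lambda b: b[1], reverse=True):
        # Keep the candidate only if it does not overlap a box already kept.
        if all(iou_xyxy(cand[0], k[0]) < iou_thresh for k in kept):
            kept.append(cand)
    return kept


if __name__ == "__main__":
    detections = [
        ([10, 10, 110, 60], 0.90, 0),      # signature found at scale 1.0
        ([12, 11, 112, 61], 0.60, 0),      # same signature found at scale 1.5
        ([300, 300, 400, 350], 0.40, 0),   # a different signature
    ]
    for xyxy, score, cls in merge_across_scales(detections):
        print(xyxy, score, cls)
```

If adopted, the call would go just before `annotate_signature_boxes_on_pil(base_rgb, all_boxes_mapped)`, so that the overlay, the JSON output, and the detection count all reflect the merged list.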

requirements.txt CHANGED

@@ -7,3 +7,8 @@ torchvision
 docling>=2.0
 gradio>=5.0
 pymupdf>=1.24

 docling>=2.0
 gradio>=5.0
 pymupdf>=1.24
+ultralytics>=8.3
+supervision>=0.24
+huggingface_hub>=0.23
+opencv-python-headless>=4.10
+onnxruntime>=1.20
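
`app.py` wraps the signature-related imports in a `try`/`except` and sets `sv`, `YOLO`, and `hf_hub_download` to `None` on failure, which makes the feature degrade silently when one of the packages added here is missing. A minimal sketch of a startup check that names the missing modules is shown below. The helper `missing_modules` and the tuple `SIGNATURE_DEPS` are hypothetical and do not appear in the commit. Note that `opencv-python-headless` is imported as `cv2`.

```python
import importlib.util
from typing import Iterable, List

# Import names (not pip names) for the packages this commit adds.
SIGNATURE_DEPS = ("ultralytics", "supervision", "huggingface_hub", "cv2", "onnxruntime")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(SIGNATURE_DEPS)
    if missing:
        print("Signature detection disabled, missing modules: " + ", ".join(missing))
    else:
        print("All signature detection dependencies are available.")
```

`find_spec` only locates a module, so this check is cheap, but it cannot catch a package that is installed yet fails during import. The `try`/`except` in `app.py` still covers that case.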

sample_signature/X_014.jpeg ADDED

Git LFS Details

SHA256: 596df710b942fe868a87eaa16caaaf7be271c11417916bc0ecd6ae724606d2ed
Pointer size: 131 Bytes
Size of remote file: 479 kB

sample_signature/X_074.jpeg ADDED

Git LFS Details

SHA256: 97600eb4a0727646891b2202416dfe18a94d799946da5cbadfcc28ed7f9c7623
Pointer size: 131 Bytes
Size of remote file: 966 kB

sample_signature/X_081.jpeg ADDED

Git LFS Details

SHA256: 49f0f0647b58def20e0b44c432ec5af2d47ef7e76a8e59bf4938addc8ecb2fe1
Pointer size: 131 Bytes
Size of remote file: 830 kB