Space: alfonsovelp/deepseek-ocr

Alfonso Velasco committed on Oct 23, 2025

Commit 62e6da6

1 Parent(s): 50abf16

fix results

Files changed (2)

TABLE_EXTRACTION_GUIDE.md +179 -0
app.py +7 -3

TABLE_EXTRACTION_GUIDE.md ADDED

	@@ -0,0 +1,179 @@

+# Table Extraction Guide for Engineering Drawings
+## The Problem
+When extracting tables from engineering drawings using DeepSeek-OCR, you may notice that the HTML table output contains many empty `<td></td>` cells and complex `rowspan`/`colspan` attributes. This makes the data difficult to use programmatically.
+### Why This Happens
+Engineering drawings have:
+- **Complex merged cells** with irregular boundaries
+- **Non-standard table structures** (not typical rows/columns)
+- **Small text** that's hard to OCR accurately
+- **Visual elements** mixed with text
+- **Rotated or angled text**
+DeepSeek-OCR tries to reproduce the visual layout exactly in HTML, so the output often captures the cell structure but little of the cell content.
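One way to check whether a table's HTML is worth parsing at all is to measure how many of its cells are empty. The following is a minimal sketch using only Python's standard-library `html.parser`; `CellCounter` and `empty_cell_ratio` are hypothetical helpers, not part of the API, and they assume you pass in the HTML of a single, non-nested table (for example a patch's `text_preview`).

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Counts <td>/<th> cells and how many of them contain visible text."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.filled = 0
        self._in_cell = False
        self._has_text = False

    def finish_cell(self):
        """Close the current cell, counting it as filled if it held any text."""
        if self._in_cell and self._has_text:
            self.filled += 1
        self._in_cell = False
        self._has_text = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.finish_cell()  # tolerate an omitted </td>
            self.total += 1
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self.finish_cell()

    def handle_data(self, data):
        # Whitespace-only data does not count as content.
        if self._in_cell and data.strip():
            self._has_text = True


def empty_cell_ratio(table_html: str) -> float:
    """Return the fraction of cells with no visible text (0.0 for a table without cells)."""
    counter = CellCounter()
    counter.feed(table_html)
    counter.close()
    counter.finish_cell()  # account for a cell left open at the end of the input
    if counter.total == 0:
        return 0.0
    return (counter.total - counter.filled) / counter.total
```

A ratio close to 1.0 means the table is mostly empty structure, which makes it a good candidate for the image-based approach in Option 1.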
+## Solutions
+### Option 1: Use Image Patches (Recommended)
+The API already returns each detected table region as a cropped image. Working from these images is the most reliable approach for complex drawings:
+```python
+import requests
+import base64
+from PIL import Image
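+import io
+# Load the drawing and base64-encode it for the JSON payload
+with open('engineering_drawing.png', 'rb') as f:
+    base64_image = base64.b64encode(f.read()).decode()
+# Call the API
+response = requests.post('http://localhost:7860/extract', json={
+    'image': base64_image,
+    'layout_only': False  # or True for just bounding boxes
+})
+response.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
+data = response.json()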
+# Get table patches (cropped images of each table)
+table_patches = data['table_patches']
+for i, patch in enumerate(table_patches):
+    # Each patch contains:
+    # - bbox: {x1, y1, x2, y2, width, height}
+    # - data: base64-encoded image of the table
+    # - text_preview: HTML (often not useful for complex tables)
+    # Decode and save the table image
+    table_img_data = base64.b64decode(patch['data'])
+    table_img = Image.open(io.BytesIO(table_img_data))
+    table_img.save(f'table_{i}.png')
+    print(f"Table {i}: {patch['bbox']}")
+```
+**Benefits:**
+- Preserves all visual information; nothing is lost to OCR errors
+- Can be reviewed manually
+- Can be passed to specialized table extraction tools
+### Option 2: Use Text-Only Mode (New)
+I've added a new `extract_mode` parameter for cases where you only need the text, without the HTML table structure:
+```python
+import requests
+# base64_image: the base64-encoded drawing, prepared as in Option 1
+response = requests.post('http://localhost:7860/extract', json={
+    'image': base64_image,
+    'extract_mode': 'text_only'  # Simplifies table extraction
+})
+data = response.json()
+# The extractions will contain plain text instead of complex HTML
+for extraction in data['extractions']:
+    if extraction['type'] == 'table':
+        print(f"Table text: {extraction['text']}")
+        # Text will be simpler, without HTML tags
+```
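If you already have `full`-mode HTML (for example a patch's `text_preview`) and cannot re-run the request in `text_only` mode, you can flatten the table on the client side. This is a minimal standard-library sketch, not part of the API; `html_table_to_grid` is a hypothetical name. It expands `rowspan`/`colspan` by copying each merged cell's text into every grid slot it covers, so values stay aligned with their rows and columns. It assumes a single, non-nested table.

```python
from html.parser import HTMLParser


def _span(value) -> int:
    """Parse a rowspan/colspan attribute, defaulting to 1 for missing or bad values."""
    try:
        return max(1, int(value))
    except (TypeError, ValueError):
        return 1


class _RowCollector(HTMLParser):
    """Collects rows as lists of [text, rowspan, colspan] from a single HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # the cell currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
            self._cell = None
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate cells that appear before any <tr>
                self.rows.append([])
            a = dict(attrs)
            self._cell = ["", _span(a.get("rowspan")), _span(a.get("colspan"))]
            self.rows[-1].append(self._cell)

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr"):
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data


def html_table_to_grid(table_html: str) -> list:
    """Return the table as a rectangular list of rows with merged cells expanded."""
    parser = _RowCollector()
    parser.feed(table_html)
    parser.close()
    grid = {}  # (row, col) -> cell text
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots already taken by a rowspan from above
                c += 1
            value = " ".join(text.split())  # collapse whitespace and newlines
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = value
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

Copying merged text into every covered slot makes each row self-contained (convenient for CSV export) at the cost of duplicated values; keep only the top-left slot instead if the duplicates get in the way.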
+### Option 3: Use Layout-Only Mode
+If you only need to know **where** tables are (not their content), use layout-only mode:
+```python
+import requests
+# base64_image: the base64-encoded drawing, prepared as in Option 1
+response = requests.post('http://localhost:7860/extract', json={
+    'image': base64_image,
+    'layout_only': True  # Just get bounding boxes
+})
+data = response.json()
+# Get structured layout information
+layout = data['layout_summary']
+print(f"Found {layout['counts']['tables']} tables")
+for table in layout['elements_by_type']['tables']:
+    print(f"Table at: {table['bbox']}")
+```
+## Extraction Modes
+The API now supports three extraction modes:
+| Mode | Parameter | Use Case |
+|------|-----------|----------|
+| **Full** (default) | `extract_mode: "full"` | Complete extraction with HTML tables |
+| **Text Only** | `extract_mode: "text_only"` | Simplified text extraction without HTML |
+| **Layout Only** | `extract_mode: "layout_only"` or `layout_only: true` | Just bounding boxes, no content |
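The three modes differ only in the request body, so a small helper can keep that mapping in one place. This is a sketch: `build_extract_payload` is a hypothetical name, and sending `layout_only: true` alongside `extract_mode: "layout_only"` is an assumption, made because the server's layout-only branch checks `request.layout_only`.

```python
import base64

# Mode names as listed in the Extraction Modes table.
VALID_MODES = ("full", "text_only", "layout_only")


def build_extract_payload(image_bytes: bytes, mode: str = "full") -> dict:
    """Build the JSON body for POST /extract in one of the three extraction modes."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown extract_mode {mode!r}; expected one of {VALID_MODES}")
    payload = {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "extract_mode": mode,
    }
    if mode == "layout_only":
        # Assumption: also set the boolean flag, since the server's
        # layout-only branch checks request.layout_only.
        payload["layout_only"] = True
    return payload
```

Usage: `requests.post('http://localhost:7860/extract', json=build_extract_payload(f.read(), 'text_only'))`, where `f` is the drawing opened in binary mode.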
+## Recommended Workflow for Engineering Drawings
+1. **First pass:** Use `layout_only: true` to identify all tables and their locations
+2. **Extract images:** Use the bounding boxes to crop table regions from the original image
+3. **Process selectively:**
+   - For simple tables: Use `extract_mode: "text_only"`
+   - For complex tables: Keep as images or use specialized table extraction tools
+   - For critical data: Manual review of cropped table images
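Step 2 of this workflow can be done locally with Pillow once you have the bounding boxes. The helper below only computes the crop rectangle; `crop_box` and its 10-pixel default padding are illustrative choices, and it assumes the API's `bbox` values are pixel coordinates in the original image. Because `parse_deepseek_result` falls back to an all-zero bbox when it cannot parse a box, degenerate boxes are rejected instead of producing a meaningless crop.

```python
def crop_box(bbox: dict, img_width: int, img_height: int, pad: int = 10) -> tuple:
    """Convert an API bbox dict into a (left, upper, right, lower) box for Image.crop.

    Padding keeps table borders and edge text that a tight box may clip;
    the result is clamped to the image bounds.
    """
    x1, y1, x2, y2 = (int(round(bbox[k])) for k in ("x1", "y1", "x2", "y2"))
    if x2 <= x1 or y2 <= y1:
        # e.g. the all-zero fallback bbox used when parsing fails
        raise ValueError(f"degenerate bbox: {bbox}")
    return (
        max(0, x1 - pad),
        max(0, y1 - pad),
        min(img_width, x2 + pad),
        min(img_height, y2 + pad),
    )
```

With Pillow: open the drawing with `img = Image.open('engineering_drawing.png')`, then call `img.crop(crop_box(table['bbox'], img.width, img.height)).save('table_0.png')`.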
+## Example: Complete Workflow
+```python
+import requests
+import base64
+from PIL import Image
+import io
+import os
+# Step 1: Load and encode image
+with open('engineering_drawing.png', 'rb') as f:
+    image_data = base64.b64encode(f.read()).decode()
+# Step 2: Get layout (identify tables)
+layout_response = requests.post('http://localhost:7860/extract', json={
+    'image': image_data,
+    'layout_only': True
+})
+layout_data = layout_response.json()
+print(f"Found {layout_data['num_tables']} tables")
+# Step 3: Get full extraction with table images
+full_response = requests.post('http://localhost:7860/extract', json={
+    'image': image_data,
+    'extract_mode': 'full'  # or 'text_only' for simpler output
+})
+full_data = full_response.json()
+# Step 4: Save table images for review or further processing
+os.makedirs('output', exist_ok=True)  # table_img.save() below fails if the folder is missing
+for i, patch in enumerate(full_data['table_patches']):
+    # Save table image
+    table_img_data = base64.b64decode(patch['data'])
+    table_img = Image.open(io.BytesIO(table_img_data))
+    table_img.save(f'output/table_{i}.png')
+    # Print location
+    bbox = patch['bbox']
+    print(f"Table {i}: ({bbox['x1']}, {bbox['y1']}) to ({bbox['x2']}, {bbox['y2']})")
+```
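When the saved images go to manual review, it helps to keep a record of where each one came from. A minimal standard-library sketch; `write_manifest` and the `tables.json` file name are hypothetical, and it assumes the `table_patches` structure described in Option 1 (one `bbox` per patch) and the `table_{i}.png` naming used in Step 4.

```python
import json
from pathlib import Path


def write_manifest(patches: list, out_dir: str = "output") -> Path:
    """Write <out_dir>/tables.json linking each table_{i}.png to its bounding box."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    entries = [
        {"file": f"table_{i}.png", "bbox": patch["bbox"]}
        for i, patch in enumerate(patches)
    ]
    manifest = out / "tables.json"
    manifest.write_text(json.dumps(entries, indent=2))
    return manifest
```

Calling `write_manifest(full_data['table_patches'])` after the Step 4 loop leaves the manifest next to the images it describes.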
+## Alternative Tools for Table Extraction
+If you need better table content extraction, consider using the cropped table images with:
+1. **Table Transformer** (Microsoft) - Deep learning models for table detection and table structure recognition
+2. **PaddleOCR** - Includes table recognition
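+3. **Camelot** or **Tabula** - Only for text-based (vector) PDFs; they read the PDF's text layer and cannot process raster images such as the cropped table patches
+4. **Azure AI Document Intelligence** (formerly Azure Form Recognizer) or **Amazon Textract** - Cloud services with built-in table recognition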
+5. **Manual labeling** - For critical engineering data
+## Summary
+For engineering drawings:
+- ✅ **Use image patches** (most reliable)
+- ✅ **Use layout-only mode** to find tables
+- ✅ **Use text-only mode** for simpler extraction
+- ❌ **Don't rely on HTML table structure** from complex drawings
+The HTML table output may reproduce the visual layout faithfully, but for complex engineering drawings it is often unusable for data extraction.

app.py CHANGED

@@ -142,7 +142,7 @@ async def extract_image(request: ImageRequest):
             # Use simpler prompt for layout-only mode
             prompt = request.prompt
             if request.layout_only:
-                prompt = "<image>\n<Identify all objects, table, diagrams, and text and output them in bounding boxes. "
+                prompt = "<image>\n<Identify all objects, table, diagrams, and text and output them in bounding boxes.o "
                 print("Using layout-only mode with structured bounding boxes")
             # Capture stdout to get the raw model output with grounding tags
@@ -187,13 +187,15 @@ async def extract_image(request: ImageRequest):
             else:
                 print("Using saved result.mmd file")
-            print(f"Result preview: {result_text[:500] if result_text else 'No results found'}")
+            print(f"Result preview: {result_text if result_text else 'No results found'}")
             print(f"Result image with boxes: {'Found' if result_image_with_boxes else 'Not found'}")
             print(f"Image patches: {len(image_patches)} patches found")
             # Parse the result
             extractions = parse_deepseek_result(result_text, img_width, img_height)
+            print(f"Extractions: {extractions}")
             # If layout_only mode, simplify the extractions
             if request.layout_only:
                 layout_extractions = simplify_extractions_for_layout(extractions)
@@ -526,7 +528,9 @@ def parse_deepseek_result(result: Any, img_width: int, img_height: int) -> List[
                 }
             else:
                 bbox = {"x1": 0, "y1": 0, "x2": 0, "y2": 0, "width": 0, "height": 0}
-        except (ValueError, IndexError):
+        except Exception as e:
+            print(f"Error parsing bounding box: {e} for bounding box: {bbox_str} for type {ref_type}")
             bbox = {"x1": 0, "y1": 0, "x2": 0, "y2": 0, "width": 0, "height": 0}
         # Extract content after this tag until the next tag (or end of string)