Ayaan Sharif committed on
Commit 933ba3b · 1 Parent(s): 513efc5

Add AI-powered document layout detection app with examples

README.md CHANGED
@@ -1,12 +1,110 @@
  ---
- title: Ocr Layout Detection Poc
- emoji: 😻
- colorFrom: indigo
- colorTo: pink
  sdk: gradio
- sdk_version: 5.49.1
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Document Layout Detection
+ emoji: 📄
+ colorFrom: blue
+ colorTo: green
  sdk: gradio
+ sdk_version: 5.49.0
  app_file: app.py
  pinned: false
+ license: mit
  ---

+ # 📄 Document Layout & Table Structure Detection
+
+ A powerful AI-powered tool for automatically detecting document layout and structure.
+
+ ## 🎯 What Does This Do?
+
+ This Space automatically analyzes your documents (PDFs, images, scanned documents) to:
+
+ - 🏷️ **Detect Layout Elements**: Identifies titles, headers, paragraphs, lists, tables, figures, captions, formulas, and more
+ - 📊 **Extract Tables**: Recognizes table structures and extracts data
+ - 🖼️ **Visual Output**: Shows bounding boxes around detected elements with color-coded labels
+ - 📝 **Export Formats**: Provides Markdown, JSON, and visual outputs
+ - 🔍 **OCR Support**: Automatically processes scanned documents and images
+
+ ## 🚀 How to Use
+
+ 1. **Upload** your document (PDF, JPG, PNG, etc.)
+ 2. **Choose** processing mode:
+    - **Fast**: Quick processing for simple documents
+    - **Accurate**: Better quality for complex tables (slower)
+ 3. **Configure** options:
+    - Enable/disable OCR
+    - Enable/disable table detection
+ 4. **Process** and view results!
+
+ ## 📚 Use Cases
+
+ Perfect for analyzing:
+ - 🆔 **ID Documents**: Aadhaar cards, passports, driver's licenses
+ - 📄 **Forms & Applications**: Government forms, surveys, questionnaires
+ - 🧾 **Invoices & Receipts**: Business documents with tables
+ - 📖 **Research Papers**: Academic documents with complex layouts
+ - 📊 **Reports**: Annual reports, financial statements
+ - 📰 **Articles & Documents**: Any structured document
+
+ ## 🛠️ Technology
+
+ This Space uses state-of-the-art AI models:
+
+ - **Layout Model**: Advanced neural networks for document layout analysis
+ - **Table Structure Model**: TableFormer architecture for table detection and extraction
+ - **OCR Engine**: Integrated OCR for text recognition in scanned documents
+ - **Framework**: Modern document processing pipeline
+
+ ## 🎨 Output Formats
+
+ ### 1. Visual Visualization
+ - Bounding boxes drawn on the document
+ - Color-coded by element type
+ - Labels showing detected elements
+
+ ### 2. Markdown Export
+ - Clean, structured text output
+ - Preserves document hierarchy
+ - Ready for further processing
+
+ ### 3. JSON Data
+ - Complete layout predictions
+ - Bounding box coordinates
+ - Element types and confidence scores
+ - Machine-readable format
+
+ ## 🌟 Features
+
+ This tool offers:
+ - Advanced AI models for layout detection
+ - Supports multiple input formats (PDF, images)
+ - Accurate table structure extraction
+ - Handles both digital and scanned documents
+ - Exports to various formats (Markdown, JSON)
+ - Fast and accurate processing modes
+
+ ## 🧪 Local Testing
+
+ Want to test locally? Check out `test_local.py` in this repository.
+
+ ```bash
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Run the app locally
+ python app.py
+
+ # Or test on a specific file
+ python test_local.py path/to/your/document.pdf
+ ```
+
+ ## 🤝 Contributing
+
+ Found a bug or have a suggestion? Feel free to open an issue or contribute!
+
+ ## 📝 License
+
+ MIT License - Feel free to use and modify for your projects.
+
+ ---
+
+ **Made with ❤️ for better document understanding**
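The JSON export described in the README is a flat list of records, one per detected cluster, with `page`, `label`, `bbox`, and `confidence` keys (the shape of the `layout_info` list built in `app.py`). A minimal sketch of how a downstream script might summarize such a file follows. The sample records, the `summarize_layout` helper, and the `min_confidence` threshold are illustrative assumptions, not output from the Space.

```python
import json
from collections import Counter

# Illustrative records in the shape of the app's JSON output
# (all values are made up for this sketch).
sample_json = """
[
  {"page": 1, "label": "title", "bbox": [72.0, 60.0, 540.0, 90.0], "confidence": 0.97},
  {"page": 1, "label": "text", "bbox": [72.0, 110.0, 540.0, 300.0], "confidence": 0.91},
  {"page": 1, "label": "table", "bbox": [72.0, 320.0, 540.0, 600.0], "confidence": 0.55},
  {"page": 2, "label": "text", "bbox": [72.0, 60.0, 540.0, 700.0], "confidence": null}
]
"""


def summarize_layout(records, min_confidence=0.6):
    """Count labels, skipping records whose confidence is below the threshold.

    Records with confidence None are kept, because the app stores None
    when a cluster exposes no confidence attribute.
    """
    kept = [
        r for r in records
        if r["confidence"] is None or r["confidence"] >= min_confidence
    ]
    return Counter(r["label"] for r in kept)


records = json.loads(sample_json)
counts = summarize_layout(records)
for label, count in sorted(counts.items()):
    print(f"{label}: {count}")
```

Lowering `min_confidence` to 0.5 would also keep the sample table record.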
app.py ADDED
@@ -0,0 +1,318 @@
+ import gradio as gr
+ from docling.document_converter import DocumentConverter
+ from docling.datamodel.base_models import InputFormat
+ from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
+ from docling.document_converter import PdfFormatOption
+ from PIL import Image, ImageDraw, ImageFont
+ import json
+ import fitz  # PyMuPDF
+
+ # Color mapping for different layout elements
+ COLORS = {
+     "title": "#FF6B6B",
+     "text": "#4ECDC4",
+     "section_header": "#95E1D3",
+     "table": "#F38181",
+     "list": "#AA96DA",
+     "figure": "#FCBAD3",
+     "caption": "#A8D8EA",
+     "formula": "#FFD93D",
+     "footnote": "#6BCB77",
+     "page_header": "#4D96FF",
+     "page_footer": "#9D84B7",
+     "picture": "#FF8C42",
+ }
+
+ def draw_layout_boxes(image_path, layout_data, scale_x=1.0, scale_y=1.0):
+     """Draw bounding boxes on the image based on layout predictions"""
+     # Open the image
+     if isinstance(image_path, str):
+         img = Image.open(image_path).convert("RGB")
+     else:
+         img = image_path.convert("RGB")
+
+     draw = ImageDraw.Draw(img)
+
+     # Try to load a font, fallback to default if not available
+     try:
+         font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 20)
+         small_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 14)
+     except OSError:
+         # truetype() raises OSError when the font file cannot be read
+         font = ImageFont.load_default()
+         small_font = ImageFont.load_default()
+
+     # Draw each cluster
+     for cluster in layout_data:
+         label = cluster.get("label", "unknown")
+         bbox = cluster.get("bbox")
+
+         if bbox:
+             # bbox format: [x0, y0, x1, y1] from PDF coordinates
+             # Scale to match rendered image dimensions
+             x0, y0, x1, y1 = bbox
+             x0 = x0 * scale_x
+             y0 = y0 * scale_y
+             x1 = x1 * scale_x
+             y1 = y1 * scale_y
+
+             # Get color for this label
+             color = COLORS.get(label, "#999999")
+
+             # Draw rectangle
+             draw.rectangle([x0, y0, x1, y1], outline=color, width=3)
+
+             # Draw label background
+             label_text = label.replace("_", " ").title()
+             bbox_text = draw.textbbox((x0, y0 - 25), label_text, font=small_font)
+             draw.rectangle([bbox_text[0] - 2, bbox_text[1] - 2, bbox_text[2] + 2, bbox_text[3] + 2],
+                            fill=color)
+
+             # Draw label text
+             draw.text((x0, y0 - 25), label_text, fill="white", font=small_font)
+
+     return img
+
+ def process_document(file_path, mode, enable_ocr, enable_tables):
+     """Process document with Docling and return results"""
+     try:
+         # Configure pipeline options
+         pipeline_options = PdfPipelineOptions()
+         pipeline_options.do_table_structure = enable_tables
+
+         if enable_tables:
+             if mode == "Accurate":
+                 pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
+             else:
+                 pipeline_options.table_structure_options.mode = TableFormerMode.FAST
+
+         pipeline_options.do_ocr = enable_ocr
+         pipeline_options.generate_page_images = True
+         pipeline_options.generate_picture_images = True
+
+         # Create converter
+         converter = DocumentConverter(
+             format_options={
+                 InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
+                 InputFormat.IMAGE: PdfFormatOption(pipeline_options=pipeline_options),
+             }
+         )
+
+         # Convert document
+         result = converter.convert(file_path)
+
+         # Extract layout information
+         layout_info = []
+         total_clusters = 0
+         table_count = 0
+
+         for page_no, page in enumerate(result.pages, 1):
+             if page.predictions.layout:
+                 clusters = page.predictions.layout.clusters
+                 total_clusters += len(clusters)
+
+                 for cluster in clusters:
+                     layout_info.append({
+                         "page": page_no,
+                         "label": cluster.label,
+                         "bbox": [cluster.bbox.l, cluster.bbox.t, cluster.bbox.r, cluster.bbox.b],
+                         "confidence": getattr(cluster, "confidence", None)
+                     })
+
+             # Count tables
+             if page.predictions.tablestructure and page.predictions.tablestructure.table_map:
+                 table_count += len(page.predictions.tablestructure.table_map)
+
+         # Get markdown output
+         markdown_output = result.document.export_to_markdown()
+
+         # Create visualization for first page
+         visualization = None
+         if result.pages and layout_info:
+             # Draw boxes on first page only
+             first_page_layout = [item for item in layout_info if item["page"] == 1]
+
+             try:
+                 # Check if input is an image or PDF
+                 file_ext = file_path.lower().split('.')[-1]
+
+                 if file_ext in ['jpg', 'jpeg', 'png', 'tiff', 'bmp']:
+                     # For images: Open directly, coordinates should match 1:1
+                     first_page_image = Image.open(file_path).convert("RGB")
+                     # No scaling needed for images - coordinates are already in pixels
+                     visualization = draw_layout_boxes(first_page_image, first_page_layout,
+                                                       scale_x=1.0, scale_y=1.0)
+                 else:
+                     # For PDFs: Render and calculate scale
+                     doc = fitz.open(file_path)
+                     page = doc[0]
+
+                     # Get page dimensions in PDF points
+                     page_rect = page.rect
+                     pdf_width = page_rect.width
+                     pdf_height = page_rect.height
+
+                     # Render at 2x for better quality
+                     zoom = 2.0
+                     mat = fitz.Matrix(zoom, zoom)
+                     pix = page.get_pixmap(matrix=mat)
+                     first_page_image = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
+
+                     # Calculate scale: rendered_pixels / pdf_points
+                     scale_x = pix.width / pdf_width
+                     scale_y = pix.height / pdf_height
+
+                     doc.close()
+
+                     # Draw boxes with calculated scale
+                     visualization = draw_layout_boxes(first_page_image, first_page_layout,
+                                                       scale_x=scale_x, scale_y=scale_y)
+             except Exception as e:
+                 print(f"Could not create visualization: {e}")
+                 import traceback
+                 traceback.print_exc()
+
+         # Create summary
+         summary = f"""## Document Analysis Summary
+
+ 📄 **Total Pages:** {len(result.document.pages)}
+ 🏷️ **Layout Elements Detected:** {total_clusters}
+ 📊 **Tables Found:** {table_count}
+
+ ### Layout Elements by Type:
+ """
+         # Count elements by type
+         element_counts = {}
+         for item in layout_info:
+             label = item["label"]
+             element_counts[label] = element_counts.get(label, 0) + 1
+
+         for label, count in sorted(element_counts.items()):
+             summary += f"- **{label.replace('_', ' ').title()}**: {count}\n"
+
+         # JSON output
+         json_output = json.dumps(layout_info, indent=2)
+
+         return visualization, summary, markdown_output, json_output
+
+     except Exception as e:
+         error_msg = f"Error processing document: {str(e)}"
+         return None, error_msg, error_msg, error_msg
+
+ def gradio_interface(file, mode, enable_ocr, enable_tables):
+     """Gradio interface function"""
+     if file is None:
+         return None, "Please upload a document", "", ""
+
+     return process_document(file.name, mode, enable_ocr, enable_tables)
+
+ # Create Gradio interface
+ with gr.Blocks(title="Document Layout Detection", theme=gr.themes.Soft()) as demo:
+     gr.Markdown("""
+     # 📄 Document Layout & Structure Detection
+
+     Upload a document (PDF, image, etc.) to automatically detect its layout structure including text, tables, figures, and more!
+
+     **Features:**
+     - **AI-Powered Layout Detection**: Automatically identifies document elements
+     - **Table Structure Extraction**: Recognizes and extracts table data
+     - **OCR Support**: Reads text from scanned documents and images
+     """)
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             file_input = gr.File(
+                 label="Upload Document",
+                 file_types=[".pdf", ".jpg", ".jpeg", ".png", ".tiff", ".bmp"]
+             )
+
+             mode_dropdown = gr.Dropdown(
+                 choices=["Fast", "Accurate"],
+                 value="Fast",
+                 label="Processing Mode",
+                 info="Accurate mode is slower but better for complex tables"
+             )
+
+             ocr_checkbox = gr.Checkbox(
+                 label="Enable OCR",
+                 value=True,
+                 info="Use OCR for scanned documents and images"
+             )
+
+             tables_checkbox = gr.Checkbox(
+                 label="Enable Table Detection",
+                 value=True,
+                 info="Detect and extract table structures"
+             )
+
+             process_btn = gr.Button("🚀 Process Document", variant="primary", size="lg")
+
+         with gr.Column(scale=2):
+             visualization_output = gr.Image(label="Layout Visualization (First Page)")
+             summary_output = gr.Markdown(label="Summary")
+
+             with gr.Tabs():
+                 with gr.Tab("📝 Markdown Output"):
+                     markdown_output = gr.Textbox(
+                         label="Extracted Content (Markdown)",
+                         lines=20,
+                         max_lines=30
+                     )
+
+                 with gr.Tab("🔧 JSON Layout Data"):
+                     json_output = gr.Code(
+                         label="Layout Predictions (JSON)",
+                         language="json",
+                         lines=20
+                     )
+
+     gr.Markdown("""
+     ### Legend
+     Different colors represent different document elements:
+     - 🔴 Title
+     - 🔵 Text
+     - 🟢 Section Header
+     - 🟠 Table
+     - 🟣 List/Figure/Formula
+
+     ### How to Use
+     1. Upload your document (PDF or image of ID card, invoice, report, etc.)
+     2. Choose processing options (Fast mode recommended for quick results)
+     3. Click "Process Document"
+     4. View the visualization with bounding boxes and explore the outputs
+
+     ### 💡 Try Examples Below!
+     Click on any example to see instant results on different document types.
+     """)
+
+     # Add examples
+     gr.Examples(
+         examples=[
+             ["sample/Screenshot 2025-10-13 114010.png", "Fast", True, True],
+             ["sample/Screenshot 2025-10-13 114606.png", "Fast", True, True],
+             ["sample/Screenshot 2025-10-15 111602.png", "Fast", True, True],
+             ["sample/Screenshot 2025-10-15 175735.png", "Fast", True, True],
+         ],
+         inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox],
+         outputs=[visualization_output, summary_output, markdown_output, json_output],
+         fn=gradio_interface,
+         cache_examples=False,
+         label="📚 Example Documents"
+     )
+
+     # Connect the button
+     process_btn.click(
+         fn=gradio_interface,
+         inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox],
+         outputs=[visualization_output, summary_output, markdown_output, json_output]
+     )
+
+     # Auto-process on file upload (optional)
+     file_input.change(
+         fn=gradio_interface,
+         inputs=[file_input, mode_dropdown, ocr_checkbox, tables_checkbox],
+         outputs=[visualization_output, summary_output, markdown_output, json_output]
+     )
+
+ # Launch the app
+ if __name__ == "__main__":
+     demo.launch()
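The PDF branch of `process_document` maps bounding boxes from PDF points to rendered pixels by multiplying each coordinate by `rendered_pixels / pdf_points`. A minimal sketch of that arithmetic follows, isolated from Docling and PyMuPDF so it can be checked on its own. `scale_bbox` is a hypothetical helper name, and the page size (US Letter, 612 x 792 points) is an assumed example.

```python
def scale_bbox(bbox, pdf_size, image_size):
    """Map [x0, y0, x1, y1] from PDF points to pixels of a rendered page."""
    pdf_width, pdf_height = pdf_size
    img_width, img_height = image_size
    # Same ratio app.py computes: rendered pixels per PDF point, per axis.
    scale_x = img_width / pdf_width
    scale_y = img_height / pdf_height
    x0, y0, x1, y1 = bbox
    return [x0 * scale_x, y0 * scale_y, x1 * scale_x, y1 * scale_y]


# A US Letter page rendered at 2x zoom yields a 1224 x 1584 pixel image.
scaled = scale_bbox([72.0, 100.0, 540.0, 300.0], (612.0, 792.0), (1224, 1584))
print(scaled)  # [144.0, 200.0, 1080.0, 600.0]
```

The sketch assumes the box already uses a top-left origin like the rendered image. A box expressed with a bottom-left origin would need its y values flipped before scaling.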
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ # Install torch first with CPU support
+ --extra-index-url https://download.pytorch.org/whl/cpu
+ torch
+ torchvision
+
+ # Main dependencies
+ docling>=2.0
+ gradio>=5.0
+ pymupdf>=1.24
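`requirements.txt` sets only lower bounds (and none at all for `torch` and `torchvision`), so the versions that end up installed can differ between environments. A small sketch for reporting what is actually present, using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper, not part of this commit.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names taken from requirements.txt in this commit.
for name in ("torch", "torchvision", "docling", "gradio", "pymupdf"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```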
sample/Screenshot 2025-10-13 114010.png ADDED

Git LFS Details

  • SHA256: de7049326db14e68944db3b142d79c1725a3399c4cc52970420be48ce73e9cd4
  • Pointer size: 131 Bytes
  • Size of remote file: 236 kB
sample/Screenshot 2025-10-13 114606.png ADDED

Git LFS Details

  • SHA256: c164aa77c45cefd007f28b73671d9b834611c93996affbbd56c49d11966b94b1
  • Pointer size: 131 Bytes
  • Size of remote file: 169 kB
sample/Screenshot 2025-10-15 111602.png ADDED

Git LFS Details

  • SHA256: 56992f493b30c9c763bb36cae7a71c80fdb99b34209c114c2a63fee4fe3ae835
  • Pointer size: 131 Bytes
  • Size of remote file: 454 kB
sample/Screenshot 2025-10-15 175735.png ADDED

Git LFS Details

  • SHA256: 54bb9f82b8f08629bd61af031bcdfd451b9ba603264d664abb128302e8793289
  • Pointer size: 131 Bytes
  • Size of remote file: 503 kB
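
Git LFS identifies each object by the SHA-256 of its contents, which is the value listed for each sample image above. A minimal sketch for checking a downloaded file against its listed hash follows; `sha256_of_file` and `matches_lfs_oid` are hypothetical helper names, and the check assumes the real file has been fetched rather than the LFS pointer.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_oid(path, expected_sha256):
    """Compare a file's SHA-256 with the hex digest shown in the LFS details."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `matches_lfs_oid("sample/Screenshot 2025-10-13 114010.png", "de7049326db14e68944db3b142d79c1725a3399c4cc52970420be48ce73e9cd4")` should return `True` for an intact copy of the first sample.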