---
title: eDOCr2 - Engineering Drawing OCR
emoji: 🔧
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

🔧 eDOCr2 - Engineering Drawing OCR

Extract dimensions, tables, and GD&T symbols from engineering drawings automatically using deep learning.

🎯 Features

  • ✅ Table Extraction - Title blocks, revision tables, bill of materials
  • ✅ GD&T Recognition - Geometric dimensioning and tolerancing symbols
  • ✅ Dimension Detection - Measurements with tolerances
  • ✅ Multi-format Support - JPG, PNG, PDF
  • ✅ Structured Output - JSON and CSV export
  • ✅ Visual Annotation - Highlighted detection results
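The export schema is not documented in this README, so the following is only a minimal sketch of what JSON and CSV output for detections could look like. The record fields (`category`, `text`, `bbox`), the bounding-box order, and the sample values are hypothetical, not eDOCr2's actual output format.

```python
import csv
import io
import json

# Hypothetical detection records: field names, bbox order (x_min, y_min,
# x_max, y_max) and values are illustrative only, not eDOCr2's schema.
detections = [
    {"category": "dimension", "text": "\u00d812.5 \u00b10.1", "bbox": [120, 340, 210, 362]},
    {"category": "gdt", "text": "\u2316|\u00d80.05|A|B", "bbox": [400, 95, 520, 118]},
]


def to_json(records: list) -> str:
    """Serialize records as pretty-printed JSON, keeping symbols like Ø and ± readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records: list) -> str:
    """Flatten records into CSV rows, splitting the bounding box into four columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "text", "x_min", "y_min", "x_max", "y_max"])
    for record in records:
        writer.writerow([record["category"], record["text"], *record["bbox"]])
    return buffer.getvalue()


print(to_csv(detections))
```

JSON keeps the nested structure intact, while the flat CSV layout is easier to open in a spreadsheet.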

🚀 How to Use

  1. Upload your engineering drawing (JPG, PNG, or PDF)
  2. Click the "Process Drawing" button
  3. View the annotated results and extracted data
  4. Download the complete results as a ZIP file
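To inspect the downloaded archive from a script, a small helper built on Python's standard `zipfile` module can list its contents and parse any JSON files it contains. This is a sketch: it makes no assumption about which files the ZIP holds, and the archive name in the usage comment is only an example.

```python
import io
import json
import zipfile


def summarize_results(zip_bytes: bytes) -> dict:
    """List every member of a results archive and parse any JSON files in it."""
    summary = {"files": [], "json": {}}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            summary["files"].append(name)
            if name.lower().endswith(".json"):
                summary["json"][name] = json.loads(archive.read(name).decode("utf-8"))
    return summary


# Usage (the file name is an example):
# from pathlib import Path
# print(summarize_results(Path("results.zip").read_bytes())["files"])
```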

📊 What Gets Extracted

Tables

  • Title blocks with part information
  • Revision history tables
  • Bill of materials (BOM)
  • General notes and specifications

GD&T Symbols

  • Geometric tolerancing symbols
  • Feature control frames
  • Datum references
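A feature control frame is read left to right: the geometric characteristic symbol, the tolerance compartment (optionally prefixed with a diameter sign and followed by material condition modifiers), then the datum references in order of precedence. The sketch below parses that layout, assuming (hypothetically) that a recognized frame arrives as a string of Unicode GD&T symbols with compartments separated by `|`. eDOCr2's own output may use a different representation, and `parse_fcf` is an illustrative name.

```python
import re
from dataclasses import dataclass, field

# Unicode GD&T characteristic symbols (assumed input encoding).
GDT_SYMBOLS = {
    "\u23e4": "straightness",          # ⏤
    "\u23e5": "flatness",              # ⏥
    "\u25cb": "circularity",           # ○
    "\u232d": "cylindricity",          # ⌭
    "\u2312": "profile of a line",     # ⌒
    "\u2313": "profile of a surface",  # ⌓
    "\u2220": "angularity",            # ∠
    "\u27c2": "perpendicularity",      # ⟂
    "\u2225": "parallelism",           # ∥
    "\u2316": "position",              # ⌖
    "\u25ce": "concentricity",         # ◎
    "\u232f": "symmetry",              # ⌯
    "\u2197": "circular runout",       # ↗
    "\u2330": "total runout",          # ⌰
}
DIAMETER_SIGNS = ("\u00d8", "\u2300")  # Ø and ⌀


@dataclass
class FeatureControlFrame:
    characteristic: str
    tolerance: float
    diametral: bool   # True when the tolerance zone is cylindrical (Ø prefix)
    modifiers: str    # anything after the number, e.g. Ⓜ
    datums: list = field(default_factory=list)


def parse_fcf(text: str, sep: str = "|") -> FeatureControlFrame:
    """Parse 'symbol|tolerance|datum|...' (hypothetical format) into its parts."""
    cells = [cell.strip() for cell in text.split(sep)]
    if len(cells) < 2:
        raise ValueError(f"expected at least 2 compartments, got {cells!r}")
    symbol, tol_cell, *datums = cells
    if symbol not in GDT_SYMBOLS:
        raise ValueError(f"unknown GD&T symbol {symbol!r}")
    diametral = tol_cell.startswith(DIAMETER_SIGNS)
    if diametral:
        tol_cell = tol_cell[1:]
    match = re.match(r"\s*(\d+(?:\.\d+)?)(.*)", tol_cell)
    if match is None:
        raise ValueError(f"no tolerance value in {tol_cell!r}")
    return FeatureControlFrame(
        characteristic=GDT_SYMBOLS[symbol],
        tolerance=float(match.group(1)),
        diametral=diametral,
        modifiers=match.group(2).strip(),
        datums=[d for d in datums if d],
    )
```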

Dimensions

  • Linear dimensions
  • Angular dimensions
  • Tolerance values
  • Diameter and radius callouts
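Dimension callouts combine an optional prefix (Ø for diameter, R for radius), a nominal value, and optionally a tolerance. As an illustration, the hypothetical `parse_dimension` helper below splits a recognized string into those parts. It only handles symmetric (`±0.1`) and unequal (`+0.2/-0.1`) tolerances, accepts a decimal comma, and is not the parser eDOCr2 uses.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dimension:
    kind: str      # "linear", "diameter", "radius" or "angular"
    nominal: float
    upper: float   # upper deviation (0.0 when no tolerance is given)
    lower: float   # lower deviation (0.0 when no tolerance is given)


_NUM = r"(\d+(?:[.,]\d+)?)"  # accepts both 12.5 and 12,5
_PATTERN = re.compile(
    r"^\s*([\u00d8\u2300R]?)\s*"                             # optional Ø / ⌀ / R prefix
    + _NUM                                                   # nominal value
    + r"\s*(\u00b0?)"                                        # optional degree sign
    + r"(?:\s*\u00b1\s*" + _NUM                              # symmetric: ±t
    + r"|\s*\+\s*" + _NUM + r"\s*/?\s*-\s*" + _NUM + r")?"   # unequal: +a/-b
    + r"\s*$"
)


def _to_float(text: str) -> float:
    return float(text.replace(",", "."))


def parse_dimension(text: str) -> Optional[Dimension]:
    """Split a callout such as 'Ø12.5 ±0.1'; return None if it does not match."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    prefix, nominal, degree, sym, upper, lower = match.groups()
    if degree:
        kind = "angular"
    elif prefix == "R":
        kind = "radius"
    elif prefix:
        kind = "diameter"
    else:
        kind = "linear"
    if sym is not None:
        up, low = _to_float(sym), -_to_float(sym)
    elif upper is not None:
        up, low = _to_float(upper), -_to_float(lower)
    else:
        up = low = 0.0
    return Dimension(kind, _to_float(nominal), up, low)
```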

🔧 Technology Stack

  • Deep Learning Models: Custom-trained Keras OCR models
  • Text Detection: CRAFT-based detector
  • Text Recognition: CRNN-based recognizer
  • Symbol Matching: Template matching algorithms
  • Framework: Gradio for the web interface
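CRNN-style recognizers typically output one score distribution over characters per horizontal time step and are trained with CTC loss. Assuming that holds here, turning those scores into text can be as simple as best-path (greedy) decoding: take the most likely label at each step, merge consecutive repeats, and drop the blank label. This is a generic sketch, not code taken from eDOCr2; the label set and the blank index are assumptions.

```python
from typing import Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], labels: str, blank: int = 0) -> str:
    """Best-path CTC decoding.

    frame_scores[t][k] is the score of label k at time step t;
    labels[k] is the character for label k, and labels[blank] is never emitted.
    """
    # 1. Pick the most likely label at every time step (the "best path").
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    # 2. Merge consecutive repeats and 3. drop blanks. A blank between two
    #    identical labels keeps them apart, which is how "11" can be produced.
    decoded = []
    previous = None
    for k in best_path:
        if k != previous and k != blank:
            decoded.append(labels[k])
        previous = k
    return "".join(decoded)


# Toy example with a hypothetical label set; index 0 ("_") is the blank.
labels = "_0123456789."
path = [3, 3, 6, 0, 11, 5, 5]  # 2 2 5 _ . 4 4
scores = [[1.0 if k == p else 0.0 for k in range(len(labels))] for p in path]
print(ctc_greedy_decode(scores, labels))  # -> 25.4
```

Beam search decoding can recover slightly better strings when several paths are close in probability, but greedy decoding is the usual baseline.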

📚 Research

This tool is based on the research paper:

"eDOCr2: Automated Extraction of Information from Engineering Drawings"
http://dx.doi.org/10.2139/ssrn.5045921

💡 Tips for Best Results

  • Use high-resolution scans (300 DPI or higher)
  • Ensure clear text and symbols
  • Avoid skewed or rotated images
  • Use clean drawings without handwritten annotations
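Whether a scan reaches the recommended 300 DPI can be estimated from its pixel dimensions, provided you know the sheet size. The sketch below assumes the image covers a full ISO 216 sheet with no margins cropped or added; `effective_dpi` and `meets_resolution` are hypothetical helper names.

```python
MM_PER_INCH = 25.4

# ISO 216 sheet sizes in millimetres (short side, long side).
ISO_SHEETS_MM = {
    "A4": (210, 297),
    "A3": (297, 420),
    "A2": (420, 594),
    "A1": (594, 841),
    "A0": (841, 1189),
}


def effective_dpi(width_px: int, height_px: int, sheet: str = "A4") -> float:
    """Pixels per inch of a scan, assuming it covers exactly the named sheet."""
    short_mm, long_mm = ISO_SHEETS_MM[sheet]
    # Match the sheet orientation to the image orientation.
    if width_px >= height_px:
        w_mm, h_mm = long_mm, short_mm
    else:
        w_mm, h_mm = short_mm, long_mm
    # Use the lower of the two axes so a non-uniform scan is not overestimated.
    return min(width_px * MM_PER_INCH / w_mm, height_px * MM_PER_INCH / h_mm)


def meets_resolution(width_px: int, height_px: int, sheet: str = "A4", min_dpi: int = 300) -> bool:
    # Round first: an A4 page at 300 DPI is 2480 px wide, i.e. about 299.96 DPI.
    return round(effective_dpi(width_px, height_px, sheet)) >= min_dpi


print(meets_resolution(2480, 3508))  # -> True
```

For example, a portrait A4 scan of 2480 × 3508 pixels passes the check, while the same sheet at 1240 × 1754 pixels (about 150 DPI) does not.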

🛠️ Local Installation

To run this locally:

```bash
# Clone repository
git clone https://github.com/javvi51/edocr2.git
cd edocr2

# Install dependencies
pip install -r requirements.txt

# Download models (see releases)
# Place in edocr2/models/

# Run app
python app.py
```

📦 Model Files

The pre-trained models are automatically loaded from the repository:

  • recognizer_gdts.keras (67.2 MB) - GD&T symbol recognition
  • recognizer_dimensions_2.keras (67.2 MB) - Dimension recognition

Download from: GitHub Releases
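When running locally, a quick check that both model files listed above ended up in `edocr2/models/` can help avoid a confusing error when the app tries to load them. This is a minimal sketch; `check_models` is a hypothetical helper, and the 1 MB size floor is an arbitrary assumption meant to catch truncated downloads or placeholder files (for example a Git LFS pointer, which is only a few hundred bytes).

```python
from pathlib import Path

# File names and target folder as listed in this README.
REQUIRED_MODELS = ("recognizer_gdts.keras", "recognizer_dimensions_2.keras")


def check_models(models_dir: str = "edocr2/models", min_bytes: int = 1_000_000) -> list:
    """Return a list of problems; an empty list means both files look usable."""
    folder = Path(models_dir)
    problems = []
    for name in REQUIRED_MODELS:
        path = folder / name
        if not path.is_file():
            problems.append(f"missing: {path}")
        elif path.stat().st_size < min_bytes:
            # Arbitrary threshold (assumption): the real files are about 67 MB each.
            problems.append(f"suspiciously small ({path.stat().st_size} bytes): {path}")
    return problems


# Usage, run from the repository root:
# for problem in check_models():
#     print(problem)
```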

🔗 Links

📝 License

MIT License. See the LICENSE file for details.

🤝 Citation

If you use this tool in your research, please cite:

```bibtex
@article{villena2024edocr2,
  title={eDOCr2: Automated Extraction of Information from Engineering Drawings},
  author={Villena Toro, Javier},
  year={2024},
  doi={10.2139/ssrn.5045921}
}
```

⚠️ Limitations

  • Works best with mechanical/production drawings
  • Requires clear, high-quality scans
  • May struggle with handwritten annotations
  • Processing time: 10-30 seconds per drawing

πŸ› Known Issues

  • PDF support limited to first page only
  • Very large images (over 10 MB) may time out
  • Some custom GD&T symbols may not be recognized
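The first two issues can be caught before uploading. The hypothetical `preflight` helper below flags unsupported formats, reminds you that only the first page of a PDF is used, and warns about files above the size limit. Reading "10MB" as 10 MiB, and accepting `.jpeg` alongside `.jpg`, are assumptions.

```python
from pathlib import PurePath

# Formats from the feature list; ".jpeg" is assumed to be accepted like ".jpg".
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".pdf"}
# "10MB" from the Known Issues list, read here as 10 MiB (an assumption).
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def preflight(filename: str, size_bytes: int) -> list:
    """Return warnings for an upload, based on the known issues listed above."""
    warnings = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        warnings.append(f"unsupported format '{suffix or 'none'}': use JPG, PNG or PDF")
    if suffix == ".pdf":
        warnings.append("only the first page of a PDF is processed")
    if size_bytes > SIZE_LIMIT_BYTES:
        warnings.append("file is larger than 10 MB and may time out")
    return warnings


# Usage with a real file (the path is an example):
# from pathlib import Path
# p = Path("drawing.pdf")
# for w in preflight(p.name, p.stat().st_size):
#     print("warning:", w)
```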

📧 Contact

For issues and questions:


Enjoy using eDOCr2! 🚀