---
title: eDOCr2 - Engineering Drawing OCR
emoji: 🔧
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
# 🔧 eDOCr2 - Engineering Drawing OCR
Extract **dimensions**, **tables**, and **GD&T symbols** from engineering drawings automatically using deep learning.
## 🎯 Features
- βœ… **Table Extraction** - Title blocks, revision tables, bill of materials
- βœ… **GD&T Recognition** - Geometric dimensioning and tolerancing symbols
- βœ… **Dimension Detection** - Measurements with tolerances
- βœ… **Multi-format Support** - JPG, PNG, PDF
- βœ… **Structured Output** - JSON and CSV export
- βœ… **Visual Annotation** - Highlighted detection results
## 🚀 How to Use
1. **Upload** your engineering drawing (JPG, PNG, or PDF)
2. **Click** the "Process Drawing" button
3. **View** annotated results and extracted data
4. **Download** the complete results as a ZIP file
## 📊 What Gets Extracted
### Tables
- Title blocks with part information
- Revision history tables
- Bill of materials (BOM)
- General notes and specifications
### GD&T Symbols
- Geometric tolerancing symbols
- Feature control frames
- Datum references
### Dimensions
- Linear dimensions
- Angular dimensions
- Tolerance values
- Diameter and radius callouts
## 🔧 Technology Stack
- **Deep Learning Models**: Custom-trained Keras OCR models
- **Text Detection**: CRAFT-based detector
- **Text Recognition**: CRNN-based recognizer
- **Symbol Matching**: Template matching algorithms
- **Framework**: Gradio for web interface
## 📚 Research
This tool is based on the research paper:
**"eDOCr2: Automated Extraction of Information from Engineering Drawings"**
[http://dx.doi.org/10.2139/ssrn.5045921](http://dx.doi.org/10.2139/ssrn.5045921)
## 💡 Tips for Best Results
- Use **high-resolution** scans (300 DPI or higher)
- Ensure **clear text** and symbols
- Avoid **skewed** or rotated images
- Use **clean** drawings without handwritten annotations
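
The 300 DPI guideline turns into a minimum pixel size once the sheet size is known. The sketch below is not part of eDOCr2: `min_pixels` and `effective_dpi` are hypothetical helper names, and the A3 sheet is only an example.

```python
import math

MM_PER_INCH = 25.4

def min_pixels(width_mm, height_mm, dpi=300):
    """Smallest pixel dimensions that reach `dpi` for a sheet of this size."""
    return (math.ceil(width_mm / MM_PER_INCH * dpi),
            math.ceil(height_mm / MM_PER_INCH * dpi))

def effective_dpi(pixels, length_mm):
    """Resolution actually achieved along one side of the sheet."""
    return pixels / (length_mm / MM_PER_INCH)

# Example: an A3 landscape sheet (420 mm x 297 mm)
print(min_pixels(420, 297))             # (4961, 3508)
print(round(effective_dpi(2480, 420)))  # 150, i.e. below the guideline
```

A scan whose effective DPI falls well below 300 is a candidate for rescanning before upload.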
## 🛠️ Local Installation
To run this locally:
```bash
# Clone repository
git clone https://github.com/javvi51/edocr2.git
cd edocr2
# Install dependencies
pip install -r requirements.txt
# Download models (see releases)
# Place in edocr2/models/
# Run app
python app.py
```
## 📦 Model Files
The pre-trained models are automatically loaded from the repository:
- `recognizer_gdts.keras` (67.2 MB) - GD&T symbol recognition
- `recognizer_dimensions_2.keras` (67.2 MB) - Dimension recognition
For a local installation, download them from: [GitHub Releases](https://github.com/javvi51/edocr2/releases/tag/v1.0.0)
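
For a local setup, a quick check that both files are in place can save a failed start. This is a minimal sketch, not part of eDOCr2: the file names and the `edocr2/models` directory come from this README, while `missing_models` is a hypothetical helper name.

```python
from pathlib import Path

# File names and target directory as listed in this README.
EXPECTED_MODELS = ("recognizer_gdts.keras", "recognizer_dimensions_2.keras")

def missing_models(models_dir="edocr2/models"):
    """Return the expected model files that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_MODELS if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Run it from the repository root before `python app.py`.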
## 🔗 Links
- **GitHub Repository**: [github.com/javvi51/edocr2](https://github.com/javvi51/edocr2)
- **Research Paper**: [DOI:10.2139/ssrn.5045921](http://dx.doi.org/10.2139/ssrn.5045921)
- **Original Author**: Javier Villena Toro
- **Deployed by**: Jeyanthan GJ
## 📝 License
MIT License - See LICENSE file for details
## 🀝 Citation
If you use this tool in your research, please cite:
```bibtex
@article{villena2024edocr2,
title={eDOCr2: Automated Extraction of Information from Engineering Drawings},
author={Villena Toro, Javier},
year={2024},
doi={10.2139/ssrn.5045921}
}
```
## ⚠️ Limitations
- Works best with mechanical/production drawings
- Requires clear, high-quality scans
- May struggle with handwritten annotations
- Processing time: 10-30 seconds per drawing
## 🐛 Known Issues
- PDF support is limited to the first page
- Very large images (>10MB) may time out
- Some custom GD&T symbols may not be recognized
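
Some of these issues can be caught before uploading. The following is a rough sketch, not part of eDOCr2: `preflight` is a hypothetical helper, the `.jpeg` extension is assumed alongside the formats listed in this README, and ">10MB" is read as 10 MiB.

```python
from pathlib import Path

# Formats listed in this README; ".jpeg" is an assumption.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: ">10MB" interpreted as 10 MiB

def preflight(path):
    """Return warnings for a drawing file; an empty list means none apply."""
    p = Path(path)
    suffix = p.suffix.lower()
    warnings = []
    if suffix not in SUPPORTED_EXTENSIONS:
        warnings.append(f"unsupported format: {suffix or '(none)'}")
    if p.stat().st_size > MAX_BYTES:
        warnings.append("file is larger than 10 MB and may time out")
    if suffix == ".pdf":
        warnings.append("only the first page of a PDF is processed")
    return warnings
```

For example, `preflight("drawing.pdf")` returns the first-page warning for any existing PDF under the size limit.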
## 📧 Contact
For issues and questions:
- Open an issue on [GitHub](https://github.com/javvi51/edocr2/issues)
- Check the [documentation](https://github.com/javvi51/edocr2/blob/main/docs/examples.md)
---
**Enjoy using eDOCr2! 🚀**