eDOCr2 Local Web Application
A Flask-based web interface for running eDOCr2 engineering drawing OCR locally on your machine.
Features
- Drag & Drop Upload - Easy file upload interface
- Real-time Processing - Live feedback during OCR processing
- Visual Results - Annotated drawings with detected elements highlighted
- Structured Data - Extract tables, dimensions, and GD&T symbols
- Download Results - Get all results as a ZIP file
- Responsive Design - Works on desktop and mobile browsers
Prerequisites
System Requirements
- Python 3.9 - 3.11 (required by the pinned NumPy 1.26.4, which does not support Python 3.8)
- Tesseract OCR - Required for text recognition
- Poppler (for PDF support)
Installing System Dependencies
Windows:
# Install Tesseract OCR
# Download from: https://github.com/UB-Mannheim/tesseract/wiki
# Add to System PATH
# Install Poppler
# Download from: https://github.com/oschwartz10612/poppler-windows/releases
# Extract and add bin/ to System PATH
Linux (Ubuntu/Debian):
sudo apt-get update
sudo apt-get install tesseract-ocr poppler-utils
macOS:
brew install tesseract poppler
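After installing, it can help to confirm that the external tools are reachable from the shell you will launch the app from. The following is a minimal sketch and is not part of the eDOCr2 repository: the helper name `find_tools` is hypothetical, and the executable names (`tesseract` for Tesseract OCR, `pdftoppm` and `pdfinfo` for Poppler) are assumptions about a typical install.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed executable names; adjust if your installation uses different ones.
    for name, path in find_tools(["tesseract", "pdftoppm", "pdfinfo"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

Any tool reported as not found usually means the PATH change has not been picked up yet; opening a new terminal is often enough.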
Installation
Step 1: Clone Repository
git clone https://github.com/javvi51/edocr2.git
cd edocr2
Step 2: Create Virtual Environment
# Windows
python -m venv venv
venv\Scripts\activate
# Linux/Mac
python3 -m venv venv
source venv/bin/activate
Step 3: Install Python Dependencies
pip install -r requirements_webapp.txt
Step 4: Download Pre-trained Models
Download the model files from the GitHub Releases:
- recognizer_gdts.keras (67.2 MB)
- recognizer_gdts.txt
- recognizer_dimensions_2.keras (67.2 MB)
- recognizer_dimensions_2.txt
Place them in: edocr2/models/
Quick Download (Linux/Mac):
mkdir -p edocr2/models
cd edocr2/models
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.keras
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.txt
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.keras
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.txt
cd ../..
Quick Download (Windows PowerShell):
New-Item -ItemType Directory -Force -Path edocr2\models
cd edocr2\models
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.keras" -OutFile "recognizer_gdts.keras"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.txt" -OutFile "recognizer_gdts.txt"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.keras" -OutFile "recognizer_dimensions_2.keras"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.txt" -OutFile "recognizer_dimensions_2.txt"
cd ..\..
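Once the downloads finish, a quick check that all four files landed in edocr2/models/ can save a failed server start. This is a sketch under stated assumptions: the function name `missing_models` is hypothetical, the file names are the ones listed in Step 4, and an empty file is treated as missing because it usually indicates an interrupted download.

```python
from pathlib import Path

# File names as listed in Step 4.
EXPECTED_MODELS = [
    "recognizer_gdts.keras",
    "recognizer_gdts.txt",
    "recognizer_dimensions_2.keras",
    "recognizer_dimensions_2.txt",
]


def missing_models(model_dir="edocr2/models", expected=EXPECTED_MODELS):
    """Return the expected file names that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in expected:
        path = root / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_models()
    print("All model files present." if not missing else f"Missing or empty: {missing}")
```

Run it from the repository root so the relative path edocr2/models resolves correctly.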
Usage
Start the Server
python app.py
You should see:
Loading OCR models...
Models loaded in X.XX seconds
Server ready!
Open your browser and go to: http://localhost:5000
Using the Web Interface
- Open Browser - Navigate to http://localhost:5000
- Upload Drawing - Drag & drop or click to browse
- Wait for Processing - Takes 10-30 seconds
- View Results - See annotated drawing and extracted data
- Download - Get all results as ZIP file
Supported File Formats
- JPG/JPEG - Engineering drawing images
- PNG - Engineering drawing images
- PDF - Engineering drawing PDFs (first page only)
Maximum file size: 50 MB
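The format and size limits above can be checked on the client side before a file is sent. The sketch below only mirrors the limits stated in this section; the function name `is_uploadable` is hypothetical, and the validation actually performed in app.py may differ.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}  # formats listed above
MAX_BYTES = 50 * 1024 * 1024  # 50 MB limit stated above


def is_uploadable(filename, size_bytes):
    """Return (ok, reason) for a candidate upload, based on extension and size only."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported extension: {ext or '(none)'}"
    if size_bytes > MAX_BYTES:
        return False, "file exceeds 50 MB"
    return True, "ok"
```

For a file on disk, `is_uploadable(path.name, path.stat().st_size)` gives a quick pre-flight check. The extension test says nothing about the file contents.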
What Gets Extracted
The application extracts:
- Tables - Title blocks, revision tables, bill of materials
- GD&T Symbols - Geometric dimensioning and tolerancing
- Dimensions - Measurements with tolerances
- Other Info - Additional text and annotations
Output Files
Results are saved in the results/ folder:
results/
└── 20231218_123456_drawing_name/
    ├── drawing_name_mask.png    # Annotated visualization
    ├── drawing_name.json        # Structured data (JSON)
    └── drawing_name.csv         # Tabular data (CSV)
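For scripted post-processing, the most recent run folder can be located and its JSON and CSV files loaded with the standard library. This is a sketch under assumptions: the helper names `latest_result_dir` and `load_run` are hypothetical, run folders are assumed to start with a `YYYYMMDD_HHMMSS` timestamp (as in the example above) so that sorting by name matches chronological order, and nothing is assumed about the internal schema of the JSON file.

```python
import csv
import json
from pathlib import Path


def latest_result_dir(results_root="results"):
    """Return the newest run folder by name, or None if there are no runs."""
    runs = sorted(p for p in Path(results_root).iterdir() if p.is_dir())
    return runs[-1] if runs else None


def load_run(run_dir):
    """Load the first JSON document and the first CSV file found in one run folder."""
    run_dir = Path(run_dir)
    json_files = sorted(run_dir.glob("*.json"))
    csv_files = sorted(run_dir.glob("*.csv"))
    data = json.loads(json_files[0].read_text(encoding="utf-8")) if json_files else None
    rows = []
    if csv_files:
        with csv_files[0].open(newline="", encoding="utf-8") as fh:
            rows = list(csv.reader(fh))
    return data, rows
```

A typical call would be `data, rows = load_run(latest_result_dir())`, run from the directory that contains results/.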
Troubleshooting
Models Not Loading
Error: Model files not found!
Solution:
- Ensure models are in edocr2/models/
- Check file names match exactly
- Verify files aren't corrupted (check file sizes)
NumPy Version Error
Error: AttributeError: np.sctypes was removed
Solution:
pip uninstall numpy
pip install numpy==1.26.4
Tesseract Not Found
Error: TesseractNotFoundError
Solution:
- Install Tesseract OCR
- Add to System PATH
- Restart terminal/IDE
PDF Processing Error
Error: PDFInfoNotInstalledError
Solution:
- Install Poppler
- Add poppler/bin to System PATH
- Restart terminal
Port Already in Use
Error: Address already in use
Solution:
# Change port in app.py (last line):
app.run(debug=True, host='0.0.0.0', port=5001) # Use different port
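Before editing app.py, it may be worth confirming that the port is really occupied and finding a free alternative. The sketch below uses only the standard `socket` module; the helper names `port_in_use` and `first_free_port` are hypothetical and not part of the application.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=5000, attempts=20):
    """Return the first port in [start, start + attempts) with no listener, else None."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    return None


if __name__ == "__main__":
    print("Port 5000 in use:", port_in_use(5000))
    print("Suggested port:", first_free_port())
```

The check is a snapshot: another process could still claim the suggested port before the server starts.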
Configuration
Edit app.py to customize:
# Maximum upload size
app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024 # 50MB
# Server port
app.run(debug=True, host='0.0.0.0', port=5000)
# Processing parameters
frame_thres=0.7 # Frame detection threshold
GDT_thres=0.02 # GD&T detection threshold
cluster_thres=20 # Dimension clustering threshold
max_img_size=1048 # Maximum image size for processing
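If editing app.py for every change is inconvenient, the port and upload limit could be read from environment variables instead. This is only a sketch of one possible approach: the variable names `EDOCR2_PORT` and `EDOCR2_MAX_UPLOAD_MB` and the function `load_settings` are hypothetical and are not read by the application as shipped.

```python
import os


def load_settings(env=os.environ):
    """Build a settings dict, falling back to the defaults documented above."""
    return {
        # Hypothetical variable names; the defaults match the values shown in this section.
        "port": int(env.get("EDOCR2_PORT", "5000")),
        "max_content_length": int(env.get("EDOCR2_MAX_UPLOAD_MB", "50")) * 1024 * 1024,
    }
```

Inside app.py the result could then feed the existing settings, for example `app.config['MAX_CONTENT_LENGTH'] = settings["max_content_length"]` and `app.run(debug=True, host='0.0.0.0', port=settings["port"])`.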
Security Notes
Warning: This is a local development server
- Not suitable for production deployment
- No authentication/authorization
- No HTTPS encryption
- Only use on a trusted local network
For production deployment, use:
- Gunicorn/uWSGI
- Nginx reverse proxy
- HTTPS certificates
- Authentication middleware
API Endpoints
POST /upload
Upload and process a drawing file.
Request: multipart/form-data with file field
Response:
{
  "success": true,
  "filename": "drawing.jpg",
  "processing_time": 15.23,
  "stats": {
    "tables_found": 2,
    "gdt_symbols": 5,
    "dimensions": 23,
    "other_info": 8
  },
  "data": { ... },
  "mask_path": "drawing_mask.png",
  "output_dir": "20231218_123456_drawing"
}
GET /results/<folder>/<filename>
Retrieve a result file.
GET /download/<folder>
Download all results as ZIP.
GET /health
Health check endpoint.
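The endpoints above can also be called from a script. The following is a minimal client sketch using only the Python standard library, with several assumptions labeled: the form field is assumed to be named `file`, the `output_dir` value in the upload response is assumed to be the `<folder>` expected by /download, and the function names are hypothetical. The response format of /health is not documented here, so the sketch does not call it.

```python
import json
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:5000"  # default address from the Usage section


def build_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_drawing(path, base_url=BASE_URL):
    """POST a drawing to /upload and return the decoded JSON response."""
    path = Path(path)
    # Assumption: the server reads the upload from a form field named "file".
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


def download_zip(folder, dest, base_url=BASE_URL):
    """Save the ZIP served by /download/<folder> to dest."""
    with urllib.request.urlopen(f"{base_url}/download/{folder}", timeout=120) as response:
        Path(dest).write_bytes(response.read())
```

With the server running, `result = upload_drawing("drawing.jpg")` followed by `download_zip(result["output_dir"], "results.zip")` would fetch the full result set, provided the assumptions above hold.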
Credits
- Original eDOCr2: Javier Villena Toro
- Web Application: Jeyanthan GJ
- License: MIT
Contributing
Issues and pull requests are welcome!
Enjoy using eDOCr2!