# 🔧 eDOCr2 Local Web Application

A Flask-based web interface for running eDOCr2 engineering drawing OCR locally on your machine.

## 🎯 Features

- ✅ **Drag & Drop Upload** - Easy file upload interface
- ✅ **Real-time Processing** - Live feedback during OCR processing
- ✅ **Visual Results** - Annotated drawings with detected elements highlighted
- ✅ **Structured Data** - Extract tables, dimensions, and GD&T symbols
- ✅ **Download Results** - Get all results as a ZIP file
- ✅ **Responsive Design** - Works on desktop and mobile browsers

## 📋 Prerequisites

### System Requirements

1. **Python 3.8 - 3.11** (NumPy 1.26.4 compatibility)
2. **Tesseract OCR** - Required for text recognition
3. **Poppler** (for PDF support)

### Installing System Dependencies

#### Windows:

```bash
# Install Tesseract OCR
# Download from: https://github.com/UB-Mannheim/tesseract/wiki
# Add to System PATH

# Install Poppler
# Download from: https://github.com/oschwartz10612/poppler-windows/releases
# Extract and add bin/ to System PATH
```

#### Linux (Ubuntu/Debian):

```bash
sudo apt-get update
sudo apt-get install tesseract-ocr poppler-utils
```

#### macOS:

```bash
brew install tesseract poppler
```

## 🚀 Installation

### Step 1: Clone Repository

```bash
git clone https://github.com/javvi51/edocr2.git
cd edocr2
```

### Step 2: Create Virtual Environment

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate
```

### Step 3: Install Python Dependencies

```bash
pip install -r requirements_webapp.txt
```

### Step 4: Download Pre-trained Models

Download the model files from the [GitHub Releases](https://github.com/javvi51/edocr2/releases/tag/v1.0.0):

1. `recognizer_gdts.keras` (67.2 MB)
2. `recognizer_gdts.txt`
3. `recognizer_dimensions_2.keras` (67.2 MB)
4. `recognizer_dimensions_2.txt`

Place them in: `edocr2/models/`

**Quick Download (Linux/Mac):**

```bash
mkdir -p edocr2/models
cd edocr2/models
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.keras
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.txt
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.keras
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.txt
cd ../..
```

**Quick Download (Windows PowerShell):**

```powershell
New-Item -ItemType Directory -Force -Path edocr2\models
cd edocr2\models
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.keras" -OutFile "recognizer_gdts.keras"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.txt" -OutFile "recognizer_gdts.txt"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.keras" -OutFile "recognizer_dimensions_2.keras"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.txt" -OutFile "recognizer_dimensions_2.txt"
cd ..\..
```

## 🎮 Usage

### Start the Server

```bash
python app.py
```

You should see:

```
🔧 Loading OCR models...
✅ Models loaded in X.XX seconds
✅ Server ready!
📱 Open your browser and go to: http://localhost:5000
```

### Using the Web Interface

1. **Open Browser** - Navigate to `http://localhost:5000`
2. **Upload Drawing** - Drag & drop or click to browse
3. **Wait for Processing** - Takes 10-30 seconds
4. **View Results** - See annotated drawing and extracted data
5. **Download** - Get all results as a ZIP file

### Supported File Formats

- ✅ **JPG/JPEG** - Engineering drawing images
- ✅ **PNG** - Engineering drawing images
- ✅ **PDF** - Engineering drawing PDFs (first page only)

**Maximum file size:** 50 MB

## 📊 What Gets Extracted

The application extracts:

1. **Tables** - Title blocks, revision tables, bill of materials
2. **GD&T Symbols** - Geometric dimensioning and tolerancing
3. **Dimensions** - Measurements with tolerances
4. **Other Info** - Additional text and annotations

## 📁 Output Files

Results are saved in the `results/` folder:

```
results/
└── 20231218_123456_drawing_name/
    ├── drawing_name_mask.png    # Annotated visualization
    ├── drawing_name.json        # Structured data (JSON)
    └── drawing_name.csv         # Tabular data (CSV)
```

## 🐛 Troubleshooting

### Models Not Loading

**Error:** `Model files not found!`

**Solution:**

- Ensure models are in `edocr2/models/`
- Check file names match exactly
- Verify files aren't corrupted (check file sizes)

### NumPy Version Error

**Error:** `AttributeError: np.sctypes was removed`

**Solution:**

```bash
pip uninstall numpy
pip install numpy==1.26.4
```

### Tesseract Not Found

**Error:** `TesseractNotFoundError`

**Solution:**

- Install Tesseract OCR
- Add to System PATH
- Restart terminal/IDE

### PDF Processing Error

**Error:** `PDFInfoNotInstalledError`

**Solution:**

- Install Poppler
- Add poppler/bin to System PATH
- Restart terminal

### Port Already in Use

**Error:** `Address already in use`

**Solution:**

```python
# Change port in app.py (last line):
app.run(debug=True, host='0.0.0.0', port=5001)  # Use different port
```

## ⚙️ Configuration

Edit `app.py` to customize:

```python
# Maximum upload size
app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024  # 50MB

# Server port
app.run(debug=True, host='0.0.0.0', port=5000)

# Processing parameters
frame_thres=0.7      # Frame detection threshold
GDT_thres=0.02       # GD&T detection threshold
cluster_thres=20     # Dimension clustering threshold
max_img_size=1048    # Maximum image size for processing
```

## 🔒 Security Notes

⚠️ **This is a local development server**

- Not suitable for production deployment
- No authentication/authorization
- No HTTPS encryption
- Use only on a trusted local network

For production deployment, use:

- Gunicorn/uWSGI
- Nginx reverse proxy
- HTTPS certificates
- Authentication middleware

## 📚 API Endpoints

### `POST /upload`

Upload and process a drawing file.

**Request:** `multipart/form-data` with `file` field

**Response:**

```json
{
  "success": true,
  "filename": "drawing.jpg",
  "processing_time": 15.23,
  "stats": {
    "tables_found": 2,
    "gdt_symbols": 5,
    "dimensions": 23,
    "other_info": 8
  },
  "data": { ... },
  "mask_path": "drawing_mask.png",
  "output_dir": "20231218_123456_drawing"
}
```

### `GET /results//`

Retrieve a result file.

### `GET /download/`

Download all results as ZIP.

### `GET /health`

Health check endpoint.

## 📖 Resources

- [eDOCr2 GitHub](https://github.com/javvi51/edocr2)
- [Research Paper](http://dx.doi.org/10.2139/ssrn.5045921)
- [Model Downloads](https://github.com/javvi51/edocr2/releases/tag/v1.0.0)

## 👨‍💻 Credits

- **Original eDOCr2**: Javier Villena Toro
- **Web Application**: Jeyanthan GJ
- **License**: MIT

## 🤝 Contributing

Issues and pull requests are welcome!

---

**Enjoy using eDOCr2! 🚀**
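
**Scripted upload (sketch):** The `POST /upload` endpoint listed under API Endpoints can also be called from a script instead of the browser. The following is a minimal sketch using only the Python standard library. It assumes the server is running at the default `http://localhost:5000` and returns the JSON shape shown in the example response. The helper names `build_multipart`, `upload_drawing`, and `summarize` are hypothetical and are not part of eDOCr2 or `app.py`.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple

BASE_URL = "http://localhost:5000"  # assumed default address from "Start the Server"


def build_multipart(field_name: str, filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def upload_drawing(path: str, base_url: str = BASE_URL, timeout: float = 120.0) -> dict:
    """Send a drawing to POST /upload in the `file` field and return the parsed JSON."""
    file_path = Path(path)
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(result: dict) -> str:
    """Build a one-line summary from the `stats` block of an upload response."""
    stats = result.get("stats", {})
    return (
        f"{result.get('filename', '?')}: "
        f"{stats.get('tables_found', 0)} tables, "
        f"{stats.get('gdt_symbols', 0)} GD&T symbols, "
        f"{stats.get('dimensions', 0)} dimensions, "
        f"{stats.get('other_info', 0)} other "
        f"({result.get('processing_time', 0.0):.2f} s)"
    )
```

With the server running, `print(summarize(upload_drawing("drawing.jpg")))` would upload the file and print the counts. The generous timeout is a guess based on the 10-30 second processing time mentioned above; adjust it for larger drawings.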