# 🔧 eDOCr2 Local Web Application

A Flask-based web interface for running eDOCr2 engineering drawing OCR locally on your machine.

## 🎯 Features

- ✅ **Drag & Drop Upload** - Easy file upload interface
- ✅ **Real-time Processing** - Live feedback during OCR processing
- ✅ **Visual Results** - Annotated drawings with detected elements highlighted
- ✅ **Structured Data** - Extract tables, dimensions, and GD&T symbols
- ✅ **Download Results** - Get all results as a ZIP file
- ✅ **Responsive Design** - Works on desktop and mobile browsers

## 📋 Prerequisites

### System Requirements

1. **Python 3.9 - 3.11** (NumPy 1.26.4 requires Python 3.9 or newer)
2. **Tesseract OCR** - Required for text recognition
3. **Poppler** (for PDF support)

### Installing System Dependencies

#### Windows:
```bash
# Install Tesseract OCR
# Download from: https://github.com/UB-Mannheim/tesseract/wiki
# Add to System PATH

# Install Poppler
# Download from: https://github.com/oschwartz10612/poppler-windows/releases
# Extract and add bin/ to System PATH
```

#### Linux (Ubuntu/Debian):
```bash
sudo apt-get update
sudo apt-get install tesseract-ocr poppler-utils
```

#### macOS:
```bash
brew install tesseract poppler
```
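To confirm that the system dependencies are actually reachable from Python, a small check like the one below may help. It is only a sketch: the executable names (`tesseract`, `pdfinfo`, `pdftoppm`) are assumptions about what the OCR and PDF libraries look for on PATH, and `find_missing_tools` is a hypothetical helper, not part of eDOCr2.

```python
import shutil

# Assumed executable names: "tesseract" is the Tesseract CLI;
# "pdfinfo" and "pdftoppm" are command-line tools shipped with Poppler.
REQUIRED_TOOLS = ("tesseract", "pdfinfo", "pdftoppm")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All system dependencies found.")
```

If anything is reported missing, revisit the install step for your platform and restart the terminal so the updated PATH is picked up.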

## 🚀 Installation

### Step 1: Clone Repository
```bash
git clone https://github.com/javvi51/edocr2.git
cd edocr2
```

### Step 2: Create Virtual Environment
```bash
# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate
```

### Step 3: Install Python Dependencies
```bash
pip install -r requirements_webapp.txt
```

### Step 4: Download Pre-trained Models

Download the model files from the [GitHub Releases](https://github.com/javvi51/edocr2/releases/tag/v1.0.0):

1. `recognizer_gdts.keras` (67.2 MB)
2. `recognizer_gdts.txt`
3. `recognizer_dimensions_2.keras` (67.2 MB)
4. `recognizer_dimensions_2.txt`

Place them in: `edocr2/models/`

**Quick Download (Linux/Mac):**
```bash
mkdir -p edocr2/models
cd edocr2/models

wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.keras
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.txt
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.keras
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.txt

cd ../..
```

**Quick Download (Windows PowerShell):**
```powershell
New-Item -ItemType Directory -Force -Path edocr2\models
cd edocr2\models

Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.keras" -OutFile "recognizer_gdts.keras"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.txt" -OutFile "recognizer_gdts.txt"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.keras" -OutFile "recognizer_dimensions_2.keras"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.txt" -OutFile "recognizer_dimensions_2.txt"

cd ..\..
```
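After downloading, it can be worth checking that all four files landed in the right folder. The sketch below assumes the default `edocr2/models` location relative to the repository root; `missing_model_files` is a hypothetical helper written for this README, not a function provided by eDOCr2.

```python
from pathlib import Path

# File names taken from the download list above.
MODEL_FILES = (
    "recognizer_gdts.keras",
    "recognizer_gdts.txt",
    "recognizer_dimensions_2.keras",
    "recognizer_dimensions_2.txt",
)


def missing_model_files(models_dir="edocr2/models"):
    """Return the expected model files that are absent or empty in models_dir."""
    root = Path(models_dir)
    return [
        name
        for name in MODEL_FILES
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_model_files()
    print("Missing model files:", missing if missing else "none")
```

An empty list means every expected file is present and non-empty. This does not validate the file contents.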

## 🎮 Usage

### Start the Server

```bash
python app.py
```

You should see:
```
🔧 Loading OCR models...
✅ Models loaded in X.XX seconds

✅ Server ready!
📱 Open your browser and go to: http://localhost:5000
```

### Using the Web Interface

1. **Open Browser** - Navigate to `http://localhost:5000`
2. **Upload Drawing** - Drag & drop or click to browse
3. **Wait for Processing** - Typically takes 10-30 seconds
4. **View Results** - See annotated drawing and extracted data
5. **Download** - Get all results as ZIP file

### Supported File Formats

- ✅ **JPG/JPEG** - Engineering drawing images
- ✅ **PNG** - Engineering drawing images
- ✅ **PDF** - Engineering drawing PDFs (first page only)

**Maximum file size:** 50 MB
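If you script uploads, you may want to reject unsuitable files before sending them. The following is a minimal sketch of the rules listed above (extension and 50 MB limit). `check_upload` is a hypothetical client-side helper; how `app.py` itself validates uploads is not shown here.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50 MB, matching the limit stated above


def check_upload(filename, size_bytes):
    """Return None if the file looks acceptable, else a short reason string."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported file type: {ext or '(none)'}"
    if size_bytes > MAX_FILE_SIZE:
        return "file exceeds 50 MB"
    return None
```

For a file on disk, the size can be obtained with `Path(path).stat().st_size`.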

## 📊 What Gets Extracted

The application extracts:

1. **Tables** - Title blocks, revision tables, bill of materials
2. **GD&T Symbols** - Geometric dimensioning and tolerancing
3. **Dimensions** - Measurements with tolerances
4. **Other Info** - Additional text and annotations

## 📁 Output Files

Results are saved in the `results/` folder:

```
results/
└── 20231218_123456_drawing_name/
    ├── drawing_name_mask.png          # Annotated visualization
    ├── drawing_name.json              # Structured data (JSON)
    └── drawing_name.csv               # Tabular data (CSV)
```
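To post-process a result folder from your own scripts, something like the sketch below could be used. It relies only on the naming pattern shown in the tree above (`<name>.json`, `<name>_mask.png`, `<name>.csv`); `load_result` is a hypothetical helper, and the internal structure of the JSON data is not assumed.

```python
import json
from pathlib import Path


def load_result(result_dir):
    """Load the JSON data and derive the mask/CSV paths for one result folder."""
    folder = Path(result_dir)
    json_files = sorted(folder.glob("*.json"))
    if not json_files:
        raise FileNotFoundError(f"no JSON file in {folder}")
    json_path = json_files[0]
    with json_path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    stem = json_path.stem  # e.g. "drawing_name"
    return {
        "data": data,                        # parsed JSON, structure not assumed
        "mask": folder / f"{stem}_mask.png",  # annotated visualization
        "csv": folder / f"{stem}.csv",        # tabular data
    }
```

The returned paths are derived from the file name and are not checked for existence.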

## 🐛 Troubleshooting

### Models Not Loading

**Error:** `Model files not found!`

**Solution:** 
- Ensure models are in `edocr2/models/`
- Check file names match exactly
- Verify files aren't corrupted (check file sizes)

### NumPy Version Error

**Error:** `AttributeError: np.sctypes was removed`

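**Solution:** This error appears when NumPy 2.x is installed (`np.sctypes` was removed in NumPy 2.0). Pin NumPy to 1.26.4: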
```bash
pip uninstall numpy
pip install numpy==1.26.4
```
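To check which NumPy version is active in the current environment without importing it, a sketch like this may be useful. `installed_numpy_version` and `is_supported_numpy` are hypothetical helpers, and the "1.x only" rule is an assumption based on the pinned version above.

```python
from importlib import metadata


def installed_numpy_version():
    """Return the installed NumPy version string, or None if it is not installed."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


def is_supported_numpy(version):
    """Return True for NumPy 1.x version strings such as '1.26.4'."""
    major = int(version.split(".")[0])
    return major < 2


if __name__ == "__main__":
    version = installed_numpy_version()
    if version is None:
        print("NumPy is not installed")
    else:
        print(version, "OK" if is_supported_numpy(version) else "too new, pin 1.26.4")
```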

### Tesseract Not Found

**Error:** `TesseractNotFoundError`

**Solution:**
- Install Tesseract OCR (see Prerequisites)
- Add the Tesseract install directory to the system PATH
- Restart the terminal/IDE

### PDF Processing Error

**Error:** `PDFInfoNotInstalledError`

**Solution:**
- Install Poppler (see Prerequisites)
- Add Poppler's `bin/` directory to the system PATH
- Restart the terminal

### Port Already in Use

**Error:** `Address already in use`

**Solution:**
```python
# Change the port in app.py (last line):
app.run(debug=True, host='0.0.0.0', port=5001)  # Use a different port
```
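Before editing `app.py`, you can check which ports are free. The sketch below probes the local machine with a plain TCP connection attempt; `port_is_free` and `first_free_port` are hypothetical helpers, and the starting port 5000 simply mirrors the default used above.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds (port in use).
        return sock.connect_ex((host, port)) != 0


def first_free_port(start=5000, attempts=20):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    print("Suggested port:", first_free_port())
```

Another process may still claim the port between the check and server start, so treat the result as a hint rather than a guarantee.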

## ⚙️ Configuration

Edit `app.py` to customize:

```python
# Maximum upload size
app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024  # 50MB

# Server port
app.run(debug=True, host='0.0.0.0', port=5000)

# Processing parameters
frame_thres=0.7      # Frame detection threshold
GDT_thres=0.02       # GD&T detection threshold
cluster_thres=20     # Dimension clustering threshold
max_img_size=1048    # Maximum image size for processing
```

## 🔒 Security Notes

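⚠️ **This is a local development server**

- Not suitable for production deployment
- No authentication/authorization
- No HTTPS encryption
- `host='0.0.0.0'` binds to all network interfaces, so other machines on the network can reach the server; only run it on a trusted local network
- `debug=True` enables the Werkzeug interactive debugger, which can execute arbitrary code; set `debug=False` if anyone else can reach the server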

For production deployment, use:
- Gunicorn/uWSGI
- Nginx reverse proxy
- HTTPS certificates
- Authentication middleware

## 📚 API Endpoints

### `POST /upload`
Upload and process a drawing file.

**Request:** `multipart/form-data` with `file` field

**Response:**
```json
{
  "success": true,
  "filename": "drawing.jpg",
  "processing_time": 15.23,
  "stats": {
    "tables_found": 2,
    "gdt_symbols": 5,
    "dimensions": 23,
    "other_info": 8
  },
  "data": { ... },
  "mask_path": "drawing_mask.png",
  "output_dir": "20231218_123456_drawing"
}
```

### `GET /results/<folder>/<filename>`
Retrieve a result file.

### `GET /download/<folder>`
Download all results as ZIP.

### `GET /health`
Health check endpoint.
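The endpoints above can be chained from a script: upload a file, then use the `output_dir` and `mask_path` fields of the response to fetch results. The sketch below only builds the follow-up URLs. It assumes that `mask_path` is a file name inside `output_dir`, as the example response suggests; `result_urls` is a hypothetical helper.

```python
from urllib.parse import quote


def result_urls(upload_response, base_url="http://localhost:5000"):
    """Build the result and download URLs from a parsed /upload response."""
    folder = quote(upload_response["output_dir"])
    mask = quote(upload_response["mask_path"])
    return {
        "mask": f"{base_url}/results/{folder}/{mask}",  # GET /results/<folder>/<filename>
        "zip": f"{base_url}/download/{folder}",         # GET /download/<folder>
    }


if __name__ == "__main__":
    example = {"output_dir": "20231218_123456_drawing", "mask_path": "drawing_mask.png"}
    for name, url in result_urls(example).items():
        print(name, url)
```

Any HTTP client can then fetch these URLs while the server is running.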

## 📖 Resources

- [eDOCr2 GitHub](https://github.com/javvi51/edocr2)
- [Research Paper](http://dx.doi.org/10.2139/ssrn.5045921)
- [Model Downloads](https://github.com/javvi51/edocr2/releases/tag/v1.0.0)

## 👨‍💻 Credits

- **Original eDOCr2**: Javier Villena Toro
- **Web Application**: Jeyanthan GJ
- **License**: MIT

## 🤝 Contributing

Issues and pull requests are welcome!

---

**Enjoy using eDOCr2! 🚀**