# 📦 Hugging Face Deployment - File Checklist
## ✅ Files Created for Deployment
### Core Application Files
- ✅ **app.py** - Gradio interface with OCR processing
- ✅ **requirements.txt** - Python dependencies (Gradio + eDOCr2)
- ✅ **packages.txt** - System dependencies (Tesseract, Poppler)
- ✅ **README.md** - Space description with YAML frontmatter
- ✅ **.gitattributes** - Git LFS configuration for model files
### Documentation
- ✅ **DEPLOYMENT.md** - Complete deployment guide
- ✅ **run_local.bat** - Windows quick start script
- ✅ **run_local.sh** - Linux/Mac quick start script
### Required Folders
- ✅ **edocr2/** - Main package (already exists)
- ✅ **edocr2/tools/** - OCR pipelines
- ✅ **edocr2/keras_ocr/** - OCR models
- ⚠️ **edocr2/models/** - Model files (MUST DOWNLOAD)
- ✅ **tests/test_samples/** - Example drawings (optional)
## 🔴 IMPORTANT: Download Model Files
Before deploying, download these 4 files and place them in `edocr2/models/`:
1. **recognizer_gdts.keras** (67.2 MB)
2. **recognizer_gdts.txt** (85 bytes)
3. **recognizer_dimensions_2.keras** (67.2 MB)
4. **recognizer_dimensions_2.txt** (42 bytes)
**Download from:** https://github.com/javvi51/edocr2/releases/tag/v1.0.0
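A small check script can confirm that all four files landed in the right place before you deploy. The sketch below is not part of the repository: the function name `missing_models` is hypothetical, and only the file names and the `edocr2/models/` path are taken from the list above.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_MODELS = [
    "recognizer_gdts.keras",
    "recognizer_gdts.txt",
    "recognizer_dimensions_2.keras",
    "recognizer_dimensions_2.txt",
]


def missing_models(models_dir="edocr2/models"):
    """Return the required model files that are absent or empty in models_dir."""
    root = Path(models_dir)
    missing = []
    for name in REQUIRED_MODELS:
        path = root / name
        # Treat a zero-byte file as missing: it usually means a failed download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Run from the project root, `missing_models()` should return an empty list once the download is complete.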
## 📋 Pre-Deployment Checklist
### Local Testing
- [ ] Models downloaded and placed in `edocr2/models/`
- [ ] Run `python app.py` locally
- [ ] Test with sample images
- [ ] Verify all outputs (image, JSON, ZIP)
### Hugging Face Setup
- [ ] Hugging Face account created
- [ ] Git LFS installed
- [ ] New Space created on Hugging Face
### File Verification
- [ ] All files present in folder
- [ ] Model files in correct location
- [ ] `.gitattributes` configured for LFS
- [ ] `README.md` has YAML frontmatter
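To check the frontmatter item without opening the file by hand, something like the following may help. It is a minimal sketch that reads only flat `key: value` pairs; the sample header is hypothetical, and the actual `README.md` may carry more keys (`sdk` and `app_file` are the ones a Gradio Space relies on to find the app).

```python
def parse_frontmatter(readme_text):
    """Return the simple `key: value` pairs from a leading YAML frontmatter block.

    Handles only flat scalar keys, which is enough for a quick sanity check;
    use a real YAML parser for anything more involved.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter
            return fields
        key, sep, value = line.partition(":")
        # Skip comments, indented (nested) lines, and lines without a colon.
        if sep and key.strip() and not key.startswith((" ", "#")):
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing delimiter, so not valid frontmatter


# Hypothetical README.md header for a Gradio Space; values are illustrative.
SAMPLE_README = """---
title: eDOCr2
sdk: gradio
app_file: app.py
---
# eDOCr2
"""

fields = parse_frontmatter(SAMPLE_README)
```

An empty result means the file has no usable frontmatter block, which is the case this checklist item is meant to catch.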
## 🚀 Deployment Steps
### 1. Download Models
**Windows PowerShell:**
```powershell
cd edocr2-main
New-Item -ItemType Directory -Force -Path edocr2\models
cd edocr2\models
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.keras" -OutFile "recognizer_gdts.keras"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.txt" -OutFile "recognizer_gdts.txt"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.keras" -OutFile "recognizer_dimensions_2.keras"
Invoke-WebRequest -Uri "https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.txt" -OutFile "recognizer_dimensions_2.txt"
cd ..\..
```
**Linux/Mac:**
```bash
cd edocr2-main
mkdir -p edocr2/models
cd edocr2/models
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.keras
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_gdts.txt
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.keras
wget https://github.com/javvi51/edocr2/releases/download/v1.0.0/recognizer_dimensions_2.txt
cd ../..
```
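If you prefer one script that works on every platform, the same downloads can be done from Python's standard library. This is a sketch, not a script shipped with the project: `model_urls` and `download_models` are hypothetical names, while the release URL and file names are the ones used in the commands above.

```python
import urllib.request
from pathlib import Path

# Release URL and file names as given in the commands above.
BASE_URL = "https://github.com/javvi51/edocr2/releases/download/v1.0.0/"
MODEL_FILES = [
    "recognizer_gdts.keras",
    "recognizer_gdts.txt",
    "recognizer_dimensions_2.keras",
    "recognizer_dimensions_2.txt",
]


def model_urls():
    """Build the full download URL for each model file."""
    return [BASE_URL + name for name in MODEL_FILES]


def download_models(dest="edocr2/models"):
    """Download every model file into dest, skipping files already present."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for url in model_urls():
        target = dest_dir / url.rsplit("/", 1)[-1]
        if target.exists():
            continue  # fetched on an earlier run
        print(f"Downloading {url}")
        urllib.request.urlretrieve(url, str(target))
```

Call `download_models()` from inside the `edocr2-main` folder so the files end up in `edocr2/models/`.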
### 2. Test Locally (Optional but Recommended)
**Windows:**
```bat
run_local.bat
```
**Linux/Mac:**
```bash
chmod +x run_local.sh
./run_local.sh
```
Open: http://localhost:7860
### 3. Create Hugging Face Space
1. Go to https://huggingface.co/spaces
2. Click "Create new Space"
3. Settings:
   - Name: `edocr2` (or your choice)
   - License: MIT
   - SDK: Gradio
   - Hardware: CPU Basic (free)
### 4. Clone Space Repository
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/edocr2
cd edocr2
```
### 5. Copy Files
**Windows:**
```bat
xcopy /E /I C:\path\to\edocr2-main\* .
```
**Linux/Mac:**
```bash
cp -r /path/to/edocr2-main/* .
cp /path/to/edocr2-main/.gitattributes .  # the * glob skips dotfiles
```
### 6. Setup Git LFS
```bash
git lfs install
git lfs track "*.keras"
git add .gitattributes
```
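After this step, `.gitattributes` should contain a line routing `*.keras` through LFS. The sketch below shows one way to confirm that before committing; `lfs_patterns` is a hypothetical helper, and `EXAMPLE` is the line that `git lfs track "*.keras"` writes.

```python
def lfs_patterns(gitattributes_text):
    """Return the path patterns that .gitattributes routes through Git LFS."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # blank line or comment
        if "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns


# The line that `git lfs track "*.keras"` appends to .gitattributes.
EXAMPLE = "*.keras filter=lfs diff=lfs merge=lfs -text\n"
```

Reading the real file with `open(".gitattributes").read()` and passing it to `lfs_patterns` should give a list that includes `*.keras`.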
### 7. Commit and Push
```bash
git add .
git commit -m "Initial deployment of eDOCr2"
git push origin main
```
**Note:** Upload may take 5-10 minutes for large model files.
### 8. Wait for Build
- Go to your Space URL
- Wait 5-10 minutes for build
- Check "Logs" tab for errors
## ✅ Verification
Once deployed:
- [ ] Space shows Gradio interface
- [ ] Models load successfully (check logs)
- [ ] Can upload images
- [ ] Processing works
- [ ] Results display correctly
- [ ] Download ZIP works
## 🎯 Your Space URL
After deployment, your Space will be at:
```
https://huggingface.co/spaces/YOUR_USERNAME/edocr2
```
## 📊 Expected Performance
### CPU Basic (Free)
- Processing time: 20-30 seconds per image
- Memory: 2 GB RAM
- Cost: FREE
### T4 GPU (Paid)
- Processing time: 5-10 seconds per image
- Memory: 16 GB GPU memory (VRAM)
- Cost: $0.60/hour
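To put the hourly rate in per-batch terms, the figures above can be combined into a rough estimate. This is back-of-the-envelope arithmetic only: it assumes images are processed back to back with no idle time, and `batch_cost_usd` is a hypothetical helper.

```python
def batch_cost_usd(num_images, seconds_per_image, usd_per_hour):
    """Estimate the hardware cost of processing a batch back to back."""
    hours = num_images * seconds_per_image / 3600
    return hours * usd_per_hour


# 1,000 images at the 5-10 second range and $0.60/hour quoted above.
low = batch_cost_usd(1000, 5, 0.60)
high = batch_cost_usd(1000, 10, 0.60)
```

With these inputs the estimate comes to roughly $0.83 to $1.67 per 1,000 images. Time the Space spends running but idle is not included.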
## 🐛 Common Issues
### "Models not found"
- Ensure models are in `edocr2/models/`
- Check Git LFS tracked the files
- Verify file names are correct
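One common cause is a `.keras` file that is really a Git LFS pointer (a short text stub) instead of the model itself. The sketch below detects that case in a local clone; `is_lfs_pointer` is a hypothetical helper, and the prefix is the first line of every LFS pointer file.

```python
# Every Git LFS pointer file begins with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is an LFS pointer stub rather than real content."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If `is_lfs_pointer("edocr2/models/recognizer_gdts.keras")` returns `True` in your clone, running `git lfs pull` should replace the stub with the actual file.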
### "Out of memory"
- Upgrade to GPU hardware
- Or reduce `max_img_size` in app.py
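How `max_img_size` is applied inside `app.py` is not shown in this checklist. Assuming it caps the longer side of the input image, the effect of lowering it can be previewed with a sketch like this (`fit_within` is a hypothetical helper, not a function from eDOCr2):

```python
def fit_within(width, height, max_side):
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    # Keep the aspect ratio and guard against a dimension rounding to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `fit_within(4000, 2000, 2048)` gives `(2048, 1024)`, which is about a quarter of the original pixel count.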
### "Build failed"
- Check logs for specific error
- Verify all dependencies in requirements.txt
- Ensure packages.txt has system deps
## 📚 Resources
- **Deployment Guide**: See `DEPLOYMENT.md`
- **Hugging Face Docs**: https://huggingface.co/docs/hub/spaces
- **Gradio Docs**: https://gradio.app/docs
- **Original Repo**: https://github.com/javvi51/edocr2
## 🎉 Success!
Once deployed, share your Space:
```
🔗 https://huggingface.co/spaces/YOUR_USERNAME/edocr2
```
---
**Questions?** Check `DEPLOYMENT.md` for detailed troubleshooting.