---
title: PDF OCR (Detectron2 + TrOCR)
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
## PDF OCR (Detectron2 + TrOCR) - Hugging Face Spaces
This repo contains a deployable Gradio app that detects text lines with Detectron2 and reads them with TrOCR. Optional Gemini correction can refine the text.
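The detect, read, and correct flow described above can be sketched with plain callables standing in for the models. This is an illustrative outline, not the code in `inference.py`: `ocr_page`, `detect_lines`, `read_line`, and `correct` are hypothetical names, and the top-to-bottom reading order is an assumption.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def ocr_page(
    page: Any,
    detect_lines: Callable[[Any], Iterable[Box]],
    read_line: Callable[[Any, Box], str],
    correct: Optional[Callable[[str], str]] = None,
) -> str:
    """Detect line boxes, read each one, then optionally post-correct the text."""
    # Assumption: lines are read top-to-bottom, then left-to-right.
    boxes = sorted(detect_lines(page), key=lambda b: (b[1], b[0]))
    text = "\n".join(read_line(page, box) for box in boxes)
    # The correction step (Gemini in this app) is skipped when not supplied.
    return correct(text) if correct is not None else text


# Stand-in "page": a dict mapping each line box to the text it contains.
fake_page = {(0, 30, 100, 50): "second line", (0, 0, 100, 20): "first line"}
print(ocr_page(fake_page, lambda p: p.keys(), lambda p, box: p[box]))
```

In the real app the detector would wrap the Detectron2 model and the reader would wrap TrOCR; keeping them as injected callables makes the control flow easy to test without a GPU.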
### Files
- `app.py`: Gradio UI
- `inference.py`: OCR pipeline (Detectron2 + TrOCR)
- `requirements.txt`: Python dependencies (Detectron2 installed in Dockerfile)
- `Dockerfile`: CUDA-enabled image for GPU Space
- `model_final.pth`: Detectron2 weights
### Deploy on Hugging Face Spaces (Docker Space)
1. Create a new Space on Hugging Face → Type: Docker → Hardware: GPU (T4/A10G).
2. Push these files to the Space repository (or connect this folder and `git push`).
3. (Optional) Set the `GEMINI_API_KEY` secret in Space Settings → Secrets to enable Gemini correction.
4. Wait for the build to finish. The app will start on port 7860.
### Use
1. Upload a PDF.
2. (Optional) Toggle Split-page and Gemini correction. The Split-page toggle currently has no effect; the standard pipeline is used either way.
3. Click Process.
4. Download the ZIP of per-page JSONs. The full combined text is shown in the textbox.
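If you want to rebuild the combined text from the downloaded ZIP yourself, something like the following may work. It is a minimal sketch: the JSON schema is not documented here, so the `"text"` key is an assumption, and `combine_pages` is a hypothetical helper rather than part of this repo.

```python
import io
import json
import zipfile


def combine_pages(source, text_key: str = "text") -> str:
    """Concatenate the text field of every per-page JSON in a ZIP archive.

    `source` is a path or a binary file-like object.
    `text_key` is an assumed field name; adjust it to the actual schema.
    """
    pages = []
    with zipfile.ZipFile(source) as archive:
        # Name order matches page order only if page numbers are zero-padded.
        for name in sorted(archive.namelist()):
            if name.endswith(".json"):
                data = json.loads(archive.read(name))
                pages.append(str(data.get(text_key, "")))
    return "\n\n".join(pages)


# Self-contained demo using an in-memory archive with made-up file names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("page_001.json", json.dumps({"text": "Hello"}))
    demo.writestr("page_002.json", json.dumps({"text": "World"}))
buffer.seek(0)
print(combine_pages(buffer))
```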
### Local run (GPU recommended)
```bash
docker build -t ocr-app .
docker run --gpus all -p 7860:7860 ocr-app
```
Then open http://localhost:7860 in a browser.
### Notes
- Detectron2 needs a GPU for reasonable speed; CPU inference will be slow.
- Set the `TEXTLINE_MODEL_PATH` environment variable to override the weights path if the weights are stored elsewhere.
- TrOCR models are downloaded on first run and cached in the container layer after warmup.
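
As an illustration of how the two environment variables mentioned in this README might be resolved at startup, here is a minimal sketch. `load_settings` is a hypothetical helper, and falling back to `model_final.pth` in the working directory is an assumption about the default, not confirmed behavior of `inference.py`.

```python
import os
from typing import Mapping, Optional, Tuple

# Assumed default: the weights file listed under Files, in the working directory.
DEFAULT_WEIGHTS = "model_final.pth"


def load_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (weights_path, gemini_api_key) resolved from environment variables."""
    # An unset or empty TEXTLINE_MODEL_PATH falls back to the default weights.
    weights = env.get("TEXTLINE_MODEL_PATH") or DEFAULT_WEIGHTS
    # Gemini correction is optional, so a missing key becomes None.
    api_key = env.get("GEMINI_API_KEY") or None
    return weights, api_key


weights_path, gemini_key = load_settings()
print(weights_path, "Gemini enabled" if gemini_key else "Gemini disabled")
```

For a local run, both variables can be passed to the container with `docker run -e TEXTLINE_MODEL_PATH=... -e GEMINI_API_KEY=...`.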