---
title: PDF OCR (Detectron2 + TrOCR)
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
## PDF OCR (Detectron2 + TrOCR) - Hugging Face Spaces
This repo contains a deployable Gradio app that detects text lines with Detectron2 and reads them with TrOCR. Optional Gemini correction can refine the text.
### Files
- `app.py`: Gradio UI
- `inference.py`: OCR pipeline (Detectron2 + TrOCR)
- `requirements.txt`: Python dependencies (Detectron2 installed in Dockerfile)
- `Dockerfile`: CUDA-enabled image for GPU Space
- `model_final.pth`: Detectron2 weights
### Deploy on Hugging Face Spaces (Docker Space)
1. Create a new Space on Hugging Face → Type: Docker → Hardware: GPU (T4/A10G).
2. Push these files to the Space repository (or connect this folder and `git push`).
3. Set optional secret: `GEMINI_API_KEY` (for correction) in Space Settings → Secrets.
4. Wait for the build to finish. The app will start on port 7860.
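Step 2 above can be done with plain git. A minimal sketch, where `<user>/<space-name>` is a placeholder for your own Space:

```shell
# Clone the (empty) Space repo, copy the app files in, and push.
git clone https://huggingface.co/spaces/<user>/<space-name>
cd <space-name>
cp /path/to/ocr-app/app.py /path/to/ocr-app/inference.py .
cp /path/to/ocr-app/requirements.txt /path/to/ocr-app/Dockerfile /path/to/ocr-app/model_final.pth .
git add .
git commit -m "Deploy PDF OCR app"
git push
```

Note that large binary files such as `model_final.pth` may need Git LFS (`git lfs track "*.pth"` before committing), since Hugging Face repos reject large files pushed as plain git objects.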
### Use
1. Upload a PDF.
2. (Optional) Toggle Split-page (currently the standard pipeline is used regardless) and Gemini correction.
3. Click Process.
4. Download the ZIP of per-page JSONs. The full combined text is shown in the textbox.
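The ZIP/textbox outputs described in step 4 could be assembled roughly like this. A minimal sketch, assuming the pipeline yields one dict per page; the function names and the `"text"` key are illustrative, not taken from `inference.py`:

```python
import io
import json
import zipfile


def bundle_pages(pages):
    """Write each page's OCR result as page_<n>.json inside an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, page in enumerate(pages, start=1):
            zf.writestr(f"page_{i}.json", json.dumps(page, ensure_ascii=False, indent=2))
    buf.seek(0)
    return buf


def combine_text(pages):
    """Join per-page text into the single string shown in the textbox."""
    return "\n\n".join(page.get("text", "") for page in pages)
```

The in-memory buffer can be written to a temp file and returned from a Gradio `File` output.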
### Local run (GPU recommended)
```bash
docker build -t ocr-app .
docker run --gpus all -p 7860:7860 ocr-app
```
Then open http://localhost:7860 in your browser.
### Notes
- Detectron2 requires GPU for reasonable speed; CPU will be slow.
- The `TEXTLINE_MODEL_PATH` environment variable can be set to point at the Detectron2 weights if they are stored somewhere other than the default location.
- TrOCR models are downloaded on first run and cached in the container layer after warmup.
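The environment-variable handling in the notes above could look like the following. A minimal sketch: the variable names `TEXTLINE_MODEL_PATH` and `GEMINI_API_KEY` come from this README, but the `model_final.pth` default is an assumption about how `inference.py` is configured:

```python
import os


def get_config():
    """Read runtime configuration from the environment."""
    return {
        # Falls back to the bundled weights when the env var is unset (assumed default).
        "textline_model_path": os.environ.get("TEXTLINE_MODEL_PATH", "model_final.pth"),
        # Gemini correction is only enabled when a key is provided via Space Secrets.
        "gemini_api_key": os.environ.get("GEMINI_API_KEY"),
    }


cfg = get_config()
use_gemini = cfg["gemini_api_key"] is not None
```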