---
title: PDF OCR (Detectron2 + TrOCR)
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

## PDF OCR (Detectron2 + TrOCR) - Hugging Face Spaces

This repo contains a deployable Gradio app that detects text lines with Detectron2 and reads them with TrOCR. Optional Gemini correction can refine the text.

### Files
- `app.py`: Gradio UI
- `inference.py`: OCR pipeline (Detectron2 + TrOCR)
- `requirements.txt`: Python dependencies (Detectron2 itself is installed in the Dockerfile)
- `Dockerfile`: CUDA-enabled image for GPU Space
- `model_final.pth`: Detectron2 weights

### Deploy on Hugging Face Spaces (Docker Space)
1. Create a new Space on Hugging Face → Type: Docker → Hardware: GPU (T4/A10G).
2. Push these files to the Space repository (or connect this folder and `git push`).
3. Optionally set the `GEMINI_API_KEY` secret (used for Gemini correction) in Space Settings → Secrets.
4. Wait for the build to finish. The app will start on port 7860.

### Use
1. Upload a PDF.
2. (Optional) Toggle Split-page (currently a no-op; the standard pipeline is always used) and enable Gemini correction.
3. Click Process.
4. Download the ZIP of per-page JSONs. The full combined text is shown in the textbox.
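The per-page JSONs in the downloaded ZIP can be recombined with a short script. A minimal sketch, assuming the ZIP contains one `.json` file per page (the exact filenames and JSON keys are assumptions; adjust to the actual output):

```python
import json
import zipfile


def load_pages(zip_path):
    """Read every per-page JSON from the results ZIP, sorted by filename."""
    pages = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith(".json"):
                pages.append(json.loads(zf.read(name)))
    return pages
```

From there, joining whatever text field each page JSON carries reproduces the combined text shown in the app's textbox.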

### Local run (GPU recommended)
```bash
docker build -t ocr-app .
docker run --gpus all -p 7860:7860 ocr-app
```

Then open http://localhost:7860

### Notes
- Detectron2 inference needs a GPU for reasonable speed; CPU-only runs will be slow.
- Set the `TEXTLINE_MODEL_PATH` environment variable to override the weights location if `model_final.pth` lives elsewhere.
- TrOCR models are downloaded on first run and cached inside the container after warmup.
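An env-var override like the one above is typically resolved with a small helper. A sketch of how `inference.py` might do it (the helper name and the `model_final.pth` default are assumptions, not the repo's actual code):

```python
import os


def resolve_model_path(default="model_final.pth"):
    """Prefer TEXTLINE_MODEL_PATH from the environment, else fall back to the default."""
    return os.environ.get("TEXTLINE_MODEL_PATH", default)
```

With Docker, the variable can be supplied at launch, e.g. `docker run -e TEXTLINE_MODEL_PATH=/weights/custom.pth ...`.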