---
title: PDF OCR (Detectron2 + TrOCR)
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

## PDF OCR (Detectron2 + TrOCR) - Hugging Face Spaces

This repo contains a deployable Gradio app that detects text lines with Detectron2 and reads them with TrOCR. An optional Gemini correction pass can refine the recognized text.

### Files
- `app.py`: Gradio UI
- `inference.py`: OCR pipeline (Detectron2 + TrOCR)
- `requirements.txt`: Python dependencies (Detectron2 itself is installed in the Dockerfile)
- `Dockerfile`: CUDA-enabled image for a GPU Space
- `model_final.pth`: Detectron2 weights

### Deploy on Hugging Face Spaces (Docker Space)
1. Create a new Space on Hugging Face → Type: Docker → Hardware: GPU (T4/A10G).
2. Push these files to the Space repository (or connect this folder and `git push`).
3. (Optional) Set the `GEMINI_API_KEY` secret in Space Settings → Secrets to enable Gemini correction.
4. Wait for the build to finish. The app starts on port 7860.

### Use
1. Upload a PDF.
2. (Optional) Toggle Split-page (the standard pipeline is currently used) and Gemini correction.
3. Click Process.
4. Download the ZIP of per-page JSON files. The combined text for the whole document is shown in the textbox.
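
The downloaded ZIP can also be post-processed offline. The sketch below joins the text of every per-page JSON file into one string. The README does not document the JSON schema, so the field name `text` and the helper name `combine_page_text` are assumptions for illustration only; adjust them to match the actual files.

```python
import json
import zipfile


def combine_page_text(zip_path, text_key="text"):
    """Join the text of every per-page JSON in the ZIP, in file-name order.

    `text_key` is a hypothetical field name; the real schema may differ.
    """
    pages = []
    with zipfile.ZipFile(zip_path) as archive:
        # Plain lexicographic sort: zero-padded names keep page order.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".json"):
                continue  # Skip anything that is not a per-page JSON file.
            page = json.loads(archive.read(name).decode("utf-8"))
            if isinstance(page, dict):
                pages.append(str(page.get(text_key, "")))
    return "\n\n".join(pages)
```

Calling `combine_page_text("output.zip")` (a hypothetical file name) returns the pages separated by blank lines. If the file names are not zero-padded, a numeric sort key would be needed to keep page 10 after page 2.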

### Local run (GPU recommended)
```bash
docker build -t ocr-app .
docker run --gpus all -p 7860:7860 ocr-app
```

Then open http://localhost:7860
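
Because the first start may take a while, it can help to poll the port before opening the browser. This is a minimal sketch using only the standard library; the helper name `wait_until_up` and the timeout values are assumptions, not part of this repo.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url="http://localhost:7860", timeout=300.0, interval=5.0):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server is not accepting connections yet.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

`wait_until_up()` returns `True` once the app responds and `False` if the deadline passes first.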

### Notes
- Detectron2 needs a GPU for reasonable speed; CPU inference will be slow.
- Set the `TEXTLINE_MODEL_PATH` environment variable to override the weights path if the weights are stored elsewhere.
- TrOCR models are downloaded on first run and cached in the container layer after warmup.
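
The `TEXTLINE_MODEL_PATH` override could be resolved along the following lines. This is a sketch, not the code in `inference.py`: the helper name `resolve_weights_path` is hypothetical, and using `model_final.pth` as the fallback is an assumption based on the file list above.

```python
import os
from pathlib import Path

# Assumed default: the weights file listed under "Files".
DEFAULT_WEIGHTS = "model_final.pth"


def resolve_weights_path(env=None):
    """Return the weights path, preferring TEXTLINE_MODEL_PATH when it is set."""
    env = os.environ if env is None else env
    path = Path(env.get("TEXTLINE_MODEL_PATH") or DEFAULT_WEIGHTS)
    if not path.is_file():
        # Fail early with a clear message instead of a late loader error.
        raise FileNotFoundError(
            f"Detectron2 weights not found at {path}; set TEXTLINE_MODEL_PATH"
        )
    return path
```

With Docker, the variable can be passed at start-up using the `-e` flag of `docker run`, for example `-e TEXTLINE_MODEL_PATH=<path inside the container>`.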