---
title: PDF OCR (Detectron2 + TrOCR)
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
## PDF OCR (Detectron2 + TrOCR) - Hugging Face Spaces

This repo contains a deployable Gradio app that detects text lines with Detectron2 and reads them with TrOCR. Optional Gemini correction can refine the recognized text.
### Files

- `app.py`: Gradio UI
- `inference.py`: OCR pipeline (Detectron2 + TrOCR)
- `requirements.txt`: Python dependencies (Detectron2 is installed in the Dockerfile)
- `Dockerfile`: CUDA-enabled image for a GPU Space
- `model_final.pth`: Detectron2 weights
### Deploy on Hugging Face Spaces (Docker Space)

1. Create a new Space on Hugging Face → Type: Docker → Hardware: GPU (T4/A10G).
2. Push these files to the Space repository (or connect this folder and `git push`).
3. Optionally set the `GEMINI_API_KEY` secret (for Gemini correction) in Space Settings → Secrets.
4. Wait for the build to finish. The app will start on port 7860.
### Use

1. Upload a PDF.
2. Optionally toggle Split-page (the standard pipeline is currently used regardless) and Gemini correction.
3. Click Process.
4. Download the ZIP of per-page JSONs. The full combined text is shown in the textbox.
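The downloaded ZIP can also be consumed programmatically. A minimal sketch, assuming each entry is a per-page JSON with a `text` field (the actual file names and schema in this repo may differ):

```python
import json
import zipfile


def combined_text(zip_path: str) -> str:
    """Concatenate the text of every per-page JSON in the results ZIP.

    Assumes page files sort in page order (e.g. page_001.json, page_002.json)
    and each carries a top-level "text" field -- adjust to the real schema.
    """
    parts = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith(".json"):
                page = json.loads(zf.read(name))
                parts.append(page.get("text", ""))
    return "\n".join(parts)
```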
### Local run (GPU recommended)

```bash
docker build -t ocr-app .
docker run --gpus all -p 7860:7860 ocr-app
```

Then open http://localhost:7860
### Notes

- Detectron2 needs a GPU for reasonable speed; CPU inference will be slow.
- `TEXTLINE_MODEL_PATH` can be set as an environment variable to point at the Detectron2 weights if they live somewhere other than the bundled `model_final.pth`.
- TrOCR models are downloaded on first run and cached inside the container after warmup.
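The weights-path override described above might be resolved like this inside `inference.py` (a sketch; the default `model_final.pth` matches the bundled weights, but the actual resolution logic may differ):

```python
import os

# Bundled default shipped with the repo.
DEFAULT_WEIGHTS = "model_final.pth"


def resolve_weights_path() -> str:
    """Return TEXTLINE_MODEL_PATH if set in the environment, else the default."""
    return os.environ.get("TEXTLINE_MODEL_PATH", DEFAULT_WEIGHTS)
```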