---
title: QuickPDF OCR
emoji: ๐
colorFrom: blue
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---

# QuickPDF Studio: OCR Microservice

A dedicated, self-hosted OCR extraction API built with **FastAPI**, **EasyOCR**, and **PyMuPDF**.

## Features

- **Pure REST API**: No frontend bloat, just high-performance endpoints.
- **Multi-Format Support**: JPG, PNG, WebP, and multi-page scanned PDFs.
- **Localized**: Pre-configured for English ('en') and German ('de').
- **Privacy First**: Files are processed in memory and never persisted to disk.
- **Docker Ready**: Ships as a Docker image build, so it deploys on any host that runs Docker.

## Quick Start (Local)

1. **Install Dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
2. **Run Server**:
   ```bash
   uvicorn main:app --reload
   ```
3. **Check Health**:
   ```bash
   curl http://localhost:8000/health
   ```

## API Usage

### `POST /ocr`

Extract text from a document.

**Parameters**:

- `file`: Multipart/form-data upload.
- `languages` (Optional): Query param (e.g. `?languages=en,de`).
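
To illustrate how a comma-separated `languages` value could be turned into a list of codes, here is a minimal sketch. The helper name `parse_languages`, the fallback to both pre-configured languages when the parameter is omitted, and the rejection of unknown codes are all assumptions for illustration; the service's actual handling may differ.

```python
# Hypothetical helper, not taken from the service's source code.
# The two codes below are the languages this README lists as pre-configured.
SUPPORTED_LANGUAGES = ("en", "de")


def parse_languages(raw, default=SUPPORTED_LANGUAGES):
    """Turn a query value such as "en,de" into a de-duplicated list of codes.

    Assumption: an omitted or empty value falls back to `default`.
    """
    if raw is None:
        return list(default)

    codes = []
    for part in raw.split(","):
        code = part.strip().lower()
        if code and code not in codes:
            codes.append(code)

    if not codes:
        # Only separators or whitespace were supplied.
        return list(default)

    # Assumption: unknown codes are rejected rather than silently dropped.
    unsupported = [code for code in codes if code not in SUPPORTED_LANGUAGES]
    if unsupported:
        raise ValueError("unsupported language code(s): " + ", ".join(unsupported))
    return codes
```

Normalizing case and whitespace up front keeps values such as `?languages=EN, de` from failing on formatting alone.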

**Example Request (Curl)**:

```bash
curl -X 'POST' \
  'http://localhost:8000/ocr?languages=en,de' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@sample.pdf'
```

**Example Response**:

```json
{
  "success": true,
  "text": "Full extracted text...",
  "pages": [
    {"page": 1, "text": "Page 1 text..."}
  ],
  "language_used": ["en", "de"]
}
```
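
The same request can be sent from Python. The sketch below uses only the standard library and builds the multipart body by hand; the function names (`encode_multipart`, `build_ocr_request`, `run_ocr`) are hypothetical, and the field name `file` and the `languages` query parameter are taken from the parameter list above. It assumes the service is reachable at the base URL you pass in.

```python
import json
import os
import urllib.parse
import urllib.request
import uuid


def encode_multipart(field, filename, data, content_type="application/octet-stream"):
    """Build a multipart/form-data body that holds a single file part."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_ocr_request(base_url, filename, data, languages=None):
    """Prepare (but do not send) a POST /ocr request."""
    url = base_url.rstrip("/") + "/ocr"
    if languages:
        # The comma is percent-encoded; it decodes back to "en,de" on the server.
        url += "?" + urllib.parse.urlencode({"languages": ",".join(languages)})
    body, content_type = encode_multipart("file", filename, data)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )


def run_ocr(base_url, path, languages=None, timeout=120):
    """Upload the file at `path` and return the decoded JSON response."""
    with open(path, "rb") as handle:
        data = handle.read()
    request = build_ocr_request(base_url, os.path.basename(path), data, languages)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a server running locally, `run_ocr("http://localhost:8000", "sample.pdf", ["en", "de"])` should return a dictionary shaped like the example response, so `result["pages"]` can be iterated page by page. The generous default timeout is a guess to leave room for multi-page scans on CPU.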

## Docker Deployment

1. **Build Image**:
   ```bash
   docker build -t quickpdf-ocr .
   ```
2. **Run Container**:
   ```bash
   docker run -d -p 8000:8000 --name ocr-service quickpdf-ocr
   ```
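
Because the container starts in the background (`-d`), it may take a moment before it accepts requests. A deployment script could poll the `/health` endpoint until it answers. The sketch below is one way to do that with the standard library; the function names are hypothetical, and it assumes `/health` replies with HTTP 200 once the service is ready, which this README does not spell out.

```python
import time
import urllib.error
import urllib.request


def probe_health(url, timeout=5):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(url, attempts=30, delay=2.0, probe=probe_health, sleep=time.sleep):
    """Poll `url` until it is healthy.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `probe` and `sleep` are injectable to ease testing.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

For the container started above, `wait_until_healthy("http://localhost:8000/health")` would retry for roughly a minute with the default settings before giving up.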

## Notes

- **Models**: On first run, EasyOCR downloads approximately 150 MB of trained models to `~/.EasyOCR`.
- **Resources**: This service is optimized for CPU. At least 2 GB of available RAM is recommended.