---
title: QuickPDF OCR
emoji: 📄
colorFrom: blue
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---

# QuickPDF Studio: OCR Microservice

A dedicated, self-hosted OCR extraction API built with **FastAPI**, **EasyOCR**, and **PyMuPDF**.

## Features

- **Pure REST API**: No frontend bloat, just high-performance endpoints.
- **Multi-Format Support**: JPG, PNG, WebP, and multi-page scanned PDFs.
- **Localized**: Pre-configured for English (`en`) and German (`de`).
- **Privacy First**: Files are processed in memory and never persisted to disk.
- **Docker Ready**: Easy deployment on any system.

## Quick Start (Local)

1. **Install Dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

2. **Run Server**:
   ```bash
   uvicorn main:app --reload
   ```

3. **Check Health**:
   ```bash
   curl http://localhost:8000/health
   ```

## API Usage

### `POST /ocr`

Extract text from a document.

**Parameters**:
- `file`: Multipart/form-data upload.
- `languages` (Optional): Query param (e.g. `?languages=en,de`).

**Example Request (Curl)**:
```bash
curl -X 'POST' \
  'http://localhost:8000/ocr?languages=en,de' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@sample.pdf'
```

**Example Response**:
```json
{
  "success": true,
  "text": "Full extracted text...",
  "pages": [
    {"page": 1, "text": "Page 1 text..."}
  ],
  "language_used": ["en", "de"]
}
```

## Docker Deployment

1. **Build Image**:
   ```bash
   docker build -t quickpdf-ocr .
   ```

2. **Run Container**:
   ```bash
   docker run -d -p 8000:8000 --name ocr-service quickpdf-ocr
   ```

## Notes

- **Models**: On first run, EasyOCR will download approximately 150 MB of trained models to `~/.EasyOCR`.
- **Resources**: This service is optimized for CPU. It is recommended to have at least 2 GB of RAM available.
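
**Example Client (Python)**:

For callers who prefer Python over curl, the `POST /ocr` request described above can be sketched with the standard library alone. This is a minimal sketch, not part of the service: the helper names (`build_ocr_url`, `encode_multipart`, `ocr_file`, `page_texts`) are hypothetical, the base URL assumes the local server from the Quick Start, and the shape of a failed response is an assumption (only the success shape is documented here).

```python
import json
import os
import uuid
from urllib import parse, request


def build_ocr_url(base_url, languages=None):
    """Return the /ocr URL, adding ?languages=... when languages are given."""
    url = base_url.rstrip("/") + "/ocr"
    if languages:
        # Comma-separated list, as in the documented ?languages=en,de form.
        query = parse.urlencode({"languages": ",".join(languages)}, safe=",")
        url += "?" + query
    return url


def encode_multipart(field_name, filename, data,
                     content_type="application/octet-stream"):
    """Encode one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def ocr_file(path, base_url="http://localhost:8000", languages=None,
             timeout=120):
    """Upload a file to POST /ocr and return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, content_type = encode_multipart(
            "file", os.path.basename(path), fh.read()
        )
    req = request.Request(
        build_ocr_url(base_url, languages),
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def page_texts(payload):
    """Map page number to text, using the documented success response shape."""
    # Assumption: a failed request has "success" false or missing.
    if not payload.get("success"):
        raise RuntimeError("OCR request did not report success")
    return {page["page"]: page["text"] for page in payload.get("pages", [])}


# Usage (requires a running server and a local sample.pdf):
#   result = ocr_file("sample.pdf", languages=["en", "de"])
#   for number, text in page_texts(result).items():
#       print(number, text[:80])
```

The generous default timeout is a guess: the first request may be slow if the EasyOCR models still have to be downloaded.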