metadata
title: QuickPDF OCR
emoji: 📄
colorFrom: blue
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
QuickPDF Studio: OCR Microservice
A dedicated, self-hosted OCR extraction API built with FastAPI, EasyOCR, and PyMuPDF.
Features
- Pure REST API: No frontend bloat, just high-performance endpoints.
- Multi-Format Support: JPG, PNG, WebP, and multi-page scanned PDFs.
- Localized: Pre-configured for English ('en') and German ('de').
- Privacy First: Files are processed in memory and never persisted to disk.
- Docker Ready: Deploys on any system that can run Docker.
Quick Start (Local)
Install Dependencies:
pip install -r requirements.txt
Run Server:
uvicorn main:app --reload
Check Health:
curl http://localhost:8000/health
API Usage
POST /ocr
Extract text from a document.
Parameters:
- file: Multipart/form-data upload.
- languages (Optional): Query param (e.g. ?languages=en,de).
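How the service interprets the languages value is not shown in this README. As a minimal sketch, a comma-separated value such as en,de could be turned into a list of codes as below. The helper name parse_languages, the DEFAULT_LANGUAGES constant, and the fallback to English and German when the parameter is missing are all assumptions made for illustration, not the service's confirmed behavior.

```python
# Assumption: the default mirrors the two languages the README says are pre-configured.
DEFAULT_LANGUAGES = ["en", "de"]


def parse_languages(raw=None):
    """Split a comma-separated `languages` value into a clean list of codes."""
    if not raw:
        return list(DEFAULT_LANGUAGES)
    codes = []
    for part in raw.split(","):
        code = part.strip().lower()
        # Skip empty fragments (e.g. "en,,de") and drop duplicates, keeping order.
        if code and code not in codes:
            codes.append(code)
    return codes or list(DEFAULT_LANGUAGES)
```

With this sketch, "en,de" and " EN , de,en " both yield the same two codes, and an empty or missing value falls back to the assumed default.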
Example Request (Curl):
curl -X 'POST' \
'http://localhost:8000/ocr?languages=en,de' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@sample.pdf'
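The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call: one file under the form field "file", plus the optional languages query parameter. The helper names (build_ocr_url, build_multipart, post_ocr) are hypothetical, and the base URL assumes the local server from Quick Start.

```python
import json
import os
import uuid
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_ocr_url(base_url, languages=None):
    """Return the /ocr URL, adding ?languages=... only when languages are given."""
    url = base_url.rstrip("/") + "/ocr"
    if languages:
        # safe="," keeps the comma literal, matching the ?languages=en,de form.
        url += "?" + urlencode({"languages": ",".join(languages)}, safe=",")
    return url


def build_multipart(filename, data, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body under the field name 'file'."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_ocr(path, base_url="http://localhost:8000", languages=None):
    """Upload a local file to POST /ocr and return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, header = build_multipart(os.path.basename(path), fh.read())
    request = Request(
        build_ocr_url(base_url, languages),
        data=body,
        headers={"Content-Type": header, "Accept": "application/json"},
        method="POST",
    )
    with urlopen(request) as response:
        return json.load(response)
```

Calling post_ocr("sample.pdf", languages=["en", "de"]) against a running server would issue the request shown in the curl example. The whole file is read into memory before upload, which is fine for small scans but worth revisiting for very large PDFs.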
Example Response:
{
  "success": true,
  "text": "Full extracted text...",
  "pages": [
    {"page": 1, "text": "Page 1 text..."}
  ],
  "language_used": ["en", "de"]
}
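A client will usually want the per-page text rather than the raw JSON. The sketch below reads the fields shown in the example response ("success", "pages", "page", "text"). The function name pages_from_response is hypothetical, and because this README does not document what a failed response looks like, treating a missing or false "success" as an error is an assumption.

```python
def pages_from_response(payload):
    """Map page number to extracted text for a decoded /ocr response."""
    # Assumption: anything other than a truthy "success" is treated as a failure.
    if not payload.get("success"):
        raise ValueError("OCR request did not succeed")
    return {entry["page"]: entry["text"] for entry in payload.get("pages", [])}


# The example response from this README, as a Python dict.
example = {
    "success": True,
    "text": "Full extracted text...",
    "pages": [{"page": 1, "text": "Page 1 text..."}],
    "language_used": ["en", "de"],
}
pages = pages_from_response(example)
```

Keying the result by page number makes it easy to look up a single page of a multi-page scanned PDF without re-scanning the list.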
Docker Deployment
Build Image:
docker build -t quickpdf-ocr .
Run Container:
docker run -d -p 8000:8000 --name ocr-service quickpdf-ocr
Notes
- Models: On first run, EasyOCR will download approximately 150 MB of trained models to ~/.EasyOCR.
- Resources: This service is optimized for CPU. It is recommended to have at least 2 GB of RAM available.