quickpdf-ocr / README.md
zasprince8's picture
Upload 5 files
cebe5cf verified
---
title: QuickPDF OCR
emoji: ๐Ÿ“„
colorFrom: blue
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---
# QuickPDF Studio: OCR Microservice
A dedicated, self-hosted OCR extraction API built with **FastAPI**, **EasyOCR**, and **PyMuPDF**.
## Features
- **Pure REST API**: No frontend bloat, just high-performance endpoints.
- **Multi-Format Support**: JPG, PNG, WebP, and multi-page scanned PDFs.
- **Localized**: Pre-configured for English ('en') and German ('de').
- **Privacy First**: Files are processed in memory and never persisted to disk.
- **Docker Ready**: Easy deployment on any system.
## Quick Start (Local)
1. **Install Dependencies**:
```bash
pip install -r requirements.txt
```
2. **Run Server**:
```bash
uvicorn main:app --reload
```
3. **Check Health**:
```bash
curl http://localhost:8000/health
```
## API Usage
### `POST /ocr`
Extract text from a document.
**Parameters**:
- `file`: Multipart/form-data upload.
- `languages` (Optional): Query param (e.g. `?languages=en,de`).
**Example Request (Curl)**:
```bash
curl -X 'POST' \
'http://localhost:8000/ocr?languages=en,de' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@sample.pdf'
```
**Example Response**:
```json
{
"success": true,
"text": "Full extracted text...",
"pages": [
{"page": 1, "text": "Page 1 text..."}
],
"language_used": ["en", "de"]
}
```
## Docker Deployment
1. **Build Image**:
```bash
docker build -t quickpdf-ocr .
```
2. **Run Container**:
```bash
docker run -d -p 8000:8000 --name ocr-service quickpdf-ocr
```
## Notes
- **Models**: On first run, EasyOCR will download approximately 150MB of trained models to `~/.EasyOCR`.
- **Resources**: This service is optimized for CPU. It is recommended to have at least 2GB of RAM available.