---
title: QuickPDF OCR
emoji: 📄
colorFrom: blue
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---

QuickPDF Studio: OCR Microservice

A dedicated, self-hosted OCR extraction API built with FastAPI, EasyOCR, and PyMuPDF.

Features

Pure REST API: No frontend bloat, just high-performance endpoints.
Multi-Format Support: JPG, PNG, WebP, and multi-page scanned PDFs.
Localized: Pre-configured for English ('en') and German ('de').
Privacy First: Files are processed in memory and never persisted to disk.
Docker Ready: Deploys on any system that runs Docker.

Quick Start (Local)

Install Dependencies:
```
pip install -r requirements.txt
```
Run Server:
```
uvicorn main:app --reload
```
Check Health:
```
curl http://localhost:8000/health
```
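If you script the startup, a small polling helper can wait for the health endpoint before sending work. The sketch below uses only the standard library. The names `is_healthy` and `wait_until_healthy` are hypothetical, and because the response body of `/health` is not documented here, the check looks at the HTTP status code only.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:8000/health", timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_healthy(probe=is_healthy, attempts: int = 10, delay: float = 1.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Passing the probe as an argument keeps the retry loop easy to test without a running server.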

API Usage

`POST /ocr`

Extract text from a document.

Parameters:

file: The document to process, sent as a multipart/form-data upload.
languages (Optional): Comma-separated language codes passed as a query parameter (e.g. ?languages=en,de).
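As an illustration of how a comma-separated `languages` value can be interpreted, here is a minimal sketch. It is not taken from the service's source: `parse_languages` and `DEFAULT_LANGUAGES` are hypothetical names, and the fallback to English and German is an assumption based on the pre-configured languages listed under Features.

```python
from typing import Iterable, List, Optional

# Assumption: defaults mirror the languages this README says are pre-configured.
DEFAULT_LANGUAGES = ("en", "de")


def parse_languages(raw: Optional[str], defaults: Iterable[str] = DEFAULT_LANGUAGES) -> List[str]:
    """Turn 'en,de' into ['en', 'de'], falling back to `defaults` when empty."""
    codes: List[str] = []
    for part in (raw or "").split(","):
        code = part.strip().lower()
        # Skip empty fragments and duplicates while keeping the original order.
        if code and code not in codes:
            codes.append(code)
    return codes or list(defaults)
```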

Example Request (Curl):

```
curl -X 'POST' \
  'http://localhost:8000/ocr?languages=en,de' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -F 'file=@sample.pdf'
```
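The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the service runs at `http://localhost:8000` and accepts the upload under the form field `file` as in the curl call above. The helper names `build_multipart` and `ocr_file` are hypothetical.

```python
import json
import mimetypes
import os
import uuid
from typing import Sequence, Tuple
from urllib import parse, request


def build_multipart(field_name: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def ocr_file(
    path: str,
    base_url: str = "http://localhost:8000",  # assumption: local default port
    languages: Sequence[str] = ("en", "de"),
) -> dict:
    """POST the file at `path` to /ocr and return the decoded JSON response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    # safe="," keeps the comma literal, matching ?languages=en,de in the curl example.
    query = parse.urlencode({"languages": ",".join(languages)}, safe=",")
    req = request.Request(
        f"{base_url}/ocr?{query}",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, `ocr_file("sample.pdf")` would return the parsed JSON body as a dictionary.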

Example Response:

```
{
  "success": true,
  "text": "Full extracted text...",
  "pages": [
    {"page": 1, "text": "Page 1 text..."}
  ],
  "language_used": ["en", "de"]
}
```
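A client will usually want the per-page text in order. The sketch below works on a response shaped like the example above. `pages_to_text` is a hypothetical helper, and raising an error when `success` is false is an assumption, since the error response format is not documented here.

```python
# Same shape as the example response above, written as a Python dict.
SAMPLE_RESPONSE = {
    "success": True,
    "text": "Full extracted text...",
    "pages": [{"page": 1, "text": "Page 1 text..."}],
    "language_used": ["en", "de"],
}


def pages_to_text(response: dict, separator: str = "\n\n") -> str:
    """Join the per-page text in page order, separated by `separator`."""
    if not response.get("success"):
        raise ValueError("OCR request did not succeed")
    pages = sorted(response.get("pages", []), key=lambda p: p["page"])
    return separator.join(p["text"] for p in pages)


print(pages_to_text(SAMPLE_RESPONSE))
```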

Docker Deployment

Build Image:
```
docker build -t quickpdf-ocr .
```

Run Container:

```
docker run -d -p 8000:8000 --name ocr-service quickpdf-ocr
```

Notes

Models: On first run, EasyOCR downloads approximately 150 MB of trained models to ~/.EasyOCR.
Resources: This service is optimized for CPU. At least 2 GB of available RAM is recommended.