
zasprince8

Upload 5 files

cebe5cf verified about 1 month ago

preview code

raw

history blame contribute delete

1.85 kB

---
title: QuickPDF OCR
emoji: 📄
colorFrom: blue
colorTo: blue
sdk: docker
app_port: 8000
pinned: false
---

# QuickPDF Studio: OCR Microservice

A dedicated, self-hosted OCR extraction API built with FastAPI, EasyOCR, and PyMuPDF.

## Features
- Pure REST API: no bundled web frontend, only HTTP endpoints.
- Multi-Format Support: JPG, PNG, WebP, and multi-page scanned PDFs.
- Languages: preconfigured for English (`en`) and German (`de`).
- Privacy First: files are processed in memory and never persisted to disk.
- Docker Ready: ships with a Dockerfile for containerized deployment.
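Because uploads are handled in memory, a service like this can pick the right decoder by looking at the first bytes of the upload instead of trusting its file name. The sketch below is a hypothetical illustration only: `sniff_format` is not part of this repository, and the service may instead rely on file extensions or the declared MIME type. It covers just the four formats listed above.

```python
from typing import Optional


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the upload type from its leading "magic" bytes.

    Hypothetical helper: returns "pdf", "png", "jpeg" or "webp", else None.
    """
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # WebP is a RIFF container: b"RIFF", a 4-byte size field, then b"WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


if __name__ == "__main__":
    # In a real handler the bytes would come straight from the upload, never from disk.
    print(sniff_format(b"%PDF-1.7\n"))  # pdf
```

Checking signatures rather than extensions means a mislabeled file still reaches the right code path, and the check needs nothing beyond the bytes already held in memory.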

## Quick Start (Local)

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. Run the server (`--reload` restarts it on code changes and is meant for development):
```bash
uvicorn main:app --reload
```

3. Check health:
```bash
curl http://localhost:8000/health
```

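## API Usage

### `POST /ocr`
Extract text from an uploaded image or PDF.

Parameters:
- `file`: the document to process, uploaded as a `multipart/form-data` file field.
- `languages` (optional): comma-separated language codes, passed as a query parameter (e.g. `?languages=en,de`).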
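A hedged sketch of how a `languages` value such as `en,de` might be interpreted, assuming comma-separated codes validated against the two preconfigured languages. `parse_languages` is a hypothetical helper, and falling back to both languages when the parameter is omitted is an assumption: this README does not state the default.

```python
from typing import Iterable, List, Optional

# The two languages this README lists as preconfigured.
SUPPORTED_LANGUAGES = ("en", "de")


def parse_languages(raw: Optional[str],
                    supported: Iterable[str] = SUPPORTED_LANGUAGES) -> List[str]:
    """Turn a `languages` query value such as "en,de" into a validated list.

    Assumption: a missing or empty value falls back to every supported language.
    """
    allowed = list(supported)
    codes: List[str] = []
    for part in (raw or "").split(","):
        code = part.strip().lower()
        if not code or code in codes:
            continue  # ignore blanks ("en,,de") and repeats ("en,en")
        if code not in allowed:
            raise ValueError(f"unsupported language code: {code!r}")
        codes.append(code)
    return codes or allowed


if __name__ == "__main__":
    print(parse_languages("en,de"))  # ['en', 'de']
    print(parse_languages(None))     # ['en', 'de']
```

Rejecting unknown codes up front gives the caller a clear error instead of a failed model load deeper in the OCR pipeline.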

Example Request (curl):
```bash
curl -X POST \
  'http://localhost:8000/ocr?languages=en,de' \
  -H 'accept: application/json' \
  -F 'file=@sample.pdf'
```
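The same request can be made from Python with only the standard library. This is a minimal sketch: `build_ocr_request` and `ocr_file` are hypothetical names, the 300-second timeout is an arbitrary allowance for large scanned PDFs, and only the `file` field and the `languages` parameter come from this README.

```python
import json
import mimetypes
import os
import urllib.parse
import urllib.request
import uuid
from typing import Optional, Sequence


def build_ocr_request(base_url: str, filename: str, data: bytes,
                      languages: Optional[Sequence[str]] = None,
                      content_type: str = "application/octet-stream") -> urllib.request.Request:
    """Build (but do not send) a POST /ocr request like the curl example."""
    boundary = uuid.uuid4().hex
    # A single multipart part named "file", as curl's -F 'file=@sample.pdf' produces.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = base_url.rstrip("/") + "/ocr"
    if languages:
        # safe="," keeps the comma literal, giving ?languages=en,de
        url += "?" + urllib.parse.urlencode({"languages": ",".join(languages)}, safe=",")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def ocr_file(base_url: str, path: str, languages: Optional[Sequence[str]] = None,
             timeout: float = 300.0) -> dict:
    """Upload a local file to the service and return the decoded JSON response."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        req = build_ocr_request(base_url, os.path.basename(path), fh.read(),
                                languages, content_type)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `ocr_file("http://localhost:8000", "sample.pdf", ["en", "de"])` sends the same request as the curl command and returns the parsed JSON. Splitting request construction from sending keeps the multipart encoding testable without a running server.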

Example Response:
```json
{
  "success": true,
  "text": "Full extracted text...",
  "pages": [
    {"page": 1, "text": "Page 1 text..."}
  ],
  "language_used": ["en", "de"]
}
```
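Clients can map this response onto typed objects. The sketch below relies only on the four fields shown above; `parse_ocr_response` and the dataclasses are hypothetical, and treating a missing or false `success` as an error is an assumption, since the error format is not documented here.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PageText:
    page: int
    text: str


@dataclass
class OcrResult:
    text: str
    pages: List[PageText]
    language_used: List[str]


def parse_ocr_response(payload: str) -> OcrResult:
    """Decode a /ocr JSON body with the fields shown in the example response."""
    data = json.loads(payload)
    # Assumption: anything without "success": true is a failure.
    if not data.get("success"):
        raise RuntimeError(f"OCR request failed: {data!r}")
    pages = [PageText(page=int(p["page"]), text=p["text"]) for p in data.get("pages", [])]
    pages.sort(key=lambda p: p.page)  # keep page order stable for callers
    return OcrResult(text=data["text"], pages=pages,
                     language_used=list(data.get("language_used", [])))
```

Keeping `pages` alongside the combined `text` lets a caller cite where a passage came from in a multi-page PDF.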

## Docker Deployment

1. Build the image:
```bash
docker build -t quickpdf-ocr .
```

2. Run the container in the background, publishing port 8000:
```bash
docker run -d -p 8000:8000 --name ocr-service quickpdf-ocr
```
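Because `-d` starts the container in the background, scripts that call the API right away may want to wait until `GET /health` responds. A minimal polling sketch, assuming a healthy service answers with HTTP 200 (the body of `/health` is not documented here); `wait_for_health` and its default timeouts are hypothetical.

```python
import http.client
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:8000",
                    timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll GET {base_url}/health until it returns HTTP 200 or `timeout` expires."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, timed out, or an HTTP error status
            # (HTTPError is an OSError subclass): not ready yet, try again.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

A deployment script could call `wait_for_health()` right after `docker run` and abort if it returns `False`.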

## Notes
- Models: on first run, EasyOCR downloads approximately 150 MB of trained models to `~/.EasyOCR`.
- Resources: the service is designed to run on CPU; at least 2 GB of available RAM is recommended.
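Since the first run triggers a download of roughly 150 MB, it can help to check whether the models are already cached before starting the service. This sketch rests on an assumption about EasyOCR's configuration that should be verified against the installed version: that it honours an `EASYOCR_MODULE_PATH` environment variable, otherwise uses `~/.EasyOCR`, and keeps weights in a `model` subfolder. Both helper names are hypothetical.

```python
import os
from pathlib import Path


def easyocr_model_dir() -> Path:
    """Best-guess location of EasyOCR's downloaded model weights.

    Assumption: EASYOCR_MODULE_PATH overrides the default ~/.EasyOCR base,
    and weights live in its "model" subfolder.
    """
    base = os.environ.get("EASYOCR_MODULE_PATH") or str(Path.home() / ".EasyOCR")
    return Path(base).expanduser() / "model"


def cache_size_mb(directory: Path) -> float:
    """Total size of regular files under `directory` in MB (0.0 if it is missing)."""
    if not directory.is_dir():
        return 0.0
    total = sum(f.stat().st_size for f in directory.rglob("*") if f.is_file())
    return total / (1024 * 1024)


if __name__ == "__main__":
    model_dir = easyocr_model_dir()
    size = cache_size_mb(model_dir)
    if size == 0:
        print(f"No models cached in {model_dir}; expect a download on first run.")
    else:
        print(f"{size:.0f} MB of models cached in {model_dir}")
```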