---
title: Document OCR
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
short_description: Multi-model OCR pipeline running on SIE
---
# Document OCR
A small Hugging Face Space that runs three different OCR-class models through one
[SIE](https://github.com/superlinked/sie) inference engine. Pick a sample
document on the left, swap any of the three models in the dropdowns, and watch
SIE hot-swap them with a single identifier change.
## What runs in this Space
A single Docker container with two processes:
- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, no preload
(lazy-loads models on first click to fit free-tier memory).
- A small Node web server on `0.0.0.0:7860` that serves the UI and
proxies requests to SIE via SSE.
Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-transformers5`.
HF Spaces' persistent `/data` directory is used as the HuggingFace cache so
model weights survive Space restarts.
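The Node server relays SIE's streamed responses to the browser as Server-Sent
Events. As a hypothetical illustration of that plumbing (the Space's actual
server code isn't shown here), a minimal SSE frame parser might look like this:

```typescript
// Minimal text/event-stream frame parser, as a proxy relaying sie-server
// output might use. Sketch only: real SSE streams can split frames across
// network chunks, which this ignores.
interface SseEvent {
  event: string;
  data: string;
}

function parseSse(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Frames are separated by a blank line.
  for (const frame of chunk.split("\n\n")) {
    if (!frame.trim()) continue;
    let event = "message"; // default event name per the SSE spec
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trimStart());
    }
    events.push({ event, data: data.join("\n") });
  }
  return events;
}
```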
## Model lineup
| Stage | Default | Alternates (lazy-load on click) |
|---|---|---|
| Recognition | `lightonai/LightOnOCR-2-1B` (2.1B, Markdown output) | PaddleOCR-VL, GLM-OCR (GPU-only; disabled on the CPU image) |
| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA, Donut-RVLCDIP |
| NER | `urchade/gliner_multi-v2.1` | GLiNER-large, GLiNER-PII, NuNER-Zero |
The default trio totals ~5 GB (LightOnOCR is the largest at ~4 GB); alternates
lazy-load on first click.
## What SIE provides here
Three different model architectures, one API:
```
client.extract(model_id, { images: [bytes] })
```
The model ID alone decides whether you get VLM Markdown (LightOnOCR),
structured JSON (Donut), or typed entities (GLiNER / NuNER). No separate
auth, no separate rate limit, no separate deployment story.
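That dispatch can be reduced, purely for illustration, to a lookup from model
ID to output kind. The mapping below mirrors the lineup table above; the
function name and types are hypothetical, not part of SIE's API:

```typescript
// Illustrative only: which output shape each model ID in this Space yields.
type OutputKind = "markdown" | "structured-json" | "entities";

function outputKindFor(modelId: string): OutputKind {
  if (modelId.includes("LightOnOCR")) return "markdown";   // VLM Markdown output
  if (modelId.includes("donut")) return "structured-json"; // Donut structured JSON
  return "entities";                                       // GLiNER / NuNER typed entities
}
```

The point is that the caller never branches like this: one `extract` call, and
SIE routes to the right adapter by ID.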
## Source
Built from the `document-ocr` demo in
[superlinked/brave-new-demos](https://github.com/superlinked/brave-new-demos/tree/main/document-ocr).
The local-Docker version uses `docker compose` against the same upstream
SIE image; this Space packages everything into one container for HF.
## Performance note
This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). The first click
for each model is a cold load (60-180 s) while weights download and the
adapter spins up. Subsequent clicks reuse the cached weights and run in
20-30 s. On a GPU Space (paid), recognition drops to a few seconds and the
heavier models like GLM-OCR become tractable.
The SIE image this Space runs on is `latest-cpu-transformers5`, where the
LightOnOCR adapter lives. Florence-2 ships in the sibling `default`
bundle (which pins `transformers<5`) and is not available on this image;
see [sie-internal#828](https://github.com/superlinked/sie-internal/issues/828)
for the bundle-composition story.
Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0).