---
title: Document OCR
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
short_description: Multi-model OCR pipeline running on SIE
---
# Document OCR
A small Hugging Face Space that runs three different OCR-class models through one
[SIE](https://github.com/superlinked/sie) inference engine. Pick a sample
document on the left, swap any of the three models in the dropdowns, and watch
SIE hot-swap them with a single identifier change.
## What runs in this Space
A single Docker container with two processes:
- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, no preload
(lazy-loads models on first click to fit free-tier memory).
- A small Node web server on `0.0.0.0:7860` that serves the UI and
proxies requests to SIE via SSE.
Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-transformers5`.
HF Spaces' persistent `/data` directory is used as the HuggingFace cache so
model weights survive Space restarts.
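The Node server relays SIE's streamed responses to the browser as Server-Sent
Events. As a hypothetical illustration of that plumbing (the Space's actual
server code isn't shown here), a minimal SSE frame parser might look like this:

```typescript
// Minimal text/event-stream frame parser, as a proxy relaying sie-server
// output might use. Sketch only: real SSE streams can split frames across
// network chunks, which this ignores.
interface SseEvent {
  event: string;
  data: string;
}

function parseSse(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Frames are separated by a blank line.
  for (const frame of chunk.split("\n\n")) {
    if (!frame.trim()) continue;
    let event = "message"; // default event name per the SSE spec
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trimStart());
    }
    events.push({ event, data: data.join("\n") });
  }
  return events;
}
```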
## Model lineup
| Stage | Default | Alternates (lazy-load on click) |
|---|---|---|
| Recognition | `lightonai/LightOnOCR-2-1B` (2.1B, Markdown output) | PaddleOCR-VL, GLM-OCR (GPU-only; disabled on the CPU image) |
| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA, Donut-RVLCDIP |
| NER | `urchade/gliner_multi-v2.1` | GLiNER-large, GLiNER-PII, NuNER-Zero |
The default trio totals ~5 GB (LightOnOCR is the largest at ~4 GB); alternates
lazy-load on first click.
## What SIE provides here
Three different model architectures, one API:
```
client.extract(model_id, { images: [bytes] })
```
The model ID alone decides whether you get VLM Markdown (LightOnOCR),
structured JSON (Donut), or typed entities (GLiNER / NuNER). No separate
auth, no separate rate limit, no separate deployment story.
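That dispatch can be reduced, purely for illustration, to a lookup from model
ID to output kind. The mapping below mirrors the lineup table above; the
function name and types are hypothetical, not part of SIE's API:

```typescript
// Illustrative only: which output shape each model ID in this Space yields.
type OutputKind = "markdown" | "structured-json" | "entities";

function outputKindFor(modelId: string): OutputKind {
  if (modelId.includes("LightOnOCR")) return "markdown";   // VLM Markdown output
  if (modelId.includes("donut")) return "structured-json"; // Donut structured JSON
  return "entities";                                       // GLiNER / NuNER typed entities
}
```

The point is that the caller never branches like this: one `extract` call, and
SIE routes to the right adapter by ID.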
## Source
Built from the `document-ocr` demo in
[superlinked/brave-new-demos](https://github.com/superlinked/brave-new-demos/tree/main/document-ocr).
The local-Docker version uses `docker compose` against the same upstream
SIE image; this Space packages everything into one container for HF.
## Performance note
This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). The first click
for each model is a cold load (60-180 s) while weights download and the
adapter spins up. Subsequent clicks reuse the cached weights and run in
20-30 s. On a GPU Space (paid), recognition drops to a few seconds and the
heavier models like GLM-OCR become tractable.
The SIE image this Space runs on is `latest-cpu-transformers5`, where the
LightOnOCR adapter lives. Florence-2 ships in the sibling `default`
bundle (which pins `transformers<5`) and is not available on this image;
see [sie-internal#828](https://github.com/superlinked/sie-internal/issues/828)
for the bundle-composition story.
Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0).