---
title: Document OCR
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
short_description: Multi-model OCR pipeline running on SIE
---
# Document OCR
A small Hugging Face Space that runs three different OCR-class models through one SIE inference engine. Pick a sample document on the left, swap any of the three models via the dropdowns, and watch SIE hot-swap them with a single identifier change.
## What runs in this Space
A single Docker container with two processes:

- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, with no preload (models lazy-load on first click to fit free-tier memory).
- A small Node web server on `0.0.0.0:7860` that serves the UI and proxies requests to SIE via SSE.
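The SSE relay in the web server can be sketched as below. This is a minimal illustration, not the Space's actual code: the helper names (`parseSSELine`, `parseSSEChunk`) are hypothetical; only the SSE wire format itself is standard.

```typescript
// Hypothetical helpers for the proxy's SSE relay path: the Node server
// reads an SSE stream from sie-server and forwards each `data:` payload
// to the browser.

// Extract the payload from one raw SSE line; returns null for blank
// lines, comments (lines starting with ":"), and non-data fields.
function parseSSELine(line: string): string | null {
  if (!line.startsWith("data:")) return null;
  return line.slice("data:".length).trimStart();
}

// Split a buffered chunk into the data payloads it carries.
function parseSSEChunk(chunk: string): string[] {
  return chunk
    .split("\n")
    .map(parseSSELine)
    .filter((p): p is string => p !== null);
}
```

The proxy only needs to forward these payloads verbatim; it never has to understand the per-model output formats.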
Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-transformers5`. HF Spaces' persistent `/data` directory is used as the Hugging Face cache so model weights survive Space restarts.
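One way this wiring can look is pointing the Hugging Face cache environment variable at the persistent volume. This is a hypothetical Dockerfile fragment, not the Space's actual one; `HF_HOME` is the standard variable `huggingface_hub` reads to locate its cache.

```dockerfile
# Point the Hugging Face cache at HF Spaces' persistent /data volume
# so downloaded weights survive Space restarts.
ENV HF_HOME=/data/huggingface
```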
## Model lineup
| Stage | Default | Alternates (lazy-load on click) |
|---|---|---|
| Recognition | `lightonai/LightOnOCR-2-1B` (2.1B, Markdown output) | PaddleOCR-VL, GLM-OCR (GPU-only; disabled on the CPU image) |
| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA, Donut-RVLCDIP |
| NER | `urchade/gliner_multi-v2.1` | GLiNER-large, GLiNER-PII, NuNER-Zero |
The default trio is ~5 GB total (LightOnOCR is the big one at ~4 GB). Alternates lazy-load on first click.
## What SIE provides here
Three different model architectures, one API:
```js
client.extract(model_id, { images: [bytes] })
```
The model ID alone decides whether you get VLM Markdown (LightOnOCR), structured JSON (Donut), or typed entities (GLiNER / NuNER). No separate auth, no separate rate limit, no separate deployment story.
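The three calls can be sketched together as below. This is a minimal sketch: the `SIEClient` interface and `runPipeline` function are illustrative assumptions, and only the `extract(model_id, { images })` call shape comes from the API above.

```typescript
// Hypothetical client wrapper; only the extract() call shape is given
// by the Space's API.
interface SIEClient {
  extract(modelId: string, input: { images: Uint8Array[] }): Promise<unknown>;
}

// Same call shape, three behaviours, selected only by the model ID:
// VLM Markdown, structured JSON, and typed entities.
async function runPipeline(client: SIEClient, page: Uint8Array) {
  const markdown = await client.extract("lightonai/LightOnOCR-2-1B", { images: [page] });
  const fields = await client.extract("naver-clova-ix/donut-base-finetuned-cord-v2", { images: [page] });
  const entities = await client.extract("urchade/gliner_multi-v2.1", { images: [page] });
  return { markdown, fields, entities };
}
```

Swapping a dropdown in the UI only changes the string passed as the first argument.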
## Source
Built from the `document-ocr` demo in `superlinked/brave-new-demos`. The local-Docker version uses `docker compose` against the same upstream SIE image; this Space packages everything into one container for HF.
## Performance note
This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). The first click for each model is a cold load (60-180 s) while weights download and the adapter spins up. Subsequent clicks reuse the cached weights and run in 20-30 s. On a GPU Space (paid), recognition drops to a few seconds and the heavier models like GLM-OCR become tractable.
The SIE image this Space runs on is `latest-cpu-transformers5`, where the LightOnOCR adapter lives. Florence-2 ships in the sibling `default` bundle (which pins `transformers<5`) and is not available on this image; see sie-internal#828 for the bundle-composition story.
Built on SIE (Apache 2.0).