---
title: Document OCR
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
short_description: Multi-model OCR pipeline running on SIE
---

# Document OCR

A small Hugging Face Space that runs three different OCR-class models through one SIE inference engine. Pick a sample document on the left, swap any of the three models via the dropdowns, and watch SIE hot-swap them with a single identifier change.

## What runs in this Space

A single Docker container with two processes:

- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, with no preload: models lazy-load on first click to fit free-tier memory.
- A small Node web server on `0.0.0.0:7860` that serves the UI and proxies requests to SIE via SSE.

Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-transformers5`. The Space's persistent `/data` directory is used as the Hugging Face cache, so model weights survive Space restarts.
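Wiring the cache to `/data` typically comes down to one environment variable. `HF_HOME` is the standard Hugging Face cache root; the exact variable and path used in this image are assumptions:

```dockerfile
# Sketch: extend the upstream SIE image and point the Hugging Face cache
# at the persistent /data volume. The path is an assumption, not the
# Space's actual configuration.
FROM ghcr.io/superlinked/sie-server:latest-cpu-transformers5
ENV HF_HOME=/data/.huggingface
```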

## Model lineup

| Stage | Default | Alternates (lazy-load on click) |
|-------|---------|---------------------------------|
| Recognition | `lightonai/LightOnOCR-2-1B` (2.1B, Markdown output) | PaddleOCR-VL, GLM-OCR (GPU-only; disabled on the CPU image) |
| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA, Donut-RVLCDIP |
| NER | `urchade/gliner_multi-v2.1` | GLiNER-large, GLiNER-PII, NuNER-Zero |

The default trio is ~5 GB total (LightOnOCR is the big one at ~4 GB). Alternates lazy-load on first click.
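The dropdown wiring reduces to a map from stage to model identifiers. The object below is illustrative, not the Space's code: only the three default IDs come from the table above, and the alternates are listed by their display names because the table does not give their full repo IDs.

```javascript
// Illustrative registry backing the three dropdowns. Shape is an assumption;
// default IDs are from the model-lineup table, alternates are display names.
const STAGES = {
  recognition: {
    default: 'lightonai/LightOnOCR-2-1B',
    alternates: ['PaddleOCR-VL', 'GLM-OCR'], // GLM-OCR is GPU-only
  },
  structured: {
    default: 'naver-clova-ix/donut-base-finetuned-cord-v2',
    alternates: ['Donut-DocVQA', 'Donut-RVLCDIP'],
  },
  ner: {
    default: 'urchade/gliner_multi-v2.1',
    alternates: ['GLiNER-large', 'GLiNER-PII', 'NuNER-Zero'],
  },
};

// Selecting a model is just swapping the identifier handed to SIE.
function selectedModel(stage, choice) {
  return choice ?? STAGES[stage].default;
}
```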

## What SIE provides here

Three different model architectures, one API:

```js
client.extract(model_id, { images: [bytes] })
```

The model ID alone decides whether you get VLM Markdown (LightOnOCR), structured JSON (Donut), or typed entities (GLiNER / NuNER). No separate auth, no separate rate limit, no separate deployment story.
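Chaining the three stages is then three calls with the same shape. The sketch below assumes only the `extract(model_id, { images })` call shown above; how `client` is constructed, and the shape of each return value, are assumptions.

```javascript
// Sketch: one call shape, three architectures. Only the model IDs change.
async function runPipeline(client, imageBytes) {
  const page = { images: [imageBytes] };

  // VLM recognition -> Markdown
  const markdown = await client.extract('lightonai/LightOnOCR-2-1B', page);

  // Document parsing -> structured JSON (receipt fields, for CORD)
  const fields = await client.extract(
    'naver-clova-ix/donut-base-finetuned-cord-v2',
    page
  );

  // Zero-shot NER -> typed entities
  const entities = await client.extract('urchade/gliner_multi-v2.1', page);

  return { markdown, fields, entities };
}
```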

## Source

Built from the `document-ocr` demo in `superlinked/brave-new-demos`. The local-Docker version uses `docker compose` against the same upstream SIE image; this Space packages everything into one container for HF.

## Performance note

This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). The first click for each model is a cold load (60-180 s) while weights download and the adapter spins up. Subsequent clicks reuse the cached weights and run in 20-30 s. On a GPU Space (paid), recognition drops to a few seconds and the heavier models like GLM-OCR become tractable.
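A UI calling into a lazy-loading backend like this usually wants a retry-with-backoff wrapper around the first request, so a 60-180 s cold load surfaces as a spinner rather than a timeout error. The helper below is a generic client-side pattern, not part of the Space; names and timings are illustrative.

```javascript
// Generic cold-start pattern: retry with exponential backoff while the
// first request warms the model. Illustrative only.
const delays = (baseMs, factor, n) =>
  Array.from({ length: n }, (_, i) => baseMs * factor ** i);

async function withColdStartRetry(fn, { tries = 4, baseMs = 5000 } = {}) {
  let lastErr;
  for (const ms of delays(baseMs, 2, tries)) {
    try {
      return await fn(); // succeeds immediately once weights are cached
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, ms)); // wait out the cold load
    }
  }
  throw lastErr;
}
```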

The SIE image this Space runs on is `latest-cpu-transformers5`, where the LightOnOCR adapter lives. Florence-2 ships in the sibling `default` bundle (which pins `transformers<5`) and is not available on this image; see sie-internal#828 for the bundle-composition story.

Built on SIE (Apache 2.0).