---
title: Document OCR
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
short_description: Multi-model OCR pipeline running on SIE
---

# Document OCR

A small Hugging Face Space that runs three different OCR-class models through one [SIE](https://github.com/superlinked/sie) inference engine. Pick a sample document on the left, swap any of the three models in the dropdowns, and watch SIE hot-swap them with a single identifier change.

## What runs in this Space

A single Docker container with two processes:

- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, started without preload (models lazy-load on first click to fit free-tier memory).
- A small Node web server on `0.0.0.0:7860` that serves the UI and proxies requests to SIE via SSE.

Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-transformers5`. HF Spaces' persistent `/data` directory is used as the Hugging Face cache so model weights survive Space restarts.

## Model lineup

| Stage | Default | Alternates (lazy-load on click) |
|---|---|---|
| Recognition | `lightonai/LightOnOCR-2-1B` (2.1B, Markdown output) | PaddleOCR-VL, GLM-OCR (GPU-only; disabled on the CPU image) |
| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA, Donut-RVLCDIP |
| NER | `urchade/gliner_multi-v2.1` | GLiNER-large, GLiNER-PII, NuNER-Zero |

The default trio totals ~5 GB (LightOnOCR is the largest at ~4 GB). Alternates lazy-load on first click.

## What SIE provides here

Three different model architectures, one API:

```
client.extract(model_id, { images: [bytes] })
```

The model ID alone decides whether you get VLM Markdown (LightOnOCR), structured JSON (Donut), or typed entities (GLiNER / NuNER). No separate auth, no separate rate limit, no separate deployment story.

## Source

Built from the `document-ocr` demo in [superlinked/brave-new-demos](https://github.com/superlinked/brave-new-demos/tree/main/document-ocr).
The local-Docker version uses `docker compose` against the same upstream SIE image; this Space packages everything into one container for HF.

## Performance note

This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). The first click for each model is a cold load (60-180 s) while weights download and the adapter spins up. Subsequent clicks reuse the cached weights and run in 20-30 s. On a GPU Space (paid), recognition drops to a few seconds and heavier models like GLM-OCR become tractable.

The SIE image this Space runs on is `latest-cpu-transformers5`, where the LightOnOCR adapter lives. Florence-2 ships in the sibling `default` bundle (which pins `transformers<5`) and is not available on this image; see [sie-internal#828](https://github.com/superlinked/sie-internal/issues/828) for the bundle-composition story.

Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0).
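To make the one-call pattern concrete, here is a minimal mocked sketch of how the three stages share a single `extract` signature while the model ID alone selects the output shape. Everything below the signature itself is an assumption for illustration: the response shapes (`kind`, `text`, `fields`, `entities`) are stand-ins, not SIE's documented output format.

```javascript
// Mocked client: same extract() signature as the snippet above; the model
// ID alone decides the shape of the result. Response shapes are
// illustrative stand-ins, not SIE's real output format.
const mockClient = {
  async extract(modelId, { images }) {
    if (modelId.includes("LightOnOCR")) {
      // VLM recognition: Markdown text
      return { kind: "markdown", text: "# Invoice\n\nTotal: $42.00" };
    }
    if (modelId.includes("donut")) {
      // Donut: structured key/value JSON
      return { kind: "json", fields: { total: "42.00", currency: "USD" } };
    }
    // GLiNER / NuNER: typed entities
    return { kind: "entities", entities: [{ text: "$42.00", label: "amount" }] };
  },
};

// One call site for all three stages; only the identifier changes.
async function runPipeline(client, imageBytes) {
  const results = {};
  for (const modelId of [
    "lightonai/LightOnOCR-2-1B",
    "naver-clova-ix/donut-base-finetuned-cord-v2",
    "urchade/gliner_multi-v2.1",
  ]) {
    results[modelId] = await client.extract(modelId, { images: [imageBytes] });
  }
  return results;
}

runPipeline(mockClient, new Uint8Array()).then((r) =>
  console.log(Object.values(r).map((x) => x.kind).join(","))
);
// prints "markdown,json,entities"
```

Swapping a dropdown in the UI only changes the string passed as `modelId`; nothing else in the call site moves, which is the point of the single-API design.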