---
title: Document OCR
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
short_description: Multi-model OCR pipeline running on SIE
---

# Document OCR
A small Hugging Face Space that runs three different OCR-class models through one
[SIE](https://github.com/superlinked/sie) inference engine. Pick a sample
document on the left, swap any of the three models via the dropdowns, and watch
SIE hot-swap them with a single identifier change.
## What runs in this Space

A single Docker container with two processes:

- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, with no preload
  (models lazy-load on first click to fit free-tier memory).
- A small Node web server on `0.0.0.0:7860` that serves the UI and
  proxies requests to SIE via SSE.
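The SSE leg of that proxying can be illustrated with a tiny frame formatter. This is an illustrative sketch, not the Space's actual code; `toSseFrame` and the event payload are assumptions:

```javascript
// Toy sketch of the SSE relay idea: the Node server streams SIE progress
// back to the browser as Server-Sent Events. An SSE frame is an optional
// "event:" line, one or more "data:" lines, then a blank line.
// (toSseFrame and the payload shape are illustrative, not from the Space.)
function toSseFrame(event, data) {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// e.g. forwarding a hypothetical model-load progress event to the browser:
process.stdout.write(toSseFrame("progress", { stage: "recognition", pct: 40 }));
```

Streaming progress this way keeps the browser connection simple (plain `EventSource`) while the heavy lifting stays behind the loopback-only SIE port.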
Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-transformers5`.
HF Spaces' persistent `/data` directory is used as the Hugging Face cache so
model weights survive Space restarts.
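A minimal sketch of how that image could be wired, assuming a hypothetical `start.sh` entrypoint and an `/app/web` directory for the Node server (paths and the cache location are illustrative, not the Space's actual build):

```dockerfile
# Hypothetical sketch -- extends the SIE CPU image named in this README.
FROM ghcr.io/superlinked/sie-server:latest-cpu-transformers5

# Point the Hugging Face cache at the Space's persistent /data volume so
# downloaded weights survive restarts (HF_HOME is the standard cache env var;
# the /data/huggingface subpath is an assumption).
ENV HF_HOME=/data/huggingface

COPY web/ /app/web/
COPY start.sh /app/start.sh

# start.sh (illustrative) would launch sie-server on 127.0.0.1:8080 and the
# Node UI server on 0.0.0.0:7860, the port HF Spaces routes traffic to.
EXPOSE 7860
CMD ["/app/start.sh"]
```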
## Model lineup

| Stage | Default | Alternates (lazy-load on click) |
|---|---|---|
| Recognition | `lightonai/LightOnOCR-2-1B` (2.1B, Markdown output) | PaddleOCR-VL, GLM-OCR (GPU-only; disabled on the CPU image) |
| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA, Donut-RVLCDIP |
| NER | `urchade/gliner_multi-v2.1` | GLiNER-large, GLiNER-PII, NuNER-Zero |
The default trio is ~5 GB total (LightOnOCR is the big one at ~4 GB).
Alternates lazy-load on first click.
## What SIE provides here

Three different model architectures, one API:

```
client.extract(model_id, { images: [bytes] })
```

The model ID alone decides whether you get VLM Markdown (LightOnOCR),
structured JSON (Donut), or typed entities (GLiNER / NuNER). No separate
auth, no separate rate limit, no separate deployment story.
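The single-entry-point idea can be sketched as a toy dispatcher. This is illustrative JavaScript, not the real SIE client; the handler bodies are stand-ins for the actual model adapters:

```javascript
// Toy illustration of "one API, model ID selects behavior": a single
// extract() routes to a per-model handler, mirroring how the Space swaps
// models by changing only the identifier string. Handler outputs here are
// placeholders, not real model results.
const handlers = {
  "lightonai/LightOnOCR-2-1B":
    (images) => ({ kind: "markdown", pages: images.length }),
  "naver-clova-ix/donut-base-finetuned-cord-v2":
    (images) => ({ kind: "json", pages: images.length }),
  "urchade/gliner_multi-v2.1":
    (images) => ({ kind: "entities", pages: images.length }),
};

function extract(modelId, { images }) {
  const handler = handlers[modelId];
  if (!handler) throw new Error(`unknown model: ${modelId}`);
  return handler(images);
}

console.log(extract("urchade/gliner_multi-v2.1", { images: ["page-1"] }).kind);
```

Swapping a dropdown in the UI amounts to changing the key passed to `extract` — the request shape stays identical across all three stages.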
## Source

Built from the `document-ocr` demo in
[superlinked/brave-new-demos](https://github.com/superlinked/brave-new-demos/tree/main/document-ocr).
The local-Docker version uses `docker compose` against the same upstream
SIE image; this Space packages everything into one container for HF.
## Performance note

This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). The first click
for each model is a cold load (60-180 s) while weights download and the
adapter spins up. Subsequent clicks reuse the cached weights and run in
20-30 s. On a GPU Space (paid), recognition drops to a few seconds and the
heavier models like GLM-OCR become tractable.

The SIE image this Space runs on is `latest-cpu-transformers5`, where the
LightOnOCR adapter lives. Florence-2 ships in the sibling `default`
bundle (which pins `transformers<5`) and is not available on this image;
see [sie-internal#828](https://github.com/superlinked/sie-internal/issues/828)
for the bundle-composition story.
| Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0). | |