---
title: Document OCR
emoji: 📄
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
short_description: Multi-model OCR pipeline running on SIE
---

# Document OCR

A small Hugging Face Space that runs three different OCR-class models through one
[SIE](https://github.com/superlinked/sie) inference engine. Pick a sample
document on the left, swap any of the three models in the dropdowns, and watch
SIE hot-swap them with a single identifier change.

## What runs in this Space

A single Docker container with two processes:

- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, no preload
  (lazy-loads models on first click to fit free-tier memory).
- A small Node web server on `0.0.0.0:7860` that serves the UI and
  proxies requests to SIE via SSE.

Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-transformers5`.
HF Spaces' persistent `/data` directory is used as the Hugging Face cache so
model weights survive Space restarts.
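The proxy half of that Node server can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Space's actual code: the `/api` path prefix, the helper names, and the exact routes are all hypothetical; only the two addresses come from the description above.

```typescript
import * as http from "node:http";

// Where sie-server listens inside the container (per the setup above).
const SIE_URL = "http://127.0.0.1:8080";

// Headers that keep a Server-Sent Events stream alive through the proxy.
function sseHeaders(): Record<string, string> {
  return {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  };
}

// Map an incoming UI path to the upstream SIE endpoint.
// (The "/api" prefix is a hypothetical routing convention.)
function upstreamUrl(path: string): string {
  return new URL(path.replace(/^\/api/, ""), SIE_URL).toString();
}

const server = http.createServer((req, res) => {
  // Forward the request to SIE and stream the SSE body back verbatim.
  const proxied = http.request(
    upstreamUrl(req.url ?? "/"),
    { method: req.method },
    (upstream) => {
      res.writeHead(upstream.statusCode ?? 502, sseHeaders());
      upstream.pipe(res);
    },
  );
  req.pipe(proxied);
});

// In the Space the server would bind the HF-mandated port:
// server.listen(7860, "0.0.0.0");
```

Streaming the upstream response through unmodified is what lets the browser see SIE's progress events (model load, inference) as they happen rather than after a long blocking request.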

## Model lineup

| Stage | Default | Alternates (lazy-load on click) |
|---|---|---|
| Recognition | `lightonai/LightOnOCR-2-1B` (2.1B, Markdown output) | PaddleOCR-VL, GLM-OCR (GPU-only; disabled on the CPU image) |
| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA, Donut-RVLCDIP |
| NER | `urchade/gliner_multi-v2.1` | GLiNER-large, GLiNER-PII, NuNER-Zero |

The default trio is ~5 GB total (LightOnOCR is the big one at ~4 GB).
Alternates lazy-load on first click.

## What SIE provides here

Three different model architectures, one API:

```
client.extract(model_id, { images: [bytes] })
```

The model ID alone decides whether you get VLM Markdown (LightOnOCR),
structured JSON (Donut), or typed entities (GLiNER / NuNER). No separate
auth, no separate rate limit, no separate deployment story.

## Source

Built from the `document-ocr` demo in
[superlinked/brave-new-demos](https://github.com/superlinked/brave-new-demos/tree/main/document-ocr).
The local-Docker version uses `docker compose` against the same upstream
SIE image; this Space packages everything into one container for HF.

## Performance note

This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). The first click
for each model is a cold load (60-180 s) while weights download and the
adapter spins up. Subsequent clicks reuse the cached weights and run in
20-30 s. On a GPU Space (paid), recognition drops to a few seconds and the
heavier models like GLM-OCR become tractable.

The SIE image this Space runs on is `latest-cpu-transformers5`, where the
LightOnOCR adapter lives. Florence-2 ships in the sibling `default`
bundle (which pins `transformers<5`) and is not available on this image;
see [sie-internal#828](https://github.com/superlinked/sie-internal/issues/828)
for the bundle-composition story.

Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0).