Filip Makraduli committed
Commit 4e0f10e · 1 Parent(s): 0df0841

Switch to transformers5 SIE image; LightOnOCR as default recognition


- Base image: latest-cpu-default → latest-cpu-transformers5 (where the
  LightOnOCR adapter actually lives; fixes the ImportError on Space boot)
- Recognition default: Florence-2-base-ft → LightOnOCR-2-1B (Florence-2
isn't loadable on transformers5 due to tokenizer API change; tracked in
sie-internal#828)
- Drop Florence-2 entries from the dropdown so users don't click into an
error
- Expand alternates: 3 Donut variants, 4 NER models (GLiNER multi/large/
PII + NuNER Zero)
- Swap sample images: replace dense table/handwritten/multi-column with
event-poster/slide/letter to better suit the new recognition model
- Server: /api/health now reports CUDA availability; the UI auto-disables
  gpuRequired models on the CPU image
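The gating in the last bullet can be sketched as follows. This is a minimal, illustrative sketch (the `isSelectable` helper and the literal model objects here are made up for the example; the real logic lives in web/public/app.js and src/config.ts): a model stays clickable unless it is marked `gpuRequired` and `/api/health` reported `cuda: false`.

```typescript
// Illustrative shape of a dropdown entry (real type: ModelOption in src/config.ts).
interface ModelOption {
  id: string;
  label: string;
  gpuRequired?: boolean;
}

// A model is selectable unless it needs a GPU and the health check said no CUDA.
function isSelectable(opt: ModelOption, cudaAvailable: boolean): boolean {
  return !(opt.gpuRequired && !cudaAvailable);
}

// Hypothetical entries mirroring the config in this commit.
const glmOcr: ModelOption = { id: "zai-org/GLM-OCR", label: "GLM-OCR (GPU only)", gpuRequired: true };
const lightOn: ModelOption = { id: "lightonai/LightOnOCR-2-1B", label: "LightOnOCR-2-1B (default)" };

console.log(isSelectable(glmOcr, false));  // false on the CPU image
console.log(isSelectable(lightOn, false)); // true
```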

Dockerfile CHANGED
@@ -14,7 +14,7 @@
 FROM node:22-bookworm-slim AS node
 
 # --- stage 2: final image ---
-FROM ghcr.io/superlinked/sie-server:latest-cpu-default
+FROM ghcr.io/superlinked/sie-server:latest-cpu-transformers5
 
 USER root
 
README.md CHANGED
@@ -20,25 +20,25 @@ SIE hot-swap them with one identifier change.
 
 A single Docker container with two processes:
 
-- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, preloading
-  three small models from the default bundle at boot.
+- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, no preload
+  (lazy-loads models on first click to fit free-tier memory).
 - A small Node web server on `0.0.0.0:7860` that serves the UI and
   proxies requests to SIE via SSE.
 
-Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-default`.
+Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-transformers5`.
 HF Spaces' persistent `/data` directory is used as the HuggingFace cache so
 model weights survive Space restarts.
 
 ## Model lineup
 
-| Stage | Default (preloaded) | Alternates (lazy-load on click) |
+| Stage | Default | Alternates (lazy-load on click) |
 |---|---|---|
-| Recognition | `microsoft/Florence-2-base` (270M) | Florence-2-large, LightOnOCR-2-1B, GLM-OCR, PaddleOCR-VL |
-| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA |
-| NER | `urchade/gliner_multi-v2.1` | GLiNER-large |
+| Recognition | `lightonai/LightOnOCR-2-1B` (2.1B, Markdown output) | PaddleOCR-VL, GLM-OCR (GPU-only; disabled on the CPU image) |
+| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA, Donut-RVLCDIP |
+| NER | `urchade/gliner_multi-v2.1` | GLiNER-large, GLiNER-PII, NuNER-Zero |
 
-The default trio is ~1 GB total. Alternates are listed in the dropdowns but
-only load when first clicked; some are GPU-only.
+The default trio is ~5 GB total (LightOnOCR is the big one at ~4 GB).
+Alternates lazy-load on first click.
 
 ## What SIE provides here
 
@@ -48,9 +48,9 @@ Three different model architectures, one API:
 client.extract(model_id, { images: [bytes] })
 ```
 
-The model ID alone decides whether you get VLM Markdown (Florence-2,
-LightOnOCR), structured JSON (Donut), or typed entities (GLiNER). No
-separate auth, no separate rate limit, no separate deployment story.
+The model ID alone decides whether you get VLM Markdown (LightOnOCR),
+structured JSON (Donut), or typed entities (GLiNER / NuNER). No separate
+auth, no separate rate limit, no separate deployment story.
 
 ## Source
 
@@ -61,9 +61,16 @@ SIE image; this Space packages everything into one container for HF.
 
 ## Performance note
 
-This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). Per-sample
-latency is in the 60-90 s range on the default Florence-2 + Donut + GLiNER
-trio; recognition is the slow step. On a GPU Space (paid), Florence-2 drops
-to a few seconds and the heavier models like GLM-OCR become tractable.
+This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). The first click
+for each model is a cold load (60-180 s) while weights download and the
+adapter spins up. Subsequent clicks reuse the cached weights and run in
+20-30 s. On a GPU Space (paid), recognition drops to a few seconds and the
+heavier models like GLM-OCR become tractable.
+
+The SIE image this Space runs on is `latest-cpu-transformers5`, where the
+LightOnOCR adapter lives. Florence-2 ships in the sibling `default`
+bundle (which pins `transformers<5`) and is not available on this image;
+see [sie-internal#828](https://github.com/superlinked/sie-internal/issues/828)
+for the bundle-composition story.
 
 Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0).
data/samples/event-poster.png ADDED
data/samples/handwritten.png DELETED
Binary file (21.9 kB)
 
data/samples/index.json CHANGED
@@ -3,7 +3,7 @@
     "id": "receipt",
     "filename": "receipt.png",
     "label": "Grocery receipt",
-    "description": "Printed receipt with line items, subtotal, tax, total. Clean text, simple table layout.",
+    "description": "Printed receipt with line items, subtotal, tax, total. Donut on CORD reads this end-to-end.",
     "labels": [
       "merchant",
       "date",
@@ -18,7 +18,7 @@
     "id": "invoice",
     "filename": "invoice.png",
     "label": "Vendor invoice",
-    "description": "Multi-column invoice with billing party, line items, subtotal, tax, total. Form-style layout.",
+    "description": "Multi-column invoice with billing party, line items, subtotal, tax, total.",
     "labels": [
       "vendor",
       "invoice_number",
@@ -33,7 +33,7 @@
     "id": "business-card",
     "filename": "business-card.png",
    "label": "Business card",
-    "description": "Tight layout, mixed text sizes, multiple contact fields.",
+    "description": "Tight layout, mixed text sizes, multiple contact fields. Good NER showcase.",
     "labels": [
       "company",
       "person",
@@ -45,40 +45,43 @@
     ]
   },
   {
-    "id": "table",
-    "filename": "table.png",
-    "label": "Quarterly table",
-    "description": "Dense numerical table with totals row. Tests table-structure recognition.",
+    "id": "event-poster",
+    "filename": "event-poster.png",
+    "label": "Event poster",
+    "description": "Large-text poster with title, date, artists, ticket info. Florence-2 OCR's home turf.",
     "labels": [
-      "department",
-      "headcount",
-      "amount",
-      "category"
+      "event",
+      "date",
+      "venue",
+      "artist",
+      "price",
+      "organization"
     ]
   },
   {
-    "id": "handwritten",
-    "filename": "handwritten.png",
-    "label": "Casual notes",
-    "description": "Jittered text simulating informal handwriting; non-template content.",
+    "id": "slide",
+    "filename": "slide.png",
+    "label": "Presentation slide",
+    "description": "Roadmap slide with title and three numbered items. Clean printed text on a single background.",
     "labels": [
-      "task",
+      "initiative",
       "person",
-      "place",
-      "amount"
+      "date",
+      "quarter"
     ]
   },
   {
-    "id": "multi-column",
-    "filename": "multi-column.png",
-    "label": "Newspaper page",
-    "description": "Two-column newspaper-style layout. Reading order matters.",
+    "id": "letter",
+    "filename": "letter.png",
+    "label": "Business letter",
+    "description": "Short printed business letter with sender, date, recipient, body, and signature.",
     "labels": [
-      "headline",
+      "company",
       "person",
-      "organization",
-      "place",
-      "date"
+      "address",
+      "date",
+      "amount",
+      "phone"
     ]
   }
 ]
data/samples/letter.png ADDED
data/samples/multi-column.png DELETED
Binary file (43.5 kB)
 
data/samples/slide.png ADDED
data/samples/table.png DELETED
Binary file (28.1 kB)
 
hf-entrypoint.sh CHANGED
@@ -10,7 +10,7 @@ set -euo pipefail
 #
 # Override via the PRELOAD env var in the Space's Settings if you upgrade to
 # CPU-Upgrade (32 GB) or a GPU tier:
-#   PRELOAD="microsoft/Florence-2-base,naver-clova-ix/donut-base-finetuned-cord-v2,urchade/gliner_multi-v2.1"
+#   PRELOAD="microsoft/Florence-2-base-ft,naver-clova-ix/donut-base-finetuned-cord-v2,urchade/gliner_multi-v2.1"
 PRELOAD="${PRELOAD:-}"
 
 SIE_ARGS=(serve --host 127.0.0.1 --port 8080)
src/config.ts CHANGED
@@ -13,29 +13,17 @@ export const RECOGNITION_MODELS: ModelOption[] = [
     label: "LightOnOCR-2-1B (default)",
     description: "Pixtral encoder + Qwen3 decoder, 2.1B. Strong Markdown output across dense layouts. ~4 GB to download on first call.",
   },
-  {
-    id: "microsoft/Florence-2-base",
-    label: "Florence-2-base (small, fast)",
-    description: "Microsoft DaViT + decoder, 270M. Fast on CPU but terse on dense layouts; better on multi-column text.",
-    options: { task: "<OCR>" },
-  },
-  {
-    id: "microsoft/Florence-2-large",
-    label: "Florence-2-large",
-    description: "Larger Florence-2 variant, 770M. Better than Florence-2-base but still leans terse on receipts.",
-    options: { task: "<OCR>" },
-  },
   {
     id: "PaddlePaddle/PaddleOCR-VL-1.5",
     label: "PaddleOCR-VL-1.5 (GPU image)",
-    description: "Paddle's VLM-OCR, 1.5B. Six task modes. Available on the CUDA image.",
+    description: "Paddle's VLM-OCR, 1.5B. Six task modes. Available on the CUDA image (compose.gpu.yml).",
     options: { task: "ocr" },
     gpuRequired: true,
   },
   {
     id: "zai-org/GLM-OCR",
     label: "GLM-OCR (GPU only)",
-    description: "CogViT + GLM-0.5B decoder, 9B in bfloat16. Premium quality, needs ~18 GB VRAM.",
+    description: "CogViT + GLM-0.5B decoder, 9B in bfloat16. Premium quality, needs ~18 GB VRAM (compose.gpu.yml).",
     gpuRequired: true,
   },
 ];
@@ -51,6 +39,11 @@ export const STRUCTURED_MODELS: ModelOption[] = [
     label: "Donut on DocVQA",
     description: "Same Donut architecture, fine-tuned for visual question answering. Returns text answers.",
   },
+  {
+    id: "naver-clova-ix/donut-base-finetuned-rvlcdip",
+    label: "Donut on RVL-CDIP (doc classification)",
+    description: "Same Donut architecture, fine-tuned for document-type classification across 16 classes (invoice, receipt, form, ...).",
+  },
 ];
 
 export const NER_MODELS: ModelOption[] = [
@@ -64,6 +57,16 @@ export const NER_MODELS: ModelOption[] = [
     label: "GLiNER large (English)",
     description: "440M, English-focused, higher quality on English text.",
   },
+  {
+    id: "urchade/gliner_multi_pii-v1",
+    label: "GLiNER multi PII",
+    description: "GLiNER fine-tuned for PII extraction. Good for redaction-style pipelines on documents.",
+  },
+  {
+    id: "numind/NuNER_Zero",
+    label: "NuNER Zero",
+    description: "NuMind's zero-shot NER. Different architecture from GLiNER; useful for comparing zero-shot NER families on the same input text.",
+  },
 ];
 
 export const config = {
web/public/app.js CHANGED
@@ -25,6 +25,7 @@ let donutBuf = { entities: [], data: null };
 let glinerBuf = [];
 let modelConfig = null;
 let registeredSet = new Set();
+let cudaAvailable = false;
 
 function setBadge(text, cls) {
   els.badge.textContent = text;
@@ -98,12 +99,15 @@ function populateDropdown(selectEl, options, defaultId) {
   for (const opt of options) {
     const node = document.createElement("option");
     node.value = opt.id;
-    const available =
-      registeredSet.size === 0 || registeredSet.has(opt.id);
+    const inCatalog = registeredSet.size === 0 || registeredSet.has(opt.id);
+    const blockedByCuda = opt.gpuRequired && !cudaAvailable;
+    const available = inCatalog && !blockedByCuda;
     const labelSuffix = !available
-      ? opt.gpuRequired
+      ? blockedByCuda
         ? " (GPU image needed)"
-        : " (not registered)"
+        : opt.gpuRequired
+          ? " (GPU image needed)"
+          : " (not registered)"
       : "";
     node.textContent = opt.label + labelSuffix;
     if (!available) node.disabled = true;
@@ -264,6 +268,7 @@ async function init() {
     } else {
       els.sieState.textContent = `SIE healthy · ${j.registeredModels} models registered`;
       registered = j.registered ?? [];
+      cudaAvailable = !!j.cuda;
     }
   } catch {
     els.sieState.textContent = "could not reach the local server";
web/public/index.html CHANGED
@@ -46,7 +46,7 @@
   <span>↓</span><span>↓</span><span>↓</span>
 </div>
 <div class="diagram-models">
-  <div class="diagram-box diagram-recognition">VLM-OCR<br><span>(Florence-2, LightOnOCR, GLM-OCR, ...)</span></div>
+  <div class="diagram-box diagram-recognition">VLM-OCR<br><span>(LightOnOCR-2-1B, PaddleOCR-VL, GLM-OCR)</span></div>
   <div class="diagram-box diagram-structured">Donut<br><span>(end-to-end JSON)</span></div>
   <div class="diagram-box diagram-ner">GLiNER<br><span>(zero-shot NER)</span></div>
 </div>
web/server.ts CHANGED
@@ -49,14 +49,20 @@ function setupSse(res: http.ServerResponse) {
   };
 }
 
-async function fetchModels(): Promise<{ ok: boolean; names: string[] }> {
+async function fetchModels(): Promise<{ ok: boolean; names: string[]; cuda: boolean }> {
   try {
     const r = await fetch(`${config.sieUrl}/v1/models`, { signal: AbortSignal.timeout(3000) });
-    if (!r.ok) return { ok: false, names: [] };
-    const json = (await r.json()) as { models?: { name: string }[] };
-    return { ok: true, names: (json.models ?? []).map((m) => m.name) };
+    if (!r.ok) return { ok: false, names: [], cuda: false };
+    const json = (await r.json()) as {
+      models?: { name: string; device?: string; state?: string }[];
+    };
+    const models = json.models ?? [];
+    // GPU compose preloads GPU-only models. If any catalog entry is currently
+    // loaded on a non-cpu device, treat this server as GPU-capable.
+    const cuda = models.some((m) => (m.device ?? "").toLowerCase().includes("cuda"));
+    return { ok: true, names: models.map((m) => m.name), cuda };
   } catch {
-    return { ok: false, names: [] };
+    return { ok: false, names: [], cuda: false };
   }
 }
 
@@ -129,7 +135,7 @@ const server = http.createServer(async (req, res) => {
   if (p.startsWith("/samples/")) return serveFile(res, path.join(SAMPLE_DIR, p.slice("/samples/".length)));
 
   if (p === "/api/health") {
-    const { ok, names } = await fetchModels();
+    const { ok, names, cuda } = await fetchModels();
     return send(
       res,
       200,
@@ -138,6 +144,7 @@ const server = http.createServer(async (req, res) => {
         sieUrl: config.sieUrl,
         registeredModels: names.length,
         registered: names,
+        cuda,
       }),
       "application/json",
     );