Filip Makraduli committed
Commit 4e0f10e · 1 Parent(s): 0df0841

Switch to transformers5 SIE image; LightOnOCR as default recognition


- Base image: latest-cpu-default → latest-cpu-transformers5 (where the
  LightOnOCR adapter actually lives; fixes the ImportError on Space boot)
- Recognition default: Florence-2-base-ft → LightOnOCR-2-1B (Florence-2
isn't loadable on transformers5 due to tokenizer API change; tracked in
sie-internal#828)
- Drop Florence-2 entries from the dropdown so users don't click into an
error
- Expand alternates: 3 Donut variants, 4 NER models (GLiNER multi/large/
PII + NuNER Zero)
- Swap sample images: replace dense table/handwritten/multi-column with
event-poster/slide/letter to better suit the new recognition model
- Server: /api/health now reports CUDA availability; the UI auto-disables
  gpuRequired models on the CPU image
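The gating in the last bullet can be sketched as follows. This is a minimal, illustrative sketch (the `isSelectable` helper and the literal model objects here are made up for the example; the real logic lives in web/public/app.js and src/config.ts): a model stays clickable unless it is marked `gpuRequired` and `/api/health` reported `cuda: false`.

```typescript
// Illustrative shape of a dropdown entry (real type: ModelOption in src/config.ts).
interface ModelOption {
  id: string;
  label: string;
  gpuRequired?: boolean;
}

// A model is selectable unless it needs a GPU and the health check said no CUDA.
function isSelectable(opt: ModelOption, cudaAvailable: boolean): boolean {
  return !(opt.gpuRequired && !cudaAvailable);
}

// Hypothetical entries mirroring the config in this commit.
const glmOcr: ModelOption = { id: "zai-org/GLM-OCR", label: "GLM-OCR (GPU only)", gpuRequired: true };
const lightOn: ModelOption = { id: "lightonai/LightOnOCR-2-1B", label: "LightOnOCR-2-1B (default)" };

console.log(isSelectable(glmOcr, false));  // false on the CPU image
console.log(isSelectable(lightOn, false)); // true
```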

Dockerfile CHANGED
@@ -14,7 +14,7 @@
 FROM node:22-bookworm-slim AS node
 
 # --- stage 2: final image ---
-FROM ghcr.io/superlinked/sie-server:latest-cpu-default
+FROM ghcr.io/superlinked/sie-server:latest-cpu-transformers5
 
 USER root
 
README.md CHANGED
@@ -20,25 +20,25 @@ SIE hot-swap them with one identifier change.
 
 A single Docker container with two processes:
 
-- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, preloading
-  three small models from the default bundle at boot.
+- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, no preload
+  (lazy-loads models on first click to fit free-tier memory).
 - A small Node web server on `0.0.0.0:7860` that serves the UI and
   proxies requests to SIE via SSE.
 
-Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-default`.
+Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-transformers5`.
 HF Spaces' persistent `/data` directory is used as the HuggingFace cache so
 model weights survive Space restarts.
 
 ## Model lineup
 
-| Stage | Default (preloaded) | Alternates (lazy-load on click) |
+| Stage | Default | Alternates (lazy-load on click) |
 |---|---|---|
-| Recognition | `microsoft/Florence-2-base` (270M) | Florence-2-large, LightOnOCR-2-1B, GLM-OCR, PaddleOCR-VL |
-| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA |
-| NER | `urchade/gliner_multi-v2.1` | GLiNER-large |
+| Recognition | `lightonai/LightOnOCR-2-1B` (2.1B, Markdown output) | PaddleOCR-VL, GLM-OCR (GPU-only; disabled on the CPU image) |
+| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA, Donut-RVLCDIP |
+| NER | `urchade/gliner_multi-v2.1` | GLiNER-large, GLiNER-PII, NuNER-Zero |
 
-The default trio is ~1 GB total. Alternates are listed in the dropdowns but
-only load when first clicked; some are GPU-only.
+The default trio is ~5 GB total (LightOnOCR is the big one at ~4 GB).
+Alternates lazy-load on first click.
 
 ## What SIE provides here
 
@@ -48,9 +48,9 @@ Three different model architectures, one API:
 client.extract(model_id, { images: [bytes] })
 ```
 
-The model ID alone decides whether you get VLM Markdown (Florence-2,
-LightOnOCR), structured JSON (Donut), or typed entities (GLiNER). No
-separate auth, no separate rate limit, no separate deployment story.
+The model ID alone decides whether you get VLM Markdown (LightOnOCR),
+structured JSON (Donut), or typed entities (GLiNER / NuNER). No separate
+auth, no separate rate limit, no separate deployment story.
 
 ## Source
 
@@ -61,9 +61,16 @@ SIE image; this Space packages everything into one container for HF.
 
 ## Performance note
 
-This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). Per-sample
-latency is in the 60-90 s range on the default Florence-2 + Donut + GLiNER
-trio; recognition is the slow step. On a GPU Space (paid), Florence-2 drops
-to a few seconds and the heavier models like GLM-OCR become tractable.
+This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). The first click
+for each model is a cold load (60-180 s) while weights download and the
+adapter spins up. Subsequent clicks reuse the cached weights and run in
+20-30 s. On a GPU Space (paid), recognition drops to a few seconds and the
+heavier models like GLM-OCR become tractable.
+
+The SIE image this Space runs on is `latest-cpu-transformers5`, where the
+LightOnOCR adapter lives. Florence-2 ships in the sibling `default`
+bundle (which pins `transformers<5`) and is not available on this image;
+see [sie-internal#828](https://github.com/superlinked/sie-internal/issues/828)
+for the bundle-composition story.
 
 Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0).
data/samples/event-poster.png ADDED
data/samples/handwritten.png DELETED
Binary file (21.9 kB)
 
data/samples/index.json CHANGED
@@ -3,7 +3,7 @@
     "id": "receipt",
     "filename": "receipt.png",
     "label": "Grocery receipt",
-    "description": "Printed receipt with line items, subtotal, tax, total. Clean text, simple table layout.",
+    "description": "Printed receipt with line items, subtotal, tax, total. Donut on CORD reads this end-to-end.",
     "labels": [
       "merchant",
       "date",
@@ -18,7 +18,7 @@
     "id": "invoice",
     "filename": "invoice.png",
     "label": "Vendor invoice",
-    "description": "Multi-column invoice with billing party, line items, subtotal, tax, total. Form-style layout.",
+    "description": "Multi-column invoice with billing party, line items, subtotal, tax, total.",
     "labels": [
       "vendor",
       "invoice_number",
@@ -33,7 +33,7 @@
     "id": "business-card",
     "filename": "business-card.png",
    "label": "Business card",
-    "description": "Tight layout, mixed text sizes, multiple contact fields.",
+    "description": "Tight layout, mixed text sizes, multiple contact fields. Good NER showcase.",
     "labels": [
       "company",
       "person",
@@ -45,40 +45,43 @@
     ]
   },
   {
-    "id": "table",
-    "filename": "table.png",
-    "label": "Quarterly table",
-    "description": "Dense numerical table with totals row. Tests table-structure recognition.",
+    "id": "event-poster",
+    "filename": "event-poster.png",
+    "label": "Event poster",
+    "description": "Large-text poster with title, date, artists, ticket info. Florence-2 OCR's home turf.",
     "labels": [
-      "department",
-      "headcount",
-      "amount",
-      "category"
+      "event",
+      "date",
+      "venue",
+      "artist",
+      "price",
+      "organization"
     ]
   },
   {
-    "id": "handwritten",
-    "filename": "handwritten.png",
-    "label": "Casual notes",
-    "description": "Jittered text simulating informal handwriting; non-template content.",
+    "id": "slide",
+    "filename": "slide.png",
+    "label": "Presentation slide",
+    "description": "Roadmap slide with title and three numbered items. Clean printed text on a single background.",
     "labels": [
-      "task",
+      "initiative",
       "person",
-      "place",
-      "amount"
+      "date",
+      "quarter"
     ]
   },
   {
-    "id": "multi-column",
-    "filename": "multi-column.png",
-    "label": "Newspaper page",
-    "description": "Two-column newspaper-style layout. Reading order matters.",
+    "id": "letter",
+    "filename": "letter.png",
+    "label": "Business letter",
+    "description": "Short printed business letter with sender, date, recipient, body, and signature.",
     "labels": [
-      "headline",
+      "company",
       "person",
-      "organization",
-      "place",
-      "date"
+      "address",
+      "date",
+      "amount",
+      "phone"
     ]
   }
 ]
data/samples/letter.png ADDED
data/samples/multi-column.png DELETED
Binary file (43.5 kB)
 
data/samples/slide.png ADDED
data/samples/table.png DELETED
Binary file (28.1 kB)
 
hf-entrypoint.sh CHANGED
@@ -10,7 +10,7 @@ set -euo pipefail
 #
 # Override via the PRELOAD env var in the Space's Settings if you upgrade to
 # CPU-Upgrade (32 GB) or a GPU tier:
-#   PRELOAD="microsoft/Florence-2-base,naver-clova-ix/donut-base-finetuned-cord-v2,urchade/gliner_multi-v2.1"
+#   PRELOAD="microsoft/Florence-2-base-ft,naver-clova-ix/donut-base-finetuned-cord-v2,urchade/gliner_multi-v2.1"
 PRELOAD="${PRELOAD:-}"
 
 SIE_ARGS=(serve --host 127.0.0.1 --port 8080)
src/config.ts CHANGED
@@ -13,29 +13,17 @@ export const RECOGNITION_MODELS: ModelOption[] = [
     label: "LightOnOCR-2-1B (default)",
     description: "Pixtral encoder + Qwen3 decoder, 2.1B. Strong Markdown output across dense layouts. ~4 GB to download on first call.",
   },
-  {
-    id: "microsoft/Florence-2-base",
-    label: "Florence-2-base (small, fast)",
-    description: "Microsoft DaViT + decoder, 270M. Fast on CPU but terse on dense layouts; better on multi-column text.",
-    options: { task: "<OCR>" },
-  },
-  {
-    id: "microsoft/Florence-2-large",
-    label: "Florence-2-large",
-    description: "Larger Florence-2 variant, 770M. Better than Florence-2-base but still leans terse on receipts.",
-    options: { task: "<OCR>" },
-  },
   {
     id: "PaddlePaddle/PaddleOCR-VL-1.5",
     label: "PaddleOCR-VL-1.5 (GPU image)",
-    description: "Paddle's VLM-OCR, 1.5B. Six task modes. Available on the CUDA image.",
+    description: "Paddle's VLM-OCR, 1.5B. Six task modes. Available on the CUDA image (compose.gpu.yml).",
     options: { task: "ocr" },
     gpuRequired: true,
   },
   {
     id: "zai-org/GLM-OCR",
     label: "GLM-OCR (GPU only)",
-    description: "CogViT + GLM-0.5B decoder, 9B in bfloat16. Premium quality, needs ~18 GB VRAM.",
+    description: "CogViT + GLM-0.5B decoder, 9B in bfloat16. Premium quality, needs ~18 GB VRAM (compose.gpu.yml).",
     gpuRequired: true,
   },
 ];
@@ -51,6 +39,11 @@ export const STRUCTURED_MODELS: ModelOption[] = [
     label: "Donut on DocVQA",
     description: "Same Donut architecture, fine-tuned for visual question answering. Returns text answers.",
   },
+  {
+    id: "naver-clova-ix/donut-base-finetuned-rvlcdip",
+    label: "Donut on RVL-CDIP (doc classification)",
+    description: "Same Donut architecture, fine-tuned for document-type classification across 16 classes (invoice, receipt, form, ...).",
+  },
 ];
 
 export const NER_MODELS: ModelOption[] = [
@@ -64,6 +57,16 @@ export const NER_MODELS: ModelOption[] = [
     label: "GLiNER large (English)",
     description: "440M, English-focused, higher quality on English text.",
   },
+  {
+    id: "urchade/gliner_multi_pii-v1",
+    label: "GLiNER multi PII",
+    description: "GLiNER fine-tuned for PII extraction. Good for redaction-style pipelines on documents.",
+  },
+  {
+    id: "numind/NuNER_Zero",
+    label: "NuNER Zero",
+    description: "NuMind's zero-shot NER. Different architecture from GLiNER; useful for comparing zero-shot NER families on the same input text.",
+  },
 ];
 
 export const config = {
web/public/app.js CHANGED
@@ -25,6 +25,7 @@ let donutBuf = { entities: [], data: null };
 let glinerBuf = [];
 let modelConfig = null;
 let registeredSet = new Set();
+let cudaAvailable = false;
 
 function setBadge(text, cls) {
   els.badge.textContent = text;
@@ -98,12 +99,15 @@ function populateDropdown(selectEl, options, defaultId) {
   for (const opt of options) {
     const node = document.createElement("option");
     node.value = opt.id;
-    const available =
-      registeredSet.size === 0 || registeredSet.has(opt.id);
+    const inCatalog = registeredSet.size === 0 || registeredSet.has(opt.id);
+    const blockedByCuda = opt.gpuRequired && !cudaAvailable;
+    const available = inCatalog && !blockedByCuda;
     const labelSuffix = !available
-      ? opt.gpuRequired
+      ? blockedByCuda
         ? " (GPU image needed)"
-        : " (not registered)"
+        : opt.gpuRequired
+          ? " (GPU image needed)"
+          : " (not registered)"
       : "";
     node.textContent = opt.label + labelSuffix;
     if (!available) node.disabled = true;
@@ -264,6 +268,7 @@ async function init() {
     } else {
       els.sieState.textContent = `SIE healthy · ${j.registeredModels} models registered`;
       registered = j.registered ?? [];
+      cudaAvailable = !!j.cuda;
     }
   } catch {
     els.sieState.textContent = "could not reach the local server";
web/public/index.html CHANGED
@@ -46,7 +46,7 @@
   <span>↓</span><span>↓</span><span>↓</span>
 </div>
 <div class="diagram-models">
-  <div class="diagram-box diagram-recognition">VLM-OCR<br><span>(Florence-2, LightOnOCR, GLM-OCR, ...)</span></div>
+  <div class="diagram-box diagram-recognition">VLM-OCR<br><span>(LightOnOCR-2-1B, PaddleOCR-VL, GLM-OCR)</span></div>
   <div class="diagram-box diagram-structured">Donut<br><span>(end-to-end JSON)</span></div>
   <div class="diagram-box diagram-ner">GLiNER<br><span>(zero-shot NER)</span></div>
 </div>
web/server.ts CHANGED
@@ -49,14 +49,20 @@ function setupSse(res: http.ServerResponse) {
   };
 }
 
-async function fetchModels(): Promise<{ ok: boolean; names: string[] }> {
+async function fetchModels(): Promise<{ ok: boolean; names: string[]; cuda: boolean }> {
   try {
     const r = await fetch(`${config.sieUrl}/v1/models`, { signal: AbortSignal.timeout(3000) });
-    if (!r.ok) return { ok: false, names: [] };
-    const json = (await r.json()) as { models?: { name: string }[] };
-    return { ok: true, names: (json.models ?? []).map((m) => m.name) };
+    if (!r.ok) return { ok: false, names: [], cuda: false };
+    const json = (await r.json()) as {
+      models?: { name: string; device?: string; state?: string }[];
+    };
+    const models = json.models ?? [];
+    // GPU compose preloads GPU-only models. If any catalog entry is currently
+    // loaded on a non-cpu device, treat this server as GPU-capable.
+    const cuda = models.some((m) => (m.device ?? "").toLowerCase().includes("cuda"));
+    return { ok: true, names: models.map((m) => m.name), cuda };
   } catch {
-    return { ok: false, names: [] };
+    return { ok: false, names: [], cuda: false };
   }
 }
 
@@ -129,7 +135,7 @@ const server = http.createServer(async (req, res) => {
   if (p.startsWith("/samples/")) return serveFile(res, path.join(SAMPLE_DIR, p.slice("/samples/".length)));
 
   if (p === "/api/health") {
-    const { ok, names } = await fetchModels();
+    const { ok, names, cuda } = await fetchModels();
     return send(
       res,
       200,
@@ -138,6 +144,7 @@ const server = http.createServer(async (req, res) => {
         sieUrl: config.sieUrl,
         registeredModels: names.length,
         registered: names,
+        cuda,
       }),
       "application/json",
     );