fm1320 committed
Commit ffe59ba · 1 Parent(s): c1df4e3

Initial: document-ocr demo for HF Spaces

Single-container build of the document-ocr demo from
superlinked/brave-new-demos. It extends ghcr.io/superlinked/sie-server:latest-cpu-default
with Node 22 and runs SIE on 127.0.0.1:8080 with the UI on 0.0.0.0:7860.

Preloads three small models from the default bundle (Florence-2-base,
Donut-CORD-v2, GLiNER-multi). Alternates are dropdown-selectable and
lazy-load on first click.

HF persistent /data is used as HF_HOME so model weights survive Space
restarts.
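
For reference (not part of the commit): once the Space is up, the UI's
pipeline endpoint can be exercised directly. A minimal sketch, assuming the
Space answers on localhost:7860; the endpoint and query parameters come from
web/server.ts below, and "receipt" is one of the bundled sample ids.

```ts
// Stream the pipeline's SSE events for one sample (sketch, not in the commit).
const url =
  "http://localhost:7860/api/run?id=receipt" +
  "&recognition=" + encodeURIComponent("microsoft/Florence-2-base");

const res = await fetch(url, { headers: { accept: "text/event-stream" } });
const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  // Raw SSE frames: "event: <type>\ndata: <json>\n\n"
  process.stdout.write(decoder.decode(value));
}
```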

Dockerfile ADDED
@@ -0,0 +1,45 @@
+ # Hugging Face Spaces image. Extends the official SIE server image, adds
+ # Node 22 for the UI server, and runs both processes from one container.
+ #
+ # This Dockerfile is HF Spaces specific. Local Docker users should use
+ # compose.yml (which just runs the unmodified upstream SIE image with the
+ # Node UI on the host).
+ FROM ghcr.io/superlinked/sie-server:latest-cpu-default
+
+ USER root
+
+ # Node 22 for the UI server (tsx + node:http + our Express-free server.ts)
+ # The SIE base image is minimal; install curl + ca-certificates before
+ # pulling NodeSource's setup script.
+ RUN apt-get update \
+     && apt-get install -y --no-install-recommends curl ca-certificates gnupg \
+     && curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
+     && apt-get install -y --no-install-recommends nodejs \
+     && apt-get clean \
+     && rm -rf /var/lib/apt/lists/*
+
+ # HF Spaces' persistent storage lives at /data (50 GB free tier).
+ # Send the HuggingFace cache there so model weights survive Space restarts.
+ ENV HF_HOME=/data/.cache/huggingface
+
+ # UI server lives under /app/ui (the SIE base image already owns /app/...)
+ WORKDIR /app/ui
+
+ COPY package.json tsconfig.json ./
+ RUN npm install --silent
+
+ COPY src ./src
+ COPY web ./web
+ COPY data ./data
+
+ # Entrypoint script orchestrates both processes (SIE in background, UI in foreground).
+ COPY hf-entrypoint.sh /usr/local/bin/hf-entrypoint.sh
+ RUN chmod +x /usr/local/bin/hf-entrypoint.sh
+
+ # Make /app/ui world-readable so any HF Space user account can access it.
+ # Also make /data writable so the HF cache can be written there.
+ RUN chmod -R a+rx /app/ui && mkdir -p /data && chmod -R a+rwx /data
+
+ EXPOSE 7860
+
+ ENTRYPOINT ["/usr/local/bin/hf-entrypoint.sh"]
README.md CHANGED
@@ -1,10 +1,69 @@
  ---
- title: Document Ocr
- emoji: 📈
- colorFrom: indigo
+ title: Document OCR
+ emoji: 📄
+ colorFrom: purple
  colorTo: blue
  sdk: docker
+ app_port: 7860
  pinned: false
+ short_description: Multi-model OCR pipeline running on SIE
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Document OCR
+
+ A small Hugging Face Space that runs three different OCR-class models through one
+ [SIE](https://github.com/superlinked/sie) inference engine. Pick a sample
+ document on the left, swap any of the three models in the dropdowns, and watch
+ SIE hot-swap them with a single identifier change.
+
+ ## What runs in this Space
+
+ A single Docker container with two processes:
+
+ - `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, preloading
+   three small models from the default bundle at boot.
+ - A small Node web server on `0.0.0.0:7860` that serves the UI, proxies
+   extraction calls to SIE, and streams pipeline progress to the browser via SSE.
+
+ Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-default`.
+ HF Spaces' persistent `/data` directory is used as the HuggingFace cache so
+ model weights survive Space restarts.
+
+ ## Model lineup
+
+ | Stage | Default (preloaded) | Alternates (lazy-load on click) |
+ |---|---|---|
+ | Recognition | `microsoft/Florence-2-base` (270M) | Florence-2-large, LightOnOCR-2-1B, GLM-OCR, PaddleOCR-VL |
+ | Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA |
+ | NER | `urchade/gliner_multi-v2.1` | GLiNER-large |
+
+ The default trio is ~1 GB total. Alternates are listed in the dropdowns but
+ only load when first clicked; some are GPU-only.
+
+ ## What SIE provides here
+
+ Three different model architectures, one API:
+
+ ```
+ client.extract(model_id, { images: [bytes] })
+ ```
+
+ The model ID alone decides whether you get VLM Markdown (Florence-2,
+ LightOnOCR), structured JSON (Donut), or typed entities (GLiNER). No
+ separate auth, no separate rate limit, no separate deployment story.
+
+ ## Source
+
+ Built from the `document-ocr` demo in
+ [superlinked/brave-new-demos](https://github.com/superlinked/brave-new-demos/tree/main/document-ocr).
+ The local-Docker version uses `docker compose` against the same upstream
+ SIE image; this Space packages everything into one container for HF.
+
+ ## Performance note
+
+ This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). Per-sample
+ latency is in the 60-90 s range on the default Florence-2 + Donut + GLiNER
+ trio; recognition is the slow step. On a GPU Space (paid), Florence-2 drops
+ to a few seconds and the heavier models like GLM-OCR become tractable.
+
+ Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0).
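
(Aside, not part of the commit: the "one API" call above, spelled out across
the three stages using this repo's own call shapes. The casts mirror the
typing-gap comments in src/ocr.ts; model ids and options come from
src/config.ts, and the GLiNER labels here are illustrative.)

```ts
import fs from "node:fs";
import { SIEClient, detectImageFormat } from "@superlinked/sie-sdk";

const client = new SIEClient("http://127.0.0.1:8080", { waitForCapacity: true });
const bytes = new Uint8Array(fs.readFileSync("data/samples/receipt.png"));
const wire = { data: bytes, format: detectImageFormat(bytes) };
type Opts = Parameters<typeof client.extract>[2];

// 1. Recognition: Florence-2 returns the page as Markdown-ish text.
const rec = await client.extract(
  "microsoft/Florence-2-base",
  { images: [wire] as unknown as Uint8Array[] },
  { labels: [], options: { task: "<OCR>" } } as unknown as Opts,
);
const text = rec.entities?.[0]?.text;

// 2. Structured: Donut returns a nested JSON tree for the CORD receipt schema.
const donut = await client.extract(
  "naver-clova-ix/donut-base-finetuned-cord-v2",
  { images: [wire] as unknown as Uint8Array[] },
  { labels: [] } as unknown as Opts,
);

// 3. NER: GLiNER pulls typed fields out of the recognized text.
const ner = await client.extract(
  "urchade/gliner_multi-v2.1",
  { text: typeof text === "string" ? text : "" },
  { labels: ["merchant", "date", "total"], threshold: 0.4 },
);

console.log(rec, donut, ner);
```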
data/samples/README.md ADDED
@@ -0,0 +1,18 @@
+ # Bundled sample documents
+
+ Six synthetic, public-domain images covering the document shapes most
+ real-world OCR pipelines hit:
+
+ - `receipt.png`: printed grocery receipt, line items + totals
+ - `invoice.png`: vendor invoice, multi-column form layout
+ - `business-card.png`: tight contact card, mixed text sizes
+ - `table.png`: dense numerical table with totals row
+ - `handwritten.png`: jittered text that simulates informal handwriting
+ - `multi-column.png`: two-column newspaper-style layout where reading order
+   matters
+
+ `index.json` carries metadata for each: the GLiNER labels we ask for, plus a
+ short description shown in the UI.
+
+ Regenerate with `python scripts/generate_samples.py`. Pillow is the only
+ dep; no real customer data is involved.
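
(Aside, not part of the commit: loading that metadata the way web/server.ts
does, typed against src/types.ts.)

```ts
import fs from "node:fs";
import type { SampleDoc } from "../src/types.js";

// index.json is an array of SampleDoc records: id, filename, label,
// description, and the GLiNER labels requested for that document.
const samples = JSON.parse(
  fs.readFileSync("data/samples/index.json", "utf8"),
) as SampleDoc[];

for (const s of samples) {
  console.log(`${s.id}: ${s.label} (labels: ${s.labels.join(", ")})`);
}
```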
data/samples/business-card.png ADDED
data/samples/handwritten.png ADDED
data/samples/index.json ADDED
@@ -0,0 +1,84 @@
+ [
+   {
+     "id": "receipt",
+     "filename": "receipt.png",
+     "label": "Grocery receipt",
+     "description": "Printed receipt with line items, subtotal, tax, total. Clean text, simple table layout.",
+     "labels": [
+       "merchant",
+       "date",
+       "line_item",
+       "subtotal",
+       "tax",
+       "total",
+       "payment_method"
+     ]
+   },
+   {
+     "id": "invoice",
+     "filename": "invoice.png",
+     "label": "Vendor invoice",
+     "description": "Multi-column invoice with billing party, line items, subtotal, tax, total. Form-style layout.",
+     "labels": [
+       "vendor",
+       "invoice_number",
+       "date",
+       "due_date",
+       "billing_party",
+       "line_item",
+       "total"
+     ]
+   },
+   {
+     "id": "business-card",
+     "filename": "business-card.png",
+     "label": "Business card",
+     "description": "Tight layout, mixed text sizes, multiple contact fields.",
+     "labels": [
+       "company",
+       "person",
+       "role",
+       "email",
+       "phone",
+       "address",
+       "website"
+     ]
+   },
+   {
+     "id": "table",
+     "filename": "table.png",
+     "label": "Quarterly table",
+     "description": "Dense numerical table with totals row. Tests table-structure recognition.",
+     "labels": [
+       "department",
+       "headcount",
+       "amount",
+       "category"
+     ]
+   },
+   {
+     "id": "handwritten",
+     "filename": "handwritten.png",
+     "label": "Casual notes",
+     "description": "Jittered text simulating informal handwriting; non-template content.",
+     "labels": [
+       "task",
+       "person",
+       "place",
+       "amount"
+     ]
+   },
+   {
+     "id": "multi-column",
+     "filename": "multi-column.png",
+     "label": "Newspaper page",
+     "description": "Two-column newspaper-style layout. Reading order matters.",
+     "labels": [
+       "headline",
+       "person",
+       "organization",
+       "place",
+       "date"
+     ]
+   }
+ ]
data/samples/invoice.png ADDED
data/samples/multi-column.png ADDED
data/samples/receipt.png ADDED
data/samples/table.png ADDED
hf-entrypoint.sh ADDED
@@ -0,0 +1,36 @@
+ #!/usr/bin/env bash
+ # Boot SIE in the background, then run the UI server in the foreground.
+ # Single-container layout for Hugging Face Spaces.
+ set -euo pipefail
+
+ # Models to preload at SIE startup. Same trio the local compose uses.
+ PRELOAD="microsoft/Florence-2-base,naver-clova-ix/donut-base-finetuned-cord-v2,urchade/gliner_multi-v2.1"
+
+ echo "[hf-entrypoint] starting sie-server on 127.0.0.1:8080 with preload=$PRELOAD"
+ sie-server serve \
+   --host 127.0.0.1 \
+   --port 8080 \
+   --preload "$PRELOAD" &
+ SIE_PID=$!
+
+ # Wait up to 20 minutes for SIE to come up. First boot pulls model weights
+ # from HF; subsequent restarts reuse the /data cache and start in seconds.
+ echo "[hf-entrypoint] waiting for /healthz"
+ for i in $(seq 1 1200); do
+   if curl -fsS http://127.0.0.1:8080/healthz > /dev/null 2>&1; then
+     echo "[hf-entrypoint] sie healthy in ~${i}s"
+     break
+   fi
+   if ! kill -0 "$SIE_PID" 2>/dev/null; then
+     echo "[hf-entrypoint] sie-server died before becoming healthy"
+     exit 1
+   fi
+   sleep 1
+ done
+
+ cd /app/ui
+ export PORT="${PORT:-7860}"
+ export SIE_URL="${SIE_URL:-http://127.0.0.1:8080}"
+ export OPEN_BROWSER=0
+ echo "[hf-entrypoint] starting UI on 0.0.0.0:$PORT"
+ exec npx tsx web/server.ts
package.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "name": "document-ocr",
+   "version": "0.1.0",
+   "private": true,
+   "type": "module",
+   "engines": {
+     "node": ">=22"
+   },
+   "scripts": {
+     "start": "docker compose up -d --build && tsx web/server.ts",
+     "start:gpu": "docker compose -f compose.gpu.yml up -d --build && tsx web/server.ts",
+     "ui": "tsx web/server.ts",
+     "typecheck": "tsc -p tsconfig.json --noEmit",
+     "regen-samples": "python3 scripts/generate_samples.py"
+   },
+   "dependencies": {
+     "@superlinked/sie-sdk": "^0.3.1"
+   },
+   "devDependencies": {
+     "@types/node": "^22.10.0",
+     "tsx": "^4.21.0",
+     "typescript": "^5.7.2"
+   }
+ }
src/config.ts ADDED
@@ -0,0 +1,85 @@
+ export type ModelOption = {
+   id: string;
+   label: string;
+   description: string;
+   gpuRequired?: boolean;
+   /** Adapter-specific options passed via SIE's `options` field. */
+   options?: Record<string, unknown>;
+ };
+
+ export const RECOGNITION_MODELS: ModelOption[] = [
+   {
+     id: "microsoft/Florence-2-base",
+     label: "Florence-2-base (small, fast)",
+     description: "Microsoft DaViT + decoder, 270M. Default OCR with the <OCR> task. Fast on CPU.",
+     options: { task: "<OCR>" },
+   },
+   {
+     id: "microsoft/Florence-2-large",
+     label: "Florence-2-large",
+     description: "Larger Florence-2 variant, 770M. Higher quality, still CPU-runnable; ~2x latency.",
+     options: { task: "<OCR>" },
+   },
+   {
+     id: "lightonai/LightOnOCR-2-1B",
+     label: "LightOnOCR-2-1B (premium, GPU recommended)",
+     description: "Pixtral encoder + Qwen3 decoder, 2.1B. Markdown output. Loads on CPU but slow under Rosetta.",
+   },
+   {
+     id: "PaddlePaddle/PaddleOCR-VL-1.5",
+     label: "PaddleOCR-VL-1.5 (GPU image)",
+     description: "Paddle's VLM-OCR, 1.5B. Six task modes. Available on the CUDA image.",
+     options: { task: "ocr" },
+     gpuRequired: true,
+   },
+   {
+     id: "zai-org/GLM-OCR",
+     label: "GLM-OCR (GPU only)",
+     description: "CogViT + GLM-0.5B decoder, 9B in bfloat16. Premium quality, needs ~18 GB VRAM.",
+     gpuRequired: true,
+   },
+ ];
+
+ export const STRUCTURED_MODELS: ModelOption[] = [
+   {
+     id: "naver-clova-ix/donut-base-finetuned-cord-v2",
+     label: "Donut on CORD (receipts)",
+     description: "Fine-tuned for the CORD receipt schema. Pixels in, nested JSON out.",
+   },
+   {
+     id: "naver-clova-ix/donut-base-finetuned-docvqa",
+     label: "Donut on DocVQA",
+     description: "Same Donut architecture, fine-tuned for visual question answering. Returns text answers.",
+   },
+ ];
+
+ export const NER_MODELS: ModelOption[] = [
+   {
+     id: "urchade/gliner_multi-v2.1",
+     label: "GLiNER multi (multilingual)",
+     description: "280M, zero-shot NER, 100+ languages. Good default.",
+   },
+   {
+     id: "urchade/gliner_large-v2.1",
+     label: "GLiNER large (English)",
+     description: "440M, English-focused, higher quality on English text.",
+   },
+ ];
+
+ export const config = {
+   sieUrl: process.env.SIE_URL ?? "http://localhost:8080",
+   sieApiKey: process.env.SIE_API_KEY,
+
+   defaults: {
+     recognition: RECOGNITION_MODELS[0].id,
+     structured: STRUCTURED_MODELS[0].id,
+     ner: NER_MODELS[0].id,
+   },
+
+   paths: {
+     samples: "data/samples/index.json",
+     sampleDir: "data/samples",
+   },
+
+   port: Number(process.env.PORT ?? 3032),
+ } as const;
src/donut.ts ADDED
@@ -0,0 +1,25 @@
+ import { detectImageFormat, type SIEClient } from "@superlinked/sie-sdk";
+ import type { DonutEntity } from "./types.js";
+
+ /** Run any image-input "structured" extractor (Donut variants, etc.). */
+ export async function structuredExtract(
+   client: SIEClient,
+   model: string,
+   imageBytes: Uint8Array,
+   options?: Record<string, unknown>,
+ ): Promise<{ entities: DonutEntity[]; data: unknown }> {
+   const format = detectImageFormat(imageBytes);
+   if (format === "unknown") throw new Error("could not detect image format");
+   const wire = { data: imageBytes, format };
+   const result = await client.extract(
+     model,
+     { images: [wire] as unknown as Uint8Array[] },
+     { labels: [], options } as unknown as Parameters<typeof client.extract>[2],
+   );
+   const entities = (result.entities ?? []).map((e) => ({
+     label: e.label,
+     text: e.text,
+   }));
+   const data = (result as unknown as { data?: unknown }).data;
+   return { entities, data };
+ }
src/events.ts ADDED
@@ -0,0 +1,13 @@
+ import type { DonutEntity, ExtractedField } from "./types.js";
+
+ export type PipelineEvent =
+   | { type: "models"; data: { extractor: string; recognition: string; structured: string } }
+   | { type: "recognition_start"; data: { model: string } }
+   | { type: "recognition_chunk"; data: { textLen: number } }
+   | { type: "recognition_done"; data: { markdown: string; ms: number } }
+   | { type: "donut_start" }
+   | { type: "donut_done"; data: { entities: DonutEntity[]; rawData: unknown; ms: number } }
+   | { type: "gliner_start"; data: { labels: string[] } }
+   | { type: "gliner_done"; data: { fields: ExtractedField[]; ms: number } }
+   | { type: "done"; data: { totalMs: number } }
+   | { type: "error"; data: { message: string; stage: string } };
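
(Aside, not part of the commit: web/public/app.js consumes these events
untyped via EventSource; the same dispatch in TypeScript narrows on the
`type` discriminant.)

```ts
import type { PipelineEvent } from "./events.js";

// Each case narrows `ev.data` to that variant's payload.
function handle(ev: PipelineEvent): void {
  switch (ev.type) {
    case "recognition_done":
      console.log(`markdown: ${ev.data.markdown.length} chars in ${ev.data.ms}ms`);
      break;
    case "gliner_done":
      for (const f of ev.data.fields) console.log(f.label, f.text, f.score);
      break;
    case "error":
      console.error(`${ev.data.stage}: ${ev.data.message}`);
      break;
    default:
      break;
  }
}
```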
src/extract.ts ADDED
@@ -0,0 +1,17 @@
+ import type { SIEClient } from "@superlinked/sie-sdk";
+ import type { ExtractedField } from "./types.js";
+
+ export async function extractFields(
+   client: SIEClient,
+   model: string,
+   text: string,
+   labels: string[],
+ ): Promise<ExtractedField[]> {
+   if (!text.trim()) return [];
+   const result = await client.extract(model, { text }, { labels, threshold: 0.4 });
+   return (result.entities ?? []).map((e) => ({
+     label: e.label,
+     text: e.text,
+     score: e.score,
+   }));
+ }
src/ocr.ts ADDED
@@ -0,0 +1,24 @@
+ import { detectImageFormat, type SIEClient } from "@superlinked/sie-sdk";
+
+ export async function recognize(
+   client: SIEClient,
+   model: string,
+   imageBytes: Uint8Array,
+   options?: Record<string, unknown>,
+ ): Promise<string> {
+   const format = detectImageFormat(imageBytes);
+   if (format === "unknown") throw new Error("could not detect image format");
+   const wire = { data: imageBytes, format };
+   // The TS SDK types declare images as Uint8Array[], but the wire format
+   // expects {data, format} dicts. Cast around the typing gap.
+   const result = await client.extract(
+     model,
+     { images: [wire] as unknown as Uint8Array[] },
+     // The TS SDK's ExtractOptions doesn't declare `options`, but the wire
+     // protocol forwards it to the adapter. Cast to bridge the typing gap.
+     { labels: [], options } as unknown as Parameters<typeof client.extract>[2],
+   );
+   if (!result.entities || result.entities.length === 0) return "";
+   const text = result.entities[0]?.text;
+   return typeof text === "string" ? text : "";
+ }
src/pipeline.ts ADDED
@@ -0,0 +1,102 @@
+ import type { SIEClient } from "@superlinked/sie-sdk";
+ import { NER_MODELS, RECOGNITION_MODELS, STRUCTURED_MODELS } from "./config.js";
+ import { structuredExtract } from "./donut.js";
+ import type { PipelineEvent } from "./events.js";
+ import { extractFields } from "./extract.js";
+ import { recognize } from "./ocr.js";
+ import type { SampleDoc, TriageResult } from "./types.js";
+
+ export type RunInput = {
+   client: SIEClient;
+   imageBytes: Uint8Array;
+   sample: SampleDoc;
+   recognitionModel: string;
+   structuredModel: string;
+   nerModel: string;
+   emit: (event: PipelineEvent) => void;
+ };
+
+ function lookup<T extends { id: string }>(list: T[], id: string): T {
+   const found = list.find((m) => m.id === id);
+   if (!found) throw new Error(`unknown model id: ${id}`);
+   return found;
+ }
+
+ export async function runPipeline({
+   client,
+   imageBytes,
+   sample,
+   recognitionModel,
+   structuredModel,
+   nerModel,
+   emit,
+ }: RunInput): Promise<TriageResult> {
+   const t0 = Date.now();
+
+   emit({
+     type: "models",
+     data: { extractor: nerModel, recognition: recognitionModel, structured: structuredModel },
+   });
+
+   // Recognition
+   const recOpt = lookup(RECOGNITION_MODELS, recognitionModel);
+   emit({ type: "recognition_start", data: { model: recognitionModel } });
+   const tRec = Date.now();
+   let markdown = "";
+   try {
+     markdown = await recognize(client, recOpt.id, imageBytes, recOpt.options);
+   } catch (err) {
+     emit({
+       type: "error",
+       data: { stage: "recognition", message: `${recognitionModel} failed: ${(err as Error).message}` },
+     });
+     throw err;
+   }
+   const recognitionMs = Date.now() - tRec;
+   emit({ type: "recognition_done", data: { markdown, ms: recognitionMs } });
+
+   // Structured (Donut variants, etc.)
+   const strOpt = lookup(STRUCTURED_MODELS, structuredModel);
+   emit({ type: "donut_start" });
+   const tDon = Date.now();
+   let donut = { entities: [] as { label: string; text: string }[], data: undefined as unknown };
+   try {
+     donut = await structuredExtract(client, strOpt.id, imageBytes, strOpt.options);
+   } catch (err) {
+     emit({
+       type: "error",
+       data: { stage: "donut", message: `${structuredModel} failed: ${(err as Error).message}` },
+     });
+   }
+   const donutMs = Date.now() - tDon;
+   emit({ type: "donut_done", data: { entities: donut.entities, rawData: donut.data, ms: donutMs } });
+
+   // NER (GLiNER variants)
+   const nerOpt = lookup(NER_MODELS, nerModel);
+   emit({ type: "gliner_start", data: { labels: sample.labels } });
+   const tGli = Date.now();
+   let fields: { label: string; text: string; score: number }[] = [];
+   try {
+     fields = await extractFields(client, nerOpt.id, markdown, sample.labels);
+   } catch (err) {
+     emit({
+       type: "error",
+       data: { stage: "gliner", message: `${nerModel} failed: ${(err as Error).message}` },
+     });
+   }
+   const glinerMs = Date.now() - tGli;
+   emit({ type: "gliner_done", data: { fields, ms: glinerMs } });
+
+   const totalMs = Date.now() - t0;
+   emit({ type: "done", data: { totalMs } });
+
+   return {
+     sampleId: sample.id,
+     recognitionModel,
+     markdown,
+     donutEntities: donut.entities,
+     donutData: donut.data,
+     glinerFields: fields,
+     timings: { recognitionMs, donutMs, glinerMs, totalMs },
+   };
+ }
src/types.ts ADDED
@@ -0,0 +1,33 @@
+ export type SampleDoc = {
+   id: string;
+   filename: string;
+   label: string;
+   description: string;
+   labels: string[];
+ };
+
+ export type ExtractedField = {
+   label: string;
+   text: string;
+   score: number;
+ };
+
+ export type DonutEntity = {
+   label: string;
+   text: string;
+ };
+
+ export type TriageResult = {
+   sampleId: string;
+   recognitionModel: string;
+   markdown: string;
+   donutEntities: DonutEntity[];
+   donutData: unknown;
+   glinerFields: ExtractedField[];
+   timings: {
+     recognitionMs: number;
+     donutMs: number;
+     glinerMs: number;
+     totalMs: number;
+   };
+ };
tsconfig.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "compilerOptions": {
+     "target": "ES2022",
+     "module": "ESNext",
+     "moduleResolution": "bundler",
+     "esModuleInterop": true,
+     "allowSyntheticDefaultImports": true,
+     "strict": true,
+     "skipLibCheck": true,
+     "forceConsistentCasingInFileNames": true,
+     "resolveJsonModule": true,
+     "isolatedModules": true,
+     "noEmit": true
+   },
+   "include": ["src", "web"]
+ }
web/public/app.js ADDED
@@ -0,0 +1,236 @@
+ const els = {
+   badge: document.getElementById("badge"),
+   models: document.getElementById("models"),
+   sieState: document.getElementById("sie-state"),
+   events: document.getElementById("events"),
+   selectRecognition: document.getElementById("select-recognition"),
+   selectStructured: document.getElementById("select-structured"),
+   selectNer: document.getElementById("select-ner"),
+   recognition: document.getElementById("recognition"),
+   recognitionMeta: document.getElementById("recognition-meta"),
+   extraction: document.getElementById("extraction"),
+   extractionMeta: document.getElementById("extraction-meta"),
+   footer: document.getElementById("footer"),
+   sieUrl: document.getElementById("sie-url"),
+   timings: document.getElementById("timings"),
+ };
+
+ let activeSampleId = null;
+ let timings = { recognitionMs: 0, donutMs: 0, glinerMs: 0 };
+ let donutBuf = { entities: [], data: null };
+ let glinerBuf = [];
+ let modelConfig = null;
+ let registeredSet = new Set();
+
+ function setBadge(text, cls) {
+   els.badge.textContent = text;
+   els.badge.className = "badge" + (cls ? " " + cls : "");
+ }
+ function shortModel(id) {
+   if (!id) return "";
+   const slash = id.indexOf("/");
+   return slash === -1 ? id : id.slice(slash + 1);
+ }
+ function escapeHtml(s) {
+   return String(s).replace(
+     /[&<>"']/g,
+     (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" })[c],
+   );
+ }
+
+ function populateDropdown(selectEl, options, defaultId) {
+   selectEl.innerHTML = "";
+   for (const opt of options) {
+     const node = document.createElement("option");
+     node.value = opt.id;
+     const available =
+       registeredSet.size === 0 || registeredSet.has(opt.id);
+     const labelSuffix = !available
+       ? opt.gpuRequired
+         ? " (GPU image needed)"
+         : " (not registered)"
+       : "";
+     node.textContent = opt.label + labelSuffix;
+     if (!available) node.disabled = true;
+     if (opt.id === defaultId) node.selected = true;
+     node.title = opt.description;
+     selectEl.appendChild(node);
+   }
+ }
+
+ function renderSamples(samples, onClick) {
+   if (!samples || samples.length === 0) {
+     els.events.innerHTML = '<p class="hint">no samples</p>';
+     return;
+   }
+   els.events.innerHTML = samples
+     .map(
+       (s) => `<div class="event" data-id="${escapeHtml(s.id)}">
+         <img src="/samples/${encodeURIComponent(s.filename)}" alt="${escapeHtml(s.label)}" />
+         <div>
+           <div class="label">${escapeHtml(s.label)}</div>
+           <div class="desc">${escapeHtml(s.description)}</div>
+         </div>
+       </div>`,
+     )
+     .join("");
+   for (const node of els.events.querySelectorAll(".event")) {
+     node.addEventListener("click", () => {
+       for (const n of els.events.querySelectorAll(".event")) n.classList.remove("active");
+       node.classList.add("active");
+       onClick(node.dataset.id);
+     });
+   }
+ }
+
+ function updateTimings() {
+   const total = timings.recognitionMs + timings.donutMs + timings.glinerMs;
+   els.timings.textContent =
+     total > 0
+       ? `recognition ${timings.recognitionMs}ms · structured ${timings.donutMs}ms · ner ${timings.glinerMs}ms · total ${total}ms`
+       : "";
+ }
+
+ function renderExtraction() {
+   let html = "";
+
+   if (glinerBuf.length > 0) {
+     html += `<div class="section"><h3>NER (${escapeHtml(shortModel(els.selectNer.value))})</h3>`;
+     for (const f of glinerBuf) {
+       html += `<div class="field">
+         <span class="label-name">${escapeHtml(f.label)}</span>
+         <span class="text">${escapeHtml(f.text)}</span>
+         <span class="score">${f.score.toFixed(2)}</span>
+       </div>`;
+     }
+     html += "</div>";
+   }
+
+   if (donutBuf.entities.length > 0) {
+     html += `<div class="section"><h3>Structured (${escapeHtml(shortModel(els.selectStructured.value))})</h3>`;
+     for (const e of donutBuf.entities.slice(0, 25)) {
+       html += `<div class="donut-row">
+         <span class="key">${escapeHtml(e.label)}</span>
+         <span class="val">${escapeHtml(e.text)}</span>
+       </div>`;
+     }
+     html += "</div>";
+   }
+
+   if (!html) html = '<p class="hint">running...</p>';
+   els.extraction.innerHTML = html;
+ }
+
+ function runSample(sampleId) {
+   activeSampleId = sampleId;
+   setBadge("running", "running");
+   els.recognition.innerHTML = '<p class="hint">running recognition...</p>';
+   els.extraction.innerHTML = '<p class="hint">waiting...</p>';
+   els.recognitionMeta.textContent = "";
+   els.extractionMeta.textContent = "";
+   timings = { recognitionMs: 0, donutMs: 0, glinerMs: 0 };
+   donutBuf = { entities: [], data: null };
+   glinerBuf = [];
+   updateTimings();
+
+   const recognition = els.selectRecognition.value;
+   const structured = els.selectStructured.value;
+   const ner = els.selectNer.value;
+   const url = `/api/run?id=${encodeURIComponent(sampleId)}&recognition=${encodeURIComponent(recognition)}&structured=${encodeURIComponent(structured)}&ner=${encodeURIComponent(ner)}`;
+   const es = new EventSource(url);
+
+   es.addEventListener("models", (e) => {
+     const d = JSON.parse(e.data);
+     els.models.innerHTML = `recognition: <code>${shortModel(d.recognition)}</code> · structured: <code>${shortModel(d.structured)}</code> · ner: <code>${shortModel(d.extractor)}</code>`;
+   });
+   es.addEventListener("recognition_start", () => {
+     els.recognitionMeta.textContent = "loading model + generating...";
+   });
+   es.addEventListener("recognition_done", (e) => {
+     const d = JSON.parse(e.data);
+     timings.recognitionMs = d.ms;
+     els.recognitionMeta.textContent = `${d.markdown.length} chars in ${d.ms}ms`;
+     els.recognition.textContent = d.markdown;
+     updateTimings();
+   });
+   es.addEventListener("donut_start", () => {
+     els.extractionMeta.textContent = "running structured...";
+   });
+   es.addEventListener("donut_done", (e) => {
+     const d = JSON.parse(e.data);
+     timings.donutMs = d.ms;
+     donutBuf = { entities: d.entities, data: d.rawData };
+     els.extractionMeta.textContent = `structured ${d.ms}ms`;
+     renderExtraction();
+     updateTimings();
+   });
+   es.addEventListener("gliner_start", () => {
+     els.extractionMeta.textContent = "running NER...";
+   });
+   es.addEventListener("gliner_done", (e) => {
+     const d = JSON.parse(e.data);
+     timings.glinerMs = d.ms;
+     glinerBuf = d.fields;
+     els.extractionMeta.textContent = `ner ${d.ms}ms · ${d.fields.length} fields`;
+     renderExtraction();
+     updateTimings();
+   });
+   es.addEventListener("done", (e) => {
+     const d = JSON.parse(e.data);
+     setBadge(`done ${d.totalMs}ms`, "green");
+     es.close();
+   });
+   es.addEventListener("error", (e) => {
+     setBadge("error", "red");
+     if (e.data) {
+       try {
+         const m = JSON.parse(e.data);
+         els.recognitionMeta.textContent = `${m.stage}: ${m.message}`;
+       } catch {
+         /* */
+       }
+     }
+     es.close();
+   });
+ }
+
+ async function init() {
+   // Fetch SIE health (and registered models)
+   let registered = [];
+   try {
+     const r = await fetch("/api/health");
+     const j = await r.json();
+     els.sieUrl.textContent = j.sieUrl;
+     if (!j.sie) {
+       els.sieState.textContent = "SIE not reachable yet (still preloading models?)";
+     } else {
+       els.sieState.textContent = `SIE healthy · ${j.registeredModels} models registered`;
+       registered = j.registered ?? [];
+     }
+   } catch {
+     els.sieState.textContent = "could not reach the local server";
+   }
+   registeredSet = new Set(registered);
+
+   // Fetch model menus (config-side)
+   try {
+     const r = await fetch("/api/models");
+     modelConfig = await r.json();
+     populateDropdown(els.selectRecognition, modelConfig.recognition, modelConfig.defaults.recognition);
+     populateDropdown(els.selectStructured, modelConfig.structured, modelConfig.defaults.structured);
+     populateDropdown(els.selectNer, modelConfig.ner, modelConfig.defaults.ner);
+   } catch (e) {
+     console.error("failed to load model config", e);
+   }
+
+   // Fetch sample documents
+   try {
+     const r = await fetch("/api/samples");
+     const samples = await r.json();
+     renderSamples(samples, runSample);
+   } catch {
+     els.events.innerHTML = '<p class="hint">failed to load samples</p>';
+   }
+ }
+
+ init();
web/public/index.html ADDED
@@ -0,0 +1,80 @@
+ <!doctype html>
+ <html lang="en">
+ <head>
+   <meta charset="utf-8" />
+   <meta name="viewport" content="width=device-width,initial-scale=1" />
+   <title>document-ocr</title>
+   <link rel="stylesheet" href="/static/style.css" />
+ </head>
+ <body>
+   <header>
+     <div class="title">
+       <span class="logo">📄</span>
+       <h1>document-ocr</h1>
+       <span class="badge" id="badge">idle</span>
+     </div>
+     <div class="meta" id="models">SIE: <code>...</code></div>
+     <div class="meta" id="sie-state">checking SIE...</div>
+   </header>
+
+   <section class="hero">
+     <p>
+       OCR is rarely a single-model problem. This demo runs three model
+       classes through one SIE server: a <strong>VLM-OCR</strong> recognizes
+       the document into Markdown, a <strong>fine-tuned Donut</strong> emits
+       a JSON tree directly, and a <strong>zero-shot NER (GLiNER)</strong>
+       pulls typed fields out of the recognition output. Pick a sample on
+       the left, swap any of the three models on the right, and watch SIE
+       hot-swap them with a single identifier change.
+     </p>
+   </section>
+
+   <main>
+     <section class="panel" id="panel-events">
+       <header><h2>Sample documents</h2></header>
+       <div class="meta-row">
+         <label class="model-pick">
+           <span class="dropdown-label">Recognition</span>
+           <select id="select-recognition"></select>
+         </label>
+         <label class="model-pick">
+           <span class="dropdown-label">Structured</span>
+           <select id="select-structured"></select>
+         </label>
+         <label class="model-pick">
+           <span class="dropdown-label">NER</span>
+           <select id="select-ner"></select>
+         </label>
+       </div>
+       <div class="list" id="events">loading...</div>
+     </section>
+
+     <section class="panel" id="panel-recognition">
+       <header>
+         <h2>Recognition (Markdown)</h2>
+         <span class="hint" id="recognition-meta"></span>
+       </header>
+       <div class="markdown" id="recognition">
+         <p class="hint">Click a sample on the left.</p>
+       </div>
+     </section>
+
+     <section class="panel" id="panel-extraction">
+       <header>
+         <h2>Extraction</h2>
+         <span class="hint" id="extraction-meta"></span>
+       </header>
+       <div class="extraction" id="extraction">
+         <p class="hint">Typed fields will appear here.</p>
+       </div>
+     </section>
+   </main>
+
+   <footer>
+     <span id="footer">SIE on <code id="sie-url">http://localhost:8080</code></span>
+     <span id="timings"></span>
+   </footer>
+
+   <script src="/static/app.js"></script>
+ </body>
+ </html>
web/public/style.css ADDED
@@ -0,0 +1,148 @@
+ :root {
+   --bg: #0e1014;
+   --panel: #161a22;
+   --line: #232936;
+   --text: #e6e8ec;
+   --muted: #8b94a7;
+   --accent: #7c5dff;
+   --accent-2: #62b6ff;
+   --green: #5fd28b;
+   --red: #ff6b6b;
+   --yellow: #f7c948;
+   --magenta: #c89cff;
+ }
+
+ * { box-sizing: border-box; }
+ html, body {
+   margin: 0; padding: 0; background: var(--bg); color: var(--text);
+   font: 14px ui-monospace, SFMono-Regular, Menlo, monospace;
+   height: 100%;
+ }
+ body { display: flex; flex-direction: column; }
+
+ header {
+   display: flex; align-items: center; gap: 16px;
+   padding: 12px 20px; border-bottom: 1px solid var(--line);
+ }
+ .title { display: flex; align-items: center; gap: 12px; }
+ .logo { font-size: 18px; }
+ h1 { font-size: 16px; margin: 0; font-weight: 600; }
+ .badge {
+   padding: 3px 10px; border-radius: 999px; background: var(--line);
+   color: var(--muted); font-size: 11px; text-transform: uppercase;
+   letter-spacing: 0.5px;
+ }
+ .badge.running { background: rgba(98, 182, 255, 0.15); color: var(--accent-2); }
+ .badge.green { background: rgba(95, 210, 139, 0.18); color: var(--green); }
+ .badge.red { background: rgba(255, 107, 107, 0.18); color: var(--red); }
+ .meta { color: var(--muted); font-size: 12px; }
+
+ .hero {
+   padding: 14px 20px; background: linear-gradient(180deg, rgba(124,93,255,0.10), transparent);
+   border-bottom: 1px solid var(--line);
+ }
+ .hero p { margin: 0; color: var(--muted); max-width: 980px; line-height: 1.6; }
+ .hero strong { color: var(--text); }
+
+ main {
+   flex: 1; display: grid;
+   grid-template-columns: 0.95fr 1.4fr 1.2fr;
+   gap: 12px; padding: 12px 20px; overflow: hidden;
+ }
+
+ .panel {
+   display: flex; flex-direction: column;
+   background: var(--panel); border: 1px solid var(--line);
+   border-radius: 10px; overflow: hidden; min-height: 0;
+ }
+ .panel header {
+   padding: 10px 14px; border-bottom: 1px solid var(--line);
+   display: flex; justify-content: space-between; align-items: baseline;
+ }
+ .panel h2 {
+   font-size: 12px; letter-spacing: 0.6px; text-transform: uppercase;
+   margin: 0; color: var(--muted);
+ }
+ #panel-events h2 { color: var(--accent); }
+ #panel-recognition h2 { color: var(--accent-2); }
+ #panel-extraction h2 { color: var(--magenta); }
+
+ .hint { color: var(--muted); font-size: 11px; }
+
+ .list, .markdown, .extraction {
+   flex: 1; overflow: auto; padding: 12px 14px; margin: 0;
+ }
+
+ .meta-row {
+   padding: 10px 14px; border-bottom: 1px solid var(--line);
+   font-size: 11px; color: var(--muted);
+   display: flex; flex-direction: column; gap: 6px;
+ }
+ .model-pick {
+   display: flex; align-items: center; gap: 8px;
+   font-size: 11px;
+ }
+ .dropdown-label {
+   color: var(--muted); text-transform: uppercase;
+   letter-spacing: 0.5px; min-width: 80px;
+ }
+ .meta-row select {
+   flex: 1; min-width: 0;
+   background: var(--bg); color: var(--text); border: 1px solid var(--line);
+   border-radius: 6px; padding: 4px 8px; font: inherit;
+   font-size: 11px;
+ }
+ .meta-row select:disabled, .meta-row option:disabled {
+   color: var(--muted);
+ }
+
+ .event {
+   padding: 10px 12px; border: 1px solid var(--line); border-radius: 8px;
+   margin-bottom: 10px; cursor: pointer;
+   transition: border-color 0.1s;
+   display: flex; gap: 12px; align-items: center;
+ }
+ .event:hover { border-color: var(--accent); }
+ .event.active { border-color: var(--accent); background: rgba(124,93,255,0.08); }
+ .event img {
+   width: 64px; height: 64px; object-fit: cover;
+   border-radius: 4px; background: #fff;
+   flex-shrink: 0;
+ }
+ .event .label { font-weight: 600; color: var(--text); font-size: 13px; }
+ .event .desc { color: var(--muted); font-size: 11px; margin-top: 2px; line-height: 1.4; }
+
+ .markdown {
+   white-space: pre-wrap; word-break: break-word;
+   font-size: 13px; line-height: 1.55;
+ }
+
+ .extraction .section {
+   margin-bottom: 14px; padding-bottom: 10px;
+   border-bottom: 1px dashed var(--line);
+ }
+ .extraction .section:last-child { border-bottom: 0; margin-bottom: 0; }
+ .extraction h3 {
+   font-size: 11px; letter-spacing: 0.6px; text-transform: uppercase;
+   color: var(--muted); margin: 0 0 6px 0; font-weight: 500;
+ }
+ .field {
+   display: flex; gap: 8px; padding: 4px 0;
+   font-size: 12px;
+ }
+ .field .label-name {
+   color: var(--accent-2); font-weight: 500; min-width: 90px;
+ }
+ .field .text { color: var(--text); flex: 1; }
+ .field .score { color: var(--muted); font-size: 11px; }
+
+ .donut-row { padding: 3px 0; font-size: 12px; }
+ .donut-row .key { color: var(--magenta); }
+ .donut-row .val { color: var(--text); margin-left: 8px; }
+
+ footer {
+   padding: 10px 20px; border-top: 1px solid var(--line);
+   color: var(--muted); font-size: 12px;
+   display: flex; justify-content: space-between; gap: 12px;
+ }
+ code { background: var(--line); padding: 1px 6px; border-radius: 4px; }
web/server.ts ADDED
@@ -0,0 +1,186 @@
+ import fs from "node:fs";
+ import http from "node:http";
+ import path from "node:path";
+ import { spawnSync } from "node:child_process";
+ import { fileURLToPath } from "node:url";
+ import { SIEClient } from "@superlinked/sie-sdk";
+ import { NER_MODELS, RECOGNITION_MODELS, STRUCTURED_MODELS, config } from "../src/config.js";
+ import type { PipelineEvent } from "../src/events.js";
+ import { runPipeline } from "../src/pipeline.js";
+ import type { SampleDoc } from "../src/types.js";
+
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
+ const ROOT = path.resolve(__dirname, "..");
+ const PUBLIC_DIR = path.resolve(__dirname, "public");
+ const SAMPLES_PATH = path.resolve(ROOT, config.paths.samples);
+ const SAMPLE_DIR = path.resolve(ROOT, config.paths.sampleDir);
+
+ const MIME: Record<string, string> = {
+   ".html": "text/html; charset=utf-8",
+   ".css": "text/css; charset=utf-8",
+   ".js": "text/javascript; charset=utf-8",
+   ".json": "application/json",
+   ".png": "image/png",
+   ".jpg": "image/jpeg",
+   ".jpeg": "image/jpeg",
+ };
+
+ function send(res: http.ServerResponse, status: number, body: string, type = "text/plain") {
+   res.writeHead(status, { "content-type": type });
+   res.end(body);
+ }
+
+ function serveFile(res: http.ServerResponse, file: string) {
+   if (!fs.existsSync(file)) return send(res, 404, "not found");
+   const ext = path.extname(file).toLowerCase();
+   res.writeHead(200, { "content-type": MIME[ext] ?? "application/octet-stream" });
+   fs.createReadStream(file).pipe(res);
+ }
+
+ function setupSse(res: http.ServerResponse) {
+   res.writeHead(200, {
+     "content-type": "text/event-stream",
+     "cache-control": "no-cache",
+     connection: "keep-alive",
+   });
+   return (event: { type: string; data?: unknown }) => {
+     res.write(`event: ${event.type}\n`);
+     res.write(`data: ${JSON.stringify(event.data ?? null)}\n\n`);
+   };
+ }
+
+ async function fetchModels(): Promise<{ ok: boolean; names: string[] }> {
+   try {
+     const r = await fetch(`${config.sieUrl}/v1/models`, { signal: AbortSignal.timeout(3000) });
+     if (!r.ok) return { ok: false, names: [] };
+     const json = (await r.json()) as { models?: { name: string }[] };
+     return { ok: true, names: (json.models ?? []).map((m) => m.name) };
+   } catch {
+     return { ok: false, names: [] };
+   }
+ }
+
+ function readJson<T>(p: string): T {
+   return JSON.parse(fs.readFileSync(p, "utf8")) as T;
+ }
+
+ async function handleRun(
+   req: http.IncomingMessage,
+   res: http.ServerResponse,
+   sampleId: string,
+   recognitionModel: string,
+   structuredModel: string,
+   nerModel: string,
+ ) {
+   const push = setupSse(res);
+   let closed = false;
+   req.on("close", () => {
+     closed = true;
+   });
+
+   const samples = readJson<SampleDoc[]>(SAMPLES_PATH);
+   const sample = samples.find((s) => s.id === sampleId);
+   if (!sample) {
+     push({ type: "error", data: { stage: "lookup", message: `unknown sample id: ${sampleId}` } });
+     return res.end();
+   }
+
+   const imagePath = path.resolve(SAMPLE_DIR, sample.filename);
+   if (!fs.existsSync(imagePath)) {
+     push({ type: "error", data: { stage: "lookup", message: `sample image not found: ${sample.filename}` } });
+     return res.end();
+   }
+   const imageBytes = fs.readFileSync(imagePath);
+
+   const client = new SIEClient(config.sieUrl, {
+     apiKey: config.sieApiKey,
+     timeout: 600_000, // 10 min request timeout (CPU + Rosetta is slow)
+     waitForCapacity: true, // retry while a model is warming up
+     provisionTimeout: 900_000, // 15 min ceiling on cold-load polling
+   });
+
+   try {
+     await runPipeline({
+       client,
+       imageBytes,
+       sample,
+       recognitionModel,
+       structuredModel,
+       nerModel,
+       emit: (event: PipelineEvent) => {
+         if (closed) return;
+         push({ type: event.type, data: "data" in event ? event.data : null });
+       },
+     });
+   } catch (err) {
+     push({ type: "error", data: { stage: "pipeline", message: (err as Error).message } });
+   } finally {
+     await client.close().catch(() => {});
+     res.end();
+   }
+ }
+
+ const server = http.createServer(async (req, res) => {
+   const url = new URL(req.url ?? "/", `http://${req.headers.host}`);
+   const p = url.pathname;
+
+   if (p === "/" || p === "/index.html") return serveFile(res, path.join(PUBLIC_DIR, "index.html"));
+   if (p.startsWith("/static/")) return serveFile(res, path.join(PUBLIC_DIR, p.slice("/static/".length)));
+   if (p.startsWith("/samples/")) return serveFile(res, path.join(SAMPLE_DIR, p.slice("/samples/".length)));
+
+   if (p === "/api/health") {
+     const { ok, names } = await fetchModels();
+     return send(
+       res,
+       200,
+       JSON.stringify({
+         sie: ok,
+         sieUrl: config.sieUrl,
+         registeredModels: names.length,
+         registered: names,
+       }),
+       "application/json",
+     );
+   }
+   if (p === "/api/models") {
+     return send(
+       res,
+       200,
+       JSON.stringify({
+         recognition: RECOGNITION_MODELS,
+         structured: STRUCTURED_MODELS,
+         ner: NER_MODELS,
+         defaults: config.defaults,
+       }),
+       "application/json",
+     );
+   }
+   if (p === "/api/samples") {
+     return send(res, 200, fs.readFileSync(SAMPLES_PATH, "utf8"), "application/json");
+   }
+   if (p === "/api/run") {
+     const id = url.searchParams.get("id");
+     const recognitionModel =
+       url.searchParams.get("recognition") ?? config.defaults.recognition;
+     const structuredModel = url.searchParams.get("structured") ?? config.defaults.structured;
+     const nerModel = url.searchParams.get("ner") ?? config.defaults.ner;
+     if (!id) return send(res, 400, "missing id");
+     return handleRun(req, res, id, recognitionModel, structuredModel, nerModel);
+   }
+
+   return send(res, 404, "not found");
+ });
+
+ server.listen(config.port, () => {
+   const url = `http://localhost:${config.port}`;
+   console.log(`document-ocr ui: ${url}`);
+   if (process.env.OPEN_BROWSER !== "0") {
+     const opener =
+       process.platform === "darwin"
+         ? "open"
+         : process.platform === "win32"
+           ? "start"
+           : "xdg-open";
+     spawnSync(opener, [url], { stdio: "ignore", shell: process.platform === "win32" }); // `start` is a cmd.exe builtin
+   }
+ });