fm1320 committed
Commit ffe59ba · 1 Parent(s): c1df4e3

Initial: document-ocr demo for HF Spaces

Single-container build of the document-ocr demo from
superlinked/brave-new-demos. It extends ghcr.io/superlinked/sie-server:latest-cpu-default
with Node 22 and runs SIE on 127.0.0.1:8080 with the UI on 0.0.0.0:7860.

Preloads three small models from the default bundle (Florence-2-base,
Donut-CORD-v2, GLiNER-multi). Alternates are dropdown-selectable and
lazy-load on first click.

HF persistent /data is used as HF_HOME so model weights survive Space
restarts.
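
For reference (not part of the commit): once the Space is up, the UI's
pipeline endpoint can be exercised directly. A minimal sketch, assuming the
Space answers on localhost:7860; the endpoint and query parameters come from
web/server.ts below, and "receipt" is one of the bundled sample ids.

```ts
// Stream the pipeline's SSE events for one sample (sketch, not in the commit).
const url =
  "http://localhost:7860/api/run?id=receipt" +
  "&recognition=" + encodeURIComponent("microsoft/Florence-2-base");

const res = await fetch(url, { headers: { accept: "text/event-stream" } });
const reader = res.body!.getReader();
const decoder = new TextDecoder();
for (;;) {
  const { value, done } = await reader.read();
  if (done) break;
  // Raw SSE frames: "event: <type>\ndata: <json>\n\n"
  process.stdout.write(decoder.decode(value));
}
```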

Dockerfile ADDED
@@ -0,0 +1,45 @@
+ # Hugging Face Spaces image. Extends the official SIE server image, adds
+ # Node 22 for the UI server, and runs both processes from one container.
+ #
+ # This Dockerfile is HF Spaces specific. Local Docker users should use
+ # compose.yml (which just runs the unmodified upstream SIE image with the
+ # Node UI on the host).
+ FROM ghcr.io/superlinked/sie-server:latest-cpu-default
+
+ USER root
+
+ # Node 22 for the UI server (tsx + node:http + our Express-free server.ts)
+ # The SIE base image is minimal; install curl + ca-certificates before
+ # pulling NodeSource's setup script.
+ RUN apt-get update \
+     && apt-get install -y --no-install-recommends curl ca-certificates gnupg \
+     && curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
+     && apt-get install -y --no-install-recommends nodejs \
+     && apt-get clean \
+     && rm -rf /var/lib/apt/lists/*
+
+ # HF Spaces' persistent storage lives at /data (50 GB free tier).
+ # Send the HuggingFace cache there so model weights survive Space restarts.
+ ENV HF_HOME=/data/.cache/huggingface
+
+ # UI server lives under /app/ui (the SIE base image already owns /app/...)
+ WORKDIR /app/ui
+
+ COPY package.json tsconfig.json ./
+ RUN npm install --silent
+
+ COPY src ./src
+ COPY web ./web
+ COPY data ./data
+
+ # Entrypoint script orchestrates both processes (SIE in background, UI in foreground).
+ COPY hf-entrypoint.sh /usr/local/bin/hf-entrypoint.sh
+ RUN chmod +x /usr/local/bin/hf-entrypoint.sh
+
+ # Make /app/ui world-readable so any HF Space user account can access it.
+ # Also make /data writable so the HF cache can be written there.
+ RUN chmod -R a+rx /app/ui && mkdir -p /data && chmod -R a+rwx /data
+
+ EXPOSE 7860
+
+ ENTRYPOINT ["/usr/local/bin/hf-entrypoint.sh"]
README.md CHANGED
@@ -1,10 +1,69 @@
  ---
- title: Document Ocr
- emoji: 📈
- colorFrom: indigo
+ title: Document OCR
+ emoji: 📄
+ colorFrom: purple
  colorTo: blue
  sdk: docker
+ app_port: 7860
  pinned: false
+ short_description: Multi-model OCR pipeline running on SIE
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Document OCR
+
+ A small Hugging Face Space that runs three different OCR-class models through one
+ [SIE](https://github.com/superlinked/sie) inference engine. Pick a sample
+ document on the left, swap any of the three models in the dropdowns, and watch
+ SIE hot-swap them with a single identifier change.
+
+ ## What runs in this Space
+
+ A single Docker container with two processes:
+
+ - `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, preloading
+   three small models from the default bundle at boot.
+ - A small Node web server on `0.0.0.0:7860` that serves the UI, proxies
+   extraction calls to SIE, and streams pipeline progress to the browser via SSE.
+
+ Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-default`.
+ HF Spaces' persistent `/data` directory is used as the HuggingFace cache so
+ model weights survive Space restarts.
+
+ ## Model lineup
+
+ | Stage | Default (preloaded) | Alternates (lazy-load on click) |
+ |---|---|---|
+ | Recognition | `microsoft/Florence-2-base` (270M) | Florence-2-large, LightOnOCR-2-1B, GLM-OCR, PaddleOCR-VL |
+ | Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA |
+ | NER | `urchade/gliner_multi-v2.1` | GLiNER-large |
+
+ The default trio is ~1 GB total. Alternates are listed in the dropdowns but
+ only load when first clicked; some are GPU-only.
+
+ ## What SIE provides here
+
+ Three different model architectures, one API:
+
+ ```
+ client.extract(model_id, { images: [bytes] })
+ ```
+
+ The model ID alone decides whether you get VLM Markdown (Florence-2,
+ LightOnOCR), structured JSON (Donut), or typed entities (GLiNER). No
+ separate auth, no separate rate limit, no separate deployment story.
+
+ ## Source
+
+ Built from the `document-ocr` demo in
+ [superlinked/brave-new-demos](https://github.com/superlinked/brave-new-demos/tree/main/document-ocr).
+ The local-Docker version uses `docker compose` against the same upstream
+ SIE image; this Space packages everything into one container for HF.
+
+ ## Performance note
+
+ This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). Per-sample
+ latency is in the 60-90 s range on the default Florence-2 + Donut + GLiNER
+ trio; recognition is the slow step. On a GPU Space (paid), Florence-2 drops
+ to a few seconds and the heavier models like GLM-OCR become tractable.
+
+ Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0).
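
(Aside, not part of the commit: the "one API" call above, spelled out across
the three stages using this repo's own call shapes. The casts mirror the
typing-gap comments in src/ocr.ts; model ids and options come from
src/config.ts, and the GLiNER labels here are illustrative.)

```ts
import fs from "node:fs";
import { SIEClient, detectImageFormat } from "@superlinked/sie-sdk";

const client = new SIEClient("http://127.0.0.1:8080", { waitForCapacity: true });
const bytes = new Uint8Array(fs.readFileSync("data/samples/receipt.png"));
const wire = { data: bytes, format: detectImageFormat(bytes) };
type Opts = Parameters<typeof client.extract>[2];

// 1. Recognition: Florence-2 returns the page as Markdown-ish text.
const rec = await client.extract(
  "microsoft/Florence-2-base",
  { images: [wire] as unknown as Uint8Array[] },
  { labels: [], options: { task: "<OCR>" } } as unknown as Opts,
);
const text = rec.entities?.[0]?.text;

// 2. Structured: Donut returns a nested JSON tree for the CORD receipt schema.
const donut = await client.extract(
  "naver-clova-ix/donut-base-finetuned-cord-v2",
  { images: [wire] as unknown as Uint8Array[] },
  { labels: [] } as unknown as Opts,
);

// 3. NER: GLiNER pulls typed fields out of the recognized text.
const ner = await client.extract(
  "urchade/gliner_multi-v2.1",
  { text: typeof text === "string" ? text : "" },
  { labels: ["merchant", "date", "total"], threshold: 0.4 },
);

console.log(rec, donut, ner);
```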
data/samples/README.md ADDED
@@ -0,0 +1,18 @@
+ # Bundled sample documents
+
+ Six synthetic, public-domain images covering the document shapes most
+ real-world OCR pipelines hit:
+
+ - `receipt.png`: printed grocery receipt, line items + totals
+ - `invoice.png`: vendor invoice, multi-column form layout
+ - `business-card.png`: tight contact card, mixed text sizes
+ - `table.png`: dense numerical table with totals row
+ - `handwritten.png`: jittered text that simulates informal handwriting
+ - `multi-column.png`: two-column newspaper-style layout where reading order
+   matters
+
+ `index.json` carries metadata for each: the GLiNER labels we ask for, plus a
+ short description shown in the UI.
+
+ Regenerate with `python scripts/generate_samples.py`. Pillow is the only
+ dep; no real customer data is involved.
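
(Aside, not part of the commit: loading that metadata the way web/server.ts
does, typed against src/types.ts.)

```ts
import fs from "node:fs";
import type { SampleDoc } from "../src/types.js";

// index.json is an array of SampleDoc records: id, filename, label,
// description, and the GLiNER labels requested for that document.
const samples = JSON.parse(
  fs.readFileSync("data/samples/index.json", "utf8"),
) as SampleDoc[];

for (const s of samples) {
  console.log(`${s.id}: ${s.label} (labels: ${s.labels.join(", ")})`);
}
```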
data/samples/business-card.png ADDED
data/samples/handwritten.png ADDED
data/samples/index.json ADDED
@@ -0,0 +1,84 @@
+ [
+   {
+     "id": "receipt",
+     "filename": "receipt.png",
+     "label": "Grocery receipt",
+     "description": "Printed receipt with line items, subtotal, tax, total. Clean text, simple table layout.",
+     "labels": [
+       "merchant",
+       "date",
+       "line_item",
+       "subtotal",
+       "tax",
+       "total",
+       "payment_method"
+     ]
+   },
+   {
+     "id": "invoice",
+     "filename": "invoice.png",
+     "label": "Vendor invoice",
+     "description": "Multi-column invoice with billing party, line items, subtotal, tax, total. Form-style layout.",
+     "labels": [
+       "vendor",
+       "invoice_number",
+       "date",
+       "due_date",
+       "billing_party",
+       "line_item",
+       "total"
+     ]
+   },
+   {
+     "id": "business-card",
+     "filename": "business-card.png",
+     "label": "Business card",
+     "description": "Tight layout, mixed text sizes, multiple contact fields.",
+     "labels": [
+       "company",
+       "person",
+       "role",
+       "email",
+       "phone",
+       "address",
+       "website"
+     ]
+   },
+   {
+     "id": "table",
+     "filename": "table.png",
+     "label": "Quarterly table",
+     "description": "Dense numerical table with totals row. Tests table-structure recognition.",
+     "labels": [
+       "department",
+       "headcount",
+       "amount",
+       "category"
+     ]
+   },
+   {
+     "id": "handwritten",
+     "filename": "handwritten.png",
+     "label": "Casual notes",
+     "description": "Jittered text simulating informal handwriting; non-template content.",
+     "labels": [
+       "task",
+       "person",
+       "place",
+       "amount"
+     ]
+   },
+   {
+     "id": "multi-column",
+     "filename": "multi-column.png",
+     "label": "Newspaper page",
+     "description": "Two-column newspaper-style layout. Reading order matters.",
+     "labels": [
+       "headline",
+       "person",
+       "organization",
+       "place",
+       "date"
+     ]
+   }
+ ]
data/samples/invoice.png ADDED
data/samples/multi-column.png ADDED
data/samples/receipt.png ADDED
data/samples/table.png ADDED
hf-entrypoint.sh ADDED
@@ -0,0 +1,36 @@
+ #!/usr/bin/env bash
+ # Boot SIE in the background, then run the UI server in the foreground.
+ # Single-container layout for Hugging Face Spaces.
+ set -euo pipefail
+
+ # Models to preload at SIE startup. Same trio the local compose uses.
+ PRELOAD="microsoft/Florence-2-base,naver-clova-ix/donut-base-finetuned-cord-v2,urchade/gliner_multi-v2.1"
+
+ echo "[hf-entrypoint] starting sie-server on 127.0.0.1:8080 with preload=$PRELOAD"
+ sie-server serve \
+   --host 127.0.0.1 \
+   --port 8080 \
+   --preload "$PRELOAD" &
+ SIE_PID=$!
+
+ # Wait up to 20 minutes for SIE to come up. First boot pulls model weights
+ # from HF; subsequent restarts reuse the /data cache and start in seconds.
+ echo "[hf-entrypoint] waiting for /healthz"
+ for i in $(seq 1 1200); do
+   if curl -fsS http://127.0.0.1:8080/healthz > /dev/null 2>&1; then
+     echo "[hf-entrypoint] sie healthy in ~${i}s"
+     break
+   fi
+   if ! kill -0 "$SIE_PID" 2>/dev/null; then
+     echo "[hf-entrypoint] sie-server died before becoming healthy"
+     exit 1
+   fi
+   sleep 1
+ done
+
+ cd /app/ui
+ export PORT="${PORT:-7860}"
+ export SIE_URL="${SIE_URL:-http://127.0.0.1:8080}"
+ export OPEN_BROWSER=0
+ echo "[hf-entrypoint] starting UI on 0.0.0.0:$PORT"
+ exec npx tsx web/server.ts
package.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "name": "document-ocr",
+   "version": "0.1.0",
+   "private": true,
+   "type": "module",
+   "engines": {
+     "node": ">=22"
+   },
+   "scripts": {
+     "start": "docker compose up -d --build && tsx web/server.ts",
+     "start:gpu": "docker compose -f compose.gpu.yml up -d --build && tsx web/server.ts",
+     "ui": "tsx web/server.ts",
+     "typecheck": "tsc -p tsconfig.json --noEmit",
+     "regen-samples": "python3 scripts/generate_samples.py"
+   },
+   "dependencies": {
+     "@superlinked/sie-sdk": "^0.3.1"
+   },
+   "devDependencies": {
+     "@types/node": "^22.10.0",
+     "tsx": "^4.21.0",
+     "typescript": "^5.7.2"
+   }
+ }
src/config.ts ADDED
@@ -0,0 +1,85 @@
+ export type ModelOption = {
+   id: string;
+   label: string;
+   description: string;
+   gpuRequired?: boolean;
+   /** Adapter-specific options passed via SIE's `options` field. */
+   options?: Record<string, unknown>;
+ };
+
+ export const RECOGNITION_MODELS: ModelOption[] = [
+   {
+     id: "microsoft/Florence-2-base",
+     label: "Florence-2-base (small, fast)",
+     description: "Microsoft DaViT + decoder, 270M. Default OCR with the <OCR> task. Fast on CPU.",
+     options: { task: "<OCR>" },
+   },
+   {
+     id: "microsoft/Florence-2-large",
+     label: "Florence-2-large",
+     description: "Larger Florence-2 variant, 770M. Higher quality, still CPU-runnable; ~2x latency.",
+     options: { task: "<OCR>" },
+   },
+   {
+     id: "lightonai/LightOnOCR-2-1B",
+     label: "LightOnOCR-2-1B (premium, GPU recommended)",
+     description: "Pixtral encoder + Qwen3 decoder, 2.1B. Markdown output. Loads on CPU but slow under Rosetta.",
+   },
+   {
+     id: "PaddlePaddle/PaddleOCR-VL-1.5",
+     label: "PaddleOCR-VL-1.5 (GPU image)",
+     description: "Paddle's VLM-OCR, 1.5B. Six task modes. Available on the CUDA image.",
+     options: { task: "ocr" },
+     gpuRequired: true,
+   },
+   {
+     id: "zai-org/GLM-OCR",
+     label: "GLM-OCR (GPU only)",
+     description: "CogViT + GLM-0.5B decoder, 9B in bfloat16. Premium quality, needs ~18 GB VRAM.",
+     gpuRequired: true,
+   },
+ ];
+
+ export const STRUCTURED_MODELS: ModelOption[] = [
+   {
+     id: "naver-clova-ix/donut-base-finetuned-cord-v2",
+     label: "Donut on CORD (receipts)",
+     description: "Fine-tuned for the CORD receipt schema. Pixels in, nested JSON out.",
+   },
+   {
+     id: "naver-clova-ix/donut-base-finetuned-docvqa",
+     label: "Donut on DocVQA",
+     description: "Same Donut architecture, fine-tuned for visual question answering. Returns text answers.",
+   },
+ ];
+
+ export const NER_MODELS: ModelOption[] = [
+   {
+     id: "urchade/gliner_multi-v2.1",
+     label: "GLiNER multi (multilingual)",
+     description: "280M, zero-shot NER, 100+ languages. Good default.",
+   },
+   {
+     id: "urchade/gliner_large-v2.1",
+     label: "GLiNER large (English)",
+     description: "440M, English-focused, higher quality on English text.",
+   },
+ ];
+
+ export const config = {
+   sieUrl: process.env.SIE_URL ?? "http://localhost:8080",
+   sieApiKey: process.env.SIE_API_KEY,
+
+   defaults: {
+     recognition: RECOGNITION_MODELS[0].id,
+     structured: STRUCTURED_MODELS[0].id,
+     ner: NER_MODELS[0].id,
+   },
+
+   paths: {
+     samples: "data/samples/index.json",
+     sampleDir: "data/samples",
+   },
+
+   port: Number(process.env.PORT ?? 3032),
+ } as const;
src/donut.ts ADDED
@@ -0,0 +1,25 @@
+ import { detectImageFormat, type SIEClient } from "@superlinked/sie-sdk";
+ import type { DonutEntity } from "./types.js";
+
+ /** Run any image-input "structured" extractor (Donut variants, etc.). */
+ export async function structuredExtract(
+   client: SIEClient,
+   model: string,
+   imageBytes: Uint8Array,
+   options?: Record<string, unknown>,
+ ): Promise<{ entities: DonutEntity[]; data: unknown }> {
+   const format = detectImageFormat(imageBytes);
+   if (format === "unknown") throw new Error("could not detect image format");
+   const wire = { data: imageBytes, format };
+   const result = await client.extract(
+     model,
+     { images: [wire] as unknown as Uint8Array[] },
+     { labels: [], options } as unknown as Parameters<typeof client.extract>[2],
+   );
+   const entities = (result.entities ?? []).map((e) => ({
+     label: e.label,
+     text: e.text,
+   }));
+   const data = (result as unknown as { data?: unknown }).data;
+   return { entities, data };
+ }
src/events.ts ADDED
@@ -0,0 +1,13 @@
+ import type { DonutEntity, ExtractedField } from "./types.js";
+
+ export type PipelineEvent =
+   | { type: "models"; data: { extractor: string; recognition: string; structured: string } }
+   | { type: "recognition_start"; data: { model: string } }
+   | { type: "recognition_chunk"; data: { textLen: number } }
+   | { type: "recognition_done"; data: { markdown: string; ms: number } }
+   | { type: "donut_start" }
+   | { type: "donut_done"; data: { entities: DonutEntity[]; rawData: unknown; ms: number } }
+   | { type: "gliner_start"; data: { labels: string[] } }
+   | { type: "gliner_done"; data: { fields: ExtractedField[]; ms: number } }
+   | { type: "done"; data: { totalMs: number } }
+   | { type: "error"; data: { message: string; stage: string } };
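
(Aside, not part of the commit: web/public/app.js consumes these events
untyped via EventSource; the same dispatch in TypeScript narrows on the
`type` discriminant.)

```ts
import type { PipelineEvent } from "./events.js";

// Each case narrows `ev.data` to that variant's payload.
function handle(ev: PipelineEvent): void {
  switch (ev.type) {
    case "recognition_done":
      console.log(`markdown: ${ev.data.markdown.length} chars in ${ev.data.ms}ms`);
      break;
    case "gliner_done":
      for (const f of ev.data.fields) console.log(f.label, f.text, f.score);
      break;
    case "error":
      console.error(`${ev.data.stage}: ${ev.data.message}`);
      break;
    default:
      break;
  }
}
```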
src/extract.ts ADDED
@@ -0,0 +1,17 @@
+ import type { SIEClient } from "@superlinked/sie-sdk";
+ import type { ExtractedField } from "./types.js";
+
+ export async function extractFields(
+   client: SIEClient,
+   model: string,
+   text: string,
+   labels: string[],
+ ): Promise<ExtractedField[]> {
+   if (!text.trim()) return [];
+   const result = await client.extract(model, { text }, { labels, threshold: 0.4 });
+   return (result.entities ?? []).map((e) => ({
+     label: e.label,
+     text: e.text,
+     score: e.score,
+   }));
+ }
src/ocr.ts ADDED
@@ -0,0 +1,24 @@
+ import { detectImageFormat, type SIEClient } from "@superlinked/sie-sdk";
+
+ export async function recognize(
+   client: SIEClient,
+   model: string,
+   imageBytes: Uint8Array,
+   options?: Record<string, unknown>,
+ ): Promise<string> {
+   const format = detectImageFormat(imageBytes);
+   if (format === "unknown") throw new Error("could not detect image format");
+   const wire = { data: imageBytes, format };
+   // The TS SDK types declare images as Uint8Array[], but the wire format
+   // expects {data, format} dicts. Cast around the typing gap.
+   const result = await client.extract(
+     model,
+     { images: [wire] as unknown as Uint8Array[] },
+     // The TS SDK's ExtractOptions doesn't declare `options`, but the wire
+     // protocol forwards it to the adapter. Cast to bridge the typing gap.
+     { labels: [], options } as unknown as Parameters<typeof client.extract>[2],
+   );
+   if (!result.entities || result.entities.length === 0) return "";
+   const text = result.entities[0]?.text;
+   return typeof text === "string" ? text : "";
+ }
src/pipeline.ts ADDED
@@ -0,0 +1,102 @@
+ import type { SIEClient } from "@superlinked/sie-sdk";
+ import { NER_MODELS, RECOGNITION_MODELS, STRUCTURED_MODELS } from "./config.js";
+ import { structuredExtract } from "./donut.js";
+ import type { PipelineEvent } from "./events.js";
+ import { extractFields } from "./extract.js";
+ import { recognize } from "./ocr.js";
+ import type { SampleDoc, TriageResult } from "./types.js";
+
+ export type RunInput = {
+   client: SIEClient;
+   imageBytes: Uint8Array;
+   sample: SampleDoc;
+   recognitionModel: string;
+   structuredModel: string;
+   nerModel: string;
+   emit: (event: PipelineEvent) => void;
+ };
+
+ function lookup<T extends { id: string }>(list: T[], id: string): T {
+   const found = list.find((m) => m.id === id);
+   if (!found) throw new Error(`unknown model id: ${id}`);
+   return found;
+ }
+
+ export async function runPipeline({
+   client,
+   imageBytes,
+   sample,
+   recognitionModel,
+   structuredModel,
+   nerModel,
+   emit,
+ }: RunInput): Promise<TriageResult> {
+   const t0 = Date.now();
+
+   emit({
+     type: "models",
+     data: { extractor: nerModel, recognition: recognitionModel, structured: structuredModel },
+   });
+
+   // Recognition
+   const recOpt = lookup(RECOGNITION_MODELS, recognitionModel);
+   emit({ type: "recognition_start", data: { model: recognitionModel } });
+   const tRec = Date.now();
+   let markdown = "";
+   try {
+     markdown = await recognize(client, recOpt.id, imageBytes, recOpt.options);
+   } catch (err) {
+     emit({
+       type: "error",
+       data: { stage: "recognition", message: `${recognitionModel} failed: ${(err as Error).message}` },
+     });
+     throw err;
+   }
+   const recognitionMs = Date.now() - tRec;
+   emit({ type: "recognition_done", data: { markdown, ms: recognitionMs } });
+
+   // Structured (Donut variants, etc.)
+   const strOpt = lookup(STRUCTURED_MODELS, structuredModel);
+   emit({ type: "donut_start" });
+   const tDon = Date.now();
+   let donut = { entities: [] as { label: string; text: string }[], data: undefined as unknown };
+   try {
+     donut = await structuredExtract(client, strOpt.id, imageBytes, strOpt.options);
+   } catch (err) {
+     emit({
+       type: "error",
+       data: { stage: "donut", message: `${structuredModel} failed: ${(err as Error).message}` },
+     });
+   }
+   const donutMs = Date.now() - tDon;
+   emit({ type: "donut_done", data: { entities: donut.entities, rawData: donut.data, ms: donutMs } });
+
+   // NER (GLiNER variants)
+   const nerOpt = lookup(NER_MODELS, nerModel);
+   emit({ type: "gliner_start", data: { labels: sample.labels } });
+   const tGli = Date.now();
+   let fields: { label: string; text: string; score: number }[] = [];
+   try {
+     fields = await extractFields(client, nerOpt.id, markdown, sample.labels);
+   } catch (err) {
+     emit({
+       type: "error",
+       data: { stage: "gliner", message: `${nerModel} failed: ${(err as Error).message}` },
+     });
+   }
+   const glinerMs = Date.now() - tGli;
+   emit({ type: "gliner_done", data: { fields, ms: glinerMs } });
+
+   const totalMs = Date.now() - t0;
+   emit({ type: "done", data: { totalMs } });
+
+   return {
+     sampleId: sample.id,
+     recognitionModel,
+     markdown,
+     donutEntities: donut.entities,
+     donutData: donut.data,
+     glinerFields: fields,
+     timings: { recognitionMs, donutMs, glinerMs, totalMs },
+   };
+ }
src/types.ts ADDED
@@ -0,0 +1,33 @@
+ export type SampleDoc = {
+   id: string;
+   filename: string;
+   label: string;
+   description: string;
+   labels: string[];
+ };
+
+ export type ExtractedField = {
+   label: string;
+   text: string;
+   score: number;
+ };
+
+ export type DonutEntity = {
+   label: string;
+   text: string;
+ };
+
+ export type TriageResult = {
+   sampleId: string;
+   recognitionModel: string;
+   markdown: string;
+   donutEntities: DonutEntity[];
+   donutData: unknown;
+   glinerFields: ExtractedField[];
+   timings: {
+     recognitionMs: number;
+     donutMs: number;
+     glinerMs: number;
+     totalMs: number;
+   };
+ };
tsconfig.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "compilerOptions": {
+     "target": "ES2022",
+     "module": "ESNext",
+     "moduleResolution": "bundler",
+     "esModuleInterop": true,
+     "allowSyntheticDefaultImports": true,
+     "strict": true,
+     "skipLibCheck": true,
+     "forceConsistentCasingInFileNames": true,
+     "resolveJsonModule": true,
+     "isolatedModules": true,
+     "noEmit": true
+   },
+   "include": ["src", "web"]
+ }
web/public/app.js ADDED
@@ -0,0 +1,236 @@
+ const els = {
+   badge: document.getElementById("badge"),
+   models: document.getElementById("models"),
+   sieState: document.getElementById("sie-state"),
+   events: document.getElementById("events"),
+   selectRecognition: document.getElementById("select-recognition"),
+   selectStructured: document.getElementById("select-structured"),
+   selectNer: document.getElementById("select-ner"),
+   recognition: document.getElementById("recognition"),
+   recognitionMeta: document.getElementById("recognition-meta"),
+   extraction: document.getElementById("extraction"),
+   extractionMeta: document.getElementById("extraction-meta"),
+   footer: document.getElementById("footer"),
+   sieUrl: document.getElementById("sie-url"),
+   timings: document.getElementById("timings"),
+ };
+
+ let activeSampleId = null;
+ let timings = { recognitionMs: 0, donutMs: 0, glinerMs: 0 };
+ let donutBuf = { entities: [], data: null };
+ let glinerBuf = [];
+ let modelConfig = null;
+ let registeredSet = new Set();
+
+ function setBadge(text, cls) {
+   els.badge.textContent = text;
+   els.badge.className = "badge" + (cls ? " " + cls : "");
+ }
+ function shortModel(id) {
+   if (!id) return "";
+   const slash = id.indexOf("/");
+   return slash === -1 ? id : id.slice(slash + 1);
+ }
+ function escapeHtml(s) {
+   return String(s).replace(
+     /[&<>"']/g,
+     (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" })[c],
+   );
+ }
+
+ function populateDropdown(selectEl, options, defaultId) {
+   selectEl.innerHTML = "";
+   for (const opt of options) {
+     const node = document.createElement("option");
+     node.value = opt.id;
+     const available =
+       registeredSet.size === 0 || registeredSet.has(opt.id);
+     const labelSuffix = !available
+       ? opt.gpuRequired
+         ? " (GPU image needed)"
+         : " (not registered)"
+       : "";
+     node.textContent = opt.label + labelSuffix;
+     if (!available) node.disabled = true;
+     if (opt.id === defaultId) node.selected = true;
+     node.title = opt.description;
+     selectEl.appendChild(node);
+   }
+ }
+
+ function renderSamples(samples, onClick) {
+   if (!samples || samples.length === 0) {
+     els.events.innerHTML = '<p class="hint">no samples</p>';
+     return;
+   }
+   els.events.innerHTML = samples
+     .map(
+       (s) => `<div class="event" data-id="${escapeHtml(s.id)}">
+         <img src="/samples/${encodeURIComponent(s.filename)}" alt="${escapeHtml(s.label)}" />
+         <div>
+           <div class="label">${escapeHtml(s.label)}</div>
+           <div class="desc">${escapeHtml(s.description)}</div>
+         </div>
+       </div>`,
+     )
+     .join("");
+   for (const node of els.events.querySelectorAll(".event")) {
+     node.addEventListener("click", () => {
+       for (const n of els.events.querySelectorAll(".event")) n.classList.remove("active");
+       node.classList.add("active");
+       onClick(node.dataset.id);
+     });
+   }
+ }
+
+ function updateTimings() {
+   const total = timings.recognitionMs + timings.donutMs + timings.glinerMs;
+   els.timings.textContent =
+     total > 0
+       ? `recognition ${timings.recognitionMs}ms · structured ${timings.donutMs}ms · ner ${timings.glinerMs}ms · total ${total}ms`
+       : "";
+ }
+
+ function renderExtraction() {
+   let html = "";
+
+   if (glinerBuf.length > 0) {
+     html += `<div class="section"><h3>NER (${escapeHtml(shortModel(els.selectNer.value))})</h3>`;
+     for (const f of glinerBuf) {
+       html += `<div class="field">
+         <span class="label-name">${escapeHtml(f.label)}</span>
+         <span class="text">${escapeHtml(f.text)}</span>
+         <span class="score">${f.score.toFixed(2)}</span>
+       </div>`;
+     }
+     html += "</div>";
+   }
+
+   if (donutBuf.entities.length > 0) {
+     html += `<div class="section"><h3>Structured (${escapeHtml(shortModel(els.selectStructured.value))})</h3>`;
+     for (const e of donutBuf.entities.slice(0, 25)) {
+       html += `<div class="donut-row">
+         <span class="key">${escapeHtml(e.label)}</span>
+         <span class="val">${escapeHtml(e.text)}</span>
+       </div>`;
+     }
+     html += "</div>";
+   }
+
+   if (!html) html = '<p class="hint">running...</p>';
+   els.extraction.innerHTML = html;
+ }
+
+ function runSample(sampleId) {
+   activeSampleId = sampleId;
+   setBadge("running", "running");
+   els.recognition.innerHTML = '<p class="hint">running recognition...</p>';
+   els.extraction.innerHTML = '<p class="hint">waiting...</p>';
+   els.recognitionMeta.textContent = "";
+   els.extractionMeta.textContent = "";
+   timings = { recognitionMs: 0, donutMs: 0, glinerMs: 0 };
+   donutBuf = { entities: [], data: null };
+   glinerBuf = [];
+   updateTimings();
+
+   const recognition = els.selectRecognition.value;
+   const structured = els.selectStructured.value;
+   const ner = els.selectNer.value;
+   const url = `/api/run?id=${encodeURIComponent(sampleId)}&recognition=${encodeURIComponent(recognition)}&structured=${encodeURIComponent(structured)}&ner=${encodeURIComponent(ner)}`;
+   const es = new EventSource(url);
+
+   es.addEventListener("models", (e) => {
+     const d = JSON.parse(e.data);
+     els.models.innerHTML = `recognition: <code>${shortModel(d.recognition)}</code> · structured: <code>${shortModel(d.structured)}</code> · ner: <code>${shortModel(d.extractor)}</code>`;
+   });
+   es.addEventListener("recognition_start", () => {
+     els.recognitionMeta.textContent = "loading model + generating...";
+   });
+   es.addEventListener("recognition_done", (e) => {
+     const d = JSON.parse(e.data);
+     timings.recognitionMs = d.ms;
+     els.recognitionMeta.textContent = `${d.markdown.length} chars in ${d.ms}ms`;
+     els.recognition.textContent = d.markdown;
+     updateTimings();
+   });
+   es.addEventListener("donut_start", () => {
+     els.extractionMeta.textContent = "running structured...";
+   });
+   es.addEventListener("donut_done", (e) => {
+     const d = JSON.parse(e.data);
+     timings.donutMs = d.ms;
+     donutBuf = { entities: d.entities, data: d.rawData };
+     els.extractionMeta.textContent = `structured ${d.ms}ms`;
+     renderExtraction();
+     updateTimings();
+   });
+   es.addEventListener("gliner_start", () => {
+     els.extractionMeta.textContent = "running NER...";
+   });
+   es.addEventListener("gliner_done", (e) => {
+     const d = JSON.parse(e.data);
+     timings.glinerMs = d.ms;
+     glinerBuf = d.fields;
+     els.extractionMeta.textContent = `ner ${d.ms}ms · ${d.fields.length} fields`;
+     renderExtraction();
+     updateTimings();
+   });
+   es.addEventListener("done", (e) => {
+     const d = JSON.parse(e.data);
+     setBadge(`done ${d.totalMs}ms`, "green");
+     es.close();
+   });
+   es.addEventListener("error", (e) => {
+     setBadge("error", "red");
+     if (e.data) {
+       try {
+         const m = JSON.parse(e.data);
+         els.recognitionMeta.textContent = `${m.stage}: ${m.message}`;
+       } catch {
+         /* */
+       }
+     }
+     es.close();
+   });
+ }
+
+ async function init() {
+   // Fetch SIE health (and registered models)
+   let registered = [];
+   try {
+     const r = await fetch("/api/health");
+     const j = await r.json();
+     els.sieUrl.textContent = j.sieUrl;
+     if (!j.sie) {
+       els.sieState.textContent = "SIE not reachable yet (still preloading models?)";
+     } else {
+       els.sieState.textContent = `SIE healthy · ${j.registeredModels} models registered`;
+       registered = j.registered ?? [];
+     }
+   } catch {
+     els.sieState.textContent = "could not reach the local server";
+   }
+   registeredSet = new Set(registered);
+
+   // Fetch model menus (config-side)
+   try {
+     const r = await fetch("/api/models");
+     modelConfig = await r.json();
+     populateDropdown(els.selectRecognition, modelConfig.recognition, modelConfig.defaults.recognition);
+     populateDropdown(els.selectStructured, modelConfig.structured, modelConfig.defaults.structured);
+     populateDropdown(els.selectNer, modelConfig.ner, modelConfig.defaults.ner);
+   } catch (e) {
+     console.error("failed to load model config", e);
+   }
+
+   // Fetch sample documents
+   try {
+     const r = await fetch("/api/samples");
+     const samples = await r.json();
+     renderSamples(samples, runSample);
+   } catch {
+     els.events.innerHTML = '<p class="hint">failed to load samples</p>';
+   }
+ }
+
+ init();
web/public/index.html ADDED
@@ -0,0 +1,80 @@
+ <!doctype html>
+ <html lang="en">
+ <head>
+   <meta charset="utf-8" />
+   <meta name="viewport" content="width=device-width,initial-scale=1" />
+   <title>document-ocr</title>
+   <link rel="stylesheet" href="/static/style.css" />
+ </head>
+ <body>
+   <header>
+     <div class="title">
+       <span class="logo">📄</span>
+       <h1>document-ocr</h1>
+       <span class="badge" id="badge">idle</span>
+     </div>
+     <div class="meta" id="models">SIE: <code>...</code></div>
+     <div class="meta" id="sie-state">checking SIE...</div>
+   </header>
+
+   <section class="hero">
+     <p>
+       OCR is rarely a single-model problem. This demo runs three model
+       classes through one SIE server: a <strong>VLM-OCR</strong> recognizes
+       the document into Markdown, a <strong>fine-tuned Donut</strong> emits
+       a JSON tree directly, and a <strong>zero-shot NER (GLiNER)</strong>
+       pulls typed fields out of the recognition output. Pick a sample on
+       the left, swap any of the three models on the right, and watch SIE
+       hot-swap them with a single identifier change.
+     </p>
+   </section>
+
+   <main>
+     <section class="panel" id="panel-events">
+       <header><h2>Sample documents</h2></header>
+       <div class="meta-row">
+         <label class="model-pick">
+           <span class="dropdown-label">Recognition</span>
+           <select id="select-recognition"></select>
+         </label>
+         <label class="model-pick">
+           <span class="dropdown-label">Structured</span>
+           <select id="select-structured"></select>
+         </label>
+         <label class="model-pick">
+           <span class="dropdown-label">NER</span>
+           <select id="select-ner"></select>
+         </label>
+       </div>
+       <div class="list" id="events">loading...</div>
+     </section>
+
+     <section class="panel" id="panel-recognition">
+       <header>
+         <h2>Recognition (Markdown)</h2>
+         <span class="hint" id="recognition-meta"></span>
+       </header>
+       <div class="markdown" id="recognition">
+         <p class="hint">Click a sample on the left.</p>
+       </div>
+     </section>
+
+     <section class="panel" id="panel-extraction">
+       <header>
+         <h2>Extraction</h2>
+         <span class="hint" id="extraction-meta"></span>
+       </header>
+       <div class="extraction" id="extraction">
+         <p class="hint">Typed fields will appear here.</p>
+       </div>
+     </section>
+   </main>
+
+   <footer>
+     <span id="footer">SIE on <code id="sie-url">http://localhost:8080</code></span>
+     <span id="timings"></span>
+   </footer>
+
+   <script src="/static/app.js"></script>
+ </body>
+ </html>
web/public/style.css ADDED
@@ -0,0 +1,148 @@
+ :root {
+   --bg: #0e1014;
+   --panel: #161a22;
+   --line: #232936;
+   --text: #e6e8ec;
+   --muted: #8b94a7;
+   --accent: #7c5dff;
+   --accent-2: #62b6ff;
+   --green: #5fd28b;
+   --red: #ff6b6b;
+   --yellow: #f7c948;
+   --magenta: #c89cff;
+ }
+
+ * { box-sizing: border-box; }
+ html, body {
+   margin: 0; padding: 0; background: var(--bg); color: var(--text);
+   font: 14px ui-monospace, SFMono-Regular, Menlo, monospace;
+   height: 100%;
+ }
+ body { display: flex; flex-direction: column; }
+
+ header {
+   display: flex; align-items: center; gap: 16px;
+   padding: 12px 20px; border-bottom: 1px solid var(--line);
+ }
+ .title { display: flex; align-items: center; gap: 12px; }
+ .logo { font-size: 18px; }
+ h1 { font-size: 16px; margin: 0; font-weight: 600; }
+ .badge {
+   padding: 3px 10px; border-radius: 999px; background: var(--line);
+   color: var(--muted); font-size: 11px; text-transform: uppercase;
+   letter-spacing: 0.5px;
+ }
+ .badge.running { background: rgba(98, 182, 255, 0.15); color: var(--accent-2); }
+ .badge.green { background: rgba(95, 210, 139, 0.18); color: var(--green); }
+ .badge.red { background: rgba(255, 107, 107, 0.18); color: var(--red); }
+ .meta { color: var(--muted); font-size: 12px; }
+
+ .hero {
+   padding: 14px 20px; background: linear-gradient(180deg, rgba(124,93,255,0.10), transparent);
+   border-bottom: 1px solid var(--line);
+ }
+ .hero p { margin: 0; color: var(--muted); max-width: 980px; line-height: 1.6; }
+ .hero strong { color: var(--text); }
+
+ main {
+   flex: 1; display: grid;
+   grid-template-columns: 0.95fr 1.4fr 1.2fr;
+   gap: 12px; padding: 12px 20px; overflow: hidden;
+ }
+
+ .panel {
+   display: flex; flex-direction: column;
+   background: var(--panel); border: 1px solid var(--line);
+   border-radius: 10px; overflow: hidden; min-height: 0;
+ }
+ .panel header {
+   padding: 10px 14px; border-bottom: 1px solid var(--line);
+   display: flex; justify-content: space-between; align-items: baseline;
+ }
+ .panel h2 {
+   font-size: 12px; letter-spacing: 0.6px; text-transform: uppercase;
+   margin: 0; color: var(--muted);
+ }
+ #panel-events h2 { color: var(--accent); }
+ #panel-recognition h2 { color: var(--accent-2); }
+ #panel-extraction h2 { color: var(--magenta); }
+
+ .hint { color: var(--muted); font-size: 11px; }
+
+ .list, .markdown, .extraction {
+   flex: 1; overflow: auto; padding: 12px 14px; margin: 0;
+ }
+
+ .meta-row {
+   padding: 10px 14px; border-bottom: 1px solid var(--line);
+   font-size: 11px; color: var(--muted);
+   display: flex; flex-direction: column; gap: 6px;
+ }
+ .model-pick {
+   display: flex; align-items: center; gap: 8px;
+   font-size: 11px;
+ }
+ .dropdown-label {
+   color: var(--muted); text-transform: uppercase;
+   letter-spacing: 0.5px; min-width: 80px;
+ }
+ .meta-row select {
+   flex: 1; min-width: 0;
+   background: var(--bg); color: var(--text); border: 1px solid var(--line);
+   border-radius: 6px; padding: 4px 8px; font: inherit;
+   font-size: 11px;
+ }
+ .meta-row select:disabled, .meta-row option:disabled {
+   color: var(--muted);
+ }
+
+ .event {
+   padding: 10px 12px; border: 1px solid var(--line); border-radius: 8px;
+   margin-bottom: 10px; cursor: pointer;
+   transition: border-color 0.1s;
+   display: flex; gap: 12px; align-items: center;
+ }
+ .event:hover { border-color: var(--accent); }
+ .event.active { border-color: var(--accent); background: rgba(124,93,255,0.08); }
+ .event img {
+   width: 64px; height: 64px; object-fit: cover;
+   border-radius: 4px; background: #fff;
+   flex-shrink: 0;
+ }
+ .event .label { font-weight: 600; color: var(--text); font-size: 13px; }
+ .event .desc { color: var(--muted); font-size: 11px; margin-top: 2px; line-height: 1.4; }
+
+ .markdown {
+   white-space: pre-wrap; word-break: break-word;
+   font-size: 13px; line-height: 1.55;
+ }
+
+ .extraction .section {
+   margin-bottom: 14px; padding-bottom: 10px;
+   border-bottom: 1px dashed var(--line);
+ }
+ .extraction .section:last-child { border-bottom: 0; margin-bottom: 0; }
+ .extraction h3 {
+   font-size: 11px; letter-spacing: 0.6px; text-transform: uppercase;
+   color: var(--muted); margin: 0 0 6px 0; font-weight: 500;
+ }
+ .field {
+   display: flex; gap: 8px; padding: 4px 0;
+   font-size: 12px;
+ }
+ .field .label-name {
+   color: var(--accent-2); font-weight: 500; min-width: 90px;
+ }
+ .field .text { color: var(--text); flex: 1; }
+ .field .score { color: var(--muted); font-size: 11px; }
+
+ .donut-row { padding: 3px 0; font-size: 12px; }
+ .donut-row .key { color: var(--magenta); }
+ .donut-row .val { color: var(--text); margin-left: 8px; }
+
+ footer {
+   padding: 10px 20px; border-top: 1px solid var(--line);
+   color: var(--muted); font-size: 12px;
+   display: flex; justify-content: space-between; gap: 12px;
+ }
+ code { background: var(--line); padding: 1px 6px; border-radius: 4px; }
web/server.ts ADDED
@@ -0,0 +1,186 @@
+ import fs from "node:fs";
+ import http from "node:http";
+ import path from "node:path";
+ import { spawnSync } from "node:child_process";
+ import { fileURLToPath } from "node:url";
+ import { SIEClient } from "@superlinked/sie-sdk";
+ import { NER_MODELS, RECOGNITION_MODELS, STRUCTURED_MODELS, config } from "../src/config.js";
+ import type { PipelineEvent } from "../src/events.js";
+ import { runPipeline } from "../src/pipeline.js";
+ import type { SampleDoc } from "../src/types.js";
+
+ const __dirname = path.dirname(fileURLToPath(import.meta.url));
+ const ROOT = path.resolve(__dirname, "..");
+ const PUBLIC_DIR = path.resolve(__dirname, "public");
+ const SAMPLES_PATH = path.resolve(ROOT, config.paths.samples);
+ const SAMPLE_DIR = path.resolve(ROOT, config.paths.sampleDir);
+
+ const MIME: Record<string, string> = {
+   ".html": "text/html; charset=utf-8",
+   ".css": "text/css; charset=utf-8",
+   ".js": "text/javascript; charset=utf-8",
+   ".json": "application/json",
+   ".png": "image/png",
+   ".jpg": "image/jpeg",
+   ".jpeg": "image/jpeg",
+ };
+
+ function send(res: http.ServerResponse, status: number, body: string, type = "text/plain") {
+   res.writeHead(status, { "content-type": type });
+   res.end(body);
+ }
+
+ function serveFile(res: http.ServerResponse, file: string) {
+   if (!fs.existsSync(file)) return send(res, 404, "not found");
+   const ext = path.extname(file).toLowerCase();
+   res.writeHead(200, { "content-type": MIME[ext] ?? "application/octet-stream" });
+   fs.createReadStream(file).pipe(res);
+ }
+
+ function setupSse(res: http.ServerResponse) {
+   res.writeHead(200, {
+     "content-type": "text/event-stream",
+     "cache-control": "no-cache",
+     connection: "keep-alive",
+   });
+   return (event: { type: string; data?: unknown }) => {
+     res.write(`event: ${event.type}\n`);
+     res.write(`data: ${JSON.stringify(event.data ?? null)}\n\n`);
+   };
+ }
+
+ async function fetchModels(): Promise<{ ok: boolean; names: string[] }> {
+   try {
+     const r = await fetch(`${config.sieUrl}/v1/models`, { signal: AbortSignal.timeout(3000) });
+     if (!r.ok) return { ok: false, names: [] };
+     const json = (await r.json()) as { models?: { name: string }[] };
+     return { ok: true, names: (json.models ?? []).map((m) => m.name) };
+   } catch {
+     return { ok: false, names: [] };
+   }
+ }
+
+ function readJson<T>(p: string): T {
+   return JSON.parse(fs.readFileSync(p, "utf8")) as T;
+ }
+
+ async function handleRun(
+   req: http.IncomingMessage,
+   res: http.ServerResponse,
+   sampleId: string,
+   recognitionModel: string,
+   structuredModel: string,
+   nerModel: string,
+ ) {
+   const push = setupSse(res);
+   let closed = false;
+   req.on("close", () => {
+     closed = true;
+   });
+
+   const samples = readJson<SampleDoc[]>(SAMPLES_PATH);
+   const sample = samples.find((s) => s.id === sampleId);
+   if (!sample) {
+     push({ type: "error", data: { stage: "lookup", message: `unknown sample id: ${sampleId}` } });
+     return res.end();
+   }
+
+   const imagePath = path.resolve(SAMPLE_DIR, sample.filename);
+   if (!fs.existsSync(imagePath)) {
+     push({ type: "error", data: { stage: "lookup", message: `sample image not found: ${sample.filename}` } });
+     return res.end();
+   }
+   const imageBytes = fs.readFileSync(imagePath);
+
+   const client = new SIEClient(config.sieUrl, {
+     apiKey: config.sieApiKey,
+     timeout: 600_000, // 10 min request timeout (CPU + Rosetta is slow)
+     waitForCapacity: true, // retry while a model is warming up
+     provisionTimeout: 900_000, // 15 min ceiling on cold-load polling
+   });
+
+   try {
+     await runPipeline({
+       client,
+       imageBytes,
+       sample,
+       recognitionModel,
+       structuredModel,
+       nerModel,
+       emit: (event: PipelineEvent) => {
+         if (closed) return;
+         push({ type: event.type, data: "data" in event ? event.data : null });
+       },
+     });
+   } catch (err) {
+     push({ type: "error", data: { stage: "pipeline", message: (err as Error).message } });
+   } finally {
+     await client.close().catch(() => {});
+     res.end();
+   }
+ }
+
+ const server = http.createServer(async (req, res) => {
+   const url = new URL(req.url ?? "/", `http://${req.headers.host}`);
+   const p = url.pathname;
+
+   if (p === "/" || p === "/index.html") return serveFile(res, path.join(PUBLIC_DIR, "index.html"));
+   if (p.startsWith("/static/")) return serveFile(res, path.join(PUBLIC_DIR, p.slice("/static/".length)));
+   if (p.startsWith("/samples/")) return serveFile(res, path.join(SAMPLE_DIR, p.slice("/samples/".length)));
+
+   if (p === "/api/health") {
+     const { ok, names } = await fetchModels();
+     return send(
+       res,
+       200,
+       JSON.stringify({
+         sie: ok,
+         sieUrl: config.sieUrl,
+         registeredModels: names.length,
+         registered: names,
+       }),
+       "application/json",
+     );
+   }
+   if (p === "/api/models") {
+     return send(
+       res,
+       200,
+       JSON.stringify({
+         recognition: RECOGNITION_MODELS,
+         structured: STRUCTURED_MODELS,
+         ner: NER_MODELS,
+         defaults: config.defaults,
+       }),
+       "application/json",
+     );
+   }
+   if (p === "/api/samples") {
+     return send(res, 200, fs.readFileSync(SAMPLES_PATH, "utf8"), "application/json");
+   }
+   if (p === "/api/run") {
+     const id = url.searchParams.get("id");
+     const recognitionModel =
+       url.searchParams.get("recognition") ?? config.defaults.recognition;
+     const structuredModel = url.searchParams.get("structured") ?? config.defaults.structured;
+     const nerModel = url.searchParams.get("ner") ?? config.defaults.ner;
+     if (!id) return send(res, 400, "missing id");
+     return handleRun(req, res, id, recognitionModel, structuredModel, nerModel);
+   }
+
+   return send(res, 404, "not found");
+ });
+
+ server.listen(config.port, () => {
+   const url = `http://localhost:${config.port}`;
+   console.log(`document-ocr ui: ${url}`);
+   if (process.env.OPEN_BROWSER !== "0") {
+     const opener =
+       process.platform === "darwin"
+         ? "open"
+         : process.platform === "win32"
+           ? "start"
+           : "xdg-open";
+     spawnSync(opener, [url], { stdio: "ignore", shell: process.platform === "win32" }); // `start` is a cmd.exe builtin
+   }
+ });