Initial: document-ocr demo for HF Spaces
Single-container build of the document-ocr demo from
superlinked/brave-new-demos. Extends ghcr.io/superlinked/sie-server:latest-cpu-default
with Node 22, runs SIE on 127.0.0.1:8080 and the UI on 0.0.0.0:7860.
Preloads three small models from the default bundle (Florence-2-base,
Donut-CORD-v2, GLiNER-multi). Alternates are dropdown-selectable and
lazy-load on first click.
HF persistent /data is used as HF_HOME so model weights survive Space
restarts.
- Dockerfile +45 -0
- README.md +63 -4
- data/samples/README.md +18 -0
- data/samples/business-card.png +0 -0
- data/samples/handwritten.png +0 -0
- data/samples/index.json +84 -0
- data/samples/invoice.png +0 -0
- data/samples/multi-column.png +0 -0
- data/samples/receipt.png +0 -0
- data/samples/table.png +0 -0
- hf-entrypoint.sh +36 -0
- package.json +24 -0
- src/config.ts +85 -0
- src/donut.ts +25 -0
- src/events.ts +13 -0
- src/extract.ts +17 -0
- src/ocr.ts +24 -0
- src/pipeline.ts +102 -0
- src/types.ts +33 -0
- tsconfig.json +16 -0
- web/public/app.js +236 -0
- web/public/index.html +80 -0
- web/public/style.css +148 -0
- web/server.ts +186 -0
Dockerfile
ADDED
```dockerfile
# Hugging Face Spaces image. Extends the official SIE server image, adds
# Node 22 for the UI server, and runs both processes from one container.
#
# This Dockerfile is HF Spaces specific. Local Docker users should use
# compose.yml (which just runs the unmodified upstream SIE image with the
# Node UI on the host).
FROM ghcr.io/superlinked/sie-server:latest-cpu-default

USER root

# Node 22 for the UI server (tsx + node:http + our Express-free server.ts)
# The SIE base image is minimal; install curl + ca-certificates before
# pulling NodeSource's setup script.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates gnupg \
    && curl -fsSL https://deb.nodesource.com/setup_22.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# HF Spaces' persistent storage lives at /data (50 GB free tier).
# Send the HuggingFace cache there so model weights survive Space restarts.
ENV HF_HOME=/data/.cache/huggingface

# UI server lives under /app/ui (the SIE base image already owns /app/...)
WORKDIR /app/ui

COPY package.json tsconfig.json ./
RUN npm install --silent

COPY src ./src
COPY web ./web
COPY data ./data

# Entrypoint script orchestrates both processes (SIE in background, UI in foreground).
COPY hf-entrypoint.sh /usr/local/bin/hf-entrypoint.sh
RUN chmod +x /usr/local/bin/hf-entrypoint.sh

# Make /app/ui world-readable so any HF Space user account can access it.
# Also make /data writable so the HF cache can be written there.
RUN chmod -R a+rx /app/ui && mkdir -p /data && chmod -R a+rwx /data

EXPOSE 7860

ENTRYPOINT ["/usr/local/bin/hf-entrypoint.sh"]
```
README.md
CHANGED
```diff
@@ -1,10 +1,69 @@
 ---
-title: Document
-emoji:
-colorFrom:
+title: Document OCR
+emoji: 📄
+colorFrom: purple
 colorTo: blue
 sdk: docker
+app_port: 7860
 pinned: false
+short_description: Multi-model OCR pipeline running on SIE
 ---
 
-
+# Document OCR
+
+A small Hugging Face Space that runs three different OCR-class models through one
+[SIE](https://github.com/superlinked/sie) inference engine. Pick a sample
+document on the left, swap any of the three models in the dropdowns, watch
+SIE hot-swap them with one identifier change.
+
+## What runs in this Space
+
+A single Docker container with two processes:
+
+- `sie-server` (the SIE inference engine) on `127.0.0.1:8080`, preloading
+  three small models from the default bundle at boot.
+- A small Node web server on `0.0.0.0:7860` that serves the UI and
+  proxies requests to SIE via SSE.
+
+Both are baked into one image extending `ghcr.io/superlinked/sie-server:latest-cpu-default`.
+HF Spaces' persistent `/data` directory is used as the HuggingFace cache so
+model weights survive Space restarts.
+
+## Model lineup
+
+| Stage | Default (preloaded) | Alternates (lazy-load on click) |
+|---|---|---|
+| Recognition | `microsoft/Florence-2-base` (270M) | Florence-2-large, LightOnOCR-2-1B, GLM-OCR, PaddleOCR-VL |
+| Structured | `naver-clova-ix/donut-base-finetuned-cord-v2` | Donut-DocVQA |
+| NER | `urchade/gliner_multi-v2.1` | GLiNER-large |
+
+The default trio is ~1 GB total. Alternates are listed in the dropdowns but
+only load when first clicked; some are GPU-only.
+
+## What SIE provides here
+
+Three different model architectures, one API:
+
+```
+client.extract(model_id, { images: [bytes] })
+```
+
+The model ID alone decides whether you get VLM Markdown (Florence-2,
+LightOnOCR), structured JSON (Donut), or typed entities (GLiNER). No
+separate auth, no separate rate limit, no separate deployment story.
+
+## Source
+
+Built from the `document-ocr` demo in
+[superlinked/brave-new-demos](https://github.com/superlinked/brave-new-demos/tree/main/document-ocr).
+The local-Docker version uses `docker compose` against the same upstream
+SIE image; this Space packages everything into one container for HF.
+
+## Performance note
+
+This Space runs on HF's free CPU tier (2 vCPU, 16 GB RAM). Per-sample
+latency is in the 60-90 s range on the default Florence-2 + Donut + GLiNER
+trio; recognition is the slow step. On a GPU Space (paid), Florence-2 drops
+to a few seconds and the heavier models like GLM-OCR become tractable.
+
+Built on [SIE](https://github.com/superlinked/sie) (Apache 2.0).
```
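The "one API" claim in the README can be sketched against a stand-in client: the model identifier alone selects the adapter, while the call shape stays fixed. This is an illustrative stub, not the real `@superlinked/sie-sdk` client; only the `extract(model, input)` call shape is taken from the sources below, and the routing rules and canned outputs are invented for the sketch.

```typescript
// Illustrative stub of the single extract() call used throughout this Space.
// A fake client demonstrates how the model id alone routes to different
// output kinds (recognition text, structured fields, typed entities).
type Entity = { label: string; text: string; score?: number };
type ExtractResult = { entities: Entity[] };

// Hypothetical stand-in for SIEClient.extract(model, input).
async function extract(
  model: string,
  _input: { images?: unknown[]; text?: string },
): Promise<ExtractResult> {
  if (model.startsWith("microsoft/Florence-2")) {
    // VLM recognition: one entity holding the recognized text/markdown.
    return { entities: [{ label: "text", text: "INVOICE #42 ..." }] };
  }
  if (model.startsWith("naver-clova-ix/donut")) {
    // Structured extraction: labeled fields from the Donut schema.
    return { entities: [{ label: "total", text: "12.50" }] };
  }
  // GLiNER-style NER: typed entities with confidence scores.
  return { entities: [{ label: "merchant", text: "ACME", score: 0.93 }] };
}
```

Same call, three behaviors — which is exactly what the pipeline in `src/pipeline.ts` relies on.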
data/samples/README.md
ADDED
```markdown
# Bundled sample documents

Six synthetic, public-domain images covering the document shapes most
real-world OCR pipelines hit:

- `receipt.png`: printed grocery receipt, line items + totals
- `invoice.png`: vendor invoice, multi-column form layout
- `business-card.png`: tight contact card, mixed text sizes
- `table.png`: dense numerical table with totals row
- `handwritten.png`: jittered text that simulates informal handwriting
- `multi-column.png`: two-column newspaper-style layout where reading order
  matters

`index.json` carries metadata for each: the GLiNER labels we ask for, plus a
short description shown in the UI.

Regenerate with `python scripts/generate_samples.py`. Pillow is the only
dep; no real customer data is involved.
```
data/samples/business-card.png
ADDED
data/samples/handwritten.png
ADDED
data/samples/index.json
ADDED
```json
[
  {
    "id": "receipt",
    "filename": "receipt.png",
    "label": "Grocery receipt",
    "description": "Printed receipt with line items, subtotal, tax, total. Clean text, simple table layout.",
    "labels": [
      "merchant",
      "date",
      "line_item",
      "subtotal",
      "tax",
      "total",
      "payment_method"
    ]
  },
  {
    "id": "invoice",
    "filename": "invoice.png",
    "label": "Vendor invoice",
    "description": "Multi-column invoice with billing party, line items, subtotal, tax, total. Form-style layout.",
    "labels": [
      "vendor",
      "invoice_number",
      "date",
      "due_date",
      "billing_party",
      "line_item",
      "total"
    ]
  },
  {
    "id": "business-card",
    "filename": "business-card.png",
    "label": "Business card",
    "description": "Tight layout, mixed text sizes, multiple contact fields.",
    "labels": [
      "company",
      "person",
      "role",
      "email",
      "phone",
      "address",
      "website"
    ]
  },
  {
    "id": "table",
    "filename": "table.png",
    "label": "Quarterly table",
    "description": "Dense numerical table with totals row. Tests table-structure recognition.",
    "labels": [
      "department",
      "headcount",
      "amount",
      "category"
    ]
  },
  {
    "id": "handwritten",
    "filename": "handwritten.png",
    "label": "Casual notes",
    "description": "Jittered text simulating informal handwriting; non-template content.",
    "labels": [
      "task",
      "person",
      "place",
      "amount"
    ]
  },
  {
    "id": "multi-column",
    "filename": "multi-column.png",
    "label": "Newspaper page",
    "description": "Two-column newspaper-style layout. Reading order matters.",
    "labels": [
      "headline",
      "person",
      "organization",
      "place",
      "date"
    ]
  }
]
```
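Each entry above matches the `SampleDoc` type declared in `src/types.ts`. A minimal runtime guard for that shape could look like the following; this validator is illustrative only — the demo itself trusts the bundled file and does no such check.

```typescript
// Runtime guard for entries of data/samples/index.json, mirroring the
// SampleDoc type from src/types.ts. Hypothetical helper, not in the repo.
type SampleDoc = {
  id: string;
  filename: string;
  label: string;
  description: string;
  labels: string[];
};

function isSampleDoc(v: unknown): v is SampleDoc {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.id === "string" &&
    typeof o.filename === "string" &&
    typeof o.label === "string" &&
    typeof o.description === "string" &&
    Array.isArray(o.labels) &&
    o.labels.every((l) => typeof l === "string")
  );
}
```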
data/samples/invoice.png
ADDED
data/samples/multi-column.png
ADDED
data/samples/receipt.png
ADDED
data/samples/table.png
ADDED
hf-entrypoint.sh
ADDED
```bash
#!/usr/bin/env bash
# Boot SIE in the background, then run the UI server in the foreground.
# Single-container layout for Hugging Face Spaces.
set -euo pipefail

# Models to preload at SIE startup. Same trio the local compose uses.
PRELOAD="microsoft/Florence-2-base,naver-clova-ix/donut-base-finetuned-cord-v2,urchade/gliner_multi-v2.1"

echo "[hf-entrypoint] starting sie-server on 127.0.0.1:8080 with preload=$PRELOAD"
sie-server serve \
  --host 127.0.0.1 \
  --port 8080 \
  --preload "$PRELOAD" &
SIE_PID=$!

# Wait up to 20 minutes for SIE to come up. First boot pulls model weights
# from HF; subsequent restarts reuse the /data cache and start in seconds.
echo "[hf-entrypoint] waiting for /healthz"
for i in $(seq 1 1200); do
  if curl -fsS http://127.0.0.1:8080/healthz > /dev/null 2>&1; then
    echo "[hf-entrypoint] sie healthy in ~${i}s"
    break
  fi
  if ! kill -0 "$SIE_PID" 2>/dev/null; then
    echo "[hf-entrypoint] sie-server died before becoming healthy"
    exit 1
  fi
  sleep 1
done

cd /app/ui
export PORT="${PORT:-7860}"
export SIE_URL="${SIE_URL:-http://127.0.0.1:8080}"
export OPEN_BROWSER=0
echo "[hf-entrypoint] starting UI on 0.0.0.0:$PORT"
exec npx tsx web/server.ts
```
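The wait loop in the entrypoint polls `/healthz` once per second for up to 1200 tries, bailing out early if the background `sie-server` process dies. The same pattern can be sketched generically with an injectable probe; this is a hypothetical helper, not code from the repo, and unlike the bash loop (which falls through and starts the UI even if the probe never succeeds) this version fails hard on timeout.

```typescript
// Generic form of the entrypoint's wait loop: poll a probe until it succeeds,
// abort early if the watched process dies, give up after maxTries.
// Hypothetical helper; the Space itself does this in bash with curl + kill -0.
async function waitForHealthy(
  probe: () => Promise<boolean>, // e.g. an HTTP GET against /healthz
  alive: () => boolean,          // e.g. kill -0 on the server pid
  maxTries: number,
  sleepMs = 1000,
): Promise<number> {
  for (let i = 1; i <= maxTries; i++) {
    if (await probe()) return i; // healthy after i tries
    if (!alive()) throw new Error("process died before becoming healthy");
    await new Promise((r) => setTimeout(r, sleepMs));
  }
  throw new Error("timed out waiting for health check");
}
```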
package.json
ADDED
```json
{
  "name": "document-ocr",
  "version": "0.1.0",
  "private": true,
  "type": "module",
  "engines": {
    "node": ">=22"
  },
  "scripts": {
    "start": "docker compose up -d --build && tsx web/server.ts",
    "start:gpu": "docker compose -f compose.gpu.yml up -d --build && tsx web/server.ts",
    "ui": "tsx web/server.ts",
    "typecheck": "tsc -p tsconfig.json --noEmit",
    "regen-samples": "python3 scripts/generate_samples.py"
  },
  "dependencies": {
    "@superlinked/sie-sdk": "^0.3.1"
  },
  "devDependencies": {
    "@types/node": "^22.10.0",
    "tsx": "^4.21.0",
    "typescript": "^5.7.2"
  }
}
```
src/config.ts
ADDED
```typescript
export type ModelOption = {
  id: string;
  label: string;
  description: string;
  gpuRequired?: boolean;
  /** Adapter-specific options passed via SIE's `options` field. */
  options?: Record<string, unknown>;
};

export const RECOGNITION_MODELS: ModelOption[] = [
  {
    id: "microsoft/Florence-2-base",
    label: "Florence-2-base (small, fast)",
    description: "Microsoft DaViT + decoder, 270M. Default OCR with the <OCR> task. Fast on CPU.",
    options: { task: "<OCR>" },
  },
  {
    id: "microsoft/Florence-2-large",
    label: "Florence-2-large",
    description: "Larger Florence-2 variant, 770M. Higher quality, still CPU-runnable; ~2x latency.",
    options: { task: "<OCR>" },
  },
  {
    id: "lightonai/LightOnOCR-2-1B",
    label: "LightOnOCR-2-1B (premium, GPU recommended)",
    description: "Pixtral encoder + Qwen3 decoder, 2.1B. Markdown output. Loads on CPU but slow under Rosetta.",
  },
  {
    id: "PaddlePaddle/PaddleOCR-VL-1.5",
    label: "PaddleOCR-VL-1.5 (GPU image)",
    description: "Paddle's VLM-OCR, 1.5B. Six task modes. Available on the CUDA image.",
    options: { task: "ocr" },
    gpuRequired: true,
  },
  {
    id: "zai-org/GLM-OCR",
    label: "GLM-OCR (GPU only)",
    description: "CogViT + GLM-0.5B decoder, 9B in bfloat16. Premium quality, needs ~18 GB VRAM.",
    gpuRequired: true,
  },
];

export const STRUCTURED_MODELS: ModelOption[] = [
  {
    id: "naver-clova-ix/donut-base-finetuned-cord-v2",
    label: "Donut on CORD (receipts)",
    description: "Fine-tuned for the CORD receipt schema. Pixels in, nested JSON out.",
  },
  {
    id: "naver-clova-ix/donut-base-finetuned-docvqa",
    label: "Donut on DocVQA",
    description: "Same Donut architecture, fine-tuned for visual question answering. Returns text answers.",
  },
];

export const NER_MODELS: ModelOption[] = [
  {
    id: "urchade/gliner_multi-v2.1",
    label: "GLiNER multi (multilingual)",
    description: "280M, zero-shot NER, 100+ languages. Good default.",
  },
  {
    id: "urchade/gliner_large-v2.1",
    label: "GLiNER large (English)",
    description: "440M, English-focused, higher quality on English text.",
  },
];

export const config = {
  sieUrl: process.env.SIE_URL ?? "http://localhost:8080",
  sieApiKey: process.env.SIE_API_KEY,

  defaults: {
    recognition: RECOGNITION_MODELS[0].id,
    structured: STRUCTURED_MODELS[0].id,
    ner: NER_MODELS[0].id,
  },

  paths: {
    samples: "data/samples/index.json",
    sampleDir: "data/samples",
  },

  port: Number(process.env.PORT ?? 3032),
} as const;
```
src/donut.ts
ADDED
```typescript
import { detectImageFormat, type SIEClient } from "@superlinked/sie-sdk";
import type { DonutEntity } from "./types.js";

/** Run any image-input "structured" extractor (Donut variants, etc.). */
export async function structuredExtract(
  client: SIEClient,
  model: string,
  imageBytes: Uint8Array,
  options?: Record<string, unknown>,
): Promise<{ entities: DonutEntity[]; data: unknown }> {
  const format = detectImageFormat(imageBytes);
  if (format === "unknown") throw new Error("could not detect image format");
  const wire = { data: imageBytes, format };
  const result = await client.extract(
    model,
    { images: [wire] as unknown as Uint8Array[] },
    { labels: [], options } as unknown as Parameters<typeof client.extract>[2],
  );
  const entities = (result.entities ?? []).map((e) => ({
    label: e.label,
    text: e.text,
  }));
  const data = (result as unknown as { data?: unknown }).data;
  return { entities, data };
}
```
src/events.ts
ADDED
```typescript
import type { DonutEntity, ExtractedField } from "./types.js";

export type PipelineEvent =
  | { type: "models"; data: { extractor: string; recognition: string; structured: string } }
  | { type: "recognition_start"; data: { model: string } }
  | { type: "recognition_chunk"; data: { textLen: number } }
  | { type: "recognition_done"; data: { markdown: string; ms: number } }
  | { type: "donut_start" }
  | { type: "donut_done"; data: { entities: DonutEntity[]; rawData: unknown; ms: number } }
  | { type: "gliner_start"; data: { labels: string[] } }
  | { type: "gliner_done"; data: { fields: ExtractedField[]; ms: number } }
  | { type: "done"; data: { totalMs: number } }
  | { type: "error"; data: { message: string; stage: string } };
```
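These `PipelineEvent` values reach the browser over SSE (the README notes the Node server "proxies requests to SIE via SSE"). `web/server.ts` is not shown in this excerpt, so the exact framing it uses is an assumption; the sketch below only shows the standard Server-Sent-Events wire format such events would take.

```typescript
// Standard SSE framing for a PipelineEvent-shaped object: an `event:` line
// naming the type, a `data:` line carrying the JSON payload, and a blank
// line terminating the message. Illustrative; web/server.ts may differ.
function sseFrame(event: { type: string }): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}
```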
src/extract.ts
ADDED
```typescript
import type { SIEClient } from "@superlinked/sie-sdk";
import type { ExtractedField } from "./types.js";

export async function extractFields(
  client: SIEClient,
  model: string,
  text: string,
  labels: string[],
): Promise<ExtractedField[]> {
  if (!text.trim()) return [];
  const result = await client.extract(model, { text }, { labels, threshold: 0.4 });
  return (result.entities ?? []).map((e) => ({
    label: e.label,
    text: e.text,
    score: e.score,
  }));
}
```
src/ocr.ts
ADDED
```typescript
import { detectImageFormat, type SIEClient } from "@superlinked/sie-sdk";

export async function recognize(
  client: SIEClient,
  model: string,
  imageBytes: Uint8Array,
  options?: Record<string, unknown>,
): Promise<string> {
  const format = detectImageFormat(imageBytes);
  if (format === "unknown") throw new Error("could not detect image format");
  const wire = { data: imageBytes, format };
  // The TS SDK types declare images as Uint8Array[], but the wire format
  // expects {data, format} dicts. Cast around the typing gap.
  const result = await client.extract(
    model,
    { images: [wire] as unknown as Uint8Array[] },
    // The TS SDK's ExtractOptions doesn't declare `options`, but the wire
    // protocol forwards it to the adapter. Cast to bridge the typing gap.
    { labels: [], options } as unknown as Parameters<typeof client.extract>[2],
  );
  if (!result.entities || result.entities.length === 0) return "";
  const text = result.entities[0]?.text;
  return typeof text === "string" ? text : "";
}
```
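`detectImageFormat` comes from the SDK; its actual implementation is not shown here. Magic-byte sniffing of this kind typically looks like the sketch below — an assumption about the technique, not a copy of the SDK helper, covering only the PNG and JPEG signatures the bundled samples use.

```typescript
// Magic-byte image sniffing in the spirit of the SDK's detectImageFormat.
// Assumption: the real helper likely checks more formats; this sketch covers
// PNG and JPEG only.
function sniffImageFormat(bytes: Uint8Array): "png" | "jpeg" | "unknown" {
  // PNG signature: 89 50 4E 47 0D 0A 1A 0A
  const png = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length >= 8 && png.every((b, i) => bytes[i] === b)) return "png";
  // JPEG signature: FF D8 FF
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff)
    return "jpeg";
  return "unknown";
}
```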
src/pipeline.ts
ADDED
```typescript
import type { SIEClient } from "@superlinked/sie-sdk";
import { NER_MODELS, RECOGNITION_MODELS, STRUCTURED_MODELS } from "./config.js";
import { structuredExtract } from "./donut.js";
import type { PipelineEvent } from "./events.js";
import { extractFields } from "./extract.js";
import { recognize } from "./ocr.js";
import type { SampleDoc, TriageResult } from "./types.js";

export type RunInput = {
  client: SIEClient;
  imageBytes: Uint8Array;
  sample: SampleDoc;
  recognitionModel: string;
  structuredModel: string;
  nerModel: string;
  emit: (event: PipelineEvent) => void;
};

function lookup<T extends { id: string }>(list: T[], id: string): T {
  const found = list.find((m) => m.id === id);
  if (!found) throw new Error(`unknown model id: ${id}`);
  return found;
}

export async function runPipeline({
  client,
  imageBytes,
  sample,
  recognitionModel,
  structuredModel,
  nerModel,
  emit,
}: RunInput): Promise<TriageResult> {
  const t0 = Date.now();

  emit({
    type: "models",
    data: { extractor: nerModel, recognition: recognitionModel, structured: structuredModel },
  });

  // Recognition
  const recOpt = lookup(RECOGNITION_MODELS, recognitionModel);
  emit({ type: "recognition_start", data: { model: recognitionModel } });
  const tRec = Date.now();
  let markdown = "";
  try {
    markdown = await recognize(client, recOpt.id, imageBytes, recOpt.options);
  } catch (err) {
    emit({
      type: "error",
      data: { stage: "recognition", message: `${recognitionModel} failed: ${(err as Error).message}` },
    });
    throw err;
  }
  const recognitionMs = Date.now() - tRec;
  emit({ type: "recognition_done", data: { markdown, ms: recognitionMs } });

  // Structured (Donut variants, etc.)
  const strOpt = lookup(STRUCTURED_MODELS, structuredModel);
  emit({ type: "donut_start" });
  const tDon = Date.now();
  let donut = { entities: [] as { label: string; text: string }[], data: undefined as unknown };
  try {
    donut = await structuredExtract(client, strOpt.id, imageBytes, strOpt.options);
  } catch (err) {
    emit({
      type: "error",
      data: { stage: "donut", message: `${structuredModel} failed: ${(err as Error).message}` },
    });
  }
  const donutMs = Date.now() - tDon;
  emit({ type: "donut_done", data: { entities: donut.entities, rawData: donut.data, ms: donutMs } });

  // NER (GLiNER variants)
  const nerOpt = lookup(NER_MODELS, nerModel);
  emit({ type: "gliner_start", data: { labels: sample.labels } });
  const tGli = Date.now();
  let fields: { label: string; text: string; score: number }[] = [];
  try {
    fields = await extractFields(client, nerOpt.id, markdown, sample.labels);
  } catch (err) {
    emit({
      type: "error",
      data: { stage: "gliner", message: `${nerModel} failed: ${(err as Error).message}` },
    });
  }
  const glinerMs = Date.now() - tGli;
  emit({ type: "gliner_done", data: { fields, ms: glinerMs } });

  const totalMs = Date.now() - t0;
  emit({ type: "done", data: { totalMs } });

  return {
    sampleId: sample.id,
    recognitionModel,
    markdown,
    donutEntities: donut.entities,
    donutData: donut.data,
    glinerFields: fields,
    timings: { recognitionMs, donutMs, glinerMs, totalMs },
  };
}
```
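The pipeline's error policy is worth noting: a recognition failure is fatal (the error event is emitted and then rethrown), while Donut and GLiNER failures degrade gracefully to empty results so the run still completes. That policy can be condensed into a small helper; this is a sketch of the pattern, not code from the repo, and the stage names are only stand-ins.

```typescript
// Condensed form of runPipeline's per-stage error policy: fatal stages
// (recognition) rethrow, non-fatal stages (donut, gliner) fall back to a
// default value while recording that they failed. Timing mirrors the
// Date.now() bracketing used in src/pipeline.ts.
type StageResult<T> = { value: T; ms: number; failed: boolean };

async function runStage<T>(
  stage: () => Promise<T>,
  fallback: T,
  fatal: boolean,
): Promise<StageResult<T>> {
  const t0 = Date.now();
  try {
    return { value: await stage(), ms: Date.now() - t0, failed: false };
  } catch (err) {
    if (fatal) throw err; // recognition: abort the whole run
    return { value: fallback, ms: Date.now() - t0, failed: true }; // donut/gliner
  }
}
```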
src/types.ts
ADDED
```typescript
export type SampleDoc = {
  id: string;
  filename: string;
  label: string;
  description: string;
  labels: string[];
};

export type ExtractedField = {
  label: string;
  text: string;
  score: number;
};

export type DonutEntity = {
  label: string;
  text: string;
};

export type TriageResult = {
  sampleId: string;
  recognitionModel: string;
  markdown: string;
  donutEntities: DonutEntity[];
  donutData: unknown;
  glinerFields: ExtractedField[];
  timings: {
    recognitionMs: number;
    donutMs: number;
    glinerMs: number;
    totalMs: number;
  };
};
```
tsconfig.json
ADDED
```json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "esModuleInterop": true,
    "allowSyntheticDefaultImports": true,
    "strict": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true,
    "isolatedModules": true,
    "noEmit": true
  },
  "include": ["src", "web"]
}
```
web/public/app.js
ADDED
|
@@ -0,0 +1,236 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
const els = {
  badge: document.getElementById("badge"),
  models: document.getElementById("models"),
  sieState: document.getElementById("sie-state"),
  events: document.getElementById("events"),
  selectRecognition: document.getElementById("select-recognition"),
  selectStructured: document.getElementById("select-structured"),
  selectNer: document.getElementById("select-ner"),
  recognition: document.getElementById("recognition"),
  recognitionMeta: document.getElementById("recognition-meta"),
  extraction: document.getElementById("extraction"),
  extractionMeta: document.getElementById("extraction-meta"),
  footer: document.getElementById("footer"),
  sieUrl: document.getElementById("sie-url"),
  timings: document.getElementById("timings"),
};

let activeSampleId = null;
let timings = { recognitionMs: 0, donutMs: 0, glinerMs: 0 };
let donutBuf = { entities: [], data: null };
let glinerBuf = [];
let modelConfig = null;
let registeredSet = new Set();

function setBadge(text, cls) {
  els.badge.textContent = text;
  els.badge.className = "badge" + (cls ? " " + cls : "");
}
function shortModel(id) {
  if (!id) return "";
  const slash = id.indexOf("/");
  return slash === -1 ? id : id.slice(slash + 1);
}
function escapeHtml(s) {
  return String(s).replace(
    /[&<>"']/g,
    (c) => ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" })[c],
  );
}

function populateDropdown(selectEl, options, defaultId) {
  selectEl.innerHTML = "";
  for (const opt of options) {
    const node = document.createElement("option");
    node.value = opt.id;
    const available =
      registeredSet.size === 0 || registeredSet.has(opt.id);
    const labelSuffix = !available
      ? opt.gpuRequired
        ? " (GPU image needed)"
        : " (not registered)"
      : "";
    node.textContent = opt.label + labelSuffix;
    if (!available) node.disabled = true;
    if (opt.id === defaultId) node.selected = true;
    node.title = opt.description;
    selectEl.appendChild(node);
  }
}

function renderSamples(samples, onClick) {
  if (!samples || samples.length === 0) {
    els.events.innerHTML = '<p class="hint">no samples</p>';
    return;
  }
  els.events.innerHTML = samples
    .map(
      (s) => `<div class="event" data-id="${escapeHtml(s.id)}">
        <img src="/samples/${encodeURIComponent(s.filename)}" alt="${escapeHtml(s.label)}" />
        <div>
          <div class="label">${escapeHtml(s.label)}</div>
          <div class="desc">${escapeHtml(s.description)}</div>
        </div>
      </div>`,
    )
    .join("");
  for (const node of els.events.querySelectorAll(".event")) {
    node.addEventListener("click", () => {
      for (const n of els.events.querySelectorAll(".event")) n.classList.remove("active");
      node.classList.add("active");
      onClick(node.dataset.id);
    });
  }
}

function updateTimings() {
  const total = timings.recognitionMs + timings.donutMs + timings.glinerMs;
  els.timings.textContent =
    total > 0
      ? `recognition ${timings.recognitionMs}ms · structured ${timings.donutMs}ms · ner ${timings.glinerMs}ms · total ${total}ms`
      : "";
}

function renderExtraction() {
  let html = "";

  if (glinerBuf.length > 0) {
    html += `<div class="section"><h3>NER (${escapeHtml(shortModel(els.selectNer.value))})</h3>`;
    for (const f of glinerBuf) {
      html += `<div class="field">
        <span class="label-name">${escapeHtml(f.label)}</span>
        <span class="text">${escapeHtml(f.text)}</span>
        <span class="score">${f.score.toFixed(2)}</span>
      </div>`;
    }
    html += "</div>";
  }

  if (donutBuf.entities.length > 0) {
    html += `<div class="section"><h3>Structured (${escapeHtml(shortModel(els.selectStructured.value))})</h3>`;
    for (const e of donutBuf.entities.slice(0, 25)) {
      html += `<div class="donut-row">
        <span class="key">${escapeHtml(e.label)}</span>
        <span class="val">${escapeHtml(e.text)}</span>
      </div>`;
    }
    html += "</div>";
  }

  if (!html) html = '<p class="hint">running...</p>';
  els.extraction.innerHTML = html;
}

function runSample(sampleId) {
  activeSampleId = sampleId;
  setBadge("running", "running");
  els.recognition.innerHTML = '<p class="hint">running recognition...</p>';
  els.extraction.innerHTML = '<p class="hint">waiting...</p>';
  els.recognitionMeta.textContent = "";
  els.extractionMeta.textContent = "";
  timings = { recognitionMs: 0, donutMs: 0, glinerMs: 0 };
  donutBuf = { entities: [], data: null };
  glinerBuf = [];
  updateTimings();

  const recognition = els.selectRecognition.value;
  const structured = els.selectStructured.value;
  const ner = els.selectNer.value;
  const url = `/api/run?id=${encodeURIComponent(sampleId)}&recognition=${encodeURIComponent(recognition)}&structured=${encodeURIComponent(structured)}&ner=${encodeURIComponent(ner)}`;
  const es = new EventSource(url);

  es.addEventListener("models", (e) => {
    const d = JSON.parse(e.data);
    els.models.innerHTML = `recognition: <code>${shortModel(d.recognition)}</code> · structured: <code>${shortModel(d.structured)}</code> · ner: <code>${shortModel(d.extractor)}</code>`;
  });
  es.addEventListener("recognition_start", () => {
    els.recognitionMeta.textContent = "loading model + generating...";
  });
  es.addEventListener("recognition_done", (e) => {
    const d = JSON.parse(e.data);
    timings.recognitionMs = d.ms;
    els.recognitionMeta.textContent = `${d.markdown.length} chars in ${d.ms}ms`;
    els.recognition.textContent = d.markdown;
    updateTimings();
  });
  es.addEventListener("donut_start", () => {
    els.extractionMeta.textContent = "running structured...";
  });
  es.addEventListener("donut_done", (e) => {
    const d = JSON.parse(e.data);
    timings.donutMs = d.ms;
    donutBuf = { entities: d.entities, data: d.rawData };
    els.extractionMeta.textContent = `structured ${d.ms}ms`;
    renderExtraction();
    updateTimings();
  });
  es.addEventListener("gliner_start", () => {
    els.extractionMeta.textContent = "running NER...";
  });
  es.addEventListener("gliner_done", (e) => {
    const d = JSON.parse(e.data);
    timings.glinerMs = d.ms;
    glinerBuf = d.fields;
    els.extractionMeta.textContent = `ner ${d.ms}ms · ${d.fields.length} fields`;
    renderExtraction();
    updateTimings();
  });
  es.addEventListener("done", (e) => {
    const d = JSON.parse(e.data);
    setBadge(`done ${d.totalMs}ms`, "green");
    es.close();
  });
  es.addEventListener("error", (e) => {
    setBadge("error", "red");
    if (e.data) {
      try {
        const m = JSON.parse(e.data);
        els.recognitionMeta.textContent = `${m.stage}: ${m.message}`;
      } catch {
        /* */
      }
    }
    es.close();
  });
}

async function init() {
  // Fetch SIE health (and registered models)
  let registered = [];
  try {
    const r = await fetch("/api/health");
    const j = await r.json();
    els.sieUrl.textContent = j.sieUrl;
    if (!j.sie) {
      els.sieState.textContent = "SIE not reachable yet (still preloading models?)";
    } else {
      els.sieState.textContent = `SIE healthy · ${j.registeredModels} models registered`;
      registered = j.registered ?? [];
    }
  } catch {
    els.sieState.textContent = "could not reach the local server";
  }
  registeredSet = new Set(registered);

  // Fetch model menus (config-side)
  try {
    const r = await fetch("/api/models");
    modelConfig = await r.json();
    populateDropdown(els.selectRecognition, modelConfig.recognition, modelConfig.defaults.recognition);
    populateDropdown(els.selectStructured, modelConfig.structured, modelConfig.defaults.structured);
    populateDropdown(els.selectNer, modelConfig.ner, modelConfig.defaults.ner);
  } catch (e) {
    console.error("failed to load model config", e);
  }

  // Fetch sample documents
  try {
    const r = await fetch("/api/samples");
    const samples = await r.json();
    renderSamples(samples, runSample);
  } catch {
    els.events.innerHTML = '<p class="hint">failed to load samples</p>';
  }
}

init();
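The handlers above rely on the browser's built-in `EventSource` to split the `/api/run` stream into named events. As a rough sketch of what that parsing does, here is a hypothetical `parseSse` helper (not part of the demo) that turns a raw `text/event-stream` chunk into `{ type, data }` records:

```javascript
// parseSse is illustrative only: EventSource does this parsing natively.
// Each frame is terminated by a blank line; "event:" names the event and
// "data:" carries the JSON payload the handlers pass to JSON.parse.
function parseSse(chunk) {
  return chunk
    .split("\n\n") // a blank line ends each frame
    .filter((frame) => frame.trim() !== "")
    .map((frame) => {
      const record = { type: "message", data: "" };
      for (const line of frame.split("\n")) {
        if (line.startsWith("event: ")) record.type = line.slice(7);
        if (line.startsWith("data: ")) record.data = line.slice(6);
      }
      return record;
    });
}

const frames = 'event: recognition_done\ndata: {"ms":1200,"markdown":"# Invoice"}\n\n';
console.log(parseSse(frames));
```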
web/public/index.html
ADDED
@@ -0,0 +1,80 @@
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <title>document-ocr</title>
    <link rel="stylesheet" href="/static/style.css" />
  </head>
  <body>
    <header>
      <div class="title">
        <span class="logo">📄</span>
        <h1>document-ocr</h1>
        <span class="badge" id="badge">idle</span>
      </div>
      <div class="meta" id="models">SIE: <code>...</code></div>
      <div class="meta" id="sie-state">checking SIE...</div>
    </header>

    <section class="hero">
      <p>
        OCR is rarely a single-model problem. This demo runs three model
        classes through one SIE server: a <strong>VLM-OCR</strong> recognizes
        the document into Markdown, a <strong>fine-tuned Donut</strong> emits
        a JSON tree directly, and a <strong>zero-shot NER (GLiNER)</strong>
        pulls typed fields out of the recognition output. Pick a sample on
        the left, swap any of the three models on the right, and watch SIE
        hot-swap them with one identifier change.
      </p>
    </section>

    <main>
      <section class="panel" id="panel-events">
        <header><h2>Sample documents</h2></header>
        <div class="meta-row">
          <label class="model-pick">
            <span class="dropdown-label">Recognition</span>
            <select id="select-recognition"></select>
          </label>
          <label class="model-pick">
            <span class="dropdown-label">Structured</span>
            <select id="select-structured"></select>
          </label>
          <label class="model-pick">
            <span class="dropdown-label">NER</span>
            <select id="select-ner"></select>
          </label>
        </div>
        <div class="list" id="events">loading...</div>
      </section>

      <section class="panel" id="panel-recognition">
        <header>
          <h2>Recognition (Markdown)</h2>
          <span class="hint" id="recognition-meta"></span>
        </header>
        <div class="markdown" id="recognition">
          <p class="hint">Click a sample on the left.</p>
        </div>
      </section>

      <section class="panel" id="panel-extraction">
        <header>
          <h2>Extraction</h2>
          <span class="hint" id="extraction-meta"></span>
        </header>
        <div class="extraction" id="extraction">
          <p class="hint">Typed fields will appear here.</p>
        </div>
      </section>
    </main>

    <footer>
      <span id="footer">SIE on <code id="sie-url">http://localhost:8080</code></span>
      <span id="timings"></span>
    </footer>

    <script src="/static/app.js"></script>
  </body>
</html>
web/public/style.css
ADDED
@@ -0,0 +1,148 @@
:root {
  --bg: #0e1014;
  --panel: #161a22;
  --line: #232936;
  --text: #e6e8ec;
  --muted: #8b94a7;
  --accent: #7c5dff;
  --accent-2: #62b6ff;
  --green: #5fd28b;
  --red: #ff6b6b;
  --yellow: #f7c948;
  --magenta: #c89cff;
}

* { box-sizing: border-box; }
html, body {
  margin: 0; padding: 0; background: var(--bg); color: var(--text);
  font: 14px ui-monospace, SFMono-Regular, Menlo, monospace;
  height: 100%;
}
body { display: flex; flex-direction: column; }

header {
  display: flex; align-items: center; gap: 16px;
  padding: 12px 20px; border-bottom: 1px solid var(--line);
}
.title { display: flex; align-items: center; gap: 12px; }
.logo { font-size: 18px; }
h1 { font-size: 16px; margin: 0; font-weight: 600; }
.badge {
  padding: 3px 10px; border-radius: 999px; background: var(--line);
  color: var(--muted); font-size: 11px; text-transform: uppercase;
  letter-spacing: 0.5px;
}
.badge.running { background: rgba(98, 182, 255, 0.15); color: var(--accent-2); }
.badge.green { background: rgba(95, 210, 139, 0.18); color: var(--green); }
.badge.red { background: rgba(255, 107, 107, 0.18); color: var(--red); }
.meta { color: var(--muted); font-size: 12px; }

.hero {
  padding: 14px 20px; background: linear-gradient(180deg, rgba(124,93,255,0.10), transparent);
  border-bottom: 1px solid var(--line);
}
.hero p { margin: 0; color: var(--muted); max-width: 980px; line-height: 1.6; }
.hero strong { color: var(--text); }

main {
  flex: 1; display: grid;
  grid-template-columns: 0.95fr 1.4fr 1.2fr;
  gap: 12px; padding: 12px 20px; overflow: hidden;
}

.panel {
  display: flex; flex-direction: column;
  background: var(--panel); border: 1px solid var(--line);
  border-radius: 10px; overflow: hidden; min-height: 0;
}
.panel header {
  padding: 10px 14px; border-bottom: 1px solid var(--line);
  display: flex; justify-content: space-between; align-items: baseline;
}
.panel h2 {
  font-size: 12px; letter-spacing: 0.6px; text-transform: uppercase;
  margin: 0; color: var(--muted);
}
#panel-events h2 { color: var(--accent); }
#panel-recognition h2 { color: var(--accent-2); }
#panel-extraction h2 { color: var(--magenta); }

.hint { color: var(--muted); font-size: 11px; }

.list, .markdown, .extraction {
  flex: 1; overflow: auto; padding: 12px 14px; margin: 0;
}

.meta-row {
  padding: 10px 14px; border-bottom: 1px solid var(--line);
  font-size: 11px; color: var(--muted);
  display: flex; flex-direction: column; gap: 6px;
}
.model-pick {
  display: flex; align-items: center; gap: 8px;
  font-size: 11px;
}
.dropdown-label {
  color: var(--muted); text-transform: uppercase;
  letter-spacing: 0.5px; min-width: 80px;
}
.meta-row select {
  flex: 1; min-width: 0;
  background: var(--bg); color: var(--text); border: 1px solid var(--line);
  border-radius: 6px; padding: 4px 8px; font: inherit;
  font-size: 11px;
}
.meta-row select:disabled, .meta-row option:disabled {
  color: var(--muted);
}

.event {
  padding: 10px 12px; border: 1px solid var(--line); border-radius: 8px;
  margin-bottom: 10px; cursor: pointer;
  transition: border-color 0.1s;
  display: flex; gap: 12px; align-items: center;
}
.event:hover { border-color: var(--accent); }
.event.active { border-color: var(--accent); background: rgba(124,93,255,0.08); }
.event img {
  width: 64px; height: 64px; object-fit: cover;
  border-radius: 4px; background: #fff;
  flex-shrink: 0;
}
.event .label { font-weight: 600; color: var(--text); font-size: 13px; }
.event .desc { color: var(--muted); font-size: 11px; margin-top: 2px; line-height: 1.4; }

.markdown {
  white-space: pre-wrap; word-break: break-word;
  font-size: 13px; line-height: 1.55;
}

.extraction .section {
  margin-bottom: 14px; padding-bottom: 10px;
  border-bottom: 1px dashed var(--line);
}
.extraction .section:last-child { border-bottom: 0; margin-bottom: 0; }
.extraction h3 {
  font-size: 11px; letter-spacing: 0.6px; text-transform: uppercase;
  color: var(--muted); margin: 0 0 6px 0; font-weight: 500;
}
.field {
  display: flex; gap: 8px; padding: 4px 0;
  font-size: 12px;
}
.field .label-name {
  color: var(--accent-2); font-weight: 500; min-width: 90px;
}
.field .text { color: var(--text); flex: 1; }
.field .score { color: var(--muted); font-size: 11px; }

.donut-row { padding: 3px 0; font-size: 12px; }
.donut-row .key { color: var(--magenta); }
.donut-row .val { color: var(--text); margin-left: 8px; }

footer {
  padding: 10px 20px; border-top: 1px solid var(--line);
  color: var(--muted); font-size: 12px;
  display: flex; justify-content: space-between; gap: 12px;
}
code { background: var(--line); padding: 1px 6px; border-radius: 4px; }
web/server.ts
ADDED
@@ -0,0 +1,186 @@
import fs from "node:fs";
import http from "node:http";
import path from "node:path";
import { spawnSync } from "node:child_process";
import { fileURLToPath } from "node:url";
import { SIEClient } from "@superlinked/sie-sdk";
import { NER_MODELS, RECOGNITION_MODELS, STRUCTURED_MODELS, config } from "../src/config.js";
import type { PipelineEvent } from "../src/events.js";
import { runPipeline } from "../src/pipeline.js";
import type { SampleDoc } from "../src/types.js";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const ROOT = path.resolve(__dirname, "..");
const PUBLIC_DIR = path.resolve(__dirname, "public");
const SAMPLES_PATH = path.resolve(ROOT, config.paths.samples);
const SAMPLE_DIR = path.resolve(ROOT, config.paths.sampleDir);

const MIME: Record<string, string> = {
  ".html": "text/html; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".json": "application/json",
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
};

function send(res: http.ServerResponse, status: number, body: string, type = "text/plain") {
  res.writeHead(status, { "content-type": type });
  res.end(body);
}

function serveFile(res: http.ServerResponse, file: string) {
  if (!fs.existsSync(file)) return send(res, 404, "not found");
  const ext = path.extname(file).toLowerCase();
  res.writeHead(200, { "content-type": MIME[ext] ?? "application/octet-stream" });
  fs.createReadStream(file).pipe(res);
}

function setupSse(res: http.ServerResponse) {
  res.writeHead(200, {
    "content-type": "text/event-stream",
    "cache-control": "no-cache",
    connection: "keep-alive",
  });
  return (event: { type: string; data?: unknown }) => {
    res.write(`event: ${event.type}\n`);
    res.write(`data: ${JSON.stringify(event.data ?? null)}\n\n`);
  };
}

async function fetchModels(): Promise<{ ok: boolean; names: string[] }> {
  try {
    const r = await fetch(`${config.sieUrl}/v1/models`, { signal: AbortSignal.timeout(3000) });
    if (!r.ok) return { ok: false, names: [] };
    const json = (await r.json()) as { models?: { name: string }[] };
    return { ok: true, names: (json.models ?? []).map((m) => m.name) };
  } catch {
    return { ok: false, names: [] };
  }
}

function readJson<T>(p: string): T {
  return JSON.parse(fs.readFileSync(p, "utf8")) as T;
}

async function handleRun(
  req: http.IncomingMessage,
  res: http.ServerResponse,
  sampleId: string,
  recognitionModel: string,
  structuredModel: string,
  nerModel: string,
) {
  const push = setupSse(res);
  let closed = false;
  req.on("close", () => {
    closed = true;
  });

  const samples = readJson<SampleDoc[]>(SAMPLES_PATH);
  const sample = samples.find((s) => s.id === sampleId);
  if (!sample) {
    push({ type: "error", data: { stage: "lookup", message: `unknown sample id: ${sampleId}` } });
    return res.end();
  }

  const imagePath = path.resolve(SAMPLE_DIR, sample.filename);
  if (!fs.existsSync(imagePath)) {
    push({ type: "error", data: { stage: "lookup", message: `sample image not found: ${sample.filename}` } });
    return res.end();
  }
  const imageBytes = fs.readFileSync(imagePath);

  const client = new SIEClient(config.sieUrl, {
    apiKey: config.sieApiKey,
    timeout: 600_000, // 10 min request timeout (CPU + Rosetta is slow)
    waitForCapacity: true, // retry while a model is warming up
    provisionTimeout: 900_000, // 15 min ceiling on cold-load polling
  });

  try {
    await runPipeline({
      client,
      imageBytes,
      sample,
      recognitionModel,
      structuredModel,
      nerModel,
      emit: (event: PipelineEvent) => {
        if (closed) return;
        push({ type: event.type, data: "data" in event ? event.data : null });
      },
    });
  } catch (err) {
    push({ type: "error", data: { stage: "pipeline", message: (err as Error).message } });
  } finally {
    await client.close().catch(() => {});
    res.end();
  }
}

const server = http.createServer(async (req, res) => {
  const url = new URL(req.url ?? "/", `http://${req.headers.host}`);
  const p = url.pathname;

  if (p === "/" || p === "/index.html") return serveFile(res, path.join(PUBLIC_DIR, "index.html"));
  if (p.startsWith("/static/")) return serveFile(res, path.join(PUBLIC_DIR, p.slice("/static/".length)));
  if (p.startsWith("/samples/")) return serveFile(res, path.join(SAMPLE_DIR, p.slice("/samples/".length)));

  if (p === "/api/health") {
    const { ok, names } = await fetchModels();
    return send(
      res,
      200,
      JSON.stringify({
        sie: ok,
        sieUrl: config.sieUrl,
        registeredModels: names.length,
        registered: names,
      }),
      "application/json",
    );
  }
  if (p === "/api/models") {
    return send(
      res,
      200,
      JSON.stringify({
        recognition: RECOGNITION_MODELS,
        structured: STRUCTURED_MODELS,
        ner: NER_MODELS,
        defaults: config.defaults,
      }),
      "application/json",
    );
  }
  if (p === "/api/samples") {
    return send(res, 200, fs.readFileSync(SAMPLES_PATH, "utf8"), "application/json");
  }
  if (p === "/api/run") {
    const id = url.searchParams.get("id");
    const recognitionModel =
      url.searchParams.get("recognition") ?? config.defaults.recognition;
    const structuredModel = url.searchParams.get("structured") ?? config.defaults.structured;
    const nerModel = url.searchParams.get("ner") ?? config.defaults.ner;
    if (!id) return send(res, 400, "missing id");
    return handleRun(req, res, id, recognitionModel, structuredModel, nerModel);
  }

  return send(res, 404, "not found");
});

server.listen(config.port, () => {
  const url = `http://localhost:${config.port}`;
  console.log(`document-ocr ui: ${url}`);
  if (process.env.OPEN_BROWSER !== "0") {
    const opener =
      process.platform === "darwin"
        ? "open"
        : process.platform === "win32"
          ? "start"
          : "xdg-open";
    spawnSync(opener, [url], { stdio: "ignore" });
  }
});
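server.ts's `setupSse` returns a `push` closure that writes one `text/event-stream` frame per pipeline event. A minimal sketch of the frame it produces (`formatFrame` is illustrative, not part of the server):

```javascript
// formatFrame mirrors what push() writes: an "event:" line naming the
// pipeline stage, a "data:" line with the JSON payload (null when the
// event carries no data), and the blank line that completes the frame.
function formatFrame(event) {
  return `event: ${event.type}\ndata: ${JSON.stringify(event.data ?? null)}\n\n`;
}

console.log(formatFrame({ type: "done", data: { totalMs: 4321 } }));
```

The browser's `EventSource` in app.js dispatches each such frame to the listener registered for its `event:` name, which is why the client and server agree on names like `recognition_done` and `gliner_done`.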