Commit 862cbed (unverified) by Claude · 1 parent: cd57e24

Integrate PaddleOCR and redesign UI with Xerox Star aesthetic

Image-to-XML pipeline:
- New /ocr endpoint: upload image -> PaddleOCR -> ALTO/PAGE XML
- No more manual JSON payload needed for the main workflow
- src/app/ocr/ module wraps PaddleOCR with lazy loading
- Auto-detects image dimensions via Pillow
- 20 MB upload limit with format validation
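The upload guard listed above (20 MB limit plus format validation) could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in src/app/ocr/: the function names, the `UploadRejected` exception and the magic-byte table are hypothetical, and the accepted formats are taken from the drop zone's `accept` list (PNG, JPEG, TIFF, WebP, BMP). Size is checked first because it is cheap; the file header is then sniffed instead of trusting the extension.

```python
from typing import Optional

# 20 MB, per the commit message; everything else here is a hypothetical sketch.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024

# Leading "magic" bytes for the formats the drop zone advertises.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"BM", "bmp"),
)


class UploadRejected(ValueError):
    """Raised when an upload fails the size or format check (hypothetical)."""


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a short format name based on the file header, or None."""
    # WebP is a RIFF container: b"RIFF" + 4-byte length + b"WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def validate_upload(data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Reject empty, oversized or unrecognised uploads; return the detected format."""
    if not data:
        raise UploadRejected("empty upload")
    if len(data) > max_bytes:
        raise UploadRejected(f"upload is {len(data)} bytes; limit is {max_bytes}")
    fmt = sniff_image_format(data)
    if fmt is None:
        raise UploadRejected("unsupported or unrecognised image format")
    return fmt
```

For example, a body starting with the PNG signature yields `"png"`, while a GIF header or a body over the limit raises `UploadRejected`. The endpoint also reads image dimensions via Pillow, which this stdlib-only sketch leaves out.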

UI redesign (Xerox Star / early Mac inspired):
- Desktop icons for navigation (Scan Document, Documents, Advanced)
- Window chrome with titlebars, close boxes, and striped decorations
- Drag-and-drop zone for image upload
- Retro button styles, dithered background, classic Mac font stack (Geneva/Chicago) with monospace fallback
- Status bar at bottom of each window
- "Advanced" mode preserves the old raw payload workflow

Infrastructure:
- Add paddlepaddle + paddleocr to dependencies
- Add libgl1 and libglib2.0-0 (for OpenCV, a PaddleOCR dependency) and libgomp1 (OpenMP runtime) to the Dockerfile
- Pre-download OCR models at Docker build time
- Increase healthcheck start-period to 30s (model loading)
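The commit says src/app/ocr/ wraps PaddleOCR with lazy loading, and the Dockerfile pre-downloads the models so the first request does not also pay for a download. Below is a minimal sketch of the lazy-loading half, assuming a thread-safe holder class. `LazyEngine`, `get` and `loaded` are hypothetical names, not the module's actual API; the PaddleOCR arguments in the usage comment are copied from the Dockerfile's warm-up command.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyEngine(Generic[T]):
    """Build an expensive object on first use (hypothetical helper).

    Construction happens at most once, assuming the factory returns a
    non-None object. In the app the factory would build the PaddleOCR
    engine; any zero-argument callable works, so this runs without paddleocr.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        """True once the engine has been built."""
        return self._instance is not None

    def get(self) -> T:
        # Fast path: the engine already exists, so skip the lock entirely.
        instance = self._instance
        if instance is not None:
            return instance
        # Slow path: serialise construction so several concurrent first
        # requests build the engine once rather than once per request.
        with self._lock:
            if self._instance is None:
                self._instance = self._factory()
            return self._instance


# Usage inside an OCR module might look like (not executed here):
#   from paddleocr import PaddleOCR
#   engine = LazyEngine(lambda: PaddleOCR(use_angle_cls=True, lang="fr"))
#   ocr = engine.get()  # first call pays the model-initialisation cost
```

Double-checked locking keeps the common path lock-free once the engine exists, while the lock guarantees a single construction even if several requests arrive before loading finishes.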

508 tests pass. 0 ruff warnings.

https://claude.ai/code/session_01AdT5qvpceqqRs1AGJNgD1Y

Dockerfile CHANGED
@@ -1,8 +1,9 @@
  FROM python:3.11-slim
 
- # System deps for lxml and Pillow
+ # System deps for lxml, Pillow, and PaddleOCR
  RUN apt-get update && apt-get install -y --no-install-recommends \
  libxml2 libxslt1.1 libjpeg62-turbo libwebp7 libtiff6 \
+ libgl1 libglib2.0-0 libgomp1 \
  && rm -rf /var/lib/apt/lists/*
 
  WORKDIR /app
@@ -23,6 +24,9 @@ RUN useradd -m -u 1000 appuser \
 
  USER appuser
 
+ # Pre-download PaddleOCR models at build time (avoids first-request delay)
+ RUN python -c "from paddleocr import PaddleOCR; PaddleOCR(use_angle_cls=True, lang='fr', show_log=False)" 2>/dev/null || true
+
  # Default storage root — overridden in Space mode via /data
  ENV STORAGE_ROOT=/app/data
  ENV HOST=0.0.0.0
@@ -30,8 +34,7 @@ ENV PORT=7860
 
  EXPOSE 7860
 
- # Health check
- HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
+ HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:7860/health')" || exit 1
 
  CMD ["uvicorn", "src.app.main:app", "--host", "0.0.0.0", "--port", "7860"]
frontend/static/index.html CHANGED
@@ -5,270 +5,678 @@
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>XmLLM — Document Structure Engine</title>
  <style>
- :root { --bg: #1a1a2e; --surface: #16213e; --primary: #0f3460; --accent: #e94560; --text: #eee; --muted: #888; --border: #2a2a4a; }
  * { box-sizing: border-box; margin: 0; padding: 0; }
- body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; background: var(--bg); color: var(--text); min-height: 100vh; }
- header { background: var(--surface); border-bottom: 1px solid var(--border); padding: 1rem 2rem; display: flex; align-items: center; gap: 1rem; }
- header h1 { font-size: 1.4rem; font-weight: 600; }
- header h1 span { color: var(--accent); }
- nav { display: flex; gap: .5rem; margin-left: auto; }
- nav button { background: var(--primary); color: var(--text); border: 1px solid var(--border); padding: .4rem 1rem; border-radius: 4px; cursor: pointer; font-size: .85rem; }
- nav button.active { background: var(--accent); border-color: var(--accent); }
- main { max-width: 1200px; margin: 2rem auto; padding: 0 2rem; }
- .card { background: var(--surface); border: 1px solid var(--border); border-radius: 8px; padding: 1.5rem; margin-bottom: 1.5rem; }
- .card h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--accent); }
- label { display: block; font-size: .85rem; color: var(--muted); margin-bottom: .3rem; }
- input, select { background: var(--bg); color: var(--text); border: 1px solid var(--border); padding: .5rem; border-radius: 4px; width: 100%; margin-bottom: .8rem; font-size: .9rem; }
- .row { display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; }
- .row-3 { display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 1rem; }
- button.primary { background: var(--accent); color: white; border: none; padding: .6rem 1.5rem; border-radius: 4px; cursor: pointer; font-size: .9rem; font-weight: 600; }
- button.primary:disabled { opacity: .5; cursor: not-allowed; }
- button.secondary { background: var(--primary); color: var(--text); border: 1px solid var(--border); padding: .4rem 1rem; border-radius: 4px; cursor: pointer; font-size: .85rem; }
- .status { display: inline-block; padding: .15rem .6rem; border-radius: 12px; font-size: .75rem; font-weight: 600; }
- .status.succeeded { background: #0a3d2a; color: #4ade80; }
- .status.failed { background: #3d0a0a; color: #f87171; }
- .status.partial_success { background: #3d2a0a; color: #fbbf24; }
- .status.running { background: #0a2a3d; color: #60a5fa; }
- .status.queued { background: #2a2a2a; color: #aaa; }
- table { width: 100%; border-collapse: collapse; font-size: .85rem; }
- th { text-align: left; color: var(--muted); font-weight: 500; padding: .5rem; border-bottom: 1px solid var(--border); }
- td { padding: .5rem; border-bottom: 1px solid var(--border); }
- tr:hover td { background: rgba(255,255,255,.03); }
  .hidden { display: none; }
- #log { background: var(--bg); border: 1px solid var(--border); border-radius: 4px; padding: 1rem; font-family: monospace; font-size: .8rem; max-height: 300px; overflow-y: auto; white-space: pre-wrap; color: var(--muted); }
- .exports { display: flex; gap: .5rem; flex-wrap: wrap; }
- .tag { background: var(--primary); padding: .2rem .5rem; border-radius: 3px; font-size: .75rem; }
- .tag.yes { background: #0a3d2a; color: #4ade80; }
- .tag.no { background: #2a2a2a; color: #666; }
- #viewer-area { background: var(--bg); border: 1px solid var(--border); border-radius: 4px; min-height: 400px; display: flex; align-items: center; justify-content: center; color: var(--muted); font-style: italic; }
- .stats { display: flex; gap: 2rem; margin-bottom: 1rem; }
- .stat { text-align: center; }
- .stat .value { font-size: 1.5rem; font-weight: 700; color: var(--accent); }
- .stat .label { font-size: .75rem; color: var(--muted); }
  </style>
  </head>
  <body>
- <header>
- <h1><span>XmLLM</span> Document Structure Engine</h1>
- <nav>
- <button class="active" onclick="showPage('upload')">Upload</button>
- <button onclick="showPage('jobs')">Jobs</button>
- <button onclick="showPage('providers')">Providers</button>
- </nav>
- </header>
-
- <main>
- <!-- UPLOAD PAGE -->
- <div id="page-upload">
- <div class="card">
- <h2>New Job</h2>
- <div class="row">
- <div>
- <label>Raw Payload JSON</label>
- <input type="file" id="payload-file" accept=".json">
- </div>
- <div>
- <label>Provider Family</label>
- <select id="provider-family">
- <option value="word_box_json">word_box_json (PaddleOCR)</option>
- <option value="line_box_json">line_box_json</option>
- <option value="text_only">text_only (mLLM)</option>
- </select>
- </div>
  </div>
- <div class="row-3">
- <div>
- <label>Provider ID</label>
- <input type="text" id="provider-id" value="paddleocr" placeholder="paddleocr">
- </div>
- <div>
- <label>Image Width (px)</label>
- <input type="number" id="img-width" value="2480">
- </div>
- <div>
- <label>Image Height (px)</label>
- <input type="number" id="img-height" value="3508">
- </div>
  </div>
- <button class="primary" id="btn-run" onclick="runJob()">Run Pipeline</button>
- <span id="run-status" style="margin-left:1rem;color:var(--muted)"></span>
  </div>
 
- <div class="card hidden" id="result-card">
- <h2>Result</h2>
- <div class="stats" id="result-stats"></div>
- <div class="exports" id="result-exports"></div>
- <div id="log" style="margin-top:1rem"></div>
- </div>
  </div>
 
- <!-- JOBS PAGE -->
- <div id="page-jobs" class="hidden">
- <div class="card">
- <h2>Job History</h2>
- <table>
- <thead><tr><th>ID</th><th>Status</th><th>Provider</th><th>File</th><th>ALTO</th><th>PAGE</th><th>Duration</th><th>Actions</th></tr></thead>
- <tbody id="jobs-table"></tbody>
- </table>
- </div>
- <div class="card hidden" id="job-detail-card">
- <h2>Job Detail</h2>
- <div id="job-detail"></div>
- <div id="job-exports" style="margin-top:1rem"></div>
- <div id="job-log" style="margin-top:1rem"></div>
- </div>
  </div>
 
- <!-- PROVIDERS PAGE -->
- <div id="page-providers" class="hidden">
- <div class="card">
- <h2>Register Provider</h2>
- <div class="row-3">
- <div><label>Provider ID</label><input id="prov-id" placeholder="my_paddle"></div>
- <div><label>Display Name</label><input id="prov-name" placeholder="PaddleOCR Local"></div>
- <div><label>Family</label>
- <select id="prov-family">
- <option value="word_box_json">word_box_json</option>
- <option value="line_box_json">line_box_json</option>
- <option value="text_only">text_only</option>
- </select>
- </div>
  </div>
- <div class="row">
- <div><label>Runtime Type</label>
- <select id="prov-runtime"><option>local</option><option>hub</option><option>api</option></select>
- </div>
- <div><label>Model ID / Path</label><input id="prov-model" placeholder="/models/paddle"></div>
  </div>
- <button class="primary" onclick="registerProvider()">Register</button>
- <span id="prov-status" style="margin-left:1rem;color:var(--muted)"></span>
  </div>
- <div class="card">
- <h2>Registered Providers</h2>
- <table>
- <thead><tr><th>ID</th><th>Name</th><th>Family</th><th>Runtime</th><th>Actions</th></tr></thead>
- <tbody id="providers-table"></tbody>
- </table>
  </div>
  </div>
- </main>
-
- <script>
- const API = '';
-
- function showPage(name) {
- document.querySelectorAll('main > div').forEach(d => d.classList.add('hidden'));
- document.getElementById('page-' + name).classList.remove('hidden');
- document.querySelectorAll('nav button').forEach(b => b.classList.remove('active'));
- event.target.classList.add('active');
- if (name === 'jobs') loadJobs();
- if (name === 'providers') loadProviders();
- }
-
- async function runJob() {
- const fileInput = document.getElementById('payload-file');
- if (!fileInput.files.length) { alert('Select a payload JSON file'); return; }
- const btn = document.getElementById('btn-run');
- const status = document.getElementById('run-status');
- btn.disabled = true; status.textContent = 'Running...';
-
- const fd = new FormData();
- fd.append('raw_payload_file', fileInput.files[0]);
-
- const params = new URLSearchParams({
- provider_id: document.getElementById('provider-id').value,
- provider_family: document.getElementById('provider-family').value,
- image_width: document.getElementById('img-width').value,
- image_height: document.getElementById('img-height').value,
- });
-
- try {
- const r = await fetch(API + '/jobs?' + params, { method: 'POST', body: fd });
- const data = await r.json();
- btn.disabled = false; status.textContent = '';
- showResult(data);
- } catch(e) {
- btn.disabled = false; status.textContent = 'Error: ' + e.message;
- }
- }
-
- function showResult(data) {
- const card = document.getElementById('result-card');
- card.classList.remove('hidden');
- document.getElementById('result-stats').innerHTML = `
- <div class="stat"><div class="value"><span class="status ${data.status}">${data.status}</span></div><div class="label">Status</div></div>
- <div class="stat"><div class="value">${data.duration_ms ? Math.round(data.duration_ms) + 'ms' : '-'}</div><div class="label">Duration</div></div>
- `;
- const jobId = data.job_id;
- document.getElementById('result-exports').innerHTML = `
- <span class="tag ${data.has_alto ? 'yes' : 'no'}">ALTO ${data.has_alto ? '✓' : '✗'}</span>
- <span class="tag ${data.has_page_xml ? 'yes' : 'no'}">PAGE ${data.has_page_xml ? '✓' : '✗'}</span>
- ${data.has_alto ? `<a href="${API}/jobs/${jobId}/alto" class="secondary" style="color:var(--text);text-decoration:none;padding:.2rem .5rem;background:var(--primary);border-radius:3px;font-size:.75rem">Download ALTO</a>` : ''}
- ${data.has_page_xml ? `<a href="${API}/jobs/${jobId}/pagexml" class="secondary" style="color:var(--text);text-decoration:none;padding:.2rem .5rem;background:var(--primary);border-radius:3px;font-size:.75rem">Download PAGE</a>` : ''}
- <a href="${API}/jobs/${jobId}/canonical" target="_blank" class="secondary" style="color:var(--text);text-decoration:none;padding:.2rem .5rem;background:var(--primary);border-radius:3px;font-size:.75rem">Canonical JSON</a>
- `;
- if (data.error) {
- document.getElementById('log').textContent = 'ERROR: ' + data.error;
- } else {
- fetch(API + '/jobs/' + jobId + '/logs').then(r => r.json()).then(events => {
- document.getElementById('log').textContent = events.map(e =>
- `[${e.status.padEnd(9)}] ${e.step.padEnd(20)} ${e.duration_ms ? Math.round(e.duration_ms) + 'ms' : e.message || ''}`
- ).join('\n');
- });
- }
- }
-
- async function loadJobs() {
- const r = await fetch(API + '/jobs');
- const jobs = await r.json();
- const tbody = document.getElementById('jobs-table');
- tbody.innerHTML = jobs.map(j => `<tr>
- <td style="font-family:monospace;font-size:.8rem">${j.job_id}</td>
- <td><span class="status ${j.status}">${j.status}</span></td>
- <td>${j.provider_id}</td>
- <td>${j.source_filename || '-'}</td>
- <td><span class="tag ${j.has_alto ? 'yes' : 'no'}">${j.has_alto ? '✓' : '✗'}</span></td>
- <td><span class="tag ${j.has_page_xml ? 'yes' : 'no'}">${j.has_page_xml ? '✓' : '✗'}</span></td>
- <td>${j.duration_ms ? Math.round(j.duration_ms) + 'ms' : '-'}</td>
- <td>
- ${j.has_alto ? `<a href="${API}/jobs/${j.job_id}/alto" class="secondary" style="font-size:.75rem">ALTO</a> ` : ''}
- ${j.has_page_xml ? `<a href="${API}/jobs/${j.job_id}/pagexml" class="secondary" style="font-size:.75rem">PAGE</a>` : ''}
- </td>
- </tr>`).join('');
- }
-
- async function loadProviders() {
- const r = await fetch(API + '/providers');
- const provs = await r.json();
- const tbody = document.getElementById('providers-table');
- tbody.innerHTML = provs.map(p => `<tr>
- <td>${p.provider_id}</td>
- <td>${p.display_name}</td>
- <td><span class="tag">${p.family}</span></td>
- <td>${p.runtime_type}</td>
- <td><button class="secondary" onclick="deleteProvider('${p.provider_id}')" style="font-size:.75rem">Delete</button></td>
- </tr>`).join('');
- }
-
- async function registerProvider() {
- const body = {
- provider_id: document.getElementById('prov-id').value,
- display_name: document.getElementById('prov-name').value,
- family: document.getElementById('prov-family').value,
- runtime_type: document.getElementById('prov-runtime').value,
- model_id_or_path: document.getElementById('prov-model').value,
- };
- const r = await fetch(API + '/providers', {
- method: 'POST', headers: {'Content-Type': 'application/json'}, body: JSON.stringify(body),
- });
- document.getElementById('prov-status').textContent = r.ok ? 'Registered!' : 'Error';
- loadProviders();
- }
-
- async function deleteProvider(id) {
- await fetch(API + '/providers/' + id, { method: 'DELETE' });
- loadProviders();
- }
- </script>
  </body>
  </html>

  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>XmLLM — Document Structure Engine</title>
  <style>
+ @import url('https://fonts.googleapis.com/css2?family=Chicago&display=swap');
+
+ :root {
+ --bg: #a8a8a8;
+ --window-bg: #ffffff;
+ --titlebar: #000000;
+ --titlebar-text: #ffffff;
+ --border: #000000;
+ --text: #000000;
+ --button-bg: #ffffff;
+ --button-shadow: #555555;
+ --select-bg: #000000;
+ --select-text: #ffffff;
+ --disabled: #888888;
+ --success: #006600;
+ --error: #cc0000;
+ --accent: #000000;
+ }
+
  * { box-sizing: border-box; margin: 0; padding: 0; }
+
+ body {
+ font-family: "Geneva", "Chicago", "Charcoal", "Lucida Grande", "Helvetica", monospace;
+ font-size: 12px;
+ background: var(--bg);
+ background-image:
+ linear-gradient(45deg, #9e9e9e 25%, transparent 25%),
+ linear-gradient(-45deg, #9e9e9e 25%, transparent 25%),
+ linear-gradient(45deg, transparent 75%, #9e9e9e 75%),
+ linear-gradient(-45deg, transparent 75%, #9e9e9e 75%);
+ background-size: 4px 4px;
+ background-position: 0 0, 0 2px, 2px -2px, -2px 0px;
+ color: var(--text);
+ min-height: 100vh;
+ padding: 8px;
+ }
+
+ /* ── Desktop icons ─────────────────────────────────── */
+ .desktop {
+ display: flex;
+ gap: 8px;
+ margin-bottom: 12px;
+ flex-wrap: wrap;
+ }
+
+ .desktop-icon {
+ display: flex;
+ flex-direction: column;
+ align-items: center;
+ gap: 4px;
+ width: 80px;
+ padding: 6px 4px;
+ cursor: pointer;
+ border: 2px solid transparent;
+ user-select: none;
+ }
+ .desktop-icon:hover { border: 2px dotted var(--border); }
+ .desktop-icon.active {
+ background: var(--select-bg);
+ color: var(--select-text);
+ border: 2px solid var(--border);
+ }
+ .desktop-icon .icon-img {
+ width: 48px;
+ height: 48px;
+ border: 2px solid var(--border);
+ background: var(--window-bg);
+ display: flex;
+ align-items: center;
+ justify-content: center;
+ font-size: 22px;
+ }
+ .desktop-icon.active .icon-img {
+ background: var(--select-bg);
+ color: var(--select-text);
+ border-color: var(--select-text);
+ }
+ .desktop-icon .icon-label {
+ font-size: 10px;
+ text-align: center;
+ word-break: break-word;
+ }
+
+ /* ── Window chrome ─────────────────────────────────── */
+ .window {
+ background: var(--window-bg);
+ border: 2px solid var(--border);
+ box-shadow: 3px 3px 0 var(--border);
+ max-width: 760px;
+ margin: 0 auto;
+ }
+
+ .titlebar {
+ background: var(--titlebar);
+ color: var(--titlebar-text);
+ padding: 3px 8px;
+ font-size: 12px;
+ font-weight: bold;
+ display: flex;
+ align-items: center;
+ gap: 8px;
+ cursor: default;
+ user-select: none;
+ }
+ .titlebar .close-box {
+ width: 14px;
+ height: 14px;
+ border: 1px solid var(--titlebar-text);
+ display: inline-block;
+ cursor: pointer;
+ }
+ .titlebar .title-text { flex: 1; text-align: center; }
+ .titlebar .stripes {
+ flex: 0 0 40px;
+ height: 10px;
+ background: repeating-linear-gradient(
+ 0deg,
+ var(--titlebar-text) 0px, var(--titlebar-text) 1px,
+ var(--titlebar) 1px, var(--titlebar) 3px
+ );
+ }
+
+ .window-body { padding: 16px; }
+
+ /* ── Menu bar ──────────────────────────────────────── */
+ .menubar {
+ background: var(--window-bg);
+ border-bottom: 2px solid var(--border);
+ padding: 2px 8px;
+ font-size: 11px;
+ display: flex;
+ gap: 16px;
+ }
+ .menubar span { cursor: default; font-weight: bold; }
+
+ /* ── Buttons ───────────────────────────────────────── */
+ button, .btn {
+ font-family: inherit;
+ font-size: 12px;
+ background: var(--button-bg);
+ border: 2px solid var(--border);
+ padding: 4px 16px;
+ cursor: pointer;
+ box-shadow: 2px 2px 0 var(--button-shadow);
+ user-select: none;
+ }
+ button:active, .btn:active {
+ box-shadow: none;
+ transform: translate(2px, 2px);
+ }
+ button:disabled {
+ color: var(--disabled);
+ border-color: var(--disabled);
+ box-shadow: 1px 1px 0 var(--disabled);
+ cursor: not-allowed;
+ }
+ button.primary {
+ background: var(--accent);
+ color: var(--titlebar-text);
+ font-weight: bold;
+ border-radius: 6px;
+ padding: 6px 24px;
+ }
+ button.primary:active {
+ background: #333;
+ }
+
+ /* ── Form elements ─────────────────────────────────── */
+ label {
+ font-size: 11px;
+ font-weight: bold;
+ display: block;
+ margin-bottom: 2px;
+ }
+ input[type="file"] {
+ font-family: inherit;
+ font-size: 11px;
+ }
+
+ /* ── Drop zone ─────────────────────────────────────── */
+ .drop-zone {
+ border: 3px dashed var(--border);
+ padding: 32px 16px;
+ text-align: center;
+ cursor: pointer;
+ margin-bottom: 12px;
+ transition: background 0.15s;
+ }
+ .drop-zone:hover, .drop-zone.dragover {
+ background: #e0e0e0;
+ }
+ .drop-zone .icon { font-size: 36px; margin-bottom: 8px; }
+ .drop-zone .hint { font-size: 11px; color: var(--disabled); }
+ .drop-zone .filename {
+ font-weight: bold;
+ margin-top: 8px;
+ word-break: break-all;
+ }
+
+ /* ── Status / progress ─────────────────────────────── */
+ .status-bar {
+ background: var(--window-bg);
+ border-top: 2px solid var(--border);
+ padding: 3px 8px;
+ font-size: 10px;
+ color: var(--disabled);
+ display: flex;
+ justify-content: space-between;
+ }
+
+ .progress-text {
+ font-weight: bold;
+ padding: 8px;
+ text-align: center;
+ font-size: 11px;
+ }
+
+ /* ── Result card ───────────────────────────────────── */
+ .result-area {
+ border: 2px solid var(--border);
+ padding: 12px;
+ margin-top: 12px;
+ background: #f5f5f5;
+ }
+ .result-area h3 {
+ font-size: 12px;
+ border-bottom: 1px solid var(--border);
+ padding-bottom: 4px;
+ margin-bottom: 8px;
+ }
+
+ .result-row {
+ display: flex;
+ gap: 12px;
+ margin-bottom: 8px;
+ align-items: center;
+ }
+ .result-label { font-weight: bold; min-width: 80px; font-size: 11px; }
+ .result-value { font-size: 11px; }
+
+ .tag {
+ display: inline-block;
+ border: 1px solid var(--border);
+ padding: 1px 6px;
+ font-size: 10px;
+ font-weight: bold;
+ }
+ .tag.ok { background: #c6f5c6; }
+ .tag.no { background: #f0f0f0; color: var(--disabled); }
+
+ .download-btn {
+ display: inline-block;
+ font-family: inherit;
+ font-size: 11px;
+ background: var(--window-bg);
+ border: 2px solid var(--border);
+ padding: 3px 12px;
+ cursor: pointer;
+ text-decoration: none;
+ color: var(--text);
+ box-shadow: 2px 2px 0 var(--button-shadow);
+ margin: 2px;
+ }
+ .download-btn:active {
+ box-shadow: none;
+ transform: translate(2px, 2px);
+ }
+
+ /* ── Event log ─────────────────────────────────────── */
+ .event-log {
+ font-family: "Monaco", "Courier New", monospace;
+ font-size: 10px;
+ background: var(--window-bg);
+ border: 2px inset var(--border);
+ padding: 6px;
+ max-height: 200px;
+ overflow-y: auto;
+ white-space: pre;
+ margin-top: 8px;
+ }
+
+ /* ── Jobs table ────────────────────────────────────── */
+ table {
+ width: 100%;
+ border-collapse: collapse;
+ font-size: 11px;
+ }
+ th {
+ text-align: left;
+ font-weight: bold;
+ border-bottom: 2px solid var(--border);
+ padding: 4px 6px;
+ background: #e8e8e8;
+ }
+ td {
+ padding: 3px 6px;
+ border-bottom: 1px solid #ccc;
+ }
+ tr:hover td { background: #e8e8f0; }
+
+ .status-badge {
+ display: inline-block;
+ padding: 1px 6px;
+ font-size: 10px;
+ font-weight: bold;
+ border: 1px solid;
+ }
+ .status-badge.succeeded { background: #c6f5c6; border-color: var(--success); }
+ .status-badge.failed { background: #f5c6c6; border-color: var(--error); }
+ .status-badge.partial_success { background: #f5eec6; border-color: #886600; }
+ .status-badge.running { background: #c6e0f5; border-color: #004488; }
+ .status-badge.queued { background: #e8e8e8; border-color: var(--disabled); }
+
  .hidden { display: none; }
+
+ /* ── Responsive ────────────────────────────────────── */
+ @media (max-width: 600px) {
+ .desktop { justify-content: center; }
+ .window { margin: 0 4px; }
+ }
328
  </style>
329
  </head>
330
  <body>
331
+
332
+ <!-- ═══ Desktop Icons ═══ -->
333
+ <div class="desktop">
334
+ <div class="desktop-icon active" onclick="showPage('scan')">
335
+ <div class="icon-img">&#128196;</div>
336
+ <div class="icon-label">Scan Document</div>
337
+ </div>
338
+ <div class="desktop-icon" onclick="showPage('jobs')">
339
+ <div class="icon-img">&#128193;</div>
340
+ <div class="icon-label">Documents</div>
341
+ </div>
342
+ <div class="desktop-icon" onclick="showPage('advanced')">
343
+ <div class="icon-img">&#9881;</div>
344
+ <div class="icon-label">Advanced</div>
345
+ </div>
346
+ </div>
347
+
348
+ <!-- ═══ SCAN WINDOW ═══ -->
349
+ <div id="page-scan" class="window">
350
+ <div class="titlebar">
351
+ <span class="close-box"></span>
352
+ <span class="stripes"></span>
353
+ <span class="title-text">Scan Document</span>
354
+ <span class="stripes"></span>
355
+ </div>
356
+ <div class="menubar">
357
+ <span>File</span>
358
+ <span>Edit</span>
359
+ <span>View</span>
360
+ </div>
361
+ <div class="window-body">
362
+
363
+ <div class="drop-zone" id="drop-zone" onclick="document.getElementById('image-input').click()">
364
+ <div class="icon">&#128444;</div>
365
+ <div><b>Drop an image here</b></div>
366
+ <div class="hint">or click to browse &mdash; PNG, JPEG, TIFF, WebP</div>
367
+ <div class="filename hidden" id="file-label"></div>
368
+ <input type="file" id="image-input" accept=".png,.jpg,.jpeg,.tiff,.tif,.webp,.bmp" style="display:none">
369
+ </div>
370
+
371
+ <div style="text-align:center;">
372
+ <button class="primary" id="btn-scan" onclick="runScan()" disabled>
373
+ &#9654; Run OCR Pipeline
374
+ </button>
375
+ </div>
376
+
377
+ <div class="progress-text hidden" id="progress">Processing&hellip;</div>
378
+
379
+ <div class="result-area hidden" id="result-area">
380
+ <h3>&#9745; Result</h3>
381
+ <div class="result-row">
382
+ <span class="result-label">Status</span>
383
+ <span class="result-value" id="res-status"></span>
384
  </div>
385
+ <div class="result-row">
386
+ <span class="result-label">Duration</span>
387
+ <span class="result-value" id="res-duration"></span>
388
+ </div>
389
+ <div class="result-row">
390
+ <span class="result-label">Exports</span>
391
+ <span class="result-value" id="res-exports"></span>
392
+ </div>
393
+ <div class="result-row" id="res-downloads-row">
394
+ <span class="result-label">Download</span>
395
+ <span class="result-value" id="res-downloads"></span>
396
+ </div>
397
+ <div class="event-log hidden" id="event-log"></div>
398
+ <div style="margin-top:6px;text-align:right;">
399
+ <button onclick="toggleLog()">Show Log</button>
400
  </div>
 
 
401
  </div>
402
 
 
 
 
 
 
 
403
  </div>
404
+ <div class="status-bar">
405
+ <span id="statusbar-left">Ready</span>
406
+ <span>XmLLM v0.1.0</span>
407
+ </div>
408
+ </div>
409
 
410
+ <!-- ═══ JOBS WINDOW ═══ -->
411
+ <div id="page-jobs" class="window hidden">
412
+ <div class="titlebar">
413
+ <span class="close-box"></span>
414
+ <span class="stripes"></span>
415
+ <span class="title-text">Documents</span>
416
+ <span class="stripes"></span>
 
 
 
 
 
 
 
 
417
  </div>
418
+ <div class="menubar">
419
+ <span>File</span>
420
+ <span>View</span>
421
+ </div>
422
+ <div class="window-body">
423
+ <table>
424
+ <thead>
425
+ <tr>
426
+ <th>ID</th>
427
+ <th>Status</th>
428
+ <th>Source</th>
429
+ <th>ALTO</th>
430
+ <th>PAGE</th>
431
+ <th>Duration</th>
432
+ <th>Actions</th>
433
+ </tr>
434
+ </thead>
435
+ <tbody id="jobs-table">
436
+ <tr><td colspan="7" style="text-align:center;color:#888;">Loading&hellip;</td></tr>
437
+ </tbody>
438
+ </table>
439
+ </div>
440
+ <div class="status-bar">
441
+ <span id="jobs-count">-</span>
442
+ <span>XmLLM v0.1.0</span>
443
+ </div>
444
+ </div>
445
 
446
+ <!-- ═══ ADVANCED WINDOW (raw payload) ═══ -->
+ <div id="page-advanced" class="window hidden">
+ <div class="titlebar">
+ <span class="close-box"></span>
+ <span class="stripes"></span>
+ <span class="title-text">Advanced &mdash; Raw Payload</span>
+ <span class="stripes"></span>
+ </div>
+ <div class="menubar">
+ <span>File</span>
+ </div>
+ <div class="window-body">
+ <p style="margin-bottom:12px;font-size:11px;color:#555;">
+ Upload a raw OCR JSON payload manually (for providers other than PaddleOCR).
+ </p>
+ <div style="display:grid;grid-template-columns:1fr 1fr;gap:8px;margin-bottom:8px;">
+ <div>
+ <label>Raw Payload JSON</label>
+ <input type="file" id="payload-file" accept=".json">
  </div>
+ <div>
+ <label>Provider Family</label>
+ <select id="provider-family" style="font-family:inherit;font-size:11px;padding:2px;width:100%;border:2px solid var(--border);">
+ <option value="word_box_json">word_box_json (PaddleOCR)</option>
+ <option value="line_box_json">line_box_json</option>
+ <option value="text_only">text_only (mLLM)</option>
+ </select>
  </div>
  </div>
+ <div style="display:grid;grid-template-columns:1fr 1fr 1fr;gap:8px;margin-bottom:12px;">
+ <div>
+ <label>Provider ID</label>
+ <input type="text" id="provider-id" value="paddleocr" style="font-family:inherit;font-size:11px;padding:2px;width:100%;border:2px solid var(--border);">
+ </div>
+ <div>
+ <label>Image Width (px)</label>
+ <input type="number" id="img-width" value="2480" style="font-family:inherit;font-size:11px;padding:2px;width:100%;border:2px solid var(--border);">
+ </div>
+ <div>
+ <label>Image Height (px)</label>
+ <input type="number" id="img-height" value="3508" style="font-family:inherit;font-size:11px;padding:2px;width:100%;border:2px solid var(--border);">
+ </div>
  </div>
+ <button class="primary" id="btn-run-advanced" onclick="runAdvanced()">Run Pipeline</button>
+ <span id="adv-status" style="margin-left:8px;font-size:11px;color:#888;"></span>
+ </div>
+ <div class="status-bar">
+ <span>Manual mode</span>
+ <span>XmLLM v0.1.0</span>
  </div>
+ </div>
+
+ <script>
+ const API = '';
+
+ /* ── Page switching ── */
+ function showPage(name) {
+ ['scan','jobs','advanced'].forEach(p => {
+ document.getElementById('page-' + p).classList.add('hidden');
+ });
+ document.getElementById('page-' + name).classList.remove('hidden');
+ document.querySelectorAll('.desktop-icon').forEach(d => d.classList.remove('active'));
+ // Highlight the clicked icon; window.event is undefined when showPage is called from script
+ const ev = window.event;
+ if (ev && ev.currentTarget) ev.currentTarget.classList.add('active');
+ if (name === 'jobs') loadJobs();
+ }
+
+ /* ── File drop zone ── */
+ const dropZone = document.getElementById('drop-zone');
+ const imageInput = document.getElementById('image-input');
+
+ ['dragenter','dragover'].forEach(e => {
+ dropZone.addEventListener(e, ev => { ev.preventDefault(); dropZone.classList.add('dragover'); });
+ });
+ ['dragleave','drop'].forEach(e => {
+ dropZone.addEventListener(e, ev => { ev.preventDefault(); dropZone.classList.remove('dragover'); });
+ });
+ dropZone.addEventListener('drop', ev => {
+ if (ev.dataTransfer.files.length) {
+ imageInput.files = ev.dataTransfer.files;
+ onFileSelected();
+ }
+ });
+ imageInput.addEventListener('change', onFileSelected);
+
+ function onFileSelected() {
+ const f = imageInput.files[0];
+ if (!f) return;
+ const label = document.getElementById('file-label');
+ label.textContent = f.name + ' (' + (f.size / 1024).toFixed(0) + ' KB)';
+ label.classList.remove('hidden');
+ document.getElementById('btn-scan').disabled = false;
+ document.getElementById('statusbar-left').textContent = 'Image loaded: ' + f.name;
+ }
+
+ /* ── Main scan (image → OCR → XML) ── */
+ async function runScan() {
+ const f = imageInput.files[0];
+ if (!f) return;
+
+ const btn = document.getElementById('btn-scan');
+ const progress = document.getElementById('progress');
+ const result = document.getElementById('result-area');
+
+ btn.disabled = true;
+ // Clear any error message left over from a previous attempt
+ progress.textContent = 'Processing\u2026';
+ progress.style.color = '';
+ progress.classList.remove('hidden');
+ result.classList.add('hidden');
+ document.getElementById('statusbar-left').textContent = 'Running OCR pipeline\u2026';
+
+ const fd = new FormData();
+ fd.append('image', f);
+
+ try {
+ const r = await fetch(API + '/ocr', { method: 'POST', body: fd });
+ const data = await r.json();
+
+ if (!r.ok) {
+ progress.textContent = 'Error: ' + (data.detail || r.statusText);
+ progress.style.color = 'var(--error)';
+ document.getElementById('statusbar-left').textContent = 'Error';
+ btn.disabled = false;
+ return;
+ }
+
+ progress.classList.add('hidden');
+ btn.disabled = false;
+ showResult(data);
+ document.getElementById('statusbar-left').textContent = 'Done \u2014 ' + data.status;
+ } catch(e) {
+ progress.textContent = 'Network error: ' + e.message;
+ progress.style.color = 'var(--error)';
+ document.getElementById('statusbar-left').textContent = 'Error';
+ btn.disabled = false;
+ }
+ }
+
+ function showResult(data) {
+ const area = document.getElementById('result-area');
+ area.classList.remove('hidden');
+
+ document.getElementById('res-status').innerHTML =
+ '<span class="status-badge ' + data.status + '">' + data.status.toUpperCase() + '</span>';
+
+ document.getElementById('res-duration').textContent =
+ data.duration_ms ? Math.round(data.duration_ms) + ' ms' : '\u2014';
+
+ document.getElementById('res-exports').innerHTML =
+ '<span class="tag ' + (data.has_alto ? 'ok' : 'no') + '">ALTO ' + (data.has_alto ? '\u2713' : '\u2717') + '</span> ' +
+ '<span class="tag ' + (data.has_page_xml ? 'ok' : 'no') + '">PAGE ' + (data.has_page_xml ? '\u2713' : '\u2717') + '</span>';
+
+ const dl = document.getElementById('res-downloads');
+ const id = data.job_id;
+ dl.innerHTML = '';
+ if (data.has_alto)
+ dl.innerHTML += '<a class="download-btn" href="' + API + '/jobs/' + id + '/alto">&#128196; ALTO XML</a>';
+ if (data.has_page_xml)
+ dl.innerHTML += '<a class="download-btn" href="' + API + '/jobs/' + id + '/pagexml">&#128196; PAGE XML</a>';
+ dl.innerHTML += '<a class="download-btn" href="' + API + '/jobs/' + id + '/canonical" target="_blank">&#128203; Canonical JSON</a>';
+
+ // Load event log
+ fetch(API + '/jobs/' + id + '/logs').then(r => r.json()).then(events => {
+ document.getElementById('event-log').textContent = events.map(e =>
+ '[' + e.status.padEnd(9) + '] ' + e.step.padEnd(20) + ' ' +
+ (e.duration_ms ? Math.round(e.duration_ms) + 'ms' : e.message || '')
+ ).join('\n');
+ }).catch(() => {});
+ }
+
+ function toggleLog() {
+ const log = document.getElementById('event-log');
+ log.classList.toggle('hidden');
+ }
+
+ /* ── Jobs list ── */
+ // Escape user-supplied text (e.g. uploaded filenames) before inserting it as HTML
+ function escapeHtml(s) {
+ return String(s).replace(/[&<>"']/g, c => ({'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'})[c]);
+ }
+
+ async function loadJobs() {
+ try {
+ const r = await fetch(API + '/jobs');
+ if (!r.ok) throw new Error(r.statusText);
+ const jobs = await r.json();
+ const tbody = document.getElementById('jobs-table');
+ if (!jobs.length) {
+ tbody.innerHTML = '<tr><td colspan="7" style="text-align:center;color:#888;">No documents yet</td></tr>';
+ document.getElementById('jobs-count').textContent = '0 documents';
+ return;
+ }
+ tbody.innerHTML = jobs.map(j => `<tr>
+ <td style="font-family:monospace;font-size:10px;">${j.job_id.slice(-8)}</td>
+ <td><span class="status-badge ${j.status}">${j.status}</span></td>
+ <td>${j.source_filename ? escapeHtml(j.source_filename) : '\u2014'}</td>
+ <td><span class="tag ${j.has_alto ? 'ok' : 'no'}">${j.has_alto ? '\u2713' : '\u2717'}</span></td>
+ <td><span class="tag ${j.has_page_xml ? 'ok' : 'no'}">${j.has_page_xml ? '\u2713' : '\u2717'}</span></td>
+ <td>${j.duration_ms ? Math.round(j.duration_ms) + 'ms' : '\u2014'}</td>
+ <td>
+ ${j.has_alto ? `<a class="download-btn" href="${API}/jobs/${j.job_id}/alto" style="font-size:10px;">ALTO</a>` : ''}
+ ${j.has_page_xml ? `<a class="download-btn" href="${API}/jobs/${j.job_id}/pagexml" style="font-size:10px;">PAGE</a>` : ''}
+ </td>
+ </tr>`).join('');
+ document.getElementById('jobs-count').textContent = jobs.length + ' document' + (jobs.length > 1 ? 's' : '');
+ } catch(e) {
+ document.getElementById('jobs-table').innerHTML =
+ '<tr><td colspan="7" style="color:var(--error);">Failed to load</td></tr>';
+ }
+ }
+
+ /* ── Advanced (raw payload) ── */
+ async function runAdvanced() {
+ const fileInput = document.getElementById('payload-file');
+ if (!fileInput.files.length) { alert('Select a payload JSON file'); return; }
+ const btn = document.getElementById('btn-run-advanced');
+ const status = document.getElementById('adv-status');
+ btn.disabled = true;
+ status.textContent = 'Running\u2026';
+
+ const fd = new FormData();
+ fd.append('raw_payload_file', fileInput.files[0]);
+
+ const params = new URLSearchParams({
+ provider_id: document.getElementById('provider-id').value,
+ provider_family: document.getElementById('provider-family').value,
+ image_width: document.getElementById('img-width').value,
+ image_height: document.getElementById('img-height').value,
+ });
+
+ try {
+ const r = await fetch(API + '/jobs?' + params, { method: 'POST', body: fd });
+ const data = await r.json();
+ btn.disabled = false;
+ status.textContent = r.ok
+ ? '\u2713 ' + data.status + (data.duration_ms ? ' (' + Math.round(data.duration_ms) + 'ms)' : '')
+ : 'Error: ' + (data.detail || r.statusText);
+ } catch(e) {
+ btn.disabled = false;
+ status.textContent = 'Error: ' + e.message;
+ }
+ }
+ </script>
  </body>
  </html>
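The result panel (`showResult`) and the Documents table (`loadJobs`) build their download links from three fields of the job summary: `job_id`, `has_alto` and `has_page_xml`. The same mapping as a small Python sketch, e.g. for scripting against the API. `export_links` is a hypothetical helper, not part of the app, and it assumes the summary dict carries exactly the keys the page reads.

```python
from typing import Any


def export_links(summary: dict[str, Any], api_root: str = "") -> dict[str, str]:
    """Map a job summary to the export URLs the web UI links to.

    Hypothetical helper: mirrors the routes used by showResult() in the page
    (/jobs/{id}/alto, /jobs/{id}/pagexml, /jobs/{id}/canonical).
    """
    base = f"{api_root}/jobs/{summary['job_id']}"
    links: dict[str, str] = {}
    # ALTO and PAGE links are only offered when the pipeline produced them
    if summary.get("has_alto"):
        links["alto"] = f"{base}/alto"
    if summary.get("has_page_xml"):
        links["pagexml"] = f"{base}/pagexml"
    # The canonical JSON link is always shown, as in the UI
    links["canonical"] = f"{base}/canonical"
    return links


print(export_links({"job_id": "abc123", "has_alto": True, "has_page_xml": False}))
```

Keying the links on the `has_*` flags rather than probing each URL keeps the client from issuing requests for exports that were never generated.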
pyproject.toml CHANGED
@@ -23,6 +23,8 @@ dependencies = [
  "python-multipart>=0.0.9",
  "aiosqlite>=0.20,<1",
  "httpx>=0.27,<1",
+ "paddlepaddle>=2.6,<3",
+ "paddleocr>=2.8,<3",
  ]

  [project.optional-dependencies]
src/app/api/routes_ocr.py ADDED
@@ -0,0 +1,94 @@
+ """Image upload route — accepts an image, runs OCR, produces XML."""
+
+ from __future__ import annotations
+
+ import tempfile
+ from pathlib import Path
+ from typing import Any
+
+ from fastapi import APIRouter, HTTPException, UploadFile
+
+ from src.app.api import get_job_service
+ from src.app.domain.models import RawProviderPayload
+
+ router = APIRouter(tags=["ocr"])
+
+ ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".webp", ".bmp"}
+ MAX_IMAGE_SIZE = 20 * 1024 * 1024  # 20 MB
+
+
+ @router.post("/ocr", status_code=201)
+ async def ocr_image(image: UploadFile) -> dict[str, Any]:
+     """Upload an image, run PaddleOCR, and produce ALTO/PAGE XML.
+
+     This is the main user-facing endpoint — no JSON payload needed.
+     """
+     if not image.filename:
+         raise HTTPException(status_code=422, detail="No filename provided")
+
+     ext = Path(image.filename).suffix.lower()
+     if ext not in ALLOWED_EXTENSIONS:
+         raise HTTPException(
+             status_code=422,
+             detail=f"Unsupported format '{ext}'. Use: {', '.join(sorted(ALLOWED_EXTENSIONS))}",
+         )
+
+     # Read at most one byte past the limit: if that byte arrives, the upload is too big
+     content = await image.read(MAX_IMAGE_SIZE + 1)
+     if len(content) > MAX_IMAGE_SIZE:
+         raise HTTPException(status_code=413, detail="Image too large (max 20 MB)")
+
+     with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp:
+         tmp.write(content)
+         tmp_path = Path(tmp.name)
+
+     try:
+         from PIL import Image as PILImage
+         from PIL import UnidentifiedImageError
+
+         try:
+             with PILImage.open(tmp_path) as img:
+                 image_width, image_height = img.size
+         except UnidentifiedImageError as exc:
+             raise HTTPException(
+                 status_code=422, detail="File is not a readable image"
+             ) from exc
+
+         from src.app.ocr import paddle_result_to_payload, run_paddle_ocr
+         results = run_paddle_ocr(tmp_path)
+
+         if not results:
+             raise HTTPException(
+                 status_code=422,
+                 detail="PaddleOCR found no text in this image",
+             )
+
+         payload = paddle_result_to_payload(results)
+
+         raw = RawProviderPayload(
+             provider_id="paddleocr",
+             adapter_id="adapter.word_box_json.v1",
+             runtime_type="local",
+             payload=payload,
+             image_width=image_width,
+             image_height=image_height,
+         )
+
+         svc = get_job_service()
+         job = svc.create_job(
+             provider_id="paddleocr",
+             provider_family="word_box_json",
+             source_filename=image.filename,
+         )
+
+         result = svc.run_job(
+             job, raw,
+             image_width=image_width,
+             image_height=image_height,
+             image_path=tmp_path,
+         )
+
+         return result.to_summary()
+
+     finally:
+         tmp_path.unlink(missing_ok=True)
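The route enforces the 20 MB limit by asking for at most `MAX_IMAGE_SIZE + 1` bytes: if the extra byte comes back, the upload is too large, and the server never holds more than one byte past the limit in memory. Below is a framework-free sketch of the same checks over any binary stream. `validate_upload` and `UploadRejected` are hypothetical names; the status codes on the exception only mirror the route's 422/413 responses, and the sketch assumes a buffered stream whose `read(n)` returns `n` bytes unless it hits EOF.

```python
import io
from pathlib import Path
from typing import BinaryIO

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".webp", ".bmp"}
MAX_IMAGE_SIZE = 20 * 1024 * 1024  # 20 MB


class UploadRejected(Exception):
    """Hypothetical error type carrying the HTTP status the route would return."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def validate_upload(
    filename: str, stream: BinaryIO, max_size: int = MAX_IMAGE_SIZE
) -> tuple[str, bytes]:
    """Return (extension, content) or raise UploadRejected."""
    if not filename:
        raise UploadRejected(422, "No filename provided")
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise UploadRejected(422, f"Unsupported format '{ext}'")
    # Ask for one byte more than allowed; receiving it means the file is too big
    content = stream.read(max_size + 1)
    if len(content) > max_size:
        raise UploadRejected(413, f"Image too large (max {max_size} bytes)")
    return ext, content


ext, data = validate_upload("scan.PNG", io.BytesIO(b"\x89PNG..."))
print(ext, len(data))
```

The extension check is only a first filter; a file named `.png` can still hold arbitrary bytes, which is why decoding the image (Pillow in the route) remains the real validation step.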
src/app/main.py CHANGED
@@ -18,6 +18,7 @@ from src.app.api import init_services, shutdown_services
  from src.app.api.routes_exports import router as exports_router
  from src.app.api.routes_health import router as health_router
  from src.app.api.routes_jobs import router as jobs_router
+ from src.app.api.routes_ocr import router as ocr_router
  from src.app.api.routes_providers import router as providers_router
  from src.app.api.routes_viewer import router as viewer_router
  from src.app.settings import get_settings
@@ -54,6 +55,7 @@ app.add_middleware(
  # -- Register routers --------------------------------------------------------

  app.include_router(health_router)
+ app.include_router(ocr_router)
  app.include_router(providers_router)
  app.include_router(jobs_router)
  app.include_router(exports_router)
src/app/ocr/__init__.py ADDED
@@ -0,0 +1,43 @@
+ """PaddleOCR integration — runs OCR on an image and returns word_box_json payload."""
+
+ from __future__ import annotations
+
+ import logging
+ from pathlib import Path
+ from typing import Any
+
+ logger = logging.getLogger(__name__)
+
+ _ocr_instance = None
+
+
+ def _get_ocr() -> Any:
+     """Lazy-load PaddleOCR (heavy import, ~2s first time)."""
+     global _ocr_instance
+     if _ocr_instance is None:
+         from paddleocr import PaddleOCR
+         _ocr_instance = PaddleOCR(use_angle_cls=True, lang="fr", show_log=False)
+     return _ocr_instance
+
+
+ def run_paddle_ocr(image_path: Path) -> list[Any]:
+     """Run PaddleOCR on an image file.
+
+     Returns one item per detected text region, in PaddleOCR's native shape
+     [[[x1,y1],[x2,y2],[x3,y3],[x4,y4]], ("text", confidence)], or [] if none.
+     """
+     ocr = _get_ocr()
+     results = ocr.ocr(str(image_path), cls=True)
+     if not results or not results[0]:
+         return []
+     return results[0]
+
+
+ def paddle_result_to_payload(results: list[Any]) -> list[Any]:
+     """Convert PaddleOCR results to the word_box_json payload format."""
+     payload = []
+     for item in results:
+         points = item[0]
+         text, conf = item[1]
+         payload.append([points, [text, conf]])
+     return payload
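`_get_ocr()` caches the PaddleOCR instance in a module global. With the current `async def` route the OCR call runs on the event-loop thread, one request at a time, so this is safe as written; if the call were ever moved to a worker thread (for example to keep the event loop responsive), two first calls could race and each build a model. A minimal sketch of the same lazy-singleton idea guarded by a lock (double-checked locking). `LazyModel` and `expensive_factory` are hypothetical stand-ins, and a counter replaces PaddleOCR so the example runs anywhere. It only serialises construction; whether one PaddleOCR instance may be called from several threads at once is a separate question this sketch does not answer.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Build an expensive object on first use, at most once, even across threads."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Fast path: already built, no locking needed
        instance = self._instance
        if instance is not None:
            return instance
        with self._lock:
            # Re-check under the lock: another thread may have built it meanwhile
            if self._instance is None:
                self._instance = self._factory()
            return self._instance


calls = 0


def expensive_factory() -> dict[str, str]:
    """Stand-in for PaddleOCR(...): counts how often it is invoked."""
    global calls
    calls += 1
    return {"model": "loaded"}


model = LazyModel(expensive_factory)
threads = [threading.Thread(target=model.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(calls)
```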