MaxLeft committed on
Commit 91f934f · verified · 1 parent: 3cd28a8

Add 3 files

Files changed (3)
  1. README.md +7 -5
  2. index.html +1031 -19
  3. prompts.txt +18 -0
README.md CHANGED
@@ -1,10 +1,12 @@
  ---
- title: Yolo Detection App
- emoji: 🚀
- colorFrom: blue
- colorTo: red
+ title: yolo-detection-app
+ emoji: 🐳
+ colorFrom: yellow
+ colorTo: yellow
  sdk: static
  pinned: false
+ tags:
+ - deepsite
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
index.html CHANGED
@@ -1,19 +1,1031 @@
- <!doctype html>
- <html>
- <head>
- <meta charset="utf-8" />
- <meta name="viewport" content="width=device-width" />
- <title>My static Space</title>
- <link rel="stylesheet" href="style.css" />
- </head>
- <body>
- <div class="card">
- <h1>Welcome to your static Space!</h1>
- <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
- <p>
- Also don't forget to check the
- <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
- </p>
- </div>
- </body>
- </html>
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+ <meta charset="UTF-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>ONNX YOLO Segmentation Web Demo</title>
+ <script src="https://cdn.tailwindcss.com"></script>
+ <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
+ <style>
+ .detection-box {
+ position: absolute;
+ border: 2px solid #3B82F6;
+ background-color: rgba(59, 130, 246, 0.2);
+ display: flex;
+ flex-direction: column;
+ align-items: center;
+ justify-content: flex-end;
+ color: white;
+ font-weight: bold;
+ font-size: 12px;
+ }
+
+ .detection-label {
+ background-color: #3B82F6;
+ padding: 2px 5px;
+ border-radius: 3px;
+ margin-bottom: 2px;
+ }
+
+ .pulse {
+ animation: pulse 2s infinite;
+ }
+
+ @keyframes pulse {
+ 0% {
+ box-shadow: 0 0 0 0 rgba(59, 130, 246, 0.7);
+ }
+ 70% {
+ box-shadow: 0 0 0 10px rgba(59, 130, 246, 0);
+ }
+ 100% {
+ box-shadow: 0 0 0 0 rgba(59, 130, 246, 0);
+ }
+ }
+
+ #video-container {
+ position: relative;
+ width: 100%;
+ max-width: 640px;
+ margin: 0 auto;
+ border-radius: 8px;
+ overflow: hidden;
+ box-shadow: 0 10px 15px -3px rgba(0, 0, 0, 0.1), 0 4px 6px -2px rgba(0, 0, 0, 0.05);
+ }
+
+ #video, #canvas {
+ width: 100%;
+ height: auto;
+ display: block;
+ }
+
+ #canvas {
+ position: absolute;
+ top: 0;
+ left: 0;
+ z-index: 10;
+ }
+
+ #segmentation {
+ position: absolute;
+ top: 0;
+ left: 0;
+ z-index: 5;
+ opacity: 0.5;
+ }
+
+ .dropzone {
+ border: 2px dashed #4B5563;
+ border-radius: 8px;
+ padding: 20px;
+ text-align: center;
+ cursor: pointer;
+ transition: all 0.3s;
+ }
+
+ .dropzone:hover {
+ border-color: #3B82F6;
+ background-color: rgba(59, 130, 246, 0.1);
+ }
+
+ .dropzone.active {
+ border-color: #3B82F6;
+ background-color: rgba(59, 130, 246, 0.2);
+ }
+
+ .status-badge {
+ display: inline-flex;
+ align-items: center;
+ padding: 4px 8px;
+ border-radius: 9999px;
+ font-size: 12px;
+ font-weight: 600;
+ }
+
+ .status-badge.ready {
+ background-color: rgba(16, 185, 129, 0.2);
+ color: #10B981;
+ }
+
+ .status-badge.loading {
+ background-color: rgba(245, 158, 11, 0.2);
+ color: #F59E0B;
+ }
+
+ .status-badge.error {
+ background-color: rgba(239, 68, 68, 0.2);
+ color: #EF4444;
+ }
+
+ .status-badge.disabled {
+ background-color: rgba(75, 85, 99, 0.2);
+ color: #4B5563;
+ }
+
+ .output-log {
+ font-family: 'Courier New', Courier, monospace;
+ background-color: rgba(31, 41, 55, 0.8);
+ border-radius: 8px;
+ padding: 16px;
+ max-height: 200px;
+ overflow-y: auto;
+ }
+
+ .legend {
+ display: flex;
+ flex-wrap: wrap;
+ gap: 8px;
+ margin-top: 8px;
+ }
+
+ .legend-item {
+ display: flex;
+ align-items: center;
+ font-size: 12px;
+ }
+
+ .legend-color {
+ width: 16px;
+ height: 16px;
+ border-radius: 3px;
+ margin-right: 4px;
+ }
+
+ .confidence-bar {
+ height: 4px;
+ background-color: #4B5563;
+ border-radius: 2px;
+ margin-top: 2px;
+ overflow: hidden;
+ }
+
+ .confidence-fill {
+ height: 100%;
+ background-color: #10B981;
+ }
+
+ .debug-output {
+ font-family: 'Courier New', Courier, monospace;
+ background-color: rgba(31, 41, 55, 0.8);
+ border-radius: 8px;
+ padding: 16px;
+ max-height: 200px;
+ overflow-y: auto;
+ margin-top: 16px;
+ font-size: 12px;
+ white-space: pre-wrap;
+ }
+ </style>
+ </head>
+ <body class="bg-gray-900 text-gray-100 min-h-screen">
+ <div class="container mx-auto px-4 py-8">
+ <header class="text-center mb-8">
+ <h1 class="text-3xl md:text-4xl font-bold mb-2 text-blue-400">
+ <i class="fas fa-shapes mr-2"></i> YOLO Segmentation Web Demo
+ </h1>
+ <p class="text-gray-400 max-w-2xl mx-auto">
+ Real-time instance segmentation with YOLO ONNX models in your browser
+ </p>
+ </header>
+
+ <div class="max-w-4xl mx-auto">
+ <div class="grid grid-cols-1 md:grid-cols-2 gap-8">
+ <!-- Left column - Controls -->
+ <div class="space-y-6">
+ <!-- Model Selection -->
+ <div class="bg-gray-800 rounded-lg p-6 shadow-lg">
+ <h2 class="text-xl font-bold mb-4 text-blue-400">
+ <i class="fas fa-file-export mr-2"></i> Model Selection
+ </h2>
+
+ <div id="dropzone" class="dropzone mb-4">
+ <div class="flex flex-col items-center justify-center py-4">
+ <i class="fas fa-file-upload text-4xl text-blue-400 mb-2"></i>
+ <p class="text-gray-300">Drag & drop your YOLO ONNX model file here</p>
+ <p class="text-gray-400 text-sm mt-1">or click to browse</p>
+ <input type="file" id="modelFile" accept=".onnx" class="hidden" />
+ </div>
+ </div>
+
+ <div class="flex items-center justify-between">
+ <div>
+ <p id="modelStatusText" class="text-sm text-gray-400">No model selected</p>
+ <p id="modelSizeText" class="text-xs text-gray-500"></p>
+ </div>
+ <span id="modelStatusBadge" class="status-badge disabled">
+ <i class="fas fa-times-circle mr-1"></i> Not Loaded
+ </span>
+ </div>
+ </div>
+
+ <!-- Detection Settings -->
+ <div class="bg-gray-800 rounded-lg p-6 shadow-lg">
+ <h2 class="text-xl font-bold mb-4 text-blue-400">
+ <i class="fas fa-sliders-h mr-2"></i> Detection Settings
+ </h2>
+
+ <div class="space-y-4">
+ <div>
+ <label for="confidenceThreshold" class="block text-sm font-medium text-gray-300 mb-1">
+ Confidence Threshold: <span id="confidenceValue">0.5</span>
+ </label>
+ <input type="range" id="confidenceThreshold" min="0" max="1" step="0.05" value="0.5"
+ class="w-full h-2 bg-gray-700 rounded-lg appearance-none cursor-pointer">
+ </div>
+
+ <div>
+ <label for="iouThreshold" class="block text-sm font-medium text-gray-300 mb-1">
+ IOU Threshold: <span id="iouValue">0.45</span>
+ </label>
+ <input type="range" id="iouThreshold" min="0" max="1" step="0.05" value="0.45"
+ class="w-full h-2 bg-gray-700 rounded-lg appearance-none cursor-pointer">
+ </div>
+
+ <div class="flex items-center justify-between">
+ <label for="showMasks" class="text-sm font-medium text-gray-300">
+ Show Segmentation Masks
+ </label>
+ <label class="relative inline-flex items-center cursor-pointer">
+ <input type="checkbox" id="showMasks" class="sr-only peer" checked>
+ <div class="w-11 h-6 bg-gray-700 peer-focus:outline-none rounded-full peer peer-checked:after:translate-x-full peer-checked:after:border-white after:content-[''] after:absolute after:top-[2px] after:left-[2px] after:bg-white after:border-gray-300 after:border after:rounded-full after:h-5 after:w-5 after:transition-all peer-checked:bg-blue-600"></div>
+ </label>
+ </div>
+ </div>
+ </div>
+
+ <!-- Webcam Controls -->
+ <div class="bg-gray-800 rounded-lg p-6 shadow-lg">
+ <h2 class="text-xl font-bold mb-4 text-blue-400">
+ <i class="fas fa-video mr-2"></i> Webcam Controls
+ </h2>
+
+ <div class="flex flex-col space-y-4">
+ <button id="startBtn" class="bg-green-600 hover:bg-green-700 text-white font-bold py-3 px-6 rounded-lg flex items-center justify-center disabled:opacity-50 disabled:cursor-not-allowed" disabled>
+ <i class="fas fa-play mr-2"></i> Start Detection
+ </button>
+
+ <div class="flex items-center justify-between">
+ <div>
+ <p class="text-sm text-gray-400">Webcam Status</p>
+ </div>
+ <span id="webcamStatusBadge" class="status-badge disabled">
+ <i class="fas fa-times-circle mr-1"></i> Inactive
+ </span>
+ </div>
+ </div>
+ </div>
+
+ <!-- Performance Stats -->
+ <div class="bg-gray-800 rounded-lg p-6 shadow-lg">
+ <h2 class="text-xl font-bold mb-4 text-blue-400">
+ <i class="fas fa-tachometer-alt mr-2"></i> Performance
+ </h2>
+
+ <div class="grid grid-cols-2 gap-4">
+ <div class="bg-gray-700 p-4 rounded-lg text-center">
+ <div class="text-2xl font-bold text-blue-400" id="fpsCounter">-</div>
+ <div class="text-gray-300 text-sm">FPS</div>
+ </div>
+ <div class="bg-gray-700 p-4 rounded-lg text-center">
+ <div class="text-2xl font-bold text-green-400" id="inferenceTime">-</div>
+ <div class="text-gray-300 text-sm">ms/inference</div>
+ </div>
+ </div>
+ </div>
+ </div>
+
+ <!-- Right column - Output -->
+ <div class="space-y-6">
+ <!-- Video Feed -->
+ <div class="bg-gray-800 rounded-lg p-6 shadow-lg">
+ <h2 class="text-xl font-bold mb-4 text-blue-400">
+ <i class="fas fa-eye mr-2"></i> Live Detection
+ </h2>
+
+ <div id="video-container" class="relative">
+ <div id="videoPlaceholder" class="bg-gray-700 rounded-lg flex items-center justify-center aspect-square">
+ <div class="text-center p-8">
+ <i class="fas fa-camera text-4xl text-gray-500 mb-4"></i>
+ <p class="text-gray-400">Webcam feed will appear here</p>
+ </div>
+ </div>
+ <video id="video" autoplay playsinline muted class="hidden"></video>
+ <canvas id="segmentation" class="hidden"></canvas>
+ <canvas id="canvas" class="hidden"></canvas>
+ </div>
+
+ <div id="detectionLegend" class="legend mt-4 hidden">
+ <!-- Legend items will be added dynamically -->
+ </div>
+ </div>
+
+ <!-- Output Log -->
+ <div class="bg-gray-800 rounded-lg p-6 shadow-lg">
+ <h2 class="text-xl font-bold mb-4 text-blue-400">
+ <i class="fas fa-terminal mr-2"></i> Output Log
+ </h2>
+
+ <div class="output-log text-sm" id="log">
+ <p class="text-gray-400">Waiting for model to load...</p>
+ </div>
+ </div>
+
+ <!-- Debug Output -->
+ <div class="bg-gray-800 rounded-lg p-6 shadow-lg">
+ <h2 class="text-xl font-bold mb-4 text-blue-400">
+ <i class="fas fa-bug mr-2"></i> Debug Output
+ </h2>
+
+ <div class="debug-output" id="debugOutput">
+ <p class="text-gray-400">Raw tensor output will appear here</p>
+ </div>
+ </div>
+ </div>
+ </div>
+ </div>
+
+ <footer class="mt-12 text-center text-gray-500 text-sm">
+ <p>Powered by ONNX Runtime Web - All processing happens in your browser</p>
+ </footer>
+ </div>
+
+ <script>
+ // DOM elements
+ const video = document.getElementById('video');
+ const canvas = document.getElementById('canvas');
+ const segmentationCanvas = document.getElementById('segmentation');
+ const ctx = canvas.getContext('2d');
+ const segCtx = segmentationCanvas.getContext('2d');
+ const startBtn = document.getElementById('startBtn');
+ const logElement = document.getElementById('log');
+ const debugOutput = document.getElementById('debugOutput');
+ const modelFileInput = document.getElementById('modelFile');
+ const dropzone = document.getElementById('dropzone');
+ const modelStatusText = document.getElementById('modelStatusText');
+ const modelSizeText = document.getElementById('modelSizeText');
+ const modelStatusBadge = document.getElementById('modelStatusBadge');
+ const webcamStatusBadge = document.getElementById('webcamStatusBadge');
+ const fpsCounter = document.getElementById('fpsCounter');
+ const inferenceTime = document.getElementById('inferenceTime');
+ const videoPlaceholder = document.getElementById('videoPlaceholder');
+ const videoContainer = document.getElementById('video-container');
+ const confidenceThreshold = document.getElementById('confidenceThreshold');
+ const iouThreshold = document.getElementById('iouThreshold');
+ const confidenceValue = document.getElementById('confidenceValue');
+ const iouValue = document.getElementById('iouValue');
+ const showMasks = document.getElementById('showMasks');
+ const detectionLegend = document.getElementById('detectionLegend');
+
+ // App state
+ let session = null;
+ let modelBuffer = null;
+ let isRunning = false;
+ let frameCount = 0;
+ let lastFpsUpdate = 0;
+ let fps = 0;
+ let lastInferenceTime = 0;
+ let classColors = {};
+ let classNames = {}; // Will be populated based on model output
+
+ // Update log with timestamp
+ function log(message) {
+ const now = new Date();
+ const timestamp = now.toLocaleTimeString();
+ const logEntry = document.createElement('p');
+ logEntry.innerHTML = `<span class="text-gray-500">[${timestamp}]</span> ${message}`;
+ logElement.appendChild(logEntry);
+ logElement.scrollTop = logElement.scrollHeight;
+ }
+
+ // Update debug output with raw tensor data
+ function debugLog(message) {
+ const debugEntry = document.createElement('div');
+ debugEntry.textContent = message;
+ debugOutput.appendChild(debugEntry);
+ debugOutput.scrollTop = debugOutput.scrollHeight;
+ }
+
+ // Generate random colors for classes
+ function generateClassColors(count) {
+ const colors = {};
+ for (let i = 0; i < count; i++) {
+ // Generate a bright color
+ const hue = (i * 360 / count) % 360;
+ colors[i] = `hsl(${hue}, 80%, 60%)`;
+ }
+ return colors;
+ }
+
+ // Update settings UI
+ confidenceThreshold.addEventListener('input', () => {
+ confidenceValue.textContent = confidenceThreshold.value;
+ });
+
+ iouThreshold.addEventListener('input', () => {
+ iouValue.textContent = iouThreshold.value;
+ });
+
+ // Set up dropzone interactions
+ dropzone.addEventListener('click', () => {
+ modelFileInput.click();
+ });
+
+ ['dragenter', 'dragover', 'dragleave', 'drop'].forEach(eventName => {
+ dropzone.addEventListener(eventName, preventDefaults, false);
+ });
+
+ function preventDefaults(e) {
+ e.preventDefault();
+ e.stopPropagation();
+ }
+
+ ['dragenter', 'dragover'].forEach(eventName => {
+ dropzone.addEventListener(eventName, highlight, false);
+ });
+
+ ['dragleave', 'drop'].forEach(eventName => {
+ dropzone.addEventListener(eventName, unhighlight, false);
+ });
+
+ function highlight() {
+ dropzone.classList.add('active');
+ }
+
+ function unhighlight() {
+ dropzone.classList.remove('active');
+ }
+
+ dropzone.addEventListener('drop', handleDrop, false);
+
+ // Handle model file selection
+ function handleDrop(e) {
+ const dt = e.dataTransfer;
+ const files = dt.files;
+
+ if (files.length > 0 && files[0].name.endsWith('.onnx')) {
+ handleModelFile(files[0]);
+ }
+ }
+
+ modelFileInput.addEventListener('change', (e) => {
+ const files = e.target.files;
+ if (files.length > 0 && files[0].name.endsWith('.onnx')) {
+ handleModelFile(files[0]);
+ }
+ });
+
+ // Process the selected model file
+ async function handleModelFile(file) {
+ try {
+ // Update UI
+ modelStatusText.textContent = `Loading ${file.name}...`;
+ modelSizeText.textContent = `(${(file.size/1e6).toFixed(1)} MB)`;
+ modelStatusBadge.className = 'status-badge loading';
+ modelStatusBadge.innerHTML = '<i class="fas fa-spinner fa-spin mr-1"></i> Loading';
+ startBtn.disabled = true;
+
+ // Read the file
+ const reader = new FileReader();
+ reader.onload = async (ev) => {
+ modelBuffer = ev.target.result;
+
+ // Initialize ONNX session
+ log(`Initializing ONNX session for ${file.name}`);
+
+ try {
+ // Create session options with WebGL and WASM backends
+ const sessionOptions = {
+ executionProviders: ['webgl', 'wasm'],
+ graphOptimizationLevel: 'all'
+ };
+
+ // Try to create session with WebGL first, fall back to WASM if needed
+ try {
+ session = await ort.InferenceSession.create(modelBuffer, sessionOptions);
+ } catch (webglError) {
+ log(`WebGL backend failed, falling back to WASM: ${webglError.message}`);
+ sessionOptions.executionProviders = ['wasm'];
+ session = await ort.InferenceSession.create(modelBuffer, sessionOptions);
+ }
+
+ // Generate colors for classes (assuming 80 classes for YOLO)
+ classColors = generateClassColors(80);
+
+ // Success
+ modelStatusText.textContent = `Loaded: ${file.name}`;
+ modelStatusBadge.className = 'status-badge ready';
+ modelStatusBadge.innerHTML = '<i class="fas fa-check-circle mr-1"></i> Ready';
+ startBtn.disabled = false;
+
+ log(`Model loaded successfully with ${session.inputNames.length} inputs and ${session.outputNames.length} outputs`);
+ log(`Input names: ${session.inputNames.join(', ')}`);
+
+ // Check if this is a segmentation model
+ const isSegmentation = session.outputNames.some(name => name.includes('mask'));
+ log(`Model type: ${isSegmentation ? 'Segmentation' : 'Detection'}`);
+
+ } catch (error) {
+ modelStatusText.textContent = `Model loaded (${file.name})`;
+ modelStatusBadge.className = 'status-badge ready';
+ modelStatusBadge.innerHTML = '<i class="fas fa-check-circle mr-1"></i> Ready';
+ log(`Model initialization completed with warnings: ${error.message}`);
+ console.log('Model loaded but with warnings:', error);
+
+ // Try to create session anyway (some models might still work despite warnings)
+ session = await ort.InferenceSession.create(modelBuffer);
+ startBtn.disabled = false;
+ }
+ };
+ reader.onerror = (error) => {
+ modelStatusText.textContent = `Error reading file`;
+ modelStatusBadge.className = 'status-badge error';
+ modelStatusBadge.innerHTML = '<i class="fas fa-exclamation-circle mr-1"></i> Error';
+ log(`File read error: ${error.target.error}`);
+ };
+
+ reader.readAsArrayBuffer(file);
+
+ } catch (error) {
+ log(`Error handling model file: ${error.message}`);
+ console.error(error);
+ }
+ }
554
+
555
+ // Start webcam and detection
556
+ startBtn.addEventListener('click', async () => {
557
+ if (isRunning) {
558
+ // Stop detection
559
+ isRunning = false;
560
+ startBtn.innerHTML = '<i class="fas fa-play mr-2"></i> Start Detection';
561
+ startBtn.classList.remove('bg-red-600', 'hover:bg-red-700');
562
+ startBtn.classList.add('bg-green-600', 'hover:bg-green-700');
563
+ webcamStatusBadge.className = 'status-badge disabled';
564
+ webcamStatusBadge.innerHTML = '<i class="fas fa-times-circle mr-1"></i> Inactive';
565
+ log('Detection stopped');
566
+ return;
567
+ }
568
+
569
+ try {
570
+ // Get webcam access
571
+ log('Requesting webcam access...');
572
+ const stream = await navigator.mediaDevices.getUserMedia({
573
+ video: {
574
+ width: { ideal: 640 },
575
+ height: { ideal: 640 },
576
+ facingMode: 'environment'
577
+ },
578
+ audio: false
579
+ });
580
+
581
+ // Set up video element
582
+ video.srcObject = stream;
583
+ await video.play();
584
+
585
+ // Wait for video dimensions to be available
586
+ await new Promise(resolve => {
587
+ const checkDimensions = () => {
588
+ if (video.videoWidth > 0 && video.videoHeight > 0) {
589
+ resolve();
590
+ } else {
591
+ setTimeout(checkDimensions, 50);
592
+ }
593
+ };
594
+ checkDimensions();
595
+ });
596
+
597
+ // Set canvas dimensions to match video
598
+ const videoWidth = video.videoWidth;
599
+ const videoHeight = video.videoHeight;
600
+
601
+ canvas.width = videoWidth;
602
+ canvas.height = videoHeight;
603
+ segmentationCanvas.width = videoWidth;
604
+ segmentationCanvas.height = videoHeight;
605
+
606
+ // Adjust container aspect ratio
607
+ videoContainer.style.aspectRatio = `${videoWidth}/${videoHeight}`;
608
+
609
+ // Show video and canvas
610
+ videoPlaceholder.classList.add('hidden');
611
+ video.classList.remove('hidden');
612
+ canvas.classList.remove('hidden');
613
+ segmentationCanvas.classList.remove('hidden');
614
+ detectionLegend.classList.remove('hidden');
615
+
616
+ // Update UI
617
+ isRunning = true;
618
+ startBtn.innerHTML = '<i class="fas fa-stop mr-2"></i> Stop Detection';
619
+ startBtn.classList.remove('bg-green-600', 'hover:bg-green-700');
620
+ startBtn.classList.add('bg-red-600', 'hover:bg-red-700');
621
+ webcamStatusBadge.className = 'status-badge ready';
622
+ webcamStatusBadge.innerHTML = '<i class="fas fa-check-circle mr-1"></i> Active';
623
+ log(`Webcam started (${videoWidth}x${videoHeight}) - beginning detection`);
624
+
625
+ // Start detection loop
626
+ detectionLoop();
627
+
628
+ } catch (error) {
629
+ log(`Error accessing webcam: ${error.message}`);
630
+ console.error(error);
631
+ webcamStatusBadge.className = 'status-badge error';
632
+ webcamStatusBadge.innerHTML = '<i class="fas fa-exclamation-circle mr-1"></i> Error';
633
+ }
634
+ });
635
+
636
+ // Non-maximum suppression for YOLO outputs
637
+ function nonMaxSuppression(boxes, scores, iouThreshold) {
638
+ const selectedIndices = [];
639
+ const areas = boxes.map(box => (box[2] - box[0]) * (box[3] - box[1]));
640
+
641
+ // Sort boxes by score (descending)
642
+ const scoreIndices = scores.map((score, index) => ({score, index}))
643
+ .sort((a, b) => b.score - a.score)
644
+ .map(obj => obj.index);
645
+
646
+ while (scoreIndices.length > 0) {
647
+ const current = scoreIndices.shift();
648
+ selectedIndices.push(current);
649
+
650
+ const currentBox = boxes[current];
651
+
652
+ // Calculate IoU with remaining boxes
653
+ const remainingBoxes = scoreIndices.map(i => boxes[i]);
654
+ const ious = remainingBoxes.map(box => {
655
+ const x1 = Math.max(currentBox[0], box[0]);
656
+ const y1 = Math.max(currentBox[1], box[1]);
657
+ const x2 = Math.min(currentBox[2], box[2]);
658
+ const y2 = Math.min(currentBox[3], box[3]);
659
+
660
+ const intersection = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
661
+ const union = areas[current] + areas[box] - intersection;
662
+
663
+ return intersection / union;
664
+ });
665
+
666
+ // Filter out boxes with high IoU
667
+ for (let i = ious.length - 1; i >= 0; i--) {
668
+ if (ious[i] > iouThreshold) {
669
+ scoreIndices.splice(i, 1);
670
+ }
671
+ }
672
+ }
673
+
674
+ return selectedIndices;
675
+ }
676
+
677
+ // Process YOLO output tensor (updated for YOLOv8 format)
678
+ function processYoloOutput(output, imgWidth, imgHeight) {
679
+ const confThreshold = parseFloat(confidenceThreshold.value);
680
+ const iouThresh = parseFloat(iouThreshold.value);
681
+
682
+ // Get the output tensor (YOLOv8 uses 'output0' for detections)
683
+ const outputTensor = output.output0;
684
+ const outputData = outputTensor.data;
685
+
686
+ // Clear previous debug output
687
+ debugOutput.innerHTML = '';
688
+
689
+ // Log raw tensor shape and first few values
690
+ debugLog(`Output tensor shape: [${outputTensor.dims.join(', ')}]`);
691
+ debugLog(`First 20 values: ${Array.from(outputData.slice(0, 20)).map(v => v.toFixed(2)).join(', ')}`);
692
+
693
+ // YOLOv8 output format: [batch, num_detections, 4 (box) + 1 (conf) + num_classes]
694
+ const numDetections = outputTensor.dims[1];
695
+ const numFeatures = outputTensor.dims[2];
696
+
697
+ debugLog(`Num detections: ${numDetections}, Num features: ${numFeatures}`);
698
+
699
+ // Extract boxes, scores, and class IDs
700
+ const boxes = [];
701
+ const scores = [];
702
+ const classIds = [];
703
+
704
+ for (let i = 0; i < numDetections; i++) {
705
+ const offset = i * numFeatures;
706
+
707
+ // Get box in (x1, y1, x2, y2) format (already normalized to [0,1])
708
+ const x1 = outputData[offset];
709
+ const y1 = outputData[offset + 1];
710
+ const x2 = outputData[offset + 2];
711
+ const y2 = outputData[offset + 3];
712
+
713
+ // Get confidence score
714
+ const conf = outputData[offset + 4];
715
+
716
+ // Find class with maximum probability
717
+ let maxScore = -1;
718
+ let classId = -1;
719
+
720
+ // Start from offset + 4 (skip box coordinates and objectness)
721
+ for (let j = 4; j < numFeatures; j++) {
722
+ const score = outputData[offset + j];
723
+ if (score > maxScore) {
724
+ maxScore = score;
725
+ classId = j - 4; // Subtract 4 because first 4 elements are box coordinates
726
+ }
727
+ }
728
+
729
+ // Calculate final score (objectness * class probability)
730
+ const finalScore = conf * maxScore;
731
+
732
+ // Filter by confidence threshold
733
+ if (finalScore > confThreshold) {
734
+ // Scale box coordinates to image dimensions
735
+ const scaledBox = [
736
+ x1 * imgWidth,
737
+ y1 * imgHeight,
738
+ x2 * imgWidth,
739
+ y2 * imgHeight
740
+ ];
741
+
742
+ boxes.push(scaledBox);
743
+ scores.push(finalScore);
744
+ classIds.push(classId);
745
+
746
+ // Log detection details
747
+ debugLog(`Detection ${i}: [${scaledBox.map(v => v.toFixed(1)).join(', ')}] score=${finalScore.toFixed(2)} class=${classId}`);
748
+ }
749
+ }
750
+
751
+ // Apply non-max suppression
752
+ const selectedIndices = nonMaxSuppression(boxes, scores, iouThresh);
753
+
754
+ // Prepare final detections
755
+ const detections = selectedIndices.map(idx => ({
756
+ box: boxes[idx],
757
+ score: scores[idx],
758
+ classId: classIds[idx],
759
+ mask: output.output1 ? getMaskForDetection(output.output1.data, idx, output.output1.dims) : null
760
+ }));
761
+
762
+ return detections;
763
+ }
764
+
765
+ // Extract mask for a specific detection
766
+ function getMaskForDetection(masksData, detectionIdx, maskShape) {
767
+ // maskShape: [1, mask_dim, mask_height, mask_width]
768
+ const maskDim = maskShape[1];
769
+ const maskHeight = maskShape[2];
770
+ const maskWidth = maskShape[3];
771
+
772
+ const mask = new Array(maskHeight * maskWidth).fill(0);
773
+
774
+ // For each pixel, find the channel with max value
775
+ for (let y = 0; y < maskHeight; y++) {
776
+ for (let x = 0; x < maskWidth; x++) {
777
+ let maxVal = -Infinity;
778
+ let bestChannel = 0;
779
+
780
+ for (let c = 0; c < maskDim; c++) {
781
+ const idx = (c * maskHeight * maskWidth) + (y * maskWidth) + x;
782
+ const val = masksData[detectionIdx * maskDim * maskHeight * maskWidth + idx];
783
+
784
+ if (val > maxVal) {
785
+ maxVal = val;
786
+ bestChannel = c;
787
+ }
788
+ }
789
+
790
+ mask[y * maskWidth + x] = bestChannel;
791
+ }
792
+ }
793
+
794
+ return {
795
+ data: mask,
796
+ width: maskWidth,
797
+ height: maskHeight
798
+ };
799
+ }
800
+
+ // Draw detections on canvas
+ function drawDetections(detections, imgWidth, imgHeight) {
+   // Clear previous drawings
+   ctx.clearRect(0, 0, canvas.width, canvas.height);
+   segCtx.clearRect(0, 0, segmentationCanvas.width, segmentationCanvas.height);
+
+   // Draw video frame
+   ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
+
+   // Clear legend and rebuild
+   detectionLegend.innerHTML = '';
+   const legendItems = new Set();
+
+   // Draw each detection
+   detections.forEach(det => {
+     const [x1, y1, x2, y2] = det.box;
+     const width = x2 - x1;
+     const height = y2 - y1;
+     const className = classNames[det.classId] || `Class ${det.classId}`;
+     const color = classColors[det.classId] || '#3B82F6';
+
+     // Add to legend
+     if (!legendItems.has(det.classId)) {
+       legendItems.add(det.classId);
+
+       const legendItem = document.createElement('div');
+       legendItem.className = 'legend-item';
+       legendItem.innerHTML = `
+         <div class="legend-color" style="background-color: ${color};"></div>
+         <span>${className}</span>
+         <div class="confidence-bar">
+           <div class="confidence-fill" style="width: ${det.score * 100}%;"></div>
+         </div>
+       `;
+       detectionLegend.appendChild(legendItem);
+     }
+
+     // Draw mask if available and enabled
+     if (det.mask && showMasks && showMasks.checked) {
+       const mask = det.mask;
+       const scaleX = width / mask.width;
+       const scaleY = height / mask.height;
+
+       // Create a temporary canvas for the mask
+       const maskCanvas = document.createElement('canvas');
+       maskCanvas.width = mask.width;
+       maskCanvas.height = mask.height;
+       const maskCtx = maskCanvas.getContext('2d');
+
+       // Draw mask data
+       const maskImageData = maskCtx.createImageData(mask.width, mask.height);
+       for (let i = 0; i < mask.data.length; i++) {
+         if (mask.data[i] > 0) { // Only draw non-zero mask values
+           const idx = i * 4;
+           const [r, g, b] = hexToRgb(color);
+           maskImageData.data[idx] = r;
+           maskImageData.data[idx + 1] = g;
+           maskImageData.data[idx + 2] = b;
+           maskImageData.data[idx + 3] = 150; // Alpha
+         }
+       }
+       maskCtx.putImageData(maskImageData, 0, 0);
+
+       // Draw the mask on the segmentation canvas
+       segCtx.save();
+       segCtx.translate(x1, y1);
+       segCtx.scale(scaleX, scaleY);
+       segCtx.drawImage(maskCanvas, 0, 0);
+       segCtx.restore();
+     }
+
+     // Draw bounding box
+     ctx.strokeStyle = color;
+     ctx.lineWidth = 2;
+     ctx.strokeRect(x1, y1, width, height);
+
+     // Draw label background (set the font before measuring text,
+     // otherwise measureText uses the previous font)
+     ctx.font = '12px Arial';
+     const label = `${className} ${(det.score * 100).toFixed(1)}%`;
+     const textWidth = ctx.measureText(label).width;
+
+     ctx.fillStyle = color;
+     ctx.fillRect(x1 - 2, y1 - 20, textWidth + 4, 20);
+
+     // Draw label text
+     ctx.fillStyle = 'white';
+     ctx.fillText(label, x1, y1 - 5);
+   });
+ }
+
+ // Helper to convert hex to RGB
+ function hexToRgb(hex) {
+   const result = /^#?([a-f\d]{2})([a-f\d]{2})([a-f\d]{2})$/i.exec(hex);
+   return result ? [
+     parseInt(result[1], 16),
+     parseInt(result[2], 16),
+     parseInt(result[3], 16)
+   ] : [0, 0, 0];
+ }
+
+ // Detection loop
+ async function detectionLoop() {
+   if (!isRunning) return;
+
+   try {
+     // Preprocess frame
+     const inputTensor = await preprocessFrame(video);
+
+     // Run inference
+     const feeds = { [session.inputNames[0]]: inputTensor };
+     const inferenceStart = performance.now();
+     const output = await session.run(feeds);
+     lastInferenceTime = performance.now() - inferenceStart;
+
+     // Process YOLO output
+     const detections = processYoloOutput(output, video.videoWidth, video.videoHeight);
+
+     // Draw detections
+     drawDetections(detections, video.videoWidth, video.videoHeight);
+
+     // Log detection info
+     if (detections.length > 0) {
+       const topDetection = detections[0];
+       const className = classNames[topDetection.classId] || `Class ${topDetection.classId}`;
+       log(`Detected ${detections.length} objects (top: ${className} @ ${(topDetection.score * 100).toFixed(1)}%)`);
+     }
+
+     // Update performance counters
+     frameCount++;
+     const now = performance.now();
+     if (now - lastFpsUpdate >= 1000) {
+       fps = frameCount * 1000 / (now - lastFpsUpdate);
+       frameCount = 0;
+       lastFpsUpdate = now;
+
+       // Update UI
+       fpsCounter.textContent = Math.round(fps);
+       inferenceTime.textContent = lastInferenceTime.toFixed(1);
+     }
+
+   } catch (error) {
+     log(`Detection error: ${error.message}`);
+     console.error(error);
+   }
+
+   // Schedule next frame
+   requestAnimationFrame(detectionLoop);
+ }
+
+ // Preprocess video frame for model input.
+ // Note: the frame is passed at its native resolution, which assumes the
+ // ONNX model accepts dynamic spatial dimensions; a fixed-size export
+ // (e.g. 640x640) would need a resize/letterbox step here.
+ async function preprocessFrame(videoElement) {
+   // Create temporary canvas
+   const tempCanvas = document.createElement('canvas');
+   tempCanvas.width = videoElement.videoWidth;
+   tempCanvas.height = videoElement.videoHeight;
+   const tempCtx = tempCanvas.getContext('2d');
+
+   // Draw video frame to canvas
+   tempCtx.drawImage(videoElement, 0, 0, tempCanvas.width, tempCanvas.height);
+
+   // Get image data
+   const imageData = tempCtx.getImageData(0, 0, tempCanvas.width, tempCanvas.height);
+
+   // Convert to Float32Array and normalize (assuming model expects [0,1] range)
+   const float32Data = new Float32Array(tempCanvas.width * tempCanvas.height * 3);
+
+   // Convert from RGBA to RGB and normalize
+   for (let i = 0, j = 0; i < imageData.data.length; i += 4) {
+     float32Data[j++] = imageData.data[i] / 255.0;     // R
+     float32Data[j++] = imageData.data[i + 1] / 255.0; // G
+     float32Data[j++] = imageData.data[i + 2] / 255.0; // B
+   }
+
+   // Convert from HWC to CHW format (channels first)
+   const chwData = new Float32Array(float32Data.length);
+   const channelSize = tempCanvas.width * tempCanvas.height;
+
+   for (let c = 0; c < 3; ++c) {
+     for (let i = 0; i < channelSize; ++i) {
+       chwData[c * channelSize + i] = float32Data[i * 3 + c];
+     }
+   }
+
+   // Create tensor with shape [1, 3, height, width]
+   return new ort.Tensor('float32', chwData, [1, 3, tempCanvas.height, tempCanvas.width]);
+ }
+
+ // Initialize class names (the 80 COCO classes)
+ function initClassNames() {
+   classNames = {
+     0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane',
+     5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light',
+     10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench',
+     14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow',
+     20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack',
+     25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee',
+     30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat',
+     35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket',
+     39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife',
+     44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich',
+     49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza',
+     54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant',
+     59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop',
+     64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave',
+     69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book',
+     74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier',
+     79: 'toothbrush'
+   };
+ }
+
+ // Initialize on page load
+ window.addEventListener('DOMContentLoaded', () => {
+   initClassNames();
+ });
+
+ // Clean up on page unload
+ window.addEventListener('beforeunload', () => {
+   if (session) {
+     // Release the ONNX Runtime session's backing resources
+     session.release();
+   }
+
+   // Stop webcam stream
+   if (video.srcObject) {
+     video.srcObject.getTracks().forEach(track => track.stop());
+   }
+ });
+ </script>
+ <p style="border-radius: 8px; text-align: center; font-size: 12px; color: #fff; margin-top: 16px;position: fixed; left: 8px; bottom: 8px; z-index: 10; background: rgba(0, 0, 0, 0.8); padding: 4px 8px;">Made with <img src="https://enzostvs-deepsite.hf.space/logo.svg" alt="DeepSite Logo" style="width: 16px; height: 16px; vertical-align: middle;display:inline-block;margin-right:3px;filter:brightness(0) invert(1);"><a href="https://enzostvs-deepsite.hf.space" style="color: #fff;text-decoration: underline;" target="_blank" >DeepSite</a> - 🧬 <a href="https://enzostvs-deepsite.hf.space?remix=MaxLeft/yolo-detection-app" style="color: #fff;text-decoration: underline;" target="_blank" >Remix</a></p></body>
+ </html>
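The prompt log in prompts.txt below repeatedly reports detections lined up along the image diagonal; that symptom usually means the model's (cx, cy, w, h) outputs were mapped back to the frame without undoing the letterbox scaling. As a hedged illustration (the function name, the letterbox convention, and the 640x640 input size are assumptions, not code from the app above), the decode step looks roughly like:

```javascript
// Sketch: decode one YOLO (cx, cy, w, h) box from letterboxed model space
// (inputSize x inputSize) back to original frame coordinates.
// Assumes the standard letterbox: uniform scale plus centered padding.
function decodeBox(cx, cy, w, h, srcW, srcH, inputSize = 640) {
  // Letterbox parameters used when the frame was resized for the model
  const scale = Math.min(inputSize / srcW, inputSize / srcH);
  const padX = (inputSize - srcW * scale) / 2;
  const padY = (inputSize - srcH * scale) / 2;

  // Center/size -> corner coordinates, still in model space
  const x1 = cx - w / 2, y1 = cy - h / 2;
  const x2 = cx + w / 2, y2 = cy + h / 2;

  // Undo padding and scaling, then clamp to the frame
  const mapX = v => Math.min(Math.max((v - padX) / scale, 0), srcW);
  const mapY = v => Math.min(Math.max((v - padY) / scale, 0), srcH);
  return [mapX(x1), mapY(y1), mapX(x2), mapY(y2)];
}
```

Skipping the pad subtraction (or scaling x and y by different factors) is exactly what produces boxes that slide along a diagonal.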
prompts.txt ADDED
@@ -0,0 +1,18 @@
+ Make me a website which takes a YOLO model from a hugging face repository, accesses the users webcam, and then runs the model in real time on the webcam
+ This is a good start, but I want to be able to load a custom model from hugging face.
+ Can you update this to instead accept a folder location for a model in the 'tfjs' format, which I think should work with javascript. The user will need to identify that folder location. Also, please make the real time processing of the video visible along with the detection results.
+ Can you keep everything the same but adjust the model to use an ONNX model, as described by this repo?
+ I got this error:
+ I'm getting this error: Error loading model: session.inputs is undefined
+ I think its working but I can't tell. Can you update this to test on this image: https://img.freepik.com/premium-photo/exterior-white-residential-building-with-stairs-located-near-green-coniferous-trees-sidewalk_195114-54739.jpg after loading, and then display the results to confirm its working?
+ So the below code works in that it at least gets a tensor of results. Please review it for how it implements the onnx model. Then steal the good parts, which work, and adapt them to the current UI.
+ This code almost works, but its a bad UI. <!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <title>Pure-browser ONNX + Webcam demo</title> <style> #view {position:relative;width:640px;height:640px} canvas,video {position:absolute;top:0;left:0} pre {font:14px monospace;margin-top:8px} </style> <!-- ONNX Runtime-Web (WASM + WebGL) --> <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script> </head> <body> <h3>1 . Load your ONNX file</h3> <input type="file" id="modelFile" accept=".onnx" /> <h3>2 . Grant webcam</h3> <button id="startBtn" disabled>Start demo</button> <div id="view" hidden> <video id="cam" autoplay playsinline muted></video> <canvas id="overlay" width="640" height="640"></canvas> </div> <pre id="log">waiting…</pre> <script type="module"> const log = m => document.getElementById('log').textContent = m; // ----------------------------------------------------- step 1: user selects model let modelBuffer = null; document.getElementById('modelFile').addEventListener('change', e => { const file = e.target.files[0]; if (!file) return; const reader = new FileReader(); reader.onload = ev => { modelBuffer = ev.target.result; log(`loaded ${file.name} (${(file.size/1e6).toFixed(1)} MB)`); document.getElementById('startBtn').disabled = false; }; reader.readAsArrayBuffer(file); }); // ----------------------------------------------------- step 2: user clicks “Start” document.getElementById('startBtn').addEventListener('click', async () => { try{ /* webcam */ const cam = document.getElementById('cam'); const stream = await navigator.mediaDevices.getUserMedia( {video:{width:640,height:640}}); cam.srcObject = stream; await cam.play(); /* show viewer */ document.getElementById('view').hidden = false; /* model */ log('initialising session…'); const session = await ort.InferenceSession.create( modelBuffer, {executionProviders:['wasm','webgl']}); log('model ready – running…'); /* helpers */ const tmp = new OffscreenCanvas(640,640); 
const tctx = tmp.getContext('2d',{willReadFrequently:true}); const cvs = document.getElementById('overlay'); const ctx = cvs.getContext('2d'); const area = 640*640; function toTensor(){ tctx.drawImage(cam,0,0,640,640); const img = tctx.getImageData(0,0,640,640).data; const arr = new Float32Array(area*3); for(let i=0,j=0;i<img.length;i+=4){ arr[j++] = img[i ]/255; // R arr[j++] = img[i+1]/255; // G arr[j++] = img[i+2]/255; // B } /* NHWC → NCHW */ const chw = new Float32Array(arr.length); for(let c=0;c<3;++c) for(let k=0;k<area;++k) chw[c*area+k] = arr[k*3+c]; return new ort.Tensor('float32', chw, [1,3,640,640]); } async function loop(){ const feeds = {[session.inputNames[0]]: toTensor()}; const t0 = performance.now(); const out = await session.run(feeds); const dt = (performance.now()-t0).toFixed(1); ctx.drawImage(cam,0,0,640,640); // simple preview // quick print of first 5 outputs from first tensor const oName = session.outputNames[0]; const sample = Array.from(out[oName].data).slice(0,5) .map(x=>x.toFixed(3)).join(', '); log(`${oName}: [${sample} …] | ${dt} ms`); requestAnimationFrame(loop); } loop(); }catch(err){ console.error(err); log(err.message); } }); </script> </body> </html>
+ This seems like its very close to working. However, the output is just the raw tensor of results. These need to be processed into boxes. Keep in mind that this is a segmentation YOLO.
+ We're incredibly close to having this working. However, the detections all seem to be lined up along the diagonal of the image. Is it possible there is an issue with how its being resized and set to the model?
+ We're incredibly close to having this working. However, the detections all seem to be lined up along the diagonal of the image. Is it possible there is an issue with how its being resized and set to the model?
+ We're incredibly close to having this working. However, the detections all seem to be lined up along the diagonal of the image. Is it possible there is an issue with how its being resized and set to the model?
+ This is working really well. I really like the UI and design, and basically evertyhing. I want to keep all that the same. However, I think the results of the model are being handelled incorrectly. I believe the model outputs results as YOLO exports give boxes as (cx, cy, w, h), and I think this is mistaking the object ID as its class.
+ This is working really well. I really like the UI and design, and basically evertyhing. I want to keep all that the same. However, I think the results of the model are being handelled incorrectly. I believe the model outputs results as YOLO exports give boxes as (cx, cy, w, h), and I think this is mistaking the object ID as its class.
+ So I'm still getting two errors: when I load the model I get a "Error loading model" error from the model selection panel. I dont know what that means. Also, when I run the detections I get "Detection error: showMasks is null"
+ We're extremely close. I think its still parsing class wrong. The numbers look like the box id's. Also, I think there is a resizing issue, and some of the boxes look like they are probably bigger than the web cam? So it doesn't look like they are being resized correctly. Also, I'm still getting the "Error loading model", but it also still works.
+ Something still isn't right with how the boxes are getting parsed. They are always arranged on an axis from the upper right to the lower left. Can we directly print the outputs so I can see how they look?
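Since the prompts above note this is a segmentation YOLO, it is worth recording how instance masks are conventionally produced: each detection carries k coefficients that weight k shared prototype masks, and the instance mask is sigmoid(coefficients · prototypes) thresholded at 0.5. A minimal sketch of that standard composition (the array layout and names are assumptions, not the app's actual output format):

```javascript
// Sketch: compose one instance mask from per-detection coefficients and
// shared prototype masks, the standard YOLO-seg scheme.
// coeffs: Float32Array of length k
// protos: Float32Array of length k * hw, channel-major (proto c at [c*hw, (c+1)*hw))
// hw: number of pixels in the prototype resolution (height * width)
function composeMask(coeffs, protos, k, hw) {
  const mask = new Uint8Array(hw);
  for (let i = 0; i < hw; i++) {
    // Weighted sum of the k prototypes at this pixel
    let s = 0;
    for (let c = 0; c < k; c++) s += coeffs[c] * protos[c * hw + i];
    // Sigmoid, then binarize at 0.5
    const p = 1 / (1 + Math.exp(-s));
    mask[i] = p > 0.5 ? 1 : 0;
  }
  return mask;
}
```

This differs from an argmax over mask channels: the class comes from the detection head, and the coefficients only shape the binary mask, which is then cropped to the box and resized to the frame.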