yolo-detection-app / prompts.txt
Make me a website which takes a YOLO model from a Hugging Face repository, accesses the user's webcam, and then runs the model in real time on the webcam.
This is a good start, but I want to be able to load a custom model from hugging face.
Can you update this to instead accept a folder location for a model in the 'tfjs' format, which I think should work with javascript. The user will need to identify that folder location. Also, please make the real time processing of the video visible along with the detection results.
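For reference, loading a TF.js export from a user-selected folder typically goes through `tf.loadGraphModel` with `tf.io.browserFiles`, which expects `model.json` first followed by the weight shards. A sketch, assuming a standard TF.js graph-model export; the `orderModelFiles` helper name is my own:

```javascript
// Order a TF.js export's files the way tf.io.browserFiles expects:
// [model.json, ...weight shard .bin files].
function orderModelFiles(files) {
  const list = Array.from(files);
  const json = list.find(f => f.name.endsWith('model.json'));
  const bins = list.filter(f => f.name.endsWith('.bin'));
  if (!json) throw new Error('No model.json found in the selected folder');
  return [json, ...bins];
}

// Browser glue (sketch): let the user pick the exported folder with
// <input type="file" id="modelDir" webkitdirectory />, then:
// const model = await tf.loadGraphModel(
//   tf.io.browserFiles(orderModelFiles(document.getElementById('modelDir').files)));
```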
Can you keep everything the same but adjust the model to use an ONNX model, as described by this repo?
I'm getting this error: Error loading model: session.inputs is undefined
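That error usually means the code is reading a property that onnxruntime-web doesn't expose: an `InferenceSession` has no `session.inputs`; the input and output names are string arrays on `session.inputNames` and `session.outputNames`. A minimal sketch of a defensive accessor (the helper name `firstInputName` is my own, not from the library):

```javascript
// onnxruntime-web exposes model I/O as string arrays:
// session.inputNames / session.outputNames. There is no session.inputs,
// so reading it yields undefined and breaks downstream code.
function firstInputName(session) {
  if (!session || !Array.isArray(session.inputNames) || session.inputNames.length === 0) {
    throw new Error('Session has no inputNames - was the model loaded successfully?');
  }
  return session.inputNames[0];
}

// Usage (sketch):
// const feeds = {};
// feeds[firstInputName(session)] = inputTensor;
// const results = await session.run(feeds);
```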
I think it's working but I can't tell. Can you update this to test on this image: https://img.freepik.com/premium-photo/exterior-white-residential-building-with-stairs-located-near-green-coniferous-trees-sidewalk_195114-54739.jpg after loading, and then display the results to confirm it's working?
So the below code works in that it at least gets a tensor of results. Please review how it implements the ONNX model. Then steal the good parts, which work, and adapt them to the current UI.
This code almost works, but it's a bad UI.

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Pure-browser ONNX + Webcam demo</title>
<style>
  #view {position:relative;width:640px;height:640px}
  canvas,video {position:absolute;top:0;left:0}
  pre {font:14px monospace;margin-top:8px}
</style>
<!-- ONNX Runtime-Web (WASM + WebGL) -->
<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
</head>
<body>
<h3>1. Load your ONNX file</h3>
<input type="file" id="modelFile" accept=".onnx" />
<h3>2. Grant webcam</h3>
<button id="startBtn" disabled>Start demo</button>
<div id="view" hidden>
  <video id="cam" autoplay playsinline muted></video>
  <canvas id="overlay" width="640" height="640"></canvas>
</div>
<pre id="log">waiting…</pre>
<script type="module">
const log = m => document.getElementById('log').textContent = m;

// ------------------------------------------ step 1: user selects model
let modelBuffer = null;
document.getElementById('modelFile').addEventListener('change', e => {
  const file = e.target.files[0];
  if (!file) return;
  const reader = new FileReader();
  reader.onload = ev => {
    modelBuffer = ev.target.result;
    log(`loaded ${file.name} (${(file.size/1e6).toFixed(1)} MB)`);
    document.getElementById('startBtn').disabled = false;
  };
  reader.readAsArrayBuffer(file);
});

// ------------------------------------------ step 2: user clicks "Start"
document.getElementById('startBtn').addEventListener('click', async () => {
  try {
    /* webcam */
    const cam = document.getElementById('cam');
    const stream = await navigator.mediaDevices.getUserMedia(
      {video: {width: 640, height: 640}});
    cam.srcObject = stream;
    await cam.play();

    /* show viewer */
    document.getElementById('view').hidden = false;

    /* model */
    log('initialising session…');
    const session = await ort.InferenceSession.create(
      modelBuffer, {executionProviders: ['wasm', 'webgl']});
    log('model ready – running…');

    /* helpers */
    const tmp  = new OffscreenCanvas(640, 640);
    const tctx = tmp.getContext('2d', {willReadFrequently: true});
    const cvs  = document.getElementById('overlay');
    const ctx  = cvs.getContext('2d');
    const area = 640 * 640;

    function toTensor() {
      tctx.drawImage(cam, 0, 0, 640, 640);
      const img = tctx.getImageData(0, 0, 640, 640).data;
      const arr = new Float32Array(area * 3);
      for (let i = 0, j = 0; i < img.length; i += 4) {
        arr[j++] = img[i    ] / 255; // R
        arr[j++] = img[i + 1] / 255; // G
        arr[j++] = img[i + 2] / 255; // B
      }
      /* NHWC → NCHW */
      const chw = new Float32Array(arr.length);
      for (let c = 0; c < 3; ++c)
        for (let k = 0; k < area; ++k)
          chw[c * area + k] = arr[k * 3 + c];
      return new ort.Tensor('float32', chw, [1, 3, 640, 640]);
    }

    async function loop() {
      const feeds = {[session.inputNames[0]]: toTensor()};
      const t0  = performance.now();
      const out = await session.run(feeds);
      const dt  = (performance.now() - t0).toFixed(1);

      ctx.drawImage(cam, 0, 0, 640, 640); // simple preview

      // quick print of first 5 outputs from first tensor
      const oName  = session.outputNames[0];
      const sample = Array.from(out[oName].data).slice(0, 5)
                          .map(x => x.toFixed(3)).join(', ');
      log(`${oName}: [${sample} …] | ${dt} ms`);

      requestAnimationFrame(loop);
    }
    loop();
  } catch (err) {
    console.error(err);
    log(err.message);
  }
});
</script>
</body>
</html>
This seems like it's very close to working. However, the output is just the raw tensor of results. These need to be processed into boxes. Keep in mind that this is a segmentation YOLO.
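Decoding that raw tensor could look roughly like this, assuming a YOLOv8-style segmentation export whose detection output has logical shape [1, 4 + numClasses + 32, numAnchors]: four box channels (cx, cy, w, h), then class scores, then 32 mask coefficients per anchor. This is a sketch with hypothetical names, and it still needs NMS and the mask-prototype multiply afterwards:

```javascript
// Decode a channel-major YOLO-seg output into candidate detections.
// data is the flat Float32Array; channels are cx, cy, w, h,
// then numClasses class scores, then 32 mask coefficients.
function decodeSegOutput(data, numClasses, numAnchors, confThreshold) {
  const numCoeffs = 32;
  const dets = [];
  for (let i = 0; i < numAnchors; i++) {
    // best class score for this anchor
    let best = 0, cls = -1;
    for (let c = 0; c < numClasses; c++) {
      const s = data[(4 + c) * numAnchors + i];
      if (s > best) { best = s; cls = c; }
    }
    if (best < confThreshold) continue;
    const cx = data[0 * numAnchors + i];
    const cy = data[1 * numAnchors + i];
    const w  = data[2 * numAnchors + i];
    const h  = data[3 * numAnchors + i];
    const coeffs = new Float32Array(numCoeffs);
    for (let k = 0; k < numCoeffs; k++) {
      coeffs[k] = data[(4 + numClasses + k) * numAnchors + i];
    }
    dets.push({
      x: cx - w / 2, y: cy - h / 2, w, h, // corner box for drawing
      score: best, classId: cls, maskCoeffs: coeffs,
    });
  }
  return dets; // still needs NMS; masks come from coeffs x prototype tensor
}
```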
We're incredibly close to having this working. However, the detections all seem to be lined up along the diagonal of the image. Is it possible there is an issue with how it's being resized and fed to the model?
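Boxes marching along the diagonal is the classic symptom of reading the output with the wrong stride: indexing a [1, C, N] channel-major tensor as if it were [1, N, C] (or vice versa) makes x and y both grow with the anchor index. A sketch of the two layouts and a hypothetical heuristic for picking between them from the tensor's own dims (assuming the anchor axis, e.g. 8400, is much larger than the channel axis):

```javascript
// For flat data of logical shape [1, C, N], the value for channel c of
// anchor i is at c * N + i. Reading i * C + c instead walks diagonally
// through the tensor, which lines every decoded box up on the diagonal.
function pickAccessor(dims) {
  const [, a, b] = dims; // dims like [1, C, N] or [1, N, C]
  return a < b
    ? (data, c, i) => data[c * b + i]  // [1, C, N]: channel-major
    : (data, c, i) => data[i * b + c]; // [1, N, C]: anchor-major
}
```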
This is working really well. I really like the UI and design, and basically everything. I want to keep all that the same. However, I think the results of the model are being handled incorrectly. I believe the model outputs boxes as (cx, cy, w, h), as YOLO exports do, and I think this is mistaking the object ID for its class.
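The two fixes sketched together, assuming a standard YOLO layout: convert (cx, cy, w, h) to corner coordinates (scaled from the 640x640 model space to the displayed canvas), and take the class as the argmax over the score channels only, since the first four channels are box geometry, not classes. Helper names here are hypothetical:

```javascript
// Convert a YOLO-style center box to canvas-space corners, scaling
// from model input size (e.g. 640) to the displayed size.
function toCornerBox(cx, cy, w, h, scaleX, scaleY) {
  return {
    x: (cx - w / 2) * scaleX,
    y: (cy - h / 2) * scaleY,
    w: w * scaleX,
    h: h * scaleY,
  };
}

// Class id is the argmax over the class-score slice alone; passing the
// raw channel or anchor index here is what mislabels detections.
function bestClass(scores) {
  let best = -Infinity, id = -1;
  for (let c = 0; c < scores.length; c++) {
    if (scores[c] > best) { best = scores[c]; id = c; }
  }
  return { id, score: best };
}
```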
So I'm still getting two errors: when I load the model I get an "Error loading model" error from the model selection panel. I don't know what that means. Also, when I run the detections I get "Detection error: showMasks is null".
We're extremely close. I think it's still parsing the class wrong: the numbers look like the box IDs. Also, I think there is a resizing issue; some of the boxes look like they are bigger than the webcam frame, so it doesn't look like they are being resized correctly. Also, I'm still getting the "Error loading model" message, but it also still works.
Something still isn't right with how the boxes are getting parsed. They are always arranged on an axis from the upper right to the lower left. Can we directly print the outputs so I can see how they look?
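A small dump helper makes that inspection easy: print each output tensor's name, dims, and the first few raw values so the layout can be checked by eye. A sketch assuming the map returned by `session.run()`, whose entries carry `.dims` and `.data`; the helper name is my own:

```javascript
// Format each output tensor's name, shape, and leading values for
// inspection. `results` is the object returned by session.run().
function dumpOutputs(results, perTensor = 12) {
  const lines = [];
  for (const [name, t] of Object.entries(results)) {
    const head = Array.from(t.data.slice(0, perTensor))
      .map(v => Number(v).toFixed(3)).join(', ');
    lines.push(`${name} dims=[${t.dims.join(', ')}] data[0..${perTensor - 1}]=[${head}]`);
  }
  return lines;
}

// Usage (sketch):
// const results = await session.run(feeds);
// dumpOutputs(results).forEach(l => console.log(l));
```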