<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>LocateAnything-3B · In-Browser WebGPU (INT4)</title>
<style>
:root { color-scheme: dark; }
* { box-sizing: border-box; }
body { margin: 0; font-family: ui-sans-serif, system-ui, -apple-system, "Segoe UI", Roboto, sans-serif;
background: #0b0f17; color: #e6edf3; }
header { padding: 18px 22px; border-bottom: 1px solid #1d2633; }
header h1 { margin: 0; font-size: 18px; font-weight: 650; }
header p { margin: 6px 0 0; color: #9bb0c9; font-size: 13px; }
header a { color: #6cb6ff; text-decoration: none; }
main { display: grid; grid-template-columns: 340px 1fr; gap: 0; min-height: calc(100vh - 70px); }
.panel { padding: 18px 20px; }
.left { border-right: 1px solid #1d2633; }
label { display: block; font-size: 12px; color: #9bb0c9; margin: 14px 0 6px; text-transform: uppercase; letter-spacing: .04em; }
input[type=text] { width: 100%; padding: 9px 11px; background: #0f1622; border: 1px solid #25303f;
border-radius: 8px; color: #e6edf3; font-size: 14px; }
.samples { display: flex; gap: 8px; flex-wrap: wrap; }
.samples img { width: 66px; height: 66px; object-fit: cover; border-radius: 8px; cursor: pointer;
border: 2px solid transparent; }
.samples img:hover, .samples img.sel { border-color: #6cb6ff; }
button { cursor: pointer; border: 0; border-radius: 8px; font-size: 14px; font-weight: 600; }
#run { width: 100%; padding: 12px; margin-top: 16px; background: #2f81f7; color: #fff; }
#run:disabled { background: #2a3645; color: #7d8da0; cursor: not-allowed; }
.row { display: flex; align-items: center; gap: 10px; }
.badge { font-size: 12px; padding: 3px 8px; border-radius: 999px; background: #1b2433; color: #9bb0c9; }
.badge.ok { background: #14331f; color: #6ee7a0; }
.badge.warn { background: #3a2a12; color: #f5c451; }
.badge.err { background: #3a1717; color: #ff8585; }
#stage { position: relative; display: inline-block; max-width: 100%; }
canvas { max-width: 100%; height: auto; border-radius: 10px; background: #0f1622; }
#log { margin-top: 14px; font-family: ui-monospace, SFMono-Regular, Menlo, monospace; font-size: 12px;
white-space: pre-wrap; color: #93c2ff; max-height: 180px; overflow: auto;
background: #0d131e; border: 1px solid #1d2633; border-radius: 8px; padding: 10px; }
#raw { margin-top: 10px; font-family: ui-monospace, monospace; font-size: 12px; color: #c8d6e5;
word-break: break-all; }
.muted { color:#7d8da0; font-size:12px; }
input[type=range] { width: 100%; }
progress { width: 100%; height: 8px; }
</style>
</head>
<body>
<header>
<h1>LocateAnything-3B — fully in-browser, WebGPU, INT4</h1>
<p>Open-vocabulary detection running 100% client-side via
<a href="https://onnxruntime.ai/docs/tutorials/web/" target="_blank" rel="noopener">onnxruntime-web</a> (WebGPU).
Model: <a href="https://huggingface.co/Reza2kn/LocateAnything-3B-ONNX-WebGPU-INT4" target="_blank" rel="noopener">Reza2kn/LocateAnything-3B-ONNX-WebGPU-INT4</a>
· source: <a href="https://huggingface.co/nvidia/LocateAnything-3B" target="_blank" rel="noopener">nvidia/LocateAnything-3B</a>.
INT4 language tower, custom 4-bit embedding gather, and KV cache. No server-side inference.</p>
</header>
<main>
<section class="panel left">
<div class="row"><span id="gpu" class="badge">checking WebGPU…</span><span id="load" class="badge">model not loaded</span></div>
<progress id="prog" value="0" max="100" style="display:none"></progress>
<label>Sample images</label>
<div class="samples" id="samples"></div>
<label for="file">Or upload your own</label>
<input type="file" id="file" accept="image/*" />
<label for="cat">Category prompt</label>
<input type="text" id="cat" value="person" placeholder="e.g. person, dog, red car" />
<label for="mnt">Max new tokens: <span id="mntv">96</span></label>
<input type="range" id="mnt" min="16" max="256" step="8" value="96" />
<button id="run" disabled>Detect</button>
<div id="log"></div>
</section>
<section class="panel">
<div id="stage"><canvas id="cv" width="640" height="480"></canvas></div>
<div class="muted">Decoded output</div>
<div id="raw"></div>
</section>
</main>
<script type="module" src="./app.js"></script>
</body>
</html>