<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>LocateAnything-3B · In-Browser WebGPU (INT4)</title>
<style>
  :root { color-scheme: dark; }
  * { box-sizing: border-box; }
  body { margin: 0; font-family: ui-sans-serif, system-ui, -apple-system, "Segoe UI", Roboto, sans-serif;
         background: #0b0f17; color: #e6edf3; }
  header { padding: 18px 22px; border-bottom: 1px solid #1d2633; }
  header h1 { margin: 0; font-size: 18px; font-weight: 650; }
  header p { margin: 6px 0 0; color: #9bb0c9; font-size: 13px; }
  header a { color: #6cb6ff; text-decoration: none; }
  main { display: grid; grid-template-columns: 340px 1fr; gap: 0; min-height: calc(100vh - 70px); }
  .panel { padding: 18px 20px; }
  .left { border-right: 1px solid #1d2633; }
  label { display: block; font-size: 12px; color: #9bb0c9; margin: 14px 0 6px; text-transform: uppercase; letter-spacing: .04em; }
  input[type=text] { width: 100%; padding: 9px 11px; background: #0f1622; border: 1px solid #25303f;
                     border-radius: 8px; color: #e6edf3; font-size: 14px; }
  .samples { display: flex; gap: 8px; flex-wrap: wrap; }
  .samples img { width: 66px; height: 66px; object-fit: cover; border-radius: 8px; cursor: pointer;
                 border: 2px solid transparent; }
  .samples img:hover, .samples img.sel { border-color: #6cb6ff; }
  button { cursor: pointer; border: 0; border-radius: 8px; font-size: 14px; font-weight: 600; }
  #run { width: 100%; padding: 12px; margin-top: 16px; background: #2f81f7; color: #fff; }
  #run:disabled { background: #2a3645; color: #7d8da0; cursor: not-allowed; }
  .row { display: flex; align-items: center; gap: 10px; }
  .badge { font-size: 12px; padding: 3px 8px; border-radius: 999px; background: #1b2433; color: #9bb0c9; }
  .badge.ok { background: #14331f; color: #6ee7a0; }
  .badge.warn { background: #3a2a12; color: #f5c451; }
  .badge.err { background: #3a1717; color: #ff8585; }
  #stage { position: relative; display: inline-block; max-width: 100%; }
  canvas { max-width: 100%; height: auto; border-radius: 10px; background: #0f1622; }
  #log { margin-top: 14px; font-family: ui-monospace, SFMono-Regular, Menlo, monospace; font-size: 12px;
         white-space: pre-wrap; color: #93c2ff; max-height: 180px; overflow:auto;
         background:#0d131e; border:1px solid #1d2633; border-radius:8px; padding:10px; }
  #raw { margin-top:10px; font-family: ui-monospace, monospace; font-size:12px; color:#c8d6e5;
         word-break: break-all; }
  .muted { color:#7d8da0; font-size:12px; }
  input[type=range] { width: 100%; }
  progress { width: 100%; height: 8px; }
</style>
</head>
<body>
<header>
  <h1>LocateAnything-3B — fully in-browser, WebGPU, INT4</h1>
  <p>Open-vocabulary detection running 100% client-side via
     <a href="https://onnxruntime.ai/docs/tutorials/web/" target="_blank" rel="noopener noreferrer">onnxruntime-web</a> (WebGPU).
     Model: <a href="https://huggingface.co/Reza2kn/LocateAnything-3B-ONNX-WebGPU-INT4" target="_blank" rel="noopener noreferrer">Reza2kn/LocateAnything-3B-ONNX-WebGPU-INT4</a>
     · source <a href="https://huggingface.co/nvidia/LocateAnything-3B" target="_blank" rel="noopener noreferrer">nvidia/LocateAnything-3B</a>.
     INT4 language tower + custom 4-bit embedding gather + KV cache. No server inference.</p>
</header>
<main>
  <section class="panel left">
    <div class="row"><span id="gpu" class="badge">checking WebGPU…</span><span id="load" class="badge">model not loaded</span></div>
    <progress id="prog" value="0" max="100" style="display:none"></progress>

    <label>Sample images</label>
    <div class="samples" id="samples"></div>

    <label for="file">Or upload your own</label>
    <input type="file" id="file" accept="image/*" />

    <label for="cat">Category prompt</label>
    <input type="text" id="cat" value="person" placeholder="e.g. person, dog, red car" />

    <label for="mnt">Max new tokens: <span id="mntv">96</span></label>
    <input type="range" id="mnt" min="16" max="256" step="8" value="96" />

    <button id="run" disabled>Detect</button>
    <div id="log"></div>
  </section>

  <section class="panel">
    <div id="stage"><canvas id="cv" width="640" height="480"></canvas></div>
    <div class="muted">Decoded output</div>
    <div id="raw"></div>
  </section>
</main>
<script type="module" src="./app.js"></script>
</body>
</html>