---
title: LocalAgent Tool Calling (WebGPU)
emoji: 🛠️
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
license: mit
short_description: Sub-100M from-scratch tool-calling agent in the browser
---
# LocalAgent: tool calling in the browser (WebGPU)

A **28M-parameter, pretrained-from-scratch** byte-level agent that does **grounded tool
calling** and **multi-step planning**, running **entirely in your browser** on
[onnxruntime-web](https://onnxruntime.ai/docs/tutorials/web/) with the **WebGPU** backend
(WASM fallback when WebGPU is unavailable). No server, no API key; the model is downloaded once
and cached.
Model: [`danelcsb/localagent-tiny-30m-byte`](https://huggingface.co/danelcsb/localagent-tiny-30m-byte).

Source: [LocalAgent](https://github.com/sangbumchoi/localagent).
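"Byte-level" means the base vocabulary is the 256 possible byte values, so any prompt (emoji and non-Latin scripts included) encodes with no out-of-vocabulary case. A minimal sketch of such a tokenizer, assuming ids are UTF-8 byte values shifted past a few reserved special-token ids; `SPECIAL_OFFSET` and its value are hypothetical, and the id layout actually used by `app.js` may differ:

```python
# Minimal byte-level tokenizer sketch.
# HYPOTHETICAL layout: ids 0..SPECIAL_OFFSET-1 are reserved for special tokens,
# and byte value b maps to id b + SPECIAL_OFFSET.
SPECIAL_OFFSET = 4  # hypothetical value, not read from the real bundle


def encode(text: str) -> list[int]:
    """Map text to ids through its UTF-8 bytes (no out-of-vocabulary case)."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")]


def decode(ids: list[int]) -> str:
    """Invert encode(): drop special ids, then decode the remaining bytes.

    Byte runs that are not valid UTF-8 become U+FFFD instead of raising.
    """
    raw = bytes(i - SPECIAL_OFFSET for i in ids if i >= SPECIAL_OFFSET)
    return raw.decode("utf-8", errors="replace")
```

For any string `s`, `decode(encode(s)) == s`; a multi-byte character such as `é` simply costs two ids.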
## What it shows (generable dispatch, no fixed-N classifier)

- **Route gate**: a 5-way head (`web_search / computer_use / code / app_action / text`) on the ONNX
  `hidden` output; the `text` route is **abstention** (answer directly, no tool).
- **Tool selection**: a **dense two-tower selector**. The query tower projects `hidden`, and each tool
  is scored by cosine similarity against a precomputed per-tool description-embedding matrix over the
  **50-tool** surface (`argmax_j q·tool_matrix[j]`). Adding or removing a tool means adding or
  removing a row, with no retraining.
- **Grounded arguments**: copied from spans of your prompt via the learned pointer head, so the
  emitted call is schema-valid by construction.
- **Multi-step plans**: the rollout picks a tool → grounds it → feeds back a simulated response →
  picks the next, until the route head emits `text`.
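The route gate and the dense selector compose into a single dispatch step. The pure-Python sketch below assumes the route head is one linear layer (`route_w`, `route_b`) and the query tower one projection (`query_w`); these names and the single-layer form are hypothetical stand-ins for the exported heads. Tool rows are assumed to be L2-normalized ahead of time, so cosine similarity reduces to a dot product with the normalized query.

```python
import math

# Route labels in the order this README lists them; the head's real output order is an assumption.
ROUTES = ["web_search", "computer_use", "code", "app_action", "text"]


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Dense matrix-vector product: one output per row of w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def l2_normalize(v: list[float], eps: float = 1e-12) -> list[float]:
    """Scale v to unit length (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def argmax(xs: list[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def dispatch(hidden, route_w, route_b, query_w, tool_matrix, tool_names):
    """Return (route, tool_name or None) for one `hidden` vector.

    HYPOTHETICAL parameters: route_w/route_b form a linear route head,
    query_w is a single-projection query tower, and tool_matrix rows are
    precomputed, already L2-normalized tool-description embeddings.
    """
    route_logits = [z + b for z, b in zip(matvec(route_w, hidden), route_b)]
    route = ROUTES[argmax(route_logits)]
    if route == "text":  # abstention: answer directly, no tool call
        return route, None
    q = l2_normalize(matvec(query_w, hidden))
    scores = matvec(tool_matrix, q)  # cosine similarity, since both sides are unit-norm
    return route, tool_names[argmax(scores)]
```

Nothing in `dispatch` depends on the number of tools: adding a tool is one more row in `tool_matrix` and one more entry in `tool_names`, which is the no-retraining property described above.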
## How it runs (honest version)

The transformer forward pass runs on **WebGPU** via an exported ONNX graph that emits `logits` and
the last `hidden` state. The **route head**, the **dense selector** (matmul + normalize + argmax over
the precomputed tool matrix), the **pointer-copy** grounding, and the **planner loop** are a thin
JavaScript layer on top: a faithful port of the Python `routes` / `dense_selector` / `pointer_head`
pipeline (parity-checked at export: 100% argmax/top-1 agreement). The first load fetches
`model.fp16.onnx` (~57 MB) and caches it.
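Pointer-copy grounding picks an argument as a contiguous span of the prompt instead of generating it token by token. A minimal sketch, assuming the pointer head produces one start score and one end score per prompt byte (hypothetical `start_scores` / `end_scores`) and that a span is scored additively under a hypothetical length cap `max_len`; the learned head's real outputs and decoding rule may differ:

```python
def best_span(start_scores, end_scores, max_len=32):
    """Return (s, e) maximizing start_scores[s] + end_scores[e].

    Constraint: s <= e < s + max_len (brute force, O(n * max_len)).
    HYPOTHETICAL: additive start/end scoring stands in for the learned
    pointer head's actual decoding rule.
    """
    best, best_score = (0, 0), float("-inf")
    n = len(start_scores)
    for s in range(n):
        for e in range(s, min(n, s + max_len)):
            score = start_scores[s] + end_scores[e]
            if score > best_score:  # strict '>' keeps the earliest span on ties
                best, best_score = (s, e), score
    return best


def copy_span(prompt: str, start_scores, end_scores, max_len=32) -> str:
    """Ground an argument by slicing bytes [s, e] of the prompt verbatim.

    Scores are indexed by UTF-8 byte position, matching a byte-level model.
    """
    raw = prompt.encode("utf-8")
    s, e = best_span(start_scores, end_scores, max_len)
    return raw[s : e + 1].decode("utf-8", errors="replace")
```

Because the value is sliced out of the prompt, the argument is always a verbatim piece of what the user wrote rather than free-form generated text.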
## Files

- `index.html` / `style.css`: the UI shell.
- `app.js`: byte tokenizer, onnxruntime-web session (WebGPU with WASM fallback), route + selector
  dispatch, grounding, and the planner rollout.
- `model.fp16.onnx`, `heads.json`, `meta.json`, `dispatch_heads.json`: the exported inference
  bundle (**not in the source repo**; these are deploy artifacts). See `DEPLOY.md` for the exact commands.
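The planner rollout in `app.js` (pick a tool, ground it, feed back a simulated response, repeat until the route head emits `text`) reduces to a short loop. A minimal sketch with the model-dependent steps passed in as callables; `choose`, `ground`, `simulate`, and the `max_steps` cap are hypothetical names, as is the plain-text format used to append each simulated response to the context:

```python
from typing import Callable, Optional


def rollout(
    prompt: str,
    choose: Callable[[str], Optional[str]],
    ground: Callable[[str, str], dict],
    simulate: Callable[[str, dict], str],
    max_steps: int = 8,
) -> list[tuple[str, dict]]:
    """Greedy multi-step plan that stops when the route gate abstains."""
    context = prompt
    plan: list[tuple[str, dict]] = []
    for _ in range(max_steps):  # hypothetical cap so a bad model cannot loop forever
        tool = choose(context)  # route gate + dense selector; None means route `text`
        if tool is None:
            break  # abstention: the plan is complete
        args = ground(context, tool)  # pointer-copy arguments for this tool
        plan.append((tool, args))
        observation = simulate(tool, args)  # simulated tool response, not a real call
        # HYPOTHETICAL context format; the real app.js may serialize this differently.
        context += f"\n[{tool}] {observation}"
    return plan
```

Passing the steps in as functions keeps the loop testable with stubs, independent of the ONNX session.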
## Deploy

See **`DEPLOY.md`** for copy-paste build and push commands. In short: export the bundle from the latest
checkpoint, then upload the static app and the four bundle files into an `sdk: static` Space:
```bash
# Export the inference bundle from the checkpoint into build/web
python -c "from localagent.inference.export.to_onnx import export_web; \
export_web('runs/tiny-30m-scenarios-best.pt', 'build/web')"
```
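Since `app.js` loads the bundle relative to the page, a missing file only shows up as a failed fetch in the browser. A small pre-upload check catches that earlier. A minimal sketch, assuming the directory you upload (for example `build/web`, if the static app files are copied there too) is what gets served; the directory layout is an assumption here, and `missing_files` is a hypothetical helper:

```python
from pathlib import Path

# Files this README says must be served side by side.
BUNDLE = ["model.fp16.onnx", "heads.json", "meta.json", "dispatch_heads.json"]
APP = ["index.html", "style.css", "app.js"]


def missing_files(site_dir: str) -> list[str]:
    """Return the required files that are absent from site_dir.

    HYPOTHETICAL helper: site_dir is assumed to be the directory uploaded
    to the Space, i.e. the one holding index.html.
    """
    root = Path(site_dir)
    return [name for name in APP + BUNDLE if not (root / name).is_file()]
```

Running `missing_files("build/web")` before pushing should return an empty list; any names it returns are files the page would fail to fetch.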
`app.js` fetches `model.fp16.onnx`, `heads.json`, `meta.json`, and `dispatch_heads.json` relative to
the page, so they must sit next to `index.html`. The export is parity-checked against PyTorch
(max |Δlogits| = 7.6e-6; route-head and dense-selector argmax/top-1 agreement: 100%). The graph uses
standard opset 17; onnxruntime-web falls back per op to WASM for any op without a WebGPU kernel, with
identical results.
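The two parity numbers quoted above, max |Δlogits| and argmax/top-1 agreement, are cheap to recompute on any pair of outputs. A minimal pure-Python sketch over already-collected logit rows (one row per example); how the PyTorch and ONNX outputs are produced is left out, and the function names are hypothetical:

```python
def _check_shapes(ref: list[list[float]], test: list[list[float]]) -> None:
    """Reject mismatched outputs instead of letting zip() silently truncate."""
    if not ref or len(ref) != len(test) or any(len(r) != len(t) for r, t in zip(ref, test)):
        raise ValueError("reference and ONNX outputs must be non-empty and the same shape")


def max_abs_diff(ref: list[list[float]], test: list[list[float]]) -> float:
    """Largest elementwise |ref - test| over all rows (the max |Δlogits| metric)."""
    _check_shapes(ref, test)
    return max(abs(a - b) for r, t in zip(ref, test) for a, b in zip(r, t))


def top1_agreement(ref: list[list[float]], test: list[list[float]]) -> float:
    """Fraction of rows whose argmax index is the same in both outputs."""
    _check_shapes(ref, test)

    def argmax(row: list[float]) -> int:
        return max(range(len(row)), key=row.__getitem__)

    hits = sum(argmax(r) == argmax(t) for r, t in zip(ref, test))
    return hits / len(ref)
```

A tiny absolute difference does not by itself guarantee the same argmax when two logits are nearly tied, which is why reporting both numbers is informative.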