---
license: mit
library_name: pytorch
tags: [tool-calling, agent, tiny-llm, byte-level, on-device, from-scratch]
pipeline_tag: text-generation
---

# ultra-tiny-1m: LocalAgent (0.98M params)

A **from-scratch, byte-level** tool-calling agent model from
[LocalAgent](https://github.com/sangbumchoi/localagent): pure PyTorch, **0.98M params**,
trained on CPU. It pairs a tiny decoder (GQA + RoPE + SwiGLU + depth recurrence) with a **dual head**
(a tool-selection classifier plus a pointer/copy argument head) and **prompt-grounded constrained
decoding** for reliable tool calls across 21 tools (general assistant, the Claude Code /
Codex coding surface, and computer-use / productivity tools), including parallel two-call turns.

## Architecture
- vocab 256 (byte-level), d_model 192, 2 layers × 6 loops (depth recurrence), 6 query / 2 key-value heads (GQA), FFN hidden size 640
- factorized embeddings: True
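
To make the numbers above concrete, the sketch below counts the weights in the recurrent layer stack. It assumes bias-free Q/K/V/O projections, a three-matrix SwiGLU FFN, and two norm weight vectors per layer; none of these details are stated in the card, and `ArchSketch` is a hypothetical helper, not `ModelConfig`. It also shows two structural points: the 6 loops reuse the same 2 layers' weights (12 layer applications, 2 layers of parameters), and GQA shares each key/value head across 3 query heads.

```python
from dataclasses import dataclass


@dataclass
class ArchSketch:
    """Hypothetical stand-in for the hyperparameters listed under Architecture."""
    d_model: int = 192
    n_layers: int = 2     # unique, weight-bearing layers
    n_loops: int = 6      # depth recurrence: the same stack is reapplied this many times
    n_heads: int = 6      # query heads
    n_kv_heads: int = 2   # key/value heads shared by groups of query heads (GQA)
    ffn: int = 640        # SwiGLU hidden size

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads

    def layer_params(self) -> int:
        d = self.d_model
        kv = self.n_kv_heads * self.head_dim   # K and V project to fewer heads
        attn = d * d + 2 * d * kv + d * d      # Q, K, V, O (assumed bias-free)
        ffn = 3 * d * self.ffn                 # SwiGLU gate, up, down (assumed)
        norms = 2 * d                          # two norm weight vectors (assumed)
        return attn + ffn + norms

    def stack_params(self) -> int:
        # Looping reuses weights, so n_loops adds compute but no parameters.
        return self.n_layers * self.layer_params()

    def effective_depth(self) -> int:
        return self.n_layers * self.n_loops

    def queries_per_kv_head(self) -> int:
        return self.n_heads // self.n_kv_heads


arch = ArchSketch()
print(arch.stack_params())         # 934656
print(arch.effective_depth())      # 12
print(arch.queries_per_kv_head())  # 3
```

Under these assumptions the layer stack is about 935K parameters, leaving roughly 45K of the 0.98M for the factorized byte embeddings and any remaining norms.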

## Files
- `config.json`: hyperparameters for `ModelConfig`
- `model.safetensors` / `pytorch_model.bin`: decoder weights
- `agent_heads.bin`: trained tool-selection and pointer heads (optional)

## What it can do (use cases)
A single byte-level model turns a natural-language request into a grounded tool call for an
assistant, a coding agent, and computer-use/productivity apps, including **parallel two-call** turns:

| you say | it calls |
|---|---|
| "What's the weather in Cusco?" | `get_weather(city="Cusco")` |
| "What is 19 * 19 * 5?" | `calculator(expression="19*19*5")` |
| "Open the file bin/run.sh." | `read_file(path="bin/run.sh")` |
| "Grep for 'TODO'." | `grep_search(pattern="TODO")` |
| "Run the tests." | `run_tests()` |
| "Commit with message 'fix bug'." | `git_commit(message="fix bug")` |
| "Send an email to Greta." | `send_email(recipient="Greta")` |
| "Go to figma.com." | `open_url(url="figma.com")` |
| "Send a Slack message saying 'ship it'." | `slack_send(message="ship it")` |
| "Create a Jira ticket titled 'broken link'." | `jira_issue(summary="broken link")` |
| "Compose an email to Judy **and** search for how tall is Everest." | `send_email(recipient="Judy")` + `web_search(query="how tall is Everest")` |
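
The table writes calls in a `name(key="value")` notation. The card doesn't document the model's raw output format, so the following is a hypothetical helper (`ToolCall` and `parse_call` are invented names) that renders and parses only that notation: string-valued arguments without embedded quotes. A parallel two-call turn is represented simply as a list of calls.

```python
import re
from dataclasses import dataclass, field

CALL_RE = re.compile(r"^(\w+)\((.*)\)$")
ARG_RE = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class ToolCall:
    name: str
    args: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Format as name(key="value", ...), the notation used in the table."""
        inner = ", ".join(f'{k}="{v}"' for k, v in self.args.items())
        return f"{self.name}({inner})"


def parse_call(text: str) -> ToolCall:
    """Parse one call in the table's notation; raise ValueError on anything else."""
    text = text.strip()
    m = CALL_RE.match(text)
    if m is None:
        raise ValueError(f"not a tool call: {text!r}")
    name, body = m.groups()
    call = ToolCall(name, dict(ARG_RE.findall(body)))
    # Round-trip check: reject syntax the simple grammar did not fully consume.
    if call.render() != text:
        raise ValueError(f"unsupported argument syntax: {body!r}")
    return call


# A parallel two-call turn is just a list of calls.
turn = [
    parse_call('send_email(recipient="Judy")'),
    parse_call('web_search(query="how tall is Everest")'),
]
print([c.name for c in turn])          # ['send_email', 'web_search']
print(parse_call("run_tests()").args)  # {}
```

The round-trip check keeps the parser strict: unquoted values such as `city=Cusco` are rejected rather than silently dropped.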

Multi-turn coding, where a follow-up argument is grounded in a tool response:
`read_file(path="tests/test_api.py")` → result → `run_tests()` → "FAILED…" → fix.

For large catalogs (hundreds to thousands of tools), the tool is selected by top-k **retrieval**
instead of the fixed tool-selection head. See the [LocalAgent repo](https://github.com/sangbumchoi/localagent).
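
The card doesn't say which retriever LocalAgent uses. As a plainly swapped-in stand-in, the sketch below ranks a hypothetical tool catalog by character-trigram cosine similarity between the request and each tool's description; the descriptions and the `build_index` / `top_k_tools` names are invented for illustration. It only shows the shape of the idea: shortlist k tools before a call is decoded.

```python
import math
import re
from collections import Counter


def trigrams(text: str) -> Counter:
    """Character trigrams of lowercased, punctuation-stripped text."""
    padded = " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(catalog: dict[str, str]) -> dict[str, Counter]:
    # Vectorize each tool description once, up front.
    return {name: trigrams(desc) for name, desc in catalog.items()}


def top_k_tools(query: str, index: dict[str, Counter], k: int = 3) -> list[tuple[str, float]]:
    q = trigrams(query)
    scored = [(name, cosine(q, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical descriptions; a real catalog would have hundreds to thousands of entries.
CATALOG = {
    "get_weather": "get the current weather forecast for a city",
    "calculator": "evaluate an arithmetic expression",
    "read_file": "read and open a file at a path",
    "git_commit": "commit staged changes with a message",
}
index = build_index(CATALOG)
print(top_k_tools("What's the weather in Cusco?", index, k=2))
```

Precomputing the index keeps per-request cost to one vectorization plus k-best scoring; a learned embedding retriever would slot into the same `build_index` / `top_k_tools` shape.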

## Load (pure PyTorch, no `transformers`)
```python
import json
from dataclasses import fields

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from localagent.model import LocalAgentLM, ModelConfig  # from the LocalAgent repo

REPO_ID = "danelcsb/localagent-ultra-tiny-1m"

# Build ModelConfig from config.json, keeping only the fields the dataclass defines.
with open(hf_hub_download(REPO_ID, "config.json")) as f:
    cfg_d = json.load(f)
known = {fld.name for fld in fields(ModelConfig)}
cfg = ModelConfig(**{k: v for k, v in cfg_d.items() if k in known})

# Instantiate the decoder and load its safetensors weights.
model = LocalAgentLM(cfg)
model.load_state_dict(load_file(hf_hub_download(REPO_ID, "model.safetensors")))
model.eval()
```
This snippet loads only the decoder weights. The grounded decoder and agent runtime (tool head,
pointer head, retrieval, parallel-call decoding) are in the LocalAgent repo.
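
With a 256-entry byte-level vocabulary, text maps to ids through its UTF-8 bytes, so no tokenizer file is needed. This sketch assumes plain text with no special or control tokens; the card doesn't document a prompt template or the model's forward signature, so it stops at ids. `encode` and `decode` are illustrative names, not part of the LocalAgent API.

```python
def encode(text: str) -> list[int]:
    """One id per UTF-8 byte, so every id falls in the 256-entry vocabulary."""
    return list(text.encode("utf-8"))


def decode(ids: list[int]) -> str:
    # errors="replace" tolerates output that stops in the middle of a multi-byte character.
    return bytes(ids).decode("utf-8", errors="replace")


print(encode("Cusco"))                         # [67, 117, 115, 99, 111]
text = "Find a café in Cusco."
print(len(text), len(encode(text)))            # 21 22 ('é' takes two bytes)
print(decode(encode(text)))                    # Find a café in Cusco.
```

Non-ASCII characters cost more than one position, which matters for a model this small: every byte of a prompt counts against the context.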