Upload folder using huggingface_hub
- README.md +63 -0
- agent_heads.bin +3 -0
- config.json +18 -0
- model.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,63 @@
---
license: mit
library_name: pytorch
tags: [tool-calling, agent, tiny-llm, byte-level, on-device, from-scratch]
pipeline_tag: text-generation
---

# tiny-30m-byte: LocalAgent (28.32M params)

A **from-scratch, byte-level** tool-calling agent model from
[LocalAgent](https://github.com/sangbumchoi/localagent). Pure PyTorch, **28.32M params**,
trained on CPU. It pairs a tiny decoder (GQA + RoPE + SwiGLU) with a **dual head**
(tool-selection classifier + pointer/copy argument head) and **prompt-grounded constrained
decoding** for reliable tool calls across 21 tools (general assistant, the Claude Code /
Codex coding surface, and computer-use / productivity tools), including parallel two-call turns.

## Architecture
- vocab 256 (byte-level), d_model 512, 10 layers, 8 query heads / 2 KV heads (GQA), FFN hidden size 1408
- factorized embeddings: False
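
The 28.32M figure can be reproduced from the dimensions above. This is a minimal sketch, assuming bias-free linear layers, one RMSNorm-style weight vector per norm (two per layer plus a final one), and input/output embeddings that share one matrix (`tie_embeddings: true` in `config.json`). These layout details are assumptions, not taken from the LocalAgent source:

```python
# Back-of-the-envelope parameter count for tiny-30m-byte.
# Assumptions (not confirmed by the repo): no biases, two norm weight
# vectors per layer plus one final norm, tied input/output embeddings.
vocab, d_model, n_layers = 256, 512, 10
n_heads, n_kv_heads, ffn_hidden = 8, 2, 1408

head_dim = d_model // n_heads        # 64
kv_dim = n_kv_heads * head_dim       # 128: GQA shrinks the K and V projections

attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # Q and O, then K and V
ffn = 3 * d_model * ffn_hidden       # SwiGLU: gate, up, and down projections
norms = 2 * d_model                  # pre-attention and pre-FFN norms
per_layer = attn + ffn + norms

# Layers + tied embedding matrix + final norm
total = n_layers * per_layer + vocab * d_model + d_model
print(f"{total:,} params = {total / 1e6:.2f}M")  # 28,322,304 params = 28.32M
```

Under these assumptions, almost all of the parameters sit in the ten decoder layers; the byte-level embedding table contributes only 131,072.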

## Files
- `config.json`: `ModelConfig` fields
- `model.safetensors` / `pytorch_model.bin`: decoder weights (this upload includes `model.safetensors`)
- `agent_heads.bin`: trained tool-selection + pointer heads (optional)

## What it can do (use cases)
One byte-level model turns a natural-language request into a grounded tool call, covering a
general assistant, a coding agent, computer-use/productivity apps, and **parallel two-call** turns:

| You say | It calls |
|---|---|
| "What's the weather in Cusco?" | `get_weather(city="Cusco")` |
| "What is 19 * 19 * 5?" | `calculator(expression="19*19*5")` |
| "Open the file bin/run.sh." | `read_file(path="bin/run.sh")` |
| "Grep for 'TODO'." | `grep_search(pattern="TODO")` |
| "Run the tests." | `run_tests()` |
| "Commit with message 'fix bug'." | `git_commit(message="fix bug")` |
| "Send an email to Greta." | `send_email(recipient="Greta")` |
| "Go to figma.com." | `open_url(url="figma.com")` |
| "Send a Slack message saying 'ship it'." | `slack_send(message="ship it")` |
| "Create a Jira ticket titled 'broken link'." | `jira_issue(summary="broken link")` |
| "Compose an email to Judy **and** search for how tall is Everest." | `send_email(recipient="Judy")` + `web_search(query="how tall is Everest")` |
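
Most argument values in the table, such as `Cusco` or `fix bug`, appear verbatim in what the user typed. Here is a minimal sketch of what a pointer/copy head does at the byte level, assuming the head predicts a (start, end) byte span over the prompt. `copy_span` and the search-based span lookup are hypothetical illustrations, not LocalAgent APIs:

```python
def copy_span(prompt: str, start: int, end: int) -> str:
    """Return the argument value copied from prompt bytes [start, end)."""
    data = prompt.encode("utf-8")  # byte-level view: one position per byte
    return data[start:end].decode("utf-8")

prompt = "What's the weather in Cusco?"

# Stand-in for the pointer head's prediction: locate the span by search.
start = prompt.encode("utf-8").find(b"Cusco")
end = start + len(b"Cusco")

arg = copy_span(prompt, start, end)
call = f'get_weather(city="{arg}")'
print(call)  # get_weather(city="Cusco")
```

Under this assumption, the value is copied rather than generated byte by byte, so the call cannot name a city the user never typed.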

Multi-turn coding (a follow-up argument is grounded in an earlier tool response):
`read_file(tests/test_api.py)` → result → `run_tests()` → "FAILED…" → fix.

At catalog scale (100s–1000s of tools), tool selection is done by **retrieval** (top-k) instead of a
fixed classifier head. See the [LocalAgent repo](https://github.com/sangbumchoi/localagent).
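
The retrieval step can be pictured as scoring every tool description against the request and keeping the k best. The sketch below uses a toy byte-trigram similarity as a stand-in for whatever encoder the LocalAgent retriever actually uses. The `embed`, `cosine`, and `top_k_tools` helpers and the tool descriptions are all hypothetical:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: counts of lowercase byte trigrams (stand-in for a learned encoder)."""
    data = text.lower().encode("utf-8")
    return Counter(data[i:i + 3] for i in range(len(data) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k_tools(query: str, catalog: dict[str, str], k: int = 2) -> list[str]:
    """Rank tools by similarity between the query and each tool description."""
    q = embed(query)
    ranked = sorted(catalog, key=lambda name: cosine(q, embed(catalog[name])), reverse=True)
    return ranked[:k]

catalog = {  # hypothetical descriptions; the tool names come from the table above
    "get_weather": "get the weather forecast for a city",
    "git_commit": "commit staged changes with a message",
    "send_email": "send an email to a recipient",
}
shortlist = top_k_tools("What's the weather in Cusco?", catalog, k=2)
print(shortlist[0])  # get_weather
```

The shortlist then replaces the fixed set of classifier outputs, so the selection step stays cheap however large the catalog grows.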

## Load (pure PyTorch, no transformers)
```python
import json

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from localagent.model import LocalAgentLM, ModelConfig

REPO_ID = "danelcsb/localagent-tiny-30m-byte"

# config.json may carry keys that are not ModelConfig fields (e.g. "model_type"),
# so keep only the fields the dataclass declares.
with open(hf_hub_download(REPO_ID, "config.json")) as f:
    cfg_d = json.load(f)
cfg = ModelConfig(**{k: v for k, v in cfg_d.items() if k in ModelConfig.__dataclass_fields__})

model = LocalAgentLM(cfg)
model.load_state_dict(load_file(hf_hub_download(REPO_ID, "model.safetensors")))
model.eval()  # switch to inference mode
```
See the LocalAgent repo for the grounded decoder / agent runtime (tool head, pointer head,
retrieval, parallel-call decode).
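
With a 256-entry vocabulary there is no tokenizer file to ship: text can be mapped to ids through its UTF-8 bytes. This is a minimal sketch, assuming each token id is the raw byte value with no reserved special tokens (an assumption; check the LocalAgent repo for the actual encoding). It uses `max_seq_len` from `config.json`, and `encode` and `decode` are hypothetical helper names:

```python
MAX_SEQ_LEN = 1024  # max_seq_len in config.json

def encode(text: str, max_len: int = MAX_SEQ_LEN) -> list[int]:
    """UTF-8 bytes as token ids in [0, 255], keeping the last max_len bytes."""
    ids = list(text.encode("utf-8"))
    return ids[-max_len:]

def decode(ids: list[int]) -> str:
    """Inverse of encode; invalid byte sequences become U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")

ids = encode("Go to figma.com.")
print(len(ids), ids[:2])  # 16 [71, 111]
print(decode(ids))        # Go to figma.com.
```

Sequence length is counted in bytes here, so non-ASCII text uses up the 1024-position context faster than its character count suggests.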
agent_heads.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3ff11474f1e307741d924d6b3bda02903813d5a9d14652e450fbab8b1e8bbd5b
size 2179973
config.json
ADDED
@@ -0,0 +1,18 @@
{
  "model_type": "localagent",
  "architecture": "LocalAgentLM (byte-level GQA+RoPE+SwiGLU)",
  "name": "tiny-30m-byte",
  "vocab_size": 256,
  "d_model": 512,
  "embed_dim": 512,
  "n_layers": 10,
  "n_loops": 1,
  "n_heads": 8,
  "n_kv_heads": 2,
  "ffn_hidden": 1408,
  "max_seq_len": 1024,
  "rope_theta": 10000.0,
  "norm_eps": 1e-05,
  "tie_embeddings": true,
  "dropout": 0.0
}
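
The fields above also bound inference memory. This is a rough sketch of the key/value cache size at full context, assuming a conventional per-layer KV cache stored in fp32; the runtime's actual cache layout and dtype are not stated in this commit:

```python
# Values from config.json
d_model, n_layers, n_heads, n_kv_heads, max_seq_len = 512, 10, 8, 2, 1024
BYTES_PER_VALUE = 4  # fp32, an assumption

head_dim = d_model // n_heads  # 64

# One K and one V vector per layer, per KV head, per position.
kv_values = 2 * n_layers * n_kv_heads * head_dim * max_seq_len
kv_bytes = kv_values * BYTES_PER_VALUE
mha_bytes = kv_bytes * n_heads // n_kv_heads  # same model without GQA

print(f"GQA cache: {kv_bytes / 2**20:.1f} MiB")   # GQA cache: 10.0 MiB
print(f"MHA cache: {mha_bytes / 2**20:.1f} MiB")  # MHA cache: 40.0 MiB
```

With 2 KV heads shared across 8 query heads, the cache under these assumptions is a quarter of what full multi-head attention would need.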
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:40080f461c6c061150674f5dae2bf0079d2eb2d0812da4e503de0be4dc7af982
size 113297896
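
A Git LFS pointer records the SHA-256 digest and byte size of the real file, so a download can be checked against the pointers shown in this commit. This is a minimal sketch using only the standard library. `matches_pointer` is a hypothetical helper, and the demonstration runs on a throwaway temp file rather than the actual weights:

```python
import hashlib
import os
import tempfile

def matches_pointer(path: str, oid_sha256: str, size: int) -> bool:
    """Check a file against the oid and size fields of a Git LFS pointer."""
    if os.path.getsize(path) != size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest() == oid_sha256

# Demo on a temp file; for a real check, pass the downloaded model.safetensors
# together with the oid and size from its pointer.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
ok = matches_pointer(tmp.name, hashlib.sha256(b"hello").hexdigest(), 5)
bad = matches_pointer(tmp.name, "0" * 64, 5)
os.remove(tmp.name)
print(ok, bad)  # True False
```

For `model.safetensors`, the values to pass are the `oid` and `size` listed in its pointer: `40080f461c6c061150674f5dae2bf0079d2eb2d0812da4e503de0be4dc7af982` and `113297896`.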