	---
	license: mit
	library_name: pytorch
	tags: [tool-calling, agent, tiny-llm, byte-level, on-device, from-scratch]
	pipeline_tag: text-generation
	---

	# ultra-tiny-1m — LocalAgent (0.98M params)

	A from-scratch, byte-level tool-calling agent model from
	[LocalAgent](https://github.com/sangbumchoi/localagent): pure PyTorch, 0.98M parameters,
	trained on CPU. It pairs a tiny decoder (GQA + RoPE + SwiGLU + depth recurrence) with a dual head
	(a tool-selection classifier plus a pointer/copy argument head) and **prompt-grounded constrained
	decoding** to make tool calls reliable. It covers 21 tools (general assistant, the Claude Code /
	Codex coding surface, and computer-use / productivity tools) and supports parallel two-call turns.

	## Architecture
	- vocab 256 (byte-level), d_model 192, 2 layers × 6 loops (depth recurrence), 6 query heads / 2 KV heads (GQA), FFN width 640
	- factorized embeddings: True
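
The sizes above imply a few derived quantities that are easy to check by hand. The sketch below is only an illustration: `TinyShape` and its field names are hypothetical (they are not the repo's `ModelConfig`), it reads `heads 6/2` as 6 query heads sharing 2 key/value heads and `2 x6 loops` as 2 distinct blocks applied 6 times, and the weight count assumes bias-free Q/K/V/O projections plus a three-matrix SwiGLU, ignoring norms and embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyShape:
    """Hypothetical container for the listed sizes (not the repo's ModelConfig)."""
    vocab_size: int = 256   # one id per byte value
    d_model: int = 192
    n_layers: int = 2       # distinct decoder blocks
    n_loops: int = 6        # times the block stack is re-applied
    n_heads: int = 6        # query heads
    n_kv_heads: int = 2     # shared key/value heads (GQA)
    d_ff: int = 640         # SwiGLU hidden width

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_heads

    @property
    def queries_per_kv(self) -> int:
        return self.n_heads // self.n_kv_heads

    @property
    def effective_depth(self) -> int:
        return self.n_layers * self.n_loops

    def block_weights(self) -> int:
        """Bias-free weight count for one block (assumed layout, norms ignored)."""
        kv_width = self.n_kv_heads * self.head_dim
        attn = 2 * self.d_model * self.d_model + 2 * self.d_model * kv_width
        swiglu = 3 * self.d_model * self.d_ff
        return attn + swiglu


shape = TinyShape()
unique_block_weights = shape.n_layers * shape.block_weights()
prompt_ids = list("Cusco".encode("utf-8"))  # byte-level ids, each in [0, 255]
```

Under these assumptions the two distinct blocks hold 933,888 weights, which is roughly in line with the stated 0.98M total once embeddings and norms are counted. Looping reuses the same weights, so it adds depth (12 block applications) without adding parameters.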

	## Files
	- `config.json` — `ModelConfig`
	- `model.safetensors` / `pytorch_model.bin` — decoder weights
	- `agent_heads.bin` — trained tool-selection + pointer heads (optional)

	## What it can do (use cases)
	One byte-level model maps a natural-language request to a grounded tool call. It covers an
	assistant, a coding agent, computer-use/productivity apps, and parallel two-call turns:

	| you say | it calls |
	|---|---|
	| "What's the weather in Cusco?" | `get_weather(city="Cusco")` |
	| "What is 19 * 19 * 5?" | `calculator(expression="19*19*5")` |
	| "Open the file bin/run.sh." | `read_file(path="bin/run.sh")` |
	| "Grep for 'TODO'." | `grep_search(pattern="TODO")` |
	| "Run the tests." | `run_tests()` |
	| "Commit with message 'fix bug'." | `git_commit(message="fix bug")` |
	| "Send an email to Greta." | `send_email(recipient="Greta")` |
	| "Go to figma.com." | `open_url(url="figma.com")` |
	| "Send a Slack message saying 'ship it'." | `slack_send(message="ship it")` |
	| "Create a Jira ticket titled 'broken link'." | `jira_issue(summary="broken link")` |
	| "Compose an email to Judy and search for how tall is Everest." | `send_email(recipient="Judy")` + `web_search(query="how tall is Everest")` |
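
In most rows above, each argument value is a span copied verbatim from the prompt. That is what a pointer/copy head with prompt-grounded decoding would produce. The sketch below illustrates that property only: `pointer_span` and `is_grounded` are hypothetical helpers, not functions from the LocalAgent repo, and they check grounding after the fact rather than constraining a decoder.

```python
from typing import Optional


def pointer_span(prompt: str, value: str) -> Optional[tuple[int, int]]:
    """Return (start, end) byte offsets of value inside prompt, or None."""
    haystack = prompt.encode("utf-8")
    needle = value.encode("utf-8")
    if not needle:
        return None
    start = haystack.find(needle)
    return None if start < 0 else (start, start + len(needle))


def is_grounded(prompt: str, calls: list[tuple[str, dict[str, str]]]) -> bool:
    """True when every argument value of every call is copied from the prompt."""
    return all(
        pointer_span(prompt, value) is not None
        for _name, args in calls
        for value in args.values()
    )


# Parallel two-call turn taken from the last table row.
prompt = "Compose an email to Judy and search for how tall is Everest."
calls = [
    ("send_email", {"recipient": "Judy"}),
    ("web_search", {"query": "how tall is Everest"}),
]
grounded = is_grounded(prompt, calls)
```

Representing an argument as a `(start, end)` pair over the prompt bytes means the model can only emit text the user actually typed, which is one way to rule out hallucinated argument values.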

	Multi-turn coding (a follow-up argument is grounded in a tool response):
	`read_file(tests/test_api.py)` → result → `run_tests()` → "FAILED…" → fix.

	At catalog scale (100s–1000s of tools), selection is done by top-k retrieval instead of a
	fixed head. See the [LocalAgent repo](https://github.com/sangbumchoi/localagent).
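
To make the top-k retrieval idea concrete, here is a minimal sketch that ranks tools by byte-trigram overlap between the request and each tool's name and description. The scoring function, the `top_k_tools` helper, and the tool descriptions are all assumptions made for illustration; the retriever in the LocalAgent repo may work quite differently.

```python
import heapq


def byte_trigrams(text: str) -> set[bytes]:
    """Set of overlapping 3-byte windows over the lowercased UTF-8 text."""
    data = text.lower().encode("utf-8")
    return {data[i:i + 3] for i in range(len(data) - 2)}


def top_k_tools(query: str, catalog: dict[str, str], k: int = 3) -> list[str]:
    """Return the k tool names whose text best overlaps the query (Jaccard)."""
    q = byte_trigrams(query)

    def score(item: tuple[str, str]) -> float:
        name, description = item
        t = byte_trigrams(name + " " + description)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    ranked = heapq.nlargest(k, catalog.items(), key=score)
    return [name for name, _ in ranked]


# Hypothetical descriptions; the real catalog text is not given in this card.
catalog = {
    "get_weather": "Get the current weather for a city",
    "grep_search": "Search files for a text pattern",
    "send_email": "Send an email to a recipient",
    "run_tests": "Run the test suite",
}
candidates = top_k_tools("What's the weather in Cusco?", catalog, k=2)
```

Only the shortlisted candidates would then need to be scored by the model, so the cost of selection stays flat as the catalog grows.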

	## Load (pure PyTorch, no transformers)
	```python
	import json

	import torch
	from huggingface_hub import hf_hub_download
	from safetensors.torch import load_file

	from localagent.model import LocalAgentLM, ModelConfig

	REPO_ID = "danelcsb/localagent-ultra-tiny-1m"

	# Build the config, keeping only the keys that ModelConfig declares.
	with open(hf_hub_download(REPO_ID, "config.json")) as f:
	    cfg_d = json.load(f)
	cfg = ModelConfig(**{k: v for k, v in cfg_d.items() if k in ModelConfig.__dataclass_fields__})

	# Load the decoder weights and switch to inference mode.
	model = LocalAgentLM(cfg)
	model.load_state_dict(load_file(hf_hub_download(REPO_ID, "model.safetensors")))
	model.eval()
	```
	See the LocalAgent repo for the grounded decoder / agent runtime (tool head, pointer head,
	retrieval, parallel-call decode).