license: mit
library_name: pytorch
tags:
  - tool-calling
  - agent
  - tiny-llm
  - byte-level
  - on-device
  - from-scratch
pipeline_tag: text-generation

ultra-tiny-1m — LocalAgent (0.98M params)

A from-scratch, byte-level tool-calling agent model from LocalAgent. Pure PyTorch, 0.98M params, trained on CPU. It pairs a tiny decoder (GQA + RoPE + SwiGLU + depth-recurrence) with a dual head (tool-selection classifier + pointer/copy argument head) and prompt-grounded constrained decoding for reliable tool calls across 21 tools (general assistant, the Claude Code / Codex coding surface, and computer-use / productivity tools), including parallel two-call turns.
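The pointer/copy idea can be illustrated with a small sketch: instead of generating an argument byte by byte, the call refers to a byte span of the prompt and copies it. This is only an illustration under stated assumptions, not the LocalAgent implementation; `ground_argument`, `copy_span`, and `render_call` are hypothetical helper names, and a trained pointer head would predict the span rather than find it by string search.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # (start, end) byte offsets into the UTF-8 prompt


def ground_argument(prompt: str, value: str) -> Optional[Span]:
    """Return the byte span of `value` inside `prompt`, or None if absent.

    Hypothetical stand-in for a pointer head: a real model predicts
    the start/end offsets instead of searching for a known value.
    """
    prompt_bytes = prompt.encode("utf-8")
    value_bytes = value.encode("utf-8")
    start = prompt_bytes.find(value_bytes)
    if start < 0:
        return None
    return start, start + len(value_bytes)


def copy_span(prompt: str, span: Span) -> str:
    """Copy the bytes in `span` out of the prompt and decode them."""
    start, end = span
    return prompt.encode("utf-8")[start:end].decode("utf-8")


def render_call(tool: str, arg_name: str, prompt: str, span: Span) -> str:
    """Format a tool call whose argument is copied from the prompt."""
    return f'{tool}({arg_name}="{copy_span(prompt, span)}")'


prompt = "What's the weather in Cusco?"
span = ground_argument(prompt, "Cusco")
print(span)
print(render_call("get_weather", "city", prompt, span))
```

Because the argument is copied rather than generated, it can only ever be text that actually appears in the prompt, which is what makes the call grounded.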

Architecture

vocab 256 (byte-level), d_model 192, 2 layers x 6 loops (depth recurrence), heads 6/2 (GQA: 6 query heads, 2 KV heads), ffn 640
factorized embeddings: True
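As a rough sanity check on these numbers, the sketch below derives the head size, the effective depth, and an approximate parameter count for the decoder blocks. It assumes "6/2" means 6 query heads and 2 KV heads, bias-free projections, and a three-matrix SwiGLU feed-forward; the class `TinyDecoderShape` and its field names are hypothetical and are not taken from `ModelConfig`. Embeddings, norms, and the agent heads are left out.

```python
from dataclasses import dataclass


@dataclass
class TinyDecoderShape:
    """Hypothetical container for the numbers listed above."""
    vocab_size: int = 256
    d_model: int = 192
    n_layers: int = 2      # unique blocks
    n_loops: int = 6       # times the blocks are re-applied
    n_q_heads: int = 6     # assumption: first number in "6/2"
    n_kv_heads: int = 2    # assumption: second number in "6/2"
    ffn_dim: int = 640

    @property
    def head_dim(self) -> int:
        return self.d_model // self.n_q_heads

    @property
    def effective_depth(self) -> int:
        # Depth recurrence reuses weights, so depth grows but params do not.
        return self.n_layers * self.n_loops

    @property
    def queries_per_kv_head(self) -> int:
        return self.n_q_heads // self.n_kv_heads

    def attention_params(self) -> int:
        kv_dim = self.n_kv_heads * self.head_dim
        # Q and output projections are d_model x d_model; K and V are narrower.
        return 2 * self.d_model * self.d_model + 2 * self.d_model * kv_dim

    def ffn_params(self) -> int:
        # SwiGLU uses gate, up, and down projections.
        return 3 * self.d_model * self.ffn_dim

    def block_params(self) -> int:
        return self.n_layers * (self.attention_params() + self.ffn_params())


shape = TinyDecoderShape()
print(shape.head_dim, shape.effective_depth, shape.queries_per_kv_head)
print(shape.block_params())
```

Under these assumptions the two unique blocks come to 933,888 parameters, which is in line with the stated 0.98M total once embeddings and norms are added.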

Files

config.json — ModelConfig
model.safetensors / pytorch_model.bin — decoder weights
agent_heads.bin — trained tool-selection + pointer heads (optional)

What it can do (use cases)

One byte-level model turns a natural-language request into a grounded tool call. It covers general assistant tasks, a coding agent, computer-use/productivity apps, and parallel two-call turns:

you say	it calls
"What's the weather in Cusco?"	`get_weather(city="Cusco")`
"What is 19 * 19 * 5?"	`calculator(expression="19*19*5")`
"Open the file bin/run.sh."	`read_file(path="bin/run.sh")`
"Grep for 'TODO'."	`grep_search(pattern="TODO")`
"Run the tests."	`run_tests()`
"Commit with message 'fix bug'."	`git_commit(message="fix bug")`
"Send an email to Greta."	`send_email(recipient="Greta")`
"Go to figma.com."	`open_url(url="figma.com")`
"Send a Slack message saying 'ship it'."	`slack_send(message="ship it")`
"Create a Jira ticket titled 'broken link'."	`jira_issue(summary="broken link")`
"Compose an email to Judy and search for how tall is Everest."	`send_email(recipient="Judy")` + `web_search(query="how tall is Everest")`

Multi-turn coding: the model grounds a follow-up argument in an earlier tool response, for example read_file(tests/test_api.py) → result → run_tests() → "FAILED…" → fix.

At catalog scale (100s–1000s of tools), tool selection is done by top-k retrieval instead of a fixed classifier head. See the LocalAgent repo.
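Top-k retrieval over a tool catalog can be sketched as follows. This is a toy example and not the LocalAgent retriever: it scores tools by cosine similarity over byte-trigram counts, whereas a real system would more likely use learned embeddings. The catalog descriptions and the function names are made up for illustration.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List


def trigram_counts(text: str) -> Counter:
    """Count overlapping byte trigrams of the lowercased text."""
    data = text.lower().encode("utf-8")
    return Counter(data[i:i + 3] for i in range(len(data) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_tools(query: str, catalog: Dict[str, str], k: int = 2) -> List[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    q = trigram_counts(query)
    scored = ((cosine(q, trigram_counts(desc)), name) for name, desc in catalog.items())
    return [name for _, name in heapq.nlargest(k, scored)]


# Hypothetical catalog; the real tool descriptions are not given in this card.
catalog = {
    "get_weather": "get the weather forecast for a city",
    "run_tests": "run the project test suite",
    "send_email": "send an email to a recipient",
}
print(top_k_tools("What's the weather in Cusco?", catalog, k=1))
```

Only the shortlisted tools then need to be considered by the decoder, so the cost of selection no longer grows with a fixed-size classifier head.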

Load (pure PyTorch, no transformers)

import json

import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from localagent.model import LocalAgentLM, ModelConfig

REPO_ID = "danelcsb/localagent-ultra-tiny-1m"

# Read config.json and keep only the keys that ModelConfig declares.
with open(hf_hub_download(REPO_ID, "config.json")) as f:
    cfg_d = json.load(f)
cfg = ModelConfig(**{k: v for k, v in cfg_d.items() if k in ModelConfig.__dataclass_fields__})

# Build the decoder, load the safetensors weights, and switch to inference mode.
model = LocalAgentLM(cfg)
model.load_state_dict(load_file(hf_hub_download(REPO_ID, "model.safetensors")))
model.eval()

See the LocalAgent repo for the grounded decoder / agent runtime (tool head, pointer head, retrieval, parallel-call decode).
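For readers who want a feel for constrained decoding before opening the repo, here is a minimal sketch. It assumes tool names are decoded byte by byte and that each step may only pick a byte that keeps the output a prefix of some known tool name. The trie layout, the "(" terminator, and the toy scorer are all assumptions made for illustration; none of this is the LocalAgent runtime.

```python
from typing import Callable, Dict, List

Trie = Dict[int, "Trie"]


def build_trie(names: List[str]) -> Trie:
    """Build a byte trie over tool names, each terminated by '('."""
    root: Trie = {}
    for name in names:
        node = root
        for b in (name + "(").encode("utf-8"):
            node = node.setdefault(b, {})
    return root


def constrained_greedy(score_fn: Callable[[bytes], List[float]], trie: Trie) -> str:
    """Greedy decode, choosing at each step the best-scoring byte the trie allows."""
    node, out = trie, bytearray()
    while node:  # an empty node means the terminator was emitted
        scores = score_fn(bytes(out))
        best = max(node, key=lambda b: scores[b])
        out.append(best)
        node = node[best]
    return out[:-1].decode("utf-8")  # drop the '(' terminator


def make_toy_scorer(guess: str) -> Callable[[bytes], List[float]]:
    """Toy stand-in for a model: it always prefers the next byte of `guess`."""
    guess_bytes = guess.encode("utf-8")

    def score(prefix: bytes) -> List[float]:
        scores = [0.0] * 256
        if len(prefix) < len(guess_bytes):
            scores[guess_bytes[len(prefix)]] = 1.0
        return scores

    return score


trie = build_trie(["read_file", "run_tests", "grep_search"])
# The toy scorer "wants" an invalid name; the trie forces a valid one.
print(constrained_greedy(make_toy_scorer("read_fil3("), trie))
```

Even when the scorer prefers an invalid byte, the decoder can only emit names that exist in the catalog, which is the property that makes small models usable for tool calls.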