---
license: apache-2.0
language: [en]
tags:
- structured-action-model
- json-generation
- text-to-json
- agentic-ai
- function-calling
- tool-use
- iot
- robotics
- workflow-automation
- sparse-transformer
- on-device
- edge-ai
pipeline_tag: text-generation
inference: false
library_name: pytorch
---
# SAM — Structured Action Model
**SAM** is a compact (33.4M params, ~127.4 MiB in FP32)
schema-conditioned model that turns natural language into structured JSON actions
across **10 domains**: robotics, HTTP/REST, MQTT/IoT, databases, workflows,
e-commerce, vehicles, smart home, calendar/email, and filesystem.
Built by **AMEFORGE** on the in-house **SparseMind** architecture.
> **SAM is the successor to [Foros](https://huggingface.co/AMEFORGE/foros-v5.3).**
> Where Foros specialized in robotics ROS-JSON, SAM generalizes the approach to
> the full agentic / workflow stack while preserving the SparseMind architecture.
---
## TL;DR
The cheap path to reliable JSON for agentic systems:
| | Today (LLM API) | With SAM |
|---|---|---|
| **Output reliability** | broken JSON → retry loop | schema-conditioned generation + atomic-numeric tokenizer |
| **Latency** | 500–3000 ms | ~30–200 ms (CPU) |
| **Cost / 1M calls** | $$$$ | $0 in API fees (runs offline) |
| **Deployment** | API key, cloud, privacy concerns | runs on Jetson, Pi, laptop CPU |
---
## Benchmark
SAM is to be evaluated on **SAM Bench v1**: 200 prompts covering all 10 domains across
5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain).
*(Benchmark not yet run. After training, execute `python sam_benchmark.py` to populate this section.)*
> The benchmark is designed to be reproducible: see [`sam_benchmark.py`](./sam_benchmark.py)
> or, once published, the [`AMEFORGE/sam-bench`](https://huggingface.co/datasets/AMEFORGE/sam-bench)
> dataset.
---
## Input format (schema-conditioned)
```
<SCHEMA>{...JSON Schema...}</SCHEMA> <DOMAIN_TAG> <TASK>natural language</TASK> =>
```
Output: a JSON array of operations conforming to the schema.
### Domain tags
`<ROS>` `<HTTP>` `<MQTT>` `<DB>` `<WORKFLOW>` `<ECOMMERCE>` `<VEHICLE>` `<HOME>` `<CAL>` `<FILE>`
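A prompt in this layout can be assembled programmatically. The sketch below uses a hypothetical helper, `build_prompt`, which is not part of `sam_runtime.py`. It assumes single spaces between segments, as in the format line above; the examples below instead place the domain tag directly before `<TASK>`, so check which spacing the tokenizer actually expects.

```python
import json
from typing import Optional

# Domain tags listed in this card
DOMAIN_TAGS = {"ROS", "HTTP", "MQTT", "DB", "WORKFLOW",
               "ECOMMERCE", "VEHICLE", "HOME", "CAL", "FILE"}


def build_prompt(task: str, domain: str, schema: Optional[dict] = None) -> str:
    """Assemble <SCHEMA>...</SCHEMA> <DOMAIN_TAG> <TASK>...</TASK> =>."""
    if domain not in DOMAIN_TAGS:
        raise ValueError(f"unknown domain tag: {domain!r}")
    parts = []
    if schema is not None:
        # Compact separators keep the schema short within the 1024-token context
        parts.append("<SCHEMA>" + json.dumps(schema, separators=(",", ":")) + "</SCHEMA>")
    parts += [f"<{domain}>", f"<TASK>{task}</TASK>", "=>"]
    # Single spaces between segments follow the format line (an assumption)
    return " ".join(parts)


print(build_prompt("get user 42", "HTTP", {"type": "array"}))
# -> <SCHEMA>{"type":"array"}</SCHEMA> <HTTP> <TASK>get user 42</TASK> =>
```

Omitting `schema` drops the `<SCHEMA>` segment entirely, which matches the schema-less examples below.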
### Examples
| Input | Output |
|---|---|
| `<ROS><TASK>move to x=0.5 y=-1.2 z=0.8</TASK> =>` | `[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]` |
| `<HTTP><TASK>get user 42</TASK> =>` | `[{"op":"http_request","method":"GET","url":"/users/42"}]` |
| `<MQTT><TASK>publish temp 22 to home/livingroom/temp qos 1</TASK> =>` | `[{"op":"mqtt_publish","topic":"home/livingroom/temp","payload":{"value":22,"unit":"celsius"},"qos":1}]` |
| `<HOME><TASK>turn on bedroom light at 50% blue</TASK> =>` | `[{"op":"set_light","room":"bedroom","brightness":50,"color":"blue"}]` |
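Code that consumes these outputs will usually want to check their shape before dispatching anything. Below is a minimal standard-library sketch. It assumes every operation is an object with a string `"op"` field, which holds for all the examples above but is not a documented guarantee. `parse_ops` is a hypothetical helper, not part of `sam_runtime.py`, and it does not validate against the full JSON Schema.

```python
import json


def parse_ops(raw: str) -> list:
    """Parse model output and check the array-of-operations shape."""
    ops = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(ops, list):
        raise ValueError("expected a JSON array of operations")
    for i, op in enumerate(ops):
        # Assumption: each operation is an object with a string "op" field
        if not isinstance(op, dict) or not isinstance(op.get("op"), str):
            raise ValueError(f"operation {i} has no string 'op' field")
    return ops


ops = parse_ops('[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]')
print(ops[0]["op"], ops[0]["x"])  # -> move 0.5
```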
---
## Highlights
| Property | Value |
|---|---|
| Architecture | SparseMind (decoder-only) |
| Parameters | 33,400,324 (~33.4M) |
| Size | ~127.4 MiB (FP32) / ~31.9 MiB (INT8) |
| Context length | 1024 tokens |
| Tokenizer | [`AMEFORGE/sam_tokenizer`](https://huggingface.co/AMEFORGE/sam_tokenizer) (NexusBPE) |
| Precision | FP32 (INT8 quantization compatible) |
| Domains | 10 (robotics, HTTP, MQTT, DB, workflow, e-commerce, vehicle, home, calendar, file) |
| Deployment | CPU, GPU, edge (Jetson, Raspberry Pi) |
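The size figures follow from the parameter count at 4 bytes per weight for FP32 and 1 byte for INT8. They match when read as binary mebibytes (MiB, 2^20 bytes). Here is a quick check. It assumes the files hold only the raw weights and ignores checkpoint metadata and any tensors kept at higher precision after quantization.

```python
PARAMS = 33_400_324  # parameter count from the table above


def size_mib(bytes_per_param: int) -> float:
    """Raw weight size in MiB (2**20 bytes)."""
    return PARAMS * bytes_per_param / 2**20


def size_mb(bytes_per_param: int) -> float:
    """Raw weight size in decimal MB (10**6 bytes)."""
    return PARAMS * bytes_per_param / 10**6


print(f"FP32: {size_mib(4):.1f} MiB ({size_mb(4):.1f} MB)")  # -> FP32: 127.4 MiB (133.6 MB)
print(f"INT8: {size_mib(1):.1f} MiB ({size_mb(1):.1f} MB)")  # -> INT8: 31.9 MiB (33.4 MB)
```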
---
## Quick inference
Use the `sam_runtime.py` SDK for a clean inference path with optional
constrained decoding:
```python
from sam_runtime import SAM

sam = SAM.from_hub("AMEFORGE/sam-v1")  # downloads weights + tokenizer
result = sam.generate(
    task="get user 42 from api.example.com",
    domain="HTTP",
    schema={"type": "array"},
    mode="guarded",  # JSON-validated decoding
)
print(result["ops"])
# -> [{"op":"http_request","method":"GET","url":"https://api.example.com/users/42"}]
```
SAM also exposes an OpenAI-compatible tool-calling interface, intended as a drop-in replacement:
```python
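result = sam.tool_call(
    tools=[{
        # Illustrative OpenAI-style function tool; "get_user" is a hypothetical example
        "type": "function",
        "function": {"name": "get_user", "parameters": {"type": "object", "properties": {"id": {"type": "integer"}}}},
    }],
    messages=[{"role": "user", "content": "get me user 42"}],
)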
```
---
## Training
SAM was trained on a **large, deterministic multi-domain corpus** assembled
in-house at AMEFORGE. The corpus covers all 10 supported domains across
5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain),
with paraphrase variation, robustness augmentation, and schema conditioning.
Training was performed on a single GPU using a custom optimizer setup tailored
to the SparseMind architecture. Full training methodology and the dataset
construction pipeline are kept internal as part of AMEFORGE's IP.
---
## Limitations
- English-only. Multilingual extension is future work.
- Schema-conditioned: best results when a JSON Schema is provided in the prompt.
- Domain set is fixed at 10. New domains require fine-tuning or retraining.
- Numeric atomicity (each number encoded as a single token) is guaranteed only within
the production-relevant range for each domain. Values outside those ranges fall back to subword encoding.
- Not a chat model — single-turn, structured action generation only.
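The numeric-atomicity limitation can be pictured with a toy tokenizer. A number inside the supported range maps to a single token, and a number outside it is broken into smaller pieces. Everything in the sketch is hypothetical: the ±1000 range, the `<NUM:…>` token spelling, and the per-character fallback that stands in for real subword merges. None of it reflects NexusBPE's actual vocabulary.

```python
# Hypothetical range; the real per-domain ranges are not published in this card
ATOMIC_MIN, ATOMIC_MAX = -1000.0, 1000.0


def tokenize_number(text: str) -> list:
    """Toy illustration: one token in range, character pieces outside it."""
    value = float(text)
    if ATOMIC_MIN <= value <= ATOMIC_MAX:
        # In range: the whole literal becomes a single (hypothetical) token
        return [f"<NUM:{text}>"]
    # Out of range: split into characters as a stand-in for subword fallback
    return list(text)


print(tokenize_number("-1.2"))   # -> ['<NUM:-1.2>']
print(tokenize_number("25000"))  # -> ['2', '5', '0', '0', '0']
```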
---
## Citation
```bibtex
@misc{sam_2026,
title = {SAM: A Compact Schema-Conditioned Structured Action Model
for Agentic AI},
author = {AMEFORGE},
year = {2026},
note   = {Built on the SparseMind architecture.
https://huggingface.co/AMEFORGE/sam-v1}
}
```
---
Made by **AMEFORGE** — https://huggingface.co/AMEFORGE