---
license: apache-2.0
language: [en]
tags:
- structured-action-model
- json-generation
- text-to-json
- agentic-ai
- function-calling
- tool-use
- iot
- robotics
- workflow-automation
- sparse-transformer
- on-device
- edge-ai
pipeline_tag: text-generation
inference: false
library_name: pytorch
---

# SAM — Structured Action Model

**SAM** is a compact (33.4M params, ~127.4 MB FP32)
schema-conditioned model that turns natural language into structured JSON actions
across **10 domains**: robotics, HTTP/REST, MQTT/IoT, databases, workflows,
e-commerce, vehicles, smart home, calendar/email, and filesystem.

Built by **AMEFORGE** on the in-house **SparseMind** architecture.

> **SAM is the successor to [Foros](https://huggingface.co/AMEFORGE/foros-v5.3).**
> Where Foros specialized in robotics ROS-JSON, SAM generalizes the approach to
> the full agentic / workflow stack while preserving the SparseMind architecture.

---

## TL;DR

The cheap path to reliable JSON for agentic systems:

| Aspect | Today (LLM API) | With SAM |
|---|---|---|
| **Output reliability** | broken JSON → retry loop | atomic-numeric tokenizer + schema-conditioned generation |
| **Latency** | 500–3000 ms | ~30–200 ms (CPU) |
| **Cost / 1M calls** | $$$$ | $0 (offline) |
| **Deployment** | API key, cloud, privacy concerns | runs on Jetson, Pi, laptop CPU |

---

## Benchmark

The evaluation suite is **SAM Bench v1**: 200 prompts covering all 10 domains across
5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain).

*(Benchmark not yet run. After training, execute `python sam_benchmark.py` to populate this section.)*



> Benchmark is fully reproducible — see [`sam_benchmark.py`](./sam_benchmark.py)
> or the [`AMEFORGE/sam-bench`](https://huggingface.co/datasets/AMEFORGE/sam-bench)
> dataset if published.

---

## Input format (schema-conditioned)

```
<SCHEMA>{...JSON Schema...}</SCHEMA> <DOMAIN_TAG> <TASK>natural language</TASK> =>
```

Output: a JSON array of operations conforming to the schema.

### Domain tags

`<ROS>` `<HTTP>` `<MQTT>` `<DB>` `<WORKFLOW>` `<ECOMMERCE>` `<VEHICLE>` `<HOME>` `<CAL>` `<FILE>`
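
As an illustration of the input format above, a prompt can be assembled from a task, a domain tag, and an optional schema. This is a minimal sketch, not part of the SAM SDK: `build_prompt` and `DOMAIN_TAGS` are hypothetical names, and the exact whitespace and JSON serialization the model expects are assumptions taken from the template line.

```python
import json

# Domain tags as listed in this model card.
DOMAIN_TAGS = {
    "ROS", "HTTP", "MQTT", "DB", "WORKFLOW",
    "ECOMMERCE", "VEHICLE", "HOME", "CAL", "FILE",
}


def build_prompt(task, domain, schema=None):
    """Assemble a schema-conditioned prompt string (hypothetical helper).

    Assumption: parts are joined by single spaces, as in the template
    `<SCHEMA>...</SCHEMA> <DOMAIN_TAG> <TASK>...</TASK> =>`.
    """
    if domain not in DOMAIN_TAGS:
        raise ValueError(f"unknown domain tag: {domain!r}")
    parts = []
    if schema is not None:
        # Compact serialization is an assumption, not a documented requirement.
        parts.append("<SCHEMA>" + json.dumps(schema, separators=(",", ":")) + "</SCHEMA>")
    parts.append(f"<{domain}>")
    parts.append(f"<TASK>{task}</TASK>")
    parts.append("=>")
    return " ".join(parts)


print(build_prompt("get user 42", "HTTP", {"type": "array"}))
# <SCHEMA>{"type":"array"}</SCHEMA> <HTTP> <TASK>get user 42</TASK> =>
```

Rejecting unknown tags early keeps malformed prompts from reaching the model, since the domain set is fixed.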

### Examples

| Input | Output |
|---|---|
| `<ROS><TASK>move to x=0.5 y=-1.2 z=0.8</TASK> =>` | `[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]` |
| `<HTTP><TASK>get user 42</TASK> =>` | `[{"op":"http_request","method":"GET","url":"/users/42"}]` |
| `<MQTT><TASK>publish temp 22 to home/livingroom/temp qos 1</TASK> =>` | `[{"op":"mqtt_publish","topic":"home/livingroom/temp","payload":{"value":22,"unit":"celsius"},"qos":1}]` |
| `<HOME><TASK>turn on bedroom light at 50% blue</TASK> =>` | `[{"op":"set_light","room":"bedroom","brightness":50,"color":"blue"}]` |

---

## Highlights

| Property | Value |
|---|---|
| Architecture | SparseMind (decoder-only) |
| Parameters | 33,400,324 (~33.4M) |
| Size (FP32) | ~127.4 MB (~31.9 MB INT8) |
| Context length | 1024 tokens |
| Tokenizer | [`AMEFORGE/sam_tokenizer`](https://huggingface.co/AMEFORGE/sam_tokenizer) (NexusBPE) |
| Precision | FP32 (INT8 quantization compatible) |
| Domains | 10 (robotics, HTTP, MQTT, DB, workflow, e-commerce, vehicle, home, calendar, file) |
| Deployment | CPU, GPU, edge (Jetson, Raspberry Pi) |
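
The size figures in the table follow from the parameter count. The short calculation below is a sketch that assumes weights only (4 bytes per parameter for FP32, 1 byte for INT8) and ignores any quantization metadata or file-format overhead. The stated figures match binary megabytes (MiB); in decimal megabytes the FP32 weights would be about 133.6 MB.

```python
PARAMS = 33_400_324   # parameter count from the table above
MIB = 1024 ** 2       # bytes per binary megabyte


def weights_size_mib(params, bytes_per_param):
    """Size of the raw weights in MiB, ignoring any file-format overhead."""
    return params * bytes_per_param / MIB


fp32_mib = weights_size_mib(PARAMS, 4)
int8_mib = weights_size_mib(PARAMS, 1)

print(f"FP32: {fp32_mib:.1f} MiB")  # FP32: 127.4 MiB
print(f"INT8: {int8_mib:.1f} MiB")  # INT8: 31.9 MiB
```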

---

## Quick inference

Use the `sam_runtime.py` SDK for a clean inference path with optional
constrained decoding:

```python
from sam_runtime import SAM

sam = SAM.from_hub("AMEFORGE/sam-v1")   # downloads weights + tokenizer

result = sam.generate(
    task="get user 42 from api.example.com",
    domain="HTTP",
    schema={"type": "array"},
    mode="guarded",                   # JSON-validated decoding
)

print(result["ops"])
# -> [{"op":"http_request","method":"GET","url":"https://api.example.com/users/42"}]
```

For OpenAI-style tool calling, `sam.tool_call` is intended as a drop-in replacement:

```python
result = sam.tool_call(
    tools=[{...openai-style tool spec...}],
    messages=[{"role": "user", "content": "get me user 42"}],
)
```

---

## Training

SAM was trained on a **large, deterministic multi-domain corpus** assembled
in-house at AMEFORGE. The corpus covers all 10 supported domains across
5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain),
with paraphrase variation, robustness augmentation, and schema conditioning.

Training was performed on a single GPU using a custom optimizer setup tailored
to the SparseMind architecture. Full training methodology and the dataset
construction pipeline are kept internal as part of AMEFORGE's IP.

---

## Limitations

- English-only. Multilingual extension is future work.
- Schema-conditioned: best results when a JSON Schema is provided in the prompt.
- Domain set is fixed at 10. New domains require fine-tuning or retraining.
- Numeric atomicity is guaranteed within the production-relevant ranges for
  each domain. Values outside those ranges fall back to subword encoding.
- Not a chat model — single-turn, structured action generation only.

---

## Citation

```bibtex
@misc{sam_2026,
  title  = {SAM: A Compact Schema-Conditioned Structured Action Model
            for Agentic AI},
  author = {AMEFORGE},
  year   = {2026},
  note   = {Built on the SparseMind architecture.
            https://huggingface.co/AMEFORGE/sam-v1}
}
```

---

Made by **AMEFORGE** — https://huggingface.co/AMEFORGE