---
license: apache-2.0
language: [en]
tags:
- structured-action-model
- json-generation
- text-to-json
- agentic-ai
- function-calling
- tool-use
- iot
- robotics
- workflow-automation
- sparse-transformer
- on-device
- edge-ai
pipeline_tag: text-generation
inference: false
library_name: pytorch
---
# SAM — Structured Action Model
**SAM** is a compact (33.4M params, ~127.4 MB FP32)
schema-conditioned model that turns natural language into structured JSON actions
across **10 domains**: robotics, HTTP/REST, MQTT/IoT, databases, workflows,
e-commerce, vehicles, smart home, calendar/email, and filesystem.
Built by **AMEFORGE** on the in-house **SparseMind** architecture.
> **SAM is the successor to [Foros](https://huggingface.co/AMEFORGE/foros-v5.3).**
> Where Foros specialized in robotics ROS-JSON, SAM generalizes the approach to
> the full agentic / workflow stack while preserving the SparseMind architecture.
---
## TL;DR
The cheap path to reliable JSON for agentic systems:
| | Today (LLM API) | With SAM |
|---|---|---|
| **Output reliability** | broken JSON → retry loop | atomic-numeric tokenizer + schema-conditioned generation |
| **Latency** | 500–3000 ms | ~30–200 ms (CPU) |
| **Cost / 1M calls** | $$$$ | $0 (offline) |
| **Deployment** | API key, cloud, privacy concerns | runs on Jetson, Pi, laptop CPU |
---
## Benchmark
The evaluation suite is **SAM Bench v1**: 200 prompts covering all 10 domains across
5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain).
*(Benchmark not yet run. After training, execute `python sam_benchmark.py` to populate this section.)*
> The benchmark is fully reproducible: see [`sam_benchmark.py`](./sam_benchmark.py)
> or the [`AMEFORGE/sam-bench`](https://huggingface.co/datasets/AMEFORGE/sam-bench)
> dataset, if published.
---
## Input format (schema-conditioned)
```
{...JSON Schema...} natural language =>
```
Output: a JSON array of operations conforming to the schema.
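
As a rough illustration of the format above, the prompt can be assembled by serializing the schema and appending the instruction and the `=>` terminator. This is a minimal sketch: `build_prompt` is a hypothetical helper, and the compact JSON serialization and single-space separator are assumptions, since this card does not specify exact whitespace.

```python
import json


def build_prompt(schema: dict, instruction: str) -> str:
    """Assemble a schema-conditioned prompt: '<schema> <instruction> =>'.

    Hypothetical helper; the separators and whitespace are assumptions.
    """
    # Compact serialization keeps the schema short within the 1024-token context.
    schema_text = json.dumps(schema, separators=(",", ":"))
    return f"{schema_text} {instruction.strip()} =>"


prompt = build_prompt({"type": "array"}, "get user 42")
print(prompt)
# -> {"type":"array"} get user 42 =>
```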
### Domain tags
`` `` `` `` `` `` `` `` `` ``
### Examples
| Input | Output |
|---|---|
| `move to x=0.5 y=-1.2 z=0.8 =>` | `[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]` |
| `get user 42 =>` | `[{"op":"http_request","method":"GET","url":"/users/42"}]` |
| `publish temp 22 to home/livingroom/temp qos 1 =>` | `[{"op":"mqtt_publish","topic":"home/livingroom/temp","payload":{"value":22,"unit":"celsius"},"qos":1}]` |
| `turn on bedroom light at 50% blue =>` | `[{"op":"set_light","room":"bedroom","brightness":50,"color":"blue"}]` |
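
Outputs like those in the table can be checked before they are dispatched to a robot, API, or broker. The sketch below uses only the standard library; `parse_ops` is a hypothetical helper, and the rule that every operation is an object with a string `op` field is inferred from the examples above rather than from a published specification.

```python
import json


def parse_ops(raw: str) -> list[dict]:
    """Parse model output and check it is a JSON array of {"op": ...} objects."""
    ops = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(ops, list):
        raise ValueError("expected a JSON array of operations")
    for i, op in enumerate(ops):
        if not isinstance(op, dict) or not isinstance(op.get("op"), str):
            raise ValueError(f"operation {i} is not an object with a string 'op' field")
    return ops


ops = parse_ops('[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]')
print(ops[0]["op"], ops[0]["x"])
# -> move 0.5
```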
---
## Highlights
| Property | Value |
|---|---|
| Architecture | SparseMind (decoder-only) |
| Parameters | 33,400,324 (~33.4M) |
| Size (FP32) | ~127.4 MB (~31.9 MB INT8) |
| Context length | 1024 tokens |
| Tokenizer | [`AMEFORGE/sam_tokenizer`](https://huggingface.co/AMEFORGE/sam_tokenizer) (NexusBPE) |
| Precision | FP32 (INT8 quantization compatible) |
| Domains | 10 (robotics, HTTP, MQTT, DB, workflow, e-commerce, vehicle, home, calendar, file) |
| Deployment | CPU, GPU, edge (Jetson, Raspberry Pi) |
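
The size figures follow directly from the parameter count. The arithmetic below reproduces them and shows that the quoted "MB" values correspond to binary megabytes (MiB, 2^20 bytes). It counts weights only, so on-disk files with headers or quantization scales may differ slightly.

```python
PARAMS = 33_400_324      # parameter count from the table above
MIB = 1024 ** 2          # bytes per MiB

fp32_bytes = PARAMS * 4  # 4 bytes per FP32 weight
int8_bytes = PARAMS * 1  # 1 byte per INT8 weight

fp32_mib = fp32_bytes / MIB
int8_mib = int8_bytes / MIB
print(f"FP32: {fp32_mib:.1f} MiB, INT8: {int8_mib:.1f} MiB")
# -> FP32: 127.4 MiB, INT8: 31.9 MiB
```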
---
## Quick inference
Use the `sam_runtime.py` SDK for a clean inference path with optional
constrained decoding:
```python
from sam_runtime import SAM

# Download weights and tokenizer from the Hub
sam = SAM.from_hub("AMEFORGE/sam-v1")

result = sam.generate(
    task="get user 42 from api.example.com",
    domain="HTTP",
    schema={"type": "array"},
    mode="guarded",  # JSON-validated decoding
)
print(result["ops"])
# -> [{"op":"http_request","method":"GET","url":"https://api.example.com/users/42"}]
```
For OpenAI-style tool calling, `sam.tool_call` accepts the same `tools` and `messages` arguments:
```python
result = sam.tool_call(
    tools=[...],  # OpenAI-style tool specs
    messages=[{"role": "user", "content": "get me user 42"}],
)
```
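
To make the link between tool specs and schema conditioning concrete, here is one way a tool list could be mapped onto an array-of-operations JSON Schema. This is a sketch under stated assumptions, not the `sam_runtime` implementation: `tools_to_schema` and the `get_user` tool are hypothetical, and the convention that the tool name becomes the `op` value is inferred from the example outputs in this card.

```python
def tools_to_schema(tools: list[dict]) -> dict:
    """Build an array-of-operations JSON Schema from OpenAI-style tool specs.

    Assumption: each tool becomes one allowed operation whose "op" field
    equals the tool name and whose other fields are the tool's parameters.
    """
    variants = []
    for tool in tools:
        fn = tool["function"]
        params = fn.get("parameters", {})
        properties = {"op": {"const": fn["name"]}, **params.get("properties", {})}
        required = ["op", *params.get("required", [])]
        variants.append({"type": "object", "properties": properties, "required": required})
    return {"type": "array", "items": {"oneOf": variants}}


tools = [{
    "type": "function",
    "function": {
        "name": "get_user",  # hypothetical tool, for illustration only
        "description": "Fetch a user by id",
        "parameters": {
            "type": "object",
            "properties": {"user_id": {"type": "integer"}},
            "required": ["user_id"],
        },
    },
}]

schema = tools_to_schema(tools)
```

A schema built this way could then be passed as the `schema` argument of `sam.generate`, assuming the runtime accepts arbitrary JSON Schema there.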
---
## Training
SAM was trained on a **large, deterministic multi-domain corpus** assembled
in-house at AMEFORGE. The corpus covers all 10 supported domains across
5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain),
with paraphrase variation, robustness augmentation, and schema conditioning.
Training was performed on a single GPU using a custom optimizer setup tailored
to the SparseMind architecture. Full training methodology and the dataset
construction pipeline are kept internal as part of AMEFORGE's IP.
---
## Limitations
- English-only. Multilingual extension is future work.
- Schema-conditioned: best results when a JSON Schema is provided in the prompt.
- Domain set is fixed at 10. New domains require fine-tuning or retraining.
- Numeric atomicity is guaranteed within the production-relevant ranges for
each domain. Values outside those ranges fall back to subword encoding.
- Not a chat model — single-turn, structured action generation only.
---
## Citation
```bibtex
@misc{sam_2026,
  title  = {SAM: A Compact Schema-Conditioned Structured Action Model
            for Agentic AI},
  author = {AMEFORGE},
  year   = {2026},
  note   = {Built on the SparseMind architecture.
            https://huggingface.co/AMEFORGE/sam-v1}
}
```
---
Made by **AMEFORGE** — https://huggingface.co/AMEFORGE