ameforge committed on
Commit c489079 · verified · 1 Parent(s): cc9abbb

Upload README.md with huggingface_hub

  1. README.md +174 -0
README.md ADDED
@@ -0,0 +1,174 @@
1
+ ---
2
+ license: apache-2.0
3
+ language: [en]
4
+ tags:
5
+ - structured-action-model
6
+ - json-generation
7
+ - text-to-json
8
+ - agentic-ai
9
+ - function-calling
10
+ - tool-use
11
+ - iot
12
+ - robotics
13
+ - workflow-automation
14
+ - sparse-transformer
15
+ - on-device
16
+ - edge-ai
17
+ pipeline_tag: text-generation
18
+ inference: false
19
+ library_name: pytorch
20
+ ---
21
+
22
+ # SAM — Structured Action Model
23
+
24
+ **SAM** is a compact (35.9M params, ~137.0 MB FP32)
25
+ schema-conditioned model that turns natural language into structured JSON actions
26
+ across **10 domains**: robotics, HTTP/REST, MQTT/IoT, databases, workflows,
27
+ e-commerce, vehicles, smart home, calendar/email, and filesystem.
28
+
29
+ Built by **AMEFORGE** on the in-house **SparseMind** architecture.
30
+
31
+ > **SAM is the successor to [Foros](https://huggingface.co/AMEFORGE/foros-v5.3).**
32
+ > Where Foros specialized in robotics ROS-JSON, SAM generalizes the approach to
33
+ > the full agentic / workflow stack while preserving the SparseMind architecture.
34
+
35
+ ---
36
+
37
+ ## TL;DR
38
+
39
+ The cheap path to reliable JSON for agentic systems:
40
+
41
+ | | Today (LLM API) | With SAM |
42
+ |---|---|---|
43
+ | **Output reliability** | broken JSON → retry loop | atomic-numeric tokenizer + schema-conditioned generation |
44
+ | **Latency** | 500–3000 ms | ~30–200 ms (CPU) |
45
+ | **Cost / 1M calls** | $$$$ | $0 (offline) |
46
+ | **Deployment** | API key, cloud, privacy concerns | runs on Jetson, Pi, laptop CPU |
47
+
48
+ ---
49
+
50
+ ## Benchmark
51
+
52
+ Evaluated on the **SAM Bench v1** — 200 prompts covering all 10 domains across
53
+ 5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain).
54
+
55
+ *(Benchmark not yet run. After training, execute `python sam_benchmark.py` to populate this section.)*
56
+
57
+
58
+
59
+ > Benchmark is fully reproducible — see [`sam_benchmark.py`](./sam_benchmark.py)
60
+ > or the [`AMEFORGE/sam-bench`](https://huggingface.co/datasets/AMEFORGE/sam-bench)
61
+ > dataset if published.
62
+
63
+ ---
64
+
65
+ ## Input format (schema-conditioned)
66
+
67
+ ```
68
+ <SCHEMA>{...JSON Schema...}</SCHEMA> <DOMAIN_TAG> <TASK>natural language</TASK> =>
69
+ ```
70
+
71
+ Output: a JSON array of operations conforming to the schema.
72
+
73
+ ### Domain tags
74
+
75
+ `<ROS>` `<HTTP>` `<MQTT>` `<DB>` `<WORKFLOW>` `<ECOMMERCE>` `<VEHICLE>` `<HOME>` `<CAL>` `<FILE>`
76
+
77
+ ### Examples
78
+
79
+ | Input | Output |
80
+ |---|---|
81
+ | `<ROS><TASK>move to x=0.5 y=-1.2 z=0.8</TASK> =>` | `[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]` |
82
+ | `<HTTP><TASK>get user 42</TASK> =>` | `[{"op":"http_request","method":"GET","url":"/users/42"}]` |
83
+ | `<MQTT><TASK>publish temp 22 to home/livingroom/temp qos 1</TASK> =>` | `[{"op":"mqtt_publish","topic":"home/livingroom/temp","payload":{"value":22,"unit":"celsius"},"qos":1}]` |
84
+ | `<HOME><TASK>turn on bedroom light at 50% blue</TASK> =>` | `[{"op":"set_light","room":"bedroom","brightness":50,"color":"blue"}]` |
85
+
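The input format above can be assembled programmatically. The following is a minimal sketch, not part of `sam_runtime`: the helper name `build_prompt` is hypothetical, and the exact whitespace between segments is an assumption taken from the template line in "Input format" (the table examples omit the schema and the spaces).

```python
import json

# Domain tags copied from the "Domain tags" list in this README.
DOMAIN_TAGS = {"ROS", "HTTP", "MQTT", "DB", "WORKFLOW",
               "ECOMMERCE", "VEHICLE", "HOME", "CAL", "FILE"}


def build_prompt(task, domain, schema=None):
    """Assemble a SAM-style prompt string (hypothetical helper)."""
    if domain not in DOMAIN_TAGS:
        raise ValueError(f"unknown domain tag: {domain!r}")
    parts = []
    if schema is not None:
        # Compact separators keep the schema short inside the 1024-token context.
        parts.append("<SCHEMA>" + json.dumps(schema, separators=(",", ":")) + "</SCHEMA>")
    parts.append(f"<{domain}>")
    parts.append(f"<TASK>{task}</TASK>")
    # Assumption: segments are joined by single spaces, as in the template line.
    return " ".join(parts) + " =>"


prompt = build_prompt("get user 42", "HTTP", schema={"type": "array"})
print(prompt)
```

Rejecting unknown tags early is a design choice of this sketch; the limitations section states that the domain set is fixed at 10, so a typo in the tag is better caught before the prompt reaches the model.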
86
+ ---
87
+
88
+ ## Highlights
89
+
90
+ | Property | Value |
91
+ |---|---|
92
+ | Architecture | SparseMind (decoder-only) |
93
+ | Parameters | 35,911,302 (~35.9M) |
94
+ | Size (FP32) | ~137.0 MB (~34.2 MB INT8) |
95
+ | Context length | 1024 tokens |
96
+ | Tokenizer | [`AMEFORGE/sam_tokenizer`](https://huggingface.co/AMEFORGE/sam_tokenizer) (NexusBPE) |
97
+ | Precision | FP32 (INT8 quantization compatible) |
98
+ | Domains | 10 (robotics, HTTP, MQTT, DB, workflow, e-commerce, vehicle, home, calendar, file) |
99
+ | Deployment | CPU, GPU, edge (Jetson, Raspberry Pi) |
100
+
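The size figures in the table follow directly from the parameter count. A short arithmetic check, assuming 4 bytes per FP32 parameter and 1 byte per INT8 parameter (quantization scales and file-format overhead are ignored), shows that the quoted "MB" values are binary megabytes (MiB):

```python
PARAMS = 35_911_302  # parameter count from the Highlights table


def size_mib(n_params, bytes_per_param):
    """Raw weight size in MiB (2**20 bytes), ignoring any file overhead."""
    return n_params * bytes_per_param / 2**20


fp32_mib = size_mib(PARAMS, 4)
int8_mib = size_mib(PARAMS, 1)
print(f"FP32: {fp32_mib:.1f} MiB, INT8: {int8_mib:.1f} MiB")
# -> FP32: 137.0 MiB, INT8: 34.2 MiB
```

In decimal megabytes (10^6 bytes) the FP32 weights come to roughly 143.6 MB.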
101
+ ---
102
+
103
+ ## Quick inference
104
+
105
+ Use the `sam_runtime.py` SDK for a clean inference path with optional
106
+ constrained decoding:
107
+
108
+ ```python
109
+ from sam_runtime import SAM
110
+
111
+ sam = SAM.from_hub("AMEFORGE/sam-v1")  # downloads weights + tokenizer
112
+
113
+ result = sam.generate(
114
+     task="get user 42 from api.example.com",
115
+     domain="HTTP",
116
+     schema={"type": "array"},
117
+     mode="guarded",  # JSON-validated decoding
118
+ )
119
+
120
+ print(result["ops"])
121
+ # -> [{"op":"http_request","method":"GET","url":"https://api.example.com/users/42"}]
122
+ ```
123
+
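Even with validated decoding, a caller may want to check the shape of the output before dispatching it. The sketch below uses only the standard library and assumes, based on the examples in this README, that the output is a JSON array of objects that each carry a string `op` field; the helper name `parse_ops` is hypothetical.

```python
import json


def parse_ops(raw):
    """Parse raw model output and check the array-of-operations shape."""
    ops = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(ops, list):
        raise ValueError("expected a JSON array of operations")
    for i, op in enumerate(ops):
        if not isinstance(op, dict) or not isinstance(op.get("op"), str):
            raise ValueError(f"operation {i} is missing a string 'op' field")
    return ops


raw = '[{"op":"http_request","method":"GET","url":"/users/42"}]'
ops = parse_ops(raw)
print(ops[0]["op"])  # -> http_request
```

This only checks the outer structure. Validating against the full JSON Schema passed in the prompt would need a schema validator, which is outside this sketch.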
124
+ For OpenAI-compatible tool calling, `tool_call` is intended as a drop-in replacement:
125
+
126
+ ```python
127
+ result = sam.tool_call(
128
+     tools=[{...openai-style tool spec...}],
129
+     messages=[{"role": "user", "content": "get me user 42"}],
130
+ )
131
+ ```
132
+
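For readers unfamiliar with the OpenAI tool format, a tool spec is a dictionary with a `type` of `"function"` and a JSON Schema under `function.parameters`. The example below is purely illustrative: the tool name `get_user` and its `user_id` parameter are hypothetical, and how `sam.tool_call` consumes the spec is not documented in this README.

```python
import json

# Illustrative OpenAI-style tool spec; name and parameters are hypothetical.
get_user_tool = {
    "type": "function",
    "function": {
        "name": "get_user",
        "description": "Fetch a user record by numeric ID.",
        "parameters": {
            "type": "object",
            "properties": {"user_id": {"type": "integer"}},
            "required": ["user_id"],
        },
    },
}

messages = [{"role": "user", "content": "get me user 42"}]

# The spec must be JSON-serializable to travel over an OpenAI-style API.
print(json.dumps(get_user_tool["function"]["parameters"]))
```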
133
+ ---
134
+
135
+ ## Training
136
+
137
+ SAM was trained on a **large, deterministic multi-domain corpus** assembled
138
+ in-house at AMEFORGE. The corpus covers all 10 supported domains across
139
+ 5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain),
140
+ with paraphrase variation, robustness augmentation, and schema conditioning.
141
+
142
+ Training was performed on a single GPU using a custom optimizer setup tailored
143
+ to the SparseMind architecture. Full training methodology and the dataset
144
+ construction pipeline are kept internal as part of AMEFORGE's IP.
145
+
146
+ ---
147
+
148
+ ## Limitations
149
+
150
+ - English-only. Multilingual extension is future work.
151
+ - Schema-conditioned: best results when a JSON Schema is provided in the prompt.
152
+ - Domain set is fixed at 10. New domains require fine-tuning or retraining.
153
+ - Numeric atomicity is guaranteed within the production-relevant ranges for
154
+ each domain. Values outside those ranges fall back to subword encoding.
155
+ - Not a chat model — single-turn, structured action generation only.
156
+
157
+ ---
158
+
159
+ ## Citation
160
+
161
+ ```bibtex
162
+ @misc{sam_2026,
163
+ title = {SAM: A Compact Schema-Conditioned Structured Action Model
164
+ for Agentic AI},
165
+ author = {AMEFORGE},
166
+ year = {2026},
167
+ note = {Built on the SparseMind architecture.
168
+ https://huggingface.co/AMEFORGE/sam-v1}
169
+ }
170
+ ```
171
+
172
+ ---
173
+
174
+ Made by **AMEFORGE** — https://huggingface.co/AMEFORGE