---
license: apache-2.0
language: [en]
tags:
- structured-action-model
- json-generation
- text-to-json
- agentic-ai
- function-calling
- tool-use
- iot
- robotics
- workflow-automation
- sparse-transformer
- on-device
- edge-ai
pipeline_tag: text-generation
inference: false
library_name: pytorch
---

# SAM: Structured Action Model

**SAM** is a compact (33.4M params, ~127.4 MB FP32) schema-conditioned model that turns natural language into structured JSON actions across **10 domains**: robotics, HTTP/REST, MQTT/IoT, databases, workflows, e-commerce, vehicles, smart home, calendar/email, and filesystem.

Built by **AMEFORGE** on the in-house **SparseMind** architecture.

> **SAM is the successor to [Foros](https://huggingface.co/AMEFORGE/foros-v5.3).**
> Where Foros specialized in robotics ROS-JSON, SAM generalizes the approach to
> the full agentic / workflow stack while preserving the SparseMind architecture.

---

## TL;DR

The cheap path to reliable JSON for agentic systems:

| | Today (LLM API) | With SAM |
|---|---|---|
| **Output reliability** | broken JSON → retry loop | atomic-numeric tokenizer + schema-conditioned |
| **Latency** | 500–3000 ms | ~30–200 ms (CPU) |
| **Cost / 1M calls** | $$$$ | $0 (offline) |
| **Deployment** | API key, cloud, privacy concerns | runs on Jetson, Pi, laptop CPU |

---

## Benchmark

Evaluated on **SAM Bench v1**: 200 prompts covering all 10 domains across 5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain).

*(Benchmark not yet run. After training, execute `python sam_benchmark.py` to populate this section.)*

> The benchmark is fully reproducible: see [`sam_benchmark.py`](./sam_benchmark.py)
> or the [`AMEFORGE/sam-bench`](https://huggingface.co/datasets/AMEFORGE/sam-bench)
> dataset if published.

---

## Input format (schema-conditioned)

```
{...JSON Schema...} natural language =>
```

Output: a JSON array of operations conforming to the schema.

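As a minimal sketch of the format above, the snippet below assembles a prompt and checks that a returned string is a JSON array of operations. It assumes the schema is serialized as compact JSON and joined to the task with single spaces, and that every operation carries an `op` key as in the examples on this card. The exact separators are not specified here, and `build_prompt` and `parse_ops` are hypothetical helper names, not part of `sam_runtime`.

```python
import json


def build_prompt(schema: dict, task: str) -> str:
    """Assemble '<schema JSON> <task> =>'.

    Assumption: compact JSON and single-space separators; the card does
    not spell out the exact whitespace or any extra tags.
    """
    schema_text = json.dumps(schema, separators=(",", ":"))
    return f"{schema_text} {task.strip()} =>"


def parse_ops(raw_output: str) -> list:
    """Parse model output and check it is a JSON array of objects with an 'op' key."""
    ops = json.loads(raw_output)
    if not isinstance(ops, list):
        raise ValueError("expected a JSON array of operations")
    for op in ops:
        # Assumption: each operation is an object with an 'op' field,
        # as in every example shown on this card.
        if not isinstance(op, dict) or "op" not in op:
            raise ValueError(f"malformed operation: {op!r}")
    return ops


prompt = build_prompt({"type": "array"}, "get user 42")
ops = parse_ops('[{"op":"http_request","method":"GET","url":"/users/42"}]')
```

A structural check like this only confirms the outer shape. Validating against the full JSON Schema would need a schema validator on top.
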
### Domain tags

`` `` `` `` `` `` `` `` `` ``

### Examples

| Input | Output |
|---|---|
| `move to x=0.5 y=-1.2 z=0.8 =>` | `[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]` |
| `get user 42 =>` | `[{"op":"http_request","method":"GET","url":"/users/42"}]` |
| `publish temp 22 to home/livingroom/temp qos 1 =>` | `[{"op":"mqtt_publish","topic":"home/livingroom/temp","payload":{"value":22,"unit":"celsius"},"qos":1}]` |
| `turn on bedroom light at 50% blue =>` | `[{"op":"set_light","room":"bedroom","brightness":50,"color":"blue"}]` |

---

## Highlights

| Property | Value |
|---|---|
| Architecture | SparseMind (decoder-only) |
| Parameters | 33,400,324 (~33.4M) |
| Size (FP32) | ~127.4 MB (~31.9 MB INT8) |
| Context length | 1024 tokens |
| Tokenizer | [`AMEFORGE/sam_tokenizer`](https://huggingface.co/AMEFORGE/sam_tokenizer) (NexusBPE) |
| Precision | FP32 (INT8 quantization compatible) |
| Domains | 10 (robotics, HTTP, MQTT, DB, workflow, e-commerce, vehicle, home, calendar, file) |
| Deployment | CPU, GPU, edge (Jetson, Raspberry Pi) |

---

## Quick inference

Use the `sam_runtime.py` SDK for a clean inference path with optional constrained decoding:

```python
from sam_runtime import SAM

sam = SAM.from_hub("AMEFORGE/sam-v1")  # downloads weights + tokenizer

result = sam.generate(
    task="get user 42 from api.example.com",
    domain="HTTP",
    schema={"type": "array"},
    mode="guarded",  # JSON-validated decoding
)
print(result["ops"])
# -> [{"op":"http_request","method":"GET","url":"https://api.example.com/users/42"}]
```

For OpenAI-compatible tool calling, SAM can be used as a drop-in replacement:

```python
result = sam.tool_call(
    tools=[{...openai-style tool spec...}],
    messages=[{"role": "user", "content": "get me user 42"}],
)
```

---

## Training

SAM was trained on a **large, deterministic multi-domain corpus** assembled in-house at AMEFORGE.
The corpus covers all 10 supported domains across 5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain), with paraphrase variation, robustness augmentation, and schema conditioning.

Training was performed on a single GPU using a custom optimizer setup tailored to the SparseMind architecture.

Full training methodology and the dataset construction pipeline are kept internal as part of AMEFORGE's IP.

---

## Limitations

- English-only. Multilingual extension is future work.
- Schema-conditioned: best results when a JSON Schema is provided in the prompt.
- Domain set is fixed at 10. New domains require fine-tuning or retraining.
- Numeric atomicity is guaranteed within the production-relevant ranges for each domain. Values outside those ranges fall back to subword encoding.
- Not a chat model: single-turn, structured action generation only.

---

## Citation

```bibtex
@misc{sam_2026,
  title  = {SAM: A Compact Schema-Conditioned Structured Action Model for Agentic AI},
  author = {AMEFORGE},
  year   = {2026},
  note   = {Built on the SparseMind architecture. https://huggingface.co/AMEFORGE/sam-v1}
}
```

---

Made by **AMEFORGE**: https://huggingface.co/AMEFORGE