Upload README.md with huggingface_hub

README.md +174 -0

README.md ADDED

	@@ -0,0 +1,174 @@

+---
+license: apache-2.0
+language: [en]
+tags:
+- structured-action-model
+- json-generation
+- text-to-json
+- agentic-ai
+- function-calling
+- tool-use
+- iot
+- robotics
+- workflow-automation
+- sparse-transformer
+- on-device
+- edge-ai
+pipeline_tag: text-generation
+inference: false
+library_name: pytorch
+---
+# SAM — Structured Action Model
+**SAM** is a compact (35.9M params, ~137.0 MiB FP32)
+schema-conditioned model that turns natural language into structured JSON actions
+across **10 domains**: robotics, HTTP/REST, MQTT/IoT, databases, workflows,
+e-commerce, vehicles, smart home, calendar/email, and filesystem.
+Built by **AMEFORGE** on the in-house **SparseMind** architecture.
+> **SAM is the successor to [Foros](https://huggingface.co/AMEFORGE/foros-v5.3).**
+> Where Foros specialized in robotics ROS-JSON, SAM generalizes the approach to
+> the full agentic / workflow stack while preserving the SparseMind architecture.
+---
+## TL;DR
+The cheap path to reliable JSON for agentic systems:
+| | Today (LLM API) | With SAM |
+|---|---|---|
+| **Output reliability** | broken JSON → retry loop | atomic-numeric tokenizer + schema-conditioned generation |
+| **Latency** | 500–3000 ms | ~30–200 ms (CPU) |
+| **Cost / 1M calls** | per-call API fees | no per-call fee (runs offline) |
+| **Deployment** | API key, cloud, privacy concerns | runs on Jetson, Pi, laptop CPU |
+---
+## Benchmark
+**SAM Bench v1** is the evaluation suite: 200 prompts covering all 10 domains across
+5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain).
+*(Results are not yet available. Run `python sam_benchmark.py` to populate this section.)*
+> The benchmark is reproducible with [`sam_benchmark.py`](./sam_benchmark.py)
+> or, if published, the [`AMEFORGE/sam-bench`](https://huggingface.co/datasets/AMEFORGE/sam-bench)
+> dataset.
+---
+## Input format (schema-conditioned)
+```
+<SCHEMA>{...JSON Schema...}</SCHEMA> <DOMAIN_TAG> <TASK>natural language</TASK> =>
+```
+Output: a JSON array of operations conforming to the schema.
+### Domain tags
+`<ROS>` `<HTTP>` `<MQTT>` `<DB>` `<WORKFLOW>` `<ECOMMERCE>` `<VEHICLE>` `<HOME>` `<CAL>` `<FILE>`
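A minimal sketch of assembling a prompt in the format above. `build_prompt` is a hypothetical helper, not part of any published SAM SDK. The single-space separators follow the format line, but the examples in this README omit them, so the exact whitespace the model expects is an assumption.

```python
import json

# Domain tags as listed in this README.
DOMAIN_TAGS = {
    "ROS", "HTTP", "MQTT", "DB", "WORKFLOW",
    "ECOMMERCE", "VEHICLE", "HOME", "CAL", "FILE",
}


def build_prompt(task, domain, schema=None):
    """Hypothetical helper: assemble a schema-conditioned prompt string."""
    if domain not in DOMAIN_TAGS:
        raise ValueError(f"unknown domain tag: {domain!r}")
    parts = []
    if schema is not None:
        # Compact JSON keeps the schema short; this choice is an assumption.
        compact = json.dumps(schema, separators=(",", ":"))
        parts.append(f"<SCHEMA>{compact}</SCHEMA>")
    parts.append(f"<{domain}>")
    parts.append(f"<TASK>{task}</TASK>")
    # The trailing "=>" marks where the model should start generating.
    return " ".join(parts) + " =>"


print(build_prompt("get user 42", "HTTP", {"type": "array"}))
# <SCHEMA>{"type":"array"}</SCHEMA> <HTTP> <TASK>get user 42</TASK> =>
```

Rejecting unknown tags up front reflects the fixed set of 10 domains; anything else would reach the model as an unseen token.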
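+### Examples
+These examples omit the `<SCHEMA>` block; results are best when a schema is provided (see Limitations).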
+| Input | Output |
+|---|---|
+| `<ROS><TASK>move to x=0.5 y=-1.2 z=0.8</TASK> =>` | `[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]` |
+| `<HTTP><TASK>get user 42</TASK> =>` | `[{"op":"http_request","method":"GET","url":"/users/42"}]` |
+| `<MQTT><TASK>publish temp 22 to home/livingroom/temp qos 1</TASK> =>` | `[{"op":"mqtt_publish","topic":"home/livingroom/temp","payload":{"value":22,"unit":"celsius"},"qos":1}]` |
+| `<HOME><TASK>turn on bedroom light at 50% blue</TASK> =>` | `[{"op":"set_light","room":"bedroom","brightness":50,"color":"blue"}]` |
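Outputs like those in the table can be checked before they reach an actuator or API. The sketch below uses only the standard library and assumes, based on the examples above, that every operation is a JSON object with a string `op` field. `parse_ops` is a hypothetical name, and this is a shape check rather than full JSON Schema validation.

```python
import json


def parse_ops(raw):
    """Hypothetical post-check: parse model output and verify its basic shape."""
    ops = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(ops, list):
        raise ValueError("expected a JSON array of operations")
    for i, op in enumerate(ops):
        # Assumption drawn from the README examples: each op has a string "op".
        if not isinstance(op, dict) or not isinstance(op.get("op"), str):
            raise ValueError(f"operation {i} lacks a string 'op' field")
    return ops


ops = parse_ops('[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]')
print(ops[0]["op"], ops[0]["x"])
# move 0.5
```

Because `json.JSONDecodeError` subclasses `ValueError`, a caller can catch `ValueError` alone to handle both malformed JSON and a wrong shape.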
+---
+## Highlights
+| Property | Value |
+|---|---|
+| Architecture | SparseMind (decoder-only) |
+| Parameters | 35,911,302 (~35.9M) |
+| Size (FP32) | ~137.0 MiB (~34.2 MiB INT8) |
+| Context length | 1024 tokens |
+| Tokenizer | [`AMEFORGE/sam_tokenizer`](https://huggingface.co/AMEFORGE/sam_tokenizer) (NexusBPE) |
+| Precision | FP32 (INT8 quantization compatible) |
+| Domains | 10 (robotics, HTTP, MQTT, DB, workflow, e-commerce, vehicle, home, calendar, file) |
+| Deployment | CPU, GPU, edge (Jetson, Raspberry Pi) |
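The size figures in the table follow from the parameter count when read as binary megabytes (MiB, 1024² bytes). A quick check, assuming 4 bytes per FP32 weight and 1 byte per INT8 weight, and ignoring tokenizer and file-format overhead:

```python
PARAMS = 35_911_302   # parameter count from the table above
MIB = 1024 ** 2       # bytes per binary megabyte


def size_mib(n_params, bytes_per_param):
    """Raw weight storage in MiB, excluding any file-format overhead."""
    return n_params * bytes_per_param / MIB


print(f"FP32: {size_mib(PARAMS, 4):.1f} MiB")  # FP32: 137.0 MiB
print(f"INT8: {size_mib(PARAMS, 1):.1f} MiB")  # INT8: 34.2 MiB
```

In decimal megabytes (10⁶ bytes) the same weights come to about 143.6 MB and 35.9 MB.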
+---
+## Quick inference
+Use the `sam_runtime.py` SDK for a clean inference path with optional
+constrained decoding:
+```python
+from sam_runtime import SAM
+sam = SAM.from_hub("AMEFORGE/sam-v1")   # downloads weights + tokenizer
+result = sam.generate(
+    task="get user 42 from api.example.com",
+    domain="HTTP",
+    schema={"type": "array"},
+    mode="guarded",                   # JSON-validated decoding
+)
+print(result["ops"])
+# -> [{"op":"http_request","method":"GET","url":"https://api.example.com/users/42"}]
+```
+For OpenAI-style tool calling, `sam.tool_call` is intended as a drop-in replacement:
+```python
+result = sam.tool_call(
+    tools=[{...openai-style tool spec...}],
+    messages=[{"role": "user", "content": "get me user 42"}],
+)
+```
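For reference, one entry of the `tools` list in the OpenAI function-calling layout might look like the sketch below. The `get_user` tool, its description, and its `user_id` parameter are hypothetical. Whether `sam.tool_call` accepts exactly this shape is an assumption based on the "OpenAI-compatible" wording above.

```python
import json

# Hypothetical tool in the OpenAI function-calling layout:
# {"type": "function", "function": {name, description, parameters}}
# where "parameters" is a JSON Schema object.
get_user_tool = {
    "type": "function",
    "function": {
        "name": "get_user",
        "description": "Fetch a user record by numeric id.",
        "parameters": {
            "type": "object",
            "properties": {
                "user_id": {"type": "integer", "description": "User id."},
            },
            "required": ["user_id"],
        },
    },
}

tools = [get_user_tool]
messages = [{"role": "user", "content": "get me user 42"}]

# These two values are what would be passed as tools= and messages=.
print(json.dumps(tools, indent=2))
```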
+---
+## Training
+SAM was trained on a **large, deterministic multi-domain corpus** assembled
+in-house at AMEFORGE. The corpus covers all 10 supported domains across
+5 difficulty tiers (atomic / compound / noisy / long-chain / cross-domain),
+with paraphrase variation, robustness augmentation, and schema conditioning.
+Training was performed on a single GPU using a custom optimizer setup tailored
+to the SparseMind architecture. Full training methodology and the dataset
+construction pipeline are kept internal as part of AMEFORGE's IP.
+---
+## Limitations
+- English-only. Multilingual extension is future work.
+- Schema-conditioned: best results when a JSON Schema is provided in the prompt.
+- Domain set is fixed at 10. New domains require fine-tuning or retraining.
+- Numeric atomicity is guaranteed within the production-relevant ranges for
+  each domain. Values outside those ranges fall back to subword encoding.
+- Not a chat model — single-turn, structured action generation only.
+---
+## Citation
+```bibtex
+@misc{sam_2026,
+  title  = {SAM: A Compact Schema-Conditioned Structured Action Model
+            for Agentic AI},
+  author = {AMEFORGE},
+  year   = {2026},
+  note   = {Built on the SparseMind architecture.
+            https://huggingface.co/AMEFORGE/sam-v1}
+}
+```
+---
+Made by **AMEFORGE** — https://huggingface.co/AMEFORGE