---
license: bsl-1.0
language:
- en
- fr
library_name: transformers
pipeline_tag: text-generation
tags:
- structured-generation
- function-calling
- tool-use
- json
- edge
- offline
- robotics
- iot
- agentic
- small-language-model
model-index:
- name: SAM-G
  results:
  - task:
      type: structured-action-generation
      name: Instruction-to-JSON (10 domains, zero-shot)
    metrics:
    - type: json_valid
      value: 100
      name: Valid JSON (%)
    - type: exact_match
      value: 76
      name: Exact match (%)
    - type: exact_match_fr
      value: 77
      name: Exact match, French (%)
  - task:
      type: text-generation
      name: Language modeling (FineWeb-Edu held-out)
    metrics:
    - type: bits_per_byte
      value: 1.179
      name: Bits per byte
---

# SAM-G

**SAM-G** is a 30.3M-parameter dual-mode language model for **offline structured action generation**. Given a natural-language instruction, it emits compact, schema-valid JSON for one of ten domains; given a question, it emits free text. Mode selection is learned, not prompted. Built by **AMEFORGE** for robotics, IoT and embedded deployment where hosted-LLM APIs are too costly, too slow, or unavailable.

- **Parameters:** 30.3M · **Footprint:** 121 MB fp32 (~30 MB int8)
- **Context:** 1024 tokens · **Languages:** English; French for actions only
- **Throughput:** ~235 tok/s, 16 ms to first token (single GPU); also runs on a Raspberry-Pi-class CPU
- **Released:** model weights and the inference tokenizer. The training pipeline, data generators and architecture are proprietary.
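
The footprint figures follow directly from the parameter count. A quick check, assuming 4 bytes per fp32 weight and 1 byte per int8 weight, and ignoring tokenizer files, file headers and quantization scales (the helper name is illustrative):

```python
PARAMS = 30.3e6  # parameter count stated in the summary above


def footprint_mb(n_params: float, bytes_per_param: int) -> float:
    """Raw weight storage in MB (10^6 bytes) for a given numeric width."""
    return n_params * bytes_per_param / 1e6


fp32_mb = footprint_mb(PARAMS, 4)  # 32-bit floats
int8_mb = footprint_mb(PARAMS, 1)  # 8-bit integers

print(f"fp32: {fp32_mb:.0f} MB, int8: {int8_mb:.0f} MB")
# prints "fp32: 121 MB, int8: 30 MB"
```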

## Two modes

| Input | Model emits |
|---|---|
| `turn on the kitchen lamp` | `[ACTION] {"domain":"home","op":"set_state","params":{"device":"lamp","name":"kitchen","state":"on"}}` |
| `what is a mutex` | `[CHAT] A mutex is a lock that allows one thread at a time.` |

Domains: `ros`, `http`, `mqtt`, `db`, `workflow`, `ecommerce`, `vehicle`, `home`, `cal`, `file`.
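
Downstream code has to tell the two modes apart before acting on the output. A minimal sketch, assuming the decoded text starts with the `[ACTION]` or `[CHAT]` tag exactly as in the table above; `split_mode` is a hypothetical helper, not part of the release:

```python
import json
from typing import Any, Tuple

ACTION_TAG = "[ACTION]"
CHAT_TAG = "[CHAT]"


def split_mode(output: str) -> Tuple[str, Any]:
    """Return ("action", parsed_json) or ("chat", text) for one decoded output."""
    text = output.strip()
    if text.startswith(ACTION_TAG):
        # json.loads tolerates the whitespace left after the tag.
        return "action", json.loads(text[len(ACTION_TAG):])
    if text.startswith(CHAT_TAG):
        return "chat", text[len(CHAT_TAG):].strip()
    raise ValueError(f"output has no mode tag: {text[:40]!r}")


mode, payload = split_mode(
    '[ACTION] {"domain":"home","op":"set_state",'
    '"params":{"device":"lamp","name":"kitchen","state":"on"}}'
)
print(mode, payload["params"]["state"])  # prints "action on"
```

Raising on an untagged output keeps malformed generations from being silently treated as chat text.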

## Benchmark

SAM-G is evaluated **zero-shot** in its native format; baselines run **3-shot** through their chat template with a system instruction. `bpb` (bits per byte) is tokenizer-fair, whereas per-token perplexity is not comparable across vocabularies. `exact/M` is the action exact-match percentage divided by the parameter count in millions; it is the efficiency axis.

| Model | Params | bpb ↓ | JSON valid % | Exact % | Exact FR % | Cloze % | Size (MB) | tok/s | exact/M ↑ |
|---|---|---|---|---|---|---|---|---|---|
| **SAM-G** | **30.3M** | 1.179 | **100** | **76** | **77** | 83 | **121** | **235** | **2.51** |
| Pythia-70M | 70M | 1.674 | 2 | 0 | 0 | 75 | 141 | 120 | 0.00 |
| Qwen2.5-0.5B-Instruct | 494M | 0.814 | 99 | 25 | 7 | 96 | 988 | 27 | 0.05 |
| SmolLM2-360M-Instruct | 362M | 0.812 | 96 | 14 | 0 | 96 | 724 | 21 | 0.04 |
| Qwen2.5-1.5B-Instruct | 889M | 0.753 | 98 | 21 | 0 | 96 | 444\* | 13 | 0.02 |

<sub>\*Qwen2.5-1.5B loaded in 4-bit. Larger general models lead on bits-per-byte and cloze (they are 12–30× bigger and trained for general knowledge); SAM-G leads decisively on structured action, French actions, footprint, speed, and exact-match per parameter. Notably, Qwen2.5-1.5B scores *below* Qwen2.5-0.5B on action exact-match, which suggests that capability on this task comes from domain specialization rather than scale.</sub>
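
The two derived columns can be reproduced from raw quantities. The sketch below assumes the usual definition of bits per byte (summed negative log-likelihood in nats, converted to bits, divided by the UTF-8 byte length of the evaluated text); the card does not spell out the exact evaluation code, so the function names and that definition are assumptions:

```python
import math


def bits_per_byte(total_nll_nats: float, text: str) -> float:
    """Summed NLL over the text (in nats) converted to bits per UTF-8 byte."""
    n_bytes = len(text.encode("utf-8"))
    return total_nll_nats / (math.log(2) * n_bytes)


def exact_per_million(exact_pct: float, n_params: float) -> float:
    """Exact-match percentage divided by the parameter count in millions."""
    return exact_pct / (n_params / 1e6)


# Values taken from the SAM-G and Qwen2.5-0.5B-Instruct rows of the table.
print(round(exact_per_million(76, 30.3e6), 2))  # prints 2.51
print(round(exact_per_million(25, 494e6), 2))   # prints 0.05
```

Because the denominator of `bits_per_byte` counts bytes rather than tokens, models with different vocabularies are scored against the same unit.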

## Per-domain exact match (%)

| ros | http | mqtt | db | workflow | ecommerce | vehicle | home | cal | file |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 100 | 100 | 60 | 100 | 100 | 50 | 80 | 60 |

All general baselines score 0 on most domains, succeeding only partially on the most generic ones (`home`, `cal`). `ros` (floating-point fields) is SAM-G's weakest schema and would benefit most from additional training data.

## Usage

```python
import sentencepiece as spm
import torch

# Load the released inference tokenizer (samg_tokenizer.model); load the weights separately.
sp = spm.SentencePieceProcessor()
sp.Load("samg_tokenizer.model")

# Encode the instruction as a batch containing one sequence of token ids.
prompt = "publish 21.5 on sensors/temp qos 1 [ACTION]"
ids = torch.tensor([sp.EncodeAsIds(prompt)])

# Greedy-decode with your loaded model until EOS, then sp.DecodeIds(...)
# -> {"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}
```
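
The greedy-decoding step left as a comment above can be sketched independently of the model architecture, which is not published. The version below assumes a hypothetical `next_logits` callable that maps the token ids so far to one score per vocabulary entry; how you obtain such a callable from the released weights is outside this sketch.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_logits: Callable[[List[int]], Sequence[float]],
    prompt_ids: List[int],
    eos_id: int,
    max_new_tokens: int = 128,
) -> List[int]:
    """Append the highest-scoring token each step until EOS or the token budget."""
    ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if next_id == eos_id:
            break
        ids.append(next_id)
        generated.append(next_id)
    return generated


def toy_logits(ids: List[int]) -> List[float]:
    """Stand-in model over a 5-token vocabulary: always prefers last id + 1."""
    scores = [0.0] * 5
    scores[min(ids[-1] + 1, 4)] = 1.0
    return scores


out = greedy_decode(toy_logits, prompt_ids=[0], eos_id=4)
print(out)  # prints [1, 2, 3]
```

With the tokenizer from the snippet above, `eos_id` would typically be `sp.eos_id()` and the returned ids would be passed to `sp.DecodeIds(...)`. Keep the prompt length plus `max_new_tokens` within the 1024-token context.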

Always parse output as JSON and validate against your schema before execution.
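
As an illustration of that rule, here is a minimal standard-library validator for the `mqtt` publish example. The field list is inferred from the single example output above and is an assumption, not the official schema; in practice a full JSON Schema validator may be a better fit.

```python
import json
from typing import Any, Dict

# Hypothetical allow-list: field name -> accepted Python types.
MQTT_PUBLISH_FIELDS = {"topic": str, "payload": (int, float, str), "qos": int}


def validate_mqtt_publish(raw: str) -> Dict[str, Any]:
    """Parse model output and reject anything that is not a well-formed publish action."""
    action = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    if action.get("domain") != "mqtt" or action.get("op") != "publish":
        raise ValueError("unexpected domain or op")
    params = action.get("params")
    if not isinstance(params, dict) or set(params) != set(MQTT_PUBLISH_FIELDS):
        raise ValueError("unexpected params keys")
    for key, expected in MQTT_PUBLISH_FIELDS.items():
        value = params[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"bad type for {key!r}")
    if params["qos"] not in (0, 1, 2):  # MQTT defines QoS levels 0, 1 and 2
        raise ValueError("qos out of range")
    return action


checked = validate_mqtt_publish(
    '{"domain":"mqtt","op":"publish",'
    '"params":{"topic":"sensors/temp","payload":21.5,"qos":1}}'
)
print(checked["params"]["topic"])  # prints "sensors/temp"
```

Rejecting unknown keys, rather than ignoring them, keeps a malformed generation from reaching the broker with fields the executor never inspected.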

## Intended use

On-device home automation; NL→ROS robot command layers; MQTT fleet gateways; offline vehicle commands; NL-to-SQL on embedded databases; workflow triggers; and the structured tool-calling stage of agentic pipelines, either as a drop-in replacement for a larger hosted model or as a fast router ahead of one.

## Limitations

- Not a general assistant: factual knowledge and open-ended reasoning are limited at this scale; larger general models lead on bits-per-byte and cloze.
- French covers actions, not extended prose.
- Schemas outside the ten domains need fine-tuning. The `ros` schema (floating-point fields) is the weakest and benefits most from more data.
- The action benchmark is synthetic, drawn from the training distribution family with a disjoint evaluation seed (999).

## Citation

```bibtex
@misc{samg2026,
  title  = {SAM-G: A 30M-Parameter Dual-Mode Language Model for Offline Structured Action Generation},
  author = {AMEFORGE Lab},
  year   = {2026}
}
```