---
license: bsl-1.0
language:
- en
- fr
library_name: transformers
pipeline_tag: text-generation
tags:
- structured-generation
- function-calling
- tool-use
- json
- edge
- offline
- robotics
- iot
- agentic
- small-language-model
model-index:
- name: SAM-G
results:
- task:
type: structured-action-generation
name: Instruction-to-JSON (10 domains, zero-shot)
metrics:
- type: json_valid
value: 100
name: Valid JSON (%)
- type: exact_match
value: 76
name: Exact match (%)
- type: exact_match_fr
value: 77
name: Exact match, French (%)
- task:
type: text-generation
name: Language modeling (FineWeb-Edu held-out)
metrics:
- type: bits_per_byte
value: 1.179
name: Bits per byte
---
# SAM-G
**SAM-G** is a 30.3M-parameter dual-mode language model for **offline structured
action generation**. Given a natural-language instruction it emits compact,
schema-valid JSON for ten domains; given a question it emits free text. Mode
selection is learned, not prompted. Built by **AMEFORGE** for robotics, IoT and
embedded deployment where hosted-LLM APIs are too costly, too slow, or
unavailable.
- **Parameters:** 30.3M · **Footprint:** 121 MB fp32 (~30 MB int8)
- **Context:** 1024 tokens · **Languages:** English, French (actions)
- **Throughput:** ~235 tok/s with 16 ms time to first token (single GPU); also
runs on a Raspberry-Pi-class CPU
- **Released:** model weights + inference tokenizer. Training pipeline, data
generators and architecture are proprietary.
## Two modes
| Input | Model emits |
|---|---|
| `turn on the kitchen lamp` | `[ACTION] {"domain":"home","op":"set_state","params":{"device":"lamp","name":"kitchen","state":"on"}}` |
| `what is a mutex` | `[CHAT] A mutex is a lock that allows one thread at a time.` |
Domains: `ros`, `http`, `mqtt`, `db`, `workflow`, `ecommerce`, `vehicle`,
`home`, `cal`, `file`.
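Downstream code can branch on the leading mode tag. The following is a minimal sketch, assuming the decoded output starts with `[ACTION]` or `[CHAT]` exactly as in the table above; `split_mode` is a hypothetical helper, not part of the released files.

```python
import json


def split_mode(output: str):
    """Split decoded model output into (mode, payload).

    Assumes the output begins with the [ACTION] or [CHAT] tag shown in the
    table above. Hypothetical helper, not part of the released files.
    """
    text = output.strip()
    if text.startswith("[ACTION]"):
        # Action mode: the remainder should be a single JSON object.
        return "action", json.loads(text[len("[ACTION]"):])
    if text.startswith("[CHAT]"):
        # Chat mode: the remainder is free text.
        return "chat", text[len("[CHAT]"):].strip()
    raise ValueError(f"missing mode tag: {text[:20]!r}")


mode, payload = split_mode(
    '[ACTION] {"domain":"home","op":"set_state",'
    '"params":{"device":"lamp","name":"kitchen","state":"on"}}'
)
print(mode, payload["params"]["state"])
```

Raising on a missing tag, rather than guessing a mode, keeps malformed output from reaching an actuator.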
## Benchmark
SAM-G is evaluated **zero-shot** in its native format; baselines run **3-shot**
through their chat template with a system instruction. `bpb` (bits per byte) is
comparable across tokenizers, whereas per-token perplexity is not comparable
across vocabularies. `exact/M` is action exact-match (%) divided by parameter
count in millions, the efficiency axis.
| Model | Params | bpb ↓ | JSON valid % | Exact % | Exact FR % | Cloze % | MB | tok/s | exact/M ↑ |
|---|---|---|---|---|---|---|---|---|---|
| **SAM-G** | **30.3M** | 1.179 | **100** | **76** | **77** | 83 | **121** | **235** | **2.51** |
| Pythia-70M | 70M | 1.674 | 2 | 0 | 0 | 75 | 141 | 120 | 0.00 |
| Qwen2.5-0.5B-Instruct | 494M | 0.814 | 99 | 25 | 7 | 96 | 988 | 27 | 0.05 |
| SmolLM2-360M-Instruct | 362M | 0.812 | 96 | 14 | 0 | 96 | 724 | 21 | 0.04 |
| Qwen2.5-1.5B-Instruct | 889M | 0.753 | 98 | 21 | 0 | 96 | 444* | 13 | 0.02 |
<sub>*Qwen2.5-1.5B loaded in 4-bit; its 889M parameter count likely reflects the
packed 4-bit tensors (the nominal size is about 1.5B). By the parameter counts
listed, the larger general models are 12–30× bigger and trained for general
knowledge, and they lead on bits-per-byte and cloze. SAM-G leads on structured
action, French actions, speed, and exact-match per parameter, and has the
smallest footprint. Notably, Qwen2.5-1.5B scores *below* Qwen2.5-0.5B on action
exact-match, which suggests that capability here comes from domain
specialization rather than scale.</sub>
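The two derived metrics can be reproduced in a few lines. In this sketch, `exact_per_million` follows the definition above and is checked against the table. `bits_per_byte` uses the standard conversion from summed token-level negative log-likelihood (in nats) to bits per UTF-8 byte; that formula is an assumption here, since the evaluation script is not released.

```python
import math


def exact_per_million(exact_pct: float, params_millions: float) -> float:
    """Exact-match percentage divided by parameter count in millions."""
    return exact_pct / params_millions


def bits_per_byte(total_nll_nats: float, text: str) -> float:
    """Convert summed token NLL (nats) over `text` to bits per UTF-8 byte.

    Normalizing by bytes instead of tokens removes the dependence on the
    tokenizer's vocabulary. Assumed formula, not the released eval code.
    """
    n_bytes = len(text.encode("utf-8"))
    return total_nll_nats / (math.log(2) * n_bytes)


# (exact %, params in millions), taken from the table above.
rows = {
    "SAM-G": (76, 30.3),
    "Qwen2.5-0.5B-Instruct": (25, 494),
    "SmolLM2-360M-Instruct": (14, 362),
}
for name, (exact, params) in rows.items():
    print(f"{name}: {exact_per_million(exact, params):.2f}")
```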
## Per-domain exact match (%)
| ros | http | mqtt | db | workflow | ecommerce | vehicle | home | cal | file |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 100 | 100 | 60 | 100 | 100 | 50 | 80 | 60 |
All general baselines score 0 on most domains, succeeding only partially on the
most generic ones (home, cal). `ros` (floating-point fields) is SAM-G's weakest
schema, at 0% exact match, and is expected to benefit most from additional
training data.
## Usage
```python
import sentencepiece as spm
import torch

# Load the released inference tokenizer (samg_tokenizer.model).
sp = spm.SentencePieceProcessor()
sp.Load("samg_tokenizer.model")

prompt = "publish 21.5 on sensors/temp qos 1 [ACTION]"
ids = torch.tensor([sp.EncodeAsIds(prompt)])  # shape: (1, num_tokens)

# Load the weights, greedy-decode from `ids` until EOS, then sp.DecodeIds(...) the new tokens.
# Expected output:
# {"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}
```
Always parse output as JSON and validate against your schema before execution.
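One way to do that with only the standard library is sketched below. The `MQTT_PUBLISH` field spec and `validate_action` are illustrative assumptions inferred from the example output above. The release does not publish formal schemas, so substitute your own, or use a JSON Schema validator.

```python
import json

# Hypothetical spec for the mqtt/publish action, inferred from the example output.
MQTT_PUBLISH = {"topic": str, "payload": (int, float, str), "qos": int}


def validate_action(raw: str, domain: str, op: str, spec: dict) -> dict:
    """Parse raw model output and check it against a minimal field spec.

    Raises ValueError on malformed JSON, wrong domain/op, or bad params.
    """
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"invalid JSON: {err}") from err
    if not isinstance(action, dict):
        raise ValueError("expected a JSON object")
    if action.get("domain") != domain or action.get("op") != op:
        raise ValueError("unexpected domain or op")
    params = action.get("params")
    if not isinstance(params, dict) or set(params) != set(spec):
        raise ValueError("params keys do not match the spec")
    for key, expected in spec.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(params[key], bool) or not isinstance(params[key], expected):
            raise ValueError(f"bad type for {key!r}")
    return action


raw = '{"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}'
action = validate_action(raw, "mqtt", "publish", MQTT_PUBLISH)
print(action["params"]["topic"])
```

Only the returned, validated `action` should be passed on to the code that executes it.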
## Intended use
On-device home automation; NL→ROS robot command layers; MQTT fleet gateways;
offline vehicle commands; NL-to-SQL on embedded databases; workflow triggers;
and the structured tool-calling stage of agentic pipelines, either as a drop-in
replacement for a larger hosted model or as a fast router ahead of one.
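The router arrangement can be sketched as follows. `local_generate` and `hosted_generate` are hypothetical callables standing in for SAM-G inference and a hosted model client; neither is part of the release. The local result is used only if it parses as a JSON object in one of the ten domains.

```python
import json
from typing import Callable

DOMAINS = {"ros", "http", "mqtt", "db", "workflow",
           "ecommerce", "vehicle", "home", "cal", "file"}


def route(instruction: str,
          local_generate: Callable[[str], str],
          hosted_generate: Callable[[str], str]) -> tuple[str, str]:
    """Try the on-device model first; fall back to the hosted one.

    Returns (source, raw_output). Both generators are hypothetical
    callables that map an instruction to raw model output.
    """
    raw = local_generate(instruction)
    try:
        action = json.loads(raw)
    except json.JSONDecodeError:
        action = None  # Not valid JSON: defer to the larger model.
    domain = action.get("domain") if isinstance(action, dict) else None
    if isinstance(domain, str) and domain in DOMAINS:
        return "local", raw
    return "hosted", hosted_generate(instruction)


# Stub generators for illustration only.
source, _ = route(
    "turn on the kitchen lamp",
    lambda s: '{"domain":"home","op":"set_state","params":{}}',
    lambda s: "hosted answer",
)
print(source)
```

A production router would apply full schema validation before accepting the local result; the domain check here is only the cheapest possible gate.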
## Limitations
- Not a general assistant: factual knowledge and open-ended reasoning are
limited at this scale; larger general models lead on bits-per-byte and cloze.
- French covers actions, not extended prose.
- Schemas outside the ten domains need fine-tuning. The `ros` schema
(floating-point fields) is the weakest, at 0% exact match, and would benefit
most from more data.
- The action benchmark is synthetic: it is drawn from the same distribution
family as the training data, using a disjoint evaluation seed (999).
## Citation
```bibtex
@misc{samg2026,
title = {SAM-G: A 30M-Parameter Dual-Mode Language Model for Offline Structured Action Generation},
author = {AMEFORGE Lab},
year = {2026}
}
```