---
license: bsl-1.0
language:
- en
- fr
library_name: transformers
pipeline_tag: text-generation
tags:
- structured-generation
- function-calling
- tool-use
- json
- edge
- offline
- robotics
- iot
- agentic
- small-language-model
model-index:
- name: SAM-G
  results:
  - task:
      type: structured-action-generation
      name: Instruction-to-JSON (10 domains, zero-shot)
    metrics:
    - type: json_valid
      value: 100
      name: Valid JSON (%)
    - type: exact_match
      value: 76
      name: Exact match (%)
    - type: exact_match_fr
      value: 77
      name: Exact match, French (%)
  - task:
      type: text-generation
      name: Language modeling (FineWeb-Edu held-out)
    metrics:
    - type: bits_per_byte
      value: 1.179
      name: Bits per byte
---

# SAM-G

**SAM-G** is a 30.3M-parameter dual-mode language model for **offline structured
action generation**. Given a natural-language instruction, it emits compact,
schema-valid JSON for ten domains; given a question, it emits free text. Mode
selection is learned, not prompted. It is built by **AMEFORGE** for robotics,
IoT and embedded deployments where hosted-LLM APIs are too costly, too slow, or
unavailable.

- **Parameters:** 30.3M · **Footprint:** 121 MB fp32 (~30 MB int8)
- **Context:** 1024 tokens · **Languages:** English, French (action mode only)
- **Throughput:** ~235 tok/s, 16 ms to first token (single GPU); also runs on a
  Raspberry-Pi-class CPU
- **Released:** model weights + inference tokenizer. Training pipeline, data
  generators and architecture are proprietary.
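The footprint figures follow from the parameter count: fp32 stores 4 bytes per weight and int8 stores 1. A quick check, assuming decimal megabytes (10^6 bytes) and ignoring file metadata and any quantization scales, which is an assumption about how the sizes above were measured:

```python
# Footprint check: parameter count times bytes per parameter, in decimal MB.
PARAMS = 30.3e6  # parameter count stated on this card

BYTES_PER_PARAM = {"fp32": 4, "int8": 1}

def footprint_mb(params: float, dtype: str) -> float:
    """Approximate weight storage in MB (10**6 bytes), ignoring metadata and scales."""
    return params * BYTES_PER_PARAM[dtype] / 1e6

for dtype in ("fp32", "int8"):
    print(f"{dtype}: {footprint_mb(PARAMS, dtype):.1f} MB")
# fp32: 121.2 MB
# int8: 30.3 MB
```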

## Two modes

| Input | Model emits |
|---|---|
| `turn on the kitchen lamp` | `[ACTION] {"domain":"home","op":"set_state","params":{"device":"lamp","name":"kitchen","state":"on"}}` |
| `what is a mutex` | `[CHAT] A mutex is a lock that allows one thread at a time.` |

Domains: `ros`, `http`, `mqtt`, `db`, `workflow`, `ecommerce`, `vehicle`,
`home`, `cal`, `file`.
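Because every output starts with a mode tag, downstream code can route on it. A minimal sketch, assuming the output begins with exactly `[ACTION]` or `[CHAT]` followed by the payload, as in the table above; `split_mode` is a hypothetical helper, not part of the release:

```python
import json

MODE_TAGS = ("[ACTION]", "[CHAT]")

def split_mode(output: str):
    """Split raw model output into (mode, payload).

    Hypothetical helper: assumes the output starts with one of MODE_TAGS.
    ACTION payloads are parsed with json.loads; CHAT payloads stay as text.
    Raises ValueError if no tag is present or the JSON is invalid.
    """
    text = output.strip()
    for tag in MODE_TAGS:
        if text.startswith(tag):
            payload = text[len(tag):].strip()
            if tag == "[ACTION]":
                # json.JSONDecodeError is a subclass of ValueError
                return "action", json.loads(payload)
            return "chat", payload
    raise ValueError(f"no mode tag at start of output: {output[:40]!r}")

mode, action = split_mode(
    '[ACTION] {"domain":"home","op":"set_state",'
    '"params":{"device":"lamp","name":"kitchen","state":"on"}}'
)
print(mode, action["domain"], action["params"]["state"])  # action home on
```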

## Benchmark

SAM-G is evaluated **zero-shot** in its native format; baselines run **3-shot**
through their chat template with a system instruction. `bpb` (bits per byte) is
tokenizer-independent, whereas per-token perplexity is not comparable across
vocabularies. `exact/M` is action exact-match per million parameters, the
efficiency axis.

| Model | Params | bpb ↓ | JSON valid % | Exact % | Exact FR % | Cloze % | MB | tok/s | exact/M ↑ |
|---|---|---|---|---|---|---|---|---|---|
| **SAM-G** | **30.3M** | 1.179 | **100** | **76** | **77** | 83 | **121** | **235** | **2.51** |
| Pythia-70M | 70M | 1.674 | 2 | 0 | 0 | 75 | 141 | 120 | 0.00 |
| Qwen2.5-0.5B-Instruct | 494M | 0.814 | 99 | 25 | 7 | 96 | 988 | 27 | 0.05 |
| SmolLM2-360M-Instruct | 362M | 0.812 | 96 | 14 | 0 | 96 | 724 | 21 | 0.04 |
| Qwen2.5-1.5B-Instruct | 889M | 0.753 | 98 | 21 | 0 | 96 | 444* | 13 | 0.02 |

<sub>*Qwen2.5-1.5B loaded in 4-bit. The larger instruction-tuned models lead on
bits-per-byte and cloze (they are 12–30× bigger and trained for general
knowledge); SAM-G leads decisively on structured action, French actions,
footprint, speed, and exact-match per parameter. Notably, Qwen2.5-1.5B scores
*below* Qwen2.5-0.5B on action exact-match, which suggests that on this task
domain specialization matters more than scale.</sub>
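A sketch of the two derived metrics, assuming the standard definition of bits per byte (per-token negative log-likelihoods in nats, summed, converted to bits, and divided by the UTF-8 byte length of the evaluated text). The evaluation harness behind the table is not published, so this is an illustration rather than the exact procedure; the `exact/M` helper does reproduce the table's values from its Exact % and Params columns:

```python
import math

def bits_per_byte(token_nll_nats: list[float], text: str) -> float:
    """Summed token NLL (natural log) converted to bits, per UTF-8 byte of text.

    Normalizing by bytes rather than tokens makes models with different
    vocabularies comparable.
    """
    total_bits = sum(token_nll_nats) / math.log(2)
    return total_bits / len(text.encode("utf-8"))

def exact_per_million(exact_pct: float, params: float) -> float:
    """Action exact-match (%) divided by the parameter count in millions."""
    return exact_pct / (params / 1e6)

print(f"{exact_per_million(76, 30.3e6):.2f}")  # 2.51 (SAM-G row)
print(f"{exact_per_million(25, 494e6):.2f}")   # 0.05 (Qwen2.5-0.5B-Instruct row)
```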

## Per-domain exact match (%)

| ros | http | mqtt | db | workflow | ecommerce | vehicle | home | cal | file |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 100 | 100 | 60 | 100 | 100 | 50 | 80 | 60 |

All general baselines score 0 on most domains, succeeding only partially on the
most generic ones (home, cal). `ros` (floating-point fields) is SAM-G's weakest
schema, at 0% exact match, and is expected to benefit most from additional
training data.
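The card does not state how exact match is scored. One plausible rule, assumed here, is equality after JSON parsing, so key order and whitespace do not matter and unparseable predictions count as misses. A per-domain tally under that assumption (`per_domain_exact_match` is a hypothetical helper):

```python
import json
from collections import defaultdict

def is_exact(pred: str, ref: str) -> bool:
    """Assumed scoring rule: parsed objects are equal; invalid JSON is a miss."""
    try:
        return json.loads(pred) == json.loads(ref)
    except json.JSONDecodeError:
        return False

def per_domain_exact_match(pairs):
    """pairs: iterable of (predicted_json_text, reference_json_text).

    Groups by the reference's "domain" field and returns {domain: exact-match %}.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for pred, ref in pairs:
        domain = json.loads(ref)["domain"]
        totals[domain] += 1
        hits[domain] += is_exact(pred, ref)  # bool counts as 0 or 1
    return {d: 100.0 * hits[d] / totals[d] for d in totals}

pairs = [
    ('{"op":"get","domain":"http"}', '{"domain":"http","op":"get"}'),
    ('{"domain":"ros","op":"move","params":{"x":0.5}}',
     '{"domain":"ros","op":"move","params":{"x":0.50001}}'),
]
print(per_domain_exact_match(pairs))  # {'http': 100.0, 'ros': 0.0}
```

A strict rule like this gives no partial credit for a nearly-right floating-point value.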

## Usage

```python
import sentencepiece as spm
import torch

# Load the released inference tokenizer. Loading the weights is not shown here.
sp = spm.SentencePieceProcessor()
sp.Load("samg_tokenizer.model")

# The prompt ends with the [ACTION] mode tag, so the continuation is the JSON payload.
prompt = "publish 21.5 on sensors/temp qos 1 [ACTION]"
ids = torch.tensor([sp.EncodeAsIds(prompt)])  # shape: (1, prompt_length)

# Greedy-decode with your loaded model until EOS, then decode the new ids with sp.DecodeIds(...)
# -> {"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}
```
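Model loading is not part of the release, so the decoding step is left open in the usage block above. A framework-agnostic sketch of greedy decoding, assuming a hypothetical `next_token_logits(ids)` callable that wraps your loaded model and returns one score per vocabulary entry for the next position:

```python
from typing import Callable, Sequence

def greedy_decode(
    next_token_logits: Callable[[list[int]], Sequence[float]],
    prompt_ids: list[int],
    eos_id: int,
    max_new_tokens: int = 256,
) -> list[int]:
    """Greedy decoding: repeatedly append the highest-scoring next token.

    next_token_logits is a hypothetical wrapper around the model that maps the
    full sequence so far to next-token scores. Returns only the newly generated
    ids, without the EOS token. Keep prompt + new tokens within the 1024-token context.
    """
    ids = list(prompt_ids)
    generated: list[int] = []
    for _ in range(max_new_tokens):
        scores = next_token_logits(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if next_id == eos_id:
            break
        ids.append(next_id)
        generated.append(next_id)
    return generated
```

With the tokenizer loaded as above, the result would be decoded with something like `sp.DecodeIds(greedy_decode(fn, sp.EncodeAsIds(prompt), sp.eos_id()))`, where `fn` is your model wrapper.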

Always parse output as JSON and validate against your schema before execution.
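A minimal, standard-library-only sketch of that check. The envelope it enforces (`domain` from the ten listed domains, a string `op`, an object `params`) is inferred from the examples on this card and is an assumption; per-operation parameter schemas are yours to define, and a full JSON Schema validator could replace the hand-written checks. `allowed_ops` is a hypothetical allow-list you supply:

```python
import json
from typing import Optional

DOMAINS = {"ros", "http", "mqtt", "db", "workflow",
           "ecommerce", "vehicle", "home", "cal", "file"}

def validate_action(text: str, allowed_ops: Optional[dict] = None) -> dict:
    """Parse an action payload and check the assumed envelope before execution.

    allowed_ops optionally maps a domain to the set of operations you permit.
    Raises ValueError on any violation (json.JSONDecodeError is a ValueError).
    """
    action = json.loads(text)
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")
    domain, op, params = action.get("domain"), action.get("op"), action.get("params")
    if not isinstance(domain, str) or domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    if not isinstance(op, str):
        raise ValueError("op must be a string")
    if not isinstance(params, dict):
        raise ValueError("params must be an object")
    if allowed_ops is not None and op not in allowed_ops.get(domain, set()):
        raise ValueError(f"op {op!r} not allowed for domain {domain!r}")
    return action

action = validate_action(
    '{"domain":"mqtt","op":"publish","params":{"topic":"sensors/temp","payload":21.5,"qos":1}}',
    allowed_ops={"mqtt": {"publish"}},
)
```

An explicit allow-list keeps a wrong but well-formed action from reaching hardware.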

## Intended use

On-device home automation; NL→ROS robot command layers; MQTT fleet gateways;
offline vehicle commands; NL-to-SQL on embedded databases; workflow triggers;
and the structured tool-calling stage of agentic pipelines, either as a drop-in
replacement for a larger hosted model or as a fast router in front of one.
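A sketch of the router pattern, assuming hypothetical `samg_generate` and `hosted_generate` callables (text in, text out) and a caller-supplied `is_valid` check such as schema validation. Sending `[CHAT]` outputs to the hosted model is a design choice made here, since SAM-G is not a general assistant:

```python
import json
from typing import Callable

def route(
    instruction: str,
    samg_generate: Callable[[str], str],
    hosted_generate: Callable[[str], str],
    is_valid: Callable[[dict], bool],
) -> tuple[str, object]:
    """Try SAM-G first; escalate when it does not yield a valid action.

    All three callables are hypothetical stand-ins supplied by the caller.
    Returns ("local", action_dict) or ("hosted", hosted_text).
    """
    out = samg_generate(instruction).strip()
    if out.startswith("[ACTION]"):
        try:
            action = json.loads(out[len("[ACTION]"):])
        except json.JSONDecodeError:
            action = None
        if isinstance(action, dict) and is_valid(action):
            return "local", action
    # [CHAT] output, unparseable JSON, or a failed check: escalate.
    return "hosted", hosted_generate(instruction)
```

Only requests that SAM-G turns into a valid action stay on-device; everything else pays the hosted-model cost.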

## Limitations

- Not a general assistant: factual knowledge and open-ended reasoning are
  limited at this scale; larger general models lead on bits-per-byte and cloze.
- French covers actions, not extended prose.
- Schemas outside the ten domains need fine-tuning. Within them, `ros`
  (floating-point fields) is the weakest, at 0% exact match.
- The action benchmark is synthetic: it is drawn from the same distribution
  family as the training data, with a disjoint evaluation seed (999), so
  results on real-world phrasing may differ.

## Citation

```bibtex
@misc{samg2026,
  title  = {SAM-G: A 30M-Parameter Dual-Mode Language Model for Offline Structured Action Generation},
  author = {AMEFORGE Lab},
  year   = {2026}
}
```