---
license: apache-2.0
library_name: pytorch
pipeline_tag: text-generation
tags:
- ternary
- bitnet
- microcontroller
- edge-ai
- tinyml
- byte-level
- language-model
- routed-architecture
---
# Atome LM
A reference implementation of a **routed-ternary tiny language model** with a bit-exact
Python ↔ C99 inference engine, sized for **microcontroller-class RAM budgets**.
The contribution is **integration, not a new architecture**: a complete
train → ternary export → base-3 packing → C99 inference path, with bit-exact Python ↔ C
parity enforced by tests. It combines three known ideas: ternary weights
([BitNet b1.58](https://arxiv.org/abs/2402.17764)), a per-token-routed 3-pathway block
([Hymba](https://arxiv.org/abs/2411.13676), [MossNet](https://arxiv.org/abs/2510.26182)),
and a byte tokenizer at super-tiny scale ([Guertler 2024](https://arxiv.org/abs/2405.14159)).
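
To make the "base-3 packing" step concrete, here is a minimal sketch of one common way to store ternary weights: five base-3 digits per byte, since 3^5 = 243 fits in 8 bits. The digit order, the five-per-byte grouping, and the function names `pack_trits` / `unpack_trits` are assumptions for illustration only; the actual `ATOME01` blob layout is whatever the source repo defines.

```python
# Illustrative only: the real ATOME01 layout is defined by the source repo.
TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256, so five ternary digits fit in one byte


def pack_trits(weights):
    """Pack ternary weights (-1, 0, +1) into bytes, five per byte."""
    out = bytearray()
    for start in range(0, len(weights), TRITS_PER_BYTE):
        group = weights[start:start + TRITS_PER_BYTE]
        value = 0
        # Little-endian base 3: the first weight is the least significant digit.
        for position, w in enumerate(group):
            if w not in (-1, 0, 1):
                raise ValueError(f"not a ternary weight: {w!r}")
            value += (w + 1) * 3 ** position  # map -1/0/+1 to digit 0/1/2
        out.append(value)
    return bytes(out)


def unpack_trits(packed, count):
    """Recover `count` ternary weights from bytes produced by pack_trits."""
    weights = []
    for byte in packed:
        for _ in range(TRITS_PER_BYTE):
            if len(weights) == count:
                return weights
            weights.append(byte % 3 - 1)  # digit 0/1/2 back to -1/0/+1
            byte //= 3
    return weights
```

A round trip such as `unpack_trits(pack_trits(w), len(w)) == w` is the kind of property a parity test between two implementations would check.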
- **Code:** https://github.com/TilelliLab/atome-lm
- **Project home / live in-browser demo:** https://atomelm.com
- **License:** Apache-2.0 (code, weights, everything)
> ⚠️ This is a **research artifact, not a product or a general chatbot.** Read the
> "Honest results" section below before citing any number. The honesty dossier lives in
> [`HONEST_RESULTS.md`](https://github.com/TilelliLab/atome-lm/blob/main/HONEST_RESULTS.md)
> in the source repo.
## Files in this repo
| File | What it is |
|---|---|
| `atome_944k.bin` (272 KB) | Packed `ATOME01` C-engine blob, ternary, loadable directly by the Atome C99 engine |
| `atome_1m_v1.pt` (3.7 MB) | PyTorch source checkpoint (944,640 params) that produced the blob; use to fine-tune or re-export |
| `vanilla_1m_v1.pt` (3.7 MB) | FP32 vanilla-GPT baseline (950,608 params), shipped so you can reproduce the 944K reversal A/B |
| `*.train.json` | Every-1000-step training logs for both checkpoints (every reported number is auditable) |
| `config.json` | Architecture hyperparameters + provenance for all three checkpoints |
| `SHA256SUMS` | Checksums for the three weight files |
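
Before loading any weights, the downloads can be checked against `SHA256SUMS`. The sketch below assumes the file uses the usual `sha256sum` text layout (hex digest, whitespace, filename, one entry per line); that layout and the helper name `verify_checksums` are assumptions, not something this card specifies.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints are not read in one go."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(sums_file="SHA256SUMS"):
    """Return {filename: True/False} for every entry in the checksum file.

    Assumes `<hex digest>  <filename>` lines, as written by sha256sum.
    """
    sums_path = Path(sums_file)
    results = {}
    for line in sums_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # sha256sum marks binary mode with '*'
        actual = sha256_of(sums_path.parent / name)
        results[name] = actual == expected.lower()
    return results
```

Calling `verify_checksums()` from the directory holding the downloaded files returns a dictionary in which every value should be `True`.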
## Honest results: read this before citing anything
All numbers are **single-seed**, from the training logs shipped alongside.
| Regime | Atome ternary | Vanilla FP32 (param-fair) | Verdict |
|---|---|---|---|
| **60K (MCU target)** | 6.31 ppl | 8.12 ppl | **Atome wins: 22% lower ppl** (52% lower at flash-fair budget) |
| **944K (these checkpoints)** | val 1.0545 / 2.87 ppl | val 0.9337 / 2.54 ppl | **Vanilla wins by ~11%** |
**The 944K result reverses.** At 944K parameters the FP32 vanilla baseline *beats* Atome by
~11% in val loss and perplexity, same recipe / same val slice / same seed. Atome's bet is the
**sub-1M, MCU-class regime**: the 3-pathway inductive bias substitutes for capacity at small
scale and *constrains* it above ~1M. This is the most important honest finding in the kit:
it is **not** "tiny ternary beats everything."
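
The percentages in the table above can be re-derived from the reported numbers. This sketch assumes the validation loss is a mean cross-entropy in nats, so that perplexity is `exp(loss)`; the reported loss/ppl pairs are consistent with that reading. The helper names are illustrative.

```python
import math

# Validation losses copied from the table above (single seed).
VAL_LOSS = {"atome_944k": 1.0545, "vanilla_944k": 0.9337}


def perplexity(val_loss):
    """Perplexity, assuming the loss is mean cross-entropy in nats."""
    return math.exp(val_loss)


def relative_gap(worse, better):
    """Fraction by which `better` is lower than `worse`."""
    return (worse - better) / worse


atome_ppl = perplexity(VAL_LOSS["atome_944k"])
vanilla_ppl = perplexity(VAL_LOSS["vanilla_944k"])

print(f"944K ppl: atome {atome_ppl:.2f}, vanilla {vanilla_ppl:.2f}")
print(f"944K val-loss gap: {relative_gap(1.0545, 0.9337):.1%} in favor of vanilla")
print(f"60K ppl gap: {relative_gap(8.12, 6.31):.1%} in favor of Atome")
```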
The bundled 944K checkpoint is here to make the architecture **runnable**, not to set a
quality bar. It is narrow, single-corpus (TinyStories), and sometimes incoherent.
### What is NOT measured / NOT claimed
- **Single seed only.** No multi-seed variance yet.
- **MCU parity is QEMU only** (ARM Cortex-M3, MPS2-AN385), to FP32 epsilon. **No silicon
bring-up** is done in this repository. The RP2040 demo exceeds 264 KB SRAM at 944K, so the
MCU claim is regime-dependent (it holds at the ~60K engine-default config, not at 944K).
- **Router-entropy** is exposed for free as a per-token uncertainty signal, but its
**calibration is unmeasured at this scale**.
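
For readers unfamiliar with router entropy as an uncertainty signal, the following sketch shows the underlying quantity. It assumes the router emits one logit per pathway and that entropy is taken over the softmax of those logits; how `router_entropies` in the source repo actually computes its values is not specified here, and the function names below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_entropy(logits):
    """Shannon entropy (nats) of the softmax over pathway logits."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalized_entropy(logits):
    """Entropy scaled to [0, 1]: 0 = one pathway certain, 1 = uniform."""
    return routing_entropy(logits) / math.log(len(logits))
```

With three pathways the entropy ranges from 0 (the router commits to one pathway) to `log(3)` nats (the router is undecided). Whether high values actually track model error is exactly the calibration question that remains unmeasured.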
## Usage
This is a **custom architecture**, not a `transformers` AutoModel. Get the code from the
source repo, then load the PyTorch checkpoint:
```bash
git clone https://github.com/TilelliLab/atome-lm
cd atome-lm && pip install -e . # Python >=3.10, PyTorch >=2.0
```
```python
import torch
from atome_llm.core.atome_lm import AtomeLM
ckpt = torch.load("atome_1m_v1.pt", map_location="cpu", weights_only=False)
model = AtomeLM(**ckpt["config"]) # vocab=256, d_model=256, n_layers=8, d_head=64, top_k=4
model.load_state_dict(ckpt["state_dict"])
model.eval()
ids = torch.randint(0, 256, (1, 32)) # byte-level: ids are raw bytes 0-255
logits = model(ids) # (1, 32, 256)
ent_per_layer = model.router_entropies(ids) # free per-token uncertainty signal
```
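
Because the vocabulary is the 256 byte values, real text can stand in for the random ids above without a tokenizer file. This is a minimal sketch assuming UTF-8 input; `encode_bytes` and `decode_bytes` are hypothetical helpers, not part of the Atome package.

```python
def encode_bytes(text):
    """UTF-8 encode text into a list of ids in 0..255."""
    return list(text.encode("utf-8"))


def decode_bytes(ids):
    """Turn ids back into text; invalid UTF-8 sequences become U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")
```

Wrapping the result as `torch.tensor([encode_bytes("Once upon a time")])` gives a `(1, T)` integer tensor of the same form as `ids` above. Non-ASCII characters take more than one id each, and generated ids may split a multi-byte character, which is why decoding uses `errors="replace"`.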
For microcontroller deployment, load `atome_944k.bin` directly with the Atome C99 engine
(`atome_load(...)`) shipped in the source repo's `c_engine/`.
## Citation
```bibtex
@software{atome_llm_2026,
title = {Atome LM: a tiny ternary language model for microcontroller deployment},
author = {Atome LM contributors},
year = {2026},
note = {Apache 2.0, https://atomelm.com},
url = {https://github.com/TilelliLab/atome-lm}
}
```