Atome LM v0.3.0 — checkpoints + honest model card

	---
	license: apache-2.0
	library_name: pytorch
	pipeline_tag: text-generation
	tags:
	- ternary
	- bitnet
	- microcontroller
	- edge-ai
	- tinyml
	- byte-level
	- language-model
	- routed-architecture
	---

	# Atome LM

	A reference implementation of a routed-ternary tiny language model with a bit-exact
	Python ↔ C99 inference engine, sized for microcontroller-class RAM budgets.

	The contribution is integration, not a new architecture: a complete
	train → ternary export → base-3 packing → C99 inference path, with bit-exact Python ↔ C
	parity enforced by tests. It combines three known ideas — ternary weights
	([BitNet b1.58](https://arxiv.org/abs/2402.17764)), a per-token-routed 3-pathway block
	([Hymba](https://arxiv.org/abs/2411.13676), [MossNet](https://arxiv.org/abs/2510.26182)),
	and a byte tokenizer at super-tiny scale ([Guertler 2024](https://arxiv.org/abs/2405.14159)).
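
The export and packing steps of that path can be illustrated with a minimal sketch. It assumes absmean scaling in the style of BitNet b1.58 and a dense layout of five trits per byte (3^5 = 243 fits in one byte, about 1.6 bits per weight). The function names and the byte layout are illustrative assumptions, not the actual `ATOME01` format, which is defined by the source repo.

```python
TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256, so five trits fit in one byte


def absmean_ternarize(weights):
    """Quantize floats to {-1, 0, +1} by dividing by the mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    trits = [max(-1, min(1, round(w / scale))) for w in weights]
    return trits, scale


def pack_trits(trits):
    """Pack trits as base-3 digits, first trit in the least significant digit."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in reversed(trits[i:i + TRITS_PER_BYTE]):
            value = value * 3 + (t + 1)  # map -1, 0, +1 to digits 0, 1, 2
        out.append(value)
    return bytes(out)


def unpack_trits(blob, count):
    """Invert pack_trits; `count` drops the padding digits of the last byte."""
    trits = []
    for byte in blob:
        for _ in range(TRITS_PER_BYTE):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, 2.0, -0.7]
trits, scale = absmean_ternarize(weights)
packed = pack_trits(trits)
assert unpack_trits(packed, len(trits)) == trits  # lossless round trip
```

A round trip like the final assertion is the kind of property a Python ↔ C parity test would check on both sides of the export.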

	- Code: https://github.com/TilelliLab/atome-lm
	- Project home / live in-browser demo: https://atomelm.com
	- License: Apache-2.0 (code, weights, everything)

	> ⚠️ This is a research artifact, not a product or a general chatbot. Read the
	> "Honest results" section below before citing any number. The honesty dossier lives in
	> [`HONEST_RESULTS.md`](https://github.com/TilelliLab/atome-lm/blob/main/HONEST_RESULTS.md)
	> in the source repo.

	## Files in this repo

	| File | What it is |
	|---|---|
	| `atome_944k.bin` (272 KB) | Packed `ATOME01` C-engine blob, ternary, loadable directly by the Atome C99 engine |
	| `atome_1m_v1.pt` (3.7 MB) | PyTorch source checkpoint (944,640 params) that produced the blob; use to fine-tune or re-export |
	| `vanilla_1m_v1.pt` (3.7 MB) | FP32 vanilla-GPT baseline (950,608 params); shipped so you can reproduce the 944K reversal A/B |
	| `*.train.json` | Every-1000-step training logs for both checkpoints (every reported number is auditable) |
	| `config.json` | Architecture hyperparameters + provenance for all three checkpoints |
	| `SHA256SUMS` | Checksums for the three weight files |
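
A downloaded copy can be checked against `SHA256SUMS` with a short script. This is a sketch that assumes the file uses the usual `sha256sum` layout (`<hex digest>  <filename>` per line); `sha256_of` and `verify_sums` are hypothetical helper names, not part of the repo.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints are not read into RAM at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file="SHA256SUMS"):
    """Return {filename: True/False} for every entry listed in the sums file."""
    base = Path(sums_file).parent
    results = {}
    for line in Path(sums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode names with '*'
        results[name] = sha256_of(base / name) == expected.lower()
    return results
```

Calling `verify_sums()` from the directory holding the downloaded files should report `True` for each of the three weight files if the download is intact.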

	## Honest results — read this before citing anything

	All numbers are single-seed, from the training logs shipped alongside.

	| Regime | Atome ternary | Vanilla FP32 (param-fair) | Verdict |
	|---|---|---|---|
	| 60K (MCU target) | 6.31 ppl | 8.12 ppl | Atome wins −22% ppl (−52% at flash-fair budget) |
	| 944K (these checkpoints) | val 1.0545 / 2.87 ppl | val 0.9337 / 2.54 ppl | Vanilla wins by ~11% |

	At 944K parameters the result reverses: the FP32 vanilla baseline beats Atome by ~11% in
	both val loss and perplexity, with the same recipe, val slice, and seed. Atome's bet is the
	sub-1M, MCU-class regime: the 3-pathway inductive bias substitutes for capacity at small
	scale and constrains it above ~1M. This is the most important honest finding in the kit:
	the claim is not "tiny ternary beats everything."
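
The table's figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the reported val loss is a mean cross-entropy in nats, so perplexity is `exp(loss)`; that assumption is consistent with the numbers in the table. The helper names are illustrative.

```python
import math


def perplexity(val_loss_nats):
    """Perplexity from a mean cross-entropy loss measured in nats."""
    return math.exp(val_loss_nats)


def relative_change(candidate, baseline):
    """Signed relative difference of candidate with respect to baseline."""
    return (candidate - baseline) / baseline


atome_944k, vanilla_944k = 1.0545, 0.9337

print(round(perplexity(atome_944k), 2))    # 2.87
print(round(perplexity(vanilla_944k), 2))  # 2.54

# 944K regime: vanilla relative to Atome, in val loss
print(f"{relative_change(vanilla_944k, atome_944k):+.1%}")  # -11.5%

# 60K regime: Atome relative to vanilla, in perplexity
print(f"{relative_change(6.31, 8.12):+.1%}")  # -22.3%
```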

	The bundled 944K checkpoint is here to make the architecture runnable, not to set a
	quality bar. It is narrow, single-corpus (TinyStories), and sometimes incoherent.

	### What is NOT measured / NOT claimed
	- Single seed only. No multi-seed variance yet.
	- MCU parity is verified under QEMU only (ARM Cortex-M3, MPS2-AN385), to FP32 epsilon.
	**No silicon bring-up** has been done in this repository. The RP2040 demo exceeds the
	chip's 264 KB SRAM at 944K, so the MCU claim is regime-dependent: it holds at the ~60K
	engine-default config, not at 944K.
	- Router-entropy is exposed for free as a per-token uncertainty signal, but its
	calibration is unmeasured at this scale.
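
To make the router-entropy signal concrete, here is a minimal sketch of a per-token entropy over pathway scores. It assumes the router produces one logit per pathway and that the routing distribution is a softmax over them; the repo's `router_entropies` may compute it differently, and the function names here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_entropy(logits, normalize=True):
    """Shannon entropy of the routing distribution for one token.

    With normalize=True the value is divided by log(n_pathways), so 0 means
    the router is certain and 1 means it is maximally undecided.
    """
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    if normalize and len(probs) > 1:
        entropy /= math.log(len(probs))
    return entropy


undecided = router_entropy([0.0, 0.0, 0.0])    # close to 1: uniform routing
confident = router_entropy([8.0, -8.0, -8.0])  # close to 0: one pathway dominates
```

High entropy only says the router was undecided; whether that tracks prediction error is exactly the calibration question listed as unmeasured.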

	## Usage

	This is a custom architecture, not a `transformers` AutoModel. Get the code from the
	source repo, then load the PyTorch checkpoint:

	```bash
	git clone https://github.com/TilelliLab/atome-lm
	cd atome-lm && pip install -e . # Python >=3.10, PyTorch >=2.0
	```

	```python
	import torch
	from atome_llm.core.atome_lm import AtomeLM

	# weights_only=False unpickles arbitrary objects: only load checkpoints you trust.
	ckpt = torch.load("atome_1m_v1.pt", map_location="cpu", weights_only=False)
	model = AtomeLM(**ckpt["config"])  # vocab=256, d_model=256, n_layers=8, d_head=64, top_k=4
	model.load_state_dict(ckpt["state_dict"])
	model.eval()

	ids = torch.randint(0, 256, (1, 32))  # byte-level: ids are raw bytes 0-255
	with torch.no_grad():  # inference only, skip building the autograd graph
	    logits = model(ids)  # (1, 32, 256)
	    ent_per_layer = model.router_entropies(ids)  # free per-token uncertainty signal
	```
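
Because the vocabulary is raw bytes, text can be turned into ids without any tokenizer file. The sketch below shows UTF-8 encoding and decoding plus a greedy decoding loop; `next_logits` is a hypothetical callable that returns 256 next-byte scores, standing in for the model so the example runs on its own.

```python
def encode(text):
    """Text to byte ids in the range 0-255 (UTF-8)."""
    return list(text.encode("utf-8"))


def decode(ids):
    """Byte ids back to text; invalid UTF-8 sequences become U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")


def greedy_generate(next_logits, prompt, max_new_bytes=32):
    """Append the highest-scoring byte max_new_bytes times, then decode."""
    ids = encode(prompt)
    for _ in range(max_new_bytes):
        logits = next_logits(ids)  # 256 scores for the next byte
        ids.append(max(range(256), key=logits.__getitem__))
    return decode(ids)


def always_a(ids):
    """Stand-in for a model: always prefers the byte for 'a'."""
    scores = [0.0] * 256
    scores[ord("a")] = 1.0
    return scores


print(greedy_generate(always_a, "hi", max_new_bytes=3))  # hiaaa
```

With the checkpoint loaded as above, `next_logits` could be something like `lambda ids: model(torch.tensor([ids]))[0, -1].tolist()`, assuming the forward pass returns a `(1, T, 256)` logits tensor as in the usage snippet.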

	For microcontroller deployment, load `atome_944k.bin` directly with the Atome C99 engine
	(`atome_load(...)`) shipped in the source repo's `c_engine/`.

	## Citation

	```bibtex
	@software{atome_llm_2026,
	title = {Atome LM: a tiny ternary language model for microcontroller deployment},
	author = {Atome LM contributors},
	year = {2026},
	note = {Apache 2.0, https://atomelm.com},
	url = {https://github.com/TilelliLab/atome-lm}
	}
	```