| --- |
| license: apache-2.0 |
| library_name: pytorch |
| pipeline_tag: text-generation |
| tags: |
| - ternary |
| - bitnet |
| - microcontroller |
| - edge-ai |
| - tinyml |
| - byte-level |
| - language-model |
| - routed-architecture |
| --- |
| |
| # Atome LM |
|
|
| A reference implementation of a **routed-ternary tiny language model** with a bit-exact |
| Python ↔ C99 inference engine, sized for **microcontroller-class RAM budgets**.
|
|
| The contribution is **integration, not a new architecture**: a complete
| train → ternary export → base-3 packing → C99 inference path, with bit-exact Python ↔ C
| parity enforced by tests. It combines three known ideas: ternary weights
| ([BitNet b1.58](https://arxiv.org/abs/2402.17764)), a per-token-routed 3-pathway block |
| ([Hymba](https://arxiv.org/abs/2411.13676), [MossNet](https://arxiv.org/abs/2510.26182)), |
| and a byte tokenizer at super-tiny scale ([Guertler 2024](https://arxiv.org/abs/2405.14159)). |
|
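To make the "base-3 packing" step concrete, here is a minimal sketch of one common way to pack ternary weights: five trits per byte, since 3^5 = 243 fits in 8 bits. The five-per-byte layout, the digit order, and the names `pack_trits` / `unpack_trits` are illustrative assumptions; the actual `ATOME01` layout is defined by the exporter in the source repo and may differ.

```python
TRITS_PER_BYTE = 5  # assumption: 3**5 = 243 <= 256, so five trits fit in one byte


def pack_trits(weights):
    """Pack ternary weights (-1, 0, +1) into bytes, five per byte, base 3."""
    out = bytearray()
    for start in range(0, len(weights), TRITS_PER_BYTE):
        chunk = weights[start:start + TRITS_PER_BYTE]
        value = 0
        # Little-endian in base 3: the first weight is the lowest digit.
        for position, w in enumerate(chunk):
            if w not in (-1, 0, 1):
                raise ValueError(f"not a ternary weight: {w}")
            value += (w + 1) * 3 ** position  # map -1/0/+1 to digits 0/1/2
        out.append(value)
    return bytes(out)


def unpack_trits(packed, count):
    """Inverse of pack_trits; `count` is the number of weights to recover."""
    weights = []
    for byte in packed:
        for _ in range(TRITS_PER_BYTE):
            if len(weights) == count:
                return weights
            byte, digit = divmod(byte, 3)
            weights.append(digit - 1)  # map digits 0/1/2 back to -1/0/+1
    return weights
```

A round trip such as `unpack_trits(pack_trits(w), len(w)) == w` is the kind of property a bit-exact export path has to preserve.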
|
| - **Code:** https://github.com/TilelliLab/atome-lm |
| - **Project home / live in-browser demo:** https://atomelm.com |
| - **License:** Apache-2.0 (code, weights, everything) |
|
|
| > ⚠️ This is a **research artifact, not a product or a general chatbot.** Read the
| > "Honest results" section below before citing any number. The honesty dossier lives in |
| > [`HONEST_RESULTS.md`](https://github.com/TilelliLab/atome-lm/blob/main/HONEST_RESULTS.md) |
| > in the source repo. |
|
|
| ## Files in this repo |
|
|
| | File | What it is | |
| |---|---| |
| | `atome_944k.bin` (272 KB) | Packed `ATOME01` C-engine blob, ternary, loadable directly by the Atome C99 engine | |
| | `atome_1m_v1.pt` (3.7 MB) | PyTorch source checkpoint (944,640 params) that produced the blob; use to fine-tune or re-export | |
| | `vanilla_1m_v1.pt` (3.7 MB) | FP32 vanilla-GPT baseline (950,608 params), shipped so you can reproduce the 944K reversal A/B |
| | `*.train.json` | Every-1000-step training logs for both checkpoints (every reported number is auditable) | |
| | `config.json` | Architecture hyperparameters + provenance for all three checkpoints | |
| | `SHA256SUMS` | Checksums for the three weight files | |
|
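Before loading any of the weight files, it may be worth checking them against `SHA256SUMS`. The sketch below assumes the file uses the usual `sha256sum` line format (`<hex digest>  <filename>`), which this card does not confirm; `sha256_of` and `verify_checksums` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a checkpoint is never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_checksums(sums_file="SHA256SUMS"):
    """Return {filename: bool} for every entry listed in sums_file."""
    base = Path(sums_file).parent
    results = {}
    for line in Path(sums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading '*' on the filename.
        name = name.strip().lstrip("*")
        results[name] = sha256_of(base / name) == expected.lower()
    return results
```

Running `verify_checksums()` from the directory that holds the downloaded files should report `True` for each of the three weight files.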
|
| ## Honest results: read this before citing anything
|
|
| All numbers are **single-seed**, from the training logs shipped alongside. |
|
|
| | Regime | Atome ternary | Vanilla FP32 (param-fair) | Verdict | |
| |---|---|---|---| |
| | **60K (MCU target)** | 6.31 ppl | 8.12 ppl | **Atome wins: 22% lower ppl** (52% lower at flash-fair budget) |
| | **944K (these checkpoints)** | val 1.0545 / 2.87 ppl | val 0.9337 / 2.54 ppl | **Vanilla wins by ~11%** | |
|
|
| **The 944K result reverses.** At 944K parameters the FP32 vanilla baseline *beats* Atome by |
| ~11% in val loss and perplexity, same recipe / same val slice / same seed. Atome's bet is the |
| **sub-1M, MCU-class regime**: the 3-pathway inductive bias substitutes for capacity at small |
| scale and *constrains* it above ~1M. This is the most important honest finding in the kit:
| it is **not** "tiny ternary beats everything."
|
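The table's figures can be re-derived with a few lines of arithmetic. This sketch assumes the validation loss is a natural-log cross-entropy, so that perplexity is `exp(loss)` (which matches the reported pairs), and it measures each gap relative to the worse model's value; the card does not state which denominator it used, so that choice is an assumption.

```python
import math

# Validation losses from the 944K row of the results table.
atome_loss, vanilla_loss = 1.0545, 0.9337

# Perplexity is exp(loss) when the loss is measured in nats.
atome_ppl = math.exp(atome_loss)
vanilla_ppl = math.exp(vanilla_loss)

# Relative gaps at 944K, measured against Atome (the worse model here).
loss_gap_944k = (atome_loss - vanilla_loss) / atome_loss
ppl_gap_944k = (atome_ppl - vanilla_ppl) / atome_ppl

# Relative gap at 60K, measured against vanilla (the worse model there).
ppl_gap_60k = (8.12 - 6.31) / 8.12

print(f"944K: Atome ppl {atome_ppl:.2f}, vanilla ppl {vanilla_ppl:.2f}")
print(f"944K: loss gap {loss_gap_944k:.1%}, ppl gap {ppl_gap_944k:.1%}")
print(f"60K:  ppl gap {ppl_gap_60k:.1%}")
```

Measured against the vanilla value instead, the 944K gap comes out closer to 13%, so the baseline should be stated whenever the figure is quoted.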
|
| The bundled 944K checkpoint is here to make the architecture **runnable**, not to set a |
| quality bar. It is narrow, single-corpus (TinyStories), and sometimes incoherent. |
|
|
| ### What is NOT measured / NOT claimed |
| - **Single seed only.** No multi-seed variance yet. |
| - **MCU parity is QEMU only** (ARM Cortex-M3, MPS2-AN385), to FP32 epsilon. **No silicon
| bring-up** is done in this repository. The RP2040 demo exceeds 264 KB SRAM at 944K, so the
| MCU claim is regime-dependent (it holds at the ~60K engine-default config, not at 944K).
| - **Router-entropy** is exposed for free as a per-token uncertainty signal, but its |
| **calibration is unmeasured at this scale**. |
|
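As an illustration of what a router-entropy signal looks like, the sketch below computes the Shannon entropy of a softmax over three pathway logits for a single token. It assumes the router produces one logit per pathway and normalizes with softmax; how `router_entropies` is actually computed in the repo is not described here, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_entropy(logits):
    """Shannon entropy (in nats) of the routing distribution for one token."""
    probs = softmax(logits)
    # Skip zero probabilities, which contribute nothing and would break log().
    return -sum(p * math.log(p) for p in probs if p > 0.0)


MAX_ENTROPY = math.log(3)  # upper bound: uniform routing over three pathways

confident = router_entropy([6.0, 0.0, 0.0])  # one pathway dominates
uncertain = router_entropy([0.0, 0.0, 0.0])  # all pathways equally likely

print(f"confident: {confident:.3f} nats, uncertain: {uncertain:.3f} nats")
```

Low entropy means the router commits to one pathway and high entropy means it hedges. Whether that tracks actual prediction error is exactly the calibration question listed above as unmeasured.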
|
| ## Usage |
|
|
| This is a **custom architecture**, not a `transformers` AutoModel. Get the code from the |
| source repo, then load the PyTorch checkpoint: |
|
|
| ```bash |
| git clone https://github.com/TilelliLab/atome-lm |
| cd atome-lm && pip install -e . # Python >=3.10, PyTorch >=2.0 |
| ``` |
|
|
| ```python |
| import torch |
| from atome_llm.core.atome_lm import AtomeLM |
| |
| # weights_only=False unpickles arbitrary Python objects; only load checkpoints you trust.
| ckpt = torch.load("atome_1m_v1.pt", map_location="cpu", weights_only=False)
| model = AtomeLM(**ckpt["config"]) # vocab=256, d_model=256, n_layers=8, d_head=64, top_k=4 |
| model.load_state_dict(ckpt["state_dict"]) |
| model.eval() |
| |
| ids = torch.randint(0, 256, (1, 32)) # byte-level: ids are raw bytes 0-255 |
| logits = model(ids) # (1, 32, 256) |
| ent_per_layer = model.router_entropies(ids) # free per-token uncertainty signal |
| ``` |
|
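Because the model is byte-level, turning a prompt into ids needs no tokenizer files. The helpers below are a minimal sketch that assumes the ids are the UTF-8 bytes of the text (the card only says "raw bytes 0-255"); `encode` and `decode` are hypothetical names, not part of the Atome package.

```python
def encode(text):
    """Map text to byte-level token ids: one id per UTF-8 byte, each in 0-255."""
    return list(text.encode("utf-8"))


def decode(ids):
    """Map ids back to text; invalid UTF-8 (e.g. from sampling) becomes U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")


prompt_ids = encode("Once upon a time")
print(prompt_ids[:4], len(prompt_ids))
print(decode(prompt_ids))
```

Wrapping the result as `torch.tensor([prompt_ids])` gives a `(1, sequence_length)` batch of the same kind as the random `ids` in the snippet above.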
|
| For microcontroller deployment, load `atome_944k.bin` directly with the Atome C99 engine |
| (`atome_load(...)`) shipped in the source repo's `c_engine/`. |
|
|
| ## Citation |
|
|
| ```bibtex |
| @software{atome_llm_2026, |
| title = {Atome LM: a tiny ternary language model for microcontroller deployment}, |
| author = {Atome LM contributors}, |
| year = {2026}, |
| note = {Apache 2.0, https://atomelm.com}, |
| url = {https://github.com/TilelliLab/atome-lm} |
| } |
| ``` |
|
|