---
license: apache-2.0
library_name: pytorch
pipeline_tag: text-generation
tags:
- ternary
- bitnet
- microcontroller
- edge-ai
- tinyml
- byte-level
- language-model
- routed-architecture
---

# Atome LM

A reference implementation of a **routed-ternary tiny language model** with a bit-exact Python ↔ C99 inference engine, sized for **microcontroller-class RAM budgets**.

The contribution is **integration, not a new architecture**: a complete train → ternary export → base-3 packing → C99 inference path, with bit-exact Python ↔ C parity enforced by tests. It combines three known ideas: ternary weights ([BitNet b1.58](https://arxiv.org/abs/2402.17764)), a per-token-routed 3-pathway block ([Hymba](https://arxiv.org/abs/2411.13676), [MossNet](https://arxiv.org/abs/2510.26182)), and a byte tokenizer at super-tiny scale ([Guertler 2024](https://arxiv.org/abs/2405.14159)).

- **Code:** https://github.com/TilelliLab/atome-lm
- **Project home / live in-browser demo:** https://atomelm.com
- **License:** Apache-2.0 (code, weights, everything)

> ⚠️ This is a **research artifact, not a product or a general chatbot.** Read the
> "Honest results" section below before citing any number. The honesty dossier lives in
> [`HONEST_RESULTS.md`](https://github.com/TilelliLab/atome-lm/blob/main/HONEST_RESULTS.md)
> in the source repo.
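
To make the "base-3 packing" step of the pipeline above concrete, here is a minimal sketch of one common way to pack ternary weights: five weights per byte, since 3^5 = 243 fits in 8 bits. This is an illustration under assumptions, not the `ATOME01` blob layout, which this card does not specify. The helper names `pack_trits` and `unpack_trits`, the digit order, and the -1/0/+1 to 0/1/2 mapping are all hypothetical.

```python
# Generic base-3 packing sketch. Assumption: NOT the actual ATOME01 format.
TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256, so five ternary digits fit in one byte


def pack_trits(weights):
    """Pack a list of ternary weights (-1, 0, +1) into bytes, five per byte."""
    packed = bytearray()
    for start in range(0, len(weights), TRITS_PER_BYTE):
        chunk = weights[start:start + TRITS_PER_BYTE]
        value = 0
        # Little-endian base 3: the first weight of the chunk is the lowest digit.
        for position, w in enumerate(chunk):
            if w not in (-1, 0, 1):
                raise ValueError(f"not a ternary weight: {w!r}")
            value += (w + 1) * 3 ** position  # map -1/0/+1 to digits 0/1/2
        packed.append(value)
    return bytes(packed)


def unpack_trits(packed, count):
    """Inverse of pack_trits; count is the original number of weights."""
    weights = []
    for byte in packed:
        for _ in range(TRITS_PER_BYTE):
            if len(weights) == count:
                break  # ignore padding digits in the final byte
            weights.append(byte % 3 - 1)  # digit 0/1/2 back to -1/0/+1
            byte //= 3
    return weights


weights = [-1, 0, 1, 1, 0, -1, 1]
blob = pack_trits(weights)
restored = unpack_trits(blob, len(weights))
```

Under this scheme each weight costs 8/5 = 1.6 bits, close to the information-theoretic minimum of log2(3) ≈ 1.585 bits per ternary value. The round trip is lossless as long as the original weight count is stored alongside the packed bytes.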

## Files in this repo

| File | What it is |
|---|---|
| `atome_944k.bin` (272 KB) | Packed `ATOME01` C-engine blob, ternary, loadable directly by the Atome C99 engine |
| `atome_1m_v1.pt` (3.7 MB) | PyTorch source checkpoint (944,640 params) that produced the blob; use to fine-tune or re-export |
| `vanilla_1m_v1.pt` (3.7 MB) | FP32 vanilla-GPT baseline (950,608 params), shipped so you can reproduce the 944K reversal A/B |
| `*.train.json` | Every-1000-step training logs for both checkpoints (every reported number is auditable) |
| `config.json` | Architecture hyperparameters + provenance for all three checkpoints |
| `SHA256SUMS` | Checksums for the three weight files |

## Honest results: read this before citing anything

All numbers are **single-seed**, from the training logs shipped alongside.

| Regime | Atome ternary | Vanilla FP32 (param-fair) | Verdict |
|---|---|---|---|
| **60K (MCU target)** | 6.31 ppl | 8.12 ppl | **Atome wins −22% ppl** (−52% at flash-fair budget) |
| **944K (these checkpoints)** | val 1.0545 / 2.87 ppl | val 0.9337 / 2.54 ppl | **Vanilla wins by ~11%** |

**The 944K result reverses.** At 944K parameters the FP32 vanilla baseline *beats* Atome by ~11% in val loss and perplexity, same recipe / same val slice / same seed. Atome's bet is the **sub-1M, MCU-class regime**: the 3-pathway inductive bias substitutes for capacity at small scale and *constrains* it above ~1M. This is the most important honest finding in the kit: it is **not** "tiny ternary beats everything."

The bundled 944K checkpoint is here to make the architecture **runnable**, not to set a quality bar. It is narrow, single-corpus (TinyStories), and sometimes incoherent.

### What is NOT measured / NOT claimed

- **Single seed only.** No multi-seed variance yet.
- **MCU parity is QEMU only** (ARM Cortex-M3, MPS2-AN385), to FP32 epsilon. **No silicon bring-up** is done in this repository.
  The RP2040 demo exceeds 264 KB SRAM at 944K; the MCU claim is regime-dependent (it holds at the ~60K engine-default config, not at 944K).
- **Router-entropy** is exposed for free as a per-token uncertainty signal, but its **calibration is unmeasured at this scale**.

## Usage

This is a **custom architecture**, not a `transformers` AutoModel. Get the code from the source repo, then load the PyTorch checkpoint:

```bash
git clone https://github.com/TilelliLab/atome-lm
cd atome-lm && pip install -e .   # Python >=3.10, PyTorch >=2.0
```

```python
import torch
from atome_llm.core.atome_lm import AtomeLM

ckpt = torch.load("atome_1m_v1.pt", map_location="cpu", weights_only=False)
model = AtomeLM(**ckpt["config"])   # vocab=256, d_model=256, n_layers=8, d_head=64, top_k=4
model.load_state_dict(ckpt["state_dict"])
model.eval()

ids = torch.randint(0, 256, (1, 32))          # byte-level: ids are raw bytes 0-255
logits = model(ids)                           # (1, 32, 256)
ent_per_layer = model.router_entropies(ids)   # free per-token uncertainty signal
```

For microcontroller deployment, load `atome_944k.bin` directly with the Atome C99 engine (`atome_load(...)`) shipped in the source repo's `c_engine/`.

## Citation

```bibtex
@software{atome_llm_2026,
  title  = {Atome LM: a tiny ternary language model for microcontroller deployment},
  author = {Atome LM contributors},
  year   = {2026},
  note   = {Apache 2.0, https://atomelm.com},
  url    = {https://github.com/TilelliLab/atome-lm}
}
```