Atome LM v0.3.0 — checkpoints + honest model card

Files changed (8)

README.md +111 -0
SHA256SUMS +3 -0
atome_1m_v1.pt +3 -0
atome_1m_v1.train.json +229 -0
atome_944k.bin +3 -0
config.json +79 -0
vanilla_1m_v1.pt +3 -0
vanilla_1m_v1.train.json +230 -0

README.md ADDED

	@@ -0,0 +1,111 @@

+---
+license: apache-2.0
+library_name: pytorch
+pipeline_tag: text-generation
+tags:
+  - ternary
+  - bitnet
+  - microcontroller
+  - edge-ai
+  - tinyml
+  - byte-level
+  - language-model
+  - routed-architecture
+---
+# Atome LM
+A reference implementation of a **routed-ternary tiny language model** with a bit-exact
+Python ↔ C99 inference engine, sized for **microcontroller-class RAM budgets**.
+
+The contribution is **integration, not a new architecture**: a complete
+train → ternary export → base-3 packing → C99 inference path, with bit-exact Python ↔ C
+parity enforced by tests. It combines three known ideas — ternary weights
+([BitNet b1.58](https://arxiv.org/abs/2402.17764)), a per-token-routed 3-pathway block
+([Hymba](https://arxiv.org/abs/2411.13676), [MossNet](https://arxiv.org/abs/2510.26182)),
+and a byte tokenizer at super-tiny scale ([Guertler 2024](https://arxiv.org/abs/2405.14159)).
+- **Code:** https://github.com/TilelliLab/atome-lm
+- **Project home / live in-browser demo:** https://atomelm.com
+- **License:** Apache-2.0 (code, weights, everything)
+> ⚠️ This is a **research artifact, not a product or a general chatbot.** Read the
+> "Honest results" section below before citing any number. The full write-up, caveats included, is
+> [`HONEST_RESULTS.md`](https://github.com/TilelliLab/atome-lm/blob/main/HONEST_RESULTS.md)
+> in the source repo.
+## Files in this repo
+| File | What it is |
+|---|---|
+| `atome_944k.bin` (276,655 bytes, ~270 KiB) | Packed ternary `ATOME01` blob, loadable directly by the Atome C99 engine |
+| `atome_1m_v1.pt` (3.8 MB) | PyTorch source checkpoint (944,640 params) that produced the blob; use it to fine-tune or re-export |
+| `vanilla_1m_v1.pt` (3.8 MB) | FP32 vanilla-GPT baseline (950,608 params), shipped so you can reproduce the 944K reversal A/B |
+| `*.train.json` | Training logs for both checkpoints, one entry per 1000 steps (every reported 944K number is auditable) |
+| `config.json` | Architecture hyperparameters + provenance for all three checkpoints |
+| `SHA256SUMS` | Checksums for the three weight files |
+## Honest results — read this before citing anything
+All numbers are **single-seed**. The 944K row is the `best_val` entry (step 29000) of the
+training logs shipped alongside; the logs' `final_val` is 1.0572 (Atome) and 0.9317 (vanilla).
+The 60K row is not covered by a log in this repo.
+
+| Regime | Atome ternary | Vanilla FP32 (param-fair) | Verdict |
+|---|---|---|---|
+| **60K (MCU target)** | 6.31 ppl | 8.12 ppl | **Atome wins, −22% ppl** (−52% at flash-fair budget) |
+| **944K (these checkpoints)** | val loss 1.0545 / 2.87 ppl | val loss 0.9337 / 2.54 ppl | **Vanilla wins by ~11%** |
+
+**The result reverses at 944K.** With the same recipe, validation slice, and seed, the FP32
+vanilla baseline *beats* Atome by ~11% in both val loss and perplexity (measured relative to
+Atome; ~13% relative to vanilla). Atome's bet is the **MCU-class regime around 60K
+parameters**: the 3-pathway inductive bias appears to substitute for capacity at that scale
+and to *constrain* it by ~1M. This is the most important finding in the kit: the claim is
+**not** "tiny ternary beats everything."
+
+The bundled 944K checkpoint is here to make the architecture **runnable**, not to set a
+quality bar. It is narrow, single-corpus (TinyStories), and sometimes incoherent.
+### What is NOT measured / NOT claimed
+- **Single seed only.** No multi-seed variance yet.
+- **MCU parity is QEMU only** (ARM Cortex-M3, MPS2-AN385), to FP32 epsilon. **No silicon
+  bring-up** is done in this repository. The RP2040 demo exceeds 264 KB SRAM at 944K — the
+  MCU claim is regime-dependent (it holds at the ~60K engine-default config, not at 944K).
+- **Router-entropy** is exposed for free as a per-token uncertainty signal, but its
+  **calibration is unmeasured at this scale**.
+## Usage
+This is a **custom architecture**, not a `transformers` AutoModel. Get the code from the
+source repo, then load the PyTorch checkpoint:
+```bash
+git clone https://github.com/TilelliLab/atome-lm
+cd atome-lm && pip install -e .      # Python >=3.10, PyTorch >=2.0
+```
+```python
+import torch
+from atome_llm.core.atome_lm import AtomeLM
+
+# weights_only=False unpickles arbitrary Python objects: check the file against SHA256SUMS first.
+ckpt = torch.load("atome_1m_v1.pt", map_location="cpu", weights_only=False)
+model = AtomeLM(**ckpt["config"])    # vocab=256, d_model=256, n_layers=8, d_head=64, top_k=4
+model.load_state_dict(ckpt["state_dict"])
+model.eval()
+
+ids = torch.randint(0, 256, (1, 32))              # byte-level: ids are raw bytes 0-255
+with torch.no_grad():                              # inference only, no autograd graph
+    logits = model(ids)                            # (1, 32, 256)
+    ent_per_layer = model.router_entropies(ids)    # free per-token uncertainty signal
+```
+For microcontroller deployment, load `atome_944k.bin` directly with the Atome C99 engine
+(`atome_load(...)`) shipped in the source repo's `c_engine/`.
+## Citation
+```bibtex
+@software{atome_llm_2026,
+  title  = {Atome LM: a tiny ternary language model for microcontroller deployment},
+  author = {Atome LM contributors},
+  year   = {2026},
+  note   = {Apache 2.0, https://atomelm.com},
+  url    = {https://github.com/TilelliLab/atome-lm}
+}
+```
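
The Usage snippet in the README feeds random ids to the model. Since `config.json` describes the tokenizer as byte-level with ids 0-255 and no vocab file, a real prompt can presumably be turned into ids by UTF-8 encoding it. The sketch below shows that mapping; `encode_bytes` and `decode_bytes` are hypothetical helper names for illustration, not functions from `atome_llm`.

```python
def encode_bytes(text: str) -> list[int]:
    """UTF-8 encode the text; each byte becomes one token id in 0..255."""
    return list(text.encode("utf-8"))


def decode_bytes(ids: list[int]) -> str:
    """Inverse mapping. Sampled output may contain invalid UTF-8, so replace bad bytes."""
    return bytes(ids).decode("utf-8", errors="replace")


ids = encode_bytes("Once upon a time")
print(len(ids), ids[:4])
print(decode_bytes(ids))
```

To feed the result to the model from the README, the list would need a batch dimension, for example `torch.tensor([ids])` for shape `(1, T)`. Non-ASCII characters occupy more than one id, so sequence length counts bytes, not characters.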

SHA256SUMS ADDED

	@@ -0,0 +1,3 @@

+fdf8a6b69eacc5e4834e488759593198e482399887fce2c5b048a599844ae2f5  atome_944k.bin
+0bba4c123a9026bffb36f05acc9a7f9e68dcac95b01321d151d32d8320b660c8  atome_1m_v1.pt
+8c2f4308185c91c5c493d61a7ac5aa3d1c44cfb3baaa205ce7275ce74ee4494d  vanilla_1m_v1.pt
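
Because the README loads the `.pt` checkpoint with `weights_only=False`, it is worth checking downloads against this file first. The lines follow the usual `digest  filename` layout, so `sha256sum -c SHA256SUMS` should work where coreutils is available. The following is a minimal portable sketch of the same check; `sha256_of`, `parse_sums`, and `verify` are illustrative names, not part of the Atome tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_sums(text: str) -> dict[str, str]:
    """Parse 'digest  filename' lines into {filename: digest}."""
    sums = {}
    for line in text.splitlines():
        if line.strip():
            digest, name = line.split(maxsplit=1)
            sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def verify(sums_file: Path) -> dict[str, bool]:
    """Return {filename: True/False}; a missing file counts as a failure."""
    results = {}
    for name, digest in parse_sums(sums_file.read_text()).items():
        target = sums_file.parent / name
        results[name] = target.is_file() and sha256_of(target) == digest
    return results
```

Calling `verify(Path("SHA256SUMS"))` from the directory holding the three weight files should return `True` for each of them if the downloads are intact.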

atome_1m_v1.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0bba4c123a9026bffb36f05acc9a7f9e68dcac95b01321d151d32d8320b660c8
+size 3808762

atome_1m_v1.train.json ADDED

	@@ -0,0 +1,229 @@

+{
+  "params": 944640,
+  "args": {
+    "data": "data/tinystories_full.txt",
+    "output": "checkpoints/atome_1m_v1.pt",
+    "steps": 30000,
+    "seq_len": 256,
+    "batch_size": 64,
+    "accum_steps": 4,
+    "lr": 0.0003,
+    "min_lr": 3e-05,
+    "warmup": 1000,
+    "weight_decay": 0.1,
+    "d_model": 256,
+    "n_layers": 8,
+    "d_head": 64,
+    "top_k": 4,
+    "bf16": true,
+    "eval_every": 1000,
+    "seed": 0
+  },
+  "log": [
+    {
+      "step": 1000,
+      "train_loss": 1.689065933227539,
+      "val_loss": 1.6851140782237053,
+      "val_ppl": 5.3930661286628725,
+      "lr": 0.0003
+    },
+    {
+      "step": 2000,
+      "train_loss": 1.475701928138733,
+      "val_loss": 1.4368714336305857,
+      "val_ppl": 4.207511724416042,
+      "lr": 0.0002992086242158385
+    },
+    {
+      "step": 3000,
+      "train_loss": 1.3402614891529083,
+      "val_loss": 1.355498529970646,
+      "val_ppl": 3.8786941199889884,
+      "lr": 0.00029684377502086165
+    },
+    {
+      "step": 4000,
+      "train_loss": 1.2906470894813538,
+      "val_loss": 1.298057682812214,
+      "val_ppl": 3.662176646542712,
+      "lr": 0.0002929331781096783
+    },
+    {
+      "step": 5000,
+      "train_loss": 1.2640663385391235,
+      "val_loss": 1.2564894184470177,
+      "val_ppl": 3.513066906295889,
+      "lr": 0.00028752268165557917
+    },
+    {
+      "step": 6000,
+      "train_loss": 1.205640196800232,
+      "val_loss": 1.2161348164081573,
+      "val_ppl": 3.374120900293555,
+      "lr": 0.0002806757187826245
+    },
+    {
+      "step": 7000,
+      "train_loss": 1.1917544305324554,
+      "val_loss": 1.1835042145103216,
+      "val_ppl": 3.2657982326287116,
+      "lr": 0.00027247256387026185
+    },
+    {
+      "step": 8000,
+      "train_loss": 1.1544596254825592,
+      "val_loss": 1.1677243299782276,
+      "val_ppl": 3.2146687829705525,
+      "lr": 0.0002630093914096226
+    },
+    {
+      "step": 9000,
+      "train_loss": 1.1510637402534485,
+      "val_loss": 1.1527819111943245,
+      "val_ppl": 3.166990953913901,
+      "lr": 0.0002523971484455467
+    },
+    {
+      "step": 10000,
+      "train_loss": 1.140123575925827,
+      "val_loss": 1.1461433116346598,
+      "val_ppl": 3.146036201225796,
+      "lr": 0.0002407602538239216
+    },
+    {
+      "step": 11000,
+      "train_loss": 1.1275735795497894,
+      "val_loss": 1.131921675056219,
+      "val_ppl": 3.1016110655411038,
+      "lr": 0.00022823513949447164
+    },
+    {
+      "step": 12000,
+      "train_loss": 1.1099890172481537,
+      "val_loss": 1.112453417852521,
+      "val_ppl": 3.041812083259338,
+      "lr": 0.00021496865097088842
+    },
+    {
+      "step": 13000,
+      "train_loss": 1.1127586960792542,
+      "val_loss": 1.112892348319292,
+      "val_ppl": 3.043147520317438,
+      "lr": 0.0002011163257014448
+    },
+    {
+      "step": 14000,
+      "train_loss": 1.0873990654945374,
+      "val_loss": 1.1024821121245623,
+      "val_ppl": 3.0116319626741244,
+      "lr": 0.00018684056953462323
+    },
+    {
+      "step": 15000,
+      "train_loss": 1.0949949026107788,
+      "val_loss": 1.1003286074846983,
+      "val_ppl": 3.0051533776041945,
+      "lr": 0.00017230875265903135
+    },
+    {
+      "step": 16000,
+      "train_loss": 1.092372715473175,
+      "val_loss": 1.0886210184544325,
+      "val_ppl": 2.9701754301311736,
+      "lr": 0.00015769124734096862
+    },
+    {
+      "step": 17000,
+      "train_loss": 1.0719301402568817,
+      "val_loss": 1.087962357327342,
+      "val_ppl": 2.968219735175533,
+      "lr": 0.00014315943046537674
+    },
+    {
+      "step": 18000,
+      "train_loss": 1.0894330739974976,
+      "val_loss": 1.0875801891088486,
+      "val_ppl": 2.9670855926576603,
+      "lr": 0.0001288836742985552
+    },
+    {
+      "step": 19000,
+      "train_loss": 1.0676527321338654,
+      "val_loss": 1.0716162715107203,
+      "val_ppl": 2.920095354830056,
+      "lr": 0.00011503134902911152
+    },
+    {
+      "step": 20000,
+      "train_loss": 1.0742259323596954,
+      "val_loss": 1.0812196973711252,
+      "val_ppl": 2.948273360207015,
+      "lr": 0.00010176486050552833
+    },
+    {
+      "step": 21000,
+      "train_loss": 1.0726729929447174,
+      "val_loss": 1.0718515273183584,
+      "val_ppl": 2.9207824050342435,
+      "lr": 8.923974617607838e-05
+    },
+    {
+      "step": 22000,
+      "train_loss": 1.0701198875904083,
+      "val_loss": 1.0739975553005934,
+      "val_ppl": 2.927057216357621,
+      "lr": 7.760285155445327e-05
+    },
+    {
+      "step": 23000,
+      "train_loss": 1.0675779581069946,
+      "val_loss": 1.0646078549325466,
+      "val_ppl": 2.899701657373658,
+      "lr": 6.699060859037736e-05
+    },
+    {
+      "step": 24000,
+      "train_loss": 1.0793527662754059,
+      "val_loss": 1.0707154776901007,
+      "val_ppl": 2.917466135348921,
+      "lr": 5.7527436129738084e-05
+    },
+    {
+      "step": 25000,
+      "train_loss": 1.0686360597610474,
+      "val_loss": 1.067691769450903,
+      "val_ppl": 2.9086578924472115,
+      "lr": 4.9324281217375474e-05
+    },
+    {
+      "step": 26000,
+      "train_loss": 1.079252928495407,
+      "val_loss": 1.064154027029872,
+      "val_ppl": 2.8983859904178786,
+      "lr": 4.247731834442082e-05
+    },
+    {
+      "step": 27000,
+      "train_loss": 1.0666958093643188,
+      "val_loss": 1.0639245696365833,
+      "val_ppl": 2.8977210106189566,
+      "lr": 3.7066821890321684e-05
+    },
+    {
+      "step": 28000,
+      "train_loss": 1.065284639596939,
+      "val_loss": 1.0690924655646086,
+      "val_ppl": 2.912734892906038,
+      "lr": 3.31562249791383e-05
+    },
+    {
+      "step": 29000,
+      "train_loss": 1.06133571267128,
+      "val_loss": 1.0545352958142757,
+      "val_ppl": 2.8706408450794916,
+      "lr": 3.0791375784161455e-05
+    }
+  ],
+  "final_val": 1.0572172198444605,
+  "best_val": 1.0545352958142757
+}
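
The `lr` column in this log is consistent with a cosine decay from `lr` to `min_lr` over the steps after `warmup`. The sketch below reproduces the logged values under that reading. The function name `lr_at` is hypothetical, and the linear warmup branch is an assumption, since the log has no entries before step 1000.

```python
import math


def lr_at(step: int, base_lr: float = 3e-4, min_lr: float = 3e-5,
          warmup: int = 1000, total_steps: int = 30000) -> float:
    """Cosine decay from base_lr to min_lr after warmup (values from "args" above)."""
    if step < warmup:
        # Assumed linear warmup; the log does not show this phase.
        return base_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (1000, 2000, 16000, 29000):
    print(step, lr_at(step))
```

The printed values can be compared against the `lr` fields at the same steps. The vanilla run's log uses the same schedule, which supports the README's "same recipe" statement for the learning rate at least.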

atome_944k.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fdf8a6b69eacc5e4834e488759593198e482399887fce2c5b048a599844ae2f5
+size 276655
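
The pointer size above, together with the parameter count from the training log, gives the blob's effective storage cost per parameter. `config.json` in this commit lists `bits_per_weight` as 1.58 and the packing as 4 trits/byte; the arithmetic below is only a sanity check on how those figures relate to the file size. What the remaining bytes hold (header, per-tensor scales, any non-ternary tensors) is not stated in this commit, so treating every parameter as ternary-packed is an assumption made purely to get a bound.

```python
import math

params = 944_640        # "params" in atome_1m_v1.train.json
disk_bytes = 276_655    # size field of the atome_944k.bin LFS pointer

entropy_bits = math.log2(3)                       # information content of one ternary weight
bits_per_param_on_disk = disk_bytes * 8 / params  # what the file actually costs per parameter
bytes_if_4_trits_per_byte = params // 4           # 2 bits/weight, if every parameter were packed
overhead_bytes = disk_bytes - bytes_if_4_trits_per_byte
rp2040_sram_bytes = 264 * 1024                    # the SRAM figure the README refers to

print(f"log2(3)          = {entropy_bits:.3f} bits")
print(f"on-disk cost     = {bits_per_param_on_disk:.3f} bits/param")
print(f"unexplained rest = {overhead_bytes} bytes")
print(f"blob > RP2040 SRAM: {disk_bytes > rp2040_sram_bytes}")
```

On these numbers the file costs roughly 2.34 bits per parameter, and the blob alone is larger than 264 KiB, which fits the README's note that the RP2040 demo does not hold at 944K.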

config.json ADDED

	@@ -0,0 +1,79 @@

+{
+  "model_type": "atome-lm",
+  "architecture": "routed-ternary-3pathway",
+  "_comment": "Atome LM is a custom architecture, NOT a transformers AutoModel. Load with atome_llm.core.atome_lm.AtomeLM from github.com/TilelliLab/atome-lm. This config documents the bundled checkpoints; it is not consumed by transformers.",
+  "checkpoints": {
+    "atome_944k.bin": {
+      "format": "ATOME01 packed C-engine blob (4 trits/byte)",
+      "precision": "ternary {-alpha, 0, +alpha} per tensor (BitNet b1.58 style)",
+      "bits_per_weight": 1.58,
+      "params": 944640,
+      "disk_bytes": 276655,
+      "loadable_by": "Atome C99 engine (atome_load)",
+      "derived_from": "atome_1m_v1.pt"
+    },
+    "atome_1m_v1.pt": {
+      "format": "PyTorch state_dict",
+      "precision": "fp32 source (export to ternary via scripts/export_to_atome.py)",
+      "params": 944640,
+      "config": {
+        "vocab_size": 256,
+        "d_model": 256,
+        "n_layers": 8,
+        "d_head": 64,
+        "top_k": 4,
+        "kernel_size": 5,
+        "n_pathways": 3
+      },
+      "tokenizer": "byte-level (no vocab file; ids 0-255)",
+      "best_val_loss": 1.0545,
+      "best_val_ppl": 2.87
+    },
+    "vanilla_1m_v1.pt": {
+      "format": "PyTorch state_dict",
+      "precision": "fp32",
+      "role": "param-fair vanilla GPT baseline for the 944K reversal A/B in HONEST_RESULTS.md",
+      "params": 950608,
+      "config": {
+        "kind": "vanilla_transformer_fp32",
+        "vocab_size": 256,
+        "d_model": 152,
+        "n_layers": 3,
+        "n_heads": 4,
+        "d_ff": 608,
+        "max_seq": 256
+      },
+      "best_val_loss": 0.9337,
+      "best_val_ppl": 2.54
+    }
+  },
+  "engine_default_config": {
+    "_comment": "The C99 engine compile-time #defines; ~60K params, the MCU target regime (NOT the 944K bundled checkpoint).",
+    "vocab_size": 256,
+    "d_model": 64,
+    "n_layers": 4,
+    "d_head": 16,
+    "top_k": 4,
+    "kernel_size": 5,
+    "n_pathways": 3
+  },
+  "training": {
+    "corpus": "TinyStories (train.txt + valid.txt concatenated)",
+    "steps": 30000,
+    "seq_len": 256,
+    "batch_size": 64,
+    "accum_steps": 4,
+    "optimizer": "AdamW lr 3e-4->3e-5 cosine, warmup 1000, weight_decay 0.1",
+    "precision": "bf16 autocast",
+    "seed": 0,
+    "seeds_note": "single seed only; multi-seed variance not yet measured"
+  },
+  "license": "Apache-2.0",
+  "version": "0.3.0",
+  "source_repository": "https://github.com/TilelliLab/atome-lm",
+  "project_home": "https://atomelm.com"
+}
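
The "param-fair" label for the vanilla baseline can be checked against its config. The commit does not describe the baseline's exact layer layout, so the decomposition below is an assumption: learned positional embeddings, bias-free linear layers, LayerNorm with weight and bias, and an untied output head. It is one layout that reproduces the stated 950,608 exactly; other layouts may do so as well. `vanilla_param_count` is a hypothetical helper.

```python
def vanilla_param_count(vocab: int = 256, d_model: int = 152, n_layers: int = 3,
                        d_ff: int = 608, max_seq: int = 256) -> int:
    """Parameter count for an assumed GPT-style layout (see lead-in for the assumptions)."""
    tok_emb = vocab * d_model
    pos_emb = max_seq * d_model
    attn = 4 * d_model * d_model   # Q, K, V, O projections, no bias (assumption)
    ffn = 2 * d_model * d_ff       # up and down projections, no bias (assumption)
    norms = 2 * 2 * d_model        # two LayerNorms per block, weight + bias each
    final_norm = 2 * d_model
    head = vocab * d_model         # untied output projection, no bias (assumption)
    return tok_emb + pos_emb + n_layers * (attn + ffn + norms) + final_norm + head


vanilla = vanilla_param_count()
atome = 944_640
print(vanilla, f"{vanilla / atome - 1:.2%} more parameters than Atome")
```

Under this reading the baseline has about 0.6% more parameters than the Atome checkpoint, so the comparison is close to parameter-matched but not byte-matched: the baseline is FP32 while the Atome blob is ternary.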

vanilla_1m_v1.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8c2f4308185c91c5c493d61a7ac5aa3d1c44cfb3baaa205ce7275ce74ee4494d
+size 3812805

vanilla_1m_v1.train.json ADDED

	@@ -0,0 +1,230 @@

+{
+  "params": 950608,
+  "args": {
+    "data": "data/tinystories_full.txt",
+    "output": "checkpoints/vanilla_1m_v1.pt",
+    "steps": 30000,
+    "seq_len": 256,
+    "batch_size": 64,
+    "accum_steps": 4,
+    "lr": 0.0003,
+    "min_lr": 3e-05,
+    "warmup": 1000,
+    "weight_decay": 0.1,
+    "d_model": 152,
+    "n_layers": 3,
+    "n_heads": 4,
+    "d_ff": 608,
+    "max_seq": 256,
+    "bf16": true,
+    "eval_every": 1000,
+    "seed": 0
+  },
+  "log": [
+    {
+      "step": 1000,
+      "train_loss": 2.0875988006591797,
+      "val_loss": 2.0943055227398872,
+      "val_ppl": 8.119799995221573,
+      "lr": 0.0003
+    },
+    {
+      "step": 2000,
+      "train_loss": 1.5252898037433624,
+      "val_loss": 1.5066693723201752,
+      "val_ppl": 4.511679019275092,
+      "lr": 0.0002992086242158385
+    },
+    {
+      "step": 3000,
+      "train_loss": 1.3099323511123657,
+      "val_loss": 1.3194083347916603,
+      "val_ppl": 3.7412071801680677,
+      "lr": 0.00029684377502086165
+    },
+    {
+      "step": 4000,
+      "train_loss": 1.2161387205123901,
+      "val_loss": 1.2286550998687744,
+      "val_ppl": 3.4166314169360987,
+      "lr": 0.0002929331781096783
+    },
+    {
+      "step": 5000,
+      "train_loss": 1.1787906289100647,
+      "val_loss": 1.1772918552160263,
+      "val_ppl": 3.2455728094700103,
+      "lr": 0.00028752268165557917
+    },
+    {
+      "step": 6000,
+      "train_loss": 1.1403338611125946,
+      "val_loss": 1.1352313607931137,
+      "val_ppl": 3.1118934297571132,
+      "lr": 0.0002806757187826245
+    },
+    {
+      "step": 7000,
+      "train_loss": 1.1162661612033844,
+      "val_loss": 1.1075621414929628,
+      "val_ppl": 3.0269700675173796,
+      "lr": 0.00027247256387026185
+    },
+    {
+      "step": 8000,
+      "train_loss": 1.0829694867134094,
+      "val_loss": 1.0843632984906435,
+      "val_ppl": 2.9575561386746556,
+      "lr": 0.0002630093914096226
+    },
+    {
+      "step": 9000,
+      "train_loss": 1.0747118294239044,
+      "val_loss": 1.0635895021259785,
+      "val_ppl": 2.8967502410992467,
+      "lr": 0.0002523971484455467
+    },
+    {
+      "step": 10000,
+      "train_loss": 1.0519791841506958,
+      "val_loss": 1.0476661436259747,
+      "val_ppl": 2.85098954738486,
+      "lr": 0.0002407602538239216
+    },
+    {
+      "step": 11000,
+      "train_loss": 1.0250678956508636,
+      "val_loss": 1.0324134565889835,
+      "val_ppl": 2.807834249846705,
+      "lr": 0.00022823513949447164
+    },
+    {
+      "step": 12000,
+      "train_loss": 1.0199836790561676,
+      "val_loss": 1.023882026784122,
+      "val_ppl": 2.783981303587245,
+      "lr": 0.00021496865097088842
+    },
+    {
+      "step": 13000,
+      "train_loss": 1.0101815909147263,
+      "val_loss": 1.0102009763941169,
+      "val_ppl": 2.7461528714618,
+      "lr": 0.0002011163257014448
+    },
+    {
+      "step": 14000,
+      "train_loss": 1.0113594383001328,
+      "val_loss": 1.0001213569194078,
+      "val_ppl": 2.7186117307853896,
+      "lr": 0.00018684056953462323
+    },
+    {
+      "step": 15000,
+      "train_loss": 0.98267862200737,
+      "val_loss": 0.9921664940193295,
+      "val_ppl": 2.697071336220516,
+      "lr": 0.00017230875265903135
+    },
+    {
+      "step": 16000,
+      "train_loss": 0.995794028043747,
+      "val_loss": 0.9845060091465712,
+      "val_ppl": 2.6764893965183,
+      "lr": 0.00015769124734096862
+    },
+    {
+      "step": 17000,
+      "train_loss": 0.962462991476059,
+      "val_loss": 0.9766457295045257,
+      "val_ppl": 2.655533907298061,
+      "lr": 0.00014315943046537674
+    },
+    {
+      "step": 18000,
+      "train_loss": 0.9672404527664185,
+      "val_loss": 0.9714991142973304,
+      "val_ppl": 2.6419020052744058,
+      "lr": 0.0001288836742985552
+    },
+    {
+      "step": 19000,
+      "train_loss": 0.9653829336166382,
+      "val_loss": 0.9648234033957124,
+      "val_ppl": 2.624324168813844,
+      "lr": 0.00011503134902911152
+    },
+    {
+      "step": 20000,
+      "train_loss": 0.9600358754396439,
+      "val_loss": 0.959049197845161,
+      "val_ppl": 2.6092144469334535,
+      "lr": 0.00010176486050552833
+    },
+    {
+      "step": 21000,
+      "train_loss": 0.9566726982593536,
+      "val_loss": 0.9548654137179255,
+      "val_ppl": 2.598320861041842,
+      "lr": 8.923974617607838e-05
+    },
+    {
+      "step": 22000,
+      "train_loss": 0.9502571374177933,
+      "val_loss": 0.9499085610732436,
+      "val_ppl": 2.5854732356090246,
+      "lr": 7.760285155445327e-05
+    },
+    {
+      "step": 23000,
+      "train_loss": 0.9525800943374634,
+      "val_loss": 0.9469442367553711,
+      "val_ppl": 2.5778204027666733,
+      "lr": 6.699060859037736e-05
+    },
+    {
+      "step": 24000,
+      "train_loss": 0.9471650272607803,
+      "val_loss": 0.9441628893837333,
+      "val_ppl": 2.57066055039882,
+      "lr": 5.7527436129738084e-05
+    },
+    {
+      "step": 25000,
+      "train_loss": 0.9476055055856705,
+      "val_loss": 0.9407382626086473,
+      "val_ppl": 2.561872054696453,
+      "lr": 4.9324281217375474e-05
+    },
+    {
+      "step": 26000,
+      "train_loss": 0.9304470866918564,
+      "val_loss": 0.9391492558643222,
+      "val_ppl": 2.5578044553007495,
+      "lr": 4.247731834442082e-05
+    },
+    {
+      "step": 27000,
+      "train_loss": 0.9319835901260376,
+      "val_loss": 0.936947762966156,
+      "val_ppl": 2.5521796607019356,
+      "lr": 3.7066821890321684e-05
+    },
+    {
+      "step": 28000,
+      "train_loss": 0.933847963809967,
+      "val_loss": 0.9346829485148191,
+      "val_ppl": 2.5464059879406724,
+      "lr": 3.31562249791383e-05
+    },
+    {
+      "step": 29000,
+      "train_loss": 0.936771810054779,
+      "val_loss": 0.9336990155279636,
+      "val_ppl": 2.5439017273055704,
+      "lr": 3.0791375784161455e-05
+    }
+  ],
+  "final_val": 0.9317306941375136,
+  "best_val": 0.9336990155279636
+}
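
With both logs in hand, the 944K row of the README table can be recomputed from the two `best_val` fields. The check below assumes perplexity is `exp(val_loss)`, which matches the `val_ppl` entries in both logs. It also shows that the size of the gap depends on which model is taken as the reference.

```python
import math

atome_best = 1.0545352958142757    # best_val in atome_1m_v1.train.json
vanilla_best = 0.9336990155279636  # best_val in vanilla_1m_v1.train.json

atome_ppl = math.exp(atome_best)
vanilla_ppl = math.exp(vanilla_best)

# Gap expressed relative to Atome (how much lower vanilla is).
loss_gap_vs_atome = (atome_best - vanilla_best) / atome_best
ppl_gap_vs_atome = (atome_ppl - vanilla_ppl) / atome_ppl

# Same gap expressed relative to vanilla (how much higher Atome is).
loss_gap_vs_vanilla = (atome_best - vanilla_best) / vanilla_best
ppl_gap_vs_vanilla = (atome_ppl - vanilla_ppl) / vanilla_ppl

print(f"ppl: atome {atome_ppl:.2f}, vanilla {vanilla_ppl:.2f}")
print(f"gap vs atome:   loss {loss_gap_vs_atome:.1%}, ppl {ppl_gap_vs_atome:.1%}")
print(f"gap vs vanilla: loss {loss_gap_vs_vanilla:.1%}, ppl {ppl_gap_vs_vanilla:.1%}")
```

The README's "~11%" corresponds to the gap measured relative to Atome; measured relative to the vanilla baseline, the same difference is closer to 13%.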