Atome LM v0.3.0 — checkpoints + honest model card
- README.md +111 -0
- SHA256SUMS +3 -0
- atome_1m_v1.pt +3 -0
- atome_1m_v1.train.json +229 -0
- atome_944k.bin +3 -0
- config.json +79 -0
- vanilla_1m_v1.pt +3 -0
- vanilla_1m_v1.train.json +230 -0
README.md
ADDED
@@ -0,0 +1,111 @@
---
license: apache-2.0
library_name: pytorch
pipeline_tag: text-generation
tags:
- ternary
- bitnet
- microcontroller
- edge-ai
- tinyml
- byte-level
- language-model
- routed-architecture
---

# Atome LM

A reference implementation of a **routed-ternary tiny language model** with a bit-exact
Python ↔ C99 inference engine, sized for **microcontroller-class RAM budgets**.

The contribution is **integration, not a new architecture**: a complete
train → ternary export → base-3 packing → C99 inference path, with bit-exact Python ↔ C
parity enforced by tests. It combines three known ideas — ternary weights
([BitNet b1.58](https://arxiv.org/abs/2402.17764)), a per-token-routed 3-pathway block
([Hymba](https://arxiv.org/abs/2411.13676), [MossNet](https://arxiv.org/abs/2510.26182)),
and a byte tokenizer at super-tiny scale ([Guertler 2024](https://arxiv.org/abs/2405.14159)).
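
To make the "ternary export → base-3 packing" stages concrete, here is a minimal sketch of absmean ternarization in the style of BitNet b1.58, followed by one possible base-3 packing at four trits per byte. It is an illustration under assumptions, not the repository's exporter: the function names, the epsilon, and the digit order inside a byte are hypothetical, and the real `ATOME01` layout may differ.

```python
from typing import List, Tuple


def ternarize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Absmean ternarization: scale by mean |w|, round, clip to {-1, 0, +1}."""
    alpha = sum(abs(w) for w in weights) / len(weights)  # per-tensor scale
    trits = [max(-1, min(1, round(w / (alpha + eps)))) for w in weights]
    return trits, alpha


def pack_trits(trits: List[int]) -> bytes:
    """Pack 4 trits per byte as base-3 digits (assumed: first trit = lowest digit)."""
    padded = trits + [0] * (-len(trits) % 4)  # pad with zero trits to a multiple of 4
    out = bytearray()
    for i in range(0, len(padded), 4):
        value = 0
        for j, t in enumerate(padded[i:i + 4]):
            value += (t + 1) * 3 ** j  # map {-1, 0, +1} to digits {0, 1, 2}
        out.append(value)  # largest possible value is 80, so it fits in a byte
    return bytes(out)


def unpack_trits(blob: bytes, count: int) -> List[int]:
    """Inverse of pack_trits; `count` drops the padding trits."""
    trits = []
    for byte in blob:
        for _ in range(4):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.7]
trits, alpha = ternarize(weights)
blob = pack_trits(trits)
restored = unpack_trits(blob, len(trits))
dequantized = [alpha * t for t in restored]  # values in {-alpha, 0, +alpha}
```

The round trip `unpack_trits(pack_trits(trits), len(trits)) == trits` is the Python-side half of the kind of parity check the README describes; the C side would have to decode the same bytes to the same trits.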

- **Code:** https://github.com/TilelliLab/atome-lm
- **Project home / live in-browser demo:** https://atomelm.com
- **License:** Apache-2.0 (code, weights, everything)

> ⚠️ This is a **research artifact, not a product or a general chatbot.** Read the
> "Honest results" section below before citing any number. The honesty dossier lives in
> [`HONEST_RESULTS.md`](https://github.com/TilelliLab/atome-lm/blob/main/HONEST_RESULTS.md)
> in the source repo.

## Files in this repo

| File | What it is |
|---|---|
| `atome_944k.bin` (276,655 bytes, ~270 KiB) | Packed `ATOME01` C-engine blob, ternary, loadable directly by the Atome C99 engine |
| `atome_1m_v1.pt` (3,808,762 bytes, ~3.6 MiB) | PyTorch source checkpoint (944,640 params) that produced the blob; use to fine-tune or re-export |
| `vanilla_1m_v1.pt` (3,812,805 bytes, ~3.6 MiB) | FP32 vanilla-GPT baseline (950,608 params), shipped so you can reproduce the 944K reversal A/B |
| `*.train.json` | Every-1000-step training logs for both checkpoints (every reported number is auditable) |
| `config.json` | Architecture hyperparameters + provenance for all three checkpoints |
| `SHA256SUMS` | Checksums for the three weight files |
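
Since `SHA256SUMS` ships next to the weights, a download can be checked before anything is loaded. The sketch below uses only the standard library and assumes the usual `sha256sum` line layout (hex digest, whitespace, file name); the helper names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large checkpoints need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: Path) -> dict:
    """Return {file name: True/False} for every entry in a SHA256SUMS file."""
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # tolerate the binary-mode marker
        target = sums_file.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected
    return results
```

For an intact download, `verify_sums(Path("SHA256SUMS"))` should map all three weight files to `True`. On systems with GNU coreutils, `sha256sum -c SHA256SUMS` performs the same check.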
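
## Honest results — read this before citing anything

All numbers are **single-seed**. The 944K numbers come from the `*.train.json` logs shipped
alongside: they are the step-29,000 evaluation (logged as `best_val`), while the logs'
`final_val` fields are 1.0572 (Atome) and 0.9317 (vanilla).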
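
Because the logs are shipped, the 944K numbers can be re-derived rather than trusted. The sketch below checks two things on a few entries copied from `atome_1m_v1.train.json`: that `val_ppl` equals `exp(val_loss)`, and that the logged learning rate matches a cosine decay built from the logged `lr`, `min_lr`, `warmup`, and `steps` arguments. The schedule formula is an assumption that happens to reproduce the logged values; it was not read from the training script.

```python
import math

# Arguments and a few entries copied from atome_1m_v1.train.json.
ARGS = {"steps": 30000, "lr": 0.0003, "min_lr": 3e-05, "warmup": 1000}
LOG = [
    {"step": 1000, "val_loss": 1.6851140782237053,
     "val_ppl": 5.3930661286628725, "lr": 0.0003},
    {"step": 2000, "val_loss": 1.4368714336305857,
     "val_ppl": 4.207511724416042, "lr": 0.0002992086242158385},
    {"step": 29000, "val_loss": 1.0545352958142757,
     "val_ppl": 2.8706408450794916, "lr": 3.0791375784161455e-05},
]


def cosine_lr(step, steps, lr, min_lr, warmup):
    """Assumed schedule: linear warmup, then cosine decay from lr to min_lr."""
    if step < warmup:
        return lr * step / warmup
    progress = (step - warmup) / (steps - warmup)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def audit(log, args):
    """Yield (step, perplexity matches exp(loss), lr matches the schedule)."""
    for entry in log:
        ppl_ok = math.isclose(math.exp(entry["val_loss"]), entry["val_ppl"], rel_tol=1e-6)
        lr_ok = math.isclose(cosine_lr(entry["step"], **args), entry["lr"], rel_tol=1e-6)
        yield entry["step"], ppl_ok, lr_ok


results = list(audit(LOG, ARGS))
```

To audit a full log instead of the copied sample, load it with `json.load` and pass its `"log"` list and the four relevant `"args"` keys to `audit`.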

| Regime | Atome ternary | Vanilla FP32 (param-fair) | Verdict |
|---|---|---|---|
| **60K (MCU target)** | 6.31 ppl | 8.12 ppl | **Atome wins −22% ppl** (−52% at flash-fair budget) |
| **944K (these checkpoints)** | val 1.0545 / 2.87 ppl | val 0.9337 / 2.54 ppl | **Vanilla wins by ~11%** |

**The 944K result reverses.** At 944K parameters the FP32 vanilla baseline *beats* Atome: its
val loss and perplexity are ~11% lower, with the same recipe, val slice, and seed. Atome's bet is the
**sub-1M, MCU-class regime**: the 3-pathway inductive bias substitutes for capacity at small
scale and *constrains* it at the ~1M scale measured here. This is the most important honest finding in the kit —
it is **not** "tiny ternary beats everything."
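
The "~11%" figure can be recomputed from the logged validation losses, using perplexity = exp(loss) per byte. The sketch below measures the gap relative to the losing model's value, which is the convention that reproduces both verdicts in the table; measured relative to the vanilla baseline instead, the 944K gap is closer to 13%. Variable and function names are illustrative.

```python
import math

ATOME_VAL_LOSS = 1.0545352958142757    # step 29000, atome_1m_v1.train.json
VANILLA_VAL_LOSS = 0.9336990155279636  # step 29000, vanilla_1m_v1.train.json


def relative_gap(worse: float, better: float, reference: float) -> float:
    """Gap between two lower-is-better metrics as a fraction of `reference`."""
    return (worse - better) / reference


atome_ppl = math.exp(ATOME_VAL_LOSS)
vanilla_ppl = math.exp(VANILLA_VAL_LOSS)

# 944K row: vanilla wins, gap expressed relative to Atome (the loser).
loss_gap = relative_gap(ATOME_VAL_LOSS, VANILLA_VAL_LOSS, ATOME_VAL_LOSS)
ppl_gap = relative_gap(atome_ppl, vanilla_ppl, atome_ppl)
ppl_gap_vs_vanilla = relative_gap(atome_ppl, vanilla_ppl, vanilla_ppl)

# 60K row: Atome wins, gap expressed relative to vanilla (the loser).
gap_60k = relative_gap(8.12, 6.31, 8.12)

print(f"Atome ppl {atome_ppl:.2f}, vanilla ppl {vanilla_ppl:.2f}")
print(f"944K loss gap {loss_gap:.1%}, ppl gap {ppl_gap:.1%}, 60K ppl gap {gap_60k:.1%}")
```

Stating the reference value matters at this size of effect: the same pair of numbers reads as roughly 11% or roughly 13% depending on which model is in the denominator.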

The bundled 944K checkpoint is here to make the architecture **runnable**, not to set a
quality bar. It is narrow, single-corpus (TinyStories), and sometimes incoherent.

### What is NOT measured / NOT claimed
- **Single seed only.** No multi-seed variance yet.
- **MCU parity is QEMU only** (ARM Cortex-M3, MPS2-AN385), to FP32 epsilon. **No silicon
  bring-up** is done in this repository. The RP2040 demo exceeds 264 KB SRAM at 944K — the
  MCU claim is regime-dependent (it holds at the ~60K engine-default config, not at 944K).
- **Router-entropy** is exposed for free as a per-token uncertainty signal, but its
  **calibration is unmeasured at this scale**.
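
For readers new to router entropy as an uncertainty signal: if a block's router produces a probability distribution over the 3 pathways for each token, the Shannon entropy of that distribution is 0 when the router is certain and ln 3 ≈ 1.10 nats when it is maximally undecided. The sketch below computes it from raw router logits in plain Python. It assumes a softmax router and is not the repository's `router_entropies` implementation, whose exact definition and normalization are not documented here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's pathway logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_entropy(logits: List[float]) -> float:
    """Shannon entropy (nats) of the pathway distribution for one token."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


undecided = router_entropy([0.0, 0.0, 0.0])  # uniform over 3 pathways
confident = router_entropy([8.0, 0.0, 0.0])  # one pathway dominates
max_entropy = math.log(3)                    # upper bound for 3 pathways
```

Whether tokens with high entropy are actually the ones the model gets wrong is the calibration question that the list above marks as unmeasured.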

## Usage

This is a **custom architecture**, not a `transformers` AutoModel. Get the code from the
source repo, then load the PyTorch checkpoint:

```bash
git clone https://github.com/TilelliLab/atome-lm
cd atome-lm && pip install -e .   # Python >=3.10, PyTorch >=2.0
```

```python
import torch
from atome_llm.core.atome_lm import AtomeLM

# weights_only=False unpickles arbitrary objects: only load checkpoints you trust.
ckpt = torch.load("atome_1m_v1.pt", map_location="cpu", weights_only=False)
model = AtomeLM(**ckpt["config"])   # vocab=256, d_model=256, n_layers=8, d_head=64, top_k=4
model.load_state_dict(ckpt["state_dict"])
model.eval()

ids = torch.randint(0, 256, (1, 32))             # byte-level: ids are raw bytes 0-255
with torch.no_grad():                            # inference only, no autograd graph
    logits = model(ids)                          # (1, 32, 256)
    ent_per_layer = model.router_entropies(ids)  # free per-token uncertainty signal
```
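
The snippet above feeds random ids. For real text, a byte-level model needs no tokenizer file: the ids are simply the UTF-8 bytes of the string. A minimal sketch, assuming the model was trained on raw UTF-8 bytes (the helper names are hypothetical and not part of `atome_llm`):

```python
from typing import List


def encode(text: str) -> List[int]:
    """Text -> list of byte ids in 0..255."""
    return list(text.encode("utf-8"))


def decode(ids: List[int]) -> str:
    """Byte ids -> text; invalid UTF-8 (e.g. from sampling) is replaced, not raised."""
    return bytes(ids).decode("utf-8", errors="replace")


prompt = "Once upon a time, café"
ids = encode(prompt)
round_trip = decode(ids)
```

Non-ASCII characters take more than one id (the `é` above is two bytes), so sequence length counts bytes, not characters. Wrapping the list as `torch.tensor([ids])` gives the `(1, T)` shape used in the example above.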

For microcontroller deployment, load `atome_944k.bin` directly with the Atome C99 engine
(`atome_load(...)`) shipped in the source repo's `c_engine/`.
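
A rough size budget shows why the 944K blob is a tight fit for small parts. The arithmetic below uses the parameter count, packing density, and blob size published in this repo plus the RP2040's 264 KB of SRAM. Treating every parameter as a packed trit is a simplifying assumption, and whether the engine keeps weights in flash or in SRAM is not stated here.

```python
PARAMS = 944_640                 # "params" in config.json
BLOB_BYTES = 276_655             # size of atome_944k.bin
TRITS_PER_BYTE = 4               # packing density stated in config.json
RP2040_SRAM_BYTES = 264 * 1024   # 264 KB of on-chip SRAM

# Bytes needed if every parameter were stored as one packed trit (assumption).
packed_weight_bytes = -(-PARAMS // TRITS_PER_BYTE)  # ceiling division

# Whatever remains of the blob: header, scales, non-ternary tensors (assumed).
other_bytes = BLOB_BYTES - packed_weight_bytes

# Would the whole blob fit in SRAM, before any activations or buffers?
fits_in_sram = BLOB_BYTES <= RP2040_SRAM_BYTES

# For comparison: the same parameter count held as FP32.
fp32_bytes = PARAMS * 4

print(packed_weight_bytes, other_bytes, fits_in_sram, fp32_bytes)
```

Under these assumptions the blob alone is a few kilobytes larger than the RP2040's SRAM, which is consistent with the README's statement that the MCU claim holds at the ~60K configuration and not at 944K.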

## Citation

```bibtex
@software{atome_llm_2026,
  title = {Atome LM: a tiny ternary language model for microcontroller deployment},
  author = {Atome LM contributors},
  year = {2026},
  note = {Apache 2.0, https://atomelm.com},
  url = {https://github.com/TilelliLab/atome-lm}
}
```
SHA256SUMS
ADDED
@@ -0,0 +1,3 @@
fdf8a6b69eacc5e4834e488759593198e482399887fce2c5b048a599844ae2f5  atome_944k.bin
0bba4c123a9026bffb36f05acc9a7f9e68dcac95b01321d151d32d8320b660c8  atome_1m_v1.pt
8c2f4308185c91c5c493d61a7ac5aa3d1c44cfb3baaa205ce7275ce74ee4494d  vanilla_1m_v1.pt
atome_1m_v1.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0bba4c123a9026bffb36f05acc9a7f9e68dcac95b01321d151d32d8320b660c8
size 3808762
atome_1m_v1.train.json
ADDED
@@ -0,0 +1,229 @@
{
  "params": 944640,
  "args": {
    "data": "data/tinystories_full.txt",
    "output": "checkpoints/atome_1m_v1.pt",
    "steps": 30000,
    "seq_len": 256,
    "batch_size": 64,
    "accum_steps": 4,
    "lr": 0.0003,
    "min_lr": 3e-05,
    "warmup": 1000,
    "weight_decay": 0.1,
    "d_model": 256,
    "n_layers": 8,
    "d_head": 64,
    "top_k": 4,
    "bf16": true,
    "eval_every": 1000,
    "seed": 0
  },
  "log": [
    {
      "step": 1000,
      "train_loss": 1.689065933227539,
      "val_loss": 1.6851140782237053,
      "val_ppl": 5.3930661286628725,
      "lr": 0.0003
    },
    {
      "step": 2000,
      "train_loss": 1.475701928138733,
      "val_loss": 1.4368714336305857,
      "val_ppl": 4.207511724416042,
      "lr": 0.0002992086242158385
    },
    {
      "step": 3000,
      "train_loss": 1.3402614891529083,
      "val_loss": 1.355498529970646,
      "val_ppl": 3.8786941199889884,
      "lr": 0.00029684377502086165
    },
    {
      "step": 4000,
      "train_loss": 1.2906470894813538,
      "val_loss": 1.298057682812214,
      "val_ppl": 3.662176646542712,
      "lr": 0.0002929331781096783
    },
    {
      "step": 5000,
      "train_loss": 1.2640663385391235,
      "val_loss": 1.2564894184470177,
      "val_ppl": 3.513066906295889,
      "lr": 0.00028752268165557917
    },
    {
      "step": 6000,
      "train_loss": 1.205640196800232,
      "val_loss": 1.2161348164081573,
      "val_ppl": 3.374120900293555,
      "lr": 0.0002806757187826245
    },
    {
      "step": 7000,
      "train_loss": 1.1917544305324554,
      "val_loss": 1.1835042145103216,
      "val_ppl": 3.2657982326287116,
      "lr": 0.00027247256387026185
    },
    {
      "step": 8000,
      "train_loss": 1.1544596254825592,
      "val_loss": 1.1677243299782276,
      "val_ppl": 3.2146687829705525,
      "lr": 0.0002630093914096226
    },
    {
      "step": 9000,
      "train_loss": 1.1510637402534485,
      "val_loss": 1.1527819111943245,
      "val_ppl": 3.166990953913901,
      "lr": 0.0002523971484455467
    },
    {
      "step": 10000,
      "train_loss": 1.140123575925827,
      "val_loss": 1.1461433116346598,
      "val_ppl": 3.146036201225796,
      "lr": 0.0002407602538239216
    },
    {
      "step": 11000,
      "train_loss": 1.1275735795497894,
      "val_loss": 1.131921675056219,
      "val_ppl": 3.1016110655411038,
      "lr": 0.00022823513949447164
    },
    {
      "step": 12000,
      "train_loss": 1.1099890172481537,
      "val_loss": 1.112453417852521,
      "val_ppl": 3.041812083259338,
      "lr": 0.00021496865097088842
    },
    {
      "step": 13000,
      "train_loss": 1.1127586960792542,
      "val_loss": 1.112892348319292,
      "val_ppl": 3.043147520317438,
      "lr": 0.0002011163257014448
    },
    {
      "step": 14000,
      "train_loss": 1.0873990654945374,
      "val_loss": 1.1024821121245623,
      "val_ppl": 3.0116319626741244,
      "lr": 0.00018684056953462323
    },
    {
      "step": 15000,
      "train_loss": 1.0949949026107788,
      "val_loss": 1.1003286074846983,
      "val_ppl": 3.0051533776041945,
      "lr": 0.00017230875265903135
    },
    {
      "step": 16000,
      "train_loss": 1.092372715473175,
      "val_loss": 1.0886210184544325,
      "val_ppl": 2.9701754301311736,
      "lr": 0.00015769124734096862
    },
    {
      "step": 17000,
      "train_loss": 1.0719301402568817,
      "val_loss": 1.087962357327342,
      "val_ppl": 2.968219735175533,
      "lr": 0.00014315943046537674
    },
    {
      "step": 18000,
      "train_loss": 1.0894330739974976,
      "val_loss": 1.0875801891088486,
      "val_ppl": 2.9670855926576603,
      "lr": 0.0001288836742985552
    },
    {
      "step": 19000,
      "train_loss": 1.0676527321338654,
      "val_loss": 1.0716162715107203,
      "val_ppl": 2.920095354830056,
      "lr": 0.00011503134902911152
    },
    {
      "step": 20000,
      "train_loss": 1.0742259323596954,
      "val_loss": 1.0812196973711252,
      "val_ppl": 2.948273360207015,
      "lr": 0.00010176486050552833
    },
    {
      "step": 21000,
      "train_loss": 1.0726729929447174,
      "val_loss": 1.0718515273183584,
      "val_ppl": 2.9207824050342435,
      "lr": 8.923974617607838e-05
    },
    {
      "step": 22000,
      "train_loss": 1.0701198875904083,
      "val_loss": 1.0739975553005934,
      "val_ppl": 2.927057216357621,
      "lr": 7.760285155445327e-05
    },
    {
      "step": 23000,
      "train_loss": 1.0675779581069946,
      "val_loss": 1.0646078549325466,
      "val_ppl": 2.899701657373658,
      "lr": 6.699060859037736e-05
    },
    {
      "step": 24000,
      "train_loss": 1.0793527662754059,
      "val_loss": 1.0707154776901007,
      "val_ppl": 2.917466135348921,
      "lr": 5.7527436129738084e-05
    },
    {
      "step": 25000,
      "train_loss": 1.0686360597610474,
      "val_loss": 1.067691769450903,
      "val_ppl": 2.9086578924472115,
      "lr": 4.9324281217375474e-05
    },
    {
      "step": 26000,
      "train_loss": 1.079252928495407,
      "val_loss": 1.064154027029872,
      "val_ppl": 2.8983859904178786,
      "lr": 4.247731834442082e-05
    },
    {
      "step": 27000,
      "train_loss": 1.0666958093643188,
      "val_loss": 1.0639245696365833,
      "val_ppl": 2.8977210106189566,
      "lr": 3.7066821890321684e-05
    },
    {
      "step": 28000,
      "train_loss": 1.065284639596939,
      "val_loss": 1.0690924655646086,
      "val_ppl": 2.912734892906038,
      "lr": 3.31562249791383e-05
    },
    {
      "step": 29000,
      "train_loss": 1.06133571267128,
      "val_loss": 1.0545352958142757,
      "val_ppl": 2.8706408450794916,
      "lr": 3.0791375784161455e-05
    }
  ],
  "final_val": 1.0572172198444605,
  "best_val": 1.0545352958142757
}
atome_944k.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fdf8a6b69eacc5e4834e488759593198e482399887fce2c5b048a599844ae2f5
size 276655
config.json
ADDED
@@ -0,0 +1,79 @@
{
  "model_type": "atome-lm",
  "architecture": "routed-ternary-3pathway",
  "_comment": "Atome LM is a custom architecture, NOT a transformers AutoModel. Load with atome_llm.core.atome_lm.AtomeLM from github.com/TilelliLab/atome-lm. This config documents the bundled checkpoints; it is not consumed by transformers.",

  "checkpoints": {
    "atome_944k.bin": {
      "format": "ATOME01 packed C-engine blob (4 trits/byte)",
      "precision": "ternary {-alpha, 0, +alpha} per tensor (BitNet b1.58 style)",
      "bits_per_weight": 1.58,
      "params": 944640,
      "disk_bytes": 276655,
      "loadable_by": "Atome C99 engine (atome_load)",
      "derived_from": "atome_1m_v1.pt"
    },
    "atome_1m_v1.pt": {
      "format": "PyTorch state_dict",
      "precision": "fp32 source (export to ternary via scripts/export_to_atome.py)",
      "params": 944640,
      "config": {
        "vocab_size": 256,
        "d_model": 256,
        "n_layers": 8,
        "d_head": 64,
        "top_k": 4,
        "kernel_size": 5,
        "n_pathways": 3
      },
      "tokenizer": "byte-level (no vocab file; ids 0-255)",
      "final_val_loss": 1.0545,
      "final_val_ppl": 2.87
    },
    "vanilla_1m_v1.pt": {
      "format": "PyTorch state_dict",
      "precision": "fp32",
      "role": "param-fair vanilla GPT baseline for the 944K reversal A/B in HONEST_RESULTS.md",
      "params": 950608,
      "config": {
        "kind": "vanilla_transformer_fp32",
        "vocab_size": 256,
        "d_model": 152,
        "n_layers": 3,
        "n_heads": 4,
        "d_ff": 608,
        "max_seq": 256
      },
      "final_val_loss": 0.9337,
      "final_val_ppl": 2.54
    }
  },

  "engine_default_config": {
    "_comment": "The C99 engine compile-time #defines; ~60K params, the MCU target regime (NOT the 944K bundled checkpoint).",
    "vocab_size": 256,
    "d_model": 64,
    "n_layers": 4,
    "d_head": 16,
    "top_k": 4,
    "kernel_size": 5,
    "n_pathways": 3
  },

  "training": {
    "corpus": "TinyStories (train.txt + valid.txt concatenated)",
    "steps": 30000,
    "seq_len": 256,
    "batch_size": 64,
    "accum_steps": 4,
    "optimizer": "AdamW lr 3e-4->3e-5 cosine, warmup 1000, weight_decay 0.1",
    "precision": "bf16 autocast",
    "seed": 0,
    "seeds_note": "single seed only; multi-seed variance not yet measured"
  },

  "license": "Apache-2.0",
  "version": "0.3.0",
  "source_repository": "https://github.com/TilelliLab/atome-lm",
  "project_home": "https://atomelm.com"
}
vanilla_1m_v1.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c2f4308185c91c5c493d61a7ac5aa3d1c44cfb3baaa205ce7275ce74ee4494d
size 3812805
vanilla_1m_v1.train.json
ADDED
@@ -0,0 +1,230 @@
{
  "params": 950608,
  "args": {
    "data": "data/tinystories_full.txt",
    "output": "checkpoints/vanilla_1m_v1.pt",
    "steps": 30000,
    "seq_len": 256,
    "batch_size": 64,
    "accum_steps": 4,
    "lr": 0.0003,
    "min_lr": 3e-05,
    "warmup": 1000,
    "weight_decay": 0.1,
    "d_model": 152,
    "n_layers": 3,
    "n_heads": 4,
    "d_ff": 608,
    "max_seq": 256,
    "bf16": true,
    "eval_every": 1000,
    "seed": 0
  },
  "log": [
    {
      "step": 1000,
      "train_loss": 2.0875988006591797,
      "val_loss": 2.0943055227398872,
      "val_ppl": 8.119799995221573,
      "lr": 0.0003
    },
    {
      "step": 2000,
      "train_loss": 1.5252898037433624,
      "val_loss": 1.5066693723201752,
      "val_ppl": 4.511679019275092,
      "lr": 0.0002992086242158385
    },
    {
      "step": 3000,
      "train_loss": 1.3099323511123657,
      "val_loss": 1.3194083347916603,
      "val_ppl": 3.7412071801680677,
      "lr": 0.00029684377502086165
    },
    {
      "step": 4000,
      "train_loss": 1.2161387205123901,
      "val_loss": 1.2286550998687744,
      "val_ppl": 3.4166314169360987,
      "lr": 0.0002929331781096783
    },
    {
      "step": 5000,
      "train_loss": 1.1787906289100647,
      "val_loss": 1.1772918552160263,
      "val_ppl": 3.2455728094700103,
      "lr": 0.00028752268165557917
    },
    {
      "step": 6000,
      "train_loss": 1.1403338611125946,
      "val_loss": 1.1352313607931137,
      "val_ppl": 3.1118934297571132,
      "lr": 0.0002806757187826245
    },
    {
      "step": 7000,
      "train_loss": 1.1162661612033844,
      "val_loss": 1.1075621414929628,
      "val_ppl": 3.0269700675173796,
      "lr": 0.00027247256387026185
    },
    {
      "step": 8000,
      "train_loss": 1.0829694867134094,
      "val_loss": 1.0843632984906435,
      "val_ppl": 2.9575561386746556,
      "lr": 0.0002630093914096226
    },
    {
      "step": 9000,
      "train_loss": 1.0747118294239044,
      "val_loss": 1.0635895021259785,
      "val_ppl": 2.8967502410992467,
      "lr": 0.0002523971484455467
    },
    {
      "step": 10000,
      "train_loss": 1.0519791841506958,
      "val_loss": 1.0476661436259747,
      "val_ppl": 2.85098954738486,
      "lr": 0.0002407602538239216
    },
    {
      "step": 11000,
      "train_loss": 1.0250678956508636,
      "val_loss": 1.0324134565889835,
      "val_ppl": 2.807834249846705,
      "lr": 0.00022823513949447164
    },
    {
      "step": 12000,
      "train_loss": 1.0199836790561676,
      "val_loss": 1.023882026784122,
      "val_ppl": 2.783981303587245,
      "lr": 0.00021496865097088842
    },
    {
      "step": 13000,
      "train_loss": 1.0101815909147263,
      "val_loss": 1.0102009763941169,
      "val_ppl": 2.7461528714618,
      "lr": 0.0002011163257014448
    },
    {
      "step": 14000,
      "train_loss": 1.0113594383001328,
      "val_loss": 1.0001213569194078,
      "val_ppl": 2.7186117307853896,
      "lr": 0.00018684056953462323
    },
    {
      "step": 15000,
      "train_loss": 0.98267862200737,
      "val_loss": 0.9921664940193295,
      "val_ppl": 2.697071336220516,
      "lr": 0.00017230875265903135
    },
    {
      "step": 16000,
      "train_loss": 0.995794028043747,
      "val_loss": 0.9845060091465712,
      "val_ppl": 2.6764893965183,
      "lr": 0.00015769124734096862
    },
    {
      "step": 17000,
      "train_loss": 0.962462991476059,
      "val_loss": 0.9766457295045257,
      "val_ppl": 2.655533907298061,
      "lr": 0.00014315943046537674
    },
    {
      "step": 18000,
      "train_loss": 0.9672404527664185,
      "val_loss": 0.9714991142973304,
      "val_ppl": 2.6419020052744058,
      "lr": 0.0001288836742985552
    },
    {
      "step": 19000,
      "train_loss": 0.9653829336166382,
      "val_loss": 0.9648234033957124,
      "val_ppl": 2.624324168813844,
      "lr": 0.00011503134902911152
    },
    {
      "step": 20000,
      "train_loss": 0.9600358754396439,
      "val_loss": 0.959049197845161,
      "val_ppl": 2.6092144469334535,
      "lr": 0.00010176486050552833
    },
    {
      "step": 21000,
      "train_loss": 0.9566726982593536,
      "val_loss": 0.9548654137179255,
      "val_ppl": 2.598320861041842,
      "lr": 8.923974617607838e-05
    },
    {
      "step": 22000,
      "train_loss": 0.9502571374177933,
      "val_loss": 0.9499085610732436,
      "val_ppl": 2.5854732356090246,
      "lr": 7.760285155445327e-05
    },
    {
      "step": 23000,
      "train_loss": 0.9525800943374634,
      "val_loss": 0.9469442367553711,
      "val_ppl": 2.5778204027666733,
      "lr": 6.699060859037736e-05
    },
    {
      "step": 24000,
      "train_loss": 0.9471650272607803,
      "val_loss": 0.9441628893837333,
      "val_ppl": 2.57066055039882,
      "lr": 5.7527436129738084e-05
    },
    {
      "step": 25000,
      "train_loss": 0.9476055055856705,
      "val_loss": 0.9407382626086473,
      "val_ppl": 2.561872054696453,
      "lr": 4.9324281217375474e-05
    },
    {
      "step": 26000,
      "train_loss": 0.9304470866918564,
      "val_loss": 0.9391492558643222,
      "val_ppl": 2.5578044553007495,
      "lr": 4.247731834442082e-05
    },
    {
      "step": 27000,
      "train_loss": 0.9319835901260376,
      "val_loss": 0.936947762966156,
      "val_ppl": 2.5521796607019356,
      "lr": 3.7066821890321684e-05
    },
    {
      "step": 28000,
      "train_loss": 0.933847963809967,
      "val_loss": 0.9346829485148191,
      "val_ppl": 2.5464059879406724,
      "lr": 3.31562249791383e-05
    },
    {
      "step": 29000,
      "train_loss": 0.936771810054779,
      "val_loss": 0.9336990155279636,
      "val_ppl": 2.5439017273055704,
      "lr": 3.0791375784161455e-05
    }
  ],
  "final_val": 0.9317306941375136,
  "best_val": 0.9336990155279636
}