TilelliLab committed
Commit 9e3a160 · verified · 1 Parent(s): f9f251d

Atome LM v0.3.0 — checkpoints + honest model card

README.md ADDED
@@ -0,0 +1,111 @@
+ ---
+ license: apache-2.0
+ library_name: pytorch
+ pipeline_tag: text-generation
+ tags:
+ - ternary
+ - bitnet
+ - microcontroller
+ - edge-ai
+ - tinyml
+ - byte-level
+ - language-model
+ - routed-architecture
+ ---
+
+ # Atome LM
+
+ A reference implementation of a **routed-ternary tiny language model** with a bit-exact
+ Python ↔ C99 inference engine, sized for **microcontroller-class RAM budgets**.
+
+ The contribution is **integration, not a new architecture**: a complete
+ train → ternary export → base-3 packing → C99 inference path, with bit-exact Python ↔ C
+ parity enforced by tests. It combines three known ideas — ternary weights
+ ([BitNet b1.58](https://arxiv.org/abs/2402.17764)), a per-token-routed 3-pathway block
+ ([Hymba](https://arxiv.org/abs/2411.13676), [MossNet](https://arxiv.org/abs/2510.26182)),
+ and a byte tokenizer at super-tiny scale ([Guertler 2024](https://arxiv.org/abs/2405.14159)).
+
+ - **Code:** https://github.com/TilelliLab/atome-lm
+ - **Project home / live in-browser demo:** https://atomelm.com
+ - **License:** Apache-2.0 (code, weights, everything)
+
+ > ⚠️ This is a **research artifact, not a product or a general chatbot.** Read the
+ > "Honest results" section below before citing any number. The honesty dossier lives in
+ > [`HONEST_RESULTS.md`](https://github.com/TilelliLab/atome-lm/blob/main/HONEST_RESULTS.md)
+ > in the source repo.
+
+ ## Files in this repo
+
+ | File | What it is |
+ |---|---|
+ | `atome_944k.bin` (277 KB) | Packed `ATOME01` C-engine blob, ternary, loadable directly by the Atome C99 engine |
+ | `atome_1m_v1.pt` (3.8 MB) | PyTorch source checkpoint (944,640 params) that produced the blob; use to fine-tune or re-export |
+ | `vanilla_1m_v1.pt` (3.8 MB) | FP32 vanilla-GPT baseline (950,608 params) — shipped so you can reproduce the 944K reversal A/B |
+ | `*.train.json` | Every-1000-step training logs for both checkpoints (every reported number is auditable) |
+ | `config.json` | Architecture hyperparameters + provenance for all three checkpoints |
+ | `SHA256SUMS` | Checksums for the three weight files |
+
+ ## Honest results — read this before citing anything
+
+ All numbers are **single-seed**, from the training logs shipped alongside.
+
+ | Regime | Atome ternary | Vanilla FP32 (param-fair) | Verdict |
+ |---|---|---|---|
+ | **60K (MCU target)** | 6.31 ppl | 8.12 ppl | **Atome wins −22% ppl** (−52% at flash-fair budget) |
+ | **944K (these checkpoints)** | val 1.0545 / 2.87 ppl | val 0.9337 / 2.54 ppl | **Vanilla wins by ~11%** |
+
+ **The 944K result reverses.** At 944K parameters the FP32 vanilla baseline *beats* Atome by
+ ~11% in val loss and perplexity, same recipe / same val slice / same seed. Atome's bet is the
+ **sub-1M, MCU-class regime**: the 3-pathway inductive bias substitutes for capacity at small
+ scale and *constrains* it above ~1M. This is the most important honest finding in the kit —
+ it is **not** "tiny ternary beats everything."
+
+ The bundled 944K checkpoint is here to make the architecture **runnable**, not to set a
+ quality bar. It is narrow, single-corpus (TinyStories), and sometimes incoherent.
+
+ ### What is NOT measured / NOT claimed
+ - **Single seed only.** No multi-seed variance yet.
+ - **MCU parity is QEMU only** (ARM Cortex-M3, MPS2-AN385), to FP32 epsilon. **No silicon
+   bring-up** is done in this repository. The RP2040 demo exceeds 264 KB SRAM at 944K — the
+   MCU claim is regime-dependent (it holds at the ~60K engine-default config, not at 944K).
+ - **Router-entropy** is exposed for free as a per-token uncertainty signal, but its
+   **calibration is unmeasured at this scale**.
+
+ ## Usage
+
+ This is a **custom architecture**, not a `transformers` AutoModel. Get the code from the
+ source repo, then load the PyTorch checkpoint:
+
+ ```bash
+ git clone https://github.com/TilelliLab/atome-lm
+ cd atome-lm && pip install -e .  # Python >=3.10, PyTorch >=2.0
+ ```
+
+ ```python
+ import torch
+ from atome_llm.core.atome_lm import AtomeLM
+
+ ckpt = torch.load("atome_1m_v1.pt", map_location="cpu", weights_only=False)
+ model = AtomeLM(**ckpt["config"])  # vocab_size=256, d_model=256, n_layers=8, d_head=64, top_k=4
+ model.load_state_dict(ckpt["state_dict"])
+ model.eval()
+
+ ids = torch.randint(0, 256, (1, 32))  # byte-level: ids are raw bytes 0-255
+ logits = model(ids)  # (1, 32, 256)
+ ent_per_layer = model.router_entropies(ids)  # free per-token uncertainty signal
+ ```
+
+ For microcontroller deployment, load `atome_944k.bin` directly with the Atome C99 engine
+ (`atome_load(...)`) shipped in the source repo's `c_engine/`.
+
+ ## Citation
+
+ ```bibtex
+ @software{atome_llm_2026,
+   title = {Atome LM: a tiny ternary language model for microcontroller deployment},
+   author = {Atome LM contributors},
+   year = {2026},
+   note = {Apache 2.0, https://atomelm.com},
+   url = {https://github.com/TilelliLab/atome-lm}
+ }
+ ```
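The usage snippet in the README feeds random ids to the model. Because the tokenizer is byte-level (ids are raw bytes 0-255, with no vocab file), real text can be turned into ids with plain UTF-8 encoding. A minimal sketch under that assumption; `encode` and `decode` are hypothetical helper names, not functions from the `atome_llm` package:

```python
def encode(text: str) -> list[int]:
    """Map text to byte ids in 0..255 via UTF-8 (hypothetical helper)."""
    return list(text.encode("utf-8"))


def decode(ids: list[int]) -> str:
    """Map byte ids back to text; invalid UTF-8 sequences become U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")


ids = encode("Once upon a time")
print(len(ids), ids[:4])
print(decode(ids))
```

Wrapping the result as `torch.tensor([encode(text)])` gives the `(1, T)` integer tensor shape that the README snippet passes to the model. Sampled ids can split a multi-byte character, which is why `decode` replaces invalid sequences instead of raising.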
SHA256SUMS ADDED
@@ -0,0 +1,3 @@
+ fdf8a6b69eacc5e4834e488759593198e482399887fce2c5b048a599844ae2f5 atome_944k.bin
+ 0bba4c123a9026bffb36f05acc9a7f9e68dcac95b01321d151d32d8320b660c8 atome_1m_v1.pt
+ 8c2f4308185c91c5c493d61a7ac5aa3d1c44cfb3baaa205ce7275ce74ee4494d vanilla_1m_v1.pt
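The checksums above can be checked after download with the standard library alone. The following is a sketch, assuming the weight files sit in the same directory as `SHA256SUMS`; `sha256_of` and `verify_sums` are illustrative names, not part of the Atome tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: Path) -> dict[str, bool]:
    """Check each '<hex digest> <filename>' line against the file on disk."""
    results = {}
    for raw in sums_file.read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = sums_file.parent / name
        # A missing file counts as a failed check rather than an error.
        results[name] = target.exists() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_sums(Path("SHA256SUMS"))` returns one boolean per listed file, so a failed or missing download shows up as `False` next to its name.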
atome_1m_v1.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0bba4c123a9026bffb36f05acc9a7f9e68dcac95b01321d151d32d8320b660c8
+ size 3808762
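The three lines above are a Git LFS pointer, not the checkpoint itself: the `oid` is the SHA-256 of the real file (it matches the `atome_1m_v1.pt` entry in `SHA256SUMS`) and `size` is its length in bytes. A small sketch of reading such a pointer; `parse_lfs_pointer` is a hypothetical helper and only handles the three keys shown here:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into version, hash algorithm, digest and size."""
    # Each non-empty line is '<key> <value>'.
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:0bba4c123a9026bffb36f05acc9a7f9e68dcac95b01321d151d32d8320b660c8
size 3808762
"""
info = parse_lfs_pointer(pointer)
print(info["algo"], info["size"])
```

If a clone was made without Git LFS installed, the `.pt` file on disk is this short text file, and `torch.load` will fail on it; comparing the on-disk size with `info["size"]` is a quick way to tell the two cases apart.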
atome_1m_v1.train.json ADDED
@@ -0,0 +1,229 @@
+ {
+   "params": 944640,
+   "args": {
+     "data": "data/tinystories_full.txt",
+     "output": "checkpoints/atome_1m_v1.pt",
+     "steps": 30000,
+     "seq_len": 256,
+     "batch_size": 64,
+     "accum_steps": 4,
+     "lr": 0.0003,
+     "min_lr": 3e-05,
+     "warmup": 1000,
+     "weight_decay": 0.1,
+     "d_model": 256,
+     "n_layers": 8,
+     "d_head": 64,
+     "top_k": 4,
+     "bf16": true,
+     "eval_every": 1000,
+     "seed": 0
+   },
+   "log": [
+     {
+       "step": 1000,
+       "train_loss": 1.689065933227539,
+       "val_loss": 1.6851140782237053,
+       "val_ppl": 5.3930661286628725,
+       "lr": 0.0003
+     },
+     {
+       "step": 2000,
+       "train_loss": 1.475701928138733,
+       "val_loss": 1.4368714336305857,
+       "val_ppl": 4.207511724416042,
+       "lr": 0.0002992086242158385
+     },
+     {
+       "step": 3000,
+       "train_loss": 1.3402614891529083,
+       "val_loss": 1.355498529970646,
+       "val_ppl": 3.8786941199889884,
+       "lr": 0.00029684377502086165
+     },
+     {
+       "step": 4000,
+       "train_loss": 1.2906470894813538,
+       "val_loss": 1.298057682812214,
+       "val_ppl": 3.662176646542712,
+       "lr": 0.0002929331781096783
+     },
+     {
+       "step": 5000,
+       "train_loss": 1.2640663385391235,
+       "val_loss": 1.2564894184470177,
+       "val_ppl": 3.513066906295889,
+       "lr": 0.00028752268165557917
+     },
+     {
+       "step": 6000,
+       "train_loss": 1.205640196800232,
+       "val_loss": 1.2161348164081573,
+       "val_ppl": 3.374120900293555,
+       "lr": 0.0002806757187826245
+     },
+     {
+       "step": 7000,
+       "train_loss": 1.1917544305324554,
+       "val_loss": 1.1835042145103216,
+       "val_ppl": 3.2657982326287116,
+       "lr": 0.00027247256387026185
+     },
+     {
+       "step": 8000,
+       "train_loss": 1.1544596254825592,
+       "val_loss": 1.1677243299782276,
+       "val_ppl": 3.2146687829705525,
+       "lr": 0.0002630093914096226
+     },
+     {
+       "step": 9000,
+       "train_loss": 1.1510637402534485,
+       "val_loss": 1.1527819111943245,
+       "val_ppl": 3.166990953913901,
+       "lr": 0.0002523971484455467
+     },
+     {
+       "step": 10000,
+       "train_loss": 1.140123575925827,
+       "val_loss": 1.1461433116346598,
+       "val_ppl": 3.146036201225796,
+       "lr": 0.0002407602538239216
+     },
+     {
+       "step": 11000,
+       "train_loss": 1.1275735795497894,
+       "val_loss": 1.131921675056219,
+       "val_ppl": 3.1016110655411038,
+       "lr": 0.00022823513949447164
+     },
+     {
+       "step": 12000,
+       "train_loss": 1.1099890172481537,
+       "val_loss": 1.112453417852521,
+       "val_ppl": 3.041812083259338,
+       "lr": 0.00021496865097088842
+     },
+     {
+       "step": 13000,
+       "train_loss": 1.1127586960792542,
+       "val_loss": 1.112892348319292,
+       "val_ppl": 3.043147520317438,
+       "lr": 0.0002011163257014448
+     },
+     {
+       "step": 14000,
+       "train_loss": 1.0873990654945374,
+       "val_loss": 1.1024821121245623,
+       "val_ppl": 3.0116319626741244,
+       "lr": 0.00018684056953462323
+     },
+     {
+       "step": 15000,
+       "train_loss": 1.0949949026107788,
+       "val_loss": 1.1003286074846983,
+       "val_ppl": 3.0051533776041945,
+       "lr": 0.00017230875265903135
+     },
+     {
+       "step": 16000,
+       "train_loss": 1.092372715473175,
+       "val_loss": 1.0886210184544325,
+       "val_ppl": 2.9701754301311736,
+       "lr": 0.00015769124734096862
+     },
+     {
+       "step": 17000,
+       "train_loss": 1.0719301402568817,
+       "val_loss": 1.087962357327342,
+       "val_ppl": 2.968219735175533,
+       "lr": 0.00014315943046537674
+     },
+     {
+       "step": 18000,
+       "train_loss": 1.0894330739974976,
+       "val_loss": 1.0875801891088486,
+       "val_ppl": 2.9670855926576603,
+       "lr": 0.0001288836742985552
+     },
+     {
+       "step": 19000,
+       "train_loss": 1.0676527321338654,
+       "val_loss": 1.0716162715107203,
+       "val_ppl": 2.920095354830056,
+       "lr": 0.00011503134902911152
+     },
+     {
+       "step": 20000,
+       "train_loss": 1.0742259323596954,
+       "val_loss": 1.0812196973711252,
+       "val_ppl": 2.948273360207015,
+       "lr": 0.00010176486050552833
+     },
+     {
+       "step": 21000,
+       "train_loss": 1.0726729929447174,
+       "val_loss": 1.0718515273183584,
+       "val_ppl": 2.9207824050342435,
+       "lr": 8.923974617607838e-05
+     },
+     {
+       "step": 22000,
+       "train_loss": 1.0701198875904083,
+       "val_loss": 1.0739975553005934,
+       "val_ppl": 2.927057216357621,
+       "lr": 7.760285155445327e-05
+     },
+     {
+       "step": 23000,
+       "train_loss": 1.0675779581069946,
+       "val_loss": 1.0646078549325466,
+       "val_ppl": 2.899701657373658,
+       "lr": 6.699060859037736e-05
+     },
+     {
+       "step": 24000,
+       "train_loss": 1.0793527662754059,
+       "val_loss": 1.0707154776901007,
+       "val_ppl": 2.917466135348921,
+       "lr": 5.7527436129738084e-05
+     },
+     {
+       "step": 25000,
+       "train_loss": 1.0686360597610474,
+       "val_loss": 1.067691769450903,
+       "val_ppl": 2.9086578924472115,
+       "lr": 4.9324281217375474e-05
+     },
+     {
+       "step": 26000,
+       "train_loss": 1.079252928495407,
+       "val_loss": 1.064154027029872,
+       "val_ppl": 2.8983859904178786,
+       "lr": 4.247731834442082e-05
+     },
+     {
+       "step": 27000,
+       "train_loss": 1.0666958093643188,
+       "val_loss": 1.0639245696365833,
+       "val_ppl": 2.8977210106189566,
+       "lr": 3.7066821890321684e-05
+     },
+     {
+       "step": 28000,
+       "train_loss": 1.065284639596939,
+       "val_loss": 1.0690924655646086,
+       "val_ppl": 2.912734892906038,
+       "lr": 3.31562249791383e-05
+     },
+     {
+       "step": 29000,
+       "train_loss": 1.06133571267128,
+       "val_loss": 1.0545352958142757,
+       "val_ppl": 2.8706408450794916,
+       "lr": 3.0791375784161455e-05
+     }
+   ],
+   "final_val": 1.0572172198444605,
+   "best_val": 1.0545352958142757
+ }
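The `lr` column in the log above is consistent with a cosine decay from `lr` to `min_lr` over the steps after warmup, using the `args` values `lr=0.0003`, `min_lr=3e-05`, `warmup=1000`, `steps=30000`. The sketch below reproduces the logged values; the linear shape of the warmup is an assumption, since the log only shows that the peak rate is reached by step 1000, and `lr_at` is an illustrative name rather than the training script's function:

```python
import math


def lr_at(step: int, lr: float = 3e-4, min_lr: float = 3e-5,
          warmup: int = 1000, steps: int = 30000) -> float:
    """Linear warmup (assumed), then cosine decay from lr down to min_lr."""
    if step < warmup:
        return lr * step / warmup
    progress = (step - warmup) / (steps - warmup)
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (1000, 2000, 16000, 29000):
    print(step, lr_at(step))
```

Both runs in this commit log identical `lr` values at every step, which fits the README's statement that the A/B comparison uses the same recipe.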
atome_944k.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fdf8a6b69eacc5e4834e488759593198e482399887fce2c5b048a599844ae2f5
+ size 276655
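The README describes the export path as "base-3 packing", and `config.json` in this commit lists the blob format as "4 trits/byte". The actual `ATOME01` byte layout is not shown in this commit, so the following is only a sketch of the general idea: each weight in {-1, 0, +1} becomes a base-3 digit, and several digits share one byte. The digit order, the handling of the final partial group, and the names `pack_trits` and `unpack_trits` are all assumptions:

```python
def pack_trits(trits: list[int], per_byte: int = 4) -> bytes:
    """Pack values in {-1, 0, +1} into bytes, per_byte base-3 digits each."""
    if not 1 <= per_byte <= 5:  # 3**5 = 243 is the largest power of 3 below 256
        raise ValueError("per_byte must be between 1 and 5")
    out = bytearray()
    for start in range(0, len(trits), per_byte):
        value = 0
        # Least-significant digit first; a short final group leaves its
        # unused high digits at 0, and unpack_trits drops them by count.
        for pos, t in enumerate(trits[start:start + per_byte]):
            if t not in (-1, 0, 1):
                raise ValueError(f"not a trit: {t}")
            value += (t + 1) * 3 ** pos
        out.append(value)
    return bytes(out)


def unpack_trits(data: bytes, count: int, per_byte: int = 4) -> list[int]:
    """Inverse of pack_trits; count is the number of trits originally packed."""
    trits = []
    for byte in data:
        for _ in range(per_byte):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


weights = [-1, 0, 1, 1, 0, -1, 1]
packed = pack_trits(weights)
print(len(packed), unpack_trits(packed, len(weights)) == weights)
```

With four digits per byte each weight costs 2 bits on disk; five per byte would cost 1.6 bits, close to the information-theoretic 1.58 bits of a ternary value.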
config.json ADDED
@@ -0,0 +1,79 @@
+ {
+   "model_type": "atome-lm",
+   "architecture": "routed-ternary-3pathway",
+   "_comment": "Atome LM is a custom architecture, NOT a transformers AutoModel. Load with atome_llm.core.atome_lm.AtomeLM from github.com/TilelliLab/atome-lm. This config documents the bundled checkpoints; it is not consumed by transformers.",
+
+   "checkpoints": {
+     "atome_944k.bin": {
+       "format": "ATOME01 packed C-engine blob (4 trits/byte)",
+       "precision": "ternary {-alpha, 0, +alpha} per tensor (BitNet b1.58 style)",
+       "bits_per_weight": 1.58,
+       "params": 944640,
+       "disk_bytes": 276655,
+       "loadable_by": "Atome C99 engine (atome_load)",
+       "derived_from": "atome_1m_v1.pt"
+     },
+     "atome_1m_v1.pt": {
+       "format": "PyTorch state_dict",
+       "precision": "fp32 source (export to ternary via scripts/export_to_atome.py)",
+       "params": 944640,
+       "config": {
+         "vocab_size": 256,
+         "d_model": 256,
+         "n_layers": 8,
+         "d_head": 64,
+         "top_k": 4,
+         "kernel_size": 5,
+         "n_pathways": 3
+       },
+       "tokenizer": "byte-level (no vocab file; ids 0-255)",
+       "final_val_loss": 1.0545,
+       "final_val_ppl": 2.87
+     },
+     "vanilla_1m_v1.pt": {
+       "format": "PyTorch state_dict",
+       "precision": "fp32",
+       "role": "param-fair vanilla GPT baseline for the 944K reversal A/B in HONEST_RESULTS.md",
+       "params": 950608,
+       "config": {
+         "kind": "vanilla_transformer_fp32",
+         "vocab_size": 256,
+         "d_model": 152,
+         "n_layers": 3,
+         "n_heads": 4,
+         "d_ff": 608,
+         "max_seq": 256
+       },
+       "final_val_loss": 0.9337,
+       "final_val_ppl": 2.54
+     }
+   },
+
+   "engine_default_config": {
+     "_comment": "The C99 engine compile-time #defines; ~60K params, the MCU target regime (NOT the 944K bundled checkpoint).",
+     "vocab_size": 256,
+     "d_model": 64,
+     "n_layers": 4,
+     "d_head": 16,
+     "top_k": 4,
+     "kernel_size": 5,
+     "n_pathways": 3
+   },
+
+   "training": {
+     "corpus": "TinyStories (train.txt + valid.txt concatenated)",
+     "steps": 30000,
+     "seq_len": 256,
+     "batch_size": 64,
+     "accum_steps": 4,
+     "optimizer": "AdamW lr 3e-4->3e-5 cosine, warmup 1000, weight_decay 0.1",
+     "precision": "bf16 autocast",
+     "seed": 0,
+     "seeds_note": "single seed only; multi-seed variance not yet measured"
+   },
+
+   "license": "Apache-2.0",
+   "version": "0.3.0",
+   "source_repository": "https://github.com/TilelliLab/atome-lm",
+   "project_home": "https://atomelm.com"
+ }
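The config describes the blob's precision as "ternary {-alpha, 0, +alpha} per tensor (BitNet b1.58 style)". In the BitNet b1.58 paper that means absmean quantization: the scale is the mean absolute weight of the tensor, and each weight is divided by it, rounded, and clipped to {-1, 0, +1}. A pure-Python sketch of that rule follows; whether `scripts/export_to_atome.py` applies exactly this rule is not shown in this commit, and `ternarize` and `dequantize` are illustrative names:

```python
def ternarize(weights: list[float], eps: float = 1e-8) -> tuple[list[int], float]:
    """Absmean ternary quantization: returns trits in {-1, 0, 1} and scale alpha."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    # Python's round() rounds exact halves to the nearest even integer.
    trits = [max(-1, min(1, round(w / (alpha + eps)))) for w in weights]
    return trits, alpha


def dequantize(trits: list[int], alpha: float) -> list[float]:
    """Map trits back to the three representable values -alpha, 0, +alpha."""
    return [t * alpha for t in trits]


trits, alpha = ternarize([0.9, -0.05, 0.4, -1.2])
print(trits, alpha)
print(dequantize(trits, alpha))
```

Only the trits and one `alpha` per tensor need to be stored, which is what makes the packed blob so much smaller than the FP32 checkpoint it is derived from.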
vanilla_1m_v1.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c2f4308185c91c5c493d61a7ac5aa3d1c44cfb3baaa205ce7275ce74ee4494d
+ size 3812805
vanilla_1m_v1.train.json ADDED
@@ -0,0 +1,230 @@
+ {
+   "params": 950608,
+   "args": {
+     "data": "data/tinystories_full.txt",
+     "output": "checkpoints/vanilla_1m_v1.pt",
+     "steps": 30000,
+     "seq_len": 256,
+     "batch_size": 64,
+     "accum_steps": 4,
+     "lr": 0.0003,
+     "min_lr": 3e-05,
+     "warmup": 1000,
+     "weight_decay": 0.1,
+     "d_model": 152,
+     "n_layers": 3,
+     "n_heads": 4,
+     "d_ff": 608,
+     "max_seq": 256,
+     "bf16": true,
+     "eval_every": 1000,
+     "seed": 0
+   },
+   "log": [
+     {
+       "step": 1000,
+       "train_loss": 2.0875988006591797,
+       "val_loss": 2.0943055227398872,
+       "val_ppl": 8.119799995221573,
+       "lr": 0.0003
+     },
+     {
+       "step": 2000,
+       "train_loss": 1.5252898037433624,
+       "val_loss": 1.5066693723201752,
+       "val_ppl": 4.511679019275092,
+       "lr": 0.0002992086242158385
+     },
+     {
+       "step": 3000,
+       "train_loss": 1.3099323511123657,
+       "val_loss": 1.3194083347916603,
+       "val_ppl": 3.7412071801680677,
+       "lr": 0.00029684377502086165
+     },
+     {
+       "step": 4000,
+       "train_loss": 1.2161387205123901,
+       "val_loss": 1.2286550998687744,
+       "val_ppl": 3.4166314169360987,
+       "lr": 0.0002929331781096783
+     },
+     {
+       "step": 5000,
+       "train_loss": 1.1787906289100647,
+       "val_loss": 1.1772918552160263,
+       "val_ppl": 3.2455728094700103,
+       "lr": 0.00028752268165557917
+     },
+     {
+       "step": 6000,
+       "train_loss": 1.1403338611125946,
+       "val_loss": 1.1352313607931137,
+       "val_ppl": 3.1118934297571132,
+       "lr": 0.0002806757187826245
+     },
+     {
+       "step": 7000,
+       "train_loss": 1.1162661612033844,
+       "val_loss": 1.1075621414929628,
+       "val_ppl": 3.0269700675173796,
+       "lr": 0.00027247256387026185
+     },
+     {
+       "step": 8000,
+       "train_loss": 1.0829694867134094,
+       "val_loss": 1.0843632984906435,
+       "val_ppl": 2.9575561386746556,
+       "lr": 0.0002630093914096226
+     },
+     {
+       "step": 9000,
+       "train_loss": 1.0747118294239044,
+       "val_loss": 1.0635895021259785,
+       "val_ppl": 2.8967502410992467,
+       "lr": 0.0002523971484455467
+     },
+     {
+       "step": 10000,
+       "train_loss": 1.0519791841506958,
+       "val_loss": 1.0476661436259747,
+       "val_ppl": 2.85098954738486,
+       "lr": 0.0002407602538239216
+     },
+     {
+       "step": 11000,
+       "train_loss": 1.0250678956508636,
+       "val_loss": 1.0324134565889835,
+       "val_ppl": 2.807834249846705,
+       "lr": 0.00022823513949447164
+     },
+     {
+       "step": 12000,
+       "train_loss": 1.0199836790561676,
+       "val_loss": 1.023882026784122,
+       "val_ppl": 2.783981303587245,
+       "lr": 0.00021496865097088842
+     },
+     {
+       "step": 13000,
+       "train_loss": 1.0101815909147263,
+       "val_loss": 1.0102009763941169,
+       "val_ppl": 2.7461528714618,
+       "lr": 0.0002011163257014448
+     },
+     {
+       "step": 14000,
+       "train_loss": 1.0113594383001328,
+       "val_loss": 1.0001213569194078,
+       "val_ppl": 2.7186117307853896,
+       "lr": 0.00018684056953462323
+     },
+     {
+       "step": 15000,
+       "train_loss": 0.98267862200737,
+       "val_loss": 0.9921664940193295,
+       "val_ppl": 2.697071336220516,
+       "lr": 0.00017230875265903135
+     },
+     {
+       "step": 16000,
+       "train_loss": 0.995794028043747,
+       "val_loss": 0.9845060091465712,
+       "val_ppl": 2.6764893965183,
+       "lr": 0.00015769124734096862
+     },
+     {
+       "step": 17000,
+       "train_loss": 0.962462991476059,
+       "val_loss": 0.9766457295045257,
+       "val_ppl": 2.655533907298061,
+       "lr": 0.00014315943046537674
+     },
+     {
+       "step": 18000,
+       "train_loss": 0.9672404527664185,
+       "val_loss": 0.9714991142973304,
+       "val_ppl": 2.6419020052744058,
+       "lr": 0.0001288836742985552
+     },
+     {
+       "step": 19000,
+       "train_loss": 0.9653829336166382,
+       "val_loss": 0.9648234033957124,
+       "val_ppl": 2.624324168813844,
+       "lr": 0.00011503134902911152
+     },
+     {
+       "step": 20000,
+       "train_loss": 0.9600358754396439,
+       "val_loss": 0.959049197845161,
+       "val_ppl": 2.6092144469334535,
+       "lr": 0.00010176486050552833
+     },
+     {
+       "step": 21000,
+       "train_loss": 0.9566726982593536,
+       "val_loss": 0.9548654137179255,
+       "val_ppl": 2.598320861041842,
+       "lr": 8.923974617607838e-05
+     },
+     {
+       "step": 22000,
+       "train_loss": 0.9502571374177933,
+       "val_loss": 0.9499085610732436,
+       "val_ppl": 2.5854732356090246,
+       "lr": 7.760285155445327e-05
+     },
+     {
+       "step": 23000,
+       "train_loss": 0.9525800943374634,
+       "val_loss": 0.9469442367553711,
+       "val_ppl": 2.5778204027666733,
+       "lr": 6.699060859037736e-05
+     },
+     {
+       "step": 24000,
+       "train_loss": 0.9471650272607803,
+       "val_loss": 0.9441628893837333,
+       "val_ppl": 2.57066055039882,
+       "lr": 5.7527436129738084e-05
+     },
+     {
+       "step": 25000,
+       "train_loss": 0.9476055055856705,
+       "val_loss": 0.9407382626086473,
+       "val_ppl": 2.561872054696453,
+       "lr": 4.9324281217375474e-05
+     },
+     {
+       "step": 26000,
+       "train_loss": 0.9304470866918564,
+       "val_loss": 0.9391492558643222,
+       "val_ppl": 2.5578044553007495,
+       "lr": 4.247731834442082e-05
+     },
+     {
+       "step": 27000,
+       "train_loss": 0.9319835901260376,
+       "val_loss": 0.936947762966156,
+       "val_ppl": 2.5521796607019356,
+       "lr": 3.7066821890321684e-05
+     },
+     {
+       "step": 28000,
+       "train_loss": 0.933847963809967,
+       "val_loss": 0.9346829485148191,
+       "val_ppl": 2.5464059879406724,
+       "lr": 3.31562249791383e-05
+     },
+     {
+       "step": 29000,
+       "train_loss": 0.936771810054779,
+       "val_loss": 0.9336990155279636,
+       "val_ppl": 2.5439017273055704,
+       "lr": 3.0791375784161455e-05
+     }
+   ],
+   "final_val": 0.9317306941375136,
+   "best_val": 0.9336990155279636
+ }
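Since the README invites readers to audit every reported number against these logs, a short script can do the cross-checking. The sketch below assumes a log has the shape shown above (a `log` list plus `final_val` and `best_val`) and that `val_ppl` is `exp(val_loss)`, which the logged values match; `summarize` is a hypothetical helper, not part of the repo:

```python
import math


def summarize(run: dict) -> dict:
    """Cross-check a training log dict with 'log', 'final_val', 'best_val' keys."""
    best_entry = min(run["log"], key=lambda e: e["val_loss"])
    return {
        "best_logged_step": best_entry["step"],
        "best_logged_val": best_entry["val_loss"],
        # Perplexity should equal exp(val_loss) for every logged step.
        "ppl_consistent": all(
            math.isclose(math.exp(e["val_loss"]), e["val_ppl"], rel_tol=1e-6)
            for e in run["log"]
        ),
        # True when the final evaluation is lower than the recorded best.
        "final_below_best": run["final_val"] < run["best_val"],
    }


# Last two logged steps of the vanilla run, copied from the log above.
vanilla_tail = {
    "log": [
        {"step": 28000, "val_loss": 0.9346829485148191, "val_ppl": 2.5464059879406724},
        {"step": 29000, "val_loss": 0.9336990155279636, "val_ppl": 2.5439017273055704},
    ],
    "final_val": 0.9317306941375136,
    "best_val": 0.9336990155279636,
}
print(summarize(vanilla_tail))
```

For real use, pass `json.load(open("vanilla_1m_v1.train.json"))` instead of the excerpt. In both logs `best_val` equals the step-29000 `val_loss`, and those are the values the README table and `config.json` report (1.0545 and 0.9337). In the vanilla log `final_val` is lower than `best_val`, which the `final_below_best` flag surfaces.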