Cukinator committed on
Commit
c68536f
·
verified ·
1 Parent(s): b13bd07

Add README: describe compact_2bit source repo + audit summary

Files changed (1)
  1. README.md +121 -0
README.md ADDED
@@ -0,0 +1,121 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ tags:
6
+ - pytorch
7
+ - text-generation
8
+ - 1.58bit
9
+ - ternary
10
+ - byte-level
11
+ - mlgru
12
+ - ablation
13
+ - checkpoints
14
+ library_name: pytorch
15
+ pipeline_tag: text-generation
16
+ model_type: custom
17
+ ---
18
+
19
+ # CPU-1 Ablation Study — Source Checkpoints (compact 2-bit)
20
+
21
+ Repo: `Cukinator/cpu1-ablation-checkpoints`
22
+ Unpacked: [`Cukinator/cpu1-ablations-final`](https://huggingface.co/Cukinator/cpu1-ablations-final)
23
+ Code: [github.com/Cukinator/1.58bits](https://github.com/Cukinator/1.58bits)
24
+ Dataset: [`Cukinator/cpu1-ablation-dataset`](https://huggingface.co/datasets/Cukinator/cpu1-ablation-dataset)
25
+
26
+ This repository stores the **raw training checkpoints** produced by
27
+ `train_ablation.py` from the [1.58bits repo](https://github.com/Cukinator/1.58bits).
28
+ There are two checkpoint formats, stored under three filename patterns; each run keeps its files in its own folder:
29
+
30
+ | Filename pattern | Format | Purpose |
31
+ |------------------|--------|---------|
32
+ | `<run>/checkpoint_<run>_final.pt` | `compact_2bit` (2-bit packed ternary + bf16 scales) | Final inference checkpoint — minimal size, ~9 MB for a 39M ternary model |
33
+ | `<run>/checkpoint_<run>_step<N>.pt` | bf16 model + bf16 optimizer state | Phase 1 intermediate resume points |
34
+ | `<run>/checkpoint_<run>_phase2_step<N>.pt` | bf16 model + bf16 optimizer state | Phase 2 intermediate resume points (delete-gate runs only) |
35
+
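The filename patterns in the table above can be told apart mechanically. The following is a minimal sketch, not part of the training code: the helper name `classify_checkpoint` and the kind labels are hypothetical, and it assumes `<N>` is a plain decimal step number and that the folder name repeats inside the filename.

```python
import re

# Hypothetical classifier for the three filename patterns in the table.
# Assumption: <N> is a decimal integer and <run> is the folder name.
PATTERNS = [
    ("final", re.compile(r"^(?P<run>[^/]+)/checkpoint_(?P=run)_final\.pt$")),
    ("phase2_step", re.compile(
        r"^(?P<run>[^/]+)/checkpoint_(?P=run)_phase2_step(?P<step>\d+)\.pt$")),
    ("phase1_step", re.compile(
        r"^(?P<run>[^/]+)/checkpoint_(?P=run)_step(?P<step>\d+)\.pt$")),
]


def classify_checkpoint(path):
    """Return (kind, run, step) for a known pattern, else None."""
    for kind, pattern in PATTERNS:
        match = pattern.match(path)
        if match:
            step = match.groupdict().get("step")  # absent for final checkpoints
            return kind, match.group("run"), int(step) if step is not None else None
    return None


if __name__ == "__main__":
    for name in (
        "run_02/checkpoint_run_02_final.pt",
        "run_04_r2/checkpoint_run_04_r2_step1000.pt",
        "run_07_r2/checkpoint_run_07_r2_phase2_step2000.pt",
    ):
        print(name, "->", classify_checkpoint(name))
```

A resume script could filter for the step kinds and pick the largest step, while an inference script would only accept the `final` kind.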
36
+ > If you just want **ready-to-use float32 weights**, use the unpacked mirror
37
+ > at [`Cukinator/cpu1-ablations-final`](https://huggingface.co/Cukinator/cpu1-ablations-final) — those are plain
38
+ > `.pt` files you can load with `torch.load(...)` and `model.load_state_dict(...)`
39
+ > without any unpacking step.
40
+
41
+ This source repo exists so that (a) training jobs can resume from the latest
42
+ step checkpoint after preemption, and (b) the compact_2bit format itself
43
+ can be inspected and benchmarked.
44
+
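For readers who want to inspect the format, the idea behind 2-bit ternary packing can be illustrated as follows. This is a rough sketch under stated assumptions: the code mapping (`-1, 0, +1` to `0, 1, 2`), the little-endian slot order within each byte, and the names `pack_ternary` / `unpack_ternary` are chosen for this example and may not match the layout used by `train_ablation.py`. The bf16 scales are left out entirely.

```python
def pack_ternary(values):
    """Pack ternary weights (-1, 0, +1) into bytes, four 2-bit codes per byte.

    Assumed mapping for illustration: -1 -> 0b00, 0 -> 0b01, +1 -> 0b10.
    """
    codes = [v + 1 for v in values]
    if any(c not in (0, 1, 2) for c in codes):
        raise ValueError("weights must be -1, 0 or +1")
    packed = bytearray((len(codes) + 3) // 4)  # round up to whole bytes
    for i, code in enumerate(codes):
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover `count` ternary weights from bytes produced by pack_ternary.

    `count` is required because padding slots in the last byte would
    otherwise decode as -1.
    """
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1 for i in range(count)]


if __name__ == "__main__":
    weights = [-1, 0, 1, 1, 0, -1, 0]
    blob = pack_ternary(weights)
    print(len(weights), "weights ->", len(blob), "bytes")
    assert unpack_ternary(blob, len(weights)) == weights
```

Four weights per byte is where the roughly 8x saving over 16-bit storage comes from; a real loader would also multiply the unpacked values by the stored scales.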
45
+ ## Repository contents
46
+
47
+ 22 trained runs across two rounds (r1 and r2), plus a queued third round (r3):
48
+
49
+ | Round | Tokens/param | Runs |
50
+ |-------|:------------:|------|
51
+ | **r1** — original ablation budget | 2 | `run_01`, `run_02`, `run_02a_byte_only_heads`, `run_03`, `run_04`, `run_05`, `run_05b_kernel_strict`, `run_06`, `run_07`, `run_08`, `run_09`, `run_10`, `run_13`, `run_14`, `run_15`, `run_16` |
52
+ | **r2** — re-run at higher budget | 15 | `run_04_r2`, `run_07_r2` (partial), `run_13_r2`, `run_14_r2`, `run_15_r2`, `run_16_r2` |
53
+ | **r3** — cold-start rescue (queued) | 50 | `run_04_r3`, `run_07_r3`, `run_14_r3`, `run_15_r3` *(not yet uploaded)* |
54
+
55
+ The naming and architecture of each run are defined in `RUN_CONFIGS` / `SMALL_RUN_CONFIGS`
56
+ in [`train_ablation.py`](https://github.com/Cukinator/1.58bits/blob/main/train_ablation.py).
57
+
58
+ ## Quick start (compact_2bit)
59
+
60
+ Loading a compact_2bit checkpoint requires the unpacking helper that
61
+ ships with the training code:
62
+
63
+ ```python
64
+ import sys
65
+ sys.path.insert(0, "/path/to/1.58bits")
66
+ from train_ablation import load_ablation_checkpoint, build_ablation_model, generate
67
+ import torch
68
+
69
+ state, config = load_ablation_checkpoint(
70
+     "run_02/checkpoint_run_02_final.pt"
71
+ )
72
+ model = build_ablation_model(config)
73
+ model.load_state_dict(state, strict=False)
74
+ model.eval()
75
+
76
+ print(generate(model, "The quick brown fox", 128, config, torch.device("cpu")))
77
+ ```
78
+
79
+ For the same checkpoints **without** a dependency on the training code, use
80
+ [`Cukinator/cpu1-ablations-final`](https://huggingface.co/Cukinator/cpu1-ablations-final).
81
+
82
+ ## Final-checkpoint sizes (compact_2bit)
83
+
84
+ Sizes are measured from the actual `_final.pt` files on disk.
85
+
86
+ | Run family | Architecture | d_model | Final size |
87
+ |------------|-------------|--------:|-----------:|
88
+ | `run_01` | Transformer + BPE (16K vocab) + FP16 | 512 | ~210 MB |
89
+ | `run_02`, `run_02a`, `run_03` | FP16 byte-level baselines | 512 | ~75 MB |
90
+ | `run_04`..`run_10` | 39M ternary chain | 512 | ~9 MB |
91
+ | `run_05b_kernel_strict` | MLGRU without W_o | 512 | ~8 MB |
92
+ | `run_13` | 10M BPE + ternary (4K vocab) | 320 | ~5 MB |
93
+ | `run_14`, `run_15`, `run_16` | 10M byte + ternary variants | 320 | ~3 MB |
94
+
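The sizes in the table above are roughly what a back-of-envelope estimate predicts. The sketch below is only that estimate: the helper `estimated_size_mb` is hypothetical, it treats the parameter counts as exactly 39M and 10M, and it ignores scales, embeddings, any layers kept in higher precision, and file metadata, so it will not reproduce the measured values exactly.

```python
def estimated_size_mb(n_params, bits_per_weight):
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6


if __name__ == "__main__":
    # Nominal parameter counts taken from the table; real counts may differ.
    print(f"39M params, 2-bit : {estimated_size_mb(39_000_000, 2):.2f} MB")   # 9.75
    print(f"39M params, 16-bit: {estimated_size_mb(39_000_000, 16):.2f} MB")  # 78.00
    print(f"10M params, 2-bit : {estimated_size_mb(10_000_000, 2):.2f} MB")   # 2.50
```

The estimates land close to the measured ~9 MB, ~75 MB and ~3 MB entries, which suggests the packed weights dominate the file size.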
95
+ ## Training results
96
+
97
+ The full table of `val_loss`, `perplexity`, throughput and architecture per
98
+ run is published in the
99
+ [README of the unpacked mirror](https://huggingface.co/Cukinator/cpu1-ablations-final).
100
+
101
+ A summary of the 2026-05 audit:
102
+
103
+ - **FP16 baselines** (`run_01`, `run_02`, `run_02a`, `run_03`) converge as
104
+ designed: byte + LocalByteDecoder reaches val_loss 1.72, MLGRU FP16 reaches 1.87.
105
+ - **All byte-level ternary runs collapse to `ln(256) ≈ 5.545 nats`**, the
106
+ cross-entropy of a uniform prediction over the 256 byte values. This holds across both scales (10M and 39M)
107
+ and both token budgets (2 tok/p and 15 tok/p).
108
+ - A 7.5× increase in tokens-per-parameter (r2) moved the validation loss
109
+ by 0.0001 nats. The cold-start dynamics of straight-through-estimator
110
+ ternary training, not the budget, are the bottleneck at this scale.
111
+ - An r3 set with four corrections (bf16 AMP, `lr_scale=2.0` on BitLinear,
112
+ CE-only training signal, 50 tok/param) is queued in `RUN_CONFIGS` but
113
+ has not yet been trained.
114
+
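The `ln(256)` figure and the 7.5× budget ratio quoted in the summary above can be checked with a few lines. This is a small illustrative calculation, not code from the repository; the function name `uniform_cross_entropy` is made up for the example.

```python
import math


def uniform_cross_entropy(vocab_size):
    """Cross-entropy in nats of a model that assigns 1/vocab_size to every symbol.

    The value does not depend on the data: every target gets probability
    1/vocab_size, so the loss is -ln(1/vocab_size) = ln(vocab_size).
    """
    return -math.log(1.0 / vocab_size)


if __name__ == "__main__":
    loss = uniform_cross_entropy(256)  # byte-level vocabulary
    print(f"uniform loss over 256 bytes: {loss:.3f} nats")
    print(f"r2 / r1 token budget ratio: {15 / 2}")
```

A validation loss sitting at this value means the model's predictions carry no more information than a uniform guess over bytes, which is why the FP16 baselines at 1.72 and 1.87 count as real convergence.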
115
+ Details, mechanistic analysis and throughput projections are documented in
116
+ the
117
+ [main repository README](https://github.com/Cukinator/1.58bits/blob/main/README.md#ablation-audit--2026-05-findings).
118
+
119
+ ## License
120
+
121
+ Apache-2.0. Same as the source code at [github.com/Cukinator/1.58bits](https://github.com/Cukinator/1.58bits).