Cukinator committed 6214a14 (verified; parent 329a61f)

Add README

New file: `README.md`

# CPU-1 Ablation Study: Ready-to-Use Checkpoints

Repo: `Cukinator/cpu1-ablations-final`

Source (compact 2-bit checkpoints): `Cukinator/cpu1-ablation-checkpoints`

Each checkpoint is a standard PyTorch `.pt` file whose weights are stored as
float32, so no unpacking is needed. The checkpoints are compatible with
`train_ablation.py` from the [1.58bits repo](https://github.com/Cukinator/1.58bits).
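
## Quick start

```python
import sys

import torch

# Make train_ablation.py from a local clone of the 1.58bits repo importable.
sys.path.insert(0, "/path/to/1.58bits")
from train_ablation import build_ablation_model, generate

# Each checkpoint holds the model "config" and a float32 "state_dict".
# PyTorch >= 2.6 defaults to weights_only=True, which can refuse non-tensor
# config objects; weights_only=False avoids that (use only on trusted files).
ckpt = torch.load("run_02/model.pt", map_location="cpu", weights_only=False)
model = build_ablation_model(ckpt["config"])
model.load_state_dict(ckpt["state_dict"])
model.eval()  # switch off training-only behavior such as dropout

with torch.no_grad():  # inference only, no gradients needed
    text = generate(model, "The quick brown fox", 128, ckpt["config"], torch.device("cpu"))
print(text)
```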

## Ablation chain

| Run | Architecture | Val loss | Perplexity |
|-----|--------------|----------|------------|
| run_01 | Transformer + BPE + FP16 (baseline) | 4.66 | 106.1 |
| run_02a | Transformer + Byte + 4 heads (no LocalByteDecoder) | 2.31 | 10.1 |
| run_02 | Transformer + Byte + LocalByteDecoder | 1.72 | 5.56 |
| run_03 | MLGRU + Byte + FP16 | 1.87 | 6.49 |
| run_04 | MLGRU + Byte + Ternary | 5.57 | 261.7 |
| run_05 | + FPResidual | 5.55 | 257.7 |
| run_05b | MLGRU kernel strict | 5.59 | 268.8 |
| run_06 | + BolmoPatchEmbedding | 5.56 | 258.8 |
| run_07 | + DeleteGate (CPU-1 complete) | 5.56 | 258.8 |
| run_10 | + learned per-channel decay | 5.53 | 253.1 |
| run_13 | Small 10M BPE model | 30.5 | n/a |
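The Perplexity column matches perplexity = exp(val loss), i.e. a loss measured in nats; small mismatches such as exp(1.72) ≈ 5.58 vs. the listed 5.56 come from the loss being rounded to two decimals. Because run_01 uses BPE while the later runs are byte-level, its loss is presumably averaged per BPE token and theirs per byte, so those rows are not directly comparable. The sketch below converts a few rows; it assumes the loss is mean cross-entropy in nats per predicted unit, and the helper names are hypothetical.

```python
import math

# (run, val loss in nats, unit the loss is averaged over), copied from the table.
# The per-unit interpretation is an assumption: BPE token for run_01, byte otherwise.
ROWS = [
    ("run_01", 4.66, "BPE token"),
    ("run_02", 1.72, "byte"),
    ("run_03", 1.87, "byte"),
    ("run_10", 5.53, "byte"),
]


def perplexity(loss_nats: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy in nats."""
    return math.exp(loss_nats)


def bits_per_unit(loss_nats: float) -> float:
    """Convert nats to bits by dividing by ln 2 (bits per byte for byte-level runs)."""
    return loss_nats / math.log(2)


for run, loss, unit in ROWS:
    print(f"{run}: perplexity {perplexity(loss):.2f}, {bits_per_unit(loss):.3f} bits per {unit}")
```

Bits per byte is a common way to compare byte-level models; comparing them with a BPE model would additionally require the average number of bytes per BPE token on the validation set.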

> **Note**: the ternary runs (04–10) were trained on only 2 tokens per parameter
> (the ablation budget). Their high perplexity reflects under-training, not
> architectural failure. The FP16 runs (01–03) are valid reference points.
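To put the 2 tokens/param budget in perspective, the sketch below computes the training tokens implied by a tokens-per-parameter ratio and compares it with the roughly 20 tokens per parameter that the Chinchilla scaling study reported as compute-optimal. The 10M-parameter size is a hypothetical example, since the README does not give parameter counts for the ternary runs; for byte-level models a "token" here is a byte.

```python
# Hypothetical model size; the README does not list parameter counts for runs 04-10.
PARAMS = 10_000_000

ABLATION_TOKENS_PER_PARAM = 2     # budget stated in the note above
CHINCHILLA_TOKENS_PER_PARAM = 20  # commonly cited compute-optimal ratio (approximate)


def training_tokens(params: int, tokens_per_param: float) -> int:
    """Total training tokens (bytes, for byte-level models) implied by a ratio."""
    return int(params * tokens_per_param)


ablation = training_tokens(PARAMS, ABLATION_TOKENS_PER_PARAM)
reference = training_tokens(PARAMS, CHINCHILLA_TOKENS_PER_PARAM)
print(f"ablation budget: {ablation:,} tokens")
print(f"~20 tokens/param: {reference:,} tokens ({reference // ablation}x more)")
```

Whatever the actual model size, the ratio alone means the ternary runs saw about a tenth of the data that this heuristic would suggest.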