Cukinator committed on
Commit b93a0df · verified · 1 Parent(s): 1ceac71

Add README

Files changed (1)
  1. README.md +35 -15
README.md CHANGED
@@ -1,12 +1,32 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - pytorch
+ - text-generation
+ - 1.58bit
+ - ternary
+ - byte-level
+ - mlgru
+ - ablation
+ library_name: pytorch
+ pipeline_tag: text-generation
+ model_type: custom
+ ---
+
  # CPU-1 Ablation Study — Ready-to-use Checkpoints

  Repo: `Cukinator/cpu1-ablations-final`
  Source (compact 2-bit): `Cukinator/cpu1-ablation-checkpoints`

- Each checkpoint is a standard PyTorch `.pt` file with float32 weights — no
- unpacking needed. Compatible with `train_ablation.py` from the
+ Each checkpoint is a standard PyTorch `.pt` file with **float32 weights**
+ no unpacking needed. Compatible with `train_ablation.py` from the
  [1.58bits repo](https://github.com/Cukinator/1.58bits).

+ 11 progressive ablation runs, each adding one component vs the previous.
+ Most runs are **~50M parameters**; run_13 is a small 10M variant.
+
  ## Quick start

  ```python
@@ -25,19 +45,19 @@ print(text)

  ## Ablation chain

- | Run | Architecture | Val Loss | Perplexity |
- |-----|-------------|----------|-----------|
- | run_01 | Transformer + BPE + FP16 (baseline) | 4.66 | 106.1 |
- | run_02a | Transformer + Byte + 4 heads (no LBD) | 2.31 | 10.1 |
- | run_02 | Transformer + Byte + LocalByteDecoder | 1.72 | 5.56 |
- | run_03 | MLGRU + Byte + FP16 | 1.87 | 6.49 |
- | run_04 | MLGRU + Byte + Ternary | 5.57 | 261.7 |
- | run_05 | + FPResidual | 5.55 | 257.7 |
- | run_05b | MLGRU kernel strict | 5.59 | 268.8 |
- | run_06 | + BolmoPatchEmbedding | 5.56 | 258.8 |
- | run_07 | + DeleteGate (CPU-1 complete) | 5.56 | 258.8 |
- | run_10 | + learned per-channel decay | 5.53 | 253.1 |
- | run_13 | Small 10M BPE model | 30.5 | — |
+ | Run | Architecture | Params | Val Loss | Perplexity |
+ |-----|-------------|--------|----------|-----------|
+ | run_01 | Transformer + BPE + FP16 (baseline) | 54.7M | 4.66 | 106.1 |
+ | run_02a | Transformer + Byte + 4 heads (no LBD) | 38.5M | 2.31 | 10.1 |
+ | run_02 | Transformer + Byte + LocalByteDecoder | 38.8M | 1.72 | 5.56 |
+ | run_03 | MLGRU + Byte + FP16 | 38.8M | 1.87 | 6.49 |
+ | run_04 | MLGRU + Byte + Ternary | 38.9M | 5.57 | 261.7 |
+ | run_05 | + FPResidual | 39.0M | 5.55 | 257.7 |
+ | run_05b | MLGRU kernel strict | 35.8M | 5.59 | 268.8 |
+ | run_06 | + BolmoPatchEmbedding | 39.0M | 5.56 | 258.8 |
+ | run_07 | + DeleteGate (CPU-1 complete) | 39.0M | 5.56 | 258.8 |
+ | run_10 | + learned per-channel decay | 39.4M | 5.53 | 253.1 |
+ | run_13 | Small 10M BPE model | 12.5M | 30.5 | — |

  > **Note**: ternary runs (04–10) were trained with only 2 tokens/param
  > (ablation budget). High perplexity reflects under-training, not architecture
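The Val Loss and Perplexity columns in the ablation table are consistent with perplexity = exp(val loss), i.e. a mean cross-entropy measured in nats. The README does not state this convention, so treat it as an assumption. The quick check below reproduces the Perplexity column to within the rounding of the two-decimal losses. It also turns the "2 tokens/param" note into an approximate training budget for the rows run_04 through run_10:

```python
import math

# (val_loss, reported_perplexity, params_in_millions), copied from the new ablation table.
TABLE = {
    "run_01": (4.66, 106.1, 54.7),
    "run_02a": (2.31, 10.1, 38.5),
    "run_02": (1.72, 5.56, 38.8),
    "run_03": (1.87, 6.49, 38.8),
    "run_04": (5.57, 261.7, 38.9),
    "run_05": (5.55, 257.7, 39.0),
    "run_05b": (5.59, 268.8, 35.8),
    "run_06": (5.56, 258.8, 39.0),
    "run_07": (5.56, 258.8, 39.0),
    "run_10": (5.53, 253.1, 39.4),
}

# Rows listed between run_04 and run_10 in the table (the note's "ternary runs (04-10)").
TERNARY_RUNS = ("run_04", "run_05", "run_05b", "run_06", "run_07", "run_10")
TOKENS_PER_PARAM = 2  # ablation budget from the note; stated for the ternary runs only


def perplexity(val_loss):
    # Assumes val_loss is a mean cross-entropy in nats, so perplexity = e ** loss.
    return math.exp(val_loss)


def token_budget_millions(params_millions, tokens_per_param=TOKENS_PER_PARAM):
    # Training tokens (in millions) implied by a fixed tokens-per-parameter budget.
    return params_millions * tokens_per_param


if __name__ == "__main__":
    for run, (loss, reported, _) in TABLE.items():
        print(f"{run}: exp({loss}) = {perplexity(loss):.1f} (table: {reported})")
    for run in TERNARY_RUNS:
        params = TABLE[run][2]
        print(f"{run}: ~{token_budget_millions(params):.1f}M training tokens")
```

For example, run_04 at 38.9M parameters corresponds to roughly 78M training tokens. The byte-level runs presumably report loss per byte, while run_01 and run_13 use BPE tokens, so perplexities across tokenizers are not directly comparable.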
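The README distinguishes these float32 checkpoints from the compact 2-bit source repo. The sketch below illustrates what "unpacking" involves in that case. It stores each ternary weight in {-1, 0, +1} as a 2-bit code (w + 1), packs four codes per byte, and expands them back to floats with a scale factor. This layout, the `scale` argument, and the names `pack_ternary` / `unpack_ternary` are hypothetical. The actual format of `Cukinator/cpu1-ablation-checkpoints` is not described here and may differ.

```python
from typing import List


def pack_ternary(weights: List[int]) -> bytes:
    """Pack ternary weights (-1, 0, +1) into 2-bit codes, four per byte (hypothetical layout)."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        if w not in (-1, 0, 1):
            raise ValueError(f"not a ternary weight: {w}")
        code = w + 1  # -1 -> 0b00, 0 -> 0b01, +1 -> 0b10 (0b11 is unused)
        # The first weight of each group of four occupies the lowest two bits of its byte.
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def unpack_ternary(packed: bytes, n: int, scale: float = 1.0) -> List[float]:
    """Expand the first n 2-bit codes back to floats (w * scale), the 'ready-to-use' form."""
    values = []
    for i in range(n):
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        values.append(float(code - 1) * scale)
    return values


if __name__ == "__main__":
    w = [1, -1, 0, 0, 1]
    packed = pack_ternary(w)
    print(packed.hex())                          # 5202
    print(unpack_ternary(packed, len(w), 0.5))   # [0.5, -0.5, 0.0, 0.0, 0.5]
```

Two bits per weight is slightly more than the log2(3) ≈ 1.58 bits a ternary value carries, which is where the "1.58-bit" label comes from. The packed form is also 16 times smaller than a 32-bit float per weight, so the float32 copies trade size for loading convenience.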