Cukinator/cpu1-ablations-final

Text Generation

Cukinator committed on May 14

Commit 6214a14 · verified · 1 Parent(s): 329a61f

Add README

Files changed (1)

README.md +44 -0

README.md ADDED

	@@ -0,0 +1,44 @@

+# CPU-1 Ablation Study — Ready-to-use Checkpoints
+Repo: `Cukinator/cpu1-ablations-final`
+Source (compact 2-bit): `Cukinator/cpu1-ablation-checkpoints`
+Each checkpoint is a standard PyTorch `.pt` file with float32 weights — no
+unpacking needed. Compatible with `train_ablation.py` from the
+[1.58bits repo](https://github.com/Cukinator/1.58bits).
+## Quick start
+```python
+import sys
+import torch
+# Make train_ablation.py from the 1.58bits repo importable.
+sys.path.insert(0, "/path/to/1.58bits")
+from train_ablation import build_ablation_model, generate
+# Each checkpoint stores the model config and the float32 state dict.
+ckpt = torch.load("run_02/model.pt", map_location="cpu")
+model = build_ablation_model(ckpt["config"])
+model.load_state_dict(ckpt["state_dict"])
+model.eval()
+# Generate text from the prompt on CPU.
+text = generate(model, "The quick brown fox", 128, ckpt["config"], torch.device("cpu"))
+print(text)
+```
+## Ablation chain
+| Run | Architecture | Val Loss | Perplexity |
+|-----|-------------|----------|-----------|
+| run_01 | Transformer + BPE + FP16 (baseline) | 4.66 | 106.1 |
+| run_02a | Transformer + Byte + 4 heads (no LBD) | 2.31 | 10.1 |
+| run_02 | Transformer + Byte + LocalByteDecoder | 1.72 | 5.56 |
+| run_03 | MLGRU + Byte + FP16 | 1.87 | 6.49 |
+| run_04 | MLGRU + Byte + Ternary | 5.57 | 261.7 |
+| run_05 | + FPResidual | 5.55 | 257.7 |
+| run_05b | MLGRU kernel strict | 5.59 | 268.8 |
+| run_06 | + BolmoPatchEmbedding | 5.56 | 258.8 |
+| run_07 | + DeleteGate (CPU-1 complete) | 5.56 | 258.8 |
+| run_10 | + learned per-channel decay | 5.53 | 253.1 |
+| run_13 | Small 10M BPE model | 30.5 | — |
+> **Note**: ternary runs (04–10) were trained with only 2 tokens/param
+> (ablation budget). High perplexity reflects under-training, not architecture
+> failure. FP16 runs (01–03) are valid references.
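
The Perplexity column in the ablation table appears to be the exponential of the Val Loss column. The sketch below checks that relationship, assuming the loss is a mean cross-entropy in nats (the README does not state the unit). The helper names `perplexity` and `relative_gap` are illustrative and are not part of `train_ablation.py`. run_13 is left out because the table lists no perplexity for it.

```python
import math

# (run, reported val loss, reported perplexity), copied from the ablation table.
ROWS = [
    ("run_01", 4.66, 106.1),
    ("run_02a", 2.31, 10.1),
    ("run_02", 1.72, 5.56),
    ("run_03", 1.87, 6.49),
    ("run_04", 5.57, 261.7),
    ("run_05", 5.55, 257.7),
    ("run_05b", 5.59, 268.8),
    ("run_06", 5.56, 258.8),
    ("run_07", 5.56, 258.8),
    ("run_10", 5.53, 253.1),
]


def perplexity(val_loss: float) -> float:
    """Perplexity for a mean cross-entropy loss, ASSUMED to be in nats."""
    return math.exp(val_loss)


def relative_gap(val_loss: float, reported_ppl: float) -> float:
    """Relative difference between exp(val_loss) and the reported perplexity."""
    return abs(perplexity(val_loss) - reported_ppl) / reported_ppl


if __name__ == "__main__":
    for run, loss, ppl in ROWS:
        print(
            f"{run:8s} loss={loss:.2f} "
            f"exp(loss)={perplexity(loss):8.2f} reported={ppl:8.2f} "
            f"gap={relative_gap(loss, ppl):.2%}"
        )
```

Because the losses are rounded to two decimals, a gap of up to roughly 0.5% between `exp(loss)` and the reported perplexity is expected from rounding alone.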