OpenTransformer committed · Commit bde71d6 · verified · 1 Parent(s): 4a6904f

Upload README.md with huggingface_hub

Files changed (1): README.md (+39 −0)
README.md ADDED
# PureBit Transformer

**A transformer that operates on raw binary bits instead of tokens.**

## Architecture
- **Vocab size**: 2 (just 0 and 1!)
- **d_model**: 256
- **Layers**: 6
- **Heads**: 8
- **Parameters**: ~18M

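The configuration above can be sketched as a standard PyTorch encoder. This is a minimal illustration with assumed names; the actual architecture is in `model.py`, and details such as the feed-forward width may differ.

```python
import torch
import torch.nn as nn

# Sketch of the stated config: vocab=2, d_model=256, 6 layers, 8 heads.
# Names and structure are illustrative; see model.py for the real thing.
class BitTransformer(nn.Module):
    def __init__(self, vocab_size=2, d_model=256, n_layers=6, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, bits):  # bits: (batch, seq) tensor of 0/1 ints
        return self.head(self.encoder(self.embed(bits)))

model = BitTransformer()
x = torch.randint(0, 2, (1, 64))
logits = model(x)  # shape (1, 64, 2): a logit per bit position
```

Note that this sketch comes out well under the stated ~18M parameters with a standard 4x feed-forward, so the real `model.py` likely includes additional components; treat this purely as a shape reference.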
## Training
- Trained on raw UTF-8 bytes converted to bits
- Best loss achieved: **0.6863** (random baseline: ln 2 ≈ 0.693)
- Training data: ~70 MB of text = 560M bits

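The 560M-bit figure follows directly from 8 bits per byte (70 MB × 8 = 560M). A minimal sketch of the byte-to-bit conversion, assuming NumPy and MSB-first bit order (the function name is illustrative):

```python
import numpy as np

# Unpack UTF-8 bytes into a flat 0/1 stream: 8 bits per byte, MSB first.
def text_to_bits(text: str) -> np.ndarray:
    data = np.frombuffer(text.encode('utf-8'), dtype=np.uint8)
    return np.unpackbits(data)

bits = text_to_bits("Hi")  # 2 bytes -> 16 bits
```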
## Key Insight
This explores whether transformers can learn at the bit level. Results show minimal learning beyond random: predicting individual bits is extremely hard without byte-level structure.

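To put the numbers in context: a uniform guess over {0, 1} gives a cross-entropy of ln 2 ≈ 0.6931, so the best loss of 0.6863 corresponds to learning only about 0.01 bits of information per position:

```python
import math

random_loss = math.log(2)  # ≈ 0.6931, uniform guess over two symbols
best_loss = 0.6863         # best training loss reported above

# Information gained per predicted bit, measured in bits:
gain = (random_loss - best_loss) / math.log(2)
print(f"{gain:.4f} bits per position")  # ~0.0099
```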
## Usage
```python
import torch

# Load checkpoint (map to CPU so no GPU is required)
ckpt = torch.load('purebit_best_70mb.pt', map_location='cpu')
print(f"Loss: {ckpt['loss']:.4f}")
print(f"Bits seen: {ckpt['bits']:,}")

# Model architecture in model.py
```

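Going the other way, bits produced by the model can be packed back into bytes and decoded as text. A sketch with an assumed helper name (the repository's `infer.py` may do this differently):

```python
import numpy as np

# Pack a 0/1 stream back into bytes and decode as UTF-8,
# replacing any invalid byte sequences the model produced.
def bits_to_text(bits: np.ndarray) -> str:
    data = np.packbits(bits.astype(np.uint8))
    return data.tobytes().decode('utf-8', errors='replace')

# Round-trip example:
bits = np.unpackbits(np.frombuffer(b"Hi", dtype=np.uint8))
assert bits_to_text(bits) == "Hi"
```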
## Files
- `purebit_best_70mb.pt` - Best checkpoint (loss 0.6863)
- `model.py` - Model architecture
- `train.py` - Training script
- `infer.py` - Inference script

## Author
OpenTransformers - Experimental architecture research