# PureBit Transformer

**A transformer that operates on raw binary bits instead of tokens.**

## Architecture

- **Vocab size**: 2 (just 0 and 1!)
- **d_model**: 256
- **Layers**: 6
- **Heads**: 8
- **Parameters**: ~18M
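
For reference, a minimal sketch of a decoder with this configuration, built from stock PyTorch modules. The class name, the `max_len` context size, and the use of `nn.TransformerEncoder` with a causal mask are illustrative assumptions; the actual architecture is defined in `model.py`:

```python
import torch
import torch.nn as nn

class BitTransformer(nn.Module):
    """Illustrative bit-level decoder; the real definition lives in model.py."""

    def __init__(self, vocab_size=2, d_model=256, n_layers=6, n_heads=8, max_len=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # maps each 0/1 to a 256-d vector
        self.pos = nn.Embedding(max_len, d_model)        # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)       # next-bit logits

    def forward(self, bits):                             # bits: (batch, seq) of 0/1 ints
        seq_len = bits.size(1)
        causal = nn.Transformer.generate_square_subsequent_mask(seq_len).to(bits.device)
        x = self.embed(bits) + self.pos(torch.arange(seq_len, device=bits.device))
        return self.head(self.blocks(x, mask=causal))    # (batch, seq, 2)
```

The exact parameter count depends on the feed-forward width and embedding choices, so this sketch will not match the ~18M figure exactly.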

## Training

- Trained on raw UTF-8 bytes converted to bits
- Best loss achieved: **0.6863** (random baseline: ln 2 ≈ 0.693)
- Training data: ~70MB of text (≈560M bits at 8 bits per byte)
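
Assuming the preprocessing is plain byte-to-bit unpacking (the exact scheme isn't documented here), the conversion might look like this with NumPy:

```python
import numpy as np

def text_to_bits(text: str) -> np.ndarray:
    """UTF-8 encode, then unpack each byte into 8 bits (MSB first)."""
    data = np.frombuffer(text.encode("utf-8"), dtype=np.uint8)
    return np.unpackbits(data)

bits = text_to_bits("Hi")  # 'H' = 0x48, 'i' = 0x69
print(bits)                # [0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 1]
print(bits.size)           # 16 bits = 2 bytes * 8
```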

## Key Insight

This experiment explores whether transformers can learn directly at the bit level. The results show minimal learning beyond chance: the best loss of 0.6863 nats works out to about 0.990 bits per bit, under 1% better than random guessing. Predicting individual bits appears to be extremely hard without access to byte-level structure.
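
The gap to chance is easy to verify by converting the loss from nats to bits:

```python
import math

loss_nats = 0.6863         # best training loss, in nats per bit
chance = math.log(2)       # uniform guessing: ln 2 ≈ 0.6931 nats
print(loss_nats / chance)  # ≈ 0.9901 bits per bit, <1% better than chance
```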

## Usage

```python
import torch

# Load checkpoint (map_location avoids device errors on CPU-only machines)
ckpt = torch.load('purebit_best_70mb.pt', map_location='cpu')
print(f"Loss: {ckpt['loss']:.4f}")
print(f"Bits seen: {ckpt['bits']:,}")

# Model architecture in model.py
```
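
Only the `loss` and `bits` fields are documented above. If the checkpoint also stores a state dict under a `model` key (an assumption, as is the class name below), restoring the weights would look roughly like:

```python
import torch
from model import BitTransformer  # hypothetical name; see model.py for the real class

ckpt = torch.load('purebit_best_70mb.pt', map_location='cpu')
net = BitTransformer()              # assumed default constructor
net.load_state_dict(ckpt['model'])  # 'model' key is an assumption
net.eval()
```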

## Files

- `purebit_best_70mb.pt` - Best checkpoint (loss 0.6863)
- `model.py` - Model architecture
- `train.py` - Training script
- `infer.py` - Inference script

## Author

OpenTransformers - Experimental architecture research