# PureBit Transformer

**A transformer that operates on raw binary bits instead of tokens.**
## Architecture

- **Vocab size**: 2 (just 0 and 1!)
- **d_model**: 256
- **Layers**: 6
- **Heads**: 8
- **Parameters**: ~18M
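The listed hyperparameters can be sketched as a standard decoder-only setup. This is a hypothetical reconstruction, not the actual code in `model.py`: the feed-forward width, context length, and layer wiring are assumptions, so its parameter count will not exactly match the reported ~18M.

```python
import torch
import torch.nn as nn

class PureBitModel(nn.Module):
    """Sketch of a bit-level transformer: vocab 2, d_model 256,
    6 layers, 8 heads (FFN width and max_len are assumptions)."""
    def __init__(self, d_model=256, n_layers=6, n_heads=8, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(2, d_model)      # bit 0/1 -> vector
        self.pos = nn.Embedding(max_len, d_model)  # learned positions
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2)          # logits for next bit

    def forward(self, bits):  # bits: (batch, seq) int64 tensor of 0/1
        seq_len = bits.size(1)
        x = self.embed(bits) + self.pos(
            torch.arange(seq_len, device=bits.device))
        # Causal mask so each position only attends to earlier bits
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        x = self.encoder(x, mask=mask)
        return self.head(x)  # (batch, seq, 2)

model = PureBitModel()
logits = model(torch.randint(0, 2, (1, 16)))
```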
## Training

- Trained on raw UTF-8 bytes converted to bits
- Best loss achieved: **0.6863** nats (random baseline: ln 2 ≈ 0.693)
- Training data: ~70MB of text ≈ 560M bits
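The byte-to-bit conversion described above can be sketched as follows (MSB-first bit order is an assumption; the actual preprocessing lives in `train.py`):

```python
def text_to_bits(text: str) -> list[int]:
    """Convert text to a flat bit sequence via UTF-8 bytes, MSB first."""
    bits = []
    for byte in text.encode("utf-8"):
        for i in range(7, -1, -1):
            bits.append((byte >> i) & 1)
    return bits

bits = text_to_bits("Hi")  # 2 bytes -> 16 bits
```

Each byte expands into 8 bits, which is why ~70MB of text yields ~560M training positions.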
## Key Insight

This explores whether transformers can learn at the bit level. The results show only marginal learning beyond the random baseline: predicting individual bits is extremely hard when the model has no access to byte-level structure.
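To quantify how close the reported loss sits to chance, convert both to bits per predicted bit:

```python
import math

random_baseline = math.log(2)            # ≈ 0.6931 nats: entropy of a fair coin
best_loss = 0.6863                       # reported best cross-entropy, in nats
bits_per_bit = best_loss / math.log(2)   # convert nats to bits
# ~0.99 bits needed per bit, i.e. only about 1% better than guessing
```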
## Usage

```python
import torch

# Load the best checkpoint (CPU is fine for inspecting metadata)
ckpt = torch.load('purebit_best_70mb.pt', map_location='cpu')
print(f"Loss: {ckpt['loss']:.4f}")
print(f"Bits seen: {ckpt['bits']:,}")
# The model architecture itself lives in model.py
```
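For inference, generated bits must be packed back into bytes before decoding. A minimal inverse of the bits-to-text pipeline might look like this (MSB-first order assumed, matching the training conversion; `infer.py` may differ):

```python
def bits_to_text(bits: list[int]) -> str:
    """Pack an MSB-first bit sequence back into UTF-8 text."""
    assert len(bits) % 8 == 0, "need whole bytes"
    data = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        data.append(byte)
    # Invalid UTF-8 from imperfect generations is replaced, not raised
    return data.decode("utf-8", errors="replace")

text = bits_to_text([0, 1, 0, 0, 1, 0, 0, 0,   # 0x48
                     0, 1, 1, 0, 1, 0, 0, 1])  # 0x69
```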
## Files

- `purebit_best_70mb.pt` - Best checkpoint (loss 0.6863)
- `model.py` - Model architecture
- `train.py` - Training script
- `infer.py` - Inference script
## Author

OpenTransformers - Experimental architecture research