# Pure Python GPT Model
A minimal GPT language model trained from scratch using only Python's standard library (no PyTorch/TensorFlow).
## Model Description
This model implements the core GPT architecture:
- Decoder-only Transformer with causal (autoregressive) attention
- Multi-Head Self-Attention for capturing dependencies
- Feed-Forward MLP with ReLU activation
- RMSNorm in place of standard LayerNorm
- Residual connections for stable training
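RMSNorm is the simplest of the components above; a minimal pure-Python sketch (function name and epsilon are illustrative, not taken from the repository):

```python
import math

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: rescale by the reciprocal root-mean-square of the vector.
    # Unlike LayerNorm, no mean is subtracted and no bias is added.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

With `weight` initialized to all ones, the output vector always has unit root-mean-square, which keeps activations in a stable range across layers.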
## Architecture
| Parameter | Value |
|---|---|
| Layers | 6 |
| Embedding Dimension | 192 |
| Attention Heads | 6 |
| Context Length | 256 |
| Vocabulary Size | 77 |
| Total Parameters | 112,256 |
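The table above corresponds to a configuration along these lines (the dictionary and key names are illustrative, not the repository's actual API):

```python
# Hypothetical config mirroring the architecture table
config = dict(
    n_layer=6,       # transformer blocks
    d_model=192,     # embedding dimension
    n_head=6,        # attention heads
    context_len=256, # maximum sequence length
    vocab_size=77,   # character-level vocabulary
)

# The embedding dimension must split evenly across heads.
head_dim = config["d_model"] // config["n_head"]
```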
## Training
- Dataset: OpenWebText-10k subset
- Optimizer: Adam (β1=0.9, β2=0.999)
- Learning Rate: Cosine decay with warmup
- Training Steps: 2000
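A cosine-decay schedule with linear warmup can be sketched as follows; the peak/minimum learning rates and warmup length are illustrative assumptions, not the repository's settings (only the 2000-step total comes from the list above):

```python
import math

def lr_schedule(step, max_lr=3e-4, min_lr=3e-5, warmup=100, total=2000):
    # Linear warmup from 0 toward max_lr over the first `warmup` steps
    if step < warmup:
        return max_lr * (step + 1) / warmup
    # Cosine decay from max_lr down to min_lr over the remaining steps
    t = (step - warmup) / (total - warmup)  # progress in [0, 1]
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
```

The schedule peaks at `max_lr` when warmup ends and decays smoothly to `min_lr` at the final step.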
## Usage
This model uses a custom pure-Python implementation. See the repository for the complete code.
```python
# Load the model and generate text
from model import gpt, generate

text = generate(prompt="Once upon a time", temperature=0.7)
print(text)
```
## Limitations
- Small model size (educational purposes only)
- Character-level tokenization (not BPE)
- Limited training data and compute
- Pure-Python execution, so inference is slow
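Character-level tokenization means the vocabulary is just the set of characters seen in the training corpus (77 here). A minimal sketch of how such an encoder/decoder pair works, using a toy corpus rather than the model's actual vocabulary:

```python
# Toy corpus; the real model's vocabulary covers 77 characters.
corpus = "hello world"

# Build the character vocabulary and lookup tables.
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}  # char -> token id
itos = {i: c for c, i in stoi.items()}      # token id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)
```

Round-tripping any string over the vocabulary is lossless, but every character costs one token, so sequences are much longer than with BPE.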
## License
Apache 2.0
## Acknowledgments
Based on Andrej Karpathy's educational implementations (micrograd, makemore, nanoGPT).