# nanoGPT Tutorial

A step-by-step implementation of a tiny GPT model from scratch in pure PyTorch.

## What is this?

This repository contains a complete, tutorial-style implementation of a small GPT (Generative Pre-trained Transformer) trained on tiny Shakespeare. Every line of code is commented to explain **what** it does and **why**.

## Files

| File | Purpose |
|------|---------|
| `model.py` | The full GPT architecture: CausalSelfAttention, MLP, Block, GPT |
| `prepare.py` | Data preparation: character-level tokenization, train/val split |
| `train.py` | Training loop with AdamW, cosine LR schedule, and generation |
| `input.txt` | The tiny Shakespeare dataset (~1.1M characters, 65 unique chars) |
| `data.pt` | Preprocessed tensors (generated by `prepare.py`) |
| `best.pt` | Best model checkpoint (generated by `train.py`) |

## Model Architecture

```
GPT(
  wte (Embedding): vocab_size -> n_embd   (token embeddings)
  wpe (Embedding): block_size -> n_embd   (position embeddings)
  h (6x Block):
    ln_1 (LayerNorm)
    attn (CausalSelfAttention: multi-head self-attention with causal mask)
    ln_2 (LayerNorm)
    mlp  (MLP: expand 4x -> GELU -> project back)
  ln_f (LayerNorm)
  lm_head (Linear): n_embd -> vocab_size  (next-token prediction)
)
```

**Key design choices:**

- **Character-level vocabulary** — no tokenizer library needed
- **Pre-LayerNorm residuals** — standard in modern transformers
- **Weight tying** — the input embedding and the output projection share one weight matrix
- **Causal (autoregressive) attention** — each position can only attend to past tokens

Hedged code sketches of each of these pieces appear in the appendix at the end of this README.

## How to Run

```bash
# 1. Prepare data
python prepare.py

# 2. Train (a GPU makes this much faster, but CPU works too)
python train.py

# 3. The model prints generated Shakespeare-style text at the end!
```

## Training Details

| Hyperparameter | Value |
|---------------|-------|
| Layers | 6 |
| Heads | 6 |
| Embedding dim | 384 |
| Context length | 256 |
| Batch size | 64 |
| Training steps | 5,000 |
| Optimizer | AdamW (β₁=0.9, β₂=0.95) |
| Learning rate | 1e-3 (cosine decay to 1e-4) |
| Warmup | 200 steps |
| Gradient clip | 1.0 |

## Acknowledgments

Based on Andrej Karpathy's legendary [build-nanogpt](https://github.com/karpathy/build-nanogpt) and [nanoGPT](https://github.com/karpathy/nanoGPT) repositories.
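
## Appendix: Code Sketches

The snippets below are minimal, self-contained sketches of the techniques named above. They are illustrative only: function and variable names are assumptions, not the repository's actual API, and the real code in `model.py`, `prepare.py`, and `train.py` may differ.

### Character-level tokenization

A character-level vocabulary just maps each unique character to an integer id, so no tokenizer library is needed. A minimal sketch, assuming `input.txt` is the Shakespeare text; `encode`/`decode` are hypothetical helper names:

```python
text = open("input.txt", "r", encoding="utf-8").read()

chars = sorted(set(text))                     # the ~65 unique characters
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)
```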
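
### Train/val split and batching

`prepare.py` also splits the data into train and validation sets. A sketch of one common approach, reusing `text` and `encode` from the previous snippet; the 90/10 split ratio and the `get_batch` helper are assumptions, not taken from the repo:

```python
import torch

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))                  # assumed 90/10 split
train_data, val_data = data[:n], data[n:]

def get_batch(split: str, block_size: int = 256, batch_size: int = 64):
    d = train_data if split == "train" else val_data
    ix = torch.randint(len(d) - block_size, (batch_size,))
    x = torch.stack([d[i : i + block_size] for i in ix])          # inputs
    y = torch.stack([d[i + 1 : i + 1 + block_size] for i in ix])  # targets, shifted by one
    return x, y
```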
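
### Pre-LayerNorm transformer block

A pre-LN block normalizes *before* each sublayer and adds the result back to the residual stream: `x + attn(ln(x))`, then `x + mlp(ln(x))`. The repo defines its own `CausalSelfAttention`; this sketch substitutes PyTorch's built-in `nn.MultiheadAttention` to stay short, with a boolean mask blocking attention to future positions:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-LayerNorm transformer block: x + attn(ln(x)), then x + mlp(ln(x))."""

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),  # expand 4x
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),  # project back
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        # Causal mask: True entries are *blocked*, so position t sees only <= t.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln_1(x)                # pre-LN: normalize before the sublayer
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a                       # residual connection around attention
        x = x + self.mlp(self.ln_2(x))  # residual connection around MLP
        return x

x = torch.randn(2, 10, 384)         # (batch, time, n_embd)
y = Block(n_embd=384, n_head=6)(x)  # output keeps the same shape
```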
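
### Weight tying

Tying weights means the output projection reuses the token-embedding matrix as the same tensor, halving those parameters and sharing gradients between "reading" and "predicting" tokens. A sketch using the attribute names from the architecture diagram; the exact code in `model.py` may differ:

```python
import torch.nn as nn

vocab_size, n_embd = 65, 384  # values from the tables above

wte = nn.Embedding(vocab_size, n_embd)                # token embeddings, (vocab_size, n_embd)
lm_head = nn.Linear(n_embd, vocab_size, bias=False)   # next-token head, same weight shape
lm_head.weight = wte.weight                           # weight tying: one shared matrix

assert lm_head.weight.data_ptr() == wte.weight.data_ptr()  # same underlying storage
```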
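
### Cosine LR schedule with warmup

The Training Details table pins down the schedule: linear warmup for 200 steps to 1e-3, then cosine decay to 1e-4 over 5,000 steps, with AdamW (β₁=0.9, β₂=0.95) and gradient clipping at 1.0. A runnable sketch; the stand-in model and loss are placeholders so the loop executes, where `train.py` would train the GPT on real batches:

```python
import math
import torch

# Values from the Training Details table above.
max_lr, min_lr = 1e-3, 1e-4
warmup_steps, max_steps = 200, 5_000

def get_lr(step: int) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)  # 0 -> 1
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))             # 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)

# Stand-in model and data so the loop runs end to end.
model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr, betas=(0.9, 0.95))

for step in range(max_steps):
    for group in optimizer.param_groups:
        group["lr"] = get_lr(step)                    # apply the schedule each step
    loss = model(torch.randn(8, 16)).pow(2).mean()    # stand-in loss
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clip at 1.0
    optimizer.step()
```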
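
### Autoregressive generation

Because attention is causal, sampling works token by token: feed the context in, take the logits at the last position, sample one token, append it, and repeat. A sketch assuming `model(idx)` returns logits of shape `(B, T, vocab_size)`; the actual generation code in `train.py` may differ:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx: torch.Tensor, max_new_tokens: int, block_size: int = 256) -> torch.Tensor:
    """Append one sampled token at a time to the id tensor idx of shape (B, T)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                    # crop to the context length
        logits = model(idx_cond)                           # (B, T, vocab_size)
        probs = F.softmax(logits[:, -1, :], dim=-1)        # next-token distribution
        next_id = torch.multinomial(probs, num_samples=1)  # sample, don't argmax
        idx = torch.cat([idx, next_id], dim=1)             # grow the sequence
    return idx
```

Sampling with `torch.multinomial` rather than taking the argmax keeps the output varied, which is why repeated runs print different Shakespeare-style text.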