nanoGPT Tutorial

A step-by-step implementation of a tiny GPT model from scratch in pure PyTorch.

What is this?

This repository contains a complete, tutorial-style implementation of a small GPT (Generative Pre-trained Transformer) trained on tiny Shakespeare. Every line of code is commented to explain what it does and why.

Files

File        Purpose
model.py    The full GPT architecture: CausalSelfAttention, MLP, Block, GPT
prepare.py  Data preparation: character-level tokenization, train/val split (sketched below)
train.py    Training loop with AdamW, cosine LR schedule, and generation
input.txt   The tiny Shakespeare dataset (~1.1M characters, 65 unique chars)
data.pt     Preprocessed tensors (generated by prepare.py)
best.pt     Best model checkpoint (generated by train.py)
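
To give a sense of what prepare.py does, here is a minimal sketch of character-level data preparation. The variable names and the 90/10 split ratio are assumptions for illustration, not the repo's literal code:

# Build a character-level vocabulary, encode the text, and save the splits.
import torch

with open("input.txt", "r", encoding="utf-8") as f:
    text = f.read()

chars = sorted(set(text))                      # the 65 unique characters
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> char

data = torch.tensor([stoi[c] for c in text], dtype=torch.long)
n = int(0.9 * len(data))                       # 90/10 train/val split (assumed)
torch.save({"train": data[:n], "val": data[n:],
            "stoi": stoi, "itos": itos}, "data.pt")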

Model Architecture

GPT(
  wte (Embedding):      vocab_size -> n_embd  (token embeddings)
  wpe (Embedding):      block_size -> n_embd  (position embeddings)
  h   (6x Block):
    ln_1 (LayerNorm)
    attn (CausalSelfAttention: multi-head self-attention with causal mask)
    ln_2 (LayerNorm)
    mlp  (MLP: expand 4x -> GELU -> project back)
  ln_f (LayerNorm)
  lm_head (Linear):     n_embd -> vocab_size  (next-token prediction)
)
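
The pre-LayerNorm residual wiring of each Block looks roughly like this. This is a sketch, using PyTorch's built-in nn.MultiheadAttention in place of the repo's own CausalSelfAttention module:

import torch
import torch.nn as nn

class Block(nn.Module):
    """One transformer block with pre-LayerNorm residuals (illustrative sketch)."""
    def __init__(self, n_embd, n_head):
        super().__init__()
        self.ln_1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln_2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),   # expand 4x
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),   # project back
        )

    def forward(self, x):
        T = x.size(1)
        # True above the diagonal = future positions may not be attended to
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.ln_1(x)                     # normalize *before* attention
        a, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + a                            # residual around attention
        x = x + self.mlp(self.ln_2(x))       # residual around the MLP
        return x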

Key design choices:

  • Character-level vocabulary: no tokenizer library needed
  • Pre-LayerNorm residuals: the standard in modern transformers (see the Block sketch above)
  • Weight tying: the input embedding and output projection share one weight matrix (sketched below)
  • Causal (autoregressive) attention: each position can only attend to past tokens
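
Weight tying in particular is a one-line trick. A minimal sketch, with illustrative names rather than the repo's exact code:

import torch.nn as nn

vocab_size, n_embd = 65, 384
wte = nn.Embedding(vocab_size, n_embd)               # input: token id -> vector
lm_head = nn.Linear(n_embd, vocab_size, bias=False)  # output: vector -> logits
lm_head.weight = wte.weight                          # same Parameter object
assert lm_head.weight.data_ptr() == wte.weight.data_ptr()

Because the two modules now hold the same tensor, this saves vocab_size * n_embd parameters and both gradients accumulate into one matrix.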

How to Run

# 1. Prepare data
python prepare.py

# 2. Train (a GPU is much faster, but CPU works too)
python train.py

# 3. The model will print generated Shakespeare-style text at the end!
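
The generation step at the end is a standard autoregressive sampling loop. A minimal sketch, assuming the model maps a (B, T) tensor of token ids to (B, T, vocab_size) logits:

import torch

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size=256, temperature=1.0):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]            # crop to the context length
        logits = model(idx_cond)                   # (B, T, vocab_size)
        logits = logits[:, -1, :] / temperature    # logits at the last position
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, 1)      # sample one token per batch row
        idx = torch.cat([idx, next_id], dim=1)     # append and continue
    return idx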

Training Details

Hyperparameter   Value
Layers           6
Heads            6
Embedding dim    384
Context length   256
Batch size       64
Training steps   5,000
Optimizer        AdamW (β₁ = 0.9, β₂ = 0.95)
Learning rate    1e-3 (cosine decay to 1e-4)
Warmup           200 steps
Gradient clip    1.0
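
The schedule above (200 warmup steps, then cosine decay from 1e-3 down to 1e-4 over 5,000 steps) can be written as a small function. This is a sketch of the common recipe, not necessarily train.py's exact code:

import math

max_lr, min_lr = 1e-3, 1e-4
warmup_steps, max_steps = 200, 5000

def get_lr(step):
    if step < warmup_steps:                        # linear warmup from ~0 to max_lr
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # decays 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)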

Acknowledgments

Based on Andrej Karpathy's legendary build-nanogpt and nanoGPT repositories.
