metadata
license: mit
language:
  - en
tags:
  - gpt2
  - causal-lm
  - pytorch
  - text-generation
  - from-scratch
base_model: []
pipeline_tag: text-generation

GPT-2 (Trained from Scratch)

A GPT-2-style causal language model built and trained entirely from scratch in PyTorch, with no pre-trained weights and no Hugging Face Trainer. Every component (multi-head attention with a KV-cache, transformer blocks, weight tying) was implemented by hand.


Model Details

| Hyperparameter  | Value                            |
|-----------------|----------------------------------|
| Architecture    | GPT-2 (decoder-only transformer) |
| Layers          | 12                               |
| Attention heads | 12                               |
| d_model         | 768                              |
| FFN hidden dim  | 3 072                            |
| Context length  | 1 024 tokens                     |
| Vocab size      | 50 257                           |
| Training steps  | 150 000                          |
| Tokens seen     | ~9.8 B                           |
| Tokenizer       | GPT-2 BPE (tiktoken)             |
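
The hyperparameters above are enough for a rough size estimate. The sketch below counts parameters and divides tokens seen by training steps. It assumes the original GPT-2 layout (learned positional embeddings, biases on every linear layer, two LayerNorms per block plus a final one), which this card does not state explicitly, so treat the result as an estimate rather than a measured figure.

```python
# Back-of-envelope numbers derived from the hyperparameter table.
# Assumptions (not stated in the model card): learned positional embeddings,
# biases on all linear layers, two LayerNorms per block plus a final LayerNorm.

n_layer, d_model, d_ff = 12, 768, 3072
n_ctx, n_vocab = 1024, 50257


def linear(n_in, n_out):
    """Parameter count of a dense layer with bias."""
    return n_in * n_out + n_out


layer_norm = 2 * d_model  # scale and shift vectors

embeddings = n_vocab * d_model + n_ctx * d_model  # token table + position table
attention = linear(d_model, 3 * d_model) + linear(d_model, d_model)  # QKV + output projection
ffn = linear(d_model, d_ff) + linear(d_ff, d_model)
block = attention + ffn + 2 * layer_norm

# With weight tying, the LM head reuses the token embedding matrix,
# so it contributes no additional parameters.
total_params = embeddings + n_layer * block + layer_norm

steps, tokens_seen = 150_000, 9.8e9  # "tokens seen" is approximate in the table
tokens_per_step = tokens_seen / steps

print(f"parameters: {total_params:,}")
print(f"tokens per step: {tokens_per_step:,.0f}")
```

Under these assumptions the count comes to about 124 M parameters. The tokens-per-step figure of roughly 65k would be consistent with, for example, 64 sequences of 1024 tokens per optimizer step, but the actual batch configuration is not given here.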

Usage

With 🤗 Transformers

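import torch
from transformers import AutoTokenizer
# Custom wrapper shipped in this repo; run from the repo root so `model/` is importable.
from model.hf_wrapper import GPT2ForCausalLM

model = GPT2ForCausalLM.from_pretrained("saiteja718/gpt2")
tokenizer = AutoTokenizer.from_pretrained("saiteja718/gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():  # no gradients are needed for a forward pass
    logits = model(**inputs).logits  # next-token scores for every input position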

With the interactive inference script

Clone the repo and run:

git clone https://huggingface.co/saiteja718/gpt2
cd gpt2
pip install torch transformers tiktoken
python3 gpt2_infer.py --interactive

Implementation Highlights

  • Multi-head attention with a split KV-cache for efficient autoregressive decoding (prefill + decode loop)
  • Weight tying between the token embedding and the LM head
  • Top-k sampling with temperature for controllable text generation
  • Custom training loop with gradient clipping and cosine LR schedule
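
Two of the highlights above, the cosine learning-rate schedule and top-k sampling with temperature, are compact enough to sketch in plain Python. This is an illustrative sketch and not the repository's code: the function names, the peak learning rate of `6e-4`, and the toy logits are hypothetical, and the repo's own versions operate on PyTorch tensors.

```python
import math
import random


def cosine_lr(step, max_steps, lr_max, lr_min=0.0):
    """Cosine decay from lr_max at step 0 to lr_min at max_steps."""
    progress = min(step, max_steps) / max_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample a token id from the k highest logits after temperature scaling."""
    # Indices of the k largest logits; everything else is discarded.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights over the survivors (max subtracted for numerical stability).
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical peak learning rate; 150_000 matches the training-step count in the table.
for step in (0, 75_000, 150_000):
    print(step, cosine_lr(step, max_steps=150_000, lr_max=6e-4))

logits = [2.0, 1.0, 0.5, -1.0]  # toy logits for a 4-token vocabulary
token = sample_top_k(logits, k=2, temperature=0.8, rng=random.Random(0))
print("sampled token id:", token)
```

With `k=1` the sampler reduces to greedy decoding, since only the highest-scoring token survives the cut. Gradient clipping is not shown here; in PyTorch it is commonly done with `torch.nn.utils.clip_grad_norm_` before the optimizer step.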

Example Output

Prompt: The capital of germany is
Output: The capital of germany is the country he first settled in, and soon the settlement
        of the British colonies as a result of his military service...

Limitations

  • Trained as a research and learning exercise; this is a base language model that has not been fine-tuned on any instruction dataset
  • May produce factually incorrect or incoherent text, as the example output above illustrates
  • Context window limited to 1 024 tokens

Citation

If you use this model in your work, a shoutout is appreciated:

@misc{saiteja718-gpt2-scratch,
  author  = {saiteja718},
  title   = {GPT-2 Trained from Scratch},
  year    = {2025},
  url     = {https://huggingface.co/saiteja718/gpt2}
}