---
license: mit
language:
- en
tags:
- gpt2
- causal-lm
- pytorch
- text-generation
- from-scratch
base_model: []
pipeline_tag: text-generation
---

# GPT-2 (Trained from Scratch)

A GPT-2-style causal language model built and trained **entirely from scratch** in PyTorch, with no pre-trained weights and no Hugging Face `Trainer`. Every component (multi-head attention with KV-cache, transformer blocks, weight tying) was implemented by hand.

---

## Model Details

| Hyperparameter | Value |
|-----------------|-------------|
| Architecture | GPT-2 (decoder-only transformer) |
| Layers | 12 |
| Attention heads | 12 |
| d\_model | 768 |
| FFN hidden dim | 3 072 |
| Context length | 1 024 tokens |
| Vocab size | 50 257 |
| Training steps | 150 000 |
| Tokens seen | ~9.8 B |
| Tokenizer | GPT-2 BPE (tiktoken) |

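
As a sanity check on the table above, the parameter count implied by these hyperparameters can be worked out by hand. The sketch below assumes the standard GPT-2 layout (learned positional embeddings, a bias on every linear layer, two LayerNorms per block plus a final one) and counts the LM head as free because it shares the token-embedding matrix. The function name `gpt2_param_count` is illustrative and not part of this repository; if the implementation here deviates from those assumptions, the total will differ slightly.

```python
def gpt2_param_count(n_layer=12, d_model=768, d_ff=3072, n_ctx=1024, vocab=50257):
    """Count parameters for a standard GPT-2 layout (an assumption, not read from the repo)."""
    tok_emb = vocab * d_model   # token embedding, shared with the LM head (weight tying)
    pos_emb = n_ctx * d_model   # learned positional embedding
    layer_norm = 2 * d_model    # scale + bias
    # Fused QKV projection plus the attention output projection, each with a bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two-layer feed-forward network, each layer with a bias.
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    block = 2 * layer_norm + attn + ffn
    # Embeddings + all blocks + the final LayerNorm.
    return tok_emb + pos_emb + n_layer * block + layer_norm


total = gpt2_param_count()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # -> 124,439,808 parameters (~124M)
```
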
---

## Usage

### With 🤗 Transformers

```python
import torch
from transformers import AutoTokenizer

# `model.hf_wrapper` is not part of transformers; it is assumed to come from
# this repository, so run this from inside a clone of the repo.
from model.hf_wrapper import GPT2ForCausalLM

model = GPT2ForCausalLM.from_pretrained("saiteja718/gpt2")
tokenizer = AutoTokenizer.from_pretrained("saiteja718/gpt2")
model.eval()  # disable dropout for inference

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():  # no gradients are needed for a forward pass
    logits = model(**inputs).logits
```

### With the interactive inference script

Clone the repo and run:

```bash
git clone https://huggingface.co/saiteja718/gpt2
cd gpt2
pip install torch transformers tiktoken
python3 gpt2_infer.py --interactive
```

---

## Implementation Highlights

- **Multi-head attention** with a split KV-cache for efficient autoregressive decoding (prefill + decode loop)
- **Weight tying** between the token embedding and the LM head
- **Top-k sampling** with temperature for controllable text generation
- Custom training loop with gradient clipping and cosine LR schedule
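
To make the sampling bullet above concrete, here is a minimal, framework-free sketch of top-k sampling with temperature. It is not the repository's code, which presumably operates on PyTorch tensors; the function name `sample_top_k` and its defaults are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_top_k(logits, k=50, temperature=1.0, rng=random):
    """Sample one token id from the k highest-scoring logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = min(k, len(logits))
    # Indices of the k largest logits; everything else gets zero probability.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scale, then exponentiate (shifted by the max for numerical stability).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalises the weights, which completes the softmax.
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 0.5, -1.0, 3.0, 0.0]
print(sample_top_k(logits, k=2, temperature=0.8))  # always 0 or 3, the two largest logits
```

Lower temperatures concentrate probability on the largest logit, while `k=1` reduces to greedy decoding.
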
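
The training-loop bullet can be sketched in the same spirit. The example below assumes a plain cosine decay with no warmup and clipping by global L2 norm; only the 150 000 total steps come from this card, while `max_lr`, `min_lr`, `max_norm`, and both function names are placeholder assumptions rather than the values or code actually used.

```python
import math


def cosine_lr(step, total_steps=150_000, max_lr=6e-4, min_lr=6e-5):
    """Cosine decay from max_lr at step 0 to min_lr at total_steps (hypothetical values)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (norm + 1e-6))
    return [g * scale for g in grads]


for step in (0, 75_000, 150_000):
    print(step, cosine_lr(step))         # decays from max_lr towards min_lr
print(clip_by_global_norm([3.0, 4.0]))   # norm 5 is scaled down to roughly 1
```
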
---

## Example Output

```
Prompt: The capital of germany is
Output: The capital of germany is the country he first settled in, and soon the settlement
of the British colonies as a result of his military service...
```

---

## Limitations

- Trained as a research/learning exercise; not fine-tuned on any instruction dataset
- May produce factually incorrect or incoherent text
- Context window limited to 1 024 tokens
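
One practical consequence of the 1 024-token window is that the prompt plus the tokens to be generated must fit inside it. A simple way to handle this, sketched below, is to keep only the most recent prompt tokens. The helper `fit_to_context` is hypothetical, and whether the bundled inference script truncates prompts this way is not stated in this card.

```python
def fit_to_context(token_ids, max_new_tokens, context_len=1024):
    """Keep the most recent tokens so that prompt + generation fits in the window."""
    if not 0 < max_new_tokens < context_len:
        raise ValueError("max_new_tokens must be between 1 and context_len - 1")
    budget = context_len - max_new_tokens  # room left for the prompt
    return token_ids[-budget:]


prompt = list(range(2000))  # stand-in for real token ids
kept = fit_to_context(prompt, max_new_tokens=100)
print(len(kept), kept[0])   # -> 924 1076
```
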

---

## Citation

If you use this model in your work, a shoutout is appreciated:

```bibtex
@misc{saiteja718-gpt2-scratch,
  author = {saiteja718},
  title = {GPT-2 Trained from Scratch},
  year = {2025},
  url = {https://huggingface.co/saiteja718/gpt2}
}
```
