	---
	license: mit
	language:
	- en
	tags:
	- gpt2
	- causal-lm
	- pytorch
	- text-generation
	- from-scratch
	base_model: []
	pipeline_tag: text-generation
	---

	# GPT-2 (Trained from Scratch)

	A GPT-2-style causal language model built and trained entirely from scratch in PyTorch, with no pre-trained weights and no Hugging Face Trainer. Every component (multi-head attention with a KV-cache, transformer blocks, weight tying) was implemented by hand.

	---

	## Model Details

	| Hyperparameter | Value |
	|-----------------|-------------|
	| Architecture | GPT-2 (decoder-only transformer) |
	| Layers | 12 |
	| Attention heads | 12 |
	| d_model | 768 |
	| FFN hidden dim | 3 072 |
	| Context length | 1 024 tokens |
	| Vocab size | 50 257 |
	| Training steps | 150 000 |
	| Tokens seen | ~9.8 B |
	| Tokenizer | GPT-2 BPE (tiktoken) |
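
As a sanity check on the table, the hyperparameters can be turned into a parameter count. The sketch below assumes the standard GPT-2 layout (learned positional embeddings, biases on every linear layer, two LayerNorms per block plus a final one), which this card does not spell out. `count_parameters` is a hypothetical helper, not a function from this repo.

```python
# Values taken from the Model Details table.
n_layer = 12
d_model = 768
d_ff = 3072
n_ctx = 1024
vocab_size = 50257


def count_parameters(n_layer, d_model, d_ff, n_ctx, vocab_size):
    """Count parameters for an assumed standard GPT-2 layout with a tied LM head."""
    token_emb = vocab_size * d_model
    pos_emb = n_ctx * d_model                        # assumption: learned positions

    qkv = d_model * 3 * d_model + 3 * d_model        # Q, K, V projections + biases
    attn_out = d_model * d_model + d_model           # attention output projection
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)                  # scale + shift, two per block
    per_block = qkv + attn_out + ffn + layer_norms

    final_ln = 2 * d_model
    # The LM head shares the token embedding matrix, so it adds nothing here.
    return token_emb + pos_emb + n_layer * per_block + final_ln


total_params = count_parameters(n_layer, d_model, d_ff, n_ctx, vocab_size)
print(total_params)                        # 124439808

# Rough tokens per optimizer step implied by the table (batch size is not stated).
tokens_per_step = 9.8e9 / 150_000
print(round(tokens_per_step / n_ctx))      # 64
```

Under these assumptions the total is about 124M parameters, the usual figure for GPT-2 small. The token count is also consistent with roughly 64 sequences of 1 024 tokens per step, although the batch size is not stated in this card.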

	---

	## Usage

	### With 🤗 Transformers

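	```python
	# `model.hf_wrapper` is not part of the transformers package; it presumably
	# ships with this repo, so run this from a clone of the repo.
	from transformers import AutoTokenizer
	from model.hf_wrapper import GPT2ForCausalLM

	model = GPT2ForCausalLM.from_pretrained("saiteja718/gpt2")
	tokenizer = AutoTokenizer.from_pretrained("saiteja718/gpt2")

	# Tokenize a prompt and run a single forward pass to get the logits.
	inputs = tokenizer("The capital of France is", return_tensors="pt")
	logits = model(**inputs).logits
	```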

	### With the interactive inference script

	Clone the repo and run:

	```bash
	git clone https://huggingface.co/saiteja718/gpt2
	cd gpt2
	pip install torch transformers tiktoken
	python3 gpt2_infer.py --interactive
	```

	---

	## Implementation Highlights

	- Multi-head attention with a split KV-cache for efficient autoregressive decoding (prefill + decode loop)
	- Weight tying between the token embedding and the LM head
	- Top-k sampling with temperature for controllable text generation
	- Custom training loop with gradient clipping and cosine LR schedule
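
To illustrate the sampling bullet above, here is a minimal standard-library sketch of top-k sampling with temperature over a plain list of logits. It is not the code from this repo: `sample_top_k` and its default values are hypothetical, and a PyTorch implementation would more likely use `torch.topk` and `torch.multinomial`.

```python
import math
import random


def sample_top_k(logits, k=50, temperature=1.0, rng=random):
    """Sample one token id from `logits` using temperature and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    k = max(1, min(k, len(logits)))

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Top-k: keep only the ids of the k highest-scoring tokens.
    top_ids = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]

    # Unnormalized softmax over the survivors (subtract the max for stability).
    max_logit = scaled[top_ids[0]]
    weights = [math.exp(scaled[i] - max_logit) for i in top_ids]

    # random.choices normalizes the weights internally.
    return rng.choices(top_ids, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, -1.0, 3.0, 0.0]
print(sample_top_k(toy_logits, k=1))  # k=1 is greedy decoding, so this prints 3
```

With `k=1` the function reduces to greedy decoding; larger `k` and higher `temperature` trade determinism for variety.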

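The training-loop bullet mentions gradient clipping and a cosine learning-rate schedule without giving values. The sketch below shows one common form of each in plain Python. Only the 150 000 step count comes from this card; the peak and minimum learning rates, the warmup phase, and the clipping threshold are assumptions, and both function names are hypothetical. In PyTorch, clipping is typically done with `torch.nn.utils.clip_grad_norm_`.

```python
import math


def cosine_lr(step, max_steps=150_000, peak_lr=6e-4, min_lr=6e-5, warmup_steps=2_000):
    """Cosine decay from peak_lr to min_lr, after an assumed linear warmup."""
    if step < warmup_steps:
        # Linear warmup (assumed; the card does not mention warmup).
        return peak_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        return min_lr
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return min_lr + cosine * (peak_lr - min_lr)


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


for step in (0, 2_000, 76_000, 150_000):
    print(step, cosine_lr(step))

print(clip_by_global_norm([3.0, 4.0]))  # norm 5.0 is scaled down to about 1.0
```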
	---

	## Example Output

	```
	Prompt: The capital of germany is
	Output: The capital of germany is the country he first settled in, and soon the settlement
	of the British colonies as a result of his military service...
	```

	---

	## Limitations

	- Trained as a research and learning exercise; this is a base language model, not fine-tuned on any instruction dataset
	- May produce factually incorrect or incoherent text
	- Context window limited to 1 024 tokens

	---

	## Citation

	If you use this model in your work, a shoutout is appreciated:

	```bibtex
	@misc{saiteja718-gpt2-scratch,
	author = {saiteja718},
	title = {GPT-2 Trained from Scratch},
	year = {2025},
	url = {https://huggingface.co/saiteja718/gpt2}
	}
	```