# GPT-2 125M — TinyStories

A 125M parameter GPT-2 model trained from scratch on the TinyStoriesV2 (cleaned) dataset. Built as a learning project to understand PyTorch and transformer architectures deeply.

## Model Details

| Parameter | Value |
|---|---|
| Parameters | ~125M |
| Vocabulary | 50,257 (GPT-2 tiktoken) |
| Context length | 512 tokens |
| Embedding dim | 768 |
| Attention heads | 12 |
| Transformer layers | 12 |
| Dropout | 0.1 |
| Activation | GELU |

**Architecture:** token + positional embeddings → dropout → 12× transformer blocks (pre-norm, residual connections) → final LayerNorm → linear output head
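The architecture above can be sketched directly from the table's hyperparameters. This is a minimal illustration, not the repository's actual code: the class names are invented here, `nn.MultiheadAttention` stands in for whatever attention implementation the repo uses, and output-embedding weight tying is an assumption (it is what makes the count land near 125M).

```python
import torch
import torch.nn as nn

# Hyperparameters from the model card
VOCAB, CTX, DIM, HEADS, LAYERS, DROP = 50257, 512, 768, 12, 12, 0.1

class Block(nn.Module):
    """Pre-norm transformer block with residual connections."""
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(DIM)
        self.attn = nn.MultiheadAttention(DIM, HEADS, dropout=DROP, batch_first=True)
        self.ln2 = nn.LayerNorm(DIM)
        self.mlp = nn.Sequential(
            nn.Linear(DIM, 4 * DIM), nn.GELU(),
            nn.Linear(4 * DIM, DIM), nn.Dropout(DROP),
        )

    def forward(self, x):
        # Causal mask: True entries are positions a token may NOT attend to
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

class GPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Embedding(CTX, DIM)
        self.drop = nn.Dropout(DROP)
        self.blocks = nn.ModuleList(Block() for _ in range(LAYERS))
        self.ln_f = nn.LayerNorm(DIM)
        self.head = nn.Linear(DIM, VOCAB, bias=False)
        self.head.weight = self.tok.weight  # weight tying (assumed)

    def forward(self, idx):
        t = idx.size(1)
        x = self.tok(idx) + self.pos(torch.arange(t, device=idx.device))
        x = self.drop(x)
        for b in self.blocks:
            x = b(x)
        return self.head(self.ln_f(x))

model = GPT()
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

With tied embeddings this comes out at roughly 124M trainable parameters, consistent with the "~125M" figure in the table.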

## Training

| Metric | Value |
|---|---|
| Dataset | TinyStoriesV2 (cleaned) |
| Epochs | 2 |
| Batch size | 32 |
| Learning rate | 3e-4 |
| Final train loss | 1.103 |
| Final val loss | 1.06 |
| Hardware | NVIDIA H100 80GB |
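The losses above come from the standard next-token cross-entropy objective. A minimal sketch of one optimization step under that objective — the optimizer choice (AdamW) is an assumption, since the card only lists the learning rate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(model, optimizer, batch):
    """One step of next-token training. batch: (B, T+1) token IDs."""
    inputs, targets = batch[:, :-1], batch[:, 1:]   # shift by one position
    logits = model(inputs)                          # (B, T, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny stand-in model just to exercise the step (not the real 125M GPT)
toy = nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 100))
opt = torch.optim.AdamW(toy.parameters(), lr=3e-4)  # lr from the table
loss = train_step(toy, opt, torch.randint(0, 100, (32, 9)))
```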

## Usage

This is a custom PyTorch model, not a `transformers`-compatible checkpoint, so you need the source code from the GitHub repository to load it.

### Setup

```shell
# Clone the repository with the model code
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync
```

See the GitHub repository for usage examples and the full API reference.
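Once the repository code is available, generation is the usual autoregressive sampling loop. The sketch below is illustrative, not the repo's API: `generate` only assumes a callable that maps `(B, T)` token IDs to `(B, T, vocab)` logits, which is stubbed here with random logits; real use would load the trained checkpoint and encode/decode prompts with tiktoken's `gpt2` encoding.

```python
import torch

VOCAB, CTX = 50257, 512  # from the model card

@torch.no_grad()
def generate(model, idx, max_new_tokens, temperature=1.0, top_k=50):
    """Sample tokens one at a time, feeding each back into the model."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -CTX:]                       # crop to context length
        logits = model(idx_cond)[:, -1, :] / temperature
        if top_k is not None:                          # keep only top-k logits
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float("-inf")
        probs = torch.softmax(logits, dim=-1)
        idx = torch.cat([idx, torch.multinomial(probs, 1)], dim=1)
    return idx

# Stand-in model for illustration; real use would load the trained weights
stub = lambda idx: torch.randn(idx.size(0), idx.size(1), VOCAB)
out = generate(stub, torch.zeros(1, 1, dtype=torch.long), max_new_tokens=5)
```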

## Limitations

- Trained only on TinyStories — generates simple children's stories, not general text
- No instruction tuning — does not follow prompts or answer questions
- Trained for only 2 epochs — could benefit from more training
- English only

## Source Code

Full implementation: github.com/aryandeore/monday_morning_moral
