# GPT-2 125M — TinyStories

A 125M parameter GPT-2 model trained from scratch on the TinyStoriesV2 (cleaned) dataset. Built as a learning project to understand PyTorch and transformer architectures deeply.

## Model Details

| Parameter | Value |
|---|---|
| Parameters | ~125M |
| Vocabulary | 50,257 (GPT-2 tiktoken) |
| Context length | 512 tokens |
| Embedding dim | 768 |
| Attention heads | 12 |
| Transformer layers | 12 |
| Dropout | 0.1 |
| Activation | GELU |

**Architecture:** token + positional embeddings → dropout → 12× transformer blocks (pre-norm, residual connections) → final LayerNorm → linear output head
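The architecture above can be sketched directly from the table's hyperparameters. This is a minimal illustration, not the repository's actual code: the class names are invented here, `nn.MultiheadAttention` stands in for whatever attention implementation the repo uses, and output-embedding weight tying is an assumption (it is what makes the count land near 125M).

```python
import torch
import torch.nn as nn

# Hyperparameters from the model card
VOCAB, CTX, DIM, HEADS, LAYERS, DROP = 50257, 512, 768, 12, 12, 0.1

class Block(nn.Module):
    """Pre-norm transformer block with residual connections."""
    def __init__(self):
        super().__init__()
        self.ln1 = nn.LayerNorm(DIM)
        self.attn = nn.MultiheadAttention(DIM, HEADS, dropout=DROP, batch_first=True)
        self.ln2 = nn.LayerNorm(DIM)
        self.mlp = nn.Sequential(
            nn.Linear(DIM, 4 * DIM), nn.GELU(),
            nn.Linear(4 * DIM, DIM), nn.Dropout(DROP),
        )

    def forward(self, x):
        # Causal mask: True entries are positions a token may NOT attend to
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x

class GPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Embedding(CTX, DIM)
        self.drop = nn.Dropout(DROP)
        self.blocks = nn.ModuleList(Block() for _ in range(LAYERS))
        self.ln_f = nn.LayerNorm(DIM)
        self.head = nn.Linear(DIM, VOCAB, bias=False)
        self.head.weight = self.tok.weight  # weight tying (assumed)

    def forward(self, idx):
        t = idx.size(1)
        x = self.tok(idx) + self.pos(torch.arange(t, device=idx.device))
        x = self.drop(x)
        for b in self.blocks:
            x = b(x)
        return self.head(self.ln_f(x))

model = GPT()
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

With tied embeddings this comes out at roughly 124M trainable parameters, consistent with the "~125M" figure in the table.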

## Training

| Metric | Value |
|---|---|
| Dataset | TinyStoriesV2 (cleaned) |
| Epochs | 2 |
| Batch size | 32 |
| Learning rate | 3e-4 |
| Final train loss | 1.103 |
| Final val loss | 1.06 |
| Hardware | NVIDIA H100 80GB |
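The losses above come from the standard next-token cross-entropy objective. A minimal sketch of one optimization step under that objective — the optimizer choice (AdamW) is an assumption, since the card only lists the learning rate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_step(model, optimizer, batch):
    """One step of next-token training. batch: (B, T+1) token IDs."""
    inputs, targets = batch[:, :-1], batch[:, 1:]   # shift by one position
    logits = model(inputs)                          # (B, T, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny stand-in model just to exercise the step (not the real 125M GPT)
toy = nn.Sequential(nn.Embedding(100, 16), nn.Linear(16, 100))
opt = torch.optim.AdamW(toy.parameters(), lr=3e-4)  # lr from the table
loss = train_step(toy, opt, torch.randint(0, 100, (32, 9)))
```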

## Usage

This is a custom PyTorch model, not a `transformers`-compatible checkpoint, so you need the source code from the GitHub repository to load it.

### Setup

```shell
# Clone the repository with the model code
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync
```

See the GitHub repository for usage examples and the full API reference.
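Once the repository code is available, generation is the usual autoregressive sampling loop. The sketch below is illustrative, not the repo's API: `generate` only assumes a callable that maps `(B, T)` token IDs to `(B, T, vocab)` logits, which is stubbed here with random logits; real use would load the trained checkpoint and encode/decode prompts with tiktoken's `gpt2` encoding.

```python
import torch

VOCAB, CTX = 50257, 512  # from the model card

@torch.no_grad()
def generate(model, idx, max_new_tokens, temperature=1.0, top_k=50):
    """Sample tokens one at a time, feeding each back into the model."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -CTX:]                       # crop to context length
        logits = model(idx_cond)[:, -1, :] / temperature
        if top_k is not None:                          # keep only top-k logits
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = float("-inf")
        probs = torch.softmax(logits, dim=-1)
        idx = torch.cat([idx, torch.multinomial(probs, 1)], dim=1)
    return idx

# Stand-in model for illustration; real use would load the trained weights
stub = lambda idx: torch.randn(idx.size(0), idx.size(1), VOCAB)
out = generate(stub, torch.zeros(1, 1, dtype=torch.long), max_new_tokens=5)
```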

## Limitations

- Trained only on TinyStories — generates simple children's stories, not general text
- No instruction tuning — does not follow prompts or answer questions
- Trained for only 2 epochs — could benefit from more training
- English only

## Source Code

Full implementation: github.com/aryandeore/monday_morning_moral
