GPT-2 30M — TinyStories

A 30M-parameter GPT-2 model trained from scratch on the TinyStoriesV2 (cleaned) dataset. It was built as a learning project to understand PyTorch and transformer architectures in depth.

Model Details

| Parameter | Value |
|---|---|
| Parameters | ~49.5M (incl. embeddings) |
| Vocabulary | 50,257 (GPT-2 tiktoken) |
| Context Length | 512 |
| Embedding Dim | 384 |
| Attention Heads | 6 |
| Transformer Layers | 6 |
| Dropout | 0.1 |
| Activation | GELU |
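The parameter count can be roughly cross-checked from the hyperparameters above. The sketch below is not taken from the repository: `gpt2_param_count`, `mlp_ratio`, and `tied_head` are hypothetical names, and it assumes a standard GPT-2 block layout (learned positional embeddings, biases on the attention and MLP linear layers, a 4x MLP expansion, and an output projection without bias). Under those assumptions the total comes out close to the reported ~49.5M when the output projection has its own weights, and close to 30M when it shares weights with the token embedding, which may be where the "30M" in the model name comes from.

```python
def gpt2_param_count(vocab, ctx, dim, layers, mlp_ratio=4, tied_head=False):
    """Estimate parameters of a GPT-2-style model (assumed layout, see text)."""
    # Token embedding table plus learned positional embedding table.
    embeddings = vocab * dim + ctx * dim
    # Attention: fused QKV projection and output projection, both with bias.
    attn = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)
    # MLP: expand to mlp_ratio * dim, then project back, both with bias.
    hidden = mlp_ratio * dim
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)
    # Two LayerNorms per block, each with a weight and a bias vector.
    norms = 2 * (2 * dim)
    block = attn + mlp + norms
    final_norm = 2 * dim
    # Output projection counts only if it is not tied to the token embedding.
    head = 0 if tied_head else vocab * dim
    return embeddings + layers * block + final_norm + head


untied = gpt2_param_count(vocab=50257, ctx=512, dim=384, layers=6)
tied = gpt2_param_count(vocab=50257, ctx=512, dim=384, layers=6, tied_head=True)
print(f"{untied / 1e6:.1f}M untied, {tied / 1e6:.1f}M tied")
# prints "49.4M untied, 30.1M tied"
```

The small gap to the reported figure would depend on details this estimate does not know, such as whether the output layer carries a bias.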

Architecture: Token + positional embeddings → Dropout → 6x Transformer blocks (pre-norm, residual connections) → LayerNorm → Linear output
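The pipeline described above can be sketched in PyTorch as follows. This is an illustrative reconstruction under stated assumptions, not the repository's code: the class names `Block` and `TinyGPT`, the 4x MLP expansion, the fused QKV projection, and the untied, bias-free output layer are all assumptions. It relies on `torch.nn.functional.scaled_dot_product_attention`, which requires PyTorch 2.0 or later.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    """Pre-norm block: x + attn(ln1(x)), then x + mlp(ln2(x))."""

    def __init__(self, dim: int, heads: int, dropout: float):
        super().__init__()
        self.heads = heads
        self.dropout = dropout
        self.ln1 = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim),  # 4x expansion is an assumption
            nn.GELU(),
            nn.Linear(4 * dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # Reshape (b, t, d) -> (b, heads, t, head_dim) for multi-head attention.
        q, k, v = (
            z.view(b, t, self.heads, d // self.heads).transpose(1, 2)
            for z in (q, k, v)
        )
        p = self.dropout if self.training else 0.0
        # is_causal=True masks out attention to future positions.
        a = F.scaled_dot_product_attention(q, k, v, dropout_p=p, is_causal=True)
        x = x + self.proj(a.transpose(1, 2).reshape(b, t, d))
        return x + self.mlp(self.ln2(x))


class TinyGPT(nn.Module):
    """Token + positional embeddings -> Dropout -> blocks -> LayerNorm -> Linear."""

    def __init__(self, vocab=50257, ctx=512, dim=384, heads=6, layers=6, dropout=0.1):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(ctx, dim)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList(
            [Block(dim, heads, dropout) for _ in range(layers)]
        )
        self.ln_f = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, vocab, bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) integer token ids, seq_len <= ctx.
        pos = torch.arange(idx.shape[1], device=idx.device)
        x = self.drop(self.tok(idx) + self.pos(pos))
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # logits: (batch, seq_len, vocab)
```

Calling `TinyGPT()(token_ids)` on a `(batch, seq_len)` tensor of token ids returns next-token logits of shape `(batch, seq_len, 50257)`. The real implementation may differ in layer naming, weight tying, and initialization.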

Training

| Metric | Value |
|---|---|
| Dataset | TinyStoriesV2 (cleaned) |
| Epochs | 6 |
| Batch Size | 64 |
| Learning Rate | 5e-4 |
| Final Train Loss | 1.346 |
| Final Val Loss | 1.272 |
| Final Perplexity | 3.57 |
| Training Time | ~50 minutes |
| Hardware | NVIDIA H100 80GB |

Loss Curve

| Epoch | Train Loss | Val Loss | Perplexity |
|---|---|---|---|
| 1 | 2.140 | 1.547 | 4.70 |
| 2 | 1.541 | 1.406 | 4.08 |
| 3 | 1.446 | 1.349 | 3.85 |
| 4 | 1.399 | 1.313 | 3.72 |
| 5 | 1.367 | 1.288 | 3.62 |
| 6 | 1.346 | 1.272 | 3.57 |
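The perplexity column appears to be the exponential of the validation loss (assuming the loss is mean cross-entropy in nats per token). A short check using the numbers from the table above; the variable name `rows` is just for illustration:

```python
import math

# (epoch, val_loss, reported_perplexity), copied from the loss curve table.
rows = [
    (1, 1.547, 4.70),
    (2, 1.406, 4.08),
    (3, 1.349, 3.85),
    (4, 1.313, 3.72),
    (5, 1.288, 3.62),
    (6, 1.272, 3.57),
]

for epoch, val_loss, reported in rows:
    # Perplexity is exp(mean cross-entropy per token).
    ppl = math.exp(val_loss)
    print(f"epoch {epoch}: exp({val_loss}) = {ppl:.3f} (reported {reported})")
```

Every row agrees to within 0.01. Small differences in the last digit are expected because the reported perplexity was presumably computed from the unrounded loss.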

Usage

This is a custom PyTorch model and is not compatible with the Hugging Face transformers library. To load it, you need the source code from the GitHub repository.

Setup

# Clone the repository with the model code
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync

See the GitHub repository for usage examples and the full API reference.

Limitations

  • Trained only on TinyStories: generates simple children's stories, not general text
  • No instruction tuning: continues a story prompt but does not follow instructions or answer questions
  • Small model: limited coherence over long sequences
  • English only

Source Code

Full implementation: https://github.com/aryandeore/monday_morning_moral
