Tiny Stories
Collection
30M and 125M parameter GPT-2 models pre-trained and instruction fine-tuned on the TinyStories dataset.
A 30M parameter GPT-2 model trained from scratch on the TinyStoriesV2 (cleaned) dataset. Built as a learning project to understand PyTorch and transformer architectures deeply.
| Parameter | Value |
|---|---|
| Parameters | ~49.5M (incl. embeddings) |
| Vocabulary | 50,257 (GPT-2 tiktoken) |
| Context Length | 512 |
| Embedding Dim | 384 |
| Attention Heads | 6 |
| Transformer Layers | 6 |
| Dropout | 0.1 |
| Activation | GELU |
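The ~49.5M total in the table can be reproduced from the other hyperparameters with back-of-the-envelope arithmetic. This sketch assumes a standard GPT-2-style block (4x MLP expansion, biases on every linear layer, two LayerNorms per block) and an untied output head; none of those details are confirmed by the card.

```python
# Rough parameter count from the config above. Assumptions (not confirmed
# by the card): GPT-2-style blocks with 4x MLP expansion, biases on all
# linear layers, and an output head that is NOT tied to the token embeddings.
V, ctx, d, n_layers = 50_257, 512, 384, 6

tok_emb = V * d                            # token embedding table
pos_emb = ctx * d                          # learned positional embeddings
attn = d * 3 * d + 3 * d + d * d + d      # fused QKV + output projection
mlp = d * 4 * d + 4 * d + 4 * d * d + d   # two linear layers, 4x expansion
ln = 2 * 2 * d                             # two LayerNorms (weight + bias)
block = attn + mlp + ln
final_ln = 2 * d
head = V * d                               # untied output projection

total = tok_emb + pos_emb + n_layers * block + final_ln + head
print(f"{total / 1e6:.1f}M")               # ~49.4M, matching the table
```

With a weight-tied head the same arithmetic gives roughly 30M, which is presumably where the "30M" name comes from.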
Architecture: Token + positional embeddings → Dropout → 6× Transformer blocks (pre-norm, residual connections) → LayerNorm → Linear output
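The pre-norm block with residual connections can be sketched as follows. This is illustrative only: the class and variable names are not the repository's actual code, and it uses `nn.MultiheadAttention` for brevity where the repo may implement attention by hand.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block (illustrative, not the repo's code)."""

    def __init__(self, d=384, n_heads=6, p=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, n_heads, dropout=p, batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(),
                                 nn.Linear(4 * d, d), nn.Dropout(p))

    def forward(self, x):
        # Pre-norm: normalize BEFORE each sublayer, then add the residual.
        h = self.ln1(x)
        # Causal mask so each position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        x = x + self.mlp(self.ln2(x))
        return x
```

Stacking six of these between the embedding/dropout stage and the final LayerNorm + linear head gives the architecture described above.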
| Metric | Value |
|---|---|
| Dataset | TinyStoriesV2 (cleaned) |
| Epochs | 6 |
| Batch Size | 64 |
| Learning Rate | 5e-4 |
| Final Train Loss | 1.346 |
| Final Val Loss | 1.272 |
| Final Perplexity | 3.57 |
| Training Time | ~50 minutes |
| Hardware | NVIDIA H100 80GB |
| Epoch | Train Loss | Val Loss | Perplexity |
|---|---|---|---|
| 1 | 2.140 | 1.547 | 4.70 |
| 2 | 1.541 | 1.406 | 4.08 |
| 3 | 1.446 | 1.349 | 3.85 |
| 4 | 1.399 | 1.313 | 3.72 |
| 5 | 1.367 | 1.288 | 3.62 |
| 6 | 1.346 | 1.272 | 3.57 |
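The perplexity column is simply the exponential of the validation cross-entropy loss, which the endpoints of the table confirm (intermediate epochs may differ in the last digit because the logged losses are rounded):

```python
import math

# Perplexity = exp(cross-entropy loss), applied to the validation losses.
print(round(math.exp(1.547), 2))  # epoch 1 -> 4.70
print(round(math.exp(1.272), 2))  # epoch 6 -> 3.57
```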
This is a custom PyTorch model (not a transformers-compatible model). You need the source code from the GitHub repository to load it.
```bash
# Clone the repository with the model code
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync
```
See the GitHub repository for usage examples and the full API reference.
Full implementation: github.com/aryandeore/monday_morning_moral