Tiny Stories
Collection
30M and 125M GPT-2 models pre-trained and instruction fine-tuned on the TinyStories dataset.
A 125M parameter GPT-2 model trained from scratch on the TinyStoriesV2 (cleaned) dataset. Built as a learning project to understand PyTorch and transformer architectures deeply.
| Parameter | Value |
|---|---|
| Parameters | ~125M |
| Vocabulary | 50,257 (GPT-2 tiktoken) |
| Context Length | 512 |
| Embedding Dim | 768 |
| Attention Heads | 12 |
| Transformer Layers | 12 |
| Dropout | 0.1 |
| Activation | GELU |
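As a sanity check, the hyperparameters in the table reproduce the familiar GPT-2-small parameter count; this back-of-the-envelope tally assumes GPT-2-style weight tying between the token embedding and the output head:

```python
# Back-of-the-envelope parameter count for the configuration above.
# All sizes come from the table; weight tying between the token
# embedding and the output head is assumed (as in GPT-2).
vocab, ctx, d, layers = 50257, 512, 768, 12

tok_emb = vocab * d            # token embedding (tied with output head)
pos_emb = ctx * d              # learned positional embedding

# Per transformer block: attention (QKV + output projection) and a
# 4x MLP, plus biases and two LayerNorms.
attn = 4 * d * d + 4 * d
mlp = d * (4 * d) + 4 * d + (4 * d) * d + d
norms = 2 * 2 * d
block = attn + mlp + norms

final_norm = 2 * d
total = tok_emb + pos_emb + layers * block + final_norm
print(f"{total / 1e6:.1f}M parameters")  # ≈ 124M, i.e. "~125M"
```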
Architecture: Token + positional embeddings → Dropout → 12x Transformer blocks (pre-norm, residual connections) → LayerNorm → Linear output
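A minimal PyTorch sketch of this stack, assuming GPT-2-style causal self-attention and a 4× GELU MLP inside each block. The class and variable names here are illustrative, not the repository's actual API:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm transformer block: x + Attn(LN(x)), then x + MLP(LN(x))."""
    def __init__(self, d=768, heads=12, dropout=0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, dropout=dropout,
                                          batch_first=True)
        self.ln2 = nn.LayerNorm(d)
        self.mlp = nn.Sequential(
            nn.Linear(d, 4 * d), nn.GELU(),
            nn.Linear(4 * d, d), nn.Dropout(dropout),
        )

    def forward(self, x):
        T = x.size(1)
        # Boolean causal mask: True entries are masked out.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + a
        return x + self.mlp(self.ln2(x))

class TinyGPT(nn.Module):
    """Token + positional embeddings -> blocks -> LayerNorm -> Linear head."""
    def __init__(self, vocab=50257, ctx=512, d=768, layers=12,
                 heads=12, dropout=0.1):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)
        self.pos = nn.Embedding(ctx, d)
        self.drop = nn.Dropout(dropout)
        self.blocks = nn.ModuleList(Block(d, heads, dropout)
                                    for _ in range(layers))
        self.ln_f = nn.LayerNorm(d)
        self.head = nn.Linear(d, vocab, bias=False)

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.drop(self.tok(idx) + self.pos(pos))
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # logits: (batch, seq, vocab)
```

With the default arguments this matches the table above; smaller values of `d`, `layers`, and `ctx` give the 30M variant of the collection.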
| Metric | Value |
|---|---|
| Dataset | TinyStoriesV2 (cleaned) |
| Epochs | 2 |
| Batch Size | 32 |
| Learning Rate | 3e-4 |
| Final Train Loss | 1.103 |
| Final Val Loss | 1.06 |
| Hardware | NVIDIA H100 80GB |
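One training step under these settings might look like the following sketch. The AdamW optimizer and the toy stand-in model are assumptions for illustration, since the card lists only the hyperparameters above:

```python
import torch
import torch.nn.functional as F

# Hypothetical next-token training step using the listed learning rate
# (3e-4) and batch size (32). AdamW is an assumption; a tiny stand-in
# model keeps the sketch self-contained and fast to run.
vocab, ctx = 100, 16
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab, 32),
    torch.nn.Linear(32, vocab),
)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

batch = torch.randint(0, vocab, (32, ctx + 1))   # batch size 32
inputs, targets = batch[:, :-1], batch[:, 1:]    # shift by one token

logits = model(inputs)                           # (32, ctx, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
print(f"loss: {loss.item():.3f}")
```

On random tokens the initial loss sits near ln(vocab); on real text it falls toward the 1.0–1.1 range reported in the table as training progresses.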
This is a custom PyTorch model (not a transformers-compatible model). You need the source code from the GitHub repository to load it.
```bash
# Clone the repository with the model code
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync
```
See the GitHub repository (github.com/aryandeore/monday_morning_moral) for usage examples and the full API reference.