tinystories-transformer-80M

A transformer language model trained on the TinyStories dataset.

Model Details

  • Architecture: Transformer decoder (GPT-style)
  • Parameters: ~80M
  • Training Tokens: ~61M (745 steps × 320 sequences × 256 tokens per sequence)
  • Dataset: TinyStories
  • Layers: 4
  • Attention Heads: 16
  • Hidden Size: 512
  • Feed-forward Size: 1344
  • Vocabulary Size: 10,000 (BPE)
  • Context Length: 256 tokens
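
For reference, here is the configuration above as a Python dict. The key names are illustrative stand-ins, not necessarily the exact constructor arguments used in training:

config = {
    "vocab_size": 10_000,
    "context_length": 256,
    "num_layers": 4,
    "num_heads": 16,
    "d_model": 512,
    "d_ff": 1344,
}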

Training Details

  • Batch Size: 320
  • Learning Rate: 0.001
  • Optimizer: AdamW
  • Training Steps: 745
  • Final Loss: 1.45
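
A minimal sketch of one training step implied by these settings, assuming a standard next-token cross-entropy objective and a model built as in the Usage section below; get_batch is a hypothetical helper returning a (320, 257) tensor of token IDs:

import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(745):
    batch = get_batch()                      # hypothetical: (320, 257) token IDs
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits = model(inputs)                   # assumed shape: (320, 256, 10000)
    loss = torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()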

Usage

import torch
from cs336_basics.transformer import Transformer

# Load model. The keyword names below mirror the configuration above but are
# illustrative; check the actual Transformer constructor signature.
model = Transformer(
    vocab_size=10_000,
    context_length=256,
    num_layers=4,
    num_heads=16,
    d_model=512,
    d_ff=1344,
)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()

# Generate text
# (Add your inference code here; see the sampling sketch below.)
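
As a starting point for inference, here is a minimal temperature-sampling loop. It assumes model(ids) accepts a (batch, seq) tensor of token IDs and returns (batch, seq, vocab) logits; encoding text to IDs and decoding the output is left to whichever 10,000-token BPE tokenizer was used for training:

import torch

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=64, temperature=0.8):
    # prompt_ids: a list of token IDs produced by the BPE tokenizer
    ids = torch.tensor([prompt_ids], dtype=torch.long)
    for _ in range(max_new_tokens):
        logits = model(ids[:, -256:])             # stay within the 256-token context
        next_logits = logits[:, -1, :] / temperature
        probs = torch.softmax(next_logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids[0].tolist()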

Citation

@misc{tinystories_transformer_80M,
  author = {ashishshroti14},
  title = {tinystories-transformer-80M},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ashishshroti14/tinystories-transformer-80M}
}

Dataset Citation

@article{eldan2023tinystories,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Eldan, Ronen and Li, Yuanzhi},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}