Llama 2 15M – TinyStories

A 15M-parameter Llama 2 model pretrained on the TinyStories dataset. The weights are Andrej Karpathy's stories15M checkpoint (from llama2.c), uploaded here for easy loading and fine-tuning.

Model Details

Parameter           Value
------------------  ------------------------------------
Architecture        Llama 2 (RoPE, RMSNorm, SwiGLU, GQA)
Parameters          15.2M
Vocabulary          32,000 (SentencePiece)
Context Length      256 tokens
Embedding Dim       288
Attention Heads     6
KV Heads            6
Transformer Layers  6
Dropout             0.0
Activation          SiLU (SwiGLU)

Architecture: Token embeddings → Dropout → 6x Transformer blocks (pre-norm RMSNorm, RoPE attention, SwiGLU FFN, residual connections) → RMSNorm → Linear output head. Note that with 6 KV heads matching the 6 query heads, the GQA attention reduces to standard multi-head attention.
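
For concreteness, the hyperparameters above map onto a config object along these lines. This is a minimal sketch; the actual class and field names in models/llama2.py may differ:

from dataclasses import dataclass

@dataclass
class ModelArgs:
    dim: int = 288           # embedding dimension
    n_layers: int = 6        # transformer blocks
    n_heads: int = 6         # query heads
    n_kv_heads: int = 6      # equals n_heads, so GQA reduces to plain MHA
    vocab_size: int = 32000  # SentencePiece vocabulary size
    max_seq_len: int = 256   # context length
    dropout: float = 0.0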

Training

Metric          Value
--------------  ----------------------------------------
Dataset         TinyStories
Iterations      298,000
Batch Size      128 × 4 grad-accum steps = 512 effective
Learning Rate   5e-4 (peak)
Optimizer       AdamW (betas=0.9/0.95, weight_decay=0.1)
Precision       bfloat16
Warmup          1,000 iterations
Val Loss        1.072
Val Perplexity  2.92
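
Val Perplexity is just the exponential of the validation cross-entropy loss: exp(1.072) ≈ 2.92. Below is a minimal sketch of the optimizer setup the table implies; the cosine-decay shape after warmup is an assumption, since the card only states the warmup length and peak learning rate:

import math
import torch

max_lr, warmup_iters, max_iters = 5e-4, 1_000, 298_000

def lr_at(step):
    # linear warmup to the peak learning rate over the first 1,000 iterations
    if step < warmup_iters:
        return max_lr * step / warmup_iters
    # then cosine decay toward zero over the remaining iterations (assumed)
    progress = (step - warmup_iters) / (max_iters - warmup_iters)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))

def configure_optimizer(model):
    # AdamW with the betas and weight decay listed above
    return torch.optim.AdamW(
        model.parameters(), lr=max_lr, betas=(0.9, 0.95), weight_decay=0.1
    )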

Sample Output

Once upon a time, there was a little boy named Timmy. Timmy loved to play in the sand at the beach. He would build big sandcastles and dig deep holes. One day, Timmy's mom took him to the doctor because he was feeling sick. The doctor said Timmy needed to rest in bed. Timmy's mom noticed that he had a thick book in his hand. She asked him what was inside. Timmy said he didn't know. His mom explained that the book was just a few days old and had gone to a faraway place. She told Timmy that he should take care of himself and rest. Timmy promised to take better care of himself. After a few days, Timmy felt much better. He went back to the beach and played in the sand. He made a big sandcastle and showed it to his mom. She was proud of him for taking care of himself. Timmy was happy that he...

Generated with temperature=0.8, top_k=40
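
Temperature and top-k act on the logits at each decoding step: the logits are divided by the temperature, truncated to the k most likely tokens, and the next token is sampled from the renormalized distribution. A minimal sketch of this standard procedure (not necessarily the exact code inside model.generate):

import torch

def sample_next(logits, temperature=0.8, top_k=40):
    # temperature < 1 sharpens the distribution, > 1 flattens it
    logits = logits / temperature
    # keep only the top_k highest-scoring tokens
    topk_vals, topk_idx = torch.topk(logits, top_k)
    probs = torch.softmax(topk_vals, dim=-1)
    # sample one token id from the truncated, renormalized distribution
    return topk_idx[torch.multinomial(probs, num_samples=1)]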

Usage

This is a custom PyTorch model (not a transformers-compatible model). You need the source code from the GitHub repository to load it.

Setup

git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync

Generate

import torch
from models.llama2 import Transformer
from sentencepiece import SentencePieceProcessor

# Load model weights from the Hugging Face Hub
model = Transformer.from_pretrained("0rn0/llama2-15m-tinystories")
model.eval()

# Load the SentencePiece tokenizer
sp = SentencePieceProcessor(model_file="tokenizer.model")

# Encode the prompt, prepending the BOS token
prompt = "Once upon a time"
tokens = [sp.bos_id()] + sp.encode(prompt)
idx = torch.tensor([tokens])  # shape (1, seq_len)

# Autoregressively sample up to 200 new tokens
output = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=40)
print(sp.decode(output[0].tolist()))
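
If you run the snippet outside the cloned repository, the tokenizer file can be fetched from the Hub first. A sketch assuming tokenizer.model is hosted alongside the weights in this model repo:

from huggingface_hub import hf_hub_download
from sentencepiece import SentencePieceProcessor

# download tokenizer.model from the model repo (assumes the file is present there)
tok_path = hf_hub_download(repo_id="0rn0/llama2-15m-tinystories", filename="tokenizer.model")
sp = SentencePieceProcessor(model_file=tok_path)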

Limitations

  • Trained only on TinyStories – generates simple children's stories, not general text
  • No instruction tuning – does not follow prompts or answer questions
  • Small model – limited coherence over long sequences
  • English only

Credits

Original stories15M checkpoint and reference implementation by Andrej Karpathy (llama2.c). TinyStories dataset by Ronen Eldan and Yuanzhi Li.

Source Code

Full implementation: github.com/aryandeore/monday_morning_moral
