Llama 2 15M – TinyStories

A 15M-parameter Llama 2 model pretrained on the TinyStories dataset. The weights are Andrej Karpathy's stories15M checkpoint (from llama2.c), uploaded here for easy loading and fine-tuning.

Model Details

Parameter           Value
------------------  ------------------------------------
Architecture        Llama 2 (RoPE, RMSNorm, SwiGLU, GQA)
Parameters          15.2M
Vocabulary          32,000 (SentencePiece)
Context Length      256 tokens
Embedding Dim       288
Attention Heads     6
KV Heads            6
Transformer Layers  6
Dropout             0.0
Activation          SiLU (SwiGLU)

Architecture: Token embeddings → Dropout → 6x Transformer blocks (pre-norm RMSNorm, RoPE attention, SwiGLU FFN, residual connections) → RMSNorm → Linear output head. Note that with 6 KV heads matching the 6 query heads, the GQA attention reduces to standard multi-head attention.
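
For concreteness, the hyperparameters above map onto a config object along these lines. This is a minimal sketch; the actual class and field names in models/llama2.py may differ:

from dataclasses import dataclass

@dataclass
class ModelArgs:
    dim: int = 288           # embedding dimension
    n_layers: int = 6        # transformer blocks
    n_heads: int = 6         # query heads
    n_kv_heads: int = 6      # equals n_heads, so GQA reduces to plain MHA
    vocab_size: int = 32000  # SentencePiece vocabulary size
    max_seq_len: int = 256   # context length
    dropout: float = 0.0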

Training

Metric          Value
--------------  ----------------------------------------
Dataset         TinyStories
Iterations      298,000
Batch Size      128 × 4 grad-accum steps = 512 effective
Learning Rate   5e-4 (peak)
Optimizer       AdamW (betas=0.9/0.95, weight_decay=0.1)
Precision       bfloat16
Warmup          1,000 iterations
Val Loss        1.072
Val Perplexity  2.92
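
Val Perplexity is just the exponential of the validation cross-entropy loss: exp(1.072) ≈ 2.92. Below is a minimal sketch of the optimizer setup the table implies; the cosine-decay shape after warmup is an assumption, since the card only states the warmup length and peak learning rate:

import math
import torch

max_lr, warmup_iters, max_iters = 5e-4, 1_000, 298_000

def lr_at(step):
    # linear warmup to the peak learning rate over the first 1,000 iterations
    if step < warmup_iters:
        return max_lr * step / warmup_iters
    # then cosine decay toward zero over the remaining iterations (assumed)
    progress = (step - warmup_iters) / (max_iters - warmup_iters)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))

def configure_optimizer(model):
    # AdamW with the betas and weight decay listed above
    return torch.optim.AdamW(
        model.parameters(), lr=max_lr, betas=(0.9, 0.95), weight_decay=0.1
    )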

Sample Output

Once upon a time, there was a little boy named Timmy. Timmy loved to play in the sand at the beach. He would build big sandcastles and dig deep holes. One day, Timmy's mom took him to the doctor because he was feeling sick. The doctor said Timmy needed to rest in bed. Timmy's mom noticed that he had a thick book in his hand. She asked him what was inside. Timmy said he didn't know. His mom explained that the book was just a few days old and had gone to a faraway place. She told Timmy that he should take care of himself and rest. Timmy promised to take better care of himself. After a few days, Timmy felt much better. He went back to the beach and played in the sand. He made a big sandcastle and showed it to his mom. She was proud of him for taking care of himself. Timmy was happy that he...

Generated with temperature=0.8, top_k=40
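
Temperature and top-k act on the logits at each decoding step: the logits are divided by the temperature, truncated to the k most likely tokens, and the next token is sampled from the renormalized distribution. A minimal sketch of this standard procedure (not necessarily the exact code inside model.generate):

import torch

def sample_next(logits, temperature=0.8, top_k=40):
    # temperature < 1 sharpens the distribution, > 1 flattens it
    logits = logits / temperature
    # keep only the top_k highest-scoring tokens
    topk_vals, topk_idx = torch.topk(logits, top_k)
    probs = torch.softmax(topk_vals, dim=-1)
    # sample one token id from the truncated, renormalized distribution
    return topk_idx[torch.multinomial(probs, num_samples=1)]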

Usage

This is a custom PyTorch model (not a transformers-compatible model). You need the source code from the GitHub repository to load it.

Setup

git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync

Generate

import torch
from models.llama2 import Transformer
from sentencepiece import SentencePieceProcessor

# Load model weights from the Hugging Face Hub
model = Transformer.from_pretrained("0rn0/llama2-15m-tinystories")
model.eval()

# Load the SentencePiece tokenizer
sp = SentencePieceProcessor(model_file="tokenizer.model")

# Encode the prompt, prepending the BOS token
prompt = "Once upon a time"
tokens = [sp.bos_id()] + sp.encode(prompt)
idx = torch.tensor([tokens])  # shape (1, seq_len)

# Autoregressively sample up to 200 new tokens
output = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=40)
print(sp.decode(output[0].tolist()))
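
If you run the snippet outside the cloned repository, the tokenizer file can be fetched from the Hub first. A sketch assuming tokenizer.model is hosted alongside the weights in this model repo:

from huggingface_hub import hf_hub_download
from sentencepiece import SentencePieceProcessor

# download tokenizer.model from the model repo (assumes the file is present there)
tok_path = hf_hub_download(repo_id="0rn0/llama2-15m-tinystories", filename="tokenizer.model")
sp = SentencePieceProcessor(model_file=tok_path)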

Limitations

  • Trained only on TinyStories – generates simple children's stories, not general text
  • No instruction tuning – does not follow prompts or answer questions
  • Small model – limited coherence over long sequences
  • English only

Credits

Original stories15M checkpoint and reference implementation by Andrej Karpathy (llama2.c). TinyStories dataset by Ronen Eldan and Yuanzhi Li.

Source Code

Full implementation: github.com/aryandeore/monday_morning_moral
