# Llama 2 15M – TinyStories
A 15M parameter Llama 2 model pretrained on the TinyStories dataset. Pretrained by Andrej Karpathy (stories15M checkpoint), uploaded here for easy loading and fine-tuning.
## Model Details
| Parameter | Value |
|---|---|
| Architecture | Llama 2 (RoPE, RMSNorm, SwiGLU, GQA) |
| Parameters | 15.2M |
| Vocabulary | 32,000 (SentencePiece) |
| Context Length | 256 |
| Embedding Dim | 288 |
| Attention Heads | 6 |
| KV Heads | 6 |
| Transformer Layers | 6 |
| Dropout | 0.0 |
| Activation | SiLU (SwiGLU) |
**Architecture:** Token embeddings → Dropout → 6× Transformer blocks (pre-norm RMSNorm, RoPE attention, SwiGLU FFN, residual connections) → RMSNorm → Linear output
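The 15.2M figure can be sanity-checked from the table above. A rough count, assuming the llama2.c stories15M FFN inner dimension of 768 (not listed in the table) and input/output embeddings tied as in llama2.c:

```python
# Rough parameter count for the stories15M configuration.
# Assumption: FFN inner dim = 768 (llama2.c rounds 2/3 * 4*dim to a multiple of 32),
# and the output head shares weights with the token embedding.
dim, n_layers, vocab = 288, 6, 32000
hidden = 768                      # assumed SwiGLU inner dimension

emb = vocab * dim                 # token embeddings (tied with output head)
attn = 4 * dim * dim              # wq, wk, wv, wo
ffn = 3 * dim * hidden            # w1, w2, w3 (SwiGLU)
norms = 2 * dim                   # two RMSNorms per block
total = emb + n_layers * (attn + ffn + norms) + dim  # + final RMSNorm
print(f"{total / 1e6:.2f}M")
```

This lands at roughly 15.2M, matching the table.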
## Training
| Metric | Value |
|---|---|
| Dataset | TinyStories |
| Iterations | 298,000 |
| Batch Size | 128 × 4 grad accumulation = 512 effective |
| Learning Rate | 5e-4 |
| Optimizer | AdamW (betas=0.9/0.95, weight_decay=0.1) |
| Precision | bfloat16 |
| Warmup | 1,000 iterations |
| Val Loss | 1.072 |
| Val Perplexity | 2.92 |
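The two validation numbers above are consistent: perplexity is simply the exponential of the cross-entropy loss.

```python
import math

val_loss = 1.072
perplexity = math.exp(val_loss)  # perplexity = e^loss
print(round(perplexity, 2))      # → 2.92
```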
## Sample Output
> Once upon a time, there was a little boy named Timmy. Timmy loved to play in the sand at the beach. He would build big sandcastles and dig deep holes. One day, Timmy's mom took him to the doctor because he was feeling sick. The doctor said Timmy needed to rest in bed. Timmy's mom noticed that he had a thick book in his hand. She asked him what was inside. Timmy said he didn't know. His mom explained that the book was just a few days old and had gone to a faraway place. She told Timmy that he should take care of himself and rest. Timmy promised to take better care of himself. After a few days, Timmy felt much better. He went back to the beach and played in the sand. He made a big sandcastle and showed it to his mom. She was proud of him for taking care of himself. Timmy was happy that he...

*Generated with `temperature=0.8`, `top_k=40`.*
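Temperature and top-k sampling work together as follows: logits are divided by the temperature, everything outside the k most likely tokens is discarded, and the next token is drawn from the renormalized distribution. A generic sketch (not the repository's exact `generate` implementation):

```python
import torch

def sample_top_k(logits: torch.Tensor, temperature: float = 0.8, top_k: int = 40) -> int:
    """Sample one token id from a 1-D logits vector with temperature + top-k."""
    logits = logits / temperature                    # sharpen (<1) or flatten (>1)
    topk_vals, topk_idx = torch.topk(logits, top_k)  # keep only the k best logits
    probs = torch.softmax(topk_vals, dim=-1)         # renormalize over the top-k
    choice = torch.multinomial(probs, num_samples=1) # draw one index
    return topk_idx[choice].item()
```

Lower temperatures and smaller `top_k` make the stories more repetitive but more coherent; the sample above used 0.8 and 40.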
## Usage
This is a custom PyTorch model (not a transformers-compatible model). You need the source code from the GitHub repository to load it.
### Setup

```bash
git clone https://github.com/aryandeore/monday_morning_moral.git
cd monday_morning_moral
uv sync
```
### Generate

```python
import torch
from models.llama2 import Transformer
from sentencepiece import SentencePieceProcessor

# Load model
model = Transformer.from_pretrained("0rn0/llama2-15m-tinystories")
model.eval()

# Load tokenizer
sp = SentencePieceProcessor(model_file="tokenizer.model")

# Generate
prompt = "Once upon a time"
tokens = [sp.bos_id()] + sp.encode(prompt)
idx = torch.tensor([tokens])
output = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=40)
print(sp.decode(output[0].tolist()))
```
## Limitations
- Trained only on TinyStories → generates simple children's stories, not general text
- No instruction tuning → does not follow prompts or answer questions
- Small model → limited coherence over long sequences
- English only
## Credits
- Model weights from karpathy/tinyllamas
- Architecture from llama2.c
- Dataset: TinyStories by Eldan & Li
## Source Code
Full implementation: github.com/aryandeore/monday_morning_moral