# tiny-stories-slm-50M
A ~50M-parameter GPT-style small language model built from scratch in PyTorch and trained on the TinyStories dataset.
## Model Details
| Parameter | Value |
|---|---|
| Architecture | Decoder-only Transformer (GPT) |
| Parameters | ~50 Million |
| Layers | 6 |
| Attention Heads | 6 |
| Embedding Dim | 384 |
| Context Window | 128 tokens |
| Vocab Size | 50,257 (GPT-2 BPE) |
| Validation Loss | 2.390 |
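
As a point of reference, here is what this configuration could look like as a nanoGPT-style config dataclass. The class and field names are illustrative assumptions, not the card author's actual code:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Values from the table above; the dataclass itself is a sketch.
    n_layer: int = 6          # transformer blocks
    n_head: int = 6           # attention heads per block
    n_embd: int = 384         # embedding / hidden dimension
    block_size: int = 128     # context window in tokens
    vocab_size: int = 50257   # GPT-2 BPE vocabulary
```

With this vocabulary size, the token embedding and output head dominate the parameter count (50,257 × 384 ≈ 19.3M weights each if untied), which is roughly consistent with the ~50M total for a 6-layer, 384-dim model.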
## Dataset
Trained on TinyStories, a dataset of short children's stories generated by GPT-3.5 and GPT-4, released by Microsoft Research.
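
To inspect the data yourself, the dataset can be pulled from the Hugging Face Hub. A minimal sketch, assuming the commonly used `roneneldan/TinyStories` dataset ID (an assumption, not something this card states):

```python
from datasets import load_dataset

# Assumption: the public TinyStories dataset on the Hugging Face Hub.
ds = load_dataset("roneneldan/TinyStories")
print(ds["train"][0]["text"])  # prints one short children's story
```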
## How to Use
```python
import tiktoken
import torch

# `model` is assumed to already be loaded in eval mode (see the loading
# sketch below); it should expose a nanoGPT-style generate() method.
enc = tiktoken.get_encoding("gpt2")

prompt = "Once upon a time there was a little dragon"
context = torch.tensor(enc.encode_ordinary(prompt)).unsqueeze(0)

with torch.no_grad():
    output = model.generate(
        context,
        max_new_tokens=200,
        temperature=0.8,
        top_k=50,
    )

print(enc.decode(output.squeeze().tolist()))
```
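
The snippet above assumes `model` is already in memory. How you load it depends on how the checkpoint was saved; below is a hedged sketch for a plain `state_dict` checkpoint, where the `ckpt.pt` file name and the `GPT`/`GPTConfig` classes are assumptions about the training code, not something documented in this card:

```python
import torch

# Hypothetical loading step: assumes a checkpoint saved with
# torch.save({"model": model.state_dict()}, "ckpt.pt") and a GPT class
# matching the configuration above. Adapt to your checkpoint format.
checkpoint = torch.load("ckpt.pt", map_location="cpu")
model = GPT(GPTConfig())
model.load_state_dict(checkpoint["model"])
model.eval()
```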
## Training Details
| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 1e-4 with cosine decay |
| Warmup Steps | 1,000 |
| Batch Size | 32 |
| Gradient Accumulation | 32 steps |
| Precision | Mixed (bfloat16) |
| Gradient Clipping | 0.5 |
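
Put together, a single optimizer step under these settings could look like the sketch below. `model` and `get_batch` are placeholders, and the schedule shape (linear warmup, cosine decay to zero) is a common convention rather than something this card specifies in detail:

```python
import math
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
accum_steps = 32   # gradient accumulation steps
clip_norm = 0.5    # gradient clipping threshold

def lr_at(step, max_steps, warmup=1000, peak=1e-4):
    """Linear warmup for 1,000 steps, then cosine decay to zero."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = (step - warmup) / max(1, max_steps - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

def train_step(step, max_steps):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step, max_steps)
    for _ in range(accum_steps):
        x, y = get_batch("train")  # placeholder: (input, target) token batches
        # Mixed precision: run forward/backward in bfloat16 via autocast.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            _, loss = model(x, y)  # assumes the model returns (logits, loss)
        (loss / accum_steps).backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
```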
## Limitations
- Context window limited to 128 tokens
- English only
- Generates simple stories only; not suitable for factual Q&A
- No instruction following