An instruction-tuned small language model trained entirely from scratch.

This model follows instructions across diverse tasks: answering questions, summarizing text, creative writing, classification, translation, and more. Give it a task and it responds:
```
Task: Explain what photosynthesis is in simple terms.
Answer:
Photosynthesis is the process by which plants convert sunlight, water, and
carbon dioxide into glucose and oxygen. It occurs in the chloroplasts of
plant cells and is essential for life on Earth...
```
Capabilities:
Limitations:
| Stage | Dataset | Size | Details |
|---|---|---|---|
| Pretraining | TinyStories | 2.1M stories | 70K iters, batch 32x512, lr=6e-4, cosine to 1e-5 |
| Instruction SFT | 300K Instructions | 300K examples | 3 epochs, batch 32, lr=1e-4, AdamW |
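The pretraining learning-rate schedule from the table (6e-4 decayed to 1e-5 over 70K iterations on a cosine curve) can be sketched as below. The warmup length is an assumption in the style of nanoGPT's default schedule; the table does not specify one:

```python
import math

MAX_LR, MIN_LR = 6e-4, 1e-5
MAX_ITERS = 70_000
WARMUP_ITERS = 2_000  # assumption: linear warmup length, not stated in the table

def lr_at(it: int) -> float:
    """Cosine decay from MAX_LR to MIN_LR after a linear warmup."""
    if it < WARMUP_ITERS:
        return MAX_LR * (it + 1) / WARMUP_ITERS
    # progress runs from 0 to 1 over the decay phase
    progress = (it - WARMUP_ITERS) / (MAX_ITERS - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return MIN_LR + coeff * (MAX_LR - MIN_LR)
```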
| Source | Count | Type |
|---|---|---|
| Alpaca | 52K | Stanford instruction-following |
| Dolly | 15K | Databricks human-authored |
| UltraChat | 80K | Multi-turn conversations |
| OpenAssistant | 33K | Human-generated QA |
| FLAN | 120K | Google's diverse NLP tasks |
```shell
pip install torch tiktoken huggingface_hub
python nanogpt_slm_tinystories_instruct_inference.py
```
Or call `ask()` from your own code:
```python
from nanogpt_slm_tinystories_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data...",
))

# Control generation
print(ask("Write a poem about the ocean.", temperature=1.0, top_k=100))
```
```python
import torch
from huggingface_hub import hf_hub_download

from nanogpt_slm_tinystories_instruct_inference import GPT, GPTConfig

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-slm-tinystories-instruct",
    filename="nanogpt_slm_tinystories_instruct.pth",
)

config = GPTConfig()
model = GPT(config)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
The model uses the unified Task/Question/Answer format:
```
Task: {instruction}
Question:
{input}            <- only included when input is non-empty
Answer:
{response}
```
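A minimal helper that assembles this prompt (the function name `build_prompt` is illustrative, not part of the released inference script):

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble the Task/Question/Answer prompt; the Question block
    is included only when input_text is non-empty."""
    prompt = f"Task: {instruction}\n"
    if input_text:
        prompt += f"Question:\n{input_text}\n"
    prompt += "Answer:\n"
    return prompt
```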
| Attribute | Value |
|---|---|
| Architecture | nanoGPT (GPT-2 small: 12 layers, 12 heads, 768 dim) |
| Parameters | 124.0M (unique, with weight tying) |
| Context length | 512 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| EOS token | <|endoftext|> (50256) -- clean response stopping |
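The 124.0M figure is consistent with the GPT-2-small layout above; a quick back-of-the-envelope check, assuming nanoGPT's standard biased linear layers and tied input/output embeddings:

```python
V, T, D, L = 50_257, 512, 768, 12   # vocab, context, width, layers

wte = V * D                  # token embedding (tied with the LM head)
wpe = T * D                  # position embedding
ln = 2 * D                   # layernorm scale + bias
attn = D * 3 * D + 3 * D     # qkv projection (weight + bias)
attn += D * D + D            # attention output projection
mlp = D * 4 * D + 4 * D      # MLP up-projection
mlp += 4 * D * D + D         # MLP down-projection
block = ln + attn + ln + mlp
total = wte + wpe + L * block + ln   # + final layernorm; no separate lm_head
print(f"{total / 1e6:.1f}M")         # → 124.0M
```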
`ask()` API Reference

```python
ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
```
| Parameter | Default | Description |
|---|---|---|
| `instruction` | (required) | The task instruction |
| `input_text` | `""` | Optional additional context |
| `max_tokens` | `256` | Maximum tokens to generate |
| `temperature` | `0.7` | 0.0 = greedy, 0.7 = balanced, 1.5 = creative |
| `top_k` | `40` | Top-k filtering (`None` = no filtering) |
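To illustrate how `temperature` and `top_k` interact during decoding, here is a self-contained sketch of temperature-scaled top-k sampling over raw logits. It mirrors standard nanoGPT-style sampling but is not the inference script's exact code:

```python
import math
import random

def sample_next(logits, temperature=0.7, top_k=40):
    """Pick a token id from raw logits with temperature + top-k filtering."""
    if temperature == 0.0:           # greedy: highest logit wins
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    if top_k is not None:            # zero out everything below the k-th logit
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]   # numerically stable softmax
    return random.choices(range(len(logits)), weights=weights)[0]
```

Low temperatures sharpen the distribution toward the top logits, while `top_k` hard-limits how many candidates can be sampled at all.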
| File | Description |
|---|---|
| `nanogpt_slm_tinystories_instruct.pth` | Instruction fine-tuned weights |
| `nanogpt_slm_tinystories_instruct_inference.py` | Standalone inference script |
| `config.json` | Model + training configuration |
| Variant | Type | Repo |
|---|---|---|
| Pretrained (TinyStories) | Base | `nishantup/nanogpt-pretrained-slm-tinystories-124m` |
| This model | Instruction SFT | `nishantup/nanogpt-slm-tinystories-instruct` |
| Spam classifier | Classification | `nishantup/nanogpt-slm-tinystories-classifier` |
Eldan, R., & Li, Y. (2023). TinyStories: How Small Can Language Models Be
and Still Speak Coherent English? arXiv preprint arXiv:2305.07759.