# nanoGPT SLM: 123.8M Parameter Children's Story Generator
A small language model trained entirely from scratch using a custom nanoGPT (GPT-2 small) implementation. Pretrained on the TinyStories dataset to generate short, coherent stories for young children.
## What This Model Does
This model generates short children's stories suitable for 3-5 year olds. Give it the beginning of a story and it will continue writing in simple, age-appropriate language:
**Input:** "Once upon a time there was a little rabbit"

**Output:**

> "Once upon a time there was a little rabbit who lived in a big forest. The rabbit loved to hop and play with his friends. One day, he found a shiny red ball near the river..."
**Capabilities:**
- Generates coherent short stories (100-200 words)
- Uses simple vocabulary appropriate for young children
- Follows common story patterns (characters, conflict, resolution)
- Understands basic narrative structure (beginning, middle, end)
- Can continue from any story opening/prompt
**Limitations:**
- Stories are short (256-token context window)
- Limited to simple vocabulary and narrative structures
- No instruction-following ability (see fine-tuned variants below)
- May occasionally generate repetitive or nonsensical text
## Training Dataset: TinyStories
| Attribute | Value |
|---|---|
| Dataset | TinyStories (Eldan & Li, 2023) |
| Description | Synthetic short stories generated by GPT-3.5/GPT-4, filtered for quality |
| Target audience | Children aged 3-5 years |
| Vocabulary | Words that a typical 3-4 year old would understand |
| Training stories | 2,119,719 |
| Validation stories | 21,990 |
| Total tokens | ~470M |
| Average story length | ~220 tokens |
| Topics | Animals, friendship, family, nature, adventure, sharing, kindness |
The TinyStories dataset was specifically designed to study whether small language models can learn coherent language generation when trained on high-quality, simple text.
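As a quick sanity check (not part of the original card), the table's figures are mutually consistent: multiplying the story count by the average story length recovers roughly the reported token total.

```python
# Consistency check on the dataset statistics quoted in the table above.
stories, avg_len = 2_119_719, 220   # training stories, avg tokens per story
approx_tokens = stories * avg_len
print(f"~{approx_tokens / 1e6:.0f}M tokens")  # ≈ 466M, consistent with the ~470M figure
```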
## Quick Start

**Option 1: Run directly** (downloads the model and generates sample stories from predefined prompts)

```python
# Download nanogpt_slm_pretrained_inference.py into your working directory
!pip install torch tiktoken huggingface_hub
!python nanogpt_slm_pretrained_inference.py
```
**Option 2: Import and use in your own code** to generate your own children's stories

```python
# !pip install torch tiktoken huggingface_hub
# Download nanogpt_slm_pretrained_inference.py into your working directory
from nanogpt_slm_pretrained_inference import tell_story, ask, generate_text

# Method 1: generate a children's story
# story = tell_story("Once upon a time there was a little kitten")
story = tell_story(input("Enter a story prompt (e.g., 'Once upon a time there was a little kitten'): ").strip())
print(story)
print("--------------------")

# Method 2: simple text completion
# print(ask("The friendly dragon lived in"))
print(ask(input("Enter a prompt for text completion (e.g., 'The friendly dragon lived in'): ").strip()))
print("--------------------")

# Method 3: fine-grained control
print(generate_text(
    "A girl named Lily went to the park",  # your desired prompt
    max_tokens=150,    # story length
    temperature=0.8,   # 0.01 = predictable, 0.8 = balanced, 1.5 = creative
    top_k=40,          # sampling diversity
))
print("--------------------")
```
**Load weights manually and inspect the model architecture**

```python
import torch
from huggingface_hub import hf_hub_download

from nanogpt_slm_pretrained_inference import GPT, GPTKV, GPTConfig

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-slm-tinystories-124m",
    filename="nanogpt_slm_tinystories_best.pth",
)

config = GPTConfig()
model = GPTKV(config)  # KV-cache variant for fast generation
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
## Model Architecture
| Attribute | Value |
|---|---|
| Architecture | nanoGPT (GPT-2 small, 12 layers, 12 heads, 768 dim) |
| Parameters | 123.8M (unique, with weight tying) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Weight tying | Yes (token embeddings = LM head) |
| Attention | Flash Attention when available, causal mask |
| Normalization | Pre-norm (LayerNorm before attention/MLP) |
| Activation | GELU |
| KV Cache | GPTKV variant included for O(1) per-token decode |
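The 123.8M figure can be re-derived from the table above. A minimal sketch (standard GPT-2-style parameter accounting; with weight tying the LM head adds no parameters of its own):

```python
# Sanity-check the parameter count from the architecture table:
# 12 layers, 768 dim, vocab 50,257, context 256, weight tying.
V, T, D, L = 50_257, 256, 768, 12

wte = V * D                        # token embeddings (tied with lm_head)
wpe = T * D                        # learned positional embeddings
per_layer = (
    D * 3 * D + 3 * D              # c_attn: fused QKV projection + bias
    + D * D + D                    # attention output projection + bias
    + D * 4 * D + 4 * D            # MLP up-projection + bias
    + 4 * D * D + D                # MLP down-projection + bias
    + 2 * (2 * D)                  # two LayerNorms (weight + bias each)
)
final_ln = 2 * D
total = wte + wpe + L * per_layer + final_ln  # lm_head is tied, adds nothing
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 123,849,984 (~123.8M)
```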
## Training Details
| Attribute | Value |
|---|---|
| Hardware | Google Colab Pro (NVIDIA A100 40GB) |
| Iterations | 22,900 |
| Effective batch size | 256 sequences (64 x 4 grad accum) |
| Tokens per step | 65,536 (256 x 256) |
| Total tokens seen | ~1.5B (22,900 × 65,536 ≈ 1,500,774,400) |
| Optimizer | AdamW (lr=6e-4, betas=(0.9, 0.95), wd=0.1) |
| LR schedule | Linear warmup (2000 steps) + cosine decay to 6e-5 |
| Precision | bfloat16 (A100) |
| Gradient clipping | max_norm=1.0 |
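The learning-rate schedule from the table can be sketched in plain Python. The constants come from the table above; the exact warmup and decay formulas in the training script may differ slightly, so treat this as illustrative:

```python
import math

# Linear warmup for 2,000 steps to lr_max = 6e-4,
# then cosine decay to lr_min = 6e-5 over the remaining iterations.
LR_MAX, LR_MIN = 6e-4, 6e-5
WARMUP, MAX_ITERS = 2_000, 22_900

def get_lr(step: int) -> float:
    if step < WARMUP:
        return LR_MAX * (step + 1) / WARMUP            # linear warmup
    progress = (step - WARMUP) / (MAX_ITERS - WARMUP)  # 0 -> 1 over decay phase
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress)) # cosine from 1 -> 0
    return LR_MIN + coeff * (LR_MAX - LR_MIN)

print(get_lr(0), get_lr(1_999), get_lr(22_899))  # ramps to 6e-4, decays to ~6e-5
```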
## Files

| File | Description |
|---|---|
| `nanogpt_slm_tinystories_best.pth` | Pretrained model weights (best validation loss) |
| `nanogpt_slm_pretrained_inference.py` | Standalone inference script with KV cache |
| `config.json` | Model configuration and training details |
## API Reference

### `tell_story(beginning, max_tokens=250, temperature=0.8, top_k=40)`

Generate a children's story from an opening line. Best for creative story generation.

### `ask(prompt, max_tokens=200, temperature=0.8, top_k=40)`

General text completion. Alias for `generate_text()`.

### `generate_text(prompt, max_tokens=200, temperature=0.8, top_k=40)`

Low-level text generation with full parameter control.

| Parameter | Default | Description |
|---|---|---|
| `prompt` / `beginning` | (required) | Text to continue from |
| `max_tokens` | 200 / 250 | Maximum tokens to generate |
| `temperature` | 0.8 | 0.01 = predictable, 0.8 = balanced, 1.5 = wild |
| `top_k` | 40 | Top-k filtering (`None` = no filtering) |
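For intuition, here is a minimal standard-library sketch of what `temperature` and `top_k` do during sampling. The actual script applies the same steps to model logits with PyTorch; this hypothetical `sample` helper is illustrative only:

```python
import math
import random

def sample(logits, temperature=0.8, top_k=40, rng=random):
    """Pick a token index from raw logits via top-k + temperature sampling."""
    # Keep only the top_k largest logits (None disables filtering).
    if top_k is not None:
        cutoff = sorted(logits, reverse=True)[min(top_k, len(logits)) - 1]
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    # Temperature scaling: < 1 sharpens the distribution, > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # softmax, shifted for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -3.0]
print(sample(logits, temperature=0.8, top_k=2))  # only indices 0 or 1 survive
```

With `top_k=1` this degenerates to greedy decoding; raising `temperature` spreads probability mass toward the lower-ranked surviving logits.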
## Example Outputs

**Prompt:** "Once upon a time there was a little bear"

> Once upon a time there was a little bear who lived in a big forest. The bear loved to play with his friends. One sunny day, he went for a walk and found a beautiful flower. He picked it up and brought it home to show his mama...

**Prompt:** "The princess looked out her window and saw"

> The princess looked out her window and saw a big rainbow in the sky. She was so happy! She ran outside to get a closer look. A little bird flew down and sat on her hand. "Hello!" said the princess...
## Fine-tuned Variants
| Variant | Type | Repo |
|---|---|---|
| This model | Pretrained (TinyStories) | nishantup/nanogpt-slm-tinystories-124m |
| Instruction-tuned (nanoGPT) | SFT | nishantup/nanogpt-slm-instruct |
| Spam classifier (nanoGPT) | Classification | nishantup/nanogpt-slm-classifier |
| Instruction-tuned (Raschka) | SFT | nishantup/gpt2-slm-instruct |
## Citation

If you use this model, please cite the TinyStories paper:

> Eldan, R., & Li, Y. (2023). TinyStories: How Small Can Language Models Be and Still Speak Coherent English? *arXiv preprint arXiv:2305.07759*.
## Notes
- Trained completely from scratch (no pretrained initialization)
- Uses KV cache (GPTKV) for O(1) per-token decode during inference
- Weight tying between token embeddings (wte) and LM head (lm_head)
- Architecture follows Karpathy's nanoGPT implementation