nanoGPT SLM -- 123.8M Parameter Children's Story Generator

A small language model trained entirely from scratch using a custom nanoGPT (GPT-2 small) implementation. Pretrained on the TinyStories dataset to generate short, coherent stories for young children.

What This Model Does

This model generates short children's stories suitable for 3-5 year olds. Give it the beginning of a story and it will continue writing in simple, age-appropriate language:

Input:  "Once upon a time there was a little rabbit"
Output: "Once upon a time there was a little rabbit who lived in a big forest.
         The rabbit loved to hop and play with his friends. One day, he
         found a shiny red ball near the river..."

Capabilities:

  • Generates coherent short stories (100-200 words)
  • Uses simple vocabulary appropriate for young children
  • Follows common story patterns (characters, conflict, resolution)
  • Understands basic narrative structure (beginning, middle, end)
  • Can continue from any story opening/prompt

Limitations:

  • Stories are short (limited by the 256-token context window)
  • Limited to simple vocabulary and narrative structures
  • No instruction-following ability (see fine-tuned variants below)
  • May occasionally generate repetitive or nonsensical text

Training Dataset: TinyStories

| Attribute | Value |
|---|---|
| Dataset | TinyStories (Eldan & Li, 2023) |
| Description | Synthetic short stories generated by GPT-3.5/GPT-4, filtered for quality |
| Target audience | Children aged 3-5 years |
| Vocabulary | Words that a typical 3-4 year old would understand |
| Training stories | ~2,119,719 |
| Validation stories | ~21,990 |
| Total tokens | ~470M |
| Average story length | ~220 tokens |
| Topics | Animals, friendship, family, nature, adventure, sharing, kindness |

The TinyStories dataset was specifically designed to study whether small language models can learn coherent language generation when trained on high-quality, simple text.
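As a quick sanity check, the dataset figures above are mutually consistent: the story count times the average story length roughly reproduces the reported token total.

```python
# Consistency check of the dataset statistics reported above.
train_stories = 2_119_719        # training stories
avg_tokens_per_story = 220       # average story length in tokens
approx_total = train_stories * avg_tokens_per_story
print(f"~{approx_total / 1e6:.0f}M tokens")  # ~466M, close to the reported ~470M
```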

Quick Start

Option 1: Run directly (downloads the model and generates sample stories from predefined prompts)

# Download `nanogpt_slm_pretrained_inference.py` into your working directory

!pip install torch tiktoken huggingface_hub
!python nanogpt_slm_pretrained_inference.py

Option 2: Import into your own code to generate your own children's stories

# !pip install torch tiktoken huggingface_hub
# Download `nanogpt_slm_pretrained_inference.py` into your working directory

# Method 1
from nanogpt_slm_pretrained_inference import tell_story, ask, generate_text

# Generate a children's story
# story = tell_story("Once upon a time there was a little kitten")
story = tell_story(input("Enter a story prompt (e.g., 'Once upon a time there was a little kitten'): ").strip())
print(story)
print("--------------------")

# Method 2
# Simple text completion
# print(ask("The friendly dragon lived in"))
print(ask(input("Enter a prompt for text completion (e.g., 'The friendly dragon lived in'): ").strip()))
print("--------------------")

# Method 3
# Fine-grained control
print(generate_text(
    "A girl named Lily went to the park",  # replace with your own prompt
    max_tokens=150,     # story length (Max=200)
    temperature=0.8,    # 0.01=predictable, 0.8=balanced, 1.5=creative
    top_k=40            # sampling diversity
))
print("--------------------")

Load the weights manually and inspect the model architecture

from huggingface_hub import hf_hub_download
import torch

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-slm-tinystories-124m",
    filename="nanogpt_slm_tinystories_best.pth"
)

from nanogpt_slm_pretrained_inference import GPT, GPTKV, GPTConfig

config = GPTConfig()
model = GPTKV(config)  # KV-cache enabled for fast generation
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

Model Architecture

| Attribute | Value |
|---|---|
| Architecture | nanoGPT (GPT-2 small: 12 layers, 12 heads, 768 dim) |
| Parameters | 123.8M (unique, with weight tying) |
| Context length | 256 tokens |
| Tokenizer | tiktoken GPT-2 BPE (50,257 tokens) |
| Weight tying | Yes (token embeddings = LM head) |
| Attention | Flash Attention when available, causal mask |
| Normalization | Pre-norm (LayerNorm before attention/MLP) |
| Activation | GELU |
| KV cache | GPTKV variant included for O(1) per-token decode |
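The 123.8M unique-parameter figure can be reproduced from the dimensions in the table. The sketch below assumes standard GPT-2 layer shapes (fused QKV projection, 4x MLP expansion, biased linears); it is a back-of-envelope check, not the repo's code.

```python
# Recomputing the unique parameter count from the architecture table:
# vocab 50,257, context 256, 12 layers, 768 dim, weight tying.
V, T, L, D = 50257, 256, 12, 768

wte = V * D                  # token embeddings (shared with lm_head via weight tying)
wpe = T * D                  # learned positional embeddings
per_block = (
    2 * (2 * D)              # two LayerNorms (weight + bias each)
    + (D * 3 * D + 3 * D)    # fused QKV projection
    + (D * D + D)            # attention output projection
    + (D * 4 * D + 4 * D)    # MLP up-projection (GELU in between)
    + (4 * D * D + D)        # MLP down-projection
)
ln_f = 2 * D                 # final LayerNorm
total = wte + wpe + L * per_block + ln_f
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

The result lands on ~123.8M, matching the table, because the LM head adds no extra parameters under weight tying.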

Training Details

| Attribute | Value |
|---|---|
| Hardware | Google Colab Pro (NVIDIA A100 40GB) |
| Iterations | 22,900 |
| Effective batch size | 256 sequences (64 x 4 grad accum) |
| Tokens per step | 65,536 (256 x 256) |
| Total tokens seen | 375M (0.8 epochs) |
| Optimizer | AdamW (lr=6e-4, betas=(0.9, 0.95), wd=0.1) |
| LR schedule | Linear warmup (2,000 steps) + cosine decay to 6e-5 |
| Precision | bfloat16 (A100) |
| Gradient clipping | max_norm=1.0 |
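The LR schedule in the table (linear warmup, then cosine decay) can be sketched as a pure function. This is a minimal illustration assuming the iteration counts above; the repo's training loop may differ in detail.

```python
import math

def lr_at(step, max_lr=6e-4, min_lr=6e-5, warmup=2000, total=22900):
    """Linear warmup to max_lr, then cosine decay to min_lr (per the table above)."""
    if step < warmup:
        return max_lr * (step + 1) / warmup          # linear ramp over warmup steps
    progress = (step - warmup) / (total - warmup)    # 0.0 -> 1.0 over the decay phase
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(1999))   # peak LR at the end of warmup
print(lr_at(22900))  # floor LR at the end of training
```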

Files

| File | Description |
|---|---|
| nanogpt_slm_tinystories_best.pth | Pretrained model weights (best validation loss) |
| nanogpt_slm_pretrained_inference.py | Standalone inference script with KV cache |
| config.json | Model configuration and training details |

API Reference

tell_story(beginning, max_tokens=250, temperature=0.8, top_k=40)

Generate a children's story from an opening line. Best for creative story generation.

ask(prompt, max_tokens=200, temperature=0.8, top_k=40)

General text completion. Alias for generate_text().

generate_text(prompt, max_tokens=200, temperature=0.8, top_k=40)

Low-level text generation with full parameter control.

| Parameter | Default | Description |
|---|---|---|
| prompt / beginning | (required) | Text to continue from |
| max_tokens | 200 / 250 | Maximum tokens to generate |
| temperature | 0.8 | 0.01 = predictable, 0.8 = balanced, 1.5 = creative |
| top_k | 40 | Top-k filtering (None = no filtering) |
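The temperature and top_k knobs above follow the standard sampling recipe: scale logits by 1/temperature, keep only the top-k candidates, then sample from the resulting distribution. A minimal pure-Python sketch of that recipe (illustrative only, not the script's actual implementation):

```python
import math
import random

def sample_next(logits, temperature=0.8, top_k=40):
    """Pick the next token id from raw scores via temperature + top-k sampling."""
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [x / max(temperature, 1e-8) for x in logits]
    if top_k is not None and top_k < len(scaled):
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]  # drop the rest
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]      # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    return random.choices(range(len(weights)), weights=weights, k=1)[0]

idx = sample_next([10.0] * 5 + [0.0] * 95, top_k=5)  # always one of the 5 strong tokens
```

With top_k=1 this degenerates to greedy decoding; with top_k=None every token stays in play and temperature alone shapes the randomness.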

Example Outputs

Prompt: "Once upon a time there was a little bear"

Once upon a time there was a little bear who lived in a big forest. The bear loved to play with his friends. One sunny day, he went for a walk and found a beautiful flower. He picked it up and brought it home to show his mama...

Prompt: "The princess looked out her window and saw"

The princess looked out her window and saw a big rainbow in the sky. She was so happy! She ran outside to get a closer look. A little bird flew down and sat on her hand. "Hello!" said the princess...

Fine-tuned Variants

| Variant | Type | Repo |
|---|---|---|
| This model | Pretrained (TinyStories) | nishantup/nanogpt-slm-tinystories-124m |
| Instruction-tuned (nanoGPT) | SFT | nishantup/nanogpt-slm-instruct |
| Spam classifier (nanoGPT) | Classification | nishantup/nanogpt-slm-classifier |
| Instruction-tuned (Raschka) | SFT | nishantup/gpt2-slm-instruct |

Citation

If you use this model, please cite the TinyStories paper:

Eldan, R., & Li, Y. (2023). TinyStories: How Small Can Language Models Be
and Still Speak Coherent English? arXiv preprint arXiv:2305.07759.

Notes

  • Trained completely from scratch (no pretrained initialization)
  • Uses KV cache (GPTKV) for O(1) per-token decode during inference
  • Weight tying between token embeddings (wte) and LM head (lm_head)
  • Architecture follows Karpathy's nanoGPT implementation
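The GPTKV class itself is in the inference script, but the idea behind its O(1) per-token decode can be shown with a toy cache (purely illustrative, not the repo's code): without caching, each new token recomputes keys/values for the entire prefix; with caching, each step only appends one new key/value pair and reuses the rest.

```python
class ToyKVCache:
    """Toy stand-in for a per-layer KV cache: append-only key/value lists."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, new_key, new_value):
        # O(1) append per generated token, instead of O(n) recomputation
        # of the whole prefix on every decode step.
        self.keys.append(new_key)
        self.values.append(new_value)
        return self.keys, self.values

cache = ToyKVCache()
for t in range(3):
    ks, vs = cache.step(f"k{t}", f"v{t}")
print(ks)  # ['k0', 'k1', 'k2']
```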