nanoGPT SLM TinyStories Instruct -- 124.0M Parameters

An instruction-tuned Small Language Model trained entirely from scratch:

  1. Pretrained on TinyStories (2.1M children's stories, 70K iterations)
  2. Instruction fine-tuned (SFT) on a 300K multi-source instruction dataset

What This Model Does

This model follows instructions across diverse tasks: answering questions, summarizing text, creative writing, classification, translation, and more. Give it a task and it responds:

Task: Explain what photosynthesis is in simple terms.

Answer:
Photosynthesis is the process by which plants convert sunlight, water, and
carbon dioxide into glucose and oxygen. It occurs in the chloroplasts of
plant cells and is essential for life on Earth...

Capabilities:

  • Answers factual questions
  • Summarizes text
  • Writes creative content (poems, stories, descriptions)
  • Classifies text (sentiment, category)
  • Generates lists and structured content
  • Explains concepts in simple terms

Limitations:

  • 512-token context window limits response length
  • Pretrained on a children's-stories corpus, so it may default to simple language
  • Not as capable as larger instruction-tuned models
  • English only

Training Pipeline

Stage            Dataset            Size           Details
Pretraining      TinyStories        2.1M stories   70K iters, batch 32x512, lr=6e-4, cosine to 1e-5
Instruction SFT  300K Instructions  300K examples  3 epochs, batch 32, lr=1e-4, AdamW
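The pretraining schedule above (lr=6e-4, cosine decay to 1e-5 over 70K iterations) can be sketched as follows. The linear warmup phase is an assumption borrowed from nanoGPT's default recipe; the card itself only states the cosine decay endpoints:

```python
import math

def get_lr(it, max_iters=70_000, max_lr=6e-4, min_lr=1e-5, warmup_iters=2_000):
    """nanoGPT-style schedule: linear warmup, then cosine decay to min_lr.

    warmup_iters is an assumption -- the card only specifies the cosine
    decay from 6e-4 to 1e-5 across the 70K pretraining iterations.
    """
    if it < warmup_iters:
        return max_lr * (it + 1) / warmup_iters
    if it > max_iters:
        return min_lr
    decay_ratio = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)
```

At the end of warmup the rate peaks at 6e-4, and at iteration 70K it bottoms out at exactly 1e-5.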

Instruction Dataset Sources

Source         Count  Type
Alpaca         52K    Stanford instruction-following
Dolly          15K    Databricks human-authored
UltraChat      80K    Multi-turn conversations
OpenAssistant  33K    Human-generated QA
FLAN           120K   Google's diverse NLP tasks

Quick Start

Option 1: Run directly

pip install torch tiktoken huggingface_hub
python nanogpt_slm_tinystories_instruct_inference.py

Option 2: Import and use ask() in your own code

from nanogpt_slm_tinystories_instruct_inference import ask

# Simple question
print(ask("What is the capital of France?"))

# With input context
print(ask(
    instruction="Summarize the following text.",
    input_text="Machine learning enables systems to learn from data..."
))

# Control generation
print(ask("Write a poem about the ocean.", temperature=1.0, top_k=100))

Option 3: Load weights manually

from huggingface_hub import hf_hub_download
import torch

model_path = hf_hub_download(
    repo_id="nishantup/nanogpt-slm-tinystories-instruct",
    filename="nanogpt_slm_tinystories_instruct.pth"
)

from nanogpt_slm_tinystories_instruct_inference import GPT, GPTConfig
config = GPTConfig()
model = GPT(config)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

Prompt Format

The model uses the unified Task/Question/Answer format:

Task: {instruction}

Question:
{input}          <- only if input is non-empty

Answer:
{response}
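A helper that assembles this prompt might look like the sketch below. The name build_prompt is illustrative, not necessarily the function used inside the inference script:

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble the Task/Question/Answer prompt the model was fine-tuned on.

    The Question block is emitted only when input_text is non-empty,
    mirroring the format shown above.
    """
    prompt = f"Task: {instruction}\n\n"
    if input_text:
        prompt += f"Question:\n{input_text}\n\n"
    prompt += "Answer:\n"
    return prompt
```

Generation then continues from the trailing "Answer:\n" until the <|endoftext|> token (50256) is produced.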

Model Architecture

Attribute       Value
Architecture    nanoGPT (GPT-2 small: 12 layers, 12 heads, 768 dim)
Parameters      124.0M (unique, with weight tying)
Context length  512 tokens
Tokenizer       tiktoken GPT-2 BPE (50,257 tokens)
EOS token       <|endoftext|> (50256) -- clean response stopping
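The 124.0M figure can be checked by arithmetic from the table (GPT-2 small shapes, 512-token context, tied input/output embeddings). The breakdown below assumes standard GPT-2 bias terms on LayerNorms and Linear layers:

```python
def gpt2_param_count(n_layer=12, d=768, vocab=50_257, block=512):
    """Count unique parameters of a GPT-2-style model with weight tying."""
    tok_emb = vocab * d                 # token embeddings (shared with lm_head)
    pos_emb = block * d                 # learned positional embeddings
    ln = 2 * d                          # one LayerNorm: weight + bias
    attn = d * 3 * d + 3 * d            # fused q/k/v projection
    attn += d * d + d                   # attention output projection
    mlp = d * 4 * d + 4 * d             # MLP up-projection
    mlp += 4 * d * d + d                # MLP down-projection
    per_layer = 2 * ln + attn + mlp     # two LayerNorms per transformer block
    return tok_emb + pos_emb + n_layer * per_layer + ln  # + final LayerNorm

print(gpt2_param_count())  # 124,046,592 ~= 124.0M
```

Weight tying is what keeps the count at 124M: an untied lm_head would add another vocab x d = ~38.6M parameters.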

ask() API Reference

ask(instruction, input_text="", max_tokens=256, temperature=0.7, top_k=40)
Parameter    Default     Description
instruction  (required)  The task instruction
input_text   ""          Optional additional context
max_tokens   256         Maximum tokens to generate
temperature  0.7         0.0 = greedy, 0.7 = balanced, 1.5 = creative
top_k        40          Top-k filtering (None = no filtering)
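The temperature and top_k parameters interact in the usual way: logits are divided by the temperature, optionally truncated to the k highest, then softmaxed and sampled. A minimal pure-Python sketch of that logic (illustrative, not the script's actual implementation):

```python
import math
import random

def sample_next(logits, temperature=0.7, top_k=40, rng=None):
    """Pick a token index from raw logits via temperature + top-k sampling."""
    if temperature == 0.0:              # greedy decoding: plain argmax
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    if top_k is not None:               # keep only the k highest logits
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    m = max(scaled)                     # softmax with max-subtraction for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = (rng or random).random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(exps) - 1
```

Lower temperatures sharpen the distribution toward the argmax; top_k=None disables truncation entirely, matching the table above.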

Files

File                                           Description
nanogpt_slm_tinystories_instruct.pth           Instruction fine-tuned weights
nanogpt_slm_tinystories_instruct_inference.py  Standalone inference script
config.json                                    Model + training configuration

Related Models (Vizuara SLM Family)

Variant                   Type             Repo
Pretrained (TinyStories)  Base             nishantup/nanogpt-pretrained-slm-tinystories-124m
This model                Instruction SFT  nishantup/nanogpt-slm-tinystories-instruct
Spam classifier           Classification   nishantup/nanogpt-slm-tinystories-classifier

Citation

Eldan, R., & Li, Y. (2023). TinyStories: How Small Can Language Models Be
and Still Speak Coherent English? arXiv preprint arXiv:2305.07759.

Author

Dr. Nishant Upadhyay
